Hi,

I am using the DataStream FileSource connector in one of my use cases.
While going through the documentation, I found the limitations below:

https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/datastream/filesystem/#current-limitations

  1.  Watermarking does not work very well for large backlogs of files. This is 
because watermarks eagerly advance within a file, and the next file might 
contain data later than the watermark.
Queries:
Is there a FLIP or design document that explains the impact of these
limitations in more detail?
Also, is any work ongoing to address these limitations in future Flink
releases? If so, could you point me to the related document?
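To check that I understand the first limitation, I wrote a toy simulation. It is plain Python and not Flink code. It assumes a watermark equal to "max event timestamp seen so far minus a fixed bound", which is similar in spirit to a bounded-out-of-orderness strategy, and it updates that watermark after every record rather than periodically. The file contents and the `OUT_OF_ORDERNESS` value are hypothetical. When the backlog is read file by file, the watermark jumps to the end of the first file, so every record in an older file read afterwards is already behind it:

```python
# Toy model of watermark advancement while reading a backlog of files one
# after another. NOT Flink's implementation: it only assumes a watermark of
# "max timestamp seen so far minus a fixed bound", updated after every record.

OUT_OF_ORDERNESS = 5  # hypothetical bound, same units as the timestamps

# Hypothetical backlog: each inner list holds the event timestamps of one file,
# in the order the reader emits them.
files = [
    [100, 101, 102, 150],  # file-1: data ends at t=150
    [110, 111, 112],       # file-2: older data that happens to be read afterwards
]


def find_late_records(files, bound):
    """Return (file_index, timestamp, watermark) for every record whose
    timestamp is at or below the watermark current when it is read."""
    max_ts = None
    late = []
    for i, timestamps in enumerate(files):
        for ts in timestamps:
            watermark = None if max_ts is None else max_ts - bound
            if watermark is not None and ts <= watermark:
                late.append((i, ts, watermark))
            max_ts = ts if max_ts is None else max(max_ts, ts)
    return late


if __name__ == "__main__":
    for file_index, ts, wm in find_late_records(files, OUT_OF_ORDERNESS):
        print(f"file-{file_index + 1}: record t={ts} is late (watermark={wm})")
```

In this toy, every record of file-2 is reported as late even though nothing is wrong with the data itself. Only the read order is the problem, which is how I read the documented limitation.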

  2.  For Unbounded File Sources, the enumerator currently remembers paths of
all already processed files, which is a state that can, in some cases, grow 
rather large.
Query:
What data per file does the FileSource include in its checkpointed state?
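To picture the second limitation, here is a toy model of an enumerator for a continuously monitored directory. It keeps every path it has already handed out so that later scans skip those files. The class and method names are made up for illustration and are not Flink's API. The sketch does not answer my question about which per-file data Flink actually checkpoints. It only shows why a "remember every processed path" state grows with the number of files ever seen, rather than with the number of files currently present:

```python
# Toy model of an unbounded file enumerator that remembers processed paths.
# Hypothetical names; this is NOT Flink's actual enumerator or its state.

class ToyContinuousEnumerator:
    def __init__(self):
        # Every path ever discovered is kept, even after the file has been
        # read or deleted, so that later directory scans skip it.
        self.already_processed = set()

    def discover(self, listing):
        """Return the paths from one directory scan that have not been seen yet."""
        new_paths = [p for p in listing if p not in self.already_processed]
        self.already_processed.update(new_paths)
        return new_paths

    def snapshot_state(self):
        # In this toy, the "checkpoint" is just the sorted set of remembered paths.
        return sorted(self.already_processed)


if __name__ == "__main__":
    enumerator = ToyContinuousEnumerator()
    # Simulate 3 scans. Each listing only contains the current scan's files
    # (older files have been removed), yet the remembered set keeps growing.
    for scan in range(3):
        listing = [f"/data/part-{scan}-{i}" for i in range(2)]
        new = enumerator.discover(listing)
        print(f"scan {scan}: new={new}, state size={len(enumerator.snapshot_state())}")
```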

Appreciate any help!

Regards,
Kirti Dhar