Hi Community,

I am planning to use FileSource (with S3) in my application, and I came 
across the limitations below:
https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/datastream/filesystem/#current-limitations



  1.  Watermarking does not work very well for large backlogs of files. This is 
because watermarks eagerly advance within a file, and the next file might 
contain data later than the watermark.
Question: Is there a use case, setting, or configuration in which this problem 
does not arise, or can be avoided?


  2.  For Unbounded File Sources, the enumerator currently remembers paths of 
all already processed files, which is a state that can, in some cases, grow 
rather large.
Question: As a workaround, what if I configure a state backend (say, 
RocksDBStateBackend) with a TTL, so that older data is deleted automatically? 
Are there any repercussions to this?


Regards,
Kirti Dhar

