Hi Matthias,

Sorry for the late reply. This is a known issue: Flink would lose the last piece of data for a bounded dataset with a 2PC (two-phase-commit) sink. However, we expect to fix this issue in the upcoming 1.14 version [1].
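To illustrate the behaviour described above, here is a minimal toy model in plain Python. It is not Flink code and does not use Flink's API: the class `ToyTwoPhaseFileSink`, its methods, and `run_bounded_job` are hypothetical names made up for this sketch. It assumes a simplified sink that only makes data visible when a checkpoint completes, and a bounded job that takes no checkpoint after the input ends.

```python
class ToyTwoPhaseFileSink:
    """Toy model of a two-phase-commit file sink (hypothetical, not Flink's API)."""

    def __init__(self):
        self.in_progress = []  # records written since the last checkpoint
        self.pending = []      # records pre-committed, waiting for the checkpoint to complete
        self.finished = []     # records committed and visible to readers

    def write(self, record):
        self.in_progress.append(record)

    def snapshot(self):
        # Phase 1 (pre-commit): close the in-progress part and mark it pending.
        self.pending.extend(self.in_progress)
        self.in_progress = []

    def notify_checkpoint_complete(self):
        # Phase 2 (commit): pending data becomes finished.
        self.finished.extend(self.pending)
        self.pending = []


def run_bounded_job(records, checkpoint_every):
    """Feed a bounded input into the sink, checkpointing every N records."""
    sink = ToyTwoPhaseFileSink()
    for count, record in enumerate(records, start=1):
        sink.write(record)
        if count % checkpoint_every == 0:
            sink.snapshot()
            sink.notify_checkpoint_complete()
    # Assumption of this sketch: the input ends here and no final
    # checkpoint runs, so whatever is still in progress is never committed.
    return sink


sink = run_bounded_job(list(range(1, 11)), checkpoint_every=4)
print("finished:   ", sink.finished)     # [1, 2, 3, 4, 5, 6, 7, 8]
print("in progress:", sink.in_progress)  # [9, 10]
```

In this model, a shorter checkpoint interval only shrinks the uncommitted tail. It does not remove it unless the last record happens to fall exactly on a checkpoint.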
Best,
Yun

[1] https://issues.apache.org/jira/browse/FLINK-2491

------------------Original Mail ------------------
Sender:Matthias Broecheler <matth...@dataeng.ai>
Send Date:Sat Aug 7 04:59:50 2021
Recipients:Flink User Group <user@flink.apache.org>
Subject:StreamFileSink not closing file

Hey guys,

I wrote a simple DataStream that counts up some numbers into a SideOutput, which I am trying to sink into a StreamFileSink so that I can write the results to disk and read them from there.

I'm running my little test locally and I can see that the data is being written to hidden "inprogress" files, but those aren't closed when the job terminates. I have enabled checkpointing, tried running in batch mode, and played around with various rolling policy settings (rolloverinterval = 1), but none of it seems to trigger Flink to close off the file at the end of the job.

Is there a way to trigger a checkpoint in Flink at the end of a job, which would trigger the file to be closed? I tried setting the checkpointing interval to 10 ms but that didn't work either.

I realize that this is a total newbie question, but I couldn't find any answers on StackOverflow or in the archive.

Thanks for your help,
Matthias