Hi Matthias,

Sorry for the late reply. This looks like a known issue: Flink loses
the last piece of data for a bounded dataset when writing through a 2PC
(two-phase commit) sink. However, we expect to fix this in the upcoming
1.14 version [1].
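
To illustrate why the tail of a bounded input can stay in an in-progress
file, here is a minimal sketch of a two-phase-commit sink. It is a toy
model, not Flink code: the class TwoPhaseSinkSketch, its methods, and the
finalCheckpoint flag are all hypothetical names made up for this example,
and the flag only stands in for "one more checkpoint after the input
ends", not for the actual change planned in [1].

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a two-phase-commit file sink. Hypothetical, not Flink API. */
public class TwoPhaseSinkSketch {
    // Records written since the last checkpoint (the "in-progress" file).
    private final List<Integer> inProgress = new ArrayList<>();
    // Records pre-committed at a checkpoint, waiting for its completion.
    private final List<Integer> pending = new ArrayList<>();
    // Records published to readers (the finished files).
    private final List<Integer> committed = new ArrayList<>();

    public void write(int record) {
        inProgress.add(record);
    }

    // Phase 1: when a checkpoint is taken, in-progress data becomes pending.
    public void snapshot() {
        pending.addAll(inProgress);
        inProgress.clear();
    }

    // Phase 2: when the checkpoint completes, pending data is published.
    public void notifyCheckpointComplete() {
        committed.addAll(pending);
        pending.clear();
    }

    public List<Integer> committed() {
        return committed;
    }

    public List<Integer> inProgress() {
        return inProgress;
    }

    // Writes records 1..records, checkpointing after every checkpointEvery
    // records. finalCheckpoint is an assumption of this sketch: it models
    // one extra checkpoint after the bounded input has ended.
    public static TwoPhaseSinkSketch runBoundedJob(
            int records, int checkpointEvery, boolean finalCheckpoint) {
        TwoPhaseSinkSketch sink = new TwoPhaseSinkSketch();
        for (int i = 1; i <= records; i++) {
            sink.write(i);
            if (i % checkpointEvery == 0) {
                sink.snapshot();
                sink.notifyCheckpointComplete();
            }
        }
        if (finalCheckpoint) {
            sink.snapshot();
            sink.notifyCheckpointComplete();
        }
        return sink;
    }

    public static void main(String[] args) {
        TwoPhaseSinkSketch lossy = runBoundedJob(10, 4, false);
        System.out.println("committed:   " + lossy.committed());
        System.out.println("in progress: " + lossy.inProgress());

        TwoPhaseSinkSketch complete = runBoundedJob(10, 4, true);
        System.out.println("committed with final checkpoint: "
                + complete.committed());
    }
}
```

In this model, records written after the last checkpoint are never
published unless one more checkpoint happens after the input ends, which
would also explain why a shorter checkpoint interval alone does not help.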

Best,
Yun


[1] https://issues.apache.org/jira/browse/FLINK-2491


 ------------------Original Mail ------------------
Sender:Matthias Broecheler <matth...@dataeng.ai>
Send Date:Sat Aug 7 04:59:50 2021
Recipients:Flink User Group <user@flink.apache.org>
Subject:StreamFileSink not closing file

Hey guys,

I wrote a simple DataStream that counts up some numbers into a SideOutput which 
I am trying to sink into a StreamFileSink so that I can write the results to 
disk and read them from there.

I'm running my little test locally and I can see that the data is being written 
to hidden "inprogress" files, but those aren't closed when the job terminates. I 
have enabled checkpointing, tried running in batch mode, and played around with 
various rolling policy settings (rolloverinterval = 1), but none of it seems to 
trigger Flink to close off the file at the end of the job.

Is there a way to trigger a checkpoint in Flink at the end of a job which would 
trigger the file to be closed? I tried setting the checkpointing interval to 10 
ms but that didn't work either.

I realize that this is a total newbie question but I couldn't find any answers 
on StackOverflow or the archive.
Thanks for your help,
Matthias 
