Thank you, Yun, for pointing me to the related issue. I'll keep an eye on it.
All the best,
Matthias

On Wed, Aug 11, 2021 at 10:50 PM Yun Gao <yungao...@aliyun.com> wrote:

> Hi Matthias,
>
> Sorry for the late reply. This is a known issue: Flink can lose the last
> piece of data for a bounded dataset with a 2PC sink. However, we expect
> to fix this issue in the upcoming 1.14 version [1].
>
> Best,
> Yun
>
> [1] https://issues.apache.org/jira/browse/FLINK-2491
>
> ------------------ Original Mail ------------------
> *Sender:* Matthias Broecheler <matth...@dataeng.ai>
> *Send Date:* Sat Aug 7 04:59:50 2021
> *Recipients:* Flink User Group <user@flink.apache.org>
> *Subject:* StreamFileSink not closing file
>
>> Hey guys,
>>
>> I wrote a simple DataStream that counts up some numbers into a SideOutput,
>> which I am trying to sink into a StreamFileSink so that I can write the
>> results to disk and read them from there.
>>
>> I'm running my little test locally, and I can see that the data is being
>> written to hidden "in-progress" files, but those aren't closed when the
>> job terminates. I have enabled checkpointing, tried running in batch mode,
>> and played around with various rolling-policy settings (rolloverInterval =
>> 1), but none of it seems to trigger Flink to close the file at the end of
>> the job.
>>
>> Is there a way to trigger a checkpoint in Flink at the end of a job, which
>> would cause the file to be closed? I tried setting the checkpointing
>> interval to 10 ms, but that didn't work either.
>>
>> I realize that this is a total newbie question, but I couldn't find any
>> answers on StackOverflow or the archive.
>> Thanks for your help,
>> Matthias
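For readers hitting the same symptom: the behavior discussed above comes from the fact that StreamingFileSink only commits (renames) its hidden in-progress part files when a checkpoint completes. A minimal sketch of the usual setup is below, using Flink's Java DataStream API; the output path, the element type, and the 10-second checkpoint interval are illustrative choices, not values from the thread, and as Yun notes, prior to the 1.14 fix the final piece of data of a bounded job may still be left uncommitted when the job ends.

```java
import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;
import org.apache.flink.streaming.api.functions.sink.filesystem.rollingpolicies.OnCheckpointRollingPolicy;

public class StreamingFileSinkSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Part files are only finalized when a checkpoint completes, so
        // checkpointing must be enabled for the sink to commit anything.
        env.enableCheckpointing(10_000); // every 10 s; illustrative value

        StreamingFileSink<String> sink = StreamingFileSink
                .forRowFormat(new Path("/tmp/output"), new SimpleStringEncoder<String>("UTF-8"))
                // Roll (and thus commit) part files on every checkpoint,
                // instead of by size or time thresholds.
                .withRollingPolicy(OnCheckpointRollingPolicy.build())
                .build();

        env.fromElements("1", "2", "3")
           .addSink(sink);

        env.execute("streaming-file-sink-sketch");
    }
}
```

Even with this setup, a short bounded job can finish before any checkpoint fires, which is exactly the gap the 1.14 work referenced above addresses (checkpoints/finalization after bounded sources finish); on 1.14+ the newer FileSink is the recommended replacement for StreamingFileSink.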