Thank you, Yun, for pointing me to the related issue. I'll keep an eye on
it.

All the best,
Matthias

On Wed, Aug 11, 2021 at 10:50 PM Yun Gao <yungao...@aliyun.com> wrote:

> Hi Matthias,
>
> Sorry for the late reply. This is a known issue: Flink can lose the last
> piece of data for a bounded dataset when using a two-phase-commit (2PC)
> sink. We expect to fix this in the upcoming 1.14 release [1].
>
> Best,
> Yun
>
>
> [1] https://issues.apache.org/jira/browse/FLINK-2491
>
> ------------------Original Mail ------------------
> *Sender:*Matthias Broecheler <matth...@dataeng.ai>
> *Send Date:*Sat Aug 7 04:59:50 2021
> *Recipients:*Flink User Group <user@flink.apache.org>
> *Subject:*StreamFileSink not closing file
>
>> Hey guys,
>>
>> I wrote a simple DataStream job that counts up some numbers into a side
>> output, which I am trying to sink with a StreamingFileSink so that I can
>> write the results to disk and read them back from there.
>>
>> I'm running my little test locally and I can see that the data is being
>> written to hidden "in-progress" files, but those aren't closed when the
>> job terminates. I have enabled checkpointing, tried running in batch
>> mode, and played around with various rolling-policy settings (e.g.
>> rolloverInterval = 1), but none of it seems to make Flink close the file
>> at the end of the job.
>>
>> Is there a way to trigger a checkpoint in Flink at the end of a job so
>> that the file gets closed? I tried setting the checkpointing interval to
>> 10 ms, but that didn't work either.
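>>
>> For context, the relevant part of my job looks roughly like this (a
>> sketch, not the exact code; the path, encoder, tag name, and the
>> `mainStream` variable are illustrative):

```java
// Sketch of the setup described above (Flink 1.13-era DataStream API).
// "mainStream" stands for the processed stream that emits to the side output.
OutputTag<String> countsTag = new OutputTag<String>("counts") {};

StreamExecutionEnvironment env =
    StreamExecutionEnvironment.getExecutionEnvironment();
env.enableCheckpointing(10); // 10 ms, as mentioned above

StreamingFileSink<String> sink = StreamingFileSink
    .forRowFormat(new Path("/tmp/counts"),
                  new SimpleStringEncoder<String>("UTF-8"))
    // roll (and close) in-progress part files on every checkpoint
    .withRollingPolicy(OnCheckpointRollingPolicy.build())
    .build();

mainStream.getSideOutput(countsTag).addSink(sink);
```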
>>
>> I realize that this is a total newbie question but I couldn't find any
>> answers on StackOverflow or the archive.
>> Thanks for your help,
>> Matthias
>>
>
