Hi Gordon,
Later I found the implementation of the Elastic Search Sink provided by
Flink, and I found it also use the mechanism to flush the data when checkpoints
happens. I apply the method, now the problem is solved. It uses exactly the
method you have provided. Thanks a lot for your help.
Best
Henry
> 在 2018年11月14日,下午5:08,Tzu-Li (Gordon) Tai <[email protected]> 写道:
>
> Hi Henry,
>
> Flushing of buffered data in sinks should occur on two occasions - 1) when
> some buffer size limit is reached or a fixed-flush interval is fired, and 2)
> on checkpoints.
>
> Flushing any pending data before completing a checkpoint ensures the sink has
> at-least-once guarantees, so that should answer your question about data loss.
> For data delay due to the buffering, my only suggestion would be to have a
> time-interval based flushing configuration.
> That is what is currently happening, for example, in the Kafka / Kinesis
> producer sinks. Records are buffered, and flushed at fixed intervals or when
> the buffer is full. They are also flushed on every checkpoint.
>
> Cheers,
> Gordon
>
> On 13 November 2018 at 5:07:32 PM, 徐涛 ([email protected]
> <mailto:[email protected]>) wrote:
>
>> Hi Experts,
>> When we implement a sink, usually we implement a batch, according to the
>> record number or when reaching a time interval, however this may lead to
>> data of last batch do not write to sink. Because it is triggered by the
>> incoming record.
>> I also test the JDBCOutputFormat provided by flink, and found that it also
>> has the same problem. If the batch size is 50, and 49 items arrive, but the
>> last one comes in an hour later, then the 49 items will not be written to
>> sink during the one hour. This may cause data delay or data loss.
>> So should any pose a solution to this problem?
>> Thanks a lot.
>>
>> Best
>> Henry