Re: Last batch of stream data could not be sinked when data comes very slow

徐涛 Sat, 17 Nov 2018 16:27:37 -0800

Hi Gordon,
        Later I found the implementation of the Elastic Search Sink provided by 
Flink, and I found it also use the mechanism to flush the data when checkpoints 
happens. I apply the method, now the problem is solved. It uses exactly the 
method you have provided. Thanks a lot for your help.


Best
Henry

> 在 2018年11月14日，下午5:08，Tzu-Li (Gordon) Tai <tzuli...@apache.org> 写道：
> 
> Hi Henry,
> 
> Flushing of buffered data in sinks should occur on two occasions - 1) when 
> some buffer size limit is reached or a fixed-flush interval is fired, and 2) 
> on checkpoints.
> 
> Flushing any pending data before completing a checkpoint ensures the sink has 
> at-least-once guarantees, so that should answer your question about data loss.
> For data delay due to the buffering, my only suggestion would be to have a 
> time-interval based flushing configuration.
> That is what is currently happening, for example, in the Kafka / Kinesis 
> producer sinks. Records are buffered, and flushed at fixed intervals or when 
> the buffer is full. They are also flushed on every checkpoint.
> 
> Cheers,
> Gordon
> 
> On 13 November 2018 at 5:07:32 PM, 徐涛 (happydexu...@gmail.com 
> <mailto:happydexu...@gmail.com>) wrote:
> 
>> Hi Experts,
>> When we implement a sink, usually we implement a batch, according to the 
>> record number or when reaching a time interval, however this may lead to 
>> data of last batch do not write to sink. Because it is triggered by the 
>> incoming record.
>> I also test the JDBCOutputFormat provided by flink, and found that it also 
>> has the same problem. If the batch size is 50, and 49 items arrive, but the 
>> last one comes in an hour later, then the 49 items will not be written to 
>> sink during the one hour. This may cause data delay or data loss.
>> So should any pose a solution to this problem?
>> Thanks a lot.
>> 
>> Best
>> Henry

Re: Last batch of stream data could not be sinked when data comes very slow

Reply via email to