Hi Gordon,

Later I found the implementation of the Elasticsearch sink provided by Flink, and it also uses this mechanism of flushing the buffered data when a checkpoint happens. I have applied the method and the problem is now solved. It is exactly the approach you suggested. Thanks a lot for your help.
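
For anyone else hitting this, here is a minimal sketch of the flush-on-checkpoint pattern. BATCH_SIZE and doFlush() are illustrative placeholders for the real batching threshold and the write to the external system, not Flink API:

import java.util.ArrayList;
import java.util.List;

import org.apache.flink.runtime.state.FunctionInitializationContext;
import org.apache.flink.runtime.state.FunctionSnapshotContext;
import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

// Buffering sink that flushes when the batch is full and, crucially,
// on every checkpoint, so no buffered record is lost on restore.
public class BufferingSink extends RichSinkFunction<String>
        implements CheckpointedFunction {

    private static final int BATCH_SIZE = 50;   // illustrative threshold
    private final List<String> buffer = new ArrayList<>();

    @Override
    public void invoke(String value, Context context) throws Exception {
        buffer.add(value);
        if (buffer.size() >= BATCH_SIZE) {
            doFlush();
        }
    }

    @Override
    public void snapshotState(FunctionSnapshotContext context) throws Exception {
        // Flushing pending records before the checkpoint completes is what
        // gives the sink its at-least-once guarantee.
        doFlush();
    }

    @Override
    public void initializeState(FunctionInitializationContext context) {
        // Nothing to restore in this sketch: the buffer is always empty
        // at checkpoint time, because snapshotState() flushes it.
    }

    private void doFlush() {
        // Hypothetical write to the external system, e.g. a JDBC batch
        // or an Elasticsearch bulk request.
        buffer.clear();
    }
}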
Best
Henry

> On 14 Nov 2018, at 5:08 PM, Tzu-Li (Gordon) Tai <tzuli...@apache.org> wrote:
>
> Hi Henry,
>
> Flushing of buffered data in sinks should occur on two occasions: 1) when
> some buffer size limit is reached or a fixed flush interval fires, and 2)
> on checkpoints.
>
> Flushing any pending data before completing a checkpoint ensures the sink has
> at-least-once guarantees, so that should answer your question about data loss.
> For data delay due to the buffering, my only suggestion would be to have a
> time-interval based flushing configuration.
> That is what currently happens, for example, in the Kafka / Kinesis
> producer sinks. Records are buffered, and flushed at fixed intervals or when
> the buffer is full. They are also flushed on every checkpoint.
>
> Cheers,
> Gordon
>
> On 13 November 2018 at 5:07:32 PM, 徐涛 (happydexu...@gmail.com) wrote:
>
>> Hi Experts,
>> When we implement a sink, we usually batch the writes, flushing when a
>> record-count threshold is reached or after a time interval. However, this
>> can leave the data of the last batch unwritten, because flushing is only
>> triggered by an incoming record.
>> I also tested the JDBCOutputFormat provided by Flink, and found that it has
>> the same problem. If the batch size is 50 and 49 items arrive, but the last
>> one comes an hour later, then the 49 items will not be written to the sink
>> during that hour. This may cause data delay or data loss.
>> Could anyone propose a solution to this problem?
>> Thanks a lot.
>>
>> Best
>> Henry
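
A note on the fixed-interval flushing Gordon mentions for the Kafka / Kinesis producer sinks: one way to sketch it is a timer thread started in open() that flushes the buffer periodically, so a half-filled batch never waits indefinitely for the next record. The interval, batch size, and writeBatch() below are assumptions for illustration, not Flink API; a production sink would combine this with the checkpoint flush shown above and propagate write errors from the timer thread back to the task:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

// Sink that flushes either when the batch is full or when a fixed
// interval elapses, whichever comes first.
public class IntervalFlushingSink extends RichSinkFunction<String> {

    private static final int BATCH_SIZE = 50;           // illustrative
    private static final long FLUSH_INTERVAL_MS = 1000L; // illustrative

    private final List<String> buffer = new ArrayList<>();
    private transient ScheduledExecutorService scheduler;

    @Override
    public void open(Configuration parameters) {
        // Periodic flush so buffered records are never delayed for long.
        scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(
                this::flush, FLUSH_INTERVAL_MS, FLUSH_INTERVAL_MS, TimeUnit.MILLISECONDS);
    }

    @Override
    public void invoke(String value, Context context) {
        synchronized (buffer) {
            buffer.add(value);
            if (buffer.size() >= BATCH_SIZE) {
                flush();
            }
        }
    }

    @Override
    public void close() {
        scheduler.shutdown();
        flush(); // drain whatever is left on shutdown
    }

    private void flush() {
        // Synchronized because the timer thread and the task thread
        // both touch the buffer.
        synchronized (buffer) {
            if (!buffer.isEmpty()) {
                writeBatch(buffer);
                buffer.clear();
            }
        }
    }

    private void writeBatch(List<String> batch) {
        // Hypothetical write to the external system.
    }
}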