Backpressure does indeed delay checkpoints, because in-flight network buffers 
gradually accumulate before barrier alignment. As Piotr explained, 1.5 should 
improve this to some extent. After 1.5 we plan to further speed up checkpoints 
by controlling the channel reader to improve barrier alignment; this has 
already been verified to greatly decrease the alignment time in backpressure 
scenarios.
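
As a rough illustration of why the accumulated buffers matter, here is a toy 
model in Python. It is not Flink code: the function name, the FIFO queue, and 
the constant drain rate are all assumptions made for the sketch. It only shows 
that a barrier queued behind N in-flight buffers cannot arrive until a 
backpressured consumer has drained those N buffers, so fewer buffers means a 
shorter wait.

```python
from collections import deque


def barrier_arrival_seconds(inflight_buffers: int, buffers_per_second: float) -> float:
    """Toy model (hypothetical, not a Flink API).

    A checkpoint barrier is enqueued behind `inflight_buffers` data buffers
    on one channel. The consumer drains the channel in FIFO order at a fixed
    rate, so the barrier is seen only after every buffer ahead of it.
    """
    if buffers_per_second <= 0:
        raise ValueError("buffers_per_second must be positive")

    # Channel contents: data buffers first, then the barrier at the tail.
    channel = deque(["buffer"] * inflight_buffers)
    channel.append("barrier")

    drained = 0
    while channel:
        item = channel.popleft()
        if item == "barrier":
            # Time spent draining everything that was ahead of the barrier.
            return drained / buffers_per_second
        drained += 1

    raise RuntimeError("barrier was never enqueued")


if __name__ == "__main__":
    # Same slow consumer, different amounts of buffered in-flight data.
    for n in (1000, 100):
        print(n, "buffers ->", barrier_arrival_seconds(n, 50.0), "s")
```

In this simplified picture, cutting the buffered data by 10x cuts the barrier 
delay by 10x, which is the intuition behind both decreasing the memory buffers 
and the improvements planned for the network stack.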

zhijiang

------------------------------------------------------------------
From: Piotr Nowojski <pi...@data-artisans.com>
Sent: Friday, April 6, 2018 00:06
To: Edward <egb...@hotmail.com>
Cc: user <user@flink.apache.org>
Subject: Re: Checkpoints very slow with high backpressure

Thanks for the explanation.

I hope that either 1.5 will solve your issue (please let us know if it 
doesn’t!) or if you can’t wait, that decreasing memory buffers can mitigate the 
problem.

Piotrek

> On 5 Apr 2018, at 08:13, Edward <egb...@hotmail.com> wrote:
> 
> Thanks for the update Piotr.
> 
> The reason it prevents us from using checkpoints is this:
> We are relying on the checkpoints to trigger commit of Kafka offsets for our
> source (kafka consumers).
> When there is no backpressure this works fine. When there is backpressure,
> checkpoints fail because they take too long, and our Kafka offsets are never
> committed to Kafka brokers (as we just learned the hard way).
> 
> Normally there is no backpressure in our jobs, but when there is some
> outage, the jobs do experience backpressure when catching up. And when
> you're already trying to recover from an incident, that is not the ideal
> time for Kafka offset commits to stop working.
> 
> 
> 
> 
> --
> Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
