Re: Checkpoints timing out for no apparent reason

2019-07-29 Thread spoganshev
Switching to 1.8 didn't help. The timeout exception from Kinesis is a consequence, not a cause. -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Checkpoints timing out for no apparent reason

2019-07-23 Thread spoganshev
Looks like this is the issue: https://issues.apache.org/jira/browse/FLINK-11164 We'll try switching to 1.8 and see if it helps.

Re: Checkpoints timing out for no apparent reason

2019-07-23 Thread spoganshev
I've looked into this problem a little more, and it looks like it is caused by an issue with the Kinesis sink. There is an exception in the logs at the moment the job gets restored after being stalled for about 15 minutes: Encountered an unexpected expired iterator

Re: Checkpoints timing out for no apparent reason

2019-07-19 Thread Andrey Zagrebin
9 at 5:00 PM spoganshev wrote: > The image should be visible now at > > http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Checkpoints-timing-out-for-no-apparent-reason-td28793.html#none > > It doesn't look like it is a disk performance or network issue. Fe

Re: Checkpoints timing out for no apparent reason

2019-07-18 Thread spoganshev
The image should be visible now at http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Checkpoints-timing-out-for-no-apparent-reason-td28793.html#none It doesn't look like it is a disk performance or network issue. Feels more like some buffer overflowing or timeout due to slightly

Re: Checkpoints timing out for no apparent reason

2019-07-18 Thread Congxian Qiu
Hi, the image did not show. An incremental checkpoint includes: 1) flushing the memtable to SST files; 2) checkpointing RocksDB; 3) snapshotting the metadata; 4) uploading the needed SST files to remote storage. The first three steps are in the sync part, and the fourth step is in the async part. Could you please check whether the sync
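
The sync/async split described above can be sketched roughly as follows. This is only a toy model and not Flink code: the step names mirror the four steps listed, while `classify_checkpoint`, the dict layout, and all durations are hypothetical and made up for illustration. It also assumes the async part starts only after the sync part has finished.

```python
# Toy model of the four incremental-checkpoint steps listed above.
# Not Flink code: function name, dict layout, and numbers are hypothetical.

SYNC_STEPS = ("flush_memtable", "rocksdb_checkpoint", "snapshot_metadata")
ASYNC_STEPS = ("upload_sst_files",)


def classify_checkpoint(durations_ms, timeout_ms):
    """Sum per-step durations into a sync and an async part, then report
    which part dominates and whether the total exceeds the timeout."""
    sync_ms = sum(durations_ms.get(step, 0) for step in SYNC_STEPS)
    async_ms = sum(durations_ms.get(step, 0) for step in ASYNC_STEPS)
    total_ms = sync_ms + async_ms  # assumes async runs after sync
    return {
        "sync_ms": sync_ms,
        "async_ms": async_ms,
        "dominant": "sync" if sync_ms >= async_ms else "async",
        "timed_out": total_ms > timeout_ms,
    }


if __name__ == "__main__":
    # Made-up measurements for a single subtask, in milliseconds.
    sample = {
        "flush_memtable": 120,
        "rocksdb_checkpoint": 40,
        "snapshot_metadata": 5,
        "upload_sst_files": 900_000,
    }
    print(classify_checkpoint(sample, timeout_ms=600_000))
```

In this model, a dominant sync part would point at the first three steps, and a dominant async part would point at the upload of SST files to remote storage.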

Checkpoints timing out for no apparent reason

2019-07-16 Thread spoganshev
We have an issue with a job that occasionally times out while creating snapshots for no apparent reason.

Details:
- Flink 1.7.2
- Checkpoints are saved to S3 with presto
- Incremental checkpoints are used

What might be the cause of this issue? It feels like some internal s3 client timeout