Hi, could you provide some logs for this problematic job? I would like to double-check why this precondition was actually violated.
Thanks,
Stefan

> On 20.09.2018 at 17:24, Stefan Richter <s.rich...@data-artisans.com> wrote:
>
> FYI, here is a link to my PR: https://github.com/apache/flink/pull/6723
>
>> On 20.09.2018 at 14:52, Stefan Richter <s.rich...@data-artisans.com> wrote:
>>
>> Hi,
>>
>> I think the failing precondition is too strict, because sometimes a
>> checkpoint can overtake another checkpoint, and in that case the commit is
>> already subsumed. I will open a Jira issue and a PR with a fix.
>>
>> Best,
>> Stefan
>>
>>> On 19.09.2018 at 10:04, PedroMrChaves <pedro.mr.cha...@gmail.com> wrote:
>>>
>>> Hello,
>>>
>>> I have a running Flink job that reads data from one Kafka topic, applies
>>> some transformations and writes data back into another Kafka topic. The job
>>> sometimes restarts due to the following error:
>>>
>>> java.lang.RuntimeException: Error while confirming checkpoint
>>>     at org.apache.flink.runtime.taskmanager.Task$3.run(Task.java:1260)
>>>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>>>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>>     at java.lang.Thread.run(Thread.java:748)
>>> Caused by: java.lang.IllegalStateException: checkpoint completed, but no transaction pending
>>>     at org.apache.flink.util.Preconditions.checkState(Preconditions.java:195)
>>>     at org.apache.flink.streaming.api.functions.sink.TwoPhaseCommitSinkFunction.notifyCheckpointComplete(TwoPhaseCommitSinkFunction.java:258)
>>>     at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.notifyOfCompletedCheckpoint(AbstractUdfStreamOperator.java:130)
>>>     at org.apache.flink.streaming.runtime.tasks.StreamTask.notifyCheckpointComplete(StreamTask.java:650)
>>>     at org.apache.flink.runtime.taskmanager.Task$3.run(Task.java:1255)
>>>     ... 5 more
>>> 2018-09-18 22:00:10,716 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph - Could not restart the job Alert_Correlation (3c60b8670c81a629716bb2e42334edea) because the restart strategy prevented it.
>>> java.lang.RuntimeException: Error while confirming checkpoint
>>>     at org.apache.flink.runtime.taskmanager.Task$3.run(Task.java:1260)
>>>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>>>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>>     at java.lang.Thread.run(Thread.java:748)
>>> Caused by: java.lang.IllegalStateException: checkpoint completed, but no transaction pending
>>>     at org.apache.flink.util.Preconditions.checkState(Preconditions.java:195)
>>>     at org.apache.flink.streaming.api.functions.sink.TwoPhaseCommitSinkFunction.notifyCheckpointComplete(TwoPhaseCommitSinkFunction.java:258)
>>>     at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.notifyOfCompletedCheckpoint(AbstractUdfStreamOperator.java:130)
>>>     at org.apache.flink.streaming.runtime.tasks.StreamTask.notifyCheckpointComplete(StreamTask.java:650)
>>>     at org.apache.flink.runtime.taskmanager.Task$3.run(Task.java:1255)
>>>     ... 5 more
>>>
>>> My state is very small for this particular job, just a few KBs.
>>>
>>> <http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/file/t612/Screen_Shot_2018-09-19_at_09.png>
>>>
>>> Flink Version: 1.4.2
>>> State Backend: hadoop 2.8
>>>
>>> Regards,
>>> Pedro Chaves
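To make the "checkpoint overtakes another checkpoint" scenario concrete, here is a small, hypothetical Java model. It is not Flink's TwoPhaseCommitSinkFunction: the class name ToyTwoPhaseCommitSink, the `strict` flag and the string transaction handles are all made up for illustration, and only the method names snapshotState/notifyCheckpointComplete and the error message mirror the stack trace above. The assumption it encodes is that pre-committed transactions are kept per checkpoint ID, and that a completion notification commits every pending transaction up to that ID. If the notification for checkpoint 2 arrives before the one for checkpoint 1, it commits both, so the late notification for 1 finds nothing pending; a strict "something must be pending" check then fails with exactly the message Pedro saw, while a lenient version simply treats checkpoint 1 as already subsumed.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical, simplified model of two-phase-commit bookkeeping.
 * NOT Flink's actual TwoPhaseCommitSinkFunction implementation.
 */
public class ToyTwoPhaseCommitSink {
    // Pre-committed transactions, keyed by the checkpoint that snapshotted them (insertion order).
    private final Map<Long, String> pendingCommitTransactions = new LinkedHashMap<>();
    // Transactions committed so far, in commit order.
    private final List<String> committed = new ArrayList<>();
    // Hypothetical switch: true = strict precondition, false = tolerate subsumed checkpoints.
    private final boolean strict;

    public ToyTwoPhaseCommitSink(boolean strict) {
        this.strict = strict;
    }

    /** Phase 1: on snapshot, pre-commit the current transaction and remember it. */
    public void snapshotState(long checkpointId) {
        pendingCommitTransactions.put(checkpointId, "txn-" + checkpointId);
    }

    /** Phase 2: commit every pending transaction up to and including checkpointId. */
    public void notifyCheckpointComplete(long checkpointId) {
        Iterator<Map.Entry<Long, String>> it = pendingCommitTransactions.entrySet().iterator();
        if (!it.hasNext()) {
            if (strict) {
                throw new IllegalStateException("checkpoint completed, but no transaction pending");
            }
            // A later checkpoint already committed (subsumed) this one: nothing left to do.
            return;
        }
        while (it.hasNext()) {
            Map.Entry<Long, String> entry = it.next();
            if (entry.getKey() > checkpointId) {
                continue; // belongs to a checkpoint that has not been confirmed yet
            }
            committed.add(entry.getValue());
            it.remove();
        }
    }

    public List<String> committed() {
        return committed;
    }

    public static void main(String[] args) {
        for (boolean strict : new boolean[] {true, false}) {
            ToyTwoPhaseCommitSink sink = new ToyTwoPhaseCommitSink(strict);
            sink.snapshotState(1);
            sink.snapshotState(2);
            // The confirmation for checkpoint 2 arrives first and commits both transactions...
            sink.notifyCheckpointComplete(2);
            try {
                // ...so the late confirmation for checkpoint 1 finds nothing pending.
                sink.notifyCheckpointComplete(1);
                System.out.println("strict=" + strict + ": ok, committed " + sink.committed());
            } catch (IllegalStateException e) {
                System.out.println("strict=" + strict + ": " + e.getMessage());
            }
        }
    }
}
```

Running main should print "strict=true: checkpoint completed, but no transaction pending" followed by "strict=false: ok, committed [txn-1, txn-2]". In both cases no data is lost, since both transactions were committed by the first notification; the only difference is whether the late notification is treated as an error.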