Re: Flink checkpoint fails: Checkpoint Coordinator is suspending.

2021-02-02 Thread chen310
The Flink version is 1.11, and the checkpoint configuration is:

pipeline.time-characteristic EventTime
execution.checkpointing.interval 60
execution.checkpointing.min-pause 12
execution.checkpointing.timeout 12
execution.checkpointing.externalized-checkpoint-retention RETAIN_ON_CANCELLATION
state.backend rocksdb
state.backend.incremental true
state.checkpoints.dir hdfs:///tmp/flink/checkpoint
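
The three duration settings above carry no unit. As a rough sketch only: Flink documents that a duration string without a unit label is read as milliseconds, so if the values were set exactly as posted (the archive may have dropped suffixes such as "s"), they would mean a 60 ms interval and a 12 ms timeout. The Python below is an illustration, not Flink code; `parse_duration` and `_UNITS_MS` are hypothetical names, and only a subset of Flink's unit labels is modeled.

```python
import re
from datetime import timedelta

# Subset of unit labels, expressed in milliseconds. The empty label stands for
# "no unit given", which is assumed here to mean milliseconds.
_UNITS_MS = {"": 1, "ms": 1, "s": 1000, "min": 60_000, "h": 3_600_000, "d": 86_400_000}


def parse_duration(text: str) -> timedelta:
    """Parse '<number><optional unit>' into a timedelta (hypothetical helper)."""
    match = re.fullmatch(r"\s*(\d+)\s*([a-z]*)\s*", text.lower())
    if match is None or match.group(2) not in _UNITS_MS:
        raise ValueError(f"unrecognised duration: {text!r}")
    return timedelta(milliseconds=int(match.group(1)) * _UNITS_MS[match.group(2)])


# Values copied from the configuration as it appears in this post.
posted = {
    "execution.checkpointing.interval": "60",
    "execution.checkpointing.min-pause": "12",
    "execution.checkpointing.timeout": "12",
}

for key, value in posted.items():
    print(f"{key} = {value!r} -> {parse_duration(value).total_seconds()} s")
```

If the intended values were seconds, writing them with an explicit unit (for example `60s`) removes the ambiguity.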

The complete JM log is very large, over 1 GB; what I posted above is the key error information.



--
Sent from: http://apache-flink.147419.n8.nabble.com/


Re: Flink checkpoint fails: Checkpoint Coordinator is suspending.

2021-02-02 Thread Congxian Qiu
Hi
 Which Flink version are you using, and what is your job's checkpoint/state configuration? If possible, please send the complete JM log.
Best,
Congxian


chen310 <1...@163.com> wrote on Monday, February 1, 2021 at 5:41 PM:

> To add, here are the exceptions from the JobManager log:

Re: Flink checkpoint fails: Checkpoint Coordinator is suspending.

2021-02-01 Thread chen310
To add, here are the exceptions from the JobManager log:

2021-02-01 08:54:43,639 ERROR org.apache.flink.runtime.rest.handler.job.JobDetailsHandler  [] - Exception occurred in REST handler: Job 65892aaedb8064e5743f04b54b5380df not found
2021-02-01 08:54:44,642 ERROR org.apache.flink.runtime.rest.handler.job.JobDetailsHandler  [] - Exception occurred in REST handler: Job 65892aaedb8064e5743f04b54b5380df not found
2021-02-01 08:54:45,644 ERROR org.apache.flink.runtime.rest.handler.job.JobDetailsHandler  [] - Exception occurred in REST handler: Job 65892aaedb8064e5743f04b54b5380df not found
2021-02-01 08:54:46,647 ERROR org.apache.flink.runtime.rest.handler.job.JobDetailsHandler  [] - Exception occurred in REST handler: Job 65892aaedb8064e5743f04b54b5380df not found
2021-02-01 08:54:47,649 ERROR org.apache.flink.runtime.rest.handler.job.JobDetailsHandler  [] - Exception occurred in REST handler: Job 65892aaedb8064e5743f04b54b5380df not found
2021-02-01 08:54:48,652 ERROR org.apache.flink.runtime.rest.handler.job.JobDetailsHandler  [] - Exception occurred in REST handler: Job 65892aaedb8064e5743f04b54b5380df not found
2021-02-01 08:54:49,655 ERROR org.apache.flink.runtime.rest.handler.job.JobDetailsHandler  [] - Exception occurred in REST handler: Job 65892aaedb8064e5743f04b54b5380df not found
2021-02-01 08:54:50,658 ERROR org.apache.flink.runtime.rest.handler.job.JobDetailsHandler  [] - Exception occurred in REST handler: Job 65892aaedb8064e5743f04b54b5380df not found
2021-02-01 08:54:50,921 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator[] - Triggering checkpoint 8697 (type=CHECKPOINT) @ 1612169690917 for job 1299f2f27e56ec36a4e0ffd3472ad399.
2021-02-01 08:54:50,999 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator[] - Decline checkpoint 8697 by task 320d2c162f17265435777bb65e1a8934 of job 1299f2f27e56ec36a4e0ffd3472ad399 at container_e21_1596002540781_1159_01_000134 @ ip-10-120-83-22.ap-northeast-1.compute.internal (dataPort=42984).
2021-02-01 08:54:51,661 ERROR org.apache.flink.runtime.rest.handler.job.JobDetailsHandler  [] - Exception occurred in REST handler: Job 65892aaedb8064e5743f04b54b5380df not found
2021-02-01 08:54:52,654 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph   [] - GroupWindowAggregate(window=[SlidingGroupWindow('w$, requestDateTime, 180, 60)], properties=[w$start, w$end, w$rowtime, w$proctime], select=[COUNT(DISTINCT $f1) AS totalCount, start('w$) AS w$start, end('w$) AS w$end, rowtime('w$) AS w$rowtime, proctime('w$) AS w$proctime]) -> Calc(select=[(UNIX_TIMESTAMP((w$start DATE_FORMAT _UTF-16LE'-MM-dd HH:mm:ss')) * 1000) AS requestTime, totalCount]) (1/1) (6beee54a923323c369b046e199f572c4) switched from RUNNING to FAILED on org.apache.flink.runtime.jobmaster.slotpool.SingleLogicalSlot@379a8f9c.
java.io.IOException: Could not perform checkpoint 8697 for operator GroupWindowAggregate(window=[SlidingGroupWindow('w$, requestDateTime, 180, 60)], properties=[w$start, w$end, w$rowtime, w$proctime], select=[COUNT(DISTINCT $f1) AS totalCount, start('w$) AS w$start, end('w$) AS w$end, rowtime('w$) AS w$rowtime, proctime('w$) AS w$proctime]) -> Calc(select=[(UNIX_TIMESTAMP((w$start DATE_FORMAT _UTF-16LE'-MM-dd HH:mm:ss')) * 1000) AS requestTime, totalCount]) (1/1).
	at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:897) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
	at org.apache.flink.streaming.runtime.io.CheckpointBarrierHandler.notifyCheckpoint(CheckpointBarrierHandler.java:113) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
	at org.apache.flink.streaming.runtime.io.CheckpointBarrierAligner.processBarrier(CheckpointBarrierAligner.java:137) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
	at org.apache.flink.streaming.runtime.io.CheckpointedInputGate.pollNext(CheckpointedInputGate.java:93) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
	at org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.emitNext(StreamTaskNetworkInput.java:158) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
	at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:67) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
	at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:351) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
	at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxStep(MailboxProcessor.java:191) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
	at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:181) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
	at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:567) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
	at org.apache.flink.streaming.runtime.tasks.StreamTask.in
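
Most of the excerpt above is the repeated JobDetailsHandler "not found" error for a different job ID, which buries the checkpoint entries. Since the full JM log is reported to be over 1 GB, one way to cut it down before sharing is to stream through it and keep only the checkpoint-related entries. The following is a minimal Python sketch under that assumption; `group_entries`, `checkpoint_entries`, and the keyword list are hypothetical choices made for illustration, not part of Flink.

```python
import re

# A new log entry starts with a timestamp such as "2021-02-01 08:54:50,921 ".
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}\s")
# Entries containing this marker are the repeated REST handler errors.
NOISE = "JobDetailsHandler"
# Substrings taken from the checkpoint-related entries in the excerpt.
KEYWORDS = (
    "CheckpointCoordinator",
    "Could not perform checkpoint",
    "switched from RUNNING to FAILED",
)


def group_entries(lines):
    """Group raw lines into entries: a timestamped line plus its continuation lines."""
    entry = []
    for line in lines:
        if TIMESTAMP.match(line) and entry:
            yield "".join(entry)
            entry = []
        entry.append(line)
    if entry:
        yield "".join(entry)


def checkpoint_entries(lines):
    """Yield only entries that mention checkpoints or task failures, skipping the noise."""
    for entry in group_entries(lines):
        if NOISE in entry:
            continue
        if any(keyword in entry for keyword in KEYWORDS):
            yield entry


# Small demonstration on lines copied from the excerpt (wrapped as in the email).
sample = [
    "2021-02-01 08:54:50,658 ERROR\n",
    "org.apache.flink.runtime.rest.handler.job.JobDetailsHandler  [] - Exception\n",
    "occurred in REST handler: Job 65892aaedb8064e5743f04b54b5380df not found\n",
    "2021-02-01 08:54:50,921 INFO \n",
    "org.apache.flink.runtime.checkpoint.CheckpointCoordinator[] - Triggering\n",
    "checkpoint 8697 (type=CHECKPOINT) @ 1612169690917 for job\n",
    "1299f2f27e56ec36a4e0ffd3472ad399.\n",
]

for kept in checkpoint_entries(sample):
    print(kept, end="")
```

Because both functions are generators, an open file object can be passed in place of `sample`, so the log is processed line by line rather than loaded into memory. Stack traces stay attached to the entry they follow, since they carry no timestamp of their own.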

Flink checkpoint fails: Checkpoint Coordinator is suspending.

2021-02-01 Thread chen310
My Flink checkpoints keep failing. Could someone tell me what the cause might be?