2021-03-04 02:33:25,292 DEBUG org.apache.flink.runtime.rpc.akka.SupervisorActor
[] - Starting FencedAkkaRpcActor with name jobmanager_2.
2021/3/4 10:33:25 AM 2021-03-04 02:33:25,304 INFO
org.apache.flink.runtime.rpc.akka.AkkaRpcService [] - Starting RPC
endpoint for org.apache.flink.runtime.jobmaster.JobMaster at
akka://flink/user/rpc/jobmanager_2 .
2021/3/4 10:33:25 AM 2021-03-04 02:33:25,310 INFO
org.apache.flink.runtime.jobmaster.JobMaster [] - Initializing
job TransactionAndAccount (00000000000000000000000000000000).
2021/3/4 10:33:25 AM 2021-03-04 02:33:25,323 INFO
org.apache.flink.runtime.jobmaster.JobMaster [] - Using restart
back off time strategy
FixedDelayRestartBackoffTimeStrategy(maxNumberRestartAttempts=2147483647,
backoffTimeMS=1000) for TransactionAndAccount
(00000000000000000000000000000000).
2021/3/4 10:33:25 AM 2021-03-04 02:33:25,380 INFO
org.apache.flink.runtime.jobmaster.JobMaster [] - Running
initialization on master for job TransactionAndAccount
(00000000000000000000000000000000).
2021/3/4 10:33:25 AM 2021-03-04 02:33:25,380 INFO
org.apache.flink.runtime.jobmaster.JobMaster [] - Successfully
ran initialization on master in 0 ms.
2021/3/4 10:33:25 AM 2021-03-04 02:33:25,381 DEBUG
org.apache.flink.runtime.jobmaster.JobMaster [] - Adding 2
vertices from job graph TransactionAndAccount
(00000000000000000000000000000000).
2021/3/4 10:33:25 AM 2021-03-04 02:33:25,381 DEBUG
org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Attaching 2
topologically sorted vertices to existing job graph with 0 vertices and 0
intermediate results.
2021/3/4 10:33:25 AM 2021-03-04 02:33:25,389 DEBUG
org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Connecting
ExecutionJobVertex cbc357ccb763df2852fee8c4fc7d55f2 (Source: Custom Source ->
format to json -> Filter -> process timestamp range -> Timestamps/Watermarks)
to 0 predecessors.
2021/3/4 10:33:25 AM 2021-03-04 02:33:25,389 DEBUG
org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Connecting
ExecutionJobVertex 337adade1e207453ed3502e01d75fd03
(Window(TumblingEventTimeWindows(86400000), EventTimeTrigger, SumAggregator,
PassThroughWindowFunction) -> Flat Map -> Sink: tidb) to 1 predecessors.
2021/3/4 10:33:25 AM 2021-03-04 02:33:25,389 DEBUG
org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Connecting
input 0 of vertex 337adade1e207453ed3502e01d75fd03
(Window(TumblingEventTimeWindows(86400000), EventTimeTrigger, SumAggregator,
PassThroughWindowFunction) -> Flat Map -> Sink: tidb) to intermediate result
referenced via predecessor cbc357ccb763df2852fee8c4fc7d55f2 (Source: Custom
Source -> format to json -> Filter -> process timestamp range ->
Timestamps/Watermarks).
2021/3/4 10:33:25 AM 2021-03-04 02:33:25,395 INFO
org.apache.flink.runtime.scheduler.adapter.DefaultExecutionTopology [] - Built
1 pipelined regions in 2 ms
2021/3/4 10:33:25 AM 2021-03-04 02:33:25,396 DEBUG
org.apache.flink.runtime.jobmaster.JobMaster [] - Successfully
created execution graph from job graph TransactionAndAccount
(00000000000000000000000000000000).
2021/3/4 10:33:25 AM 2021-03-04 02:33:25,406 INFO
org.apache.flink.runtime.jobmaster.JobMaster [] - Using
job/cluster config to configure application-defined state backend: File State
Backend (checkpoints: 'oss://xx/backend', savepoints: 'null', asynchronous:
TRUE, fileStateThreshold: 20480)
2021/3/4 10:33:25 AM 2021-03-04 02:33:25,406 INFO
org.apache.flink.runtime.jobmaster.JobMaster [] - Using
application-defined state backend: File State Backend (checkpoints:
'oss://xx/backend', savepoints: 'null', asynchronous: TRUE, fileStateThreshold:
20480)
2021/3/4 10:33:25 AM 2021-03-04 02:33:25,419 INFO
org.apache.flink.fs.osshadoop.shaded.com.aliyun.oss [] -
[Server]Unable to execute HTTP request: Not Found
2021/3/4 10:33:25 AM [ErrorCode]: NoSuchKey
2021/3/4 10:33:25 AM [RequestId]: 604046F58B49C830320A1A53
2021/3/4 10:33:25 AM [HostId]: null
2021/3/4 10:33:25 AM 2021-03-04 02:33:25,432 INFO
org.apache.flink.fs.osshadoop.shaded.com.aliyun.oss [] -
[Server]Unable to execute HTTP request: Not Found
2021/3/4 10:33:25 AM [ErrorCode]: NoSuchKey
2021/3/4 10:33:25 AM [RequestId]: 604046F58B49C830322A1A53
2021/3/4 10:33:25 AM [HostId]: null
2021/3/4 10:33:25 AM 2021-03-04 02:33:25,442 INFO
org.apache.flink.runtime.checkpoint.DefaultCompletedCheckpointStore [] -
Recovering checkpoints from
KubernetesStateHandleStore{configMapName='demo-00000000000000000000000000000000-jobmanager-leader'}.
2021/3/4 10:33:25 AM 2021-03-04 02:33:25,448 INFO
org.apache.flink.runtime.checkpoint.DefaultCompletedCheckpointStore [] - Found
1 checkpoints in
KubernetesStateHandleStore{configMapName='demo-00000000000000000000000000000000-jobmanager-leader'}.
2021/3/4 10:33:25 AM 2021-03-04 02:33:25,449 INFO
org.apache.flink.runtime.checkpoint.DefaultCompletedCheckpointStore [] - Trying
to fetch 1 checkpoints from storage.
2021/3/4 10:33:25 AM 2021-03-04 02:33:25,449 INFO
org.apache.flink.runtime.checkpoint.DefaultCompletedCheckpointStore [] - Trying
to retrieve checkpoint 10167.
2021/3/4 10:33:25 AM 2021-03-04 02:33:25,483 DEBUG
org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Status of the
shared state registry of job 00000000000000000000000000000000 after restore:
SharedStateRegistry{registeredStates={}}.
2021/3/4 10:33:25 AM 2021-03-04 02:33:25,483 INFO
org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Restoring job
00000000000000000000000000000000 from Checkpoint 10167 @ 1614825175716 for
00000000000000000000000000000000 located at
oss://xx/backend/00000000000000000000000000000000/chk-10167.
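The number after `@` in the restore message above looks like a Unix epoch timestamp in milliseconds. That is an assumption based on its magnitude; the log does not say so. A minimal Python sketch, also assuming the log4j timestamps are UTC (the viewer prefix on the same lines is eight hours ahead), converts the value and compares it with the time of the restore line:

```python
from datetime import datetime, timedelta, timezone

# Value printed after "@" in the "Restoring job ... from Checkpoint 10167" line.
# Assumption: a Unix epoch timestamp in milliseconds.
checkpoint_ts_ms = 1614825175716

# Log time of the restore line (02:33:25,483). Assumption: logged in UTC.
restore_log_time = datetime(2021, 3, 4, 2, 33, 25, 483000, tzinfo=timezone.utc)

# Integer arithmetic keeps the millisecond part exact.
checkpoint_time = datetime.fromtimestamp(
    checkpoint_ts_ms // 1000, tz=timezone.utc
) + timedelta(milliseconds=checkpoint_ts_ms % 1000)

gap_seconds = (restore_log_time - checkpoint_time).total_seconds()

print(checkpoint_time.isoformat(timespec="milliseconds"))
print(f"{gap_seconds:.3f} s between checkpoint creation and restore")
```

Under those assumptions, checkpoint 10167 was created roughly 30 seconds before the JobManager restored from it.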
I checked the JobManager log, and the same NoSuchKey error appears there as well.
On March 3, 2021, at 11:23 AM, 王 羽凡 <[email protected]> wrote:
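Version: Flink 1.12.0
Environment: Native Kubernetes
Mode: Application Mode
Description:
When Flink runs on Kubernetes in Native Kubernetes Application mode with the filesystem state backend on OSS, the logs show errors on requests to OSS.
When the code uses `source.setStartFromEarliest();`, the job consumes from the beginning after startup and runs normally. Once it catches up to the latest position, the errors below appear; they disappear after a while or after the job is restarted.
When the code uses `source.setStartFromLatest();`, the job consumes directly from the latest position after startup and the errors do not appear.
Based on these observations, is there something wrong with my configuration or usage?
Command: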
./bin/flink run-application \
--target kubernetes-application \
-Dkubernetes.cluster-id=demo \
-Dkubernetes.container.image=xx/xx/xx:2.0.16 \
-Dstate.backend=filesystem \
-Dstate.checkpoints.dir=oss://bucket/文件夹 \
-Dfs.oss.endpoint=oss-cn-beijing-internal.aliyuncs.com \
-Dfs.oss.accessKeyId=xx \
-Dfs.oss.accessKeySecret=xx \
local:///opt/flink/usrlib/my-flink-job.jar
Error log:
2021-03-03 02:53:46,133 INFO
org.apache.flink.streaming.connectors.pulsar.internal.PulsarMetadataReader [] -
Committing offset 12701:1:-1:4 to topic
TopicRange{topic=persistent://public/xx/xxxx,
key-range=SerializableRange{range=[0, 65535]}}
2021/3/3 10:53:46 AM 2021-03-03 02:53:46,140 INFO
org.apache.flink.streaming.connectors.pulsar.internal.PulsarMetadataReader [] -
Successfully committed offset 12701:1:-1:4 to topic
TopicRange{topic=persistent://public/xx/xxxx,
key-range=SerializableRange{range=[0, 65535]}}
2021/3/3 10:53:50 AM 2021-03-03 02:53:50,899 INFO
org.apache.flink.fs.osshadoop.shaded.com.aliyun.oss [] -
[Server]Unable to execute HTTP request: Not Found
2021/3/3 10:53:50 AM [ErrorCode]: NoSuchKey
2021/3/3 10:53:50 AM [RequestId]: xx
2021/3/3 10:53:50 AM [HostId]: null
2021/3/3 10:53:50 AM 2021-03-03 02:53:50,904 INFO
org.apache.flink.fs.osshadoop.shaded.com.aliyun.oss [] -
[Server]Unable to execute HTTP request: Not Found
2021/3/3 10:53:50 AM [ErrorCode]: NoSuchKey
2021/3/3 10:53:50 AM [RequestId]: xx
2021/3/3 10:53:50 AM [HostId]: null
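To see how often and when these OSS errors occur, the log can be scanned for the `[ErrorCode]` lines. Below is a rough Python sketch; the function name `extract_oss_errors` and the result layout are made up for illustration, and it assumes each `[ErrorCode]` line belongs to the most recent log4j timestamp seen before it, as in the excerpt above:

```python
import re
from collections import Counter

# Log4j-style timestamp, e.g. "2021-03-03 02:53:50,899".
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}")
ERROR_CODE = re.compile(r"\[ErrorCode\]: (\S+)")
REQUEST_ID = re.compile(r"\[RequestId\]: (\S+)")


def extract_oss_errors(lines):
    """Return one dict per [ErrorCode] line, tagged with the latest timestamp seen."""
    errors = []
    last_timestamp = None
    for line in lines:
        ts = TIMESTAMP.search(line)
        if ts:
            last_timestamp = ts.group(0)
        code = ERROR_CODE.search(line)
        if code:
            errors.append(
                {"time": last_timestamp, "code": code.group(1), "request_id": None}
            )
            continue
        rid = REQUEST_ID.search(line)
        # Attach the request id to the error entry that directly precedes it.
        if rid and errors and errors[-1]["request_id"] is None:
            errors[-1]["request_id"] = rid.group(1)
    return errors


# Sample copied from the excerpt above (viewer prefixes omitted).
sample = """\
2021-03-03 02:53:50,899 INFO
org.apache.flink.fs.osshadoop.shaded.com.aliyun.oss [] -
[Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: xx
[HostId]: null
2021-03-03 02:53:50,904 INFO
org.apache.flink.fs.osshadoop.shaded.com.aliyun.oss [] -
[Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: xx
[HostId]: null
""".splitlines()

errors = extract_oss_errors(sample)
counts = Counter(e["code"] for e in errors)
for e in errors:
    print(e["time"], e["code"], e["request_id"])
print(dict(counts))
```

Run against the full TaskManager and JobManager logs, a summary like this would show whether the NoSuchKey entries line up with checkpoint times or with restarts.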
Normal TaskManager log after killing the process so that the pod restarts, or after waiting a while:
2021-03-03 03:18:21,602 INFO
org.apache.flink.streaming.connectors.pulsar.internal.PulsarMetadataReader [] -
Successfully committed offset 12716:7:-1:1 to topic
TopicRange{topic=persistent://public/xx/xxxx,
key-range=SerializableRange{range=[0, 65535]}}
2021/3/3 11:18:26 AM 2021-03-03 03:18:26,573 INFO
org.apache.flink.streaming.connectors.pulsar.internal.PulsarMetadataReader [] -
Committing offset 12716:7:-1:1 to topic
TopicRange{topic=persistent://public/xx/xxxx,
key-range=SerializableRange{range=[0, 65535]}}
2021/3/3 11:18:26 AM 2021-03-03 03:18:26,582 INFO
org.apache.flink.streaming.connectors.pulsar.internal.PulsarMetadataReader [] -
Successfully committed offset 12716:7:-1:1 to topic
TopicRange{topic=persistent://public/xx/xxxx,
key-range=SerializableRange{range=[0, 65535]}}
2021/3/3 11:18:31 AM 2021-03-03 03:18:31,571 INFO
org.apache.flink.streaming.connectors.pulsar.internal.PulsarMetadataReader [] -
Committing offset 12716:7:-1:1 to topic
TopicRange{topic=persistent://public/xx/xxxx,
key-range=SerializableRange{range=[0, 65535]}}
2021/3/3 11:18:31 AM 2021-03-03 03:18:31,580 INFO
org.apache.flink.streaming.connectors.pulsar.internal.PulsarMetadataReader [] -
Successfully committed offset 12716:7:-1:1 to topic
TopicRange{topic=persistent://public/xx/xxxx,
key-range=SerializableRange{range=[0, 65535]}}
2021/3/3 11:18:36 AM 2021-03-03 03:18:36,633 INFO
org.apache.flink.streaming.connectors.pulsar.internal.PulsarMetadataReader [] -
Committing offset 12716:7:-1:1 to topic
TopicRange{topic=persistent://public/xx/xxxx,
key-range=SerializableRange{range=[0, 65535]}}
2021/3/3 11:18:36 AM 2021-03-03 03:18:36,642 INFO
org.apache.flink.streaming.connectors.pulsar.internal.PulsarMetadataReader [] -
Successfully committed offset 12716:7:-1:1 to topic
TopicRange{topic=persistent://public/xx/xxxx,
key-range=SerializableRange{range=[0, 65535]}}
Files in OSS:
<Pasted Graphic-1.png>
chk-10880 directory:
<Pasted Graphic-2.png>