2021-03-04 02:33:25,292 DEBUG org.apache.flink.runtime.rpc.akka.SupervisorActor [] - Starting FencedAkkaRpcActor with name jobmanager_2.
2021-03-04 02:33:25,304 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcService [] - Starting RPC endpoint for org.apache.flink.runtime.jobmaster.JobMaster at akka://flink/user/rpc/jobmanager_2 .
2021-03-04 02:33:25,310 INFO  org.apache.flink.runtime.jobmaster.JobMaster [] - Initializing job TransactionAndAccount (00000000000000000000000000000000).
2021-03-04 02:33:25,323 INFO  org.apache.flink.runtime.jobmaster.JobMaster [] - Using restart back off time strategy FixedDelayRestartBackoffTimeStrategy(maxNumberRestartAttempts=2147483647, backoffTimeMS=1000) for TransactionAndAccount (00000000000000000000000000000000).
2021-03-04 02:33:25,380 INFO  org.apache.flink.runtime.jobmaster.JobMaster [] - Running initialization on master for job TransactionAndAccount (00000000000000000000000000000000).
2021-03-04 02:33:25,380 INFO  org.apache.flink.runtime.jobmaster.JobMaster [] - Successfully ran initialization on master in 0 ms.
2021-03-04 02:33:25,381 DEBUG org.apache.flink.runtime.jobmaster.JobMaster [] - Adding 2 vertices from job graph TransactionAndAccount (00000000000000000000000000000000).
2021-03-04 02:33:25,381 DEBUG org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Attaching 2 topologically sorted vertices to existing job graph with 0 vertices and 0 intermediate results.
2021-03-04 02:33:25,389 DEBUG org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Connecting ExecutionJobVertex cbc357ccb763df2852fee8c4fc7d55f2 (Source: Custom Source -> format to json -> Filter -> process timestamp range -> Timestamps/Watermarks) to 0 predecessors.
2021-03-04 02:33:25,389 DEBUG org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Connecting ExecutionJobVertex 337adade1e207453ed3502e01d75fd03 (Window(TumblingEventTimeWindows(86400000), EventTimeTrigger, SumAggregator, PassThroughWindowFunction) -> Flat Map -> Sink: tidb) to 1 predecessors.
2021-03-04 02:33:25,389 DEBUG org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Connecting input 0 of vertex 337adade1e207453ed3502e01d75fd03 (Window(TumblingEventTimeWindows(86400000), EventTimeTrigger, SumAggregator, PassThroughWindowFunction) -> Flat Map -> Sink: tidb) to intermediate result referenced via predecessor cbc357ccb763df2852fee8c4fc7d55f2 (Source: Custom Source -> format to json -> Filter -> process timestamp range -> Timestamps/Watermarks).
2021-03-04 02:33:25,395 INFO  org.apache.flink.runtime.scheduler.adapter.DefaultExecutionTopology [] - Built 1 pipelined regions in 2 ms
2021-03-04 02:33:25,396 DEBUG org.apache.flink.runtime.jobmaster.JobMaster [] - Successfully created execution graph from job graph TransactionAndAccount (00000000000000000000000000000000).
2021-03-04 02:33:25,406 INFO  org.apache.flink.runtime.jobmaster.JobMaster [] - Using job/cluster config to configure application-defined state backend: File State Backend (checkpoints: 'oss://xx/backend', savepoints: 'null', asynchronous: TRUE, fileStateThreshold: 20480)
2021-03-04 02:33:25,406 INFO  org.apache.flink.runtime.jobmaster.JobMaster [] - Using application-defined state backend: File State Backend (checkpoints: 'oss://xx/backend', savepoints: 'null', asynchronous: TRUE, fileStateThreshold: 20480)
2021-03-04 02:33:25,419 INFO  org.apache.flink.fs.osshadoop.shaded.com.aliyun.oss [] - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 604046F58B49C830320A1A53
[HostId]: null
2021-03-04 02:33:25,432 INFO  org.apache.flink.fs.osshadoop.shaded.com.aliyun.oss [] - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: 604046F58B49C830322A1A53
[HostId]: null
2021-03-04 02:33:25,442 INFO  org.apache.flink.runtime.checkpoint.DefaultCompletedCheckpointStore [] - Recovering checkpoints from KubernetesStateHandleStore{configMapName='demo-00000000000000000000000000000000-jobmanager-leader'}.
2021-03-04 02:33:25,448 INFO  org.apache.flink.runtime.checkpoint.DefaultCompletedCheckpointStore [] - Found 1 checkpoints in KubernetesStateHandleStore{configMapName='demo-00000000000000000000000000000000-jobmanager-leader'}.
2021-03-04 02:33:25,449 INFO  org.apache.flink.runtime.checkpoint.DefaultCompletedCheckpointStore [] - Trying to fetch 1 checkpoints from storage.
2021-03-04 02:33:25,449 INFO  org.apache.flink.runtime.checkpoint.DefaultCompletedCheckpointStore [] - Trying to retrieve checkpoint 10167.
2021-03-04 02:33:25,483 DEBUG org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Status of the shared state registry of job 00000000000000000000000000000000 after restore: SharedStateRegistry{registeredStates={}}.
2021-03-04 02:33:25,483 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Restoring job 00000000000000000000000000000000 from Checkpoint 10167 @ 1614825175716 for 00000000000000000000000000000000 located at oss://xx/backend/00000000000000000000000000000000/chk-10167.

I checked the JobManager log, and the same NoSuchKey error appears there as well.
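
For reading the restore line in the JobManager log, it can help to convert the number after `@` into a wall-clock time. The sketch below assumes that number is the checkpoint timestamp in epoch milliseconds and that the log timestamps are in UTC; `from_epoch_millis` is a hypothetical helper name, not a Flink API.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def from_epoch_millis(ms: int) -> datetime:
    """Convert epoch milliseconds to an aware UTC datetime (integer arithmetic, no float rounding)."""
    return EPOCH + timedelta(milliseconds=ms)


# Value taken from "Restoring job ... from Checkpoint 10167 @ 1614825175716".
checkpoint_time = from_epoch_millis(1614825175716)

# Timestamp of that log line, 2021-03-04 02:33:25,483, assumed to be UTC.
restore_logged = datetime(2021, 3, 4, 2, 33, 25, 483000, tzinfo=timezone.utc)

print(checkpoint_time.isoformat())  # 2021-03-04T02:32:55.716000+00:00
print((restore_logged - checkpoint_time).total_seconds())  # 29.767
```

Under those assumptions, checkpoint 10167 was taken about 30 seconds before the JobManager restored from it.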


On March 3, 2021, at 11:23 AM, 王 羽凡 <[email protected]> wrote:

Version: Flink 1.12.0
Environment: Native Kubernetes
Mode: Application Mode

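Description:
When Flink runs on k8s in Native Kubernetes Application mode with the filesystem state backend on OSS, the logs show errors on requests to OSS.
When the code uses `source.setStartFromEarliest();`, the job consumes from the beginning after startup and runs normally. Once it catches up to the latest position, the error below appears; it disappears after some time or after the job is restarted.
When the code uses `source.setStartFromLatest();`, the job starts consuming directly from the latest position and this error does not appear.
Given these observations, is something wrong with my configuration or usage?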

Command:

./bin/flink run-application \
    --target kubernetes-application \
    -Dkubernetes.cluster-id=demo \
    -Dkubernetes.container.image=xx/xx/xx:2.0.16 \
    -Dstate.backend=filesystem \
    -Dstate.checkpoints.dir=oss://bucket/文件夹 \
    -Dfs.oss.endpoint=oss-cn-beijing-internal.aliyuncs.com \
    -Dfs.oss.accessKeyId=xx \
    -Dfs.oss.accessKeySecret=xx \
    local:///opt/flink/usrlib/my-flink-job.jar

Error log:

2021-03-03 02:53:46,133 INFO  org.apache.flink.streaming.connectors.pulsar.internal.PulsarMetadataReader [] - Committing offset 12701:1:-1:4 to topic TopicRange{topic=persistent://public/xx/xxxx, key-range=SerializableRange{range=[0, 65535]}}
2021-03-03 02:53:46,140 INFO  org.apache.flink.streaming.connectors.pulsar.internal.PulsarMetadataReader [] - Successfully committed offset 12701:1:-1:4 to topic TopicRange{topic=persistent://public/xx/xxxx, key-range=SerializableRange{range=[0, 65535]}}
2021-03-03 02:53:50,899 INFO  org.apache.flink.fs.osshadoop.shaded.com.aliyun.oss [] - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: xx
[HostId]: null
2021-03-03 02:53:50,904 INFO  org.apache.flink.fs.osshadoop.shaded.com.aliyun.oss [] - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: xx
[HostId]: null
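
To see how often the error occurs and which logger emits it, one option is to scan a saved log file for the `[ErrorCode]` continuation lines. The sketch below is illustrative only: `count_oss_error_codes` and the two regexes are hypothetical helpers, not part of Flink, and it assumes each log entry sits on a single line, as log4j writes it, rather than wrapped as in this mail.

```python
import re
from collections import Counter

# Header of a log entry: "<date> <time>,<millis> <LEVEL> <logger> [] - <message>".
# search() is used so that an extra leading timestamp from a log viewer is tolerated.
ENTRY = re.compile(
    r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}\s+[A-Z]+\s+(\S+)\s+\[\] - "
)
# Continuation line carrying the error code, e.g. "[ErrorCode]: NoSuchKey".
ERROR_CODE = re.compile(r"\[ErrorCode\]: (\S+)")


def count_oss_error_codes(lines):
    """Count [ErrorCode] values, keyed by (logger of the preceding entry, code)."""
    counts = Counter()
    logger = None
    for raw in lines:
        line = raw.strip()
        header = ENTRY.search(line)
        if header:
            logger = header.group(1)  # remember which logger owns the next lines
            continue
        code = ERROR_CODE.search(line)
        if code and logger is not None:
            counts[(logger, code.group(1))] += 1
    return counts


# Sample entries copied from the TaskManager log in this thread.
SAMPLE = """\
2021-03-03 02:53:46,140 INFO  org.apache.flink.streaming.connectors.pulsar.internal.PulsarMetadataReader [] - Successfully committed offset 12701:1:-1:4 to topic TopicRange{topic=persistent://public/xx/xxxx, key-range=SerializableRange{range=[0, 65535]}}
2021-03-03 02:53:50,899 INFO  org.apache.flink.fs.osshadoop.shaded.com.aliyun.oss [] - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: xx
[HostId]: null
2021-03-03 02:53:50,904 INFO  org.apache.flink.fs.osshadoop.shaded.com.aliyun.oss [] - [Server]Unable to execute HTTP request: Not Found
[ErrorCode]: NoSuchKey
[RequestId]: xx
[HostId]: null
"""

for (logger_name, error_code), n in count_oss_error_codes(SAMPLE.splitlines()).items():
    print(logger_name, error_code, n)
```

For a real file, the same function can be fed an open file handle, since it only iterates over lines.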

Normal TaskManager log after killing the process (so that the pod restarts) or after waiting a while:

2021-03-03 03:18:21,602 INFO  org.apache.flink.streaming.connectors.pulsar.internal.PulsarMetadataReader [] - Successfully committed offset 12716:7:-1:1 to topic TopicRange{topic=persistent://public/xx/xxxx, key-range=SerializableRange{range=[0, 65535]}}
2021-03-03 03:18:26,573 INFO  org.apache.flink.streaming.connectors.pulsar.internal.PulsarMetadataReader [] - Committing offset 12716:7:-1:1 to topic TopicRange{topic=persistent://public/xx/xxxx, key-range=SerializableRange{range=[0, 65535]}}
2021-03-03 03:18:26,582 INFO  org.apache.flink.streaming.connectors.pulsar.internal.PulsarMetadataReader [] - Successfully committed offset 12716:7:-1:1 to topic TopicRange{topic=persistent://public/xx/xxxx, key-range=SerializableRange{range=[0, 65535]}}
2021-03-03 03:18:31,571 INFO  org.apache.flink.streaming.connectors.pulsar.internal.PulsarMetadataReader [] - Committing offset 12716:7:-1:1 to topic TopicRange{topic=persistent://public/xx/xxxx, key-range=SerializableRange{range=[0, 65535]}}
2021-03-03 03:18:31,580 INFO  org.apache.flink.streaming.connectors.pulsar.internal.PulsarMetadataReader [] - Successfully committed offset 12716:7:-1:1 to topic TopicRange{topic=persistent://public/xx/xxxx, key-range=SerializableRange{range=[0, 65535]}}
2021-03-03 03:18:36,633 INFO  org.apache.flink.streaming.connectors.pulsar.internal.PulsarMetadataReader [] - Committing offset 12716:7:-1:1 to topic TopicRange{topic=persistent://public/xx/xxxx, key-range=SerializableRange{range=[0, 65535]}}
2021-03-03 03:18:36,642 INFO  org.apache.flink.streaming.connectors.pulsar.internal.PulsarMetadataReader [] - Successfully committed offset 12716:7:-1:1 to topic TopicRange{topic=persistent://public/xx/xxxx, key-range=SerializableRange{range=[0, 65535]}}

Files in OSS:
<pasted-image-1.png>
chk-10880 directory:
<pasted-image-2.png>
