Hi,
After adding some logging, I found that checkpointMetaData is null at this line:
https://github.com/apache/flink/blob/release-1.10.0/flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/tasks/StreamTask.java#L1421
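A bare NPE inside executeCheckpointing does not say which reference was null, and that is what made this hard to pin down. The minimal sketch below shows the general fail-fast pattern in plain Java. It is not Flink's code and not the fix in the commit linked below. The class `CheckpointGuardSketch`, its `executeCheckpointing` method, and the nested `CheckpointMetaData` stand-in are all hypothetical. The idea is to check the reference up front with `Objects.requireNonNull`, so that the exception message names the culprit.

```java
import java.util.Objects;

// Hypothetical sketch, NOT Flink code: fail fast with a named null check
// instead of hitting an anonymous NPE deeper inside the method.
public final class CheckpointGuardSketch {

    // Simplified stand-in for Flink's CheckpointMetaData (assumed shape, illustration only).
    static final class CheckpointMetaData {
        final long checkpointId;

        CheckpointMetaData(long checkpointId) {
            this.checkpointId = checkpointId;
        }
    }

    // Returns the checkpoint id, or throws an NPE whose message names the null argument.
    static long executeCheckpointing(CheckpointMetaData checkpointMetaData) {
        Objects.requireNonNull(checkpointMetaData,
                "checkpointMetaData is null; cannot execute checkpoint");
        return checkpointMetaData.checkpointId;
    }

    public static void main(String[] args) {
        System.out.println(executeCheckpointing(new CheckpointMetaData(5377L)));
        try {
            executeCheckpointing(null);
        } catch (NullPointerException e) {
            // The message now identifies which reference was missing.
            System.out.println("Caught: " + e.getMessage());
        }
    }
}
```

The guard does not address why the reference is null. It only makes the failure self-describing, so the log points at the variable directly instead of requiring extra DEBUG logging to find it.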
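The test program reads from Kafka, runs a word count, and writes the results back to Kafka. The checkpoint configuration is as follows:
| Option | Value |
|---|---|
| Checkpointing Mode | Exactly Once |
| Interval | 5s |
| Timeout | 10m 0s |
| Minimum Pause Between Checkpoints | 0ms |
| Maximum Concurrent Checkpoints | 1 |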


The NPE is thrown consistently at checkpoint 5377.
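The following is a rough estimate of when that checkpoint occurs under the configuration above. It assumes a freshly started job whose checkpoint IDs start at 1 and are triggered strictly every 5 s. That means every checkpoint finishes within the interval, so the 1-concurrent-checkpoint limit never delays a trigger. It also ignores the initial delay before the first checkpoint. Under those assumptions, checkpoint N fires about (N - 1) × 5 s after checkpoint 1. The helper below uses hypothetical class and method names chosen for illustration. It computes that lower bound: 26,880 s, or roughly 7 h 28 min into the job. Any checkpoint slower than 5 s pushes the real time later.

```java
import java.time.Duration;

// Hypothetical helper: lower-bound estimate of when checkpoint N is triggered,
// assuming a fresh job whose checkpoint IDs start at 1 and one trigger per interval.
public final class CheckpointTiming {

    static Duration earliestTriggerAfterFirst(long checkpointId, Duration interval) {
        if (checkpointId < 1) {
            throw new IllegalArgumentException("checkpoint IDs start at 1");
        }
        // Checkpoint 1 fires at offset 0; each later ID adds one interval.
        return interval.multipliedBy(checkpointId - 1);
    }

    public static void main(String[] args) {
        Duration d = earliestTriggerAfterFirst(5377, Duration.ofSeconds(5));
        // Prints: 26880 s (7 h 28 min)
        System.out.printf("%d s (%d h %d min)%n",
                d.getSeconds(), d.toHours(), d.toMinutes() % 60);
    }
}
```

If the failure really is tied to elapsed time rather than to the checkpoint count, a reproduction attempt needs to run for at least that long.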


I still don't know the root cause, but after changing some of the code (see https://github.com/yuchuanchen/flink/commit/e5122d9787be1fee9bce141887e0d70c9b0a4f19), the NPE no longer occurs.


On 2020-04-21 10:21:56, "chenkaibit" <chenkai...@163.com> wrote:
>
>
>
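>This doesn't reproduce reliably, but it has shown up in several jobs I've recently tested on 1.10, and no other errors were reported when it happened. I've added some logging and will keep an eye on it.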
>
>
>
>
>On 2020-04-21 01:12:48, "Yun Tang" <myas...@live.com> wrote:
>>Hi
>>
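>>This NPE is a bit odd. Looking at the executeCheckpointing method [1] alone, it's hard to pin down exactly which variable, or which variable's value, is null.
>>One way to investigate is to enable DEBUG-level logging for org.apache.flink.streaming.runtime.tasks and use the debug output to narrow down which variable is null.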
>>
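>>When this exception occurs, does the log of the affected task show anything else unusual? What conditions trigger the NPE, and does it reproduce reliably?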
>>
>>[1] 
>>https://github.com/apache/flink/blob/aa4eb8f0c9ce74e6b92c3d9be5dc8e8cb536239d/flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/tasks/StreamTask.java#L1349
>>
>>Best regards,
>>Yun Tang
>>
>>________________________________
>>From: chenkaibit <chenkai...@163.com>
>>Sent: Monday, April 20, 2020 18:39
>>To: user-zh@flink.apache.org <user-zh@flink.apache.org>
>>Subject: flink-1.10 checkpoint occasionally throws NullPointerException
>>
>>Has anyone run into this error? CheckpointingOperation.executeCheckpointing throws a NullPointerException:
>>java.lang.Exception: Could not perform checkpoint 5505 for operator Source: KafkaTableSource(xxx) -> SourceConversion(table=[xxx, source: [KafkaTableSource(xxx)]], fields=[xxx]) -> Calc(select=[xxx) AS xxx]) -> SinkConversionToTuple2 -> Sink: Elasticsearch6UpsertTableSink(xxx) (1/1).
>>    at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpoint(StreamTask.java:802)
>>    at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$triggerCheckpointAsync$3(StreamTask.java:777)
>>    at org.apache.flink.streaming.runtime.tasks.StreamTask$$Lambda$228/1024478318.call(Unknown Source)
>>    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>    at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.run(StreamTaskActionExecutor.java:87)
>>    at org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:78)
>>    at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:261)
>>    at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:186)
>>    at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:487)
>>    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:470)
>>    at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:707)
>>    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:532)
>>    at java.lang.Thread.run(Thread.java:745)
>>Caused by: java.lang.NullPointerException
>>    at org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.executeCheckpointing(StreamTask.java:1411)
>>    at org.apache.flink.streaming.runtime.tasks.StreamTask.checkpointState(StreamTask.java:991)
>>    at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$performCheckpoint$5(StreamTask.java:887)
>>    at org.apache.flink.streaming.runtime.tasks.StreamTask$$Lambda$229/1010499540.run(Unknown Source)
>>    at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.runThrowing(StreamTaskActionExecutor.java:94)
>>    at org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(StreamTask.java:860)
>>    at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpoint(StreamTask.java:793)
>>    ... 12 more
