Hi Kanchi,

Could you provide more information on it? For example, at what stage this log is printed (job recovering, running, etc.), and any more detailed job information or stack trace.

Best,
Zhanghao Chen
________________________________
From: Kanchi Masalia via user <user@flink.apache.org>
Sent: Friday, February 16, 2024 4:07
To: Neha Rawat <n.ra...@netwitness.com>
Cc: Kenan Kılıçtepe <kkilict...@gmail.com>; user@flink.apache.org <user@flink.apache.org>; Liang Mou <l...@pinterest.com>
Subject: Re: Task Manager getting killed while executing sql queries.

Hi!

We just encountered a similar issue.

This is usually caused by: 1) Akka failed sending the message silently, due to problems like oversized payload or serialization failures. In that case, you should find detailed error information in the logs. 2) The recipient needs more time for responding, due to problems like slow machines or network jitters. In that case, you can try to increase akka.ask.timeout.

Were you able to resolve the issue? Could you suggest what worked for you?

Thanks,
Kanchi Masalia

On Tue, Aug 29, 2023 at 6:18 AM Neha Rawat <n.ra...@netwitness.com> wrote:

Thanks for your response. I did check, and the memory utilization looks fine. Attaching a VisualVM screenshot. Memory usage was well under the limits. There are phases of low CPU consumption by Flink (below 10%, with spikes that go up to 100%), and the number of threads goes down as well. That's when I see Errors #1, #2 and #3 as listed in the original email.

Thanks,
Neha

From: Kenan Kılıçtepe <kkilict...@gmail.com>
Sent: Monday, August 28, 2023 4:25 PM
To: Neha Rawat <n.ra...@netwitness.com>
Cc: user@flink.apache.org
Subject: Re: Task Manager getting killed while executing sql queries.

Can it be a memory leak? Have you observed the memory consumption of the task managers? I once had a task manager crash, and the cause was an OOM.

On Mon, Aug 28, 2023 at 9:12 PM Neha Rawat <n.ra...@netwitness.com> wrote:

Hi,

Need some help with the below situation. It would be great if someone could give some pointers on how to resolve this.

I am trying to execute 80 SQL queries on data coming in from the source, the Kafka topic Event. The results are written to 2 sinks, the Kafka topics Alert and UnsortedAlert. The input data is coming in at about 230K events/sec.

* Kafka was up and running all the time on the same host as Flink.
* Job configurations=> --parallelism 20 --checkpoint-interval 60 --aligned-checkpoint-timeout 5 --min-pause-checkpoints 5 --table-state-ttl 360
* Flink config -
  state.backend.rocksdb.localdir: /var/netwitness/flink/rocksdb
  state.backend.rocksdb.memory.write-buffer-ratio: 0.75
  state.backend.rocksdb.log.level: WARN_LEVEL
  state.backend.rocksdb.log.dir: /dev/null
  state.backend.rocksdb.log.max-file-size: 1MB
  state.backend.rocksdb.log.file-num: 1
  state.backend.rocksdb.thread.num: 8
  execution.checkpointing.timeout: 20 min

I have also tried the same test with a Flink config that has changelog enabled, and have changed min-pause-checkpoints to 50, but the observation remains the same.

Observations-
* Checkpointing initially takes less than a second, but as the test progresses there are phases (3-4 consecutive checkpoints) where it takes more than a minute and sometimes up to 9 minutes.
* Task manager gets killed after a couple of hours.
* These are the errors in the taskExecutor logs:

Error 1-
2023-08-22 22:53:59,441 INFO org.apache.kafka.clients.NetworkClient [] - [Consumer clientId=Event-16, groupId=Event] Disconnecting from node 1 due to request timeout.
2023-08-22 22:53:59,441 INFO org.apache.kafka.clients.NetworkClient [] - [Consumer clientId=Event-16, groupId=Event] Cancelled in-flight FETCH request with correlation id 445553 due to node 1 being disconnected (elapsed time since creation: 147696ms, elapsed time since send: 147696ms, request timeout: 30000ms)
2023-08-22 22:53:59,441 INFO org.apache.kafka.clients.NetworkClient [] - [Consumer clientId=Event-16, groupId=Event] Cancelled in-flight METADATA request with correlation id 445555 due to node 1 being disconnected (elapsed time since creation: 1ms, elapsed time since send: 1ms, request timeout: 30000ms)
2023-08-22 22:53:59,441 INFO org.apache.kafka.clients.FetchSessionHandler [] - [Consumer clientId=Event-16, groupId=Event] Error sending fetch request (sessionId=1726574973, epoch=367) to node 1:
org.apache.kafka.common.errors.DisconnectException: null
2023-08-22 22:54:06,975 INFO org.apache.kafka.clients.NetworkClient [] - [Consumer clientId=Event-19, groupId=Event] Disconnecting from node 1 due to request timeout.
2023-08-22 22:54:06,975 INFO org.apache.kafka.clients.NetworkClient [] - [Consumer clientId=Event-19, groupId=Event] Cancelled in-flight FETCH request with correlation id 446105 due to node 1 being disconnected (elapsed time since creation: 42100ms, elapsed time since send: 42100ms, request timeout: 30000ms)
2023-08-22 22:54:06,975 INFO org.apache.kafka.clients.FetchSessionHandler [] - [Consumer clientId=Event-19, groupId=Event] Error sending fetch request (sessionId=1951018275, epoch=1686) to node 1:
org.apache.kafka.common.errors.DisconnectException: null
2023-08-22 23:03:17,824 INFO org.apache.kafka.clients.NetworkClient [] - [Consumer clientId=Event-1, groupId=Event] Disconnecting from node 1 due to request timeout.
2023-08-22 23:03:17,825 INFO org.apache.kafka.clients.NetworkClient [] - [Consumer clientId=Event-1, groupId=Event] Cancelled in-flight FETCH request with correlation id 438228 due to node 1 being disconnected (elapsed time since creation: 61377ms, elapsed time since send: 61377ms, request timeout: 30000ms)
2023-08-22 23:03:17,826 INFO org.apache.kafka.clients.FetchSessionHandler [] - [Consumer clientId=Event-1, groupId=Event] Error sending fetch request (sessionId=869666967, epoch=5) to node 1:
org.apache.kafka.common.errors.DisconnectException: null
2023-08-22 23:05:06,655 INFO org.apache.kafka.clients.NetworkClient [] - [Consumer clientId=Event-3, groupId=Event] Disconnecting from node 1 due to request timeout.
2023-08-22 23:05:06,655 INFO org.apache.kafka.clients.NetworkClient [] - [Consumer clientId=Event-3, groupId=Event] Cancelled in-flight FETCH request with correlation id 439255 due to node 1 being disconnected (elapsed time since creation: 79499ms, elapsed time since send: 79499ms, request timeout: 30000ms)
2023-08-22 23:05:06,656 INFO org.apache.kafka.clients.NetworkClient [] - [Consumer clientId=Event-3, groupId=Event] Cancelled in-flight METADATA request with correlation id 439256 due to node 1 being disconnected (elapsed time since creation: 1ms, elapsed time since send: 1ms, request timeout: 30000ms)
2023-08-22 23:05:06,656 INFO org.apache.kafka.clients.FetchSessionHandler [] - [Consumer clientId=Event-3, groupId=Event] Error sending fetch request (sessionId=26073738, epoch=4) to node 1:
org.apache.kafka.common.errors.DisconnectException: null
2023-08-22 23:07:00,107 INFO org.apache.kafka.clients.NetworkClient [] - [Consumer clientId=Event-9, groupId=Event] Disconnecting from node 1 due to request timeout.
2023-08-22 23:07:00,107 INFO org.apache.kafka.clients.NetworkClient [] - [Consumer clientId=Event-9, groupId=Event] Cancelled in-flight FETCH request with correlation id 441161 due to node 1 being disconnected (elapsed time since creation: 44847ms, elapsed time since send: 44847ms, request timeout: 30000ms)
2023-08-22 23:07:00,108 INFO org.apache.kafka.clients.FetchSessionHandler [] - [Consumer clientId=Event-9, groupId=Event] Error sending fetch request (sessionId=2113325484, epoch=10) to node 1:
org.apache.kafka.common.errors.DisconnectException: null
2023-08-22 23:10:07,342 INFO org.apache.kafka.clients.NetworkClient [] - [Consumer clientId=Event-12, groupId=Event] Disconnecting from node 1 due to request timeout.
2023-08-22 23:10:07,343 INFO org.apache.kafka.clients.NetworkClient [] - [Consumer clientId=Event-12, groupId=Event] Cancelled in-flight FETCH request with correlation id 440678 due to node 1 being disconnected (elapsed time since creation: 51417ms, elapsed time since send: 51417ms, request timeout: 30000ms)

Error 2-
org.apache.kafka.common.errors.DisconnectException: null
2023-08-22 23:10:18,246 INFO org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable [] - GlobalWindowAggregate[25] -> (Calc[26] -> UnsortedAlert[27]: Writer -> UnsortedAlert[27]: Committer, Calc[390] -> UnsortedAlert[391]: Writer -> UnsortedAlert[391]: Committer) (6/20)#0 - asynchronous part of checkpoint 5450 could not be completed.
java.util.concurrent.CancellationException: null
    at java.util.concurrent.FutureTask.report(FutureTask.java:121) ~[?:?]
    at java.util.concurrent.FutureTask.get(FutureTask.java:191) ~[?:?]
    at org.apache.flink.util.concurrent.FutureUtils.runIfNotDoneAndGet(FutureUtils.java:544) ~[flink-dist-1.17.1.jar:1.17.1]
    at org.apache.flink.streaming.api.operators.OperatorSnapshotFinalizer.<init>(OperatorSnapshotFinalizer.java:57) ~[flink-dist-1.17.1.jar:1.17.1]
    at org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable.finalizeNonFinishedSnapshots(AsyncCheckpointRunnable.java:191) ~[flink-dist-1.17.1.jar:1.17.1]
    at org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable.run(AsyncCheckpointRunnable.java:124) [flink-dist-1.17.1.jar:1.17.1]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
    at java.lang.Thread.run(Thread.java:829) [?:?]
2023-08-22 23:10:18,245 INFO org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable [] - GlobalWindowAggregate[31] -> (Calc[32] -> UnsortedAlert[33]: Writer -> UnsortedAlert[33]: Committer, Calc[392] -> UnsortedAlert[393]: Writer -> UnsortedAlert[393]: Committer) (12/20)#0 - asynchronous part of checkpoint 5450 could not be completed.
java.util.concurrent.CancellationException: null
    at java.util.concurrent.FutureTask.report(FutureTask.java:121) ~[?:?]
    at java.util.concurrent.FutureTask.get(FutureTask.java:191) ~[?:?]
    at org.apache.flink.util.concurrent.FutureUtils.runIfNotDoneAndGet(FutureUtils.java:544) ~[flink-dist-1.17.1.jar:1.17.1]
    at org.apache.flink.streaming.api.operators.OperatorSnapshotFinalizer.<init>(OperatorSnapshotFinalizer.java:54) ~[flink-dist-1.17.1.jar:1.17.1]
    at org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable.finalizeNonFinishedSnapshots(AsyncCheckpointRunnable.java:191) ~[flink-dist-1.17.1.jar:1.17.1]
    at org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable.run(AsyncCheckpointRunnable.java:124) [flink-dist-1.17.1.jar:1.17.1]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
    at java.lang.Thread.run(Thread.java:829) [?:?]

Error 3-
2023-08-22 23:10:23,647 INFO org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable [] - GlobalWindowAggregate[25] -> (Calc[26] -> UnsortedAlert[27]: Writer -> UnsortedAlert[27]: Committer, Calc[390] -> UnsortedAlert[391]: Writer -> UnsortedAlert[391]: Committer) (7/20)#0 - asynchronous part of checkpoint 5450 could not be completed.
java.util.concurrent.ExecutionException: org.apache.flink.runtime.checkpoint.CheckpointException: The checkpoint was aborted due to exception of other subtasks sharing the ChannelState file.
    at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395) ~[?:?]
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999) ~[?:?]
    at org.apache.flink.streaming.api.operators.OperatorSnapshotFinalizer.<init>(OperatorSnapshotFinalizer.java:66) ~[flink-dist-1.17.1.jar:1.17.1]
    at org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable.finalizeNonFinishedSnapshots(AsyncCheckpointRunnable.java:191) ~[flink-dist-1.17.1.jar:1.17.1]
    at org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable.run(AsyncCheckpointRunnable.java:124) [flink-dist-1.17.1.jar:1.17.1]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
    at java.lang.Thread.run(Thread.java:829) [?:?]
Caused by: org.apache.flink.runtime.checkpoint.CheckpointException: The checkpoint was aborted due to exception of other subtasks sharing the ChannelState file.
    at org.apache.flink.runtime.checkpoint.channel.ChannelStateCheckpointWriter.fail(ChannelStateCheckpointWriter.java:298) ~[flink-dist-1.17.1.jar:1.17.1]
    at org.apache.flink.runtime.checkpoint.channel.ChannelStateWriteRequestDispatcherImpl.failAndClearWriter(ChannelStateWriteRequestDispatcherImpl.java:212) ~[flink-dist-1.17.1.jar:1.17.1]
    at org.apache.flink.runtime.checkpoint.channel.ChannelStateWriteRequestDispatcherImpl.handleCheckpointAbortRequest(ChannelStateWriteRequestDispatcherImpl.java:189) ~[flink-dist-1.17.1.jar:1.17.1]
    at org.apache.flink.runtime.checkpoint.channel.ChannelStateWriteRequestDispatcherImpl.dispatchInternal(ChannelStateWriteRequestDispatcherImpl.java:129) ~[flink-dist-1.17.1.jar:1.17.1]
    at org.apache.flink.runtime.checkpoint.channel.ChannelStateWriteRequestDispatcherImpl.dispatch(ChannelStateWriteRequestDispatcherImpl.java:94) ~[flink-dist-1.17.1.jar:1.17.1]
    at org.apache.flink.runtime.checkpoint.channel.ChannelStateWriteRequestExecutorImpl.loop(ChannelStateWriteRequestExecutorImpl.java:161) ~[flink-dist-1.17.1.jar:1.17.1]
    at org.apache.flink.runtime.checkpoint.channel.ChannelStateWriteRequestExecutorImpl.run(ChannelStateWriteRequestExecutorImpl.java:116) ~[flink-dist-1.17.1.jar:1.17.1]
    ... 1 more
Caused by: java.util.concurrent.CancellationException: checkpoint aborted via notification
    at org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.notifyCheckpoint(SubtaskCheckpointCoordinatorImpl.java:455) ~[flink-dist-1.17.1.jar:1.17.1]
    at org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.notifyCheckpointAborted(SubtaskCheckpointCoordinatorImpl.java:409) ~[flink-dist-1.17.1.jar:1.17.1]
    at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$notifyCheckpointAbortAsync$17(StreamTask.java:1387) ~[flink-dist-1.17.1.jar:1.17.1]
    at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$notifyCheckpointOperation$19(StreamTask.java:1410) ~[flink-dist-1.17.1.jar:1.17.1]
    at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:50) ~[flink-dist-1.17.1.jar:1.17.1]
    at org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:90) ~[flink-dist-1.17.1.jar:1.17.1]
    at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMail(MailboxProcessor.java:398) ~[flink-dist-1.17.1.jar:1.17.1]
    at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMailsWhenDefaultActionUnavailable(MailboxProcessor.java:367) ~[flink-dist-1.17.1.jar:1.17.1]
    at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:352) ~[flink-dist-1.17.1.jar:1.17.1]
    at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:229) ~[flink-dist-1.17.1.jar:1.17.1]
    at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:839) ~[flink-dist-1.17.1.jar:1.17.1]
    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:788) ~[flink-dist-1.17.1.jar:1.17.1]
    at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:952) ~[flink-dist-1.17.1.jar:1.17.1]
    at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:931) ~[flink-dist-1.17.1.jar:1.17.1]
    at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:745) ~[flink-dist-1.17.1.jar:1.17.1]
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:562) ~[flink-dist-1.17.1.jar:1.17.1]

Error 4-
2023-08-23 10:00:52,852 WARN org.apache.flink.runtime.taskmanager.Task [] - Source: Event[10] -> (MiniBatchAssigner[11] -> (Calc[12] -> (LocalWindowAggregate[13], LocalWindowAggregate[381]), Calc[22] -> LocalWindowAggregate[23], Calc[28] -> LocalWindowAggregate[29], Calc[34] -> (LocalWindowAggregate[35], LocalWindowAggregate[394]), Calc[40] -> LocalWindowAggregate[41], Calc[46] -> (LocalWindowAggregate[47], LocalWindowAggregate[401]), Calc[52] -> LocalWindowAggregate[53], Calc[58] -> (WindowTableFunction[59] -> Calc[60] -> LocalWindowAggregate[61], WindowTableFunction[408] -> Calc[409] -> LocalWindowAggregate[410]), Calc[66] -> LocalWindowAggregate[67], Calc[72] -> WindowTableFunction[73] -> Calc[74] -> LocalWindowAggregate[75], Calc[80] -> LocalWindowAggregate[81], Calc[86] -> LocalWindowAggregate[87], Calc[92] -> LocalWindowAggregate[93], Calc[98] -> (WindowTableFunction[99] -> Calc[100] -> LocalWindowAggregate[101], WindowTableFunction[425] -> Calc[426] -> LocalWindowAggregate[427]), Calc[106] -> (WindowTableFunction[107] -> Calc[108] -> LocalWindowAggregate[109], WindowTableFunction[432] -> Calc[433] -> LocalWindowAggregate[434]), Calc[114] -> (LocalWindowAggregate[115], LocalWindowAggregate[439]), Calc[120] -> LocalWindowAggregate[121], Calc[126] -> LocalWindowAggregate[127], Calc[132] -> LocalWindowAggregate[133], Calc[138] -> LocalWindowAggregate[139], Calc[144] -> (LocalWindowAggregate[145], LocalWindowAggregate[452]), Calc[150] -> LocalWindowAggregate[151], Calc[156] -> LocalWindowAggregate[157], Calc[162] -> LocalWindowAggregate[163], Calc[168] -> LocalWindowAggregate[169], Calc[174] -> LocalWindowAggregate[175], Calc[180] -> LocalWindowAggregate[181], Calc[186] -> LocalWindowAggregate[187], Calc[192] -> LocalWindowAggregate[193], Calc[198] -> LocalWindowAggregate[199], Calc[204] -> (LocalWindowAggregate[205], LocalWindowAggregate[475]), Calc[210] -> (LocalWindowAggregate[211], LocalWindowAggregate[480]), Calc[216] -> LocalWindowAggregate[217], Calc[222] -> LocalWindowAggregate[223], Calc[228] -> LocalWindowAggregate[229], Calc[234] -> WindowTableFunction[235] -> Calc[236] -> LocalWindowAggregate[237], Calc[242] -> LocalWindowAggregate[243], Calc[248] -> LocalWindowAggregate[249], Calc[254] -> LocalWindowAggregate[255], Calc[260] -> LocalWindowAggregate[261], Calc[266] -> LocalWindowAggregate[267], Calc[272] -> LocalWindowAggregate[273], Calc[278] -> WindowTableFunction[279] -> Calc[280] -> LocalWindowAggregate[281], Calc[290] -> LocalWindowAggregate[291], Calc[296] -> LocalWindowAggregate[297], Calc[302] -> LocalWindowAggregate[303], Calc[308] -> LocalWindowAggregate[309], Calc[314] -> LocalWindowAggregate[315], Calc[320] -> LocalWindowAggregate[321], Calc[326] -> WindowTableFunction[327] -> Calc[328] -> LocalWindowAggregate[329], Calc[338] -> LocalWindowAggregate[339], Calc[348] -> LocalWindowAggregate[349], Calc[354] -> LocalWindowAggregate[355], Calc[360] -> LocalWindowAggregate[361]), MiniBatchAssigner[366] -> (Calc[367] -> Alert[368]: Writer -> Alert[368]: Committer, Calc[369] -> Alert[370]: Writer -> Alert[370]: Committer, Calc[371] -> Alert[372]: Writer -> Alert[372]: Committer, Calc[373] -> Alert[374]: Writer -> Alert[374]: Committer, Calc[375] -> Alert[376]: Writer -> Alert[376]: Committer, Calc[377] -> Alert[378]: Writer -> Alert[378]: Committer, Calc[379] -> Alert[380]: Writer -> Alert[380]: Committer)) (1/20)#0 (335c0e92a4914fc7289f6e977222b76b_e3dfc0d7e9ecd8a43f85f0b68ebf3b80_0_0) switched from RUNNING to FAILED with failure cause:
java.util.concurrent.TimeoutException: Invocation of [RemoteRpcInvocation(JobMasterOperatorEventGateway.sendOperatorEventToCoordinator(ExecutionAttemptID, OperatorID, SerializedValue))] at recipient [akka.tcp://flink@localhost:6123/user/rpc/jobmanager_3] timed out. This is usually caused by: 1) Akka failed sending the message silently, due to problems like oversized payload or serialization failures. In that case, you should find detailed error information in the logs. 2) The recipient needs more time for responding, due to problems like slow machines or network jitters. In that case, you can try to increase akka.ask.timeout.
    at com.sun.proxy.$Proxy25.sendOperatorEventToCoordinator(Unknown Source) ~[?:?]
    at org.apache.flink.runtime.taskexecutor.rpc.RpcTaskOperatorEventGateway.sendOperatorEventToCoordinator(RpcTaskOperatorEventGateway.java:60) ~[flink-dist-1.17.1.jar:1.17.1]
    at org.apache.flink.streaming.runtime.tasks.OperatorEventDispatcherImpl$OperatorEventGatewayImpl.sendEventToCoordinator(OperatorEventDispatcherImpl.java:120) ~[flink-dist-1.17.1.jar:1.17.1]
    at org.apache.flink.streaming.runtime.tasks.OperatorChain.lambda$sendAcknowledgeCheckpointEvent$1(OperatorChain.java:947) ~[flink-dist-1.17.1.jar:1.17.1]
    at java.util.HashMap$KeySet.forEach(HashMap.java:929) ~[?:?]
    at org.apache.flink.streaming.runtime.tasks.OperatorChain.sendAcknowledgeCheckpointEvent(OperatorChain.java:943) ~[flink-dist-1.17.1.jar:1.17.1]
    at org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.snapshotState(RegularOperatorChain.java:201) ~[flink-dist-1.17.1.jar:1.17.1]
    at org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.takeSnapshotSync(SubtaskCheckpointCoordinatorImpl.java:715) ~[flink-dist-1.17.1.jar:1.17.1]

Thanks,
Neha Rawat
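
For anyone triaging logs like the ones in Error 1, here is a minimal sketch (not part of the original report) that pulls the elapsed time and request timeout out of the Kafka client's "Cancelled in-flight ... request" lines and keeps only the requests that outlived their timeout. It assumes the log lines have exactly the format quoted in this thread; the names `CANCELLED` and `overdue_requests` are made up for illustration.

```python
import re

# Assumption: lines look like the NetworkClient INFO entries quoted in this thread.
CANCELLED = re.compile(
    r"clientId=(?P<client>[^,\]]+).*?"
    r"Cancelled in-flight (?P<api>\w+) request.*?"
    r"elapsed time since send: (?P<elapsed>\d+)ms, "
    r"request timeout: (?P<timeout>\d+)ms"
)


def overdue_requests(lines):
    """Yield (client_id, api, elapsed_ms, timeout_ms) for requests that exceeded their timeout."""
    for line in lines:
        m = CANCELLED.search(line)
        if m is None:
            continue  # not a "Cancelled in-flight" line
        elapsed, timeout = int(m["elapsed"]), int(m["timeout"])
        if elapsed > timeout:
            yield m["client"], m["api"], elapsed, timeout


# Two lines copied from Error 1: a FETCH that waited far past the 30 s timeout,
# and a METADATA request that was cancelled after only 1 ms.
sample = [
    "2023-08-22 22:53:59,441 INFO org.apache.kafka.clients.NetworkClient [] - "
    "[Consumer clientId=Event-16, groupId=Event] Cancelled in-flight FETCH request "
    "with correlation id 445553 due to node 1 being disconnected (elapsed time since "
    "creation: 147696ms, elapsed time since send: 147696ms, request timeout: 30000ms)",
    "2023-08-22 22:53:59,441 INFO org.apache.kafka.clients.NetworkClient [] - "
    "[Consumer clientId=Event-16, groupId=Event] Cancelled in-flight METADATA request "
    "with correlation id 445555 due to node 1 being disconnected (elapsed time since "
    "creation: 1ms, elapsed time since send: 1ms, request timeout: 30000ms)",
]

for row in overdue_requests(sample):
    print(row)
```

Running this over the full taskExecutor log, sorted by timestamp, may help show whether the long FETCH stalls line up with the slow checkpoints and the RPC timeout in Error 4.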