Hi All,

We have Flink 2.2.0 installed in a 9-VM cluster: 3 jobmanagers, 3 taskmanagers and 3 ZooKeeper nodes. We have a round robin DNS VIP name pointing to the 3 jobmanagers.

When we navigate to the VIP address, https://flink.acme.com:8180, the jobmanager UI comes up as expected and everything looks great.

The problem comes when we try to view taskmanager logs. If the VIP address hasn't taken us to the jobmanager leader, the taskmanager log shows as empty and we see errors in the jobmanager's log file (error below).

My understanding is that in an HA environment, no matter which jobmanager you connect to, the actual information you see is coming from the jobmanager leader.
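
One way to check this independently of the round robin DNS would be to request the same taskmanager log from each jobmanager directly and compare the results. The following is only a minimal sketch: the jobmanager hostnames are placeholders (assumptions, not our real hosts), the port is taken from the VIP URL above, and the path is Flink's REST endpoint for taskmanager logs, `/taskmanagers/<taskmanagerid>/log`.

```python
import urllib.error
import urllib.request

# Hypothetical jobmanager hostnames; replace with the real hosts behind the VIP.
JOBMANAGERS = ["jm1.acme.com", "jm2.acme.com", "jm3.acme.com"]
REST_PORT = 8180  # same port as the VIP URL


def log_url(host: str, taskmanager_id: str, port: int = REST_PORT) -> str:
    """Build the REST URL for one taskmanager's log as served by one jobmanager."""
    return f"https://{host}:{port}/taskmanagers/{taskmanager_id}/log"


def fetch_log_size(url: str, timeout: float = 10.0) -> int:
    """Return the response body size in bytes, or -1 on any HTTP or network error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return len(resp.read())
    except (urllib.error.URLError, OSError):
        return -1


def check_all(taskmanager_id: str) -> dict:
    """Map each jobmanager host to the log size it returns for the given taskmanager."""
    return {
        host: fetch_log_size(log_url(host, taskmanager_id))
        for host in JOBMANAGERS
    }
```

Calling `check_all("host0002705.acme.com:44825-6d6116")` with the TaskExecutor ID from the error below would then show whether only the non-leader jobmanagers return an empty or failed response.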

I have attached the config files for one jobmanager and one taskmanager. All jobmanagers have the same config apart from where a hostname is referenced, and likewise all taskmanager configs are the same apart from where a hostname is referenced.

Please let me know if I'm misunderstanding how the cluster should operate in HA mode or if I have some issue in the configs.


------
Thanks,
Thomas


JOB MANAGER ERROR:

2026-02-10 13:18:04,820 ERROR org.apache.flink.runtime.rest.handler.taskmanager.TaskManagerLogFileHandler [] - Failed to transfer file from TaskExecutor host0002705.acme.com:44825-6d6116.
java.util.concurrent.CompletionException: org.apache.flink.util.FlinkException: Could not retrieve file from transient blob store.
       at org.apache.flink.runtime.rest.handler.taskmanager.AbstractTaskManagerFileHandler.lambda$respondToRequest$0(AbstractTaskManagerFileHandler.java:138) ~[flink-dist-2.2.0.jar:2.2.0]
       at java.base/java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:718) [?:?]
       at java.base/java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:482) [?:?]
       at org.apache.flink.shaded.netty4.io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:173) [flink-dist-2.2.0.jar:2.2.0]
       at org.apache.flink.shaded.netty4.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:166) [flink-dist-2.2.0.jar:2.2.0]
       at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470) [flink-dist-2.2.0.jar:2.2.0]
       at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:569) [flink-dist-2.2.0.jar:2.2.0]
       at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) [flink-dist-2.2.0.jar:2.2.0]
       at org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [flink-dist-2.2.0.jar:2.2.0]
       at java.base/java.lang.Thread.run(Thread.java:840) [?:?]
Caused by: org.apache.flink.util.FlinkException: Could not retrieve file from transient blob store.
       ... 10 more
Caused by: java.io.FileNotFoundException: Local file /acme/flink/flink-tmp/jm_b3d389c598b8d1f845e41256585ecd70/blobStorage/no_job/blob_t-25c24c390ec7f6acb195bcf8fade1235e661ce1b-f6a0ee50aa3cbf99b91fb309d8e164ef does not exist and failed to copy from blob store.
       at org.apache.flink.runtime.blob.BlobServer.getFileInternal(BlobServer.java:563) ~[flink-dist-2.2.0.jar:2.2.0]
       at org.apache.flink.runtime.blob.BlobServer.getFileInternalWithReadLock(BlobServer.java:492) ~[flink-dist-2.2.0.jar:2.2.0]
       at org.apache.flink.runtime.blob.BlobServer.getFile(BlobServer.java:434) ~[flink-dist-2.2.0.jar:2.2.0]
       at org.apache.flink.runtime.rest.handler.taskmanager.AbstractTaskManagerFileHandler.lambda$respondToRequest$0(AbstractTaskManagerFileHandler.java:136) ~[flink-dist-2.2.0.jar:2.2.0]
       ... 9 more
2026-02-10 13:18:04,822 ERROR org.apache.flink.runtime.rest.handler.taskmanager.TaskManagerLogFileHandler [] - Unhandled exception.
org.apache.flink.util.FlinkException: Failed to transfer file from TaskExecutor host0002705.acme.com:44825-6d6116.
       at org.apache.flink.runtime.rest.handler.taskmanager.AbstractTaskManagerFileHandler.handleException(AbstractTaskManagerFileHandler.java:223) ~[flink-dist-2.2.0.jar:2.2.0]
       at org.apache.flink.runtime.rest.handler.taskmanager.AbstractTaskManagerFileHandler.lambda$respondToRequest$1(AbstractTaskManagerFileHandler.java:158) ~[flink-dist-2.2.0.jar:2.2.0]
       at java.base/java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:934) [?:?]
       at java.base/java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:911) [?:?]
       at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510) [?:?]
       at java.base/java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:614) [?:?]
       at java.base/java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:726) [?:?]
       at java.base/java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:482) [?:?]
       at org.apache.flink.shaded.netty4.io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:173) [flink-dist-2.2.0.jar:2.2.0]
       at org.apache.flink.shaded.netty4.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:166) [flink-dist-2.2.0.jar:2.2.0]
       at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470) [flink-dist-2.2.0.jar:2.2.0]
       at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:569) [flink-dist-2.2.0.jar:2.2.0]
       at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) [flink-dist-2.2.0.jar:2.2.0]
       at org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [flink-dist-2.2.0.jar:2.2.0]
       at java.base/java.lang.Thread.run(Thread.java:840) [?:?]
Caused by: org.apache.flink.util.FlinkException: Could not retrieve file from transient blob store.
       at org.apache.flink.runtime.rest.handler.taskmanager.AbstractTaskManagerFileHandler.lambda$respondToRequest$0(AbstractTaskManagerFileHandler.java:138) ~[flink-dist-2.2.0.jar:2.2.0]
       at java.base/java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:718) ~[?:?]
       ... 8 more
Caused by: java.io.FileNotFoundException: Local file /acme/flink/flink-tmp/jm_b3d389c598b8d1f845e41256585ecd70/blobStorage/no_job/blob_t-25c24c390ec7f6acb195bcf8fade1235e661ce1b-f6a0ee50aa3cbf99b91fb309d8e164ef does not exist and failed to copy from blob store.
       at org.apache.flink.runtime.blob.BlobServer.getFileInternal(BlobServer.java:563) ~[flink-dist-2.2.0.jar:2.2.0]
       at org.apache.flink.runtime.blob.BlobServer.getFileInternalWithReadLock(BlobServer.java:492) ~[flink-dist-2.2.0.jar:2.2.0]
       at org.apache.flink.runtime.blob.BlobServer.getFile(BlobServer.java:434) ~[flink-dist-2.2.0.jar:2.2.0]
       at org.apache.flink.runtime.rest.handler.taskmanager.AbstractTaskManagerFileHandler.lambda$respondToRequest$0(AbstractTaskManagerFileHandler.java:136) ~[flink-dist-2.2.0.jar:2.2.0]
       at java.base/java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:718) ~[?:?]
       ... 8 more

Attachment: jobmanager.config.yaml
Description: application/yaml

Attachment: taskmanager.config.yaml
Description: application/yaml
