[jira] [Updated] (FLINK-35639) upgrade to 1.19 with job in HA state with restart strategy crashes job manager
[ https://issues.apache.org/jira/browse/FLINK-35639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

yazgoo updated FLINK-35639:
---
Summary: upgrade to 1.19 with job in HA state with restart strategy crashes job manager (was: upgrading to 1.19 with job in HA state with restart strategy crashes job manager)

> upgrade to 1.19 with job in HA state with restart strategy crashes job manager
> --
>
> Key: FLINK-35639
> URL: https://issues.apache.org/jira/browse/FLINK-35639
> Project: Flink
> Issue Type: Bug
> Components: API / Core
> Affects Versions: 1.19.1
> Environment: Download the 1.18 and 1.19 binary releases. Add the following to flink-1.19.0/conf/config.yaml and flink-1.18.1/conf/flink-conf.yaml:
> ```yaml
> high-availability: zookeeper
> high-availability.zookeeper.quorum: localhost
> high-availability.storageDir: file:///tmp/flink/recovery
> ```
> Launch ZooKeeper: docker run --network host zookeeper:latest
> Launch the 1.18 task manager: ./flink-1.18.1/bin/taskmanager.sh start-foreground
> Launch the 1.18 job manager: ./flink-1.18.1/bin/jobmanager.sh start-foreground
> Launch the following job:
> ```java
> import org.apache.flink.api.common.functions.FlatMapFunction;
> import org.apache.flink.api.common.restartstrategy.RestartStrategies;
> import org.apache.flink.api.common.time.Time;
> import org.apache.flink.api.java.ExecutionEnvironment;
> import org.apache.flink.api.java.tuple.Tuple2;
> import org.apache.flink.util.Collector;
> import java.util.concurrent.TimeUnit;
>
> public class FlinkJob {
>     public static void main(String[] args) throws Exception {
>         final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
>         env.setRestartStrategy(
>                 RestartStrategies.fixedDelayRestart(Integer.MAX_VALUE, Time.of(20, TimeUnit.SECONDS)));
>         env.fromElements("Hello World", "Hello Flink")
>                 .flatMap(new LineSplitter())
>                 .groupBy(0)
>                 .sum(1)
>                 .print();
>     }
>
>     public static final class LineSplitter implements FlatMapFunction<String, Tuple2<String, Integer>> {
>         @Override
>         public void flatMap(String value, Collector<Tuple2<String, Integer>> out) {
>             for (String word : value.split(" ")) {
>                 try {
>                     Thread.sleep(12);
>                 } catch (InterruptedException e) {
>                     e.printStackTrace();
>                 }
>                 out.collect(new Tuple2<>(word, 1));
>             }
>         }
>     }
> }
> ```
> ```xml
> <project xmlns="http://maven.apache.org/POM/4.0.0">
>     <modelVersion>4.0.0</modelVersion>
>     <groupId>org.apache.flink</groupId>
>     <artifactId>myflinkjob</artifactId>
>     <version>1.0-SNAPSHOT</version>
>     <properties>
>         <flink.version>1.18.1</flink.version>
>         <java.version>1.8</java.version>
>     </properties>
>     <dependencies>
>         <dependency>
>             <groupId>org.apache.flink</groupId>
>             <artifactId>flink-java</artifactId>
>             <version>${flink.version}</version>
>         </dependency>
>         <dependency>
>             <groupId>org.apache.flink</groupId>
>             <artifactId>flink-streaming-java</artifactId>
>             <version>${flink.version}</version>
>         </dependency>
>     </dependencies>
>     <build>
>         <plugins>
>             <plugin>
>                 <groupId>org.apache.maven.plugins</groupId>
>                 <artifactId>maven-compiler-plugin</artifactId>
>                 <version>3.8.1</version>
>                 <configuration>
>                     <source>${java.version}</source>
>                     <target>${java.version}</target>
>                 </configuration>
>             </plugin>
>             <plugin>
>                 <groupId>org.apache.maven.plugins</groupId>
>                 <artifactId>maven-jar-plugin</artifactId>
>                 <version>3.1.0</version>
>                 <configuration>
>                     <archive>
>                         <manifest>
>                             <addClasspath>true</addClasspath>
>                             <classpathPrefix>lib/</classpathPrefix>
>                             <mainClass>FlinkJob</mainClass>
>                         </manifest>
>                     </archive>
>                 </configuration>
>             </plugin>
>         </plugins>
>     </build>
> </project>
> ```
> Launch the job: ./flink-1.18.1/bin/flink run ../flink-job/target/myflinkjob-1.0-SNAPSHOT.jar
> Job has been submitted with JobID 5f0898c964a93a47aa480427f3e2c6c0
> Kill the job manager and the task manager. Then launch the 1.19.0 job manager: ./flink-1.19.0/bin/jobmanager.sh start-foreground
>
> Root cause
> ==
> It looks like the type of delayBetweenAttemptsInterval was changed in 1.19 (https://github.com/apache/flink/pull/22984/files#diff-d174f32ffdea69de610c4f37c545bd22a253b9846434f83397f1bbc2aaa399faR239), introducing an incompatibility which is not handled by Flink 1.19. In my opinion, the job manager should not crash when starting in that case.
> Reporter: yazgoo
> Priority: Major
>
> When trying to upgrade a Flink cluster from 1.18 to 1.19, with a 1.18 job in ZooKeeper HA state, the job manager crashes with a ClassCastException; see the log below.
>
> {code:java}
> 2024-06-18 16:58:14,401 ERROR org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - Fatal error occurred in the cluster entrypoint.
> org.apache.flink.util.FlinkException: JobMaster for job 5f0898c964a93a47aa480427f3e2c6c0 failed.
>     at org.apache.flink.runtime.dispatcher.Dispatcher.jobMasterFailed(Dispatcher.java:1484) ~[flink-dist-1.19.0.jar:1.19.0]
>     at org.apache.flink.runtime.dispatcher.Dispatcher.jobManagerRunnerFailed(Dispatcher.java:775) ~[flink-dist-1.19.0.jar:1.19.0]
>     at org.apache.flink.runtime.dispatcher.Dispatcher.handleJobManagerRunnerResult(Dispatcher.java:738) ~[flink-dist-1.19.0.jar:1.19.0]
>     at org.apache.flink.runtime.dispatcher.Dispatcher.lambda$runJob$7(Dispatcher.java:693) ~[flink-dist-1.19.0.jar:1.19.0]
>     at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:934) ~[?:?]
>     at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:911) ~[?:?]
>     at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:482) ~[?:?]
>     at org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.lambda$handleRunAsync$4(PekkoRpcActor.java:451) ~[flink-rpc-akka84eb9e64-a1ce-450c-ad53-d9fa579b67e1.jar:1.19.0]
>     at
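The failure mode reported above, a value recovered from ZooKeeper HA state that no longer matches the field type the upgraded version expects, can be illustrated with a minimal, self-contained sketch. The types below are stand-ins, not Flink's actual HA-state classes; in the real issue the incompatible field is the restart strategy's delayBetweenAttemptsInterval.

```java
import java.time.Duration;

public class StateTypeMismatchSketch {
    public static void main(String[] args) {
        // Stand-in for what a 1.18-era process serialized into HA state:
        // the restart delay stored under some legacy type (here, a Long of millis).
        Object recovered = Long.valueOf(20_000L);

        // Stand-in for what the upgraded process expects after the field's
        // type changed (here, java.time.Duration). The unchecked cast that
        // state recovery effectively performs fails at runtime, not at compile time.
        try {
            Duration delay = (Duration) recovered;
            System.out.println("recovered delay: " + delay);
        } catch (ClassCastException e) {
            // Without defensive handling, an exception like this propagates up
            // and, as in the log above, takes down the cluster entrypoint.
            System.out.println("incompatible HA state: " + e.getMessage());
        }
    }
}
```

The sketch only shows why the mismatch surfaces as a ClassCastException at startup rather than as a compile-time or submission-time error.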
[jira] [Created] (FLINK-35639) upgrading to 1.19 with job in HA state with restart strategy crashes job manager
yazgoo created FLINK-35639:
--
Summary: upgrading to 1.19 with job in HA state with restart strategy crashes job manager
Key: FLINK-35639
URL: https://issues.apache.org/jira/browse/FLINK-35639
Project: Flink
Issue Type: Bug
Components: API / Core
Affects Versions: 1.19.1
Reporter: yazgoo
Environment: (same reproduction steps as in the update above)
Description: (same ClassCastException and stack trace as in the update above)
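Since the incompatible value travels inside the job's serialized restart strategy configuration, a related observation: newer Flink versions favor configuring the restart strategy through configuration options rather than the deprecated programmatic RestartStrategies API. A config-based equivalent of the job's fixedDelayRestart(Integer.MAX_VALUE, 20 s) setting would look roughly like the fragment below; whether using it instead of the programmatic API avoids the serialized-type incompatibility described in this ticket has not been verified here.

```yaml
# Sketch only: config-based equivalent of
# RestartStrategies.fixedDelayRestart(Integer.MAX_VALUE, Time.of(20, TimeUnit.SECONDS))
restart-strategy.type: fixed-delay
restart-strategy.fixed-delay.attempts: 2147483647
restart-strategy.fixed-delay.delay: 20 s
```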
[jira] [Commented] (FLINK-35138) Release flink-connector-kafka v3.2.0 for Flink 1.19
[ https://issues.apache.org/jira/browse/FLINK-35138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853367#comment-17853367 ]

yazgoo commented on FLINK-35138: Awesome! Thanks for your work!

> Release flink-connector-kafka v3.2.0 for Flink 1.19
> ---
>
> Key: FLINK-35138
> URL: https://issues.apache.org/jira/browse/FLINK-35138
> Project: Flink
> Issue Type: Sub-task
> Components: Connectors / Kafka
> Reporter: Danny Cranmer
> Assignee: Danny Cranmer
> Priority: Major
> Labels: pull-request-available
> Fix For: kafka-3.2.0
>
> https://github.com/apache/flink-connector-kafka

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-35138) Release flink-connector-kafka v3.2.0 for Flink 1.19
[ https://issues.apache.org/jira/browse/FLINK-35138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17852301#comment-17852301 ]

yazgoo commented on FLINK-35138: Hi, it looks like there are still 2 binding votes missing. Do you have an update on this? Thanks!
[jira] [Commented] (FLINK-35138) Release flink-connector-kafka v3.2.0 for Flink 1.19
[ https://issues.apache.org/jira/browse/FLINK-35138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845909#comment-17845909 ]

yazgoo commented on FLINK-35138: Hello, do you have an update on this ticket? It looks like the vote went fine? Thanks!
[jira] (FLINK-35007) Update Flink Kafka connector to support 1.19
[ https://issues.apache.org/jira/browse/FLINK-35007 ]

yazgoo deleted comment on FLINK-35007: was (Author: yazgoo): Hi, do you plan on publishing flink-connector-kafka:3.1.0-1.19? Thanks!

> Update Flink Kafka connector to support 1.19
>
> Key: FLINK-35007
> URL: https://issues.apache.org/jira/browse/FLINK-35007
> Project: Flink
> Issue Type: Technical Debt
> Components: Connectors / Kafka
> Reporter: Martijn Visser
> Assignee: Martijn Visser
> Priority: Major
> Labels: pull-request-available
> Fix For: kafka-4.0.0, kafka-3.2.0
[jira] (FLINK-35007) Update Flink Kafka connector to support 1.19
[ https://issues.apache.org/jira/browse/FLINK-35007 ] yazgoo deleted comment on FLINK-35007: was (Author: yazgoo): Hi [~martijnvisser] , do you have news on my previous question ? > Update Flink Kafka connector to support 1.19 > > > Key: FLINK-35007 > URL: https://issues.apache.org/jira/browse/FLINK-35007 > Project: Flink > Issue Type: Technical Debt > Components: Connectors / Kafka >Reporter: Martijn Visser >Assignee: Martijn Visser >Priority: Major > Labels: pull-request-available > Fix For: kafka-4.0.0, kafka-3.2.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-35007) Update Flink Kafka connector to support 1.19
[ https://issues.apache.org/jira/browse/FLINK-35007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17839963#comment-17839963 ] yazgoo commented on FLINK-35007: Hi [~martijnvisser], do you have any news on my previous question? > Update Flink Kafka connector to support 1.19 > > > Key: FLINK-35007 > URL: https://issues.apache.org/jira/browse/FLINK-35007 > Project: Flink > Issue Type: Technical Debt > Components: Connectors / Kafka >Reporter: Martijn Visser >Assignee: Martijn Visser >Priority: Major > Labels: pull-request-available > Fix For: kafka-4.0.0, kafka-3.2.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-35007) Update Flink Kafka connector to support 1.19
[ https://issues.apache.org/jira/browse/FLINK-35007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838630#comment-17838630 ] yazgoo commented on FLINK-35007: Hi, do you plan on publishing flink-connector-kafka:3.1.0-1.19? Thanks! > Update Flink Kafka connector to support 1.19 > > > Key: FLINK-35007 > URL: https://issues.apache.org/jira/browse/FLINK-35007 > Project: Flink > Issue Type: Technical Debt > Components: Connectors / Kafka >Reporter: Martijn Visser >Assignee: Martijn Visser >Priority: Major > Labels: pull-request-available > Fix For: kafka-4.0.0, kafka-3.1.1 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-20560) history-server: update archives while they're being loaded
[ https://issues.apache.org/jira/browse/FLINK-20560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yazgoo updated FLINK-20560: --- Description: When the history server unpacks and loads archives, we have to wait for all the archives to be read before updating the history server view. If there are a lot of archives to handle, this can add up to a significant time during which the history UI is not up to date, when in fact it could already show newly loaded jobs. Attached is a diff of a fix for this: [^b.diff] was: When the history server unpacks and loads archives, we have to wait for all the archives to be read before updating the history server view. If there are a lot of archives to handle, this can add up to a significant time during which the history UI is not up to date, when in fact it could already show newly loaded jobs. Attached is a diff of a fix for this: [^b.diff] > history-server: update archives while they're being loaded > -- > > Key: FLINK-20560 > URL: https://issues.apache.org/jira/browse/FLINK-20560 > Project: Flink > Issue Type: Bug > Components: Runtime / Web Frontend >Reporter: yazgoo >Priority: Major > Fix For: 1.11.2 > > Attachments: b.diff > > > When the history server unpacks and loads archives, we have to > wait for all the archives to be read before updating the history server view. > If there are a lot of archives to handle, this can add up to a significant > time during which the history UI is not up to date, when in fact it could already show newly > loaded jobs. > Attached is a diff of a fix for this: [^b.diff] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-20560) history-server: update archives while they're being loaded
yazgoo created FLINK-20560: -- Summary: history-server: update archives while they're being loaded Key: FLINK-20560 URL: https://issues.apache.org/jira/browse/FLINK-20560 Project: Flink Issue Type: Bug Components: Runtime / Web Frontend Reporter: yazgoo Fix For: 1.11.2 Attachments: b.diff When the history server unpacks and loads archives, we have to wait for all the archives to be read before updating the history server view. If there are a lot of archives to handle, this can add up to a significant time during which the history UI is not up to date, when in fact it could already show newly loaded jobs. Attached is a diff of a fix for this: [^b.diff] -- This message was sent by Atlassian Jira (v8.3.4#803005)
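The idea behind the attached fix can be sketched as follows. This is a minimal, hypothetical illustration (the class and method names below are assumptions, not Flink's actual history server code): each job is added to the overview as soon as its archive has been loaded, instead of only after the whole batch has been read, so the UI can refresh incrementally.

```java
import java.util.ArrayList;
import java.util.List;

public class IncrementalArchiveLoader {
    private final List<String> overview = new ArrayList<>();

    // Load archives one by one; the overview is already usable after each
    // step, so the history UI could re-render between archives instead of
    // waiting for the full batch.
    public List<String> loadArchives(List<String> archiveIds) {
        for (String id : archiveIds) {
            // ... unpack and parse the archive for this job here ...
            overview.add(id);
            // refresh point: newly loaded jobs are visible from here on
        }
        return overview;
    }
}
```

With many archives, the overview grows one job at a time rather than jumping from empty to complete at the end, which is the behavior the ticket asks for.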
[jira] [Created] (FLINK-20559) history-server: don't add caching headers for /jobs/overview.json
yazgoo created FLINK-20559: -- Summary: history-server: don't add caching headers for /jobs/overview.json Key: FLINK-20559 URL: https://issues.apache.org/jira/browse/FLINK-20559 Project: Flink Issue Type: Bug Components: Runtime / Web Frontend Affects Versions: 1.11.2 Reporter: yazgoo Attachments: a.diff The history server returns Cache-Control: max-age=300 for the `/jobs/overview` path. It should not cache this response, because when new jobs get added the cached copy is no longer up to date. Looking at the source code, it seems an exception was added for joboverview.json, but since the file was renamed, the exception was not updated. Attached is a diff of a fix for handling this: [^a.diff] -- This message was sent by Atlassian Jira (v8.3.4#803005)
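The kind of exception the ticket describes can be sketched like this. The helper below is hypothetical (it is not Flink's actual file-serving handler); it only illustrates the rule: the overview file changes whenever a new job is archived, so it must be excluded from caching, while static files can keep their max-age.

```java
public class CacheHeaderPolicy {
    // The overview is regenerated as jobs are archived, so a cached copy
    // goes stale; everything else can safely get Cache-Control: max-age=300.
    static boolean isCacheable(String path) {
        return !path.equals("/jobs/overview.json") && !path.equals("/jobs/overview");
    }

    public static void main(String[] args) {
        System.out.println(isCacheable("/jobs/overview.json"));
        System.out.println(isCacheable("/index.html"));
    }
}
```

The bug in the ticket is exactly a mismatch in such a check: the exclusion still named the old joboverview.json file after it had been renamed, so the new path fell through to the cacheable branch.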
[jira] [Updated] (FLINK-16221) Execution::transitionState() should log an error when error parameter is not null
[ https://issues.apache.org/jira/browse/FLINK-16221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yazgoo updated FLINK-16221: --- Description: When execution state transitions with an error, an INFO is logged in Execution::transitionState(). I think an ERROR should be logged. This is especially useful when the state transitions to failing, to be able to retrieve the error causing the failure. So: |LOG.*info*("{} ({}) switched from {} to {}.", getVertex().getTaskNameWithSubtaskIndex(), getAttemptId(), currentState, targetState, error);| should become |LOG.*error*("{} ({}) switched from {} to {}.", getVertex().getTaskNameWithSubtaskIndex(), getAttemptId(), currentState, targetState, error);| was: When execution state transitions with an error, an INFO is logged in Execution::transitionState(). I think an ERROR should be logged. This is especially useful when the state transitions to failing, to be able to retrieve the error causing the failure. So: |LOG.error("{} ({}) switched from {} to {}.", getVertex().getTaskNameWithSubtaskIndex(), getAttemptId(), currentState, targetState, error);| should become |LOG.info("{} ({}) switched from {} to {}.", getVertex().getTaskNameWithSubtaskIndex(), getAttemptId(), currentState, targetState, error);| > Execution::transitionState() should log an error when error parameter is not > null > -- > > Key: FLINK-16221 > URL: https://issues.apache.org/jira/browse/FLINK-16221 > Project: Flink > Issue Type: Improvement > Components: Runtime / Coordination >Affects Versions: 1.9.2, 1.10.0 >Reporter: yazgoo >Priority: Major > Attachments: info_to_error.patch > > > When execution state transitions with an error, an INFO is logged in > Execution::transitionState(). > I think an ERROR should be logged. > This is especially useful when the state transitions to failing, to be able to > retrieve the error causing the failure. 
> So: > |LOG.*info*("{} ({}) switched from {} to {}.", > getVertex().getTaskNameWithSubtaskIndex(), getAttemptId(), currentState, > targetState, error);| > should become > |LOG.*error*("{} ({}) switched from {} to {}.", > getVertex().getTaskNameWithSubtaskIndex(), getAttemptId(), currentState, > targetState, error);| -- This message was sent by Atlassian Jira (v8.3.4#803005)
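The behavior the ticket asks for boils down to choosing the log level based on whether a throwable accompanies the transition. A minimal sketch of that rule (the class and method below are illustrative, not Flink's actual Execution class):

```java
public class TransitionLogging {
    // Pick the log level for a state transition: ERROR when it carries a
    // cause, so the failure's stack trace is not buried at INFO.
    static String levelFor(Throwable error) {
        return error == null ? "INFO" : "ERROR";
    }

    public static void main(String[] args) {
        System.out.println(levelFor(null));
        System.out.println(levelFor(new RuntimeException("task failed")));
    }
}
```

Note that SLF4J already treats a trailing Throwable argument specially and prints its stack trace, so the one-word change from LOG.info to LOG.error in the ticket is enough to surface the cause at the right severity.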
[jira] [Created] (FLINK-16221) Execution::transitionState() should log an error when error parameter is not null
yazgoo created FLINK-16221: -- Summary: Execution::transitionState() should log an error when error parameter is not null Key: FLINK-16221 URL: https://issues.apache.org/jira/browse/FLINK-16221 Project: Flink Issue Type: Improvement Components: Runtime / Coordination Affects Versions: 1.10.0, 1.9.2 Reporter: yazgoo Attachments: info_to_error.patch When execution state transitions with an error, an INFO is logged in Execution::transitionState(). I think an ERROR should be logged. This is especially useful when the state transitions to failing, to be able to retrieve the error causing the failure. So: |LOG.error("{} ({}) switched from {} to {}.", getVertex().getTaskNameWithSubtaskIndex(), getAttemptId(), currentState, targetState, error);| should become |LOG.info("{} ({}) switched from {} to {}.", getVertex().getTaskNameWithSubtaskIndex(), getAttemptId(), currentState, targetState, error);| -- This message was sent by Atlassian Jira (v8.3.4#803005)