[jira] [Updated] (FLINK-35639) upgrade to 1.19 with job in HA state with restart strategy crashes job manager

2024-06-18 Thread yazgoo (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-35639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yazgoo updated FLINK-35639:
---
Summary: upgrade to 1.19 with job in HA state with restart strategy crashes 
job manager  (was: upgrading to 1.19 with job in HA state with restart strategy 
crashes job manager)

> upgrade to 1.19 with job in HA state with restart strategy crashes job manager
> --
>
> Key: FLINK-35639
> URL: https://issues.apache.org/jira/browse/FLINK-35639
> Project: Flink
>  Issue Type: Bug
>  Components: API / Core
>Affects Versions: 1.19.1
> Environment: Download the 1.18 and 1.19 binary releases. Add the
> following to flink-1.19.0/conf/config.yaml and
> flink-1.18.1/conf/flink-conf.yaml:
> ```yaml
> high-availability: zookeeper
> high-availability.zookeeper.quorum: localhost
> high-availability.storageDir: file:///tmp/flink/recovery
> ```
> Launch zookeeper:
> docker run --network host zookeeper:latest
> Launch the 1.18 task manager:
> ./flink-1.18.1/bin/taskmanager.sh start-foreground
> Launch the 1.18 job manager:
> ./flink-1.18.1/bin/jobmanager.sh start-foreground
> Launch the following job:
> ```java
> import org.apache.flink.api.java.ExecutionEnvironment;
> import org.apache.flink.api.java.tuple.Tuple2;
> import org.apache.flink.api.common.functions.FlatMapFunction;
> import org.apache.flink.util.Collector;
> import org.apache.flink.api.common.restartstrategy.RestartStrategies;
> import org.apache.flink.api.common.time.Time;
> import java.util.concurrent.TimeUnit;
>
> public class FlinkJob {
>     public static void main(String[] args) throws Exception {
>         final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
>         env.setRestartStrategy(
>             RestartStrategies.fixedDelayRestart(Integer.MAX_VALUE, Time.of(20, TimeUnit.SECONDS)));
>         env.fromElements("Hello World", "Hello Flink")
>             .flatMap(new LineSplitter())
>             .groupBy(0)
>             .sum(1)
>             .print();
>     }
>
>     public static final class LineSplitter implements FlatMapFunction<String, Tuple2<String, Integer>> {
>         @Override
>         public void flatMap(String value, Collector<Tuple2<String, Integer>> out) {
>             for (String word : value.split(" ")) {
>                 try {
>                     Thread.sleep(12);
>                 } catch (InterruptedException e) {
>                     e.printStackTrace();
>                 }
>                 out.collect(new Tuple2<>(word, 1));
>             }
>         }
>     }
> }
> ```
> pom.xml:
> ```xml
> <project xmlns="http://maven.apache.org/POM/4.0.0">
>   <modelVersion>4.0.0</modelVersion>
>   <groupId>org.apache.flink</groupId>
>   <artifactId>myflinkjob</artifactId>
>   <version>1.0-SNAPSHOT</version>
>   <properties>
>     <flink.version>1.18.1</flink.version>
>     <java.version>1.8</java.version>
>   </properties>
>   <dependencies>
>     <dependency>
>       <groupId>org.apache.flink</groupId>
>       <artifactId>flink-java</artifactId>
>       <version>${flink.version}</version>
>     </dependency>
>     <dependency>
>       <groupId>org.apache.flink</groupId>
>       <artifactId>flink-streaming-java</artifactId>
>       <version>${flink.version}</version>
>     </dependency>
>   </dependencies>
>   <build>
>     <plugins>
>       <plugin>
>         <groupId>org.apache.maven.plugins</groupId>
>         <artifactId>maven-compiler-plugin</artifactId>
>         <version>3.8.1</version>
>         <configuration>
>           <source>${java.version}</source>
>           <target>${java.version}</target>
>         </configuration>
>       </plugin>
>       <plugin>
>         <groupId>org.apache.maven.plugins</groupId>
>         <artifactId>maven-jar-plugin</artifactId>
>         <version>3.1.0</version>
>         <configuration>
>           <archive>
>             <manifest>
>               <addClasspath>true</addClasspath>
>               <classpathPrefix>lib/</classpathPrefix>
>               <mainClass>FlinkJob</mainClass>
>             </manifest>
>           </archive>
>         </configuration>
>       </plugin>
>     </plugins>
>   </build>
> </project>
> ```
> Launch the job:
> ./flink-1.18.1/bin/flink run ../flink-job/target/myflinkjob-1.0-SNAPSHOT.jar
> Job has been submitted with JobID 5f0898c964a93a47aa480427f3e2c6c0
> Kill the job manager and task manager, then launch the 1.19.0 job manager:
> ./flink-1.19.0/bin/jobmanager.sh start-foreground
> Root cause
> ==========
> It looks like the type of delayBetweenAttemptsInterval was changed in 1.19
> (https://github.com/apache/flink/pull/22984/files#diff-d174f32ffdea69de610c4f37c545bd22a253b9846434f83397f1bbc2aaa399faR239),
> introducing an incompatibility that Flink 1.19 does not handle. In my
> opinion, the job manager should not crash on startup in that case.
>Reporter: yazgoo
>Priority: Major
>
> When trying to upgrade a Flink cluster from 1.18 to 1.19 with a 1.18 job in
> ZooKeeper HA state, the job manager crashes with a ClassCastException; see
> the log below.
>  
> {code:java}
> 2024-06-18 16:58:14,401 ERROR org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - Fatal error occurred in the cluster entrypoint.
> org.apache.flink.util.FlinkException: JobMaster for job 5f0898c964a93a47aa480427f3e2c6c0 failed.
>     at org.apache.flink.runtime.dispatcher.Dispatcher.jobMasterFailed(Dispatcher.java:1484) ~[flink-dist-1.19.0.jar:1.19.0]
>     at org.apache.flink.runtime.dispatcher.Dispatcher.jobManagerRunnerFailed(Dispatcher.java:775) ~[flink-dist-1.19.0.jar:1.19.0]
>     at org.apache.flink.runtime.dispatcher.Dispatcher.handleJobManagerRunnerResult(Dispatcher.java:738) ~[flink-dist-1.19.0.jar:1.19.0]
>     at org.apache.flink.runtime.dispatcher.Dispatcher.lambda$runJob$7(Dispatcher.java:693) ~[flink-dist-1.19.0.jar:1.19.0]
>     at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:934) ~[?:?]
>     at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:911) ~[?:?]
>     at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:482) ~[?:?]
>     at org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.lambda$handleRunAsync$4(PekkoRpcActor.java:451) ~[flink-rpc-akka84eb9e64-a1ce-450c-ad53-d9fa579b67e1.jar:1.19.0]
>     at org.apache.flink.runtime.concurrent.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:68) ~[flink-dist-1.19.0.jar:1.19.0]
>     at org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.handleRunAsync(PekkoRpcActor.java:451) ~[flink-rpc-akka84eb9e64-a1ce-450c-ad53-d9fa579b67e1.jar:1.19.0]
>     at org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.handleRpcMessage(PekkoRpcActor.java:218) ~[flink-rpc-akka84eb9e64-a1ce-450c-ad53-d9fa579b67e1.jar:1.19.0]
>     at org.apache.flink.runtime.rpc.pekko.FencedPekkoRpcActor.handleRpcMessage(FencedPekkoRpcActor.java:85) ~[flink-rpc-akka84eb9e64-a1ce-450c-ad53-d9fa579b67e1.jar:1.19.0]
>     at org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.handleMessage(PekkoRpcActor.java:168) ~[flink-rpc-akka84eb9e64-a1ce-450c-ad53-d9fa579b67e1.jar:1.19.0]
>     at org.apache.pekko.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:33) [flink-rpc-akka84eb9e64-a1ce-450c-ad53-d9fa579b67e1.jar:1.19.0]
>     at org.apache.pekko.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:29) [flink-rpc-akka84eb9e64-a1ce-450c-ad53-d9fa579b67e1.jar:1.19.0]
>     at scala.PartialFunction.applyOrElse(PartialFunction.scala:127) [flink-rpc-akka84eb9e64-a1ce-450c-ad53-d9fa579b67e1.jar:1.19.0]
>     at scala.PartialFunction.applyOrElse$(PartialFunction.scala:126) [flink-rpc-akka84eb9e64-a1ce-450c-ad53-d9fa579b67e1.jar:1.19.0]
>     at org.apache.pekko.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:29) [flink-rpc-akka84eb9e64-a1ce-450c-ad53-d9fa579b67e1.jar:1.19.0]
>     at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:175) [flink-rpc-akka84eb9e64-a1ce-450c-ad53-d9fa579b67e1.jar:1.19.0]
>     at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:176) [flink-rpc-akka84eb9e64-a1ce-450c-ad53-d9fa579b67e1.jar:1.19.0]
>     at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:176) [flink-rpc-akka84eb9e64-a1ce-450c-ad53-d9fa579b67e1.jar:1.19.0]
>     at org.apache.pekko.actor.Actor.aroundReceive(Actor.scala:547) [flink-rpc-akka84eb9e64-a1ce-450c-ad53-d9fa579b67e1.jar:1.19.0]
>     at org.apache.pekko.actor.Actor.aroundReceive$(Actor.scala:545) [flink-rpc-akka84eb9e64-a1ce-450c-ad53-d9fa579b67e1.jar:1.19.0]
>     at org.apache.pekko.actor.AbstractActor.aroundReceive(AbstractActor.scala:229) [flink-rpc-akka84eb9e64-a1ce-450c-ad53-d9fa579b67e1.jar:1.19.0]
>     at org.apache.pekko.actor.ActorCell.receiveMessage(ActorCell.scala:590) [flink-rpc-akka84eb9e64-a1ce-450c-ad53-d9fa579b67e1.jar:1.19.0]
>     at org.apache.pekko.actor.ActorCell.invoke(ActorCell.scala:557) [flink-rpc-akka84eb9e64-a1ce-450c-ad53-d9fa579b67e1.jar:1.19.0]
>     at org.apache.pekko.dispatch.Mailbox.processMailbox(Mailbox.scala:280) [flink-rpc-akka84eb9e64-a1ce-450c-ad53-d9fa579b67e1.jar:1.19.0]
>     at org.apache.pekko.dispatch.Mailbox.run(Mailbox.scala:241) [flink-rpc-akka84eb9e64-a1ce-450c-ad53-d9fa579b67e1.jar:1.19.0]
>     at org.apache.pekko.dispatch.Mailbox.exec(Mailbox.scala:253) [flink-rpc-akka84eb9e64-a1ce-450c-ad53-d9fa579b67e1.jar:1.19.0]
>     at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:373) [?:?]
>     at java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1182) [?:?]
>     at java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1655) [?:?]
>     at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1622) [?:?]
> {code}

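The incompatibility described in the root-cause note above (a value written by one class version being read back and cast by a version that expects a different type) can be sketched in isolation. Everything below is a hypothetical stand-in: the class and field names are invented for illustration and are not Flink's actual internals, and the field is declared as Object purely so that a single JVM can play both the "1.18" writer and the "1.19" reader:

```java
import java.io.*;

public class SerializedFieldTypeChange {
    static class PersistedConfig implements Serializable {
        private static final long serialVersionUID = 1L;
        // Hypothetical field: the "1.18" side stores the restart delay as a
        // Long of milliseconds, while the "1.19" side expects a
        // java.time.Duration in the same slot.
        Object delayBetweenAttempts;
        PersistedConfig(Object delay) { this.delayBetweenAttempts = delay; }
    }

    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(o);
        }
        return bos.toByteArray();
    }

    static Object deserialize(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // The "1.18" side persists the HA state with the old representation.
        byte[] haState = serialize(new PersistedConfig(20_000L));

        // The "1.19" side restores it and assumes the new representation.
        PersistedConfig restored = (PersistedConfig) deserialize(haState);
        try {
            java.time.Duration delay = (java.time.Duration) restored.delayBetweenAttempts;
            System.out.println("delay = " + delay);
        } catch (ClassCastException e) {
            // Deserialization itself succeeds; the failure only surfaces when
            // the restored value is used under its new expected type.
            System.out.println("ClassCastException: " + e.getMessage());
        }
    }
}
```

The cast fails only at the moment the restored value is used, which matches the report's suggestion that the recovery path, rather than the whole cluster entrypoint, would be the natural place to handle this class of error.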
[jira] [Created] (FLINK-35639) upgrading to 1.19 with job in HA state with restart strategy crashes job manager

2024-06-18 Thread yazgoo (Jira)
yazgoo created FLINK-35639:
--

 Summary: upgrading to 1.19 with job in HA state with restart 
strategy crashes job manager
 Key: FLINK-35639
 URL: https://issues.apache.org/jira/browse/FLINK-35639
 Project: Flink
  Issue Type: Bug
  Components: API / Core
Affects Versions: 1.19.1
 Environment: Download the 1.18 and 1.19 binary releases. Add the following
to flink-1.19.0/conf/config.yaml and flink-1.18.1/conf/flink-conf.yaml:

```yaml
high-availability: zookeeper
high-availability.zookeeper.quorum: localhost
high-availability.storageDir: file:///tmp/flink/recovery
```

Launch zookeeper:
docker run --network host zookeeper:latest

Launch the 1.18 task manager:
./flink-1.18.1/bin/taskmanager.sh start-foreground

Launch the 1.18 job manager:
./flink-1.18.1/bin/jobmanager.sh start-foreground

Launch the following job:

```java
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.util.Collector;
import org.apache.flink.api.common.restartstrategy.RestartStrategies;
import org.apache.flink.api.common.time.Time;
import java.util.concurrent.TimeUnit;

public class FlinkJob {
    public static void main(String[] args) throws Exception {
        final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        env.setRestartStrategy(
            RestartStrategies.fixedDelayRestart(Integer.MAX_VALUE, Time.of(20, TimeUnit.SECONDS)));
        env.fromElements("Hello World", "Hello Flink")
            .flatMap(new LineSplitter())
            .groupBy(0)
            .sum(1)
            .print();
    }

    public static final class LineSplitter implements FlatMapFunction<String, Tuple2<String, Integer>> {
        @Override
        public void flatMap(String value, Collector<Tuple2<String, Integer>> out) {
            for (String word : value.split(" ")) {
                try {
                    Thread.sleep(12);
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
                out.collect(new Tuple2<>(word, 1));
            }
        }
    }
}
```

pom.xml:

```xml
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.apache.flink</groupId>
  <artifactId>myflinkjob</artifactId>
  <version>1.0-SNAPSHOT</version>
  <properties>
    <flink.version>1.18.1</flink.version>
    <java.version>1.8</java.version>
  </properties>
  <dependencies>
    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-java</artifactId>
      <version>${flink.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-streaming-java</artifactId>
      <version>${flink.version}</version>
    </dependency>
  </dependencies>
  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>3.8.1</version>
        <configuration>
          <source>${java.version}</source>
          <target>${java.version}</target>
        </configuration>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-jar-plugin</artifactId>
        <version>3.1.0</version>
        <configuration>
          <archive>
            <manifest>
              <addClasspath>true</addClasspath>
              <classpathPrefix>lib/</classpathPrefix>
              <mainClass>FlinkJob</mainClass>
            </manifest>
          </archive>
        </configuration>
      </plugin>
    </plugins>
  </build>
</project>
```

Launch the job:
./flink-1.18.1/bin/flink run ../flink-job/target/myflinkjob-1.0-SNAPSHOT.jar
Job has been submitted with JobID 5f0898c964a93a47aa480427f3e2c6c0

Kill the job manager and task manager, then launch the 1.19.0 job manager:
./flink-1.19.0/bin/jobmanager.sh start-foreground

Root cause
==========
It looks like the type of delayBetweenAttemptsInterval was changed in 1.19
(https://github.com/apache/flink/pull/22984/files#diff-d174f32ffdea69de610c4f37c545bd22a253b9846434f83397f1bbc2aaa399faR239),
introducing an incompatibility that Flink 1.19 does not handle. In my
opinion, the job manager should not crash on startup in that case.
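For what it's worth, a fixed-delay strategy equivalent to the one set in code above can also be supplied through the cluster configuration. The key names below are taken from the Flink documentation for recent releases; verify them against the exact version in use:

```yaml
restart-strategy.type: fixed-delay
restart-strategy.fixed-delay.attempts: 2147483647
restart-strategy.fixed-delay.delay: 20 s
```

A strategy configured at the cluster level is read from the config file on each start rather than travelling inside the serialized job state, so it may be less exposed to this kind of cross-version incompatibility.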
Reporter: yazgoo


When trying to upgrade a Flink cluster from 1.18 to 1.19 with a 1.18 job in 
ZooKeeper HA state, the job manager crashes with a ClassCastException; see 
the log below.

```log
2024-06-18 16:58:14,401 ERROR org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - Fatal error occurred in the cluster entrypoint.
org.apache.flink.util.FlinkException: JobMaster for job 5f0898c964a93a47aa480427f3e2c6c0 failed.
    at org.apache.flink.runtime.dispatcher.Dispatcher.jobMasterFailed(Dispatcher.java:1484) ~[flink-dist-1.19.0.jar:1.19.0]
    at org.apache.flink.runtime.dispatcher.Dispatcher.jobManagerRunnerFailed(Dispatcher.java:775) ~[flink-dist-1.19.0.jar:1.19.0]
    at org.apache.flink.runtime.dispatcher.Dispatcher.handleJobManagerRunnerResult(Dispatcher.java:738) ~[flink-dist-1.19.0.jar:1.19.0]
    at org.apache.flink.runtime.dispatcher.Dispatcher.lambda$runJob$7(Dispatcher.java:693) ~[flink-dist-1.19.0.jar:1.19.0]
    at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:934) ~[?:?]
    at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:911) ~[?:?]
    at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:482) ~[?:?]
    at org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.lambda$handleRunAsync$4(PekkoRpcActor.java:451) ~[flink-rpc-akka84eb9e64-a1ce-450c-ad53-d9fa579b67e1.jar:1.19.0]
    at org.apache.flink.runtime.concurrent.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:68) ~[flink-dist-1.19.0.jar:1.19.0]
    at org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.handleRunAsync(PekkoRpcActor.java:451) ~[flink-rpc-akka84eb9e64-a1ce-450c-ad53-d9fa579b67e1.jar:1.19.0]
    at org.apache.flink.runtime.rpc.pekko.PekkoRpcActor.handleRpcMessage(PekkoRpcActor.java:218) ~[flink-rpc-akka84eb9e64-a1ce-450c-ad53-d9fa579b67e1.jar:1.19.0]
    at org.apache.flink.runtime.rpc.pekko.FencedPekkoRpcActor.handleRpcMessage(FencedPekkoRpcActor.java:85)
```

[jira] [Commented] (FLINK-35138) Release flink-connector-kafka v3.2.0 for Flink 1.19

2024-06-08 Thread yazgoo (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853367#comment-17853367
 ] 

yazgoo commented on FLINK-35138:


Awesome!
Thanks for your work!

> Release flink-connector-kafka v3.2.0 for Flink 1.19
> ---
>
> Key: FLINK-35138
> URL: https://issues.apache.org/jira/browse/FLINK-35138
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / Kafka
>Reporter: Danny Cranmer
>Assignee: Danny Cranmer
>Priority: Major
>  Labels: pull-request-available
> Fix For: kafka-3.2.0
>
>
> https://github.com/apache/flink-connector-kafka



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-35138) Release flink-connector-kafka v3.2.0 for Flink 1.19

2024-06-05 Thread yazgoo (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17852301#comment-17852301
 ] 

yazgoo commented on FLINK-35138:


Hi,

Looks like there are still 2 binding votes missing.
Do you have an update on this?

Thanks!

> Release flink-connector-kafka v3.2.0 for Flink 1.19
> ---
>
> Key: FLINK-35138
> URL: https://issues.apache.org/jira/browse/FLINK-35138
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / Kafka
>Reporter: Danny Cranmer
>Assignee: Danny Cranmer
>Priority: Major
>  Labels: pull-request-available
> Fix For: kafka-3.2.0
>
>
> https://github.com/apache/flink-connector-kafka





[jira] [Commented] (FLINK-35138) Release flink-connector-kafka v3.2.0 for Flink 1.19

2024-05-13 Thread yazgoo (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17845909#comment-17845909
 ] 

yazgoo commented on FLINK-35138:


Hello, do you have an update on this ticket?
It looks like the vote went fine?

Thanks!

> Release flink-connector-kafka v3.2.0 for Flink 1.19
> ---
>
> Key: FLINK-35138
> URL: https://issues.apache.org/jira/browse/FLINK-35138
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / Kafka
>Reporter: Danny Cranmer
>Assignee: Danny Cranmer
>Priority: Major
>  Labels: pull-request-available
> Fix For: kafka-3.2.0
>
>
> https://github.com/apache/flink-connector-kafka





[jira] (FLINK-35007) Update Flink Kafka connector to support 1.19

2024-04-23 Thread yazgoo (Jira)


[ https://issues.apache.org/jira/browse/FLINK-35007 ]


yazgoo deleted comment on FLINK-35007:


was (Author: yazgoo):
Hi,

Do you plan on publishing flink-connector-kafka:3.1.0-1.19?

Thanks!

> Update Flink Kafka connector to support 1.19
> 
>
> Key: FLINK-35007
> URL: https://issues.apache.org/jira/browse/FLINK-35007
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Connectors / Kafka
>Reporter: Martijn Visser
>Assignee: Martijn Visser
>Priority: Major
>  Labels: pull-request-available
> Fix For: kafka-4.0.0, kafka-3.2.0
>
>






[jira] (FLINK-35007) Update Flink Kafka connector to support 1.19

2024-04-23 Thread yazgoo (Jira)


[ https://issues.apache.org/jira/browse/FLINK-35007 ]


yazgoo deleted comment on FLINK-35007:


was (Author: yazgoo):
Hi [~martijnvisser], do you have any news on my previous question?

> Update Flink Kafka connector to support 1.19
> 
>
> Key: FLINK-35007
> URL: https://issues.apache.org/jira/browse/FLINK-35007
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Connectors / Kafka
>Reporter: Martijn Visser
>Assignee: Martijn Visser
>Priority: Major
>  Labels: pull-request-available
> Fix For: kafka-4.0.0, kafka-3.2.0
>
>






[jira] [Commented] (FLINK-35007) Update Flink Kafka connector to support 1.19

2024-04-23 Thread yazgoo (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17839963#comment-17839963
 ] 

yazgoo commented on FLINK-35007:


Hi [~martijnvisser], do you have any news on my previous question?

> Update Flink Kafka connector to support 1.19
> 
>
> Key: FLINK-35007
> URL: https://issues.apache.org/jira/browse/FLINK-35007
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Connectors / Kafka
>Reporter: Martijn Visser
>Assignee: Martijn Visser
>Priority: Major
>  Labels: pull-request-available
> Fix For: kafka-4.0.0, kafka-3.2.0
>
>






[jira] [Commented] (FLINK-35007) Update Flink Kafka connector to support 1.19

2024-04-18 Thread yazgoo (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-35007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17838630#comment-17838630
 ] 

yazgoo commented on FLINK-35007:


Hi,

Do you plan on publishing flink-connector-kafka:3.1.0-1.19?

Thanks!

> Update Flink Kafka connector to support 1.19
> 
>
> Key: FLINK-35007
> URL: https://issues.apache.org/jira/browse/FLINK-35007
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Connectors / Kafka
>Reporter: Martijn Visser
>Assignee: Martijn Visser
>Priority: Major
>  Labels: pull-request-available
> Fix For: kafka-4.0.0, kafka-3.1.1
>
>






[jira] [Updated] (FLINK-20560) history-server: update archives while they're being loaded

2021-03-24 Thread yazgoo (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yazgoo updated FLINK-20560:
---
Description: 
When the history server unpacks and loads archives, we have to wait for all 
the archives to be read before the history server view is updated.
If there are a lot of archives to handle, this can leave the history UI out of 
date for a significant time, when in fact it could already show the newly 
loaded jobs.

Attached is a diff with a fix for this: [^b.diff]
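A minimal sketch of the idea (hypothetical names, not the actual patch attached as [^b.diff]): publish each archive to the overview as soon as it is loaded, instead of refreshing the view only once at the end of the batch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: make loaded jobs visible incrementally rather than
// waiting for the whole batch of archives to finish loading.
public class IncrementalArchiveLoader {
    final List<String> visibleJobs = new ArrayList<>();
    int refreshCount = 0; // how many times the overview was republished

    void loadAll(List<String> archives) {
        for (String archive : archives) {
            visibleJobs.add(archive); // unpacking + parsing would happen here
            refreshOverview();        // update the view per archive, not at the end
        }
    }

    void refreshOverview() {
        refreshCount++; // stands in for regenerating the jobs overview
    }

    public static void main(String[] args) {
        IncrementalArchiveLoader loader = new IncrementalArchiveLoader();
        loader.loadAll(Arrays.asList("job-1", "job-2", "job-3"));
        System.out.println(loader.visibleJobs); // [job-1, job-2, job-3]
    }
}
```

With this shape, the UI can show `job-1` while `job-2` and `job-3` are still being unpacked, at the cost of one extra overview refresh per archive.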

  was:
When the history server unpacks and loads archives, we have to wait for all 
the archives to be read before the history server view is updated.
If there are a lot of archives to handle, this can leave the history UI out of 
date for a significant time, when in fact it could already show the newly 
loaded jobs.

Attached is a diff with a fix for this: [^b.diff]


> history-server: update archives while they're being loaded
> --
>
> Key: FLINK-20560
> URL: https://issues.apache.org/jira/browse/FLINK-20560
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Web Frontend
>Reporter: yazgoo
>Priority: Major
> Fix For: 1.11.2
>
> Attachments: b.diff
>
>
> When the history server unpacks and loads archives, we have to wait for all 
> the archives to be read before the history server view is updated.
> If there are a lot of archives to handle, this can leave the history UI out 
> of date for a significant time, when in fact it could already show the newly 
> loaded jobs.
> Attached is a diff with a fix for this: [^b.diff]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-20560) history-server: update archives while they're being loaded

2020-12-10 Thread yazgoo (Jira)
yazgoo created FLINK-20560:
--

 Summary: history-server: update archives while they're being loaded
 Key: FLINK-20560
 URL: https://issues.apache.org/jira/browse/FLINK-20560
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Web Frontend
Reporter: yazgoo
 Fix For: 1.11.2
 Attachments: b.diff

When the history server unpacks and loads archives, we have to wait for all 
the archives to be read before the history server view is updated.
If there are a lot of archives to handle, this can leave the history UI out of 
date for a significant time, when in fact it could already show the newly 
loaded jobs.

Attached is a diff with a fix for this: [^b.diff]





[jira] [Created] (FLINK-20559) history-server: don't add caching headers for /jobs/overview.json

2020-12-10 Thread yazgoo (Jira)
yazgoo created FLINK-20559:
--

 Summary: history-server: don't add caching headers for 
/jobs/overview.json 
 Key: FLINK-20559
 URL: https://issues.apache.org/jira/browse/FLINK-20559
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Web Frontend
Affects Versions: 1.11.2
Reporter: yazgoo
 Attachments: a.diff

The history server returns Cache-Control: max-age=300 for the `/jobs/overview` path.

It should not cache this response, because if new jobs get added it is no 
longer up to date.

Looking at the source code, it looks like an exception was added for 
joboverview.json, but since the file was renamed, the exception was not updated.

Attached is a diff with a fix for handling this: [^a.diff]
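A minimal sketch of the mismatch (hypothetical helper names, not the actual Flink source): an exclusion that still matches only the old file name never fires for the renamed path, so the caching header keeps being applied.

```java
// Hypothetical sketch of the rename bug: the caching exclusion still matches
// the pre-rename file name, so the renamed /jobs/overview path is served with
// a Cache-Control: max-age=300 header it should not have.
public class CacheHeaderCheck {
    // Pre-rename exclusion: matched the old joboverview.json file.
    static boolean skipCachingOld(String path) {
        return path.endsWith("joboverview.json");
    }

    // Updated exclusion: also matches the renamed /jobs/overview path.
    static boolean skipCachingFixed(String path) {
        return path.endsWith("joboverview.json") || path.endsWith("/jobs/overview");
    }

    public static void main(String[] args) {
        String renamed = "/jobs/overview";
        System.out.println(skipCachingOld(renamed));   // false: response gets cached
        System.out.println(skipCachingFixed(renamed)); // true: caching is skipped
    }
}
```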





[jira] [Updated] (FLINK-16221) Execution::transitionState() should log an error when error parameter is not null

2020-02-21 Thread yazgoo (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-16221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yazgoo updated FLINK-16221:
---
Description: 
When an execution state transition carries an error, an INFO is logged in 
Execution::transitionState().
 I think an ERROR should be logged instead.
 This is especially useful when the state transitions to failing, to be able 
to retrieve the error causing the failure.
 So:
|LOG.*info*("{} ({}) switched from {} to {}.", 
getVertex().getTaskNameWithSubtaskIndex(), getAttemptId(), currentState, 
targetState, error);|

should become
|LOG.*error*("{} ({}) switched from {} to {}.", 
getVertex().getTaskNameWithSubtaskIndex(), getAttemptId(), currentState, 
targetState, error);|

  was:
When an execution state transition carries an error, an INFO is logged in 
Execution::transitionState().
I think an ERROR should be logged instead.
This is especially useful when the state transitions to failing, to be able 
to retrieve the error causing the failure.
So:
|LOG.error("{} ({}) switched from {} to {}.", 
getVertex().getTaskNameWithSubtaskIndex(), getAttemptId(), currentState, 
targetState, error);|

should become
|LOG.info("{} ({}) switched from {} to {}.", 
getVertex().getTaskNameWithSubtaskIndex(), getAttemptId(), currentState, 
targetState, error);|


> Execution::transitionState() should log an error when error parameter is not 
> null 
> --
>
> Key: FLINK-16221
> URL: https://issues.apache.org/jira/browse/FLINK-16221
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Coordination
>Affects Versions: 1.9.2, 1.10.0
>Reporter: yazgoo
>Priority: Major
> Attachments: info_to_error.patch
>
>
> When an execution state transition carries an error, an INFO is logged in 
> Execution::transitionState().
>  I think an ERROR should be logged instead.
>  This is especially useful when the state transitions to failing, to be able 
> to retrieve the error causing the failure.
>  So:
> |LOG.*info*("{} ({}) switched from {} to {}.", 
> getVertex().getTaskNameWithSubtaskIndex(), getAttemptId(), currentState, 
> targetState, error);|
> should become
> |LOG.*error*("{} ({}) switched from {} to {}.", 
> getVertex().getTaskNameWithSubtaskIndex(), getAttemptId(), currentState, 
> targetState, error);|





[jira] [Created] (FLINK-16221) Execution::transitionState() should log an error when error parameter is not null

2020-02-21 Thread yazgoo (Jira)
yazgoo created FLINK-16221:
--

 Summary: Execution::transitionState() should log an error when 
error parameter is not null 
 Key: FLINK-16221
 URL: https://issues.apache.org/jira/browse/FLINK-16221
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Coordination
Affects Versions: 1.10.0, 1.9.2
Reporter: yazgoo
 Attachments: info_to_error.patch

When an execution state transition carries an error, an INFO is logged in 
Execution::transitionState().
I think an ERROR should be logged instead.
This is especially useful when the state transitions to failing, to be able 
to retrieve the error causing the failure.
So:
|LOG.error("{} ({}) switched from {} to {}.", 
getVertex().getTaskNameWithSubtaskIndex(), getAttemptId(), currentState, 
targetState, error);|

should become
|LOG.info("{} ({}) switched from {} to {}.", 
getVertex().getTaskNameWithSubtaskIndex(), getAttemptId(), currentState, 
targetState, error);|


