[jira] [Updated] (FLINK-18637) Key group is not in KeyGroupRange

2020-07-20 Thread Aljoscha Krettek (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek updated FLINK-18637:
-
Component/s: Runtime / State Backends

> Key group is not in KeyGroupRange
> -
>
> Key: FLINK-18637
> URL: https://issues.apache.org/jira/browse/FLINK-18637
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / State Backends
> Environment: Version: 1.10.0, Rev:, Date:
> OS current user: yarn
>  Current Hadoop/Kerberos user: hadoop
>  JVM: Java HotSpot(TM) 64-Bit Server VM - Oracle Corporation - 1.8/25.141-b15
>  Maximum heap size: 28960 MiBytes
>  JAVA_HOME: /usr/java/jdk1.8.0_141/jre
>  Hadoop version: 2.8.5-amzn-6
>  JVM Options:
>  -Xmx30360049728
>  -Xms30360049728
>  -XX:MaxDirectMemorySize=4429185024
>  -XX:MaxMetaspaceSize=1073741824
>  -XX:+UseG1GC
>  -XX:+UnlockDiagnosticVMOptions
>  -XX:+G1SummarizeConcMark
>  -verbose:gc
>  -XX:+PrintGCDetails
>  -XX:+PrintGCDateStamps
>  -XX:+UnlockCommercialFeatures
>  -XX:+FlightRecorder
>  -XX:+DebugNonSafepoints
>  
> -XX:FlightRecorderOptions=defaultrecording=true,settings=/home/hadoop/heap.jfc,dumponexit=true,dumponexitpath=/var/lib/hadoop-yarn/recording.jfr,loglevel=info
>  
> -Dlog.file=/var/log/hadoop-yarn/containers/application_1593935560662_0002/container_1593935560662_0002_01_02/taskmanager.log
>  -Dlog4j.configuration=file:./log4j.properties
>  Program Arguments:
>  -Dtaskmanager.memory.framework.off-heap.size=134217728b
>  -Dtaskmanager.memory.network.max=1073741824b
>  -Dtaskmanager.memory.network.min=1073741824b
>  -Dtaskmanager.memory.framework.heap.size=134217728b
>  -Dtaskmanager.memory.managed.size=23192823744b
>  -Dtaskmanager.cpu.cores=7.0
>  -Dtaskmanager.memory.task.heap.size=30225832000b
>  -Dtaskmanager.memory.task.off-heap.size=3221225472b
>  --configDir .
>  -Djobmanager.rpc.address=ip-10-180-30-250.us-west-2.compute.internal
>  -Dweb.port=0
>  -Dweb.tmpdir=/tmp/flink-web-64f613cf-bf04-4a09-8c14-75c31b619574
>  -Djobmanager.rpc.port=33739
>  -Drest.address=ip-10-180-30-250.us-west-2.compute.internal
>Reporter: Ori Popowski
>Priority: Major
>
> I'm getting this error when creating a savepoint. I've read in 
> https://issues.apache.org/jira/browse/FLINK-16193 that it's caused by an 
> unstable hashCode()/equals() implementation on the key, or by improper use of 
> {{reinterpretAsKeyedStream}}.
>   
>  My key is a string and I don't use {{reinterpretAsKeyedStream}}.
>  
> {code:java}
> senv
>   .addSource(source)
>   .flatMap(…)
>   .filterWith { case (metadata, _, _) => … }
>   .assignTimestampsAndWatermarks(new 
> BoundedOutOfOrdernessTimestampExtractor(…))
>   .keyingBy { case (meta, _) => meta.toPathString }
>   .process(new TruncateLargeSessions(config.sessionSizeLimit))
>   .keyingBy { case (meta, _) => meta.toPathString }
>   .window(EventTimeSessionWindows.withGap(Time.of(…)))
>   .process(new ProcessSession(sessionPlayback, config))
>   .addSink(sink){code}
>  

[jira] [Updated] (FLINK-18637) Key group is not in KeyGroupRange

2020-07-20 Thread Ori Popowski (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ori Popowski updated FLINK-18637:
-
Description: 
I'm getting this error when creating a savepoint. I've read in 
https://issues.apache.org/jira/browse/FLINK-16193 that it's caused by unstable 
hashcode or equals on the key, or improper use of {{reinterpretAsKeyedStream}}.
  
 My key is a string and I don't use {{reinterpretAsKeyedStream}}.

 
{code:java}
senv
  .addSource(source)
  .flatMap(…)
  .filterWith { case (metadata, _, _) => … }
  .assignTimestampsAndWatermarks(new BoundedOutOfOrdernessTimestampExtractor(…))
  .keyingBy { case (meta, _) => meta.toPathString }
  .process(new TruncateLargeSessions(config.sessionSizeLimit))
  .keyingBy { case (meta, _) => meta.toPathString }
  .window(EventTimeSessionWindows.withGap(Time.of(…)))
  .process(new ProcessSession(sessionPlayback, config))
  .addSink(sink){code}
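A note on what the error means: Flink derives a key group from key.hashCode(), and each parallel operator instance owns a contiguous KeyGroupRange, so a key whose hashCode is not deterministic can be routed to (or restored into) a key group outside the owning instance's range. The sketch below is a plain-Java illustration, not Flink's actual code: assignToKeyGroup here is a simplified stand-in for Flink's murmur-based KeyGroupRangeAssignment.assignToKeyGroup, and UnstableKey is a hypothetical key type.

```java
public class KeyGroupSketch {
    // Simplified stand-in for Flink's KeyGroupRangeAssignment.assignToKeyGroup:
    // the real implementation murmur-hashes key.hashCode() before the modulo,
    // but the stability argument is the same.
    static int assignToKeyGroup(Object key, int maxParallelism) {
        return Math.floorMod(key.hashCode(), maxParallelism);
    }

    // Hypothetical key type with equals but no hashCode override: "equal"
    // instances get identity hash codes, so they can land in different key groups.
    static class UnstableKey {
        final String path;
        UnstableKey(String path) { this.path = path; }
        @Override public boolean equals(Object o) {
            return o instanceof UnstableKey && ((UnstableKey) o).path.equals(path);
        }
    }

    public static void main(String[] args) {
        int maxParallelism = 128; // illustrative; also Flink's default for small jobs

        // A String key is stable: the same key group on every evaluation.
        int g1 = assignToKeyGroup("a/b/c", maxParallelism);
        int g2 = assignToKeyGroup("a/b/c", maxParallelism);
        assert g1 == g2;

        // Two equal UnstableKey instances usually disagree on their key group,
        // which is the precondition for "Key group is not in KeyGroupRange".
        UnstableKey k1 = new UnstableKey("a/b/c");
        UnstableKey k2 = new UnstableKey("a/b/c");
        System.out.println("String key group: " + g1);
        System.out.println("equal keys, same group? "
                + (assignToKeyGroup(k1, maxParallelism) == assignToKeyGroup(k2, maxParallelism)));
    }
}
```

Since the job keys by a plain String, the stable case applies, which is why the reporter rules out the FLINK-16193 explanation.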
 
{code:java}
org.apache.flink.util.FlinkException: Triggering a savepoint for the job 962fc8e984e7ca1ed65a038aa62ce124 failed.
	at org.apache.flink.client.cli.CliFrontend.triggerSavepoint(CliFrontend.java:633)
	at org.apache.flink.client.cli.CliFrontend.lambda$savepoint$9(CliFrontend.java:611)
	at org.apache.flink.client.cli.CliFrontend.runClusterAction(CliFrontend.java:843)
	at org.apache.flink.client.cli.CliFrontend.savepoint(CliFrontend.java:608)
	at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:910)
	at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:968)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
	at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
	at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:968)
Caused by: java.util.concurrent.CompletionException: java.util.concurrent.CompletionException: org.apache.flink.runtime.checkpoint.CheckpointException: The job has failed.
	at org.apache.flink.runtime.scheduler.SchedulerBase.lambda$triggerSavepoint$3(SchedulerBase.java:744)
	at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:822)
	at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:797)
	at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
	at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:397)
	at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:190)
	at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74)
	at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:152)
	at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26)
	at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21)
	at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
	at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21)
	at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170)
	at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
	at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
	at akka.actor.Actor$class.aroundReceive(Actor.scala:517)
	at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225)
	at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592)
	at akka.actor.ActorCell.invoke(ActorCell.scala:561)
	at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
	at akka.dispatch.Mailbox.run(Mailbox.scala:225)
	at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
	at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
	at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
	at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
	at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.util.concurrent.CompletionException: org.apache.flink.runtime.checkpoint.CheckpointException: The job has failed.
	at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
	at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
	at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:593)
	at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
{code}

[jira] [Updated] (FLINK-18637) Key group is not in KeyGroupRange

2020-07-20 Thread Ori Popowski (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ori Popowski updated FLINK-18637:
-
Environment: 
Version: 1.10.0, Rev:, Date:
OS current user: yarn
 Current Hadoop/Kerberos user: hadoop
 JVM: Java HotSpot(TM) 64-Bit Server VM - Oracle Corporation - 1.8/25.141-b15
 Maximum heap size: 28960 MiBytes
 JAVA_HOME: /usr/java/jdk1.8.0_141/jre
 Hadoop version: 2.8.5-amzn-6
 JVM Options:
 -Xmx30360049728
 -Xms30360049728
 -XX:MaxDirectMemorySize=4429185024
 -XX:MaxMetaspaceSize=1073741824
 -XX:+UseG1GC
 -XX:+UnlockDiagnosticVMOptions
 -XX:+G1SummarizeConcMark
 -verbose:gc
 -XX:+PrintGCDetails
 -XX:+PrintGCDateStamps
 -XX:+UnlockCommercialFeatures
 -XX:+FlightRecorder
 -XX:+DebugNonSafepoints
 
-XX:FlightRecorderOptions=defaultrecording=true,settings=/home/hadoop/heap.jfc,dumponexit=true,dumponexitpath=/var/lib/hadoop-yarn/recording.jfr,loglevel=info
 
-Dlog.file=/var/log/hadoop-yarn/containers/application_1593935560662_0002/container_1593935560662_0002_01_02/taskmanager.log
 -Dlog4j.configuration=file:./log4j.properties
 Program Arguments:
 -Dtaskmanager.memory.framework.off-heap.size=134217728b
 -Dtaskmanager.memory.network.max=1073741824b
 -Dtaskmanager.memory.network.min=1073741824b
 -Dtaskmanager.memory.framework.heap.size=134217728b
 -Dtaskmanager.memory.managed.size=23192823744b
 -Dtaskmanager.cpu.cores=7.0
 -Dtaskmanager.memory.task.heap.size=30225832000b
 -Dtaskmanager.memory.task.off-heap.size=3221225472b
 --configDir .
 -Djobmanager.rpc.address=ip-10-180-30-250.us-west-2.compute.internal
 -Dweb.port=0
 -Dweb.tmpdir=/tmp/flink-web-64f613cf-bf04-4a09-8c14-75c31b619574
 -Djobmanager.rpc.port=33739
 -Drest.address=ip-10-180-30-250.us-west-2.compute.internal
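As a sanity check on the environment above: under Flink 1.10's TaskManager memory model the JVM flags are derived from the fine-grained memory options, so -Xmx should equal framework.heap.size + task.heap.size, and -XX:MaxDirectMemorySize should equal framework.off-heap.size + task.off-heap.size + network memory. The figures reported here are self-consistent:

```java
public class TmMemoryCheck {
    public static void main(String[] args) {
        // Values copied from the environment dump above (all in bytes).
        long frameworkHeap = 134217728L;      // taskmanager.memory.framework.heap.size
        long taskHeap = 30225832000L;         // taskmanager.memory.task.heap.size
        long xmx = 30360049728L;              // -Xmx

        // JVM heap = framework heap + task heap
        assert frameworkHeap + taskHeap == xmx;

        long frameworkOffHeap = 134217728L;   // taskmanager.memory.framework.off-heap.size
        long taskOffHeap = 3221225472L;       // taskmanager.memory.task.off-heap.size
        long network = 1073741824L;           // taskmanager.memory.network.max (== min)
        long maxDirect = 4429185024L;         // -XX:MaxDirectMemorySize

        // Direct memory = framework off-heap + task off-heap + network
        assert frameworkOffHeap + taskOffHeap + network == maxDirect;

        System.out.println("memory flags are consistent");
    }
}
```

So the memory configuration itself looks coherent; the failure is unlikely to be a misconfigured memory budget.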

  was:
OS current user: yarn
Current Hadoop/Kerberos user: hadoop
JVM: Java HotSpot(TM) 64-Bit Server VM - Oracle Corporation - 1.8/25.141-b15
Maximum heap size: 28960 MiBytes
JAVA_HOME: /usr/java/jdk1.8.0_141/jre
Hadoop version: 2.8.5-amzn-6
JVM Options:
   -Xmx30360049728
   -Xms30360049728
   -XX:MaxDirectMemorySize=4429185024
   -XX:MaxMetaspaceSize=1073741824
   -XX:+UseG1GC
   -XX:+UnlockDiagnosticVMOptions
   -XX:+G1SummarizeConcMark
   -verbose:gc
   -XX:+PrintGCDetails
   -XX:+PrintGCDateStamps
   -XX:+UnlockCommercialFeatures
   -XX:+FlightRecorder
   -XX:+DebugNonSafepoints
   
-XX:FlightRecorderOptions=defaultrecording=true,settings=/home/hadoop/heap.jfc,dumponexit=true,dumponexitpath=/var/lib/hadoop-yarn/recording.jfr,loglevel=info
   
-Dlog.file=/var/log/hadoop-yarn/containers/application_1593935560662_0002/container_1593935560662_0002_01_02/taskmanager.log
   -Dlog4j.configuration=file:./log4j.properties
Program Arguments:
   -Dtaskmanager.memory.framework.off-heap.size=134217728b
   -Dtaskmanager.memory.network.max=1073741824b
   -Dtaskmanager.memory.network.min=1073741824b
   -Dtaskmanager.memory.framework.heap.size=134217728b
   -Dtaskmanager.memory.managed.size=23192823744b
   -Dtaskmanager.cpu.cores=7.0
   -Dtaskmanager.memory.task.heap.size=30225832000b
   -Dtaskmanager.memory.task.off-heap.size=3221225472b
   --configDir .
   -Djobmanager.rpc.address=ip-10-180-30-250.us-west-2.compute.internal
   -Dweb.port=0
   -Dweb.tmpdir=/tmp/flink-web-64f613cf-bf04-4a09-8c14-75c31b619574
   -Djobmanager.rpc.port=33739
   -Drest.address=ip-10-180-30-250.us-west-2.compute.internal



[jira] [Updated] (FLINK-18637) Key group is not in KeyGroupRange

2020-07-19 Thread Ori Popowski (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ori Popowski updated FLINK-18637:
-
Description: 
I'm getting this error when creating a savepoint. I've read in 
https://issues.apache.org/jira/browse/FLINK-16193 that it's caused by unstable 
hashcode or equals on the key, or improper use of {{reinterpretAsKeyedStream}}.
  
 My key is a string and I don't use {{reinterpretAsKeyedStream}}.

 
{code:java}
senv
  .addSource(source)
  .flatMap(…)
  .filterWith { case (metadata, _, _) => … }
  .assignTimestampsAndWatermarks(new BoundedOutOfOrdernessTimestampExtractor(…))
  .keyingBy { case (meta, _) => meta.toPath.toString }
  .process(new TruncateLargeSessions(config.sessionSizeLimit))
  .keyingBy { case (meta, _) => meta.toPath.toString }
  .window(EventTimeSessionWindows.withGap(Time.of(…)))
  .process(new ProcessSession(sessionPlayback, config))
  .addSink(sink){code}
 

[jira] [Updated] (FLINK-18637) Key group is not in KeyGroupRange

2020-07-19 Thread Ori Popowski (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ori Popowski updated FLINK-18637:
-
Description: 
I'm getting this error when creating a savepoint. I've read in 
https://issues.apache.org/jira/browse/FLINK-16193 that it's caused by unstable 
hashcode or equals on the key, or improper use of {{reinterpretAsKeyedStream}}.
  
 My key is a string and I don't use {{reinterpretAsKeyedStream}}.

 
{code:java}
senv
  .addSource(source)
  .flatMap(…)
  .filterWith { case (metadata, _, _) => … }
  .assignTimestampsAndWatermarks(new BoundedOutOfOrdernessTimestampExtractor(…))
  .keyingBy { case (meta, _) => meta.toPath.toString }
  .flatMap(new TruncateLargeSessions(config.sessionSizeLimit))
  .keyingBy { case (meta, _) => meta.toPath.toString }
  .window(EventTimeSessionWindows.withGap(Time.of(…)))
  .process(new ProcessSession(sessionPlayback, config))
  .addSink(sink){code}
 

[jira] [Updated] (FLINK-18637) Key group is not in KeyGroupRange

2020-07-19 Thread Ori Popowski (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ori Popowski updated FLINK-18637:
-
Description: 
I'm getting this error when creating a savepoint. I've read in 
https://issues.apache.org/jira/browse/FLINK-16193 that it's caused by unstable 
hashcode or equals on the key, or improper use of reinterpretAsKeyedStream.
  
 My key is a string and I don't use reinterpretAsKeyedStream.

 
{code:java}
senv
  .addSource(source)
  .flatMap(…)
  .filterWith { case (metadata, _, _) => … }
  .assignTimestampsAndWatermarks(new BoundedOutOfOrdernessTimestampExtractor(…))
  .keyingBy { case (meta, _) => meta.toPath.toString }
  .flatMap(new TruncateLargeSessions(config.sessionSizeLimit))
  .keyingBy { case (meta, _) => meta.toPath.toString }
  .window(EventTimeSessionWindows.withGap(Time.of(…)))
  .process(new ProcessSession(sessionPlayback, config))
  .addSink(sink){code}
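For readers unfamiliar with the exception in the summary: each parallel subtask owns a contiguous range of key groups, and the error fires when a record's computed key group falls outside the owning subtask's range. The formulas below are modeled on Flink's KeyGroupRangeAssignment (computeKeyGroupRangeForOperatorIndex / computeOperatorIndexForKeyGroup); the parallelism of 7 is only an illustrative choice echoing taskmanager.cpu.cores=7.0 from the environment, and the numbers printed are illustrative, not taken from the failing job.

```java
public class KeyGroupRangeSketch {
    // Modeled on Flink's KeyGroupRangeAssignment.computeKeyGroupRangeForOperatorIndex:
    // which contiguous block of key groups a given subtask owns.
    static int[] rangeForOperator(int maxParallelism, int parallelism, int operatorIndex) {
        int start = (operatorIndex * maxParallelism + parallelism - 1) / parallelism;
        int end = ((operatorIndex + 1) * maxParallelism - 1) / parallelism;
        return new int[] {start, end};
    }

    // Modeled on computeOperatorIndexForKeyGroup: which subtask a key group maps to.
    static int operatorForKeyGroup(int maxParallelism, int parallelism, int keyGroup) {
        return keyGroup * parallelism / maxParallelism;
    }

    public static void main(String[] args) {
        int maxParallelism = 128, parallelism = 7;
        for (int op = 0; op < parallelism; op++) {
            int[] r = rangeForOperator(maxParallelism, parallelism, op);
            System.out.println("subtask " + op + " owns key groups " + r[0] + ".." + r[1]);
            // Invariant: every key group in a subtask's range maps back to that
            // subtask. A key whose hashCode changed between the write and the
            // read would break this, producing "Key group is not in KeyGroupRange".
            for (int kg = r[0]; kg <= r[1]; kg++) {
                assert operatorForKeyGroup(maxParallelism, parallelism, kg) == op;
            }
        }
    }
}
```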
 

 

  was:
I'm getting this error when creating a savepoint. I've read in 
https://issues.apache.org/jira/browse/FLINK-16193 that it's caused by unstable 
hashcode or equals on the key, or improper use of reinterpretAsKeyedStream.
  
 My key is a string and I don't use reinterpretAsKeyedStream.

 
{code:java}
senv
  .addSource(source)
  .flatMap(…)
  .filterWith { case (metadata, _, _) => … }
  .assignTimestampsAndWatermarks(new BoundedOutOfOrdernessTimestampExtractor(…))
   .keyingBy { case (meta, _) => meta.toPath.toString }
   .flatMap(new TruncateLargeSessions(config.sessionSizeLimit))
   .keyingBy { case (meta, _) => meta.toPath.toString }
   .window(EventTimeSessionWindows.withGap(Time.of(…)))
   .process(new ProcessSession(sessionPlayback, config))
   .addSink(sink){code}

  

 





--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-18637) Key group is not in KeyGroupRange

2020-07-19 Thread Ori Popowski (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ori Popowski updated FLINK-18637:
-
Description: 
I'm getting this error when creating a savepoint. I've read in 
https://issues.apache.org/jira/browse/FLINK-16193 that it's caused by unstable 
hashcode or equals on the key, or improper use of reinterpretAsKeyedStream.
  
 My key is a string and I don't use reinterpretAsKeyedStream.

 
{code:java}
senv
  .addSource(source)
  .flatMap(…)
  .filterWith { case (metadata, _, _) => … }
  .assignTimestampsAndWatermarks(new BoundedOutOfOrdernessTimestampExtractor(…))
   .keyingBy { case (meta, _) => meta.toPath.toString }
   .flatMap(new TruncateLargeSessions(config.sessionSizeLimit))
   .keyingBy { case (meta, _) => meta.toPath.toString }
   .window(EventTimeSessionWindows.withGap(Time.of(…)))
   .process(new ProcessSession(sessionPlayback, config))
   .addSink(sink){code}

  

 

  was:
I'm getting this error when creating a savepoint. I've read in 
https://issues.apache.org/jira/browse/FLINK-16193 that it's caused by unstable 
hashcode or equals on the key, or improper use of reinterpretAsKeyedStream.
 
My key is a string and I don't use reinterpretAsKeyedStream.
 
senv
 .addSource(source)
 .flatMap(…)
 .filterWith \{ case (metadata, _, _) => … }
 .assignTimestampsAndWatermarks(new BoundedOutOfOrdernessTimestampExtractor(…))
 .keyingBy \{ case (meta, _) => meta.toPath.toString }
 .flatMap(new TruncateLargeSessions(config.sessionSizeLimit))
 .keyingBy \{ case (meta, _) => meta.toPath.toString }
 .window(EventTimeSessionWindows.withGap(Time.of(…)))
 .process(new ProcessSession(sessionPlayback, config))
 .addSink(sink)
 




