[jira] [Commented] (FLINK-14742) Unstable tests TaskExecutorTest#testSubmitTaskBeforeAcceptSlot
[ https://issues.apache.org/jira/browse/FLINK-14742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17020466#comment-17020466 ] Till Rohrmann commented on FLINK-14742: --- Let me know once you have the PR ready [~kkl0u]. I can help with reviewing it. > Unstable tests TaskExecutorTest#testSubmitTaskBeforeAcceptSlot > -- > > Key: FLINK-14742 > URL: https://issues.apache.org/jira/browse/FLINK-14742 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.10.0 >Reporter: Zili Chen >Assignee: Kostas Kloudas >Priority: Critical > Fix For: 1.10.0 > > > deadlock. > {code} > "main" #1 prio=5 os_prio=0 tid=0x7f1f8800b800 nid=0x356 waiting on > condition [0x7f1f8e65c000] >java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x86e9e9c0> (a > java.util.concurrent.CompletableFuture$Signaller) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at > java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693) > at > java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323) > at > java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729) > at > java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895) > at > org.apache.flink.runtime.taskexecutor.TaskExecutorTest.testSubmitTaskBeforeAcceptSlot(TaskExecutorTest.java:1108) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > {code} > full log https://api.travis-ci.org/v3/job/611275566/log.txt -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-14742) Unstable tests TaskExecutorTest#testSubmitTaskBeforeAcceptSlot
[ https://issues.apache.org/jira/browse/FLINK-14742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017779#comment-17017779 ] Biao Liu commented on FLINK-14742: -- Such a subtle case if it couldn't be reproduced :) I have checked the test case, but didn't find the clue. Nice work [~kkl0u]! > Unstable tests TaskExecutorTest#testSubmitTaskBeforeAcceptSlot > -- > > Key: FLINK-14742 > URL: https://issues.apache.org/jira/browse/FLINK-14742 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.10.0 >Reporter: Zili Chen >Assignee: Kostas Kloudas >Priority: Critical > Fix For: 1.10.0 > > > deadlock. > {code} > "main" #1 prio=5 os_prio=0 tid=0x7f1f8800b800 nid=0x356 waiting on > condition [0x7f1f8e65c000] >java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x86e9e9c0> (a > java.util.concurrent.CompletableFuture$Signaller) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at > java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693) > at > java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323) > at > java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729) > at > java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895) > at > org.apache.flink.runtime.taskexecutor.TaskExecutorTest.testSubmitTaskBeforeAcceptSlot(TaskExecutorTest.java:1108) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > {code} > full log https://api.travis-ci.org/v3/job/611275566/log.txt -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-14742) Unstable tests TaskExecutorTest#testSubmitTaskBeforeAcceptSlot
[ https://issues.apache.org/jira/browse/FLINK-14742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017059#comment-17017059 ] Kostas Kloudas commented on FLINK-14742: Ok it seems to be a test instability so I am downgrading it. > Unstable tests TaskExecutorTest#testSubmitTaskBeforeAcceptSlot > -- > > Key: FLINK-14742 > URL: https://issues.apache.org/jira/browse/FLINK-14742 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.10.0 >Reporter: Zili Chen >Assignee: Kostas Kloudas >Priority: Blocker > Fix For: 1.10.0 > > > deadlock. > {code} > "main" #1 prio=5 os_prio=0 tid=0x7f1f8800b800 nid=0x356 waiting on > condition [0x7f1f8e65c000] >java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x86e9e9c0> (a > java.util.concurrent.CompletableFuture$Signaller) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at > java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693) > at > java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323) > at > java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729) > at > java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895) > at > org.apache.flink.runtime.taskexecutor.TaskExecutorTest.testSubmitTaskBeforeAcceptSlot(TaskExecutorTest.java:1108) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > {code} > full log https://api.travis-ci.org/v3/job/611275566/log.txt -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-14742) Unstable tests TaskExecutorTest#testSubmitTaskBeforeAcceptSlot
[ https://issues.apache.org/jira/browse/FLINK-14742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17016971#comment-17016971 ] Kostas Kloudas commented on FLINK-14742: I upgraded it to a blocker until we know what happens. > Unstable tests TaskExecutorTest#testSubmitTaskBeforeAcceptSlot > -- > > Key: FLINK-14742 > URL: https://issues.apache.org/jira/browse/FLINK-14742 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.10.0 >Reporter: Zili Chen >Assignee: Kostas Kloudas >Priority: Blocker > Fix For: 1.10.0 > > > deadlock. > {code} > "main" #1 prio=5 os_prio=0 tid=0x7f1f8800b800 nid=0x356 waiting on > condition [0x7f1f8e65c000] >java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x86e9e9c0> (a > java.util.concurrent.CompletableFuture$Signaller) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at > java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693) > at > java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323) > at > java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729) > at > java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895) > at > org.apache.flink.runtime.taskexecutor.TaskExecutorTest.testSubmitTaskBeforeAcceptSlot(TaskExecutorTest.java:1108) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > {code} > full log https://api.travis-ci.org/v3/job/611275566/log.txt -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-14742) Unstable tests TaskExecutorTest#testSubmitTaskBeforeAcceptSlot
[ https://issues.apache.org/jira/browse/FLINK-14742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17016929#comment-17016929 ] Kostas Kloudas commented on FLINK-14742: Or this: {code:java} Caught exception while executing runnable in main thread. java.lang.NullPointerException at org.apache.flink.runtime.taskexecutor.slot.TaskSlotTable.createSlotReport(TaskSlotTable.java:199) at org.apache.flink.runtime.taskexecutor.TaskExecutor.establishResourceManagerConnection(TaskExecutor.java:1030) at org.apache.flink.runtime.taskexecutor.TaskExecutor.access$1700(TaskExecutor.java:155) at org.apache.flink.runtime.taskexecutor.TaskExecutor$ResourceManagerRegistrationListener.lambda$onRegistrationSuccess$0(TaskExecutor.java:1725) at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:397) at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:190) at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:152) at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170) at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) at akka.actor.Actor$class.aroundReceive(Actor.scala:517) at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) at akka.actor.ActorCell.invoke(ActorCell.scala:561) at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) at akka.dispatch.Mailbox.run(Mailbox.scala:225) at akka.dispatch.Mailbox.exec(Mailbox.scala:235) at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) {code} which seems to be also related to concurrent modification of one of the slot data structures in the {{TaskSlotTable}}. > Unstable tests TaskExecutorTest#testSubmitTaskBeforeAcceptSlot > -- > > Key: FLINK-14742 > URL: https://issues.apache.org/jira/browse/FLINK-14742 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.10.0 >Reporter: Zili Chen >Assignee: Kostas Kloudas >Priority: Critical > Fix For: 1.10.0 > > > deadlock. > {code} > "main" #1 prio=5 os_prio=0 tid=0x7f1f8800b800 nid=0x356 waiting on > condition [0x7f1f8e65c000] >java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x86e9e9c0> (a > java.util.concurrent.CompletableFuture$Signaller) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at > java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693) > at > java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323) > at > java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729) > at > java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895) > at > org.apache.flink.runtime.taskexecutor.TaskExecutorTest.testSubmitTaskBeforeAcceptSlot(TaskExecutorTest.java:1108) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > {code} > full log https://api.travis-ci.org/v3/job/611275566/log.txt -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-14742) Unstable tests TaskExecutorTest#testSubmitTaskBeforeAcceptSlot
[ https://issues.apache.org/jira/browse/FLINK-14742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17016912#comment-17016912 ] Kostas Kloudas commented on FLINK-14742: If you run it in a loop until failure (configurable in Intellij), you end up having this exception: {code:java} ERROR org.apache.flink.runtime.rpc.akka.AkkaRpcActor - Caught exception while executing runnable in main thread. java.util.ConcurrentModificationException at java.util.HashMap$HashIterator.nextNode(HashMap.java:1442) at java.util.HashMap$ValueIterator.next(HashMap.java:1471) at org.apache.flink.runtime.taskexecutor.slot.TaskSlotTable.createSlotReport(TaskSlotTable.java:213) at org.apache.flink.runtime.taskexecutor.TaskExecutor.establishResourceManagerConnection(TaskExecutor.java:1030) at org.apache.flink.runtime.taskexecutor.TaskExecutor.access$1700(TaskExecutor.java:155) at org.apache.flink.runtime.taskexecutor.TaskExecutor$ResourceManagerRegistrationListener.lambda$onRegistrationSuccess$0(TaskExecutor.java:1725) at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:397) at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:190) at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:152) at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170) at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) at akka.actor.Actor$class.aroundReceive(Actor.scala:517) at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) at akka.actor.ActorCell.invoke(ActorCell.scala:561) at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) at akka.dispatch.Mailbox.run(Mailbox.scala:225) at akka.dispatch.Mailbox.exec(Mailbox.scala:235) at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) {code} > Unstable tests TaskExecutorTest#testSubmitTaskBeforeAcceptSlot > -- > > Key: FLINK-14742 > URL: https://issues.apache.org/jira/browse/FLINK-14742 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.10.0 >Reporter: Zili Chen >Assignee: Kostas Kloudas >Priority: Critical > Fix For: 1.10.0 > > > deadlock. > {code} > "main" #1 prio=5 os_prio=0 tid=0x7f1f8800b800 nid=0x356 waiting on > condition [0x7f1f8e65c000] >java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x86e9e9c0> (a > java.util.concurrent.CompletableFuture$Signaller) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at > java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693) > at > java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323) > at > java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729) > at > java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895) > at > org.apache.flink.runtime.taskexecutor.TaskExecutorTest.testSubmitTaskBeforeAcceptSlot(TaskExecutorTest.java:1108) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > {code} > full log https://api.travis-ci.org/v3/job/611275566/log.txt -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-14742) Unstable tests TaskExecutorTest#testSubmitTaskBeforeAcceptSlot
[ https://issues.apache.org/jira/browse/FLINK-14742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16977909#comment-16977909 ] Zili Chen commented on FLINK-14742: --- [~azagrebin] I'm not working on this. Here is the change set where I notice this test fails https://github.com/apache/flink/compare/965e4e58a354...79e3b32b4ae9 . IMO the change is safe so that I file a JIRA to report unstable test. I don't have a patch to reproduce it also. > Unstable tests TaskExecutorTest#testSubmitTaskBeforeAcceptSlot > -- > > Key: FLINK-14742 > URL: https://issues.apache.org/jira/browse/FLINK-14742 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.10.0 >Reporter: Zili Chen >Priority: Major > > deadlock. > {code} > "main" #1 prio=5 os_prio=0 tid=0x7f1f8800b800 nid=0x356 waiting on > condition [0x7f1f8e65c000] >java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x86e9e9c0> (a > java.util.concurrent.CompletableFuture$Signaller) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at > java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693) > at > java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323) > at > java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729) > at > java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895) > at > org.apache.flink.runtime.taskexecutor.TaskExecutorTest.testSubmitTaskBeforeAcceptSlot(TaskExecutorTest.java:1108) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > {code} > full log https://api.travis-ci.org/v3/job/611275566/log.txt -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-14742) Unstable tests TaskExecutorTest#testSubmitTaskBeforeAcceptSlot
[ https://issues.apache.org/jira/browse/FLINK-14742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16977609#comment-16977609 ] Andrey Zagrebin commented on FLINK-14742: - [~tison] are you working on this? did you manage to reproduce the failure? > Unstable tests TaskExecutorTest#testSubmitTaskBeforeAcceptSlot > -- > > Key: FLINK-14742 > URL: https://issues.apache.org/jira/browse/FLINK-14742 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.10.0 >Reporter: Zili Chen >Priority: Major > > deadlock. > {code} > "main" #1 prio=5 os_prio=0 tid=0x7f1f8800b800 nid=0x356 waiting on > condition [0x7f1f8e65c000] >java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x86e9e9c0> (a > java.util.concurrent.CompletableFuture$Signaller) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at > java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693) > at > java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323) > at > java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729) > at > java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895) > at > org.apache.flink.runtime.taskexecutor.TaskExecutorTest.testSubmitTaskBeforeAcceptSlot(TaskExecutorTest.java:1108) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > {code} > full log https://api.travis-ci.org/v3/job/611275566/log.txt -- This message was sent by Atlassian Jira (v8.3.4#803005)