[jira] [Commented] (FLINK-7201) ConcurrentModificationException in JobLeaderIdService
[ https://issues.apache.org/jira/browse/FLINK-7201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16107033#comment-16107033 ]

ASF GitHub Bot commented on FLINK-7201:
---------------------------------------

Github user XuPingyong closed the pull request at:

    https://github.com/apache/flink/pull/4347

> ConcurrentModificationException in JobLeaderIdService
> -----------------------------------------------------
>
>                 Key: FLINK-7201
>                 URL: https://issues.apache.org/jira/browse/FLINK-7201
>             Project: Flink
>          Issue Type: Bug
>          Components: JobManager
>            Reporter: Xu Pingyong
>            Assignee: Xu Pingyong
>              Labels: flip-6
>             Fix For: 1.4.0
>
> {code:java}
> java.util.ConcurrentModificationException: null
>     at java.util.HashMap$HashIterator.nextEntry(HashMap.java:922)
>     at java.util.HashMap$ValueIterator.next(HashMap.java:950)
>     at org.apache.flink.runtime.resourcemanager.JobLeaderIdService.clear(JobLeaderIdService.java:114)
>     at org.apache.flink.runtime.resourcemanager.JobLeaderIdService.stop(JobLeaderIdService.java:92)
>     at org.apache.flink.runtime.resourcemanager.ResourceManager.shutDown(ResourceManager.java:200)
>     at org.apache.flink.runtime.resourcemanager.ResourceManagerRunner.shutDownInternally(ResourceManagerRunner.java:102)
>     at org.apache.flink.runtime.resourcemanager.ResourceManagerRunner.shutDown(ResourceManagerRunner.java:97)
>     at org.apache.flink.runtime.minicluster.MiniCluster.shutdownInternally(MiniCluster.java:329)
>     at org.apache.flink.runtime.minicluster.MiniCluster.shutdown(MiniCluster.java:297)
>     at org.apache.flink.runtime.minicluster.MiniClusterITCase.runJobWithMultipleJobManagers(MiniClusterITCase.java:85)
> {code}
> Because the jobLeaderIdService is stopped before the rpcService when the
> resourceManager is shut down, the jobLeaderIdService is at risk of
> thread-unsafe access.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
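
For readers unfamiliar with the exception in the quoted trace, here is a minimal sketch of how a `HashMap` value iterator fails once the map is structurally modified during iteration. It is not Flink code: the class `FailFastDemo`, the map contents, and the job keys are hypothetical stand-ins, and the modification happens on the same thread only to make the failure deterministic. In the reported case the modification presumably comes from another thread, such as the RPC thread.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

class FailFastDemo {

    /**
     * Iterates over the values of a HashMap and removes an entry during the
     * first iteration step. Returns true if the fail-fast iterator reports
     * the modification with a ConcurrentModificationException.
     */
    static boolean iterationFailsOnModification() {
        // Hypothetical stand-in for a jobId -> listener map.
        Map<String, String> listeners = new HashMap<>();
        listeners.put("job-1", "listener-1");
        listeners.put("job-2", "listener-2");
        listeners.put("job-3", "listener-3");

        try {
            for (String listener : listeners.values()) {
                // Structural modification while an iterator is open. The next
                // call to the iterator's next() notices the changed
                // modification count and throws.
                listeners.remove("job-3");
            }
        } catch (ConcurrentModificationException e) {
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("iterator failed: " + iterationFailsOnModification());
    }
}
```

The fail-fast check is best effort only. With unsynchronized access from two threads the exception is not guaranteed, which would be consistent with a failure that shows up only occasionally in a test run.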
[jira] [Commented] (FLINK-7201) ConcurrentModificationException in JobLeaderIdService
[ https://issues.apache.org/jira/browse/FLINK-7201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16107032#comment-16107032 ]

ASF GitHub Bot commented on FLINK-7201:
---------------------------------------

Github user XuPingyong commented on the issue:

    https://github.com/apache/flink/pull/4347

    Thanks @tillrohrmann !
[jira] [Commented] (FLINK-7201) ConcurrentModificationException in JobLeaderIdService
[ https://issues.apache.org/jira/browse/FLINK-7201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16107024#comment-16107024 ]

ASF GitHub Bot commented on FLINK-7201:
---------------------------------------

Github user tillrohrmann commented on the issue:

    https://github.com/apache/flink/pull/4347

    With the changes of #4420, this problem should be resolved. Could you please close this PR then, @XuPingyong?
[jira] [Commented] (FLINK-7201) ConcurrentModificationException in JobLeaderIdService
[ https://issues.apache.org/jira/browse/FLINK-7201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16101921#comment-16101921 ]

ASF GitHub Bot commented on FLINK-7201:
---------------------------------------

Github user tillrohrmann commented on the issue:

    https://github.com/apache/flink/pull/4347

    I think it would be better to harden the `JobLeaderIdService` such that it can be shut down concurrently. This actually also applies to the `HeartbeatManager`, the `SlotManager` and the `ResourceManager` itself.
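
One possible reading of "harden such that it can be shut down concurrently" is sketched below. This is an assumption for illustration only, not the change made in #4420 and not the real `JobLeaderIdService` API. The class `GuardedRegistry` and its methods are hypothetical. The idea is to guard the map with a lock, take a snapshot while holding the lock, and run the stop actions outside of it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class GuardedRegistry {

    private final Object lock = new Object();

    // Hypothetical jobId -> stop action map, only touched while holding the lock.
    private final Map<String, Runnable> stopActions = new HashMap<>();

    private boolean stopped = false;

    /** Registers a stop action; returns false once the registry has been stopped. */
    boolean add(String jobId, Runnable stopAction) {
        synchronized (lock) {
            if (stopped) {
                return false;
            }
            stopActions.put(jobId, stopAction);
            return true;
        }
    }

    /**
     * Stops the registry from any thread. The map is copied and cleared under
     * the lock, so no iterator is open while another thread modifies the map.
     * Returns the number of stop actions that were run.
     */
    int stop() {
        List<Runnable> snapshot;
        synchronized (lock) {
            stopped = true;
            snapshot = new ArrayList<>(stopActions.values());
            stopActions.clear();
        }
        // Run the actions outside the lock so they cannot block add().
        for (Runnable action : snapshot) {
            action.run();
        }
        return snapshot.size();
    }

    public static void main(String[] args) {
        GuardedRegistry registry = new GuardedRegistry();
        registry.add("job-1", () -> System.out.println("stopping job-1"));
        registry.add("job-2", () -> System.out.println("stopping job-2"));
        System.out.println("stopped actions: " + registry.stop());
    }
}
```

Copying under the lock trades a small allocation for the guarantee that `stop()` never iterates a map that another thread can still change. A second call to `stop()` finds an empty map and does nothing.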
[jira] [Commented] (FLINK-7201) ConcurrentModificationException in JobLeaderIdService
[ https://issues.apache.org/jira/browse/FLINK-7201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16091053#comment-16091053 ]

ASF GitHub Bot commented on FLINK-7201:
---------------------------------------

Github user XuPingyong commented on the issue:

    https://github.com/apache/flink/pull/4347

    @StephanEwen, the rpcService of the ResourceManager executes with only a single thread, so there are no conflicts while the ResourceManager is in service. When the ResourceManager is shut down by another thread, the rpcService had better stop first.
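
The ordering argument in this comment can be sketched as follows. This is a hypothetical illustration rather than the code of PR #4347: a single-threaded `ExecutorService` stands in for the single RPC thread, and the class `OrderedShutdown`, its field names, and the ten-second timeout are all assumptions. All map mutations run on that one thread, and shutdown waits for the thread to finish before the caller touches the map.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class OrderedShutdown {

    // Stand-in for the single RPC thread; every map mutation is submitted to it.
    private final ExecutorService rpcThread = Executors.newSingleThreadExecutor();

    // Plain HashMap: only the RPC thread touches it while the service is running.
    private final Map<String, String> jobLeaderIds = new HashMap<>();

    /** Queues a registration on the RPC thread. */
    void register(String jobId, String leaderId) {
        rpcThread.execute(() -> jobLeaderIds.put(jobId, leaderId));
    }

    /**
     * Stops the RPC thread first and only then clears the map from the
     * calling thread. Returns the number of entries that were cleared.
     */
    int shutDown() {
        // Already queued tasks still run; new submissions are rejected.
        rpcThread.shutdown();
        try {
            if (!rpcThread.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("RPC thread did not terminate in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the RPC thread", e);
        }
        // No task can be running any more, so iterating or clearing is safe here.
        int cleared = jobLeaderIds.size();
        jobLeaderIds.clear();
        return cleared;
    }

    public static void main(String[] args) {
        OrderedShutdown service = new OrderedShutdown();
        for (int i = 0; i < 100; i++) {
            service.register("job-" + i, "leader-" + i);
        }
        System.out.println("cleared entries: " + service.shutDown());
    }
}
```

Reversing the two steps, clearing the map first and stopping the executor afterwards, would allow a queued `register` task to modify the map while the shutdown thread iterates it, which is the situation described in the issue.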
[jira] [Commented] (FLINK-7201) ConcurrentModificationException in JobLeaderIdService
[ https://issues.apache.org/jira/browse/FLINK-7201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16090301#comment-16090301 ]

ASF GitHub Bot commented on FLINK-7201:
---------------------------------------

Github user StephanEwen commented on the issue:

    https://github.com/apache/flink/pull/4347

    @XuPingyong Can you give us a bit of context for the review? From the initial exception I would expect that there is something that also needs to be addressed in the `JobLeaderIdService` class...
[jira] [Commented] (FLINK-7201) ConcurrentModificationException in JobLeaderIdService
[ https://issues.apache.org/jira/browse/FLINK-7201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16088717#comment-16088717 ]

Aljoscha Krettek commented on FLINK-7201:
-----------------------------------------

[~StephanEwen] do you have the expertise for looking at this or do we need to wait for [~till.rohrmann] to get back?

I marked this as "flip-6". There are no comments on {{MiniCluster}}, but this seems to be part of the FLIP-6 overhaul of the distributed runtime.
[jira] [Commented] (FLINK-7201) ConcurrentModificationException in JobLeaderIdService
[ https://issues.apache.org/jira/browse/FLINK-7201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16088389#comment-16088389 ]

ASF GitHub Bot commented on FLINK-7201:
---------------------------------------

GitHub user XuPingyong opened a pull request:

    https://github.com/apache/flink/pull/4347

    [FLINK-7201] fix concurrency in JobLeaderIdService when shutdown the …

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/XuPingyong/flink FLINK-7201

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/4347.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #4347

commit 2c04107f2bb76944f1759ba7a71de56347d8a2bf
Author: pingyong.xpy
Date:   2017-07-15T02:56:13Z

    [FLINK-7201] fix concurrency in JobLeaderIdService when shutdown the ResourceManager