[jira] [Commented] (FLINK-7201) ConcurrentModificationException in JobLeaderIdService

2017-07-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16107033#comment-16107033
 ] 

ASF GitHub Bot commented on FLINK-7201:
---

Github user XuPingyong closed the pull request at:

https://github.com/apache/flink/pull/4347


> ConcurrentModificationException in JobLeaderIdService
> -
>
> Key: FLINK-7201
> URL: https://issues.apache.org/jira/browse/FLINK-7201
> Project: Flink
> Issue Type: Bug
> Components: JobManager
> Reporter: Xu Pingyong
> Assignee: Xu Pingyong
> Labels: flip-6
> Fix For: 1.4.0
>
>
> {code:java}
> java.util.ConcurrentModificationException: null
>   at java.util.HashMap$HashIterator.nextEntry(HashMap.java:922)
>   at java.util.HashMap$ValueIterator.next(HashMap.java:950)
>   at org.apache.flink.runtime.resourcemanager.JobLeaderIdService.clear(JobLeaderIdService.java:114)
>   at org.apache.flink.runtime.resourcemanager.JobLeaderIdService.stop(JobLeaderIdService.java:92)
>   at org.apache.flink.runtime.resourcemanager.ResourceManager.shutDown(ResourceManager.java:200)
>   at org.apache.flink.runtime.resourcemanager.ResourceManagerRunner.shutDownInternally(ResourceManagerRunner.java:102)
>   at org.apache.flink.runtime.resourcemanager.ResourceManagerRunner.shutDown(ResourceManagerRunner.java:97)
>   at org.apache.flink.runtime.minicluster.MiniCluster.shutdownInternally(MiniCluster.java:329)
>   at org.apache.flink.runtime.minicluster.MiniCluster.shutdown(MiniCluster.java:297)
>   at org.apache.flink.runtime.minicluster.MiniClusterITCase.runJobWithMultipleJobManagers(MiniClusterITCase.java:85)
> {code}
> Because the jobLeaderIdService is stopped before the rpcService when the
> resourceManager shuts down, the jobLeaderIdService can be accessed by two
> threads at once, and it is not thread-safe.
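
The exception in the description comes from the fail-fast iterator of `HashMap`: once the map is structurally modified during an iteration, the next call to `next()` throws. The following is a minimal sketch of that mechanism only, not Flink code. The class name, the method name and the map contents are hypothetical, and the modification is done on the same thread so that the failure is deterministic; in the reported case the modification would presumably come from the RPC thread while `clear()` iterates on the shutdown thread.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class, not part of Flink.
final class ConcurrentModificationDemo {

    /**
     * Iterates over the values of a HashMap and structurally modifies the map
     * in the middle of the iteration. Returns true if the fail-fast iterator
     * reported the modification with a ConcurrentModificationException.
     */
    static boolean iterationFailsAfterModification() {
        Map<String, String> listeners = new HashMap<>();
        listeners.put("job-1", "listener-1");
        listeners.put("job-2", "listener-2");
        listeners.put("job-3", "listener-3");

        try {
            for (String listener : listeners.values()) {
                // Stands in for a second thread adding a job while the
                // shutdown code is still walking the map.
                listeners.put("job-4", "listener-4");
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(iterationFailsAfterModification());
    }
}
```

With real threads the exception is not guaranteed to appear on every run, which would be consistent with the failure showing up only occasionally in a test.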



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (FLINK-7201) ConcurrentModificationException in JobLeaderIdService

2017-07-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16107032#comment-16107032
 ] 

ASF GitHub Bot commented on FLINK-7201:
---

Github user XuPingyong commented on the issue:

https://github.com/apache/flink/pull/4347
  
Thanks @tillrohrmann !




[jira] [Commented] (FLINK-7201) ConcurrentModificationException in JobLeaderIdService

2017-07-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16107024#comment-16107024
 ] 

ASF GitHub Bot commented on FLINK-7201:
---

Github user tillrohrmann commented on the issue:

https://github.com/apache/flink/pull/4347
  
With the changes of #4420, this problem should be resolved. Could you 
please close this PR then, @XuPingyong?




[jira] [Commented] (FLINK-7201) ConcurrentModificationException in JobLeaderIdService

2017-07-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16101921#comment-16101921
 ] 

ASF GitHub Bot commented on FLINK-7201:
---

Github user tillrohrmann commented on the issue:

https://github.com/apache/flink/pull/4347
  
I think it would be better to harden the `JobLeaderIdService` such that it 
can be shut down concurrently. This actually also applies to the 
`HeartbeatManager`, the `SlotManager` and the `ResourceManager` itself.
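
One possible way to read "harden such that it can be shut down concurrently" is sketched below. This is an assumption about the general technique (guard the state with a lock and snapshot it before stopping), not the change that was made in Flink. The class `GuardedJobLeaderRegistry`, its methods and the use of `Runnable` as a stop callback are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the Flink JobLeaderIdService.
final class GuardedJobLeaderRegistry {

    private final Object lock = new Object();
    private final Map<String, Runnable> stopCallbacks = new HashMap<>();
    private boolean stopped;

    /** Registers a job; returns false if the registry has already been stopped. */
    boolean addJob(String jobId, Runnable stopCallback) {
        synchronized (lock) {
            if (stopped) {
                return false;
            }
            stopCallbacks.put(jobId, stopCallback);
            return true;
        }
    }

    /**
     * Stops the registry and returns the number of callbacks that were run.
     * Safe to call from any thread and more than once.
     */
    int stop() {
        List<Runnable> toRun;
        synchronized (lock) {
            stopped = true;
            // Snapshot under the lock so that no iterator ever walks the live map.
            toRun = new ArrayList<>(stopCallbacks.values());
            stopCallbacks.clear();
        }
        // Run the callbacks outside the lock to avoid holding it during foreign code.
        for (Runnable callback : toRun) {
            callback.run();
        }
        return toRun.size();
    }
}
```

The snapshot keeps the iteration away from the shared map, and the `stopped` flag turns late registrations into a visible no-op instead of a race.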




[jira] [Commented] (FLINK-7201) ConcurrentModificationException in JobLeaderIdService

2017-07-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16091053#comment-16091053
 ] 

ASF GitHub Bot commented on FLINK-7201:
---

Github user XuPingyong commented on the issue:

https://github.com/apache/flink/pull/4347
  
@StephanEwen, the rpcService of the ResourceManager executes with a single 
thread, so there are no conflicts while the ResourceManager is in service. When 
the ResourceManager is shut down by another thread, the rpcService should be 
stopped first.
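
The ordering argument above can be illustrated with a plain single-threaded executor standing in for the rpcService. This is a sketch under that assumption only; `SingleThreadedOwner`, its fields and the 10-second timeout are hypothetical and do not mirror the Flink RPC classes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: all state changes run on one executor thread.
final class SingleThreadedOwner {

    private final ExecutorService mainThread = Executors.newSingleThreadExecutor();
    // Plain HashMap: only touched by the executor thread while the service runs.
    private final Map<String, String> jobLeaders = new HashMap<>();

    /** Queues a registration; it is applied on the single executor thread. */
    void registerJob(String jobId, String leaderId) {
        mainThread.execute(() -> jobLeaders.put(jobId, leaderId));
    }

    /**
     * Called from a different thread. Stops the executor first and waits for
     * queued work to finish, and only then touches the map.
     * Returns the number of entries that were cleared.
     */
    int shutDown() {
        mainThread.shutdown();
        try {
            if (!mainThread.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("executor did not terminate in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the executor", e);
        }
        int cleared = jobLeaders.size();
        jobLeaders.clear();
        return cleared;
    }
}
```

After `shutDown()` returns, further calls to `registerJob` are rejected by the executor with a `RejectedExecutionException`, so nothing can modify the map while it is being cleared.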






[jira] [Commented] (FLINK-7201) ConcurrentModificationException in JobLeaderIdService

2017-07-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16090301#comment-16090301
 ] 

ASF GitHub Bot commented on FLINK-7201:
---

Github user StephanEwen commented on the issue:

https://github.com/apache/flink/pull/4347
  
@XuPingyong Can you give us a bit of context for the review?
From the initial exception I would expect that there is something that also 
needs to be addressed in the `JobLeaderIdService` class...




[jira] [Commented] (FLINK-7201) ConcurrentModificationException in JobLeaderIdService

2017-07-15 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16088717#comment-16088717
 ] 

Aljoscha Krettek commented on FLINK-7201:
-

[~StephanEwen] do you have the expertise for looking at this or do we need to 
wait for [~till.rohrmann] to get back?

I marked this as "flip-6"; there are no comments on {{MiniCluster}}, but this 
seems to be part of the FLIP-6 overhaul of the distributed runtime.



[jira] [Commented] (FLINK-7201) ConcurrentModificationException in JobLeaderIdService

2017-07-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16088389#comment-16088389
 ] 

ASF GitHub Bot commented on FLINK-7201:
---

GitHub user XuPingyong opened a pull request:

https://github.com/apache/flink/pull/4347

[FLINK-7201] fix concurrency in JobLeaderIdService when shutdown the …



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/XuPingyong/flink FLINK-7201

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/4347.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4347


commit 2c04107f2bb76944f1759ba7a71de56347d8a2bf
Author: pingyong.xpy 
Date:   2017-07-15T02:56:13Z

[FLINK-7201] fix concurrency in JobLeaderIdService when shutdown the 
ResourceManager



