[jira] [Commented] (YARN-3619) ContainerMetrics unregisters during getMetrics and leads to ConcurrentModificationException
[ https://issues.apache.org/jira/browse/YARN-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941289#comment-14941289 ]

zhihai xu commented on YARN-3619:

I have just attached, for comparison, the previous patch that I had removed and that caused the confusion (YARN-3619.alt.patch). It shares both the timer and the timer task, which makes it more complicated than YARN-3619.001.patch. I think YARN-3619.001.patch (which shares only the timer) is the better approach: it is simpler and controls more accurately when the container metrics are unregistered.

> ContainerMetrics unregisters during getMetrics and leads to ConcurrentModificationException
> ---
>
> Key: YARN-3619
> URL: https://issues.apache.org/jira/browse/YARN-3619
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Affects Versions: 2.7.0
> Reporter: Jason Lowe
> Assignee: zhihai xu
> Attachments: YARN-3619.000.patch, YARN-3619.001.patch, YARN-3619.alt.patch, test.patch
>
>
> ContainerMetrics is able to unregister itself during the getMetrics method, but that method can be called by MetricsSystemImpl.sampleMetrics which is trying to iterate the sources. This leads to a ConcurrentModificationException log like this:
> {noformat}
> 2015-05-11 14:00:20,360 [Timer for 'NodeManager' metrics system] WARN impl.MetricsSystemImpl: java.util.ConcurrentModificationException
> {noformat}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
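The failure mode in the issue description can be reproduced in miniature. The sketch below is only an illustration under stated assumptions: `SelfUnregisterSketch`, `Source`, `sampleLive` and `sampleSnapshot` are hypothetical names, not Hadoop classes or methods, and the snapshot variant is a generic way to avoid the exception, not necessarily what any of the attached patches does. It shows a source removing itself from a map while that same map is being iterated, which is the pattern the description attributes to ContainerMetrics and MetricsSystemImpl.sampleMetrics.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a metrics system; not Hadoop's MetricsSystemImpl.
class SelfUnregisterSketch {
  interface Source { void getMetrics(); }

  private final Map<String, Source> sources = new LinkedHashMap<>();

  void register(String name, Source s) { sources.put(name, s); }
  void unregister(String name) { sources.remove(name); }

  // Iterates the live map; returns true if the iteration hit a CME.
  boolean sampleLive() {
    try {
      for (Source s : sources.values()) {
        s.getMetrics();
      }
      return false;
    } catch (ConcurrentModificationException e) {
      return true;
    }
  }

  // Iterates a copy, so removals from the live map cannot disturb the loop.
  boolean sampleSnapshot() {
    try {
      for (Source s : new ArrayList<>(sources.values())) {
        s.getMetrics();
      }
      return false;
    } catch (ConcurrentModificationException e) {
      return true;
    }
  }

  // Two sources; the first one unregisters itself from inside getMetrics().
  static SelfUnregisterSketch build() {
    SelfUnregisterSketch sys = new SelfUnregisterSketch();
    sys.register("container_1", () -> sys.unregister("container_1"));
    sys.register("container_2", () -> { });
    return sys;
  }

  public static void main(String[] args) {
    System.out.println("live iteration hit CME: " + build().sampleLive());
    System.out.println("snapshot iteration hit CME: " + build().sampleSnapshot());
  }
}
```

Note that no second thread is needed: the exception comes from modifying the collection inside the loop that iterates it, which matches the single "Timer for 'NodeManager' metrics system" thread named in the log line.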
[jira] [Commented] (YARN-3619) ContainerMetrics unregisters during getMetrics and leads to ConcurrentModificationException
[ https://issues.apache.org/jira/browse/YARN-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941249#comment-14941249 ]

zhihai xu commented on YARN-3619:

The checkstyle issues and release audit warnings for the latest patch YARN-3619.001.patch were pre-existing.
[jira] [Commented] (YARN-3619) ContainerMetrics unregisters during getMetrics and leads to ConcurrentModificationException
[ https://issues.apache.org/jira/browse/YARN-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941374#comment-14941374 ]

Hadoop QA commented on YARN-3619:

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 21m 16s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. |
| {color:green}+1{color} | javac | 7m 50s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 10m 5s | There were no new javadoc warning messages. |
| {color:red}-1{color} | release audit | 0m 16s | The applied patch generated 1 release audit warnings. |
| {color:red}-1{color} | checkstyle | 2m 59s | The applied patch generated 1 new checkstyle issues (total was 211, now 211). |
| {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 31s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. |
| {color:red}-1{color} | findbugs | 6m 17s | The patch appears to introduce 1 new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | common tests | 6m 32s | Tests failed in hadoop-common. |
| {color:green}+1{color} | yarn tests | 0m 23s | Tests passed in hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests | 2m 0s | Tests passed in hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests | 8m 41s | Tests passed in hadoop-yarn-server-nodemanager. |
| | | 69m 6s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-server-nodemanager |
| Failed unit tests | hadoop.fs.sftp.TestSFTPFileSystem |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12764776/YARN-3619.alt.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 439f43a |
| Release Audit | https://builds.apache.org/job/PreCommit-YARN-Build/9330/artifact/patchprocess/patchReleaseAuditProblems.txt |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/9330/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt |
| Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/9330/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html |
| hadoop-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/9330/artifact/patchprocess/testrun_hadoop-common.txt |
| hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/9330/artifact/patchprocess/testrun_hadoop-yarn-api.txt |
| hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/9330/artifact/patchprocess/testrun_hadoop-yarn-common.txt |
| hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/9330/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9330/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9330/console |

This message was automatically generated.
[jira] [Commented] (YARN-3619) ContainerMetrics unregisters during getMetrics and leads to ConcurrentModificationException
[ https://issues.apache.org/jira/browse/YARN-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941527#comment-14941527 ]

Jason Lowe commented on YARN-3619:

+1 for the .001 patch. The patch doesn't apply to branch-2.7 cleanly, and I'd like to get it fixed in 2.7.2. Can you provide a patch against branch-2.7?
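For readers following the thread, the "share timer only" idea that an earlier comment attributes to YARN-3619.001.patch could look roughly like the sketch below. This is an assumption-laden illustration, not the patch itself: `SharedTimerSketch`, `REGISTERED`, `finished` and the 50 ms delay are hypothetical names and values chosen for the example. Every instance schedules its own TimerTask on a single static Timer, so the unregistration happens later on the timer thread rather than inside a getMetrics call.

```java
import java.util.Set;
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; names and structure are assumptions, not Hadoop code.
class SharedTimerSketch {
  // One daemon timer shared by all instances; each instance has its own task.
  private static final Timer SHARED_TIMER = new Timer("unregister-timer", true);

  // Stand-in for the set of sources registered with a metrics system.
  static final Set<String> REGISTERED = ConcurrentHashMap.newKeySet();

  private final String name;
  private final CountDownLatch unregistered = new CountDownLatch(1);

  SharedTimerSketch(String name) {
    this.name = name;
    REGISTERED.add(name);
  }

  // Called when the container finishes: unregister after delayMs on the timer thread.
  void finished(long delayMs) {
    SHARED_TIMER.schedule(new TimerTask() {
      @Override
      public void run() {
        REGISTERED.remove(name);
        unregistered.countDown();
      }
    }, delayMs);
  }

  // Waits up to timeoutMs for the scheduled unregistration; false on timeout or interrupt.
  boolean awaitUnregistered(long timeoutMs) {
    try {
      return unregistered.await(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }

  public static void main(String[] args) {
    SharedTimerSketch metrics = new SharedTimerSketch("container_1");
    metrics.finished(50);
    System.out.println("unregistered in time: " + metrics.awaitUnregistered(2000));
  }
}
```

Giving each instance its own task while sharing the timer keeps one background thread for all containers but lets every container control its own unregistration time, which is the trade-off the earlier comment describes.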
[jira] [Commented] (YARN-3619) ContainerMetrics unregisters during getMetrics and leads to ConcurrentModificationException
[ https://issues.apache.org/jira/browse/YARN-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941718#comment-14941718 ]

Hudson commented on YARN-3619:

FAILURE: Integrated in Hadoop-trunk-Commit #8559 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8559/])
YARN-3619. ContainerMetrics unregisters during getMetrics and leads to (jlowe: rev fdf02d1f26cea372bf69e071f57b8bfc09c092c4)
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/impl/MetricsSystemImpl.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/ContainersMonitorImpl.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/TestContainerMetrics.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/ContainerMetrics.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml

> ContainerMetrics unregisters during getMetrics and leads to ConcurrentModificationException
> ---
>
> Key: YARN-3619
> URL: https://issues.apache.org/jira/browse/YARN-3619
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Affects Versions: 2.7.0
> Reporter: Jason Lowe
> Assignee: zhihai xu
> Fix For: 2.7.2
>
> Attachments: YARN-3619.000.patch, YARN-3619.001.patch, YARN-3619.alt.patch, YARN-3619.branch-2.7.patch, test.patch
>
>
> ContainerMetrics is able to unregister itself during the getMetrics method, but that method can be called by MetricsSystemImpl.sampleMetrics which is trying to iterate the sources. This leads to a ConcurrentModificationException log like this:
> {noformat}
> 2015-05-11 14:00:20,360 [Timer for 'NodeManager' metrics system] WARN impl.MetricsSystemImpl: java.util.ConcurrentModificationException
> {noformat}
[jira] [Commented] (YARN-3619) ContainerMetrics unregisters during getMetrics and leads to ConcurrentModificationException
[ https://issues.apache.org/jira/browse/YARN-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941627#comment-14941627 ]

Hadoop QA commented on YARN-3619:

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12764832/YARN-3619.branch-2.7.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | branch-2 / 7964b13 |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9335/console |

This message was automatically generated.
[jira] [Commented] (YARN-3619) ContainerMetrics unregisters during getMetrics and leads to ConcurrentModificationException
[ https://issues.apache.org/jira/browse/YARN-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941683#comment-14941683 ]

Jason Lowe commented on YARN-3619:

+1 for the branch-2.7 patch. Committing this.
[jira] [Commented] (YARN-3619) ContainerMetrics unregisters during getMetrics and leads to ConcurrentModificationException
[ https://issues.apache.org/jira/browse/YARN-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941622#comment-14941622 ]

zhihai xu commented on YARN-3619:

Thanks [~jlowe]! Yes, I uploaded a patch, YARN-3619.branch-2.7.patch, based on branch-2.7.
[jira] [Commented] (YARN-3619) ContainerMetrics unregisters during getMetrics and leads to ConcurrentModificationException
[ https://issues.apache.org/jira/browse/YARN-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941869#comment-14941869 ]

zhihai xu commented on YARN-3619:

Thanks [~jlowe] for reviewing and committing the very old patch! I really appreciate it.
[jira] [Commented] (YARN-3619) ContainerMetrics unregisters during getMetrics and leads to ConcurrentModificationException
[ https://issues.apache.org/jira/browse/YARN-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941810#comment-14941810 ]

Hudson commented on YARN-3619:

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #481 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/481/])
YARN-3619. ContainerMetrics unregisters during getMetrics and leads to (jlowe: rev fdf02d1f26cea372bf69e071f57b8bfc09c092c4)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/TestContainerMetrics.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/impl/MetricsSystemImpl.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/ContainersMonitorImpl.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/ContainerMetrics.java
[jira] [Commented] (YARN-3619) ContainerMetrics unregisters during getMetrics and leads to ConcurrentModificationException
[ https://issues.apache.org/jira/browse/YARN-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941899#comment-14941899 ]

Hudson commented on YARN-3619:

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #473 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/473/])
YARN-3619. ContainerMetrics unregisters during getMetrics and leads to (jlowe: rev fdf02d1f26cea372bf69e071f57b8bfc09c092c4)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/ContainersMonitorImpl.java
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/impl/MetricsSystemImpl.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/ContainerMetrics.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/TestContainerMetrics.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
[jira] [Commented] (YARN-3619) ContainerMetrics unregisters during getMetrics and leads to ConcurrentModificationException
[ https://issues.apache.org/jira/browse/YARN-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941924#comment-14941924 ]

Hudson commented on YARN-3619:

FAILURE: Integrated in Hadoop-Yarn-trunk #1212 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/1212/])
YARN-3619. ContainerMetrics unregisters during getMetrics and leads to (jlowe: rev fdf02d1f26cea372bf69e071f57b8bfc09c092c4)
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/ContainersMonitorImpl.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/ContainerMetrics.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/TestContainerMetrics.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/impl/MetricsSystemImpl.java
[jira] [Commented] (YARN-3619) ContainerMetrics unregisters during getMetrics and leads to ConcurrentModificationException
[ https://issues.apache.org/jira/browse/YARN-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941962#comment-14941962 ]

Hudson commented on YARN-3619:

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2417 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2417/])
YARN-3619. ContainerMetrics unregisters during getMetrics and leads to (jlowe: rev fdf02d1f26cea372bf69e071f57b8bfc09c092c4)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/impl/MetricsSystemImpl.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/ContainersMonitorImpl.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/TestContainerMetrics.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/ContainerMetrics.java
[jira] [Commented] (YARN-3619) ContainerMetrics unregisters during getMetrics and leads to ConcurrentModificationException
[ https://issues.apache.org/jira/browse/YARN-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14942027#comment-14942027 ]

Hudson commented on YARN-3619:

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #447 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/447/])
YARN-3619. ContainerMetrics unregisters during getMetrics and leads to (jlowe: rev fdf02d1f26cea372bf69e071f57b8bfc09c092c4)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/TestContainerMetrics.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/ContainersMonitorImpl.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/impl/MetricsSystemImpl.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/ContainerMetrics.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
[jira] [Commented] (YARN-3619) ContainerMetrics unregisters during getMetrics and leads to ConcurrentModificationException
[ https://issues.apache.org/jira/browse/YARN-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14942081#comment-14942081 ]

Hudson commented on YARN-3619:

FAILURE: Integrated in Hadoop-Hdfs-trunk #2387 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2387/])
YARN-3619. ContainerMetrics unregisters during getMetrics and leads to (jlowe: rev fdf02d1f26cea372bf69e071f57b8bfc09c092c4)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/TestContainerMetrics.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/ContainersMonitorImpl.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/ContainerMetrics.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/metrics2/impl/MetricsSystemImpl.java
* hadoop-yarn-project/CHANGES.txt
[jira] [Commented] (YARN-3619) ContainerMetrics unregisters during getMetrics and leads to ConcurrentModificationException
[ https://issues.apache.org/jira/browse/YARN-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14939861#comment-14939861 ]

Jason Lowe commented on YARN-3619:

Thanks for updating the patch! In the future, please do not remove patches unless absolutely necessary, as it generates a lot of email churn and can lead to confusion, as in this case. The patchbot comments above are _not_ for the most recent patch applied, despite the patch basename being the same. This subsequent patch should have simply been version 002.

The latest patch looks good to me; the checkstyle and release audit warnings were pre-existing. Kicking Jenkins again to comment on the latest patch.
[jira] [Commented] (YARN-3619) ContainerMetrics unregisters during getMetrics and leads to ConcurrentModificationException
[ https://issues.apache.org/jira/browse/YARN-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14939946#comment-14939946 ]

Hadoop QA commented on YARN-3619:
---------------------------------

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 20m 51s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. |
| {color:green}+1{color} | javac | 7m 46s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 10m 9s | There were no new javadoc warning messages. |
| {color:red}-1{color} | release audit | 0m 15s | The applied patch generated 1 release audit warnings. |
| {color:red}-1{color} | checkstyle | 2m 58s | The applied patch generated 1 new checkstyle issues (total was 211, now 211). |
| {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 32s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 6m 27s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | common tests | 7m 29s | Tests passed in hadoop-common. |
| {color:green}+1{color} | yarn tests | 0m 23s | Tests passed in hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests | 2m 1s | Tests passed in hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests | 8m 32s | Tests passed in hadoop-yarn-server-nodemanager. |
| | | 69m 41s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12764544/YARN-3619.001.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 195793c |
| Release Audit | https://builds.apache.org/job/PreCommit-YARN-Build/9323/artifact/patchprocess/patchReleaseAuditProblems.txt |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/9323/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt |
| hadoop-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/9323/artifact/patchprocess/testrun_hadoop-common.txt |
| hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/9323/artifact/patchprocess/testrun_hadoop-yarn-api.txt |
| hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/9323/artifact/patchprocess/testrun_hadoop-yarn-common.txt |
| hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/9323/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9323/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9323/console |

This message was automatically generated.
> ContainerMetrics unregisters during getMetrics and leads to > ConcurrentModificationException > --- > > Key: YARN-3619 > URL: https://issues.apache.org/jira/browse/YARN-3619 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.0 >Reporter: Jason Lowe >Assignee: zhihai xu > Attachments: YARN-3619.000.patch, YARN-3619.001.patch, test.patch > > > ContainerMetrics is able to unregister itself during the getMetrics method, > but that method can be called by MetricsSystemImpl.sampleMetrics which is > trying to iterate the sources. This leads to a > ConcurrentModificationException log like this: > {noformat} > 2015-05-11 14:00:20,360 [Timer for 'NodeManager' metrics system] WARN > impl.MetricsSystemImpl: java.util.ConcurrentModificationException > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3619) ContainerMetrics unregisters during getMetrics and leads to ConcurrentModificationException
[ https://issues.apache.org/jira/browse/YARN-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14940128#comment-14940128 ] zhihai xu commented on YARN-3619: - Thanks for the good suggestions [~jlowe]! Will do it next time to avoid confusion! Thanks for the thorough review! > ContainerMetrics unregisters during getMetrics and leads to > ConcurrentModificationException > --- > > Key: YARN-3619 > URL: https://issues.apache.org/jira/browse/YARN-3619 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.0 >Reporter: Jason Lowe >Assignee: zhihai xu > Attachments: YARN-3619.000.patch, YARN-3619.001.patch, test.patch > > > ContainerMetrics is able to unregister itself during the getMetrics method, > but that method can be called by MetricsSystemImpl.sampleMetrics which is > trying to iterate the sources. This leads to a > ConcurrentModificationException log like this: > {noformat} > 2015-05-11 14:00:20,360 [Timer for 'NodeManager' metrics system] WARN > impl.MetricsSystemImpl: java.util.ConcurrentModificationException > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3619) ContainerMetrics unregisters during getMetrics and leads to ConcurrentModificationException
[ https://issues.apache.org/jira/browse/YARN-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14939386#comment-14939386 ]

Hadoop QA commented on YARN-3619:
---------------------------------

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 21m 31s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. |
| {color:green}+1{color} | javac | 7m 59s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 10m 14s | There were no new javadoc warning messages. |
| {color:red}-1{color} | release audit | 0m 16s | The applied patch generated 1 release audit warnings. |
| {color:red}-1{color} | checkstyle | 3m 0s | The applied patch generated 1 new checkstyle issues (total was 211, now 211). |
| {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 31s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 35s | The patch built with eclipse:eclipse. |
| {color:red}-1{color} | findbugs | 6m 27s | The patch appears to introduce 1 new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | common tests | 8m 22s | Tests passed in hadoop-common. |
| {color:green}+1{color} | yarn tests | 0m 24s | Tests passed in hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests | 1m 59s | Tests passed in hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests | 8m 46s | Tests passed in hadoop-yarn-server-nodemanager. |
| | | 71m 48s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-server-nodemanager |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12764538/YARN-3619.001.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 5db371f |
| Release Audit | https://builds.apache.org/job/PreCommit-YARN-Build/9318/artifact/patchprocess/patchReleaseAuditProblems.txt |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/9318/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt |
| Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/9318/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html |
| hadoop-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/9318/artifact/patchprocess/testrun_hadoop-common.txt |
| hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/9318/artifact/patchprocess/testrun_hadoop-yarn-api.txt |
| hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/9318/artifact/patchprocess/testrun_hadoop-yarn-common.txt |
| hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/9318/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9318/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9318/console |

This message was automatically generated.
> ContainerMetrics unregisters during getMetrics and leads to > ConcurrentModificationException > --- > > Key: YARN-3619 > URL: https://issues.apache.org/jira/browse/YARN-3619 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.0 >Reporter: Jason Lowe >Assignee: zhihai xu > Attachments: YARN-3619.000.patch, YARN-3619.001.patch, test.patch > > > ContainerMetrics is able to unregister itself during the getMetrics method, > but that method can be called by MetricsSystemImpl.sampleMetrics which is > trying to iterate the sources. This leads to a > ConcurrentModificationException log like this: > {noformat} > 2015-05-11 14:00:20,360 [Timer for 'NodeManager' metrics system] WARN > impl.MetricsSystemImpl: java.util.ConcurrentModificationException > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3619) ContainerMetrics unregisters during getMetrics and leads to ConcurrentModificationException
[ https://issues.apache.org/jira/browse/YARN-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14939400#comment-14939400 ] zhihai xu commented on YARN-3619: - [~jlowe], thanks for the review! Yes, that is a very good suggestion. I uploaded a new patch, YARN-3619.001.patch, which addresses all your comments (daemon thread and a shared timer). Please review it. > ContainerMetrics unregisters during getMetrics and leads to > ConcurrentModificationException > --- > > Key: YARN-3619 > URL: https://issues.apache.org/jira/browse/YARN-3619 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.0 >Reporter: Jason Lowe >Assignee: zhihai xu > Attachments: YARN-3619.000.patch, YARN-3619.001.patch, test.patch > > > ContainerMetrics is able to unregister itself during the getMetrics method, > but that method can be called by MetricsSystemImpl.sampleMetrics which is > trying to iterate the sources. This leads to a > ConcurrentModificationException log like this: > {noformat} > 2015-05-11 14:00:20,360 [Timer for 'NodeManager' metrics system] WARN > impl.MetricsSystemImpl: java.util.ConcurrentModificationException > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3619) ContainerMetrics unregisters during getMetrics and leads to ConcurrentModificationException
[ https://issues.apache.org/jira/browse/YARN-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14937022#comment-14937022 ] Jason Lowe commented on YARN-3619: -- My apologies for the long delay, as this fell off my radar. The approach seems reasonable. The patch needs to be upmerged to trunk. In addition I'm wondering about the Timer handling. I think the Timer should be a daemon thread (we don't want to prolong NM shutdown due to this). Also it seems wasteful to dedicate a separate timer thread for every container that finished. It would be more efficient to share a timer that handles multiple timer tasks rather than spawn a thread for every timer task. > ContainerMetrics unregisters during getMetrics and leads to > ConcurrentModificationException > --- > > Key: YARN-3619 > URL: https://issues.apache.org/jira/browse/YARN-3619 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.0 >Reporter: Jason Lowe >Assignee: zhihai xu > Attachments: YARN-3619.000.patch, test.patch > > > ContainerMetrics is able to unregister itself during the getMetrics method, > but that method can be called by MetricsSystemImpl.sampleMetrics which is > trying to iterate the sources. This leads to a > ConcurrentModificationException log like this: > {noformat} > 2015-05-11 14:00:20,360 [Timer for 'NodeManager' metrics system] WARN > impl.MetricsSystemImpl: java.util.ConcurrentModificationException > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
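The two Timer points in the review comment above (daemon thread, one timer shared by many tasks) can be illustrated with a minimal sketch. This is not code from any YARN-3619 patch; the class name `SharedTimerSketch`, the timer name, and the method `runTasks` are hypothetical and exist only for illustration. It relies on the standard `java.util.Timer(String, boolean)` constructor, whose second argument marks the timer thread as a daemon.

```java
import java.util.Set;
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration class, not part of Hadoop.
class SharedTimerSketch {
  // One timer thread serves every task. Passing 'true' makes it a daemon
  // thread, so it cannot keep the JVM alive during shutdown.
  private static final Timer SHARED_TIMER =
      new Timer("shared-unregister-timer", true);

  // Schedules taskCount tasks on the shared timer and returns a description
  // of every distinct thread that ran them.
  static Set<String> runTasks(int taskCount, long delayMs) {
    final Set<String> threads = ConcurrentHashMap.newKeySet();
    final CountDownLatch done = new CountDownLatch(taskCount);
    for (int i = 0; i < taskCount; i++) {
      SHARED_TIMER.schedule(new TimerTask() {
        @Override
        public void run() {
          Thread t = Thread.currentThread();
          threads.add(t.getName() + ":daemon=" + t.isDaemon());
          done.countDown();
        }
      }, delayMs);
    }
    try {
      if (!done.await(5, TimeUnit.SECONDS)) {
        throw new IllegalStateException("tasks did not run in time");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting", e);
    }
    return threads;
  }
}
```

Calling `SharedTimerSketch.runTasks(5, 10L)` should report a single daemon thread for all five tasks, which is the saving the comment argues for compared with one timer thread per finished container.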
[jira] [Commented] (YARN-3619) ContainerMetrics unregisters during getMetrics and leads to ConcurrentModificationException
[ https://issues.apache.org/jira/browse/YARN-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14937044#comment-14937044 ]

Hadoop QA commented on YARN-3619:
---------------------------------

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12733882/YARN-3619.000.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 6c17d31 |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9311/console |

This message was automatically generated.

> ContainerMetrics unregisters during getMetrics and leads to > ConcurrentModificationException > --- > > Key: YARN-3619 > URL: https://issues.apache.org/jira/browse/YARN-3619 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.0 >Reporter: Jason Lowe >Assignee: zhihai xu > Attachments: YARN-3619.000.patch, test.patch > > > ContainerMetrics is able to unregister itself during the getMetrics method, > but that method can be called by MetricsSystemImpl.sampleMetrics which is > trying to iterate the sources. This leads to a > ConcurrentModificationException log like this: > {noformat} > 2015-05-11 14:00:20,360 [Timer for 'NodeManager' metrics system] WARN > impl.MetricsSystemImpl: java.util.ConcurrentModificationException > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3619) ContainerMetrics unregisters during getMetrics and leads to ConcurrentModificationException
[ https://issues.apache.org/jira/browse/YARN-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14551006#comment-14551006 ]

Hadoop QA commented on YARN-3619:
---------------------------------

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 14m 37s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. |
| {color:green}+1{color} | javac | 7m 33s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 34s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 2m 34s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 35s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 4m 6s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | common tests | 25m 14s | Tests passed in hadoop-common. |
| {color:green}+1{color} | yarn tests | 0m 23s | Tests passed in hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests | 6m 41s | Tests passed in hadoop-yarn-server-nodemanager. |
| | | 73m 17s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12733882/YARN-3619.000.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / c97f32e |
| hadoop-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8003/artifact/patchprocess/testrun_hadoop-common.txt |
| hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8003/artifact/patchprocess/testrun_hadoop-yarn-api.txt |
| hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8003/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8003/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8003/console |

This message was automatically generated.

ContainerMetrics unregisters during getMetrics and leads to ConcurrentModificationException --- Key: YARN-3619 URL: https://issues.apache.org/jira/browse/YARN-3619 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.7.0 Reporter: Jason Lowe Assignee: zhihai xu Attachments: YARN-3619.000.patch, test.patch ContainerMetrics is able to unregister itself during the getMetrics method, but that method can be called by MetricsSystemImpl.sampleMetrics which is trying to iterate the sources. This leads to a ConcurrentModificationException log like this: {noformat} 2015-05-11 14:00:20,360 [Timer for 'NodeManager' metrics system] WARN impl.MetricsSystemImpl: java.util.ConcurrentModificationException {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3619) ContainerMetrics unregisters during getMetrics and leads to ConcurrentModificationException
[ https://issues.apache.org/jira/browse/YARN-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550894#comment-14550894 ]

zhihai xu commented on YARN-3619:
---------------------------------

I uploaded a patch, YARN-3619.000.patch, for review. I added a configuration, NM_CONTAINER_METRICS_UNREGISTER_DELAY_MS, to control how long after a container finishes its metrics are unregistered. I chose this approach because scheduling a thread from getMetrics to do the unregistration could leak memory: if getMetrics is never called, the unregistration is never scheduled. It looks like getMetrics is called from two places: MetricsSystemImpl#sampleMetrics and MetricsSourceAdapter#getMBeanInfo. sampleMetrics won't be called if MetricsSystemImpl has no sinks. getMBeanInfo may not be called after registration if JMXJsonServlet#doGet is not called (no HTTP GET request from JMX clients). So it is possible that getMetrics is never called after registration.

ContainerMetrics unregisters during getMetrics and leads to ConcurrentModificationException --- Key: YARN-3619 URL: https://issues.apache.org/jira/browse/YARN-3619 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.7.0 Reporter: Jason Lowe Assignee: zhihai xu Attachments: YARN-3619.000.patch, test.patch ContainerMetrics is able to unregister itself during the getMetrics method, but that method can be called by MetricsSystemImpl.sampleMetrics which is trying to iterate the sources. This leads to a ConcurrentModificationException log like this: {noformat} 2015-05-11 14:00:20,360 [Timer for 'NodeManager' metrics system] WARN impl.MetricsSystemImpl: java.util.ConcurrentModificationException {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
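The idea in the comment above, unregistering after a configured delay from the moment the container finishes rather than from inside getMetrics, might look roughly like the following. This is a sketch under stated assumptions, not the contents of YARN-3619.000.patch: `DelayedUnregisterSketch`, `REGISTRY`, `register`, `finished`, and `isRegistered` are hypothetical names, and a plain map stands in for the metrics system's sources.

```java
import java.util.Map;
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration class, not part of Hadoop.
class DelayedUnregisterSketch {
  // Stand-in for the metrics system's registered sources (assumption).
  private static final Map<String, Object> REGISTRY =
      new ConcurrentHashMap<>();
  private static final Timer TIMER = new Timer("unregister-sketch", true);

  static void register(String name) {
    REGISTRY.put(name, new Object());
  }

  // Called when the container finishes. The removal is scheduled here, so it
  // happens even if nobody ever calls getMetrics on the source.
  static void finished(final String name, long unregisterDelayMs) {
    TIMER.schedule(new TimerTask() {
      @Override
      public void run() {
        REGISTRY.remove(name);
      }
    }, unregisterDelayMs);
  }

  static boolean isRegistered(String name) {
    return REGISTRY.containsKey(name);
  }
}
```

In this sketch the delay plays the role that NM_CONTAINER_METRICS_UNREGISTER_DELAY_MS is described as playing: the source stays visible for a while after the container finishes, then is dropped whether or not any sink or JMX client ever sampled it.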
[jira] [Commented] (YARN-3619) ContainerMetrics unregisters during getMetrics and leads to ConcurrentModificationException
[ https://issues.apache.org/jira/browse/YARN-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14539421#comment-14539421 ]

zhihai xu commented on YARN-3619:
---------------------------------

Thanks [~kasha] for assigning this JIRA to me. The root cause is exactly what [~jlowe] said; I am only adding a few details to [~jlowe]'s succinct comment.
{{sampleMetrics}} is called periodically in MetricsSystemImpl. {{sampleMetrics}} iterates over {{sources}} in the following code:
{code}
for (Entry<String, MetricsSourceAdapter> entry : sources.entrySet()) {
  if (sourceFilter == null || sourceFilter.accepts(entry.getKey())) {
    snapshotMetrics(entry.getValue(), bufferBuilder);
  }
}
{code}
{{snapshotMetrics}} is called to process every entry in {{sources}}. The calling sequence that leads to the ConcurrentModificationException is:
snapshotMetrics => MetricsSourceAdapter#getMetrics => ContainerMetrics#getMetrics => MetricsSystemImpl#unregisterSource => sources.remove(name)
The entry is removed from {{sources}} while {{sources}} is still being iterated, so unregisterSource can't be called from getMetrics. I will prepare a patch for review.

ContainerMetrics unregisters during getMetrics and leads to ConcurrentModificationException --- Key: YARN-3619 URL: https://issues.apache.org/jira/browse/YARN-3619 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.7.0 Reporter: Jason Lowe Assignee: zhihai xu ContainerMetrics is able to unregister itself during the getMetrics method, but that method can be called by MetricsSystemImpl.sampleMetrics which is trying to iterate the sources. This leads to a ConcurrentModificationException log like this: {noformat} 2015-05-11 14:00:20,360 [Timer for 'NodeManager' metrics system] WARN impl.MetricsSystemImpl: java.util.ConcurrentModificationException {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
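The failure mode described in the root-cause comment above can be reproduced outside Hadoop with a plain `LinkedHashMap`, the map type named in the stack trace reported on this issue. The sketch below is illustrative only: `SourceIterationSketch` and its methods are hypothetical, and string labels stand in for the real source adapters. The second method shows one generic way to avoid the exception (collect names during the loop, remove afterwards); it is not claimed to be the fix that was committed.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class, not part of Hadoop.
class SourceIterationSketch {
  // Stand-in for the sources map; values are just state labels (assumption).
  static Map<String, String> newSources() {
    Map<String, String> sources = new LinkedHashMap<>();
    sources.put("container_1", "finished");
    sources.put("container_2", "running");
    sources.put("container_3", "running");
    return sources;
  }

  // Mimics the failing pattern: work done for an entry removes that entry
  // from the map while the for-each loop still holds an iterator over it.
  static boolean removeDuringIterationFails() {
    Map<String, String> sources = newSources();
    try {
      for (Map.Entry<String, String> entry : sources.entrySet()) {
        if ("finished".equals(entry.getValue())) {
          sources.remove(entry.getKey()); // structural change mid-iteration
        }
      }
      return false;
    } catch (ConcurrentModificationException e) {
      return true; // the iterator detected the modification on its next step
    }
  }

  // One generic alternative: remember what to remove, then remove it only
  // after the loop has finished. Returns the number of remaining entries.
  static int removeAfterIteration() {
    Map<String, String> sources = newSources();
    List<String> finished = new ArrayList<>();
    for (Map.Entry<String, String> entry : sources.entrySet()) {
      if ("finished".equals(entry.getValue())) {
        finished.add(entry.getKey());
      }
    }
    for (String name : finished) {
      sources.remove(name);
    }
    return sources.size();
  }
}
```

Note that the exception here needs only one thread: despite its name, ConcurrentModificationException is raised whenever a map is structurally modified behind an active iterator, which matches the single call chain from sampleMetrics down to sources.remove(name).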
[jira] [Commented] (YARN-3619) ContainerMetrics unregisters during getMetrics and leads to ConcurrentModificationException
[ https://issues.apache.org/jira/browse/YARN-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14541328#comment-14541328 ]

zhihai xu commented on YARN-3619:
---------------------------------

I attached a test patch which can reproduce this issue with the following stack trace:
{code}
Running org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.TestContainerMetrics
Tests run: 3, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 1.92 sec FAILURE! - in org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.TestContainerMetrics
testContainerMetricsFinished(org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.TestContainerMetrics) Time elapsed: 1.194 sec ERROR!
java.util.ConcurrentModificationException: null
	at java.util.LinkedHashMap$LinkedHashIterator.nextEntry(LinkedHashMap.java:394)
	at java.util.LinkedHashMap$EntryIterator.next(LinkedHashMap.java:413)
	at java.util.LinkedHashMap$EntryIterator.next(LinkedHashMap.java:412)
	at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.sampleMetrics(MetricsSystemImpl.java:403)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.TestContainerMetrics.testContainerMetricsFinished(TestContainerMetrics.java:144)

Results :

Tests in error:
  TestContainerMetrics.testContainerMetricsFinished:144 » ConcurrentModification

Tests run: 3, Failures: 0, Errors: 1, Skipped: 0
{code}

ContainerMetrics unregisters during getMetrics and leads to ConcurrentModificationException --- Key: YARN-3619 URL: https://issues.apache.org/jira/browse/YARN-3619 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.7.0 Reporter: Jason Lowe Assignee: zhihai xu Attachments: test.patch ContainerMetrics is able to unregister itself during the getMetrics method, but that method can be called by MetricsSystemImpl.sampleMetrics which is trying to iterate the sources. This leads to a ConcurrentModificationException log like this: {noformat} 2015-05-11 14:00:20,360 [Timer for 'NodeManager' metrics system] WARN impl.MetricsSystemImpl: java.util.ConcurrentModificationException {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3619) ContainerMetrics unregisters during getMetrics and leads to ConcurrentModificationException
[ https://issues.apache.org/jira/browse/YARN-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538581#comment-14538581 ] Jason Lowe commented on YARN-3619: -- This appears to have been caused by YARN-2984. [~kasha] would you mind taking a look? ContainerMetrics unregisters during getMetrics and leads to ConcurrentModificationException --- Key: YARN-3619 URL: https://issues.apache.org/jira/browse/YARN-3619 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.7.0 Reporter: Jason Lowe ContainerMetrics is able to unregister itself during the getMetrics method, but that method can be called by MetricsSystemImpl.sampleMetrics which is trying to iterate the sources. This leads to a ConcurrentModificationException log like this: {noformat} 2015-05-11 14:00:20,360 [Timer for 'NodeManager' metrics system] WARN impl.MetricsSystemImpl: java.util.ConcurrentModificationException {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)