[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15369766#comment-15369766 ]

Hudson commented on YARN-3045:

SUCCESS: Integrated in Hadoop-trunk-Commit #10074 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/10074/])
YARN-3045. Implement NM writing container lifecycle events to Timeline (sjlee: rev 477a30f536277bf95d7181bf1b2fdda52d83bf51)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/timelineservice/NMTimelineEventType.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/collector/PerNodeTimelineCollectorsAuxService.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/application/TestApplication.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/application/ApplicationContainerFinishedEvent.java
* hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/BaseAMRMProxyTest.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/MockContainer.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/Container.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/timelineservice/NMTimelinePublisher.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDistributedShell.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/timelineservice/NMTimelineEvent.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestContainerManagerRecovery.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/Context.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/test/java/org/apache/hadoop/yarn/server/timelineservice/collector/TestPerNodeTimelineCollectorsAuxService.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/ContainersMonitorImpl.java

> [Event producers] Implement NM writing container lifecycle events to ATS
>
> Key: YARN-3045
> URL: https://issues.apache.org/jira/browse/YARN-3045
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Reporter: Sangjin Lee
> Assignee: Naganarasimha G R
> Fix For: YARN-2928
>
> Attachments: YARN-3045-YARN-2928.002.patch,
> YARN-3045-YARN-2928.003.patch, YARN-3045-YARN-2928.004.patch,
> YARN-3045-YARN-2928.005.patch, YARN-3045-YARN-2928.006.patch,
> YARN-3045-YARN-2928.007.patch, YARN-3045-YARN-2928.008.patch,
> YARN-3045-YARN-2928.009.patch, YARN-3045-YARN-2928.010.patch,
> YARN-3045-YARN-2928.011.patch, YARN-3045.20150420-1.patch
>
> Per design in YARN-2928, implement NM writing container lifecycle events and
> container system metrics to ATS.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14701109#comment-14701109 ]

Junping Du commented on YARN-3045:

I have committed the latest (011) patch to the YARN-2928 branch. Thanks [~Naganarasimha] for contributing the patch and [~sjlee0] for the review!

bq. So shall I handle YARN-3367 jira and then revisit the missing NM container and application events?

Sure. I have made it unassigned, so feel free to pick it up.
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14699557#comment-14699557 ]

Junping Du commented on YARN-3045:

Latest patch LGTM. But will confirm YARN-2928 branch status before committing this.
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700717#comment-14700717 ]

Naganarasimha G R commented on YARN-3045:

Thanks [~djp] for reviewing this jira.

As a continuation of this jira, I think we need to make some progress on YARN-3367 (as discussed earlier in this jira). So shall I handle YARN-3367 jira and then revisit the missing NM container and application events?

I also think similar modifications are required on the RM side, and we need to handle other events on the RM side too, so I was thinking about working on YARN-3880 and including the changes there. Please share your opinion.
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700123#comment-14700123 ]

Junping Du commented on YARN-3045:

Got confirmation from Vinod that the new YARN-2928 branch is good to go. Will commit it soon.
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14698511#comment-14698511 ]

Hadoop QA commented on YARN-3045:

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch | 17m 32s | Findbugs (version ) appears to be broken on YARN-2928. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 9 new or modified test files. |
| {color:red}-1{color} | javac | 7m 48s | The applied patch generated 3 additional warning messages. |
| {color:green}+1{color} | javadoc | 9m 52s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 0m 48s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace | 0m 11s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 25s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 40s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 2m 47s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests | 8m 12s | Tests passed in hadoop-yarn-applications-distributedshell. |
| {color:green}+1{color} | yarn tests | 6m 19s | Tests passed in hadoop-yarn-server-nodemanager. |
| {color:green}+1{color} | yarn tests | 1m 24s | Tests passed in hadoop-yarn-server-timelineservice. |
| | | 57m 26s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12750681/YARN-3045-YARN-2928.011.patch |
| Optional Tests | javac unit findbugs checkstyle javadoc |
| git revision | YARN-2928 / f40c735 |
| javac | https://builds.apache.org/job/PreCommit-YARN-Build/8852/artifact/patchprocess/diffJavacWarnings.txt |
| hadoop-yarn-applications-distributedshell test log | https://builds.apache.org/job/PreCommit-YARN-Build/8852/artifact/patchprocess/testrun_hadoop-yarn-applications-distributedshell.txt |
| hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8852/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt |
| hadoop-yarn-server-timelineservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/8852/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8852/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8852/console |

This message was automatically generated.
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14698201#comment-14698201 ]

Naganarasimha G R commented on YARN-3045:

I have run findbugs manually. Most of the warnings do not need to be fixed, and the one related to LocalizationEventDispatcher will be handled in further JIRAs.
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697952#comment-14697952 ]

Naganarasimha G R commented on YARN-3045:

[~djp] [~sjlee0], it seems the patch is failing on the new YARN-2928 branch. Will rebase and upload a new one.
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14698126#comment-14698126 ]

Hadoop QA commented on YARN-3045:

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch | 16m 21s | Findbugs (version ) appears to be broken on YARN-2928. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 9 new or modified test files. |
| {color:red}-1{color} | javac | 7m 58s | The applied patch generated 3 additional warning messages. |
| {color:green}+1{color} | javadoc | 9m 57s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 0m 49s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace | 0m 8s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 25s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 40s | The patch built with eclipse:eclipse. |
| {color:red}-1{color} | findbugs | 2m 49s | The patch appears to introduce 4 new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests | 9m 20s | Tests passed in hadoop-yarn-applications-distributedshell. |
| {color:green}+1{color} | yarn tests | 6m 9s | Tests passed in hadoop-yarn-server-nodemanager. |
| {color:green}+1{color} | yarn tests | 1m 22s | Tests passed in hadoop-yarn-server-timelineservice. |
| | | 57m 28s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-server-nodemanager |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12750640/YARN-3045-YARN-2928.010.patch |
| Optional Tests | javac unit findbugs checkstyle javadoc |
| git revision | YARN-2928 / f40c735 |
| javac | https://builds.apache.org/job/PreCommit-YARN-Build/8848/artifact/patchprocess/diffJavacWarnings.txt |
| Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8848/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html |
| hadoop-yarn-applications-distributedshell test log | https://builds.apache.org/job/PreCommit-YARN-Build/8848/artifact/patchprocess/testrun_hadoop-yarn-applications-distributedshell.txt |
| hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8848/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt |
| hadoop-yarn-server-timelineservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/8848/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8848/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8848/console |

This message was automatically generated.
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14694242#comment-14694242 ]

Sangjin Lee commented on YARN-3045:

The latest patch LGTM. But please hold off committing this patch per the branch issue mentioned in the yarn-dev email thread.
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14693424#comment-14693424 ]

Hadoop QA commented on YARN-3045:

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch | 16m 10s | Findbugs (version ) appears to be broken on YARN-2928. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 9 new or modified test files. |
| {color:red}-1{color} | javac | 7m 54s | The applied patch generated 3 additional warning messages. |
| {color:green}+1{color} | javadoc | 9m 52s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 0m 50s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace | 0m 8s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 29s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 43s | The patch built with eclipse:eclipse. |
| {color:red}-1{color} | findbugs | 2m 50s | The patch appears to introduce 5 new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests | 8m 6s | Tests passed in hadoop-yarn-applications-distributedshell. |
| {color:green}+1{color} | yarn tests | 6m 22s | Tests passed in hadoop-yarn-server-nodemanager. |
| {color:green}+1{color} | yarn tests | 1m 24s | Tests passed in hadoop-yarn-server-timelineservice. |
| | | 56m 15s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-server-nodemanager |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12749943/YARN-3045-YARN-2928.009.patch |
| Optional Tests | javac unit findbugs checkstyle javadoc |
| git revision | YARN-2928 / bcd755e |
| javac | https://builds.apache.org/job/PreCommit-YARN-Build/8832/artifact/patchprocess/diffJavacWarnings.txt |
| Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8832/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html |
| hadoop-yarn-applications-distributedshell test log | https://builds.apache.org/job/PreCommit-YARN-Build/8832/artifact/patchprocess/testrun_hadoop-yarn-applications-distributedshell.txt |
| hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8832/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt |
| hadoop-yarn-server-timelineservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/8832/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8832/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8832/console |

This message was automatically generated.
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14693443#comment-14693443 ]

Junping Du commented on YARN-3045:

Thanks [~Naganarasimha] for updating the patch!

bq. I have added a new NMTimelineEvent which accepts the TimelineEntity and ApplicationId. This approach avoids creating new event classes; exposing a method in NMTimelinePublisher is enough.

+1. This approach sounds good to me.

Just checked the Jenkins results: there are indeed no findbugs issues, and the javac warnings are not related to this patch but to some legacy code on trunk. Latest patch (009) LGTM. Will commit this patch if there are no further comments from others.
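To make the quoted design point concrete, here is a minimal sketch of the idea of one generic event that carries an entity plus an application id, so that new kinds of entities need no new event classes. All names here (`SketchEntity`, `SketchTimelineEvent`, `SketchPublisher`, `publishEntity`) are hypothetical stand-ins invented for illustration; they are not the actual `NMTimelineEvent` or `NMTimelinePublisher` code from the patch, whose signatures are not shown in this thread.

```java
// Hypothetical stand-in for a timeline entity; NOT the real TimelineEntity class.
final class SketchEntity {
  final String type;
  final String id;

  SketchEntity(String type, String id) {
    this.type = type;
    this.id = id;
  }
}

// One generic event: it only carries the entity and the owning application id.
final class SketchTimelineEvent {
  final SketchEntity entity;
  final String applicationId;

  SketchTimelineEvent(SketchEntity entity, String applicationId) {
    this.entity = entity;
    this.applicationId = applicationId;
  }
}

// Hypothetical publisher: callers use one method for every kind of entity.
class SketchPublisher {
  private final java.util.Queue<SketchTimelineEvent> pending =
      new java.util.ArrayDeque<SketchTimelineEvent>();

  // Wraps any entity in the same generic event type and queues it.
  void publishEntity(SketchEntity entity, String applicationId) {
    pending.add(new SketchTimelineEvent(entity, applicationId));
  }

  int pendingCount() {
    return pending.size();
  }

  // Returns the oldest queued event, or null if nothing is queued.
  SketchTimelineEvent poll() {
    return pending.poll();
  }
}
```

In this sketch a container entity and a localization entity both go through `publishEntity`, which is the property the quoted comment seems to be after: adding a new kind of entity touches the caller only, not the event hierarchy.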
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14693460#comment-14693460 ]

Naganarasimha G R commented on YARN-3045:

Thanks [~djp] for checking it. The test case shown as failing passes locally, and the javac issues are not related to the patch.
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14692524#comment-14692524 ]

Hadoop QA commented on YARN-3045:

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch | 16m 17s | Findbugs (version ) appears to be broken on YARN-2928. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 9 new or modified test files. |
| {color:red}-1{color} | javac | 7m 55s | The applied patch generated 3 additional warning messages. |
| {color:green}+1{color} | javadoc | 9m 53s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 0m 48s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace | 0m 8s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 27s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 40s | The patch built with eclipse:eclipse. |
| {color:red}-1{color} | findbugs | 2m 46s | The patch appears to introduce 5 new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests | 8m 6s | Tests passed in hadoop-yarn-applications-distributedshell. |
| {color:red}-1{color} | yarn tests | 6m 4s | Tests failed in hadoop-yarn-server-nodemanager. |
| {color:green}+1{color} | yarn tests | 1m 22s | Tests passed in hadoop-yarn-server-timelineservice. |
| | | 55m 53s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-server-nodemanager |
| Failed unit tests | hadoop.yarn.server.nodemanager.TestDeletionService |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12749943/YARN-3045-YARN-2928.009.patch |
| Optional Tests | javac unit findbugs checkstyle javadoc |
| git revision | YARN-2928 / 07433c2 |
| javac | https://builds.apache.org/job/PreCommit-YARN-Build/8826/artifact/patchprocess/diffJavacWarnings.txt |
| Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8826/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html |
| hadoop-yarn-applications-distributedshell test log | https://builds.apache.org/job/PreCommit-YARN-Build/8826/artifact/patchprocess/testrun_hadoop-yarn-applications-distributedshell.txt |
| hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8826/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt |
| hadoop-yarn-server-timelineservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/8826/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8826/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8826/console |

This message was automatically generated.
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14680545#comment-14680545 ]

Sangjin Lee commented on YARN-3045:

Thanks for the update [~djp]. Just so we're all on the same page, what more needs to be done in this JIRA, or is the latest patch good?
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14680526#comment-14680526 ]

Junping Du commented on YARN-3045:

bq. Junping Du one more favor can you join us in slack ? faster to communicate further.

Sure. I joined before but could not see anyone online (maybe due to different time zones). In the typical Apache way, discussion on JIRA should generally be sufficient.

bq. but wanted to place the localization events along with ContainerEntity. Is that fine ?

Yes. This is fine for now, as I said above.

bq. Even though these ContainerEvents give details state of container but its state machine will be intermediate state (will not be DONE state) when these events are being processed (exit code, final state, diagnostic msg etc might not be filled in). So in the current patch i have published Containerfinished ATS event on APPLICATION_CONTAINER_FINISHED event rather than other container events, wanted to check further with you Junping Du, but anyway you only raised this topic. i feel it would be better to also capture CONTAINER_EXITED_WITH_FAILURE, CONTAINER_KILLED_ON_REQUEST among ContainerEvents. thoughts?

APPLICATION_CONTAINER_FINISHED should be enough for now. We can add more container events later, once we have the priority mechanism I described in my comments above.

bq. Its basically not unrecognized event but its event which is not of our interest, so better i can just ignore default case and delete it. ok?

I would slightly prefer to handle them explicitly: either deliberately ignore specific events, or log a debug/warn message for unrecognized events. The reason is that other developers who rename or add events could forget to handle the timeline events here, and the debug/warn messages would help to identify such cases.
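The explicit handling suggested in the comment above could look roughly like the following. This is a minimal sketch, not the code from the patch: `SketchContainerEventType`, `SketchContainerEventHandler` and `NEWLY_ADDED_EVENT` are hypothetical stand-ins, the choice of which events are published is an assumption made for illustration, and the `logged` list stands in for a real logger.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the NodeManager's container event types. Only names
// mentioned in this thread are listed; NEWLY_ADDED_EVENT is invented to show
// what happens when someone adds an event and forgets this handler.
enum SketchContainerEventType {
  RESOURCE_LOCALIZED,
  RESOURCE_FAILED,
  CONTAINER_EXITED_WITH_SUCCESS,
  CONTAINER_EXITED_WITH_FAILURE,
  CONTAINER_KILLED_ON_REQUEST,
  NEWLY_ADDED_EVENT
}

// Hypothetical handler; the lists replace the timeline publisher and the logger.
class SketchContainerEventHandler {
  final List<String> published = new ArrayList<>();
  final List<String> logged = new ArrayList<>();

  void handle(SketchContainerEventType type) {
    switch (type) {
      case RESOURCE_LOCALIZED:
      case RESOURCE_FAILED:
        // Events of interest (an assumption for this sketch): publish them.
        published.add(type.name());
        break;
      case CONTAINER_EXITED_WITH_SUCCESS:
      case CONTAINER_EXITED_WITH_FAILURE:
      case CONTAINER_KILLED_ON_REQUEST:
        // Known events that are deliberately not published for now.
        logged.add("DEBUG ignoring " + type);
        break;
      default:
        // Anything else was probably added or renamed later: make it visible.
        logged.add("WARN unhandled event type " + type);
        break;
    }
  }
}
```

Listing the ignored events as explicit cases keeps the `default` branch reserved for events nobody has thought about yet, which is what makes the warning useful.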
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14680874#comment-14680874 ]

Sangjin Lee commented on YARN-3045:

Sounds good. Thanks.
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14680942#comment-14680942 ]

Naganarasimha G R commented on YARN-3045:

Hi [~djp] and [~sjlee0], I was in transit back to India, so I was not able to work on it. I will get this done at the earliest.

bq. Sure. I joined before but cannot see anyone online (may be due to different timezones). As a typical apache way, discussions on JIRA should be sufficient in general.

Yes, most of the time it should be sufficient, but sometimes it is easier to reach someone that way, since it can notify them if the mobile app is installed :)
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14680699#comment-14680699 ]

Junping Du commented on YARN-3045:

[~sjlee0], there are no further big issues, but two minor issues remain which should be easy to address. Also see Naga's comment above: “For other comments and issues reported jenkins will get it corrected as part of next patch.” Can we wait for Naga's next patch? Thx!
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14659937#comment-14659937 ]

Junping Du commented on YARN-3045:

bq. The first patch was put up in April and later versions of the patch seem to have been +1'ed already by several folks. If there is a fundamental problem with the patch, we should address it.

I am sorry for the long delay in getting this patch in. Part of the reason, I think, is that my review work on this JIRA has been intermittent because of other JIRA work and personal reasons. I appreciate [~Naganarasimha]'s patience on this JIRA, and I am sure the latest patch (08) is getting much closer. One thing I need to clarify: from the history of the comments above, there has been no +1 on the patch so far, only several +1s on ideas.

bq. It would be good to keep the comments here really focused on this patch and open separate jiras for separate topics to that we can keep making progress.

Agree. We have separated several topics out already.

bq. Sorry but it's not clear what the 2 options are. Could you kindly rephrase the options?

I think the two options Naga means are: 1. wrap resource localization events in the application entity; 2. wrap resource localization events in an NM entity (in case we add one in future).

bq. Are some of these events already existing container events? If so, they shouldn't be repeated as application events redundantly, right? What would be the application-specific events that are not captured by container events?

ContainerEvent has more detail about life cycle events than ApplicationEvent, e.g. RESOURCE_LOCALIZED, RESOURCE_FAILED, etc. Also, even for a duplicated event such as APPLICATION_CONTAINER_FINISHED in ApplicationEvent, ContainerEvent can provide more detail: CONTAINER_EXITED_WITH_SUCCESS, CONTAINER_EXITED_WITH_FAILURE, CONTAINER_KILLED_ON_REQUEST. I understand that some of these events are really trivial to the application, and that is why I have pushed for importance levels from the beginning (so we can ignore them through configuration).

Back to the 08 patch: it mostly looks good to me, with two small issues:

{code}
+/*nmMetricsPublisher.containerCreated(container,
+System.currentTimeMillis());*/
...
+/*container.getNMTimelinePublisher().containerFinished(container,
+System.currentTimeMillis());*/
{code}

Please remove this commented-out code, which is not useful now.

For ApplicationEventDispatcher, ContainerEventDispatcher and LocalizationEventDispatcher: first, the names are a little confusing. We should use *Handler rather than *Dispatcher, since these classes handle events rather than dispatch them. Second, for an unrecognized event, we should log a warn message (or at least a debug message) instead of doing nothing.
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14659829#comment-14659829 ]

Junping Du commented on YARN-3045:

The Jenkins test wasn't triggered for some reason. Manually kicking off the Jenkins test.
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660638#comment-14660638 ]

Naganarasimha G R commented on YARN-3045:

Hi All,

bq. I appreciate Naganarasimha G R's patient on this JIRA's work and I am sure the latest patch (08) is getting much closer.

Thanks for the support; I will try to get this to closure as early as possible. [~djp], one more favor: can you join us on Slack? It is faster for further communication.

bq. Sorry but it's not clear what the 2 options are. Could you kindly rephrase the options?

Sorry for being cryptic here. What I meant was whether it is sufficient to capture localization events once at the container level (localization succeeded or failed), or whether it is required to capture them for each {{LocalizedResource}} required by the container, which is more detailed and allows analyzing whether any particular resource is taking time. For the latter we need to use events on the LocalizedResource state machine (i.e. ResourceEventType.REQUEST, LOCALIZED and LOCALIZATION_FAILED). For the former we can use either {{ResourceLocalizationService}} events (i.e. LocalizationEventType.INIT_CONTAINER_RESOURCES and CONTAINER_RESOURCES_LOCALIZED) or events on the {{ContainerImpl}} state machine (i.e. ContainerEventType.RESOURCE_LOCALIZED and RESOURCE_FAILED). The advantage of using ResourceLocalizationService events is that they give the precise start and end times of localization, whereas with ContainerImpl (ContainerEventType) we get the time only approximately, by calculating the difference between the timestamps at which the INIT_CONTAINER and RESOURCE_LOCALIZED events are published. Among these options, my opinion was to use the ContainerImpl state machine events.

bq. Naga means : 1. make resource localization events wrap as application entity; 2. make resource localization events wrap as NM entity (in case we add it in future).

As explained above, my intention was different; I wanted to place the localization events along with ContainerEntity. Is that fine?

bq. Also, even for some duplicated event, like APPLICATION_CONTAINER_FINISHED in ApplicationEvent, ContainerEvent could provide more details: CONTAINER_EXITED_WITH_SUCCESS, CONTAINER_EXITED_WITH_FAILURE, CONTAINER_KILLED_ON_REQUEST.

Even though these ContainerEvents give the detailed state of the container, its state machine will be in an intermediate state (not the DONE state) when these events are processed, so the exit code, final state, diagnostic message etc. might not be filled in. So in the current patch I have published the container finished ATS event on the APPLICATION_CONTAINER_FINISHED event rather than on other container events. I wanted to check further with you [~djp], but you raised this topic anyway. I feel it would be better to also capture CONTAINER_EXITED_WITH_FAILURE and CONTAINER_KILLED_ON_REQUEST among the ContainerEvents. Thoughts?

bq. second, for unrecognized event, we should log a warn message (at least debug message) instead of do nothing.

It is basically not an unrecognized event but an event that is not of interest to us, so I think it is better to just ignore it and delete the default case. OK?

The other comments and the issues reported by Jenkins will be corrected as part of the next patch.
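The approximation described in the comment above, deriving localization time from the gap between the INIT_CONTAINER and RESOURCE_LOCALIZED timestamps, can be sketched as follows. `LocalizationDurationTracker` and its method names are hypothetical and not part of the patch; the sketch also assumes that the caller passes in the timestamp at which each event was published.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of the patch or of the NodeManager.
class LocalizationDurationTracker {
  // Container id -> timestamp (ms) at which INIT_CONTAINER was seen.
  private final Map<String, Long> initTimestamps = new HashMap<>();

  // Call when the INIT_CONTAINER event for this container is observed.
  void onInitContainer(String containerId, long timestampMs) {
    initTimestamps.put(containerId, timestampMs);
  }

  // Call when RESOURCE_LOCALIZED is observed. Returns the approximate
  // localization time in ms, or -1 if no INIT_CONTAINER was recorded.
  long onResourceLocalized(String containerId, long timestampMs) {
    Long start = initTimestamps.get(containerId);
    return start == null ? -1L : timestampMs - start;
  }

  // Call when the container is finished so the map does not grow forever.
  void onContainerDone(String containerId) {
    initTimestamps.remove(containerId);
  }
}
```

The result is only an upper bound on the real localization time, since it includes whatever happens between event publication and the start of the actual download. If RESOURCE_LOCALIZED is delivered more than once for a container, the value returned for the last delivery would be the one to keep.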
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660031#comment-14660031 ]

Hadoop QA commented on YARN-3045:

| (x) *{color:red}-1 overall{color}* |

|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch | 16m 14s | Findbugs (version ) appears to be broken on YARN-2928. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 8 new or modified test files. |
| {color:red}-1{color} | javac | 7m 55s | The applied patch generated 3 additional warning messages. |
| {color:green}+1{color} | javadoc | 9m 55s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 0m 49s | There were no new checkstyle issues. |
| {color:red}-1{color} | whitespace | 0m 7s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install | 1m 27s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 39s | The patch built with eclipse:eclipse. |
| {color:red}-1{color} | findbugs | 2m 45s | The patch appears to introduce 3 new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests | 8m 7s | Tests passed in hadoop-yarn-applications-distributedshell. |
| {color:red}-1{color} | yarn tests | 6m 1s | Tests failed in hadoop-yarn-server-nodemanager. |
| {color:red}-1{color} | yarn tests | 1m 18s | Tests failed in hadoop-yarn-server-timelineservice. |
| | | | 55m 44s | |

|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-server-nodemanager |
| Failed unit tests | hadoop.yarn.server.nodemanager.containermanager.application.TestApplication |
| | hadoop.yarn.server.timelineservice.collector.TestPerNodeTimelineCollectorsAuxService |

|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12748831/YARN-3045-YARN-2928.008.patch |
| Optional Tests | javac unit findbugs checkstyle javadoc |
| git revision | YARN-2928 / 895ccfa |
| javac | https://builds.apache.org/job/PreCommit-YARN-Build/8781/artifact/patchprocess/diffJavacWarnings.txt |
| whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/8781/artifact/patchprocess/whitespace.txt |
| Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8781/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html |
| hadoop-yarn-applications-distributedshell test log | https://builds.apache.org/job/PreCommit-YARN-Build/8781/artifact/patchprocess/testrun_hadoop-yarn-applications-distributedshell.txt |
| hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8781/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt |
| hadoop-yarn-server-timelineservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/8781/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8781/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8781/console |

This message was automatically generated.
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14658386#comment-14658386 ]

Junping Du commented on YARN-3045:

Thanks [~sjlee0] and [~Naganarasimha] for the quick replies.

bq. If these events are attributes of applications, then they should be on the application entities. If I want to find out all events for some application, then I should be able to query only the application entity and get all events.

Some of these events are related to both the application and the NodeManager. We can claim that they belong to the application, but some events are too detailed for the application and could be of more interest to the YARN daemons. I understand that our design is more application-centric now, but it should be generic enough to store and retrieve YARN-daemon-centric entities later. Anyway, until NM/RM are onboarded as first-class consumers of ATSv2, I am fine with making them application events.

bq. The need to have NodeManagerEntity is something different IMO. Note that today there are challenges in emitting data without any application context (e.g. node manager's configuration) as we discussed a few times. If we need to support that, that needs a different discussion.

I see. I remember seeing a JIRA about getting rid of the application context but cannot find it now. In case we don't have one, how about moving this discussion to YARN-3959? The original scope of that JIRA is application-related configuration only, but we could extend it to include daemon configuration if necessary.

bq. my assumption was that the sync/async distinction from the client perspective mapped to whether the writer may be flushed or not. If not, then we need to support a 2x2 matrix of possibilities: sync put w/ flush, sync put w/o flush, async put w/ flush, and async put w/o flush. I thought it would be a simplifying assumption to align those dimensions.

I think we can simplify the 2x2 matrix by omitting the case of sync put w/o flush, as I cannot think of a valid case in which an ack from TimelineCollector without a flush would help. The remaining three cases sound solid to me. So that TimelineCollector can identify flush strategies for async calls, we may need to set a severity on the entities to be put, with TimelineCollector configured to flush only entities above a specific severity, just as a log level does.

bq. I was under the impression that YARN-3367 is only for invoking REST calls in nonblocking way and thus avoiding threads in the clients. Is it also related to flush when called only putEntities and not on putEntitiesAsync?

You are right that the goal of YARN-3367 is to get rid of blocking calls to put entities, no matter whether it calls putEntities() or something else. putEntitiesAsync() is exactly what we need, and using putEntities() should be rare once we have putEntitiesAsync(), except where client logic relies tightly on the returned results.

bq. I see currently async parameter as part of REST request is ignored now, so i thought based on this param we may need to further flush the writer or is your thoughts similar to support 2*2 matrix as Sangjin was informing?

Actually, from my comments above, I would prefer the (2*2 - 1) way. :) To speed up this JIRA's progress, I am fine with continuing to ignore the sync/async parameter and doing everything async for now, leaving it to a dedicated JIRA to figure out. I will look at the latest patch soon.
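The "(2*2 - 1)" proposal in the comment above, with a severity threshold deciding whether an async put is flushed, might be sketched like this. Everything here is hypothetical: `SketchSeverity`, `SketchSeverityWriter`, `CountingWriter` and `SeverityAwareCollector` are illustration-only names, not the TimelineCollector or TimelineWriter API, and the threading that makes a put truly non-blocking is left out.

```java
// Hypothetical severity levels, ordered from least to most important.
enum SketchSeverity { LOW, NORMAL, CRITICAL }

// Hypothetical stand-in for a timeline writer; not the real TimelineWriter API.
interface SketchSeverityWriter {
  void write(String entity);
  void flush();
}

// Test double that only counts calls.
class CountingWriter implements SketchSeverityWriter {
  int writes = 0;
  int flushes = 0;
  @Override public void write(String entity) { writes++; }
  @Override public void flush() { flushes++; }
}

// Hypothetical collector supporting three of the four sync/flush combinations.
class SeverityAwareCollector {
  private final SketchSeverityWriter writer;
  private final SketchSeverity flushThreshold;

  SeverityAwareCollector(SketchSeverityWriter writer, SketchSeverity flushThreshold) {
    this.writer = writer;
    this.flushThreshold = flushThreshold;
  }

  // Sync put: always flushed before returning ("sync put w/o flush" is omitted).
  void putEntities(String entity, SketchSeverity severity) {
    writer.write(entity);
    writer.flush();
  }

  // Async put: flushed only at or above the configured severity, like a log level.
  void putEntitiesAsync(String entity, SketchSeverity severity) {
    writer.write(entity);
    if (severity.compareTo(flushThreshold) >= 0) {
      writer.flush();
    }
  }
}
```

With the threshold set to `CRITICAL`, an async put of a `NORMAL` entity is written without a flush, an async put of a `CRITICAL` entity is flushed, and every sync put is flushed regardless of severity.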
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14658671#comment-14658671 ]

Joep Rottinghuis commented on YARN-3045:

Yeah, the discussion thread here seems to be rather deep. The first patch was put up in April and later versions of the patch seem to have been +1'ed already by several folks. If there is a fundamental problem with the patch, we should address it. It would be good to keep the comments here really focused on this patch and open separate jiras for separate topics so that we can keep making progress.
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14659301#comment-14659301 ]

Naganarasimha G R commented on YARN-3045:

[~djp] [~sjlee0] I am waiting for comments on the approach taken, and wanted to know exactly which aspects of localization need to be captured. Is it the {{ResourceLocalizationService}} events (i.e. LocalizationEventType.INIT_CONTAINER_RESOURCES and CONTAINER_RESOURCES_LOCALIZED), the events of each individual localized resource (i.e. ResourceEventType.REQUEST, LOCALIZED and LOCALIZATION_FAILED), or are {{ContainerEventType.RESOURCE_LOCALIZED}} and {{RESOURCE_FAILED}} sufficient to capture? In my opinion the last option is sufficient; or shall we handle localization events in another JIRA? Please share your thoughts.

bq. Anyway, before making NM/RM onboard as the first class consumer of ATSv2, I am fine with making them as application events.

OK, going ahead with capturing them under the application entity.
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14659323#comment-14659323 ]

Sangjin Lee commented on YARN-3045:

bq. In my opinion last option is sufficient or shall we handle localization events in another jira. Please share your thoughts ?

Sorry but it's not clear what the 2 options are. Could you kindly rephrase the options?

Also, one clarifying question. Are some of these events already existing container events? If so, they shouldn't be repeated as application events redundantly, right? What would be the application-specific events that are *not* captured by container events?
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14659121#comment-14659121 ]

Li Lu commented on YARN-3045:

bq. To speed up this JIRA's progress, I am fine with keep ignoring sync/async parameter and do everything async for now and left it out to a dedicated JIRA to figure out.

+1. This JIRA has been hanging there for quite a while. Let's move forward with pending storage API problems addressed in separate JIRAs.
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14659266#comment-14659266 ]

Sangjin Lee commented on YARN-3045:

bq. I see. I remember to see a JIRA work is to get ride of application context but cannot find it now.

It's YARN-3981.
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653640#comment-14653640 ]

Junping Du commented on YARN-3045:

Sorry for coming late to this; I have been travelling recently. Thanks for the good discussions and comments above.

bq. I am fine with single jira, but only trouble is as and when the scope increases there will be more delay in the jira as more discussions will be required(in this case which entity to publish NM App localization events ) and also as its been long since i am holding this jira so thought of getting the basic one out and develop on top of it.

OK. Let's get the basic things in first, then discuss and work on the other details later, if that moves the work along more quickly.

bq. We had a discussion on this topic today in the meeting and Sangjin Lee was of the opinion not to have another entity here. I think we need more discussions on this as it involves querying too.

Sorry, I wasn't at that meeting. What is the concern with having NodeManagerEntity? Without it, how could we store something like the NM's configuration?

bq. Along with that, it is implicitly understood that TimelineWriter.write() may be asynchronous (i.e. may not write the data to the storage synchronously or promptly).

This is true today. However, it may not hold precisely in all cases. Some implementations of TimelineWriter, such as FS, may only have sync semantics for write(), and flush() could do nothing.

bq. This API should be sufficient for TimelineCollector to express synchronous/critical put operations and asynchronous/non-critical put operations. TimelineCollector will not expose flush() directly to clients. Instead, it may use things like putEntity() and putEntityAsync() to expose that semantics to the client. In the simplest terms, TimelineCollector could implement putEntity() as putEntityAsync() + TimelineWriter.flush(). This is not the actual suggestion of the implementation, but that would be an idea.

We already have putEntity() and putEntityAsync(), but we haven't yet used flush() to implement this behavior. Do we need to differentiate synchronous from critical in put operations from the TimelineClient perspective? Sync most likely means the client logic relies on the return result of the put call, while async put just means we call put in a non-blocking way. Critical versus non-critical for messages (entities) is a relative concept and could vary under different system configurations. Thus I won't be surprised if we put some critical entities asynchronously, since we need a sync put in the client only in very rare cases. Actually, I was convinced in YARN-3949 (https://issues.apache.org/jira/browse/YARN-3949?focusedCommentId=14640910&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14640910) that the collector level knows better than the writer whether it should flush. I would also claim that the collector could know better than the client where the boundary between critical and non-critical lies, thanks to its knowledge of the system configuration; e.g. fewer types of entities should be counted as critical for a large-scale cluster, but the client has no knowledge of that. If the collector had no additional knowledge compared with the client, it would be simpler to pass sync/async down from the client to sync/async in the writer, wouldn't it?

bq. And yes, we'll need another JIRA to differentiate putEntity() and putEntityAsync() and use them at the right places. Currently, putEntity() does not call flush(), and all client calls are using putEntity().

YARN-3367 already tracks this. It is painful for the client to wrap putEntity() with a thread pool or dispatcher to make it non-blocking.
[Event producers] Implement NM writing container lifecycle events to ATS Key: YARN-3045 URL: https://issues.apache.org/jira/browse/YARN-3045 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Naganarasimha G R Attachments: YARN-3045-YARN-2928.002.patch, YARN-3045-YARN-2928.003.patch, YARN-3045-YARN-2928.004.patch, YARN-3045-YARN-2928.005.patch, YARN-3045-YARN-2928.006.patch, YARN-3045.20150420-1.patch Per design in YARN-2928, implement NM writing container lifecycle events and container system metrics to ATS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14654644#comment-14654644 ] Naganarasimha G R commented on YARN-3045:

Based on the feedback, I may be able to remove some classes (I have kept them because some events, such as container metrics, cannot be captured through state machine transitions).

I was also wondering whether, for localization, we need to capture the ResourceLocalizationService events or the events of the {{LocalizedResource}} state machine, which exists for each resource being localized. Currently the patch wraps ResourceLocalizationService and captures its events; if we need to capture the events of LocalizedResource, the patch will need to be modified.
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14654744#comment-14654744 ] Hadoop QA commented on YARN-3045:

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12748765/YARN-3045-YARN-2928.007.patch |
| Optional Tests | javac unit findbugs checkstyle javadoc |
| git revision | YARN-2928 / bf65663 |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8769/console |

This message was automatically generated.
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14654132#comment-14654132 ] Sangjin Lee commented on YARN-3045:

Thanks for your input [~djp]! Just wanted to clarify a few things.

{quote}
Sorry, I wasn't at that meeting. What is the concern with having NodeManagerEntity? Without it, how could we store something like the NM's configuration?
{quote}

Naga is referring to the Wednesday status call. What I said is that we do not need a separate entity to handle *application*-related events coming out of node managers. If these events are attributes of applications, then they should be on the application entities. If I want to find out all events for some application, then I should be able to query only the application entity and get all events.

The need to have NodeManagerEntity is something different IMO. Note that today there are challenges in emitting data without any application context (e.g. node manager's configuration) as we discussed a few times. If we need to support that, that needs a different discussion.

{quote}
This is true today. However, it may not hold for all cases/scenarios. Some implementations of TimelineWriter, such as FS, may only have sync semantics for write(), and flush() could do nothing.
{quote}

That's correct. What I meant was that in general the *contract* of write() may not provide a guarantee that the data will be written completely synchronously. For FS, yes, it will sync. Thus the operative word "may". :)

{quote}
Do we need to differentiate synchronous from critical in put operations from the TimelineClient perspective? Sync most likely means that the client logic relies on the result returned by the put call, and async put just means that we call put in a non-blocking way. Critical versus non-critical is a relative concept for messages (entities) and could vary under different system configurations. Thus, I won't be surprised if we put some critical entities asynchronously, as only in very rare cases do we need a sync put in the client. Actually, I was convinced in YARN-3949 (https://issues.apache.org/jira/browse/YARN-3949?focusedCommentId=14640910&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14640910) that the collector level knows better than the writer whether it should flush. I would also claim that the collector could know better than the client where the boundary between critical and non-critical lies, because of its knowledge of the system configuration; e.g. fewer types of entities should be counted as critical for a large-scale cluster, but the client has no knowledge of that. If the collector has no additional knowledge beyond the client's, it could be simpler to pass sync/async from the client down to sync/async in the writer. Isn't it?
{quote}

Hmm, my assumption was that the sync/async distinction from the client perspective mapped to whether the writer may be flushed or not. If not, then we need to support a 2x2 matrix of possibilities: sync put w/ flush, sync put w/o flush, async put w/ flush, and async put w/o flush. I thought it would be a simplifying assumption to align those dimensions.

My main point in YARN-3949 is that it is sufficient for the writer to provide write() and flush(). The timeline collector can then support all possible semantics, even including the 2x2 matrix behavior if needed.
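The 2x2 matrix mentioned above can be sketched with nothing more than a writer that offers write() and flush(). This is only an illustration under stated assumptions: {{SketchWriter}}, {{BufferedSketchWriter}}, {{MatrixCollector}} and the {{put(entity, sync, flush)}} signature are hypothetical names made up for this example, and entities are plain strings rather than the real timeline entity classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical stand-in for the writer contract: write() may buffer, flush() persists. */
interface SketchWriter {
  void write(String entity);
  void flush();
}

/** In-memory writer that holds entities in a buffer until flush() is called. */
class BufferedSketchWriter implements SketchWriter {
  final List<String> buffered = new ArrayList<>();
  final List<String> stored = new ArrayList<>();

  @Override
  public synchronized void write(String entity) {
    buffered.add(entity);
  }

  @Override
  public synchronized void flush() {
    stored.addAll(buffered);
    buffered.clear();
  }
}

/** Collector-side put covering sync/async x flush/no-flush (all names are assumptions). */
class MatrixCollector {
  private final SketchWriter writer;
  private final ExecutorService pool = Executors.newSingleThreadExecutor();

  MatrixCollector(SketchWriter writer) {
    this.writer = writer;
  }

  /** sync=true runs on the caller's thread; sync=false hands the work to the pool. */
  CompletableFuture<Void> put(String entity, boolean sync, boolean flush) {
    Runnable task = () -> {
      writer.write(entity);
      if (flush) {
        writer.flush();
      }
    };
    if (sync) {
      task.run();
      return CompletableFuture.completedFuture(null);
    }
    return CompletableFuture.runAsync(task, pool);
  }

  void shutdown() {
    pool.shutdown();
  }
}

class MatrixDemo {
  public static void main(String[] args) {
    BufferedSketchWriter writer = new BufferedSketchWriter();
    MatrixCollector collector = new MatrixCollector(writer);
    collector.put("e1", true, false).join();  // sync put w/o flush
    collector.put("e2", false, true).join();  // async put w/ flush
    System.out.println("stored=" + writer.stored + " buffered=" + writer.buffered);
    collector.shutdown();
  }
}
```

Aligning the two dimensions, as suggested in the comment, would amount to always passing the same value for {{sync}} and {{flush}}.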
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14654724#comment-14654724 ] Naganarasimha G R commented on YARN-3045:

Hi [~djp], thanks for the feedback.

bq. OK. Let's get the basic things in first, then discuss and work on the other details later, if that moves the work along more quickly.

I have uploaded a WIP patch; can you take a look at it?

bq. What is the concern with having NodeManagerEntity? Without it, how could we store something like the NM's configuration?

Well, Sangjin's thinking, and mine as well, is that if we are going to query NM-level application events as part of the application, then they should be under ApplicationEntity; if we need it for other scenarios, then we can have NodeManagerEntity, I think.

bq. If the collector has no additional knowledge beyond the client's, it could be simpler to pass sync/async from the client down to sync/async in the writer.

By {{sync/async in writer}}, did you mean {{putEntities and flush / putEntities}} respectively on the writer, or the 2*2 matrix that [~sjlee0] mentioned?

bq. YARN-3367 already tracks this. It is painful for the client to wrap putEntity() with a thread pool or dispatcher to get non-blocking behavior.

I was under the impression that YARN-3367 is only about invoking REST calls in a non-blocking way, thus avoiding threads in the clients. Is it also related to flushing only when {{putEntities}} is called and not on {{putEntitiesAsync}}? I see that the async parameter of the REST request is currently ignored, so I thought that based on this parameter we may need to additionally flush the writer. Or are your thoughts similar, i.e. to support the 2*2 matrix that Sangjin mentioned?
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650231#comment-14650231 ] Naganarasimha G R commented on YARN-3045:

[~sjlee0], thanks for the detailed explanation. Now I am clear about the plans for TimelineClient and the indirect support of flush for it.

bq. If these events are already associated with containers any way, they are not an issue, right?

Well, [~djp] had pointed out two new things as part of this JIRA (apart from the existing container lifecycle events): one was NM-side application events and the other was container resource localization events. In the latter case (similar to the NM-side application events), multiple resource paths can be localized for a given container, so the different localization states cannot be used directly as event IDs, because we also need to publish which resource the event belongs to. Hence I had suggested earlier:
??For Localization i feel it can be under ContainerEntity and the EventID can have Event Type (REQUEST,LOCALIZED,LOCALIZATION_FAILED)and PATH of the localized resource.??
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650049#comment-14650049 ] Sangjin Lee commented on YARN-3045:

{quote}
Sorry, I was not able to follow this completely. Are you referring to TimelineClient's putEntities and putEntitiesAsync, or to TimelineWriter's write and flush? If it is the latter, then apart from the RM, most of the entities and events of the NM and the application go through TimelineClient and TimelineCollector. So there should be some way for the NM and the client app (AM) to ask for the data to be flushed, right?
{quote}

Sorry if my comments were not detailed enough. As you know, we have layers of APIs: {{TimelineClient}} which clients (RM, NM, AM, ...) use, {{TimelineCollector}} that receives those calls and interacts with the writer, and {{TimelineWriter}} that handles actual writes.

The point was that {{flush()}} belongs only in {{TimelineWriter}}. Along with that, it is implicitly understood that {{TimelineWriter.write()}} may be asynchronous (i.e. may not write the data to the storage synchronously or promptly). This API should be sufficient for {{TimelineCollector}} to express synchronous/critical put operations and asynchronous/non-critical put operations.

{{TimelineCollector}} will *not* expose {{flush()}} directly to clients. Instead, it may use things like {{putEntity()}} and {{putEntityAsync()}} to expose those semantics to the client. In the simplest terms, {{TimelineCollector}} could implement {{putEntity()}} as {{putEntityAsync()}} + {{TimelineWriter.flush()}}. This is not the actual suggestion of the implementation, but that would be an idea. We already have {{putEntity()}} and {{putEntityAsync()}}, but we haven't yet used {{flush()}} to implement this behavior.

And yes, we'll need another JIRA to differentiate {{putEntity()}} and {{putEntityAsync()}} and use them at the right places. Currently, {{putEntity()}} does not call {{flush()}}, and all client calls are using {{putEntity()}}.

{quote}
But some of them, those related to localization, I feel belong to ContainerEntity, right? I hope the approach I captured is fine?
{quote}

If these events are already associated with containers any way, they are not an issue, right? I thought there were these events that are really application events, but specific to the nodes? My comments were about those application events.
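The layering described in this comment, with {{putEntity()}} expressed as {{putEntityAsync()}} plus a writer flush, might look roughly like the sketch below. It deliberately does not use the real Hadoop classes: {{WriterSketch}}, {{BufferingWriter}}, {{SyncWriter}} and {{CollectorSketch}} are hypothetical names, entities are plain strings, and the "async" path here simply skips the flush rather than running on another thread. As the comment says of the idea itself, this is not a suggested implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified writer contract (not the real TimelineWriter signature). */
interface WriterSketch {
  void write(String entity); // may buffer; no durability guarantee on return
  void flush();              // forces buffered data to storage
}

/** Writer that buffers in memory until flush() is called. */
class BufferingWriter implements WriterSketch {
  final List<String> pending = new ArrayList<>();
  final List<String> storage = new ArrayList<>();

  @Override
  public void write(String entity) {
    pending.add(entity);
  }

  @Override
  public void flush() {
    storage.addAll(pending);
    pending.clear();
  }
}

/** FS-like writer: write() is already synchronous, so flush() has nothing to do. */
class SyncWriter implements WriterSketch {
  final List<String> storage = new ArrayList<>();

  @Override
  public void write(String entity) {
    storage.add(entity);
  }

  @Override
  public void flush() {
    // intentionally a no-op
  }
}

/** The collector owns the flush decision; its callers never see flush(). */
class CollectorSketch {
  private final WriterSketch writer;

  CollectorSketch(WriterSketch writer) {
    this.writer = writer;
  }

  /** Non-critical put: hand the entity to the writer and return. */
  void putEntityAsync(String entity) {
    writer.write(entity);
  }

  /** Critical put: the same write, followed by a flush. */
  void putEntity(String entity) {
    putEntityAsync(entity);
    writer.flush();
  }
}

class CollectorSketchDemo {
  public static void main(String[] args) {
    BufferingWriter writer = new BufferingWriter();
    CollectorSketch collector = new CollectorSketch(writer);
    collector.putEntityAsync("container-started");
    System.out.println("after async put: storage=" + writer.storage);
    collector.putEntity("container-finished");
    System.out.println("after sync put:  storage=" + writer.storage);
  }
}
```

With {{SyncWriter}} in place of {{BufferingWriter}}, both put methods behave identically, which matches the observation that the contract only says write() *may* be asynchronous.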
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14649985#comment-14649985 ] Naganarasimha G R commented on YARN-3045:

Thanks, [~sjlee0], for clarifying and elaborating on your comment, but I have some queries.

bq. Note that the user of those two methods is really TimelineCollector. I don't think we'd be exposing flush() to TimelineClient.

Sorry, I was not able to follow this completely. Are you referring to TimelineClient's putEntities and putEntitiesAsync, or to TimelineWriter's write and flush? If it is the latter, then apart from the RM, most of the entities and events of the NM and the application go through TimelineClient and TimelineCollector. So there should be some way for the NM and the client app (AM) to ask for the data to be flushed, right?

bq. The synchronous nature of the writes would be expressed differently on TimelineClient.

Do you mean that TimelineClient's putEntities will ensure that flush is called? If so, will some other JIRA handle this?

bq. These are really events that belong to YARN applications, and I don't see why they shouldn't be part of the YARN application entities.

But some of them, those related to localization, I feel belong to ContainerEntity, right? I hope the approach I captured is fine?
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646992#comment-14646992 ] Naganarasimha G R commented on YARN-3045:

Thanks for the comments [~djp],

bq. We already have a new flush() API for the writer, which was checked in with YARN-3949... You are right that we lack an API that respects this priority/policy across the whole data flow for writing. I will file another JIRA to track this.

I went through the discussions and the patch in YARN-3949. I feel that calling two APIs would not be very user friendly, and how will the users of TimelineClient call flush? I think that is not captured in YARN-3949.

bq. Anyway, I would support the scope (container events + foundation work) you proposed here, if you are comfortable with it.

I am fine with a single JIRA. The only trouble is that as the scope increases, the JIRA will be delayed further because more discussion will be required (in this case, which entity to publish NM app localization events under). Also, since I have been holding this JIRA for a long time, I thought of getting the basic part out and developing on top of it. I am OK if you want to avoid multiple JIRAs.

bq. That's a good question. My initial thinking is that we may need something like NodemanagerEntity to store application events, resource localization events, log aggregation handling events, configuration, etc. However, I would like to hear your and others' ideas on this as well.

We discussed this topic in today's meeting, and [~sjlee0] was of the opinion that we should not have another entity here. I think we need more discussion on this, as it involves querying too. The approach I can think of is:
* Application-level events in the NM can go under ApplicationEntity, and the EventID can have the event type (INIT_APPLICATION/APPLICATION_FINISHED/APPLICATION_LOG_HANDLING_FAILED) and the NM_ID.
* Localization events, I feel, can go under ContainerEntity, and the EventID can have the event type (REQUEST, LOCALIZED, LOCALIZATION_FAILED) and the PATH of the localized resource.

bq. IMO, the 2nd approach (hooking into the existing event dispatcher) looks simpler and more straightforward.

This approach is straightforward, but I am not sure whether it might have an impact (just initial apprehensions). I will start implementing it for container events and share an initial patch based on this approach.
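The two event ID schemes proposed above (event type plus NM_ID for application events, event type plus resource path for localization events) can be illustrated with a small sketch. The enum names, the {{EventIds}} helper, the separators and the sample node ID and path are all assumptions made for this example; the thread does not fix a concrete ID format.

```java
/** NM-side application event types named in the comment; the enum itself is hypothetical. */
enum NmAppEventType {
  INIT_APPLICATION, APPLICATION_FINISHED, APPLICATION_LOG_HANDLING_FAILED
}

/** Localization event types named in the comment; the enum itself is hypothetical. */
enum LocalizationEventType {
  REQUEST, LOCALIZED, LOCALIZATION_FAILED
}

/** Hypothetical helper that builds event IDs; the separators are an arbitrary choice. */
final class EventIds {
  private EventIds() {
  }

  /** Application event under ApplicationEntity: type qualified by the node that emitted it. */
  static String appEventId(NmAppEventType type, String nodeId) {
    return type.name() + "_" + nodeId;
  }

  /** Localization event under ContainerEntity: type qualified by the localized resource path. */
  static String localizationEventId(LocalizationEventType type, String resourcePath) {
    return type.name() + ":" + resourcePath;
  }
}

class EventIdDemo {
  public static void main(String[] args) {
    // The same event type on two nodes yields two distinct IDs on one application entity.
    System.out.println(EventIds.appEventId(NmAppEventType.INIT_APPLICATION, "nm1:8041"));
    System.out.println(EventIds.appEventId(NmAppEventType.INIT_APPLICATION, "nm2:8041"));
    // Each localized resource of a container gets its own event ID.
    System.out.println(
        EventIds.localizationEventId(LocalizationEventType.LOCALIZED, "/tmp/job.jar"));
  }
}
```

Encoding the qualifier in the ID keeps events from different nodes or resources distinguishable on a single entity, at the cost of making the ID something a reader has to parse.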
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647163#comment-14647163 ] Sangjin Lee commented on YARN-3045:

bq. I went through the discussions and the patch in YARN-3949. I feel that calling two APIs would not be very user friendly, and how will the users of TimelineClient call flush? I think that is not captured in YARN-3949.

We did discuss it in that JIRA. See [this comment|https://issues.apache.org/jira/browse/YARN-3949?focusedCommentId=14640959&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14640959] for instance. Note that the user of those two methods is really {{TimelineCollector}}. I don't think we'd be exposing {{flush()}} to {{TimelineClient}}. The synchronous nature of the writes would be expressed differently on {{TimelineClient}}.

bq. We discussed this topic in today's meeting, and Sangjin Lee was of the opinion that we should not have another entity here. I think we need more discussion on this, as it involves querying too.

To elaborate a little further, creating a new entity type just to capture the different origins of application events seems a bit too much. These are really events that belong to YARN applications, and I don't see why they shouldn't be part of the YARN application entities. It also simplifies the query model. When you query for a YARN application entity, you get all application events, regardless of whether they originate from RM or NMs. That's a much nicer interaction for querying for a YARN app.
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643347#comment-14643347 ] Junping Du commented on YARN-3045:

bq. Well, I was aware that the priority was meant to differentiate not the containers but their events, but I thought you mentioned it for the purpose of better querying rather than for writing.

Better querying is one purpose, but writing events under different policies could also be a consideration here. We may not be able to afford flushing every event in a large-scale cluster, so we may choose to ignore or cache some unimportant ones.

bq. I have not gone through the writer code completely, but is there any caching that you want to flush if the event priority is high? I was also wondering whether we need to change the Writer/Collector API to convey the criticality of the event being published.

We already have a new flush() API for the writer, which was checked in with YARN-3949. Please refer to the discussions there for details. You are right that we lack an API that respects this priority/policy across the whole data flow for writing. I will file another JIRA to track this.

bq. So from the NM side we want to publish events for ApplicationEntity and ContainerEntity. Based on the title of this JIRA, though, I thought its scope was to handle only ContainerEntities from the NM side. Would it be better to handle the events for application entities specific to a given NM in another JIRA? I can still try to ensure that the required foundation is done on the NM side in this JIRA, as part of your other comments. Thoughts?

I am fine with moving events other than container events to a separate JIRA if it is really necessary. In general, the JIRA title shouldn't bound the implementation: when a JIRA is proposed, the goal is not as concrete as when it is being implemented, so we can fix/adjust the title later. Anyway, I would support the scope (container events + foundation work) you proposed here, if you are comfortable with it.

bq. Also, an event has just an ID, but NM-related application events will have the same event ID on different NMs, so would it be something like INIT_APPLICATION_NODE_ID?

That's a good question. My initial thinking is that we may need something like NodemanagerEntity to store application events, resource localization events, log aggregation handling events, configuration, etc. However, I would like to hear your and others' ideas on this as well.

bq. +1 for this thought. I had the same initial reservation: if we add more events in the future, we would have to create an event and a method in the publisher for each of them unnecessarily. For the initial version I had planned an approach similar to RM and ATSv1, but I feel it is better to handle this now than to refactor later. I can think of a few approaches here.

Yes. All three approaches seem to work here. IMO, the 2nd approach (hooking into the existing event dispatcher) looks simpler and more straightforward.

bq. I was not clear about this comment. IIRC Zhijie also mentioned in the meeting that I am handling the removal of the threaded model for publishing container metrics statistics as part of this JIRA. Maybe I am missing some other JIRA that you are already working on; could you enlighten me about it?

I thought you were encapsulating metrics in TimelineEvent, but actually you are not. So don't worry about my previous comment on this.
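The idea above of writing events under different policies, flushing only the critical ones and caching the rest, can be sketched as follows. The event names are the NM application/container events listed in this JIRA discussion; {{NmEvent}}, {{PolicyCollector}} and the notion of a per-cluster "critical set" are hypothetical, since the thread notes that no such API existed at the time.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

/** NM-side events named in the discussion; the enum type itself is an assumption. */
enum NmEvent {
  INIT_APPLICATION, INIT_CONTAINER, FINISH_APPLICATION,
  APPLICATION_CONTAINER_FINISHED, APPLICATION_LOG_HANDLING_FAILED,
  APPLICATION_INITED, APPLICATION_RESOURCES_CLEANEDUP,
  APPLICATION_LOG_HANDLING_INITED, APPLICATION_LOG_HANDLING_FINISHED
}

/** Hypothetical collector that flushes only when a critical event arrives. */
class PolicyCollector {
  private final Set<NmEvent> critical;
  final List<NmEvent> cached = new ArrayList<>();
  final List<NmEvent> flushed = new ArrayList<>();

  /** The critical set would come from cluster configuration; here it is passed in directly. */
  PolicyCollector(EnumSet<NmEvent> critical) {
    this.critical = EnumSet.copyOf(critical);
  }

  /** Always cache the event; flush the whole cache only if the event is critical. */
  void publish(NmEvent event) {
    cached.add(event);
    if (critical.contains(event)) {
      flush();
    }
  }

  void flush() {
    flushed.addAll(cached);
    cached.clear();
  }
}

class PolicyDemo {
  public static void main(String[] args) {
    // A large cluster might treat fewer event types as critical than a small one.
    PolicyCollector large = new PolicyCollector(
        EnumSet.of(NmEvent.INIT_CONTAINER, NmEvent.APPLICATION_CONTAINER_FINISHED));
    large.publish(NmEvent.APPLICATION_INITED); // cached only
    large.publish(NmEvent.INIT_CONTAINER);     // triggers a flush of both events
    System.out.println("flushed=" + large.flushed + " cached=" + large.cached);
  }
}
```

Keeping the critical set on the collector side reflects the argument made elsewhere in the thread that the collector, not the client, knows the system configuration.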
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643656#comment-14643656 ] Sangjin Lee commented on YARN-3045:

Sorry it took me a while to catch up on this thread.

Regarding the *application* lifecycle events, I think it would be the responsibility of RM (timeline collector) to publish application lifecycle events. Are there application lifecycle events that can only come from NM?
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643811#comment-14643811 ] Junping Du commented on YARN-3045:

bq. Regarding the application lifecycle events, I think it would be the responsibility of RM (timeline collector) to publish application lifecycle events. Are there application lifecycle events that can only come from NM?

Actually, it may depend on the perspective from which we define the life cycle of an application. From the RM's perspective, it could contain the fundamental events from application submission to completion. From the NM's perspective, it could include details of the application as launched locally, such as APPLICATION_INITED, APPLICATION_RESOURCES_CLEANEDUP, etc. I think users could be interested in both.
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14642150#comment-14642150 ] Naganarasimha G R commented on YARN-3045:

Hi [~djp],

bq. 1.what we want to differentiate here is what kind of events are critical (so writer client in TimelineCollector could flush to backend storage after written them) and what kinds of events are not so critical

I was aware that the priority was meant to differentiate the events rather than the containers, but I thought you mentioned it for the purpose of better querying rather than for writing. I have not gone through the writer code completely; is there any caching that you want to flush if the event priority is high? I was also wondering whether we need to change the Writer/Collector API to convey the criticality of the event being published.

bq. From an initiative thinking, some important app/container events include: INIT_APPLICATION, INIT_CONTAINER, FINISH_APPLICATION, APPLICATION_CONTAINER_FINISHED, APPLICATION_LOG_HANDLING_FAILED, while unimportant events could include: APPLICATION_INITED, APPLICATION_RESOURCES_CLEANEDUP, APPLICATION_LOG_HANDLING_INITED, APPLICATION_LOG_HANDLING_FINISHED, etc.

So from the NM side we want to publish events for both ApplicationEntity and ContainerEntity. Based on the title of this jira, I thought its scope was to handle only ContainerEntities from the NM side. Is it better to handle events for Application entities specific to a given NM in another jira? I can still try to ensure that the required foundation is done on the NM side in this jira as part of your other comments. Thoughts? Also, an event has just an id, and NM-related application events will have the same event ID on different NMs, so would it be something like {{INIT_APPLICATION_NODE_ID}}?

bq. 2. We should have some handy method to turn these app/container events to TimelineEvent and publish these events in a consensus way rather than publish one type of event with one method.
bq. 3. We don't need to create new container events but should log existing YARN app/container events that happen in NM. If we really think some important events are missing in YARN, we can have futher discussions later after timeline service v2 in good shape.

+1 for this thought. I had the same initial hitch: if we add more events in future, we would unnecessarily create an event and methods in the publisher for each of them. For the initial version I had planned an approach similar to RM and ATSv1, but I feel it is better to handle this now than to refactor later. I can think of a couple of approaches here:
# The approach you mentioned: inside the app/container transitions on the NM side, publish the event containing the container/app information. Maybe in some cases, like creation of an app or container, the caller can publish the events (like Container created, so as to capture the creation time rather than )
# In ContainerEventDispatcher, ApplicationEventDispatcher and rsrcLocalizationSrvc, after handling the event, call by default the corresponding handlers of NMTimelinePublisher (inner classes) for the respective events. Specifically required events can be handled and the others can simply be ignored.
# The source itself can create the entity and the event object, and NMTimelinePublisher can expose a method that takes timeline objects and adds them to the async dispatcher; the event handler will then just call the client to publish the event/entity.

bq. 4. It looks like NMTimelinePublisher should be used by ContainerManager, Container, ResourceLocalizationService and Log Handler. Move it to NMContext should be convenient to use for other components.

I will take care of this based on the approach we choose in the previous step.

bq. 5. Container Resource Usage event may not be necessary given we already have metrics update and will do aggregation according to metrics update.

I was not clear about this comment. IIRC Zhijie also mentioned in the meeting that I am handling the removal of the threaded model of publishing container metrics statistics as part of this jira. Maybe I am missing some other jira which you are already working on; can you enlighten me about it?
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641295#comment-14641295 ] Junping Du commented on YARN-3045:

Thanks [~Naganarasimha] for updating the patch, and sorry for coming late to this. I just went through the latest patch (06) and have some comments below:
1. I don't think we want to apply priority at the container level to say which containers are important and which are not. What we want to differentiate here is which kinds of events are critical (so that the writer client in TimelineCollector can flush to backend storage after writing them) and which kinds are not so critical. As an initial thought, some important app/container events include: INIT_APPLICATION, INIT_CONTAINER, FINISH_APPLICATION, APPLICATION_CONTAINER_FINISHED, APPLICATION_LOG_HANDLING_FAILED, while unimportant events could include: APPLICATION_INITED, APPLICATION_RESOURCES_CLEANEDUP, APPLICATION_LOG_HANDLING_INITED, APPLICATION_LOG_HANDLING_FINISHED, etc.
2. We should have some handy method to turn these app/container events into TimelineEvent and publish them in a uniform way, rather than publishing each type of event with its own method.
3. We don't need to create new container events but should log the existing YARN app/container events that happen in the NM. If we really think some important events are missing in YARN, we can have further discussions later, after timeline service v2 is in good shape.
4. It looks like NMTimelinePublisher should be used by ContainerManager, Container, ResourceLocalizationService and Log Handler. Moving it to NMContext would make it convenient for other components to use.
5. A Container Resource Usage event may not be necessary, given that we already have metrics updates and will do aggregation according to them.
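Comments 1 and 2 above (flush on critical events, one conversion path for all event types) could be sketched roughly as follows. This is only an illustration under stated assumptions: `NmEventKind`, `SketchTimelineEvent`, `SketchWriter` and `SketchPublisher` are hypothetical stand-ins, not classes from the patch or from the timeline service, and the critical/non-critical grouping simply follows the tentative lists in comment 1.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical event kinds; the critical flag follows the tentative grouping in the comment. */
enum NmEventKind {
  INIT_APPLICATION(true),
  INIT_CONTAINER(true),
  FINISH_APPLICATION(true),
  APPLICATION_CONTAINER_FINISHED(true),
  APPLICATION_LOG_HANDLING_FAILED(true),
  APPLICATION_INITED(false),
  APPLICATION_RESOURCES_CLEANEDUP(false),
  APPLICATION_LOG_HANDLING_INITED(false),
  APPLICATION_LOG_HANDLING_FINISHED(false);

  final boolean critical;

  NmEventKind(boolean critical) {
    this.critical = critical;
  }
}

/** Stand-in for a timeline event record; not the real TimelineEvent class. */
final class SketchTimelineEvent {
  final String id;
  final long timestamp;

  SketchTimelineEvent(String id, long timestamp) {
    this.id = id;
    this.timestamp = timestamp;
  }
}

/** Stand-in writer that buffers events until flush() is called. */
final class SketchWriter {
  final List<SketchTimelineEvent> buffered = new ArrayList<>();
  final List<SketchTimelineEvent> flushed = new ArrayList<>();

  void write(SketchTimelineEvent event) {
    buffered.add(event);
  }

  void flush() {
    flushed.addAll(buffered);
    buffered.clear();
  }
}

/** A single publish method for every event kind instead of one method per type. */
final class SketchPublisher {
  private final SketchWriter writer;

  SketchPublisher(SketchWriter writer) {
    this.writer = writer;
  }

  void publish(NmEventKind kind, long timestamp) {
    writer.write(new SketchTimelineEvent(kind.name(), timestamp));
    if (kind.critical) {
      // Critical events force the buffer out to the (stand-in) backend.
      writer.flush();
    }
  }
}
```

With this shape, adding a new event kind means adding an enum constant rather than a new publisher method; whether the criticality belongs on the event type or in the writer API is the open question raised in the thread.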
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14639668#comment-14639668 ] Naganarasimha G R commented on YARN-3045:

Hi [~djp], [~sjlee0],
Based on yesterday's discussion, IIUC there is no need to add anything further, as we are already publishing the values based on the timestamp at which they are monitored. So do you need me to add anything as part of this initial jira? For events related to other container information (localization failures etc.), and for the priority of the events, I can open a new jira.
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636089#comment-14636089 ] Naganarasimha G R commented on YARN-3045:

The test case failures and the whitespace issue are not related to this patch. I will try to rectify the whitespace issue along with other review comments, if any, and I have raised YARN-3941 for the test case failures. So either [~djp] or [~sjlee0] can review this jira further.
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635152#comment-14635152 ] Hadoop QA commented on YARN-3045:

| (x) *{color:red}-1 overall{color}* |

|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch | 15m 56s | Findbugs (version ) appears to be broken on YARN-2928. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 7 new or modified test files. |
| {color:green}+1{color} | javac | 7m 59s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 50s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 0m 33s | There were no new checkstyle issues. |
| {color:red}-1{color} | whitespace | 0m 3s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install | 1m 26s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 41s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 1m 58s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests | 8m 8s | Tests passed in hadoop-yarn-applications-distributedshell. |
| {color:red}-1{color} | yarn tests | 6m 6s | Tests failed in hadoop-yarn-server-nodemanager. |
| | | 53m 7s | |

|| Reason || Tests ||
| Failed unit tests | hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService |
| | hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService |
| | hadoop.yarn.server.nodemanager.containermanager.container.TestContainer |

|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12746328/YARN-3045-YARN-2928.006.patch |
| Optional Tests | javac unit findbugs checkstyle javadoc |
| git revision | YARN-2928 / eb1932d |
| whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/8594/artifact/patchprocess/whitespace.txt |
| hadoop-yarn-applications-distributedshell test log | https://builds.apache.org/job/PreCommit-YARN-Build/8594/artifact/patchprocess/testrun_hadoop-yarn-applications-distributedshell.txt |
| hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8594/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8594/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8594/console |

This message was automatically generated.
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14628072#comment-14628072 ] Varun Saxena commented on YARN-3045:

[~Naganarasimha], a couple of comments.
# In NMTimelinePublisher, we can make the Container***Event classes private. I do not see them being referenced anywhere else.
# Will a single event queue with a single event handling thread in the async dispatcher be enough to handle container events? I think there may be too many.
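One way to go beyond a single queue and thread, while keeping the events of one application in order, is to partition by application id. The sketch below is an assumption-laden illustration, not the patch's design: `PartitionedDispatcher` is a hypothetical class built on plain `java.util.concurrent` executors rather than on YARN's dispatcher classes.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical dispatcher: N single-threaded workers, each with its own queue. */
final class PartitionedDispatcher {
  private final ExecutorService[] workers;

  PartitionedDispatcher(int partitions) {
    workers = new ExecutorService[partitions];
    for (int i = 0; i < partitions; i++) {
      workers[i] = Executors.newSingleThreadExecutor();
    }
  }

  /** Same application id always maps to the same worker, so per-app ordering is kept. */
  int partitionFor(String appId) {
    return (appId.hashCode() & Integer.MAX_VALUE) % workers.length;
  }

  void dispatch(String appId, Runnable handler) {
    workers[partitionFor(appId)].execute(handler);
  }

  /** Stops accepting events and waits for queued ones; returns false on timeout or interrupt. */
  boolean shutdownAndWait(long timeoutMillis) {
    for (ExecutorService worker : workers) {
      worker.shutdown();
    }
    try {
      for (ExecutorService worker : workers) {
        if (!worker.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS)) {
          return false;
        }
      }
      return true;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }
}
```

Events for different applications may be handled concurrently, while events for one application are still handled in submission order. Whether the extra threads are worth it depends on the event volume, which the thread leaves open.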
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14628077#comment-14628077 ] Varun Saxena commented on YARN-3045:

A cosmetic comment. Some of the lines are too long (> 80 chars).
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14624472#comment-14624472 ] Naganarasimha G R commented on YARN-3045:

Hi [~djp],
Sorry for the delayed response! Some points to discuss regarding your queries:

bq. why we hook the track of container start event in ContainerManagerImpl, but for container finished event, we do it inside of ContainerImpl? We should try to keep NMTimelinePublisher get referenced in one place if no necessary for other places.

This was done intentionally to avoid resending timeline events during recovery. The same was happening in the RM's case (it is being handled in YARN-3127), so I kept it there to avoid duplicate events. If there is a better way to avoid this, I am open to it.

I will take care of the other comments; some of them are due to forgetting to revert code changes made while testing.
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14619400#comment-14619400 ] Naganarasimha G R commented on YARN-3045: - +1, This seems to be a good idea for having priority in events... [Event producers] Implement NM writing container lifecycle events to ATS Key: YARN-3045 URL: https://issues.apache.org/jira/browse/YARN-3045 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Naganarasimha G R Attachments: YARN-3045-YARN-2928.002.patch, YARN-3045-YARN-2928.003.patch, YARN-3045-YARN-2928.004.patch, YARN-3045-YARN-2928.005.patch, YARN-3045.20150420-1.patch Per design in YARN-2928, implement NM writing container lifecycle events and container system metrics to ATS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14616763#comment-14616763 ] Junping Du commented on YARN-3045:

bq. IMO, the latter one sounds like a better choice because we can track more type of container events (like: ContainerResourceLocalizedEvent, ContainerResourceFailedEvent, etc.) during container state transition that we are currently missing.

In addition, we should have a selective mechanism (similar to log level) that assigns different severities to different kinds of container events, so that users can choose (through configuration) to push important container events to backend storage and ignore the unimportant ones. Thoughts?
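A log-level-like selection of events might look like the following minimal sketch. Everything here is assumed for illustration: `EventSeverity`, `SeverityFilter`, the three levels and the idea of reading the threshold from a configuration string are hypothetical, and no such configuration key is defined in the thread.

```java
import java.util.Locale;

/** Hypothetical severities, ordered from least to most important. */
enum EventSeverity { LOW, MEDIUM, HIGH }

/** Publishes only events at or above a configured threshold, like a log level. */
final class SeverityFilter {
  private final EventSeverity threshold;

  SeverityFilter(EventSeverity threshold) {
    this.threshold = threshold;
  }

  /** Parses a configuration value such as "medium" (assumed format). */
  static SeverityFilter fromConfig(String value) {
    return new SeverityFilter(
        EventSeverity.valueOf(value.trim().toUpperCase(Locale.ROOT)));
  }

  boolean shouldPublish(EventSeverity severity) {
    return severity.compareTo(threshold) >= 0;
  }
}
```

A publisher would call `shouldPublish` before handing an event to the writer, so events below the threshold never reach backend storage.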
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615275#comment-14615275 ] Junping Du commented on YARN-3045:

Thanks [~Naganarasimha] for updating the patch! One major question on the 005 patch: why do we hook the tracking of the container start event in ContainerManagerImpl, but do it inside ContainerImpl for the container finished event? We should try to keep NMTimelinePublisher referenced in one place unless other places really need it. IMO, the latter one sounds like a better choice because we can track more types of container events (like ContainerResourceLocalizedEvent, ContainerResourceFailedEvent, etc.) during container state transitions that we are currently missing.

Other minor comments:

In TestDistributedShell.java,
{code}
-  @Test(timeout=9)
+  //@Test(timeout=9)
{code}
Why do we need to comment this out? We should add the timeout value back here if there is no special reason.

In ContainerManagerImpl.java,
{code}
+  private NMTimelinePublisher nmMetricsPublisher;
{code}
We should mark it as final, since it shouldn't change during the life cycle of ContainerManager. The same applies to ContainerImpl.java.

{code}
+            container
+                .getNMTimelinePublisher()
+                .reportContainerResourceUsage(
+                    container,
+                    currentTime,
+                    pId,
+                    (currentPmemUsage == ResourceCalculatorProcessTree.UNAVAILABLE) ? null
+                        : currentPmemUsage,
+                    (cpuUsageTotalCoresPercentage == ResourceCalculatorProcessTree.UNAVAILABLE) ? null
+                        : cpuUsageTotalCoresPercentage);
{code}
There is no need to transform the unavailable value from ResourceCalculatorProcessTree.UNAVAILABLE to null, as we can check for the unavailable value instead of null later.

In NMTimelinePublisher.java,
{code}
+  protected void handleSystemMetricsEvent(NMTimelineEvent event) {
+    switch (event.getType()) {
+    case CONTAINER_CREATED:
{code}
Indentation between switch and case.
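The suggestion about the unavailable value can be illustrated with a small sketch that passes the readings through unchanged and checks for the sentinel at the point where metrics are built. `UsageMetricsSketch` and its metric names are hypothetical, and the local `UNAVAILABLE` constant with value -1 is an assumption standing in for `ResourceCalculatorProcessTree.UNAVAILABLE`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class UsageMetricsSketch {
  /** Local stand-in for ResourceCalculatorProcessTree.UNAVAILABLE (value assumed for this sketch). */
  static final int UNAVAILABLE = -1;

  private UsageMetricsSketch() {
  }

  /** Builds metric name -> value, skipping any reading equal to the sentinel. */
  static Map<String, Number> toMetrics(long pmemBytes, float cpuPercent) {
    Map<String, Number> metrics = new LinkedHashMap<>();
    if (pmemBytes != UNAVAILABLE) {
      metrics.put("MEMORY", pmemBytes);
    }
    if (cpuPercent != UNAVAILABLE) {
      metrics.put("CPU", cpuPercent);
    }
    return metrics;
  }
}
```

The caller no longer needs boxed, nullable arguments; the sentinel check lives in one place next to the code that decides which metrics to emit.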
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14614346#comment-14614346 ] Naganarasimha G R commented on YARN-3045:

Hi [~djp], [~sjlee0], [~zjshen],
The whitespace issue reported for the patch is not related to my modifications, but I will correct it along with the other review comments (if any). I think the patch can be reviewed further now.
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14613995#comment-14613995 ] Hadoop QA commented on YARN-3045:

| (x) *{color:red}-1 overall{color}* |

|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch | 15m 39s | Findbugs (version ) appears to be broken on YARN-2928. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 7 new or modified test files. |
| {color:green}+1{color} | javac | 7m 41s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 48s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 0m 32s | There were no new checkstyle issues. |
| {color:red}-1{color} | whitespace | 0m 4s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install | 1m 39s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 40s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 2m 0s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests | 3m 7s | Tests passed in hadoop-yarn-applications-distributedshell. |
| {color:green}+1{color} | yarn tests | 6m 5s | Tests passed in hadoop-yarn-server-nodemanager. |
| | | 47m 44s | |

|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12743596/YARN-3045-YARN-2928.005.patch |
| Optional Tests | javac unit findbugs checkstyle javadoc |
| git revision | YARN-2928 / 18c4859 |
| whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/8425/artifact/patchprocess/whitespace.txt |
| hadoop-yarn-applications-distributedshell test log | https://builds.apache.org/job/PreCommit-YARN-Build/8425/artifact/patchprocess/testrun_hadoop-yarn-applications-distributedshell.txt |
| hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8425/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8425/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8425/console |

This message was automatically generated.
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14602627#comment-14602627 ] Naganarasimha G R commented on YARN-3045:

Thanks for reviewing, [~djp] and [~sjlee0]; sorry for the late response, as I was a little held up. Thanks for confirming the consolidation, [~djp]; I will try to get that done in the next patch.

bq. if need separated event queue later to make sure container metrics boom

I have already created an async dispatcher for timeline publishing; if required, we can create another dispatcher for container metrics only. Is this what you meant?

bq. For corner case that NM publisher delay too long time (queue is busy) to publish event, it still get chance to fail (very low chance should be acceptable here).

OK, I will leave the lifecycle management of the app collector out of this jira. Maybe we can handle it (including the multiple-attempt case specified by [~sjlee0]) in another jira.

bq. APPLICATION_CREATED_EVENT might be seeing the race condition

Yes, there seems to be another race condition, this time not between the source and the test but within the source.
{quote} java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher.putEntity(TimelineServiceV2Publisher.java:276) {quote}
I had seen this only once earlier but was not able to get the logs; now I can analyze it further.

bq. I'm a bit puzzled by the hashCode override; is it necessary?

My mistake. I think it is residual code from the initial version, which I may have added while trying out a MultiAsync dispatcher in which the events of one app need to go to one handler. It is not required any more; I will remove it.

I will take care of [~sjlee0]'s other comments and try to provide the patch at the earliest.
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14598054#comment-14598054 ] Sangjin Lee commented on YARN-3045: --- {quote} The lifecycle management of app collector is a little tricky here: it get registered when the first container (AM) get launched, but should not unregistered immediately when AM container get stop. May be wait for application finish event comes to NM should work for most cases. For corner case that NM publisher delay too long time (queue is busy) to publish event, it still get chance to fail (very low chance should be acceptable here). Later, we will run to similar issue again when we are doing app level aggregation in app collector that the aggregation process could still be running. In any case, we should pay special attention to lifecycle management for collector - we have a separated JIRA to move it out of auxiliary service. I think we can discuss more on this together with/in that JIRA. {quote} It's a good point. I think some amount of linger after the AM container is completed should be a fine solution. Note that not only the collector needs to be up but also the mapping should not be removed from the RM for this to work. As [~djp] pointed out, having multiple app attempts (AMs) is another case. Perhaps the same linger can apply in that case so that the collector can stick around to handle some writes until the next collector that belongs to the next AM comes online and registers itself. We need to hash out the details of multiple AMs scenario, preferably in a different JIRA. 
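The linger idea could be sketched as a registry that records a removal deadline when the AM container completes, instead of removing the collector immediately. This is a minimal sketch under stated assumptions: `LingeringCollectorRegistry`, its method names and the explicit clock parameter are all hypothetical, and a real implementation would also have to keep the RM-side mapping alive, as noted above.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

/** Hypothetical registry: collectors outlive their AM container by a linger period. */
final class LingeringCollectorRegistry {
  private static final long RUNNING = Long.MAX_VALUE;

  private final long lingerMillis;
  /** appId -> time at which the collector may be removed (RUNNING while an AM is alive). */
  private final Map<String, Long> removeAt = new HashMap<>();

  LingeringCollectorRegistry(long lingerMillis) {
    this.lingerMillis = lingerMillis;
  }

  /** Called for the first AM and for every later attempt; cancels a pending removal. */
  synchronized void amContainerStarted(String appId) {
    removeAt.put(appId, RUNNING);
  }

  /** Schedules removal after the linger period rather than removing right away. */
  synchronized void amContainerFinished(String appId, long nowMillis) {
    if (removeAt.containsKey(appId)) {
      removeAt.put(appId, nowMillis + lingerMillis);
    }
  }

  synchronized boolean hasCollector(String appId) {
    return removeAt.containsKey(appId);
  }

  /** Intended to be called periodically, e.g. from a timer task. */
  synchronized void sweep(long nowMillis) {
    Iterator<Map.Entry<String, Long>> it = removeAt.entrySet().iterator();
    while (it.hasNext()) {
      if (it.next().getValue() <= nowMillis) {
        it.remove();
      }
    }
  }
}
```

Passing the current time in explicitly keeps the sketch deterministic; a periodic task would call `sweep(System.currentTimeMillis())`. A new attempt that starts within the linger window resets the deadline, which matches the multiple-AM case described above, though the details of that scenario are left to a separate JIRA.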
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14598012#comment-14598012 ] Junping Du commented on YARN-3045:

Thanks [~Naganarasimha] for updating the patch! I am looking into it now; some comments will follow. Some quick thoughts on your questions above.

bq. I prefer to have all the container related events and entities to be published by NMTimelinePublisher, so wanted push container usage metrics also to NMTimelinePublisher. This will ensure all NM timeline stuff are put in one place and remove thread pool handling in ContainerMonitorImpl.

I am generally fine with consolidating the publishing of events and metrics in NMTimelinePublisher. However, we may later need to check whether a separate event queue is needed, to make sure that a boom in container metrics won't affect events getting published.

bq. When the AM container finishes and removes the collector for the app, still there is possibility that all the events published for the app by the current NM and other NM are still in pipeline, so was wondering whether we can have timer task which periodically cleans up collector after some period and not imm remove it when AM container is finished.

The lifecycle management of the app collector is a little tricky here: it gets registered when the first container (AM) is launched, but it should not be unregistered immediately when the AM container stops. Waiting for the application finish event to reach the NM should work for most cases. In the corner case where the NM publisher takes too long (the queue is busy) to publish an event, it still has a chance to fail (a very low chance should be acceptable here). Later, we will run into a similar issue again when we do app-level aggregation in the app collector, since the aggregation process could still be running. In any case, we should pay special attention to lifecycle management for the collector; we have a separate JIRA to move it out of the auxiliary service. I think we can discuss more on this together with/in that JIRA.
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14598313#comment-14598313 ] Sangjin Lee commented on YARN-3045: --- I took a quick pass at the latest patch. First, could you look at the checkstyle issue and the unit test failure? I think the unit test failure is an existing issue, but since you looked at it for YARN-3792, it'd be great if you could take another look. It looks like even the APPLICATION_CREATED_EVENT might be seeing the race condition?

(NMTimelinePublisher.java)
- I'm not 100% clear about the naming convention, but I was under the impression that we're sticking with the name timelineservice as the package name? Is that not the case?
- l.223: minor nit, but let's make inner classes static unless they need to be non-static
- l.252: I'm a bit puzzled by the hashCode override; is it necessary? If so, then we should also override equals. Also, why does it go by the app id only?
- l.296: the same question here
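The two review points about static inner classes and the hashCode override can be illustrated with a minimal sketch. The names `PublisherSketch` and `ContainerEventKey` are hypothetical stand-ins, not the classes from the patch under review.

```java
import java.util.Objects;

// Hypothetical stand-in for the publisher class; not the code from the patch.
class PublisherSketch {

  // Static nested class: it holds no hidden reference to an enclosing
  // PublisherSketch instance, which is appropriate when the nested class
  // never touches the outer instance's state.
  static final class ContainerEventKey {
    private final String appId;
    private final String containerId;

    ContainerEventKey(String appId, String containerId) {
      this.appId = appId;
      this.containerId = containerId;
    }

    // equals and hashCode are overridden together and use the same fields,
    // so equal objects always produce equal hash codes.
    @Override
    public boolean equals(Object o) {
      if (this == o) {
        return true;
      }
      if (!(o instanceof ContainerEventKey)) {
        return false;
      }
      ContainerEventKey other = (ContainerEventKey) o;
      return appId.equals(other.appId)
          && containerId.equals(other.containerId);
    }

    @Override
    public int hashCode() {
      return Objects.hash(appId, containerId);
    }
  }
}
```

Hashing on the application id alone would still satisfy the equals/hashCode contract as long as equals also compares it, but every container of one application would then land in the same hash bucket.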
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14597488#comment-14597488 ] Hadoop QA commented on YARN-3045: -

| (x) *{color:red}-1 overall{color}* |

|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch | 15m 48s | Findbugs (version ) appears to be broken on YARN-2928. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 7 new or modified test files. |
| {color:green}+1{color} | javac | 7m 58s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 49s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 0m 32s | There were no new checkstyle issues. |
| {color:red}-1{color} | whitespace | 0m 2s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install | 1m 38s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 41s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 1m 59s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests | 9m 17s | Tests failed in hadoop-yarn-applications-distributedshell. |
| {color:green}+1{color} | yarn tests | 6m 10s | Tests passed in hadoop-yarn-server-nodemanager. |
| | | 54m 27s | |

|| Reason || Tests ||
| Failed unit tests | hadoop.yarn.applications.distributedshell.TestDistributedShell |

|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12740912/YARN-3045-YARN-2928.004.patch |
| Optional Tests | javac unit findbugs checkstyle javadoc |
| git revision | YARN-2928 / 84f37f1 |
| whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/8325/artifact/patchprocess/whitespace.txt |
| hadoop-yarn-applications-distributedshell test log | https://builds.apache.org/job/PreCommit-YARN-Build/8325/artifact/patchprocess/testrun_hadoop-yarn-applications-distributedshell.txt |
| hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8325/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8325/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf908.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8325/console |

This message was automatically generated.
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14595046#comment-14595046 ] Naganarasimha G R commented on YARN-3045: - Hi [~djp], [~sjlee0], [~zjshen], Please find the attached patch, rebased on top of YARN-3792. I would like to discuss two points:
# I prefer to have all the container-related events and entities published by NMTimelinePublisher, so I want to push the container usage metrics to NMTimelinePublisher as well. This will ensure all the NM timeline code is in one place and remove the thread pool handling in {{ContainerMonitorImpl}}. (The latter point will not be an issue once YARN-3367 is handled, but I would still prefer to move it for the former reason.)
# While testing with TestDistributedShell, I found that a few of the container metrics events were failing because of a race condition. When the AM container finishes and removes the collector for the app, events published for the app by the current NM and other NMs may still be in the pipeline. So I was wondering whether we could have a timer task that periodically cleans up the collector after some period, rather than removing it immediately when the AM container finishes.
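The timer-based cleanup proposed in the second point could look roughly like the sketch below. Everything here is an assumption made for illustration: `LingeringCollectorRegistry`, its methods, and the linger period are hypothetical names, and the collector is represented by a plain `Object` rather than any real timeline collector class.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical registry that keeps a collector alive for a linger period
// after the AM container finishes, instead of removing it immediately.
class LingeringCollectorRegistry {
  private final ConcurrentMap<String, Object> collectors =
      new ConcurrentHashMap<>();
  private final ConcurrentMap<String, Long> expiryMillis =
      new ConcurrentHashMap<>();
  private final long lingerMillis;

  LingeringCollectorRegistry(long lingerMillis) {
    this.lingerMillis = lingerMillis;
  }

  void register(String appId, Object collector) {
    collectors.putIfAbsent(appId, collector);
  }

  boolean hasCollector(String appId) {
    return collectors.containsKey(appId);
  }

  // Called when the AM container finishes: only record a deadline.
  void markAmFinished(String appId, long nowMillis) {
    if (collectors.containsKey(appId)) {
      expiryMillis.put(appId, nowMillis + lingerMillis);
    }
  }

  // Called periodically by a timer task; removes collectors whose deadline
  // has passed and returns how many were removed.
  int sweep(long nowMillis) {
    int removed = 0;
    for (Map.Entry<String, Long> entry : expiryMillis.entrySet()) {
      if (entry.getValue() <= nowMillis
          && expiryMillis.remove(entry.getKey(), entry.getValue())) {
        collectors.remove(entry.getKey());
        removed++;
      }
    }
    return removed;
  }
}
```

The clock is passed in as a parameter so the behaviour can be checked without sleeping; in a running service a `ScheduledExecutorService` could call `sweep(System.currentTimeMillis())` at a fixed delay. Late events that arrive within the linger period still find their collector.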
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14591439#comment-14591439 ] Naganarasimha G R commented on YARN-3045: - Hi [~djp], The modifications/fixes for TestDistributedShell were moved to YARN-3792, and [~sjlee0] has almost finished reviewing it. I will work on both in parallel and get them done ASAP. If you have bandwidth, could you also take a look at YARN-3792, as most of my changes there are on top of your work? Also, for this JIRA, even Vinod earlier [opined|https://issues.apache.org/jira/browse/YARN-3045?focusedCommentId=14520929page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14520929] in favor of moving the container usage metrics to NMTimelinePublisher, so I wanted to know your view on the same.
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14590914#comment-14590914 ] Junping Du commented on YARN-3045: -- Hi [~Naganarasimha], given that YARN-3044 is already in, would you mind updating the patch here? Thx!
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14566132#comment-14566132 ] Hadoop QA commented on YARN-3045: -

| (x) *{color:red}-1 overall{color}* |

|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch | 15m 32s | Findbugs (version ) appears to be broken on YARN-2928. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 6 new or modified test files. |
| {color:green}+1{color} | javac | 7m 42s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 41s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 0m 26s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace | 0m 2s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 38s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 40s | The patch built with eclipse:eclipse. |
| {color:red}-1{color} | findbugs | 1m 59s | The patch appears to introduce 1 new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests | 7m 2s | Tests failed in hadoop-yarn-applications-distributedshell. |
| {color:red}-1{color} | yarn tests | 5m 59s | Tests failed in hadoop-yarn-server-nodemanager. |
| | | 51m 8s | |

|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-applications-distributedshell |
| Failed unit tests | hadoop.yarn.applications.distributedshell.TestDistributedShellWithNodeLabels |
| | hadoop.yarn.applications.distributedshell.TestDistributedShell |
| | hadoop.yarn.server.nodemanager.containermanager.TestContainerManagerRecovery |

|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12736354/YARN-3045-YARN-2928.003.patch |
| Optional Tests | javac unit findbugs checkstyle javadoc |
| git revision | YARN-2928 / a9738ceb |
| Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8140/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-applications-distributedshell.html |
| hadoop-yarn-applications-distributedshell test log | https://builds.apache.org/job/PreCommit-YARN-Build/8140/artifact/patchprocess/testrun_hadoop-yarn-applications-distributedshell.txt |
| hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8140/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8140/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf908.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8140/console |

This message was automatically generated.
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14526177#comment-14526177 ] Hadoop QA commented on YARN-3045: -

| (x) *{color:red}-1 overall{color}* |

|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 14m 49s | Pre-patch YARN-2928 compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 5 new or modified test files. |
| {color:green}+1{color} | javac | 7m 39s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 36s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle | 0m 35s | The applied patch generated 105 new checkstyle issues (total was 599, now 703). |
| {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 31s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. |
| {color:red}-1{color} | findbugs | 1m 5s | The patch appears to introduce 4 new Findbugs (version 2.0.3) warnings. |
| {color:red}-1{color} | yarn tests | 5m 43s | Tests failed in hadoop-yarn-server-nodemanager. |
| | | 41m 59s | |

|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-server-nodemanager |
| | Boxing/unboxing to parse a primitive org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.startContainerInternal(NMTokenIdentifier, ContainerTokenIdentifier, StartContainerRequest) At ContainerManagerImpl.java:[line 868] |
| | Should org.apache.hadoop.yarn.server.nodemanager.timeline.NMTimelinePublisher$ContainerCreatedEvent be a _static_ inner class? At NMTimelinePublisher.java:[lines 229-261] |
| | Should org.apache.hadoop.yarn.server.nodemanager.timeline.NMTimelinePublisher$ContainerFinishedEvent be a _static_ inner class? At NMTimelinePublisher.java:[lines 274-301] |
| | Should org.apache.hadoop.yarn.server.nodemanager.timeline.NMTimelinePublisher$ForwardingEventHandler be a _static_ inner class? At NMTimelinePublisher.java:[lines 206-216] |
| Failed unit tests | hadoop.yarn.server.nodemanager.containermanager.TestContainerManagerRecovery |
| | hadoop.yarn.server.nodemanager.containermanager.application.TestApplication |

|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12730073/YARN-3045-YARN-2928.002.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | YARN-2928 / b689f5d |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/7680/artifact/patchprocess/diffcheckstylehadoop-yarn-server-nodemanager.txt |
| Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/7680/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html |
| hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/7680/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7680/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7680/console |

This message was automatically generated.
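The "Boxing/unboxing to parse a primitive" warning in the report above refers to a general pattern that can be shown in isolation. The sketch below is not the code at ContainerManagerImpl.java line 868, which is not quoted in this thread; `ParsePrimitiveSketch` and its methods are hypothetical.

```java
// Hypothetical illustration of the FindBugs pattern; not the flagged code.
class ParsePrimitiveSketch {

  // Flagged style: Long.valueOf builds a boxed Long from the string, which
  // is then immediately unboxed to a primitive long.
  static long parseWithBoxing(String text) {
    return Long.valueOf(text).longValue();
  }

  // Preferred style: Long.parseLong returns the primitive directly.
  static long parseDirect(String text) {
    return Long.parseLong(text);
  }
}
```

Both methods return the same value; the second simply avoids creating the intermediate wrapper object.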
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520929#comment-14520929 ] Vinod Kumar Vavilapalli commented on YARN-3045: ---

bq. IIUC from discussions in 3044, as suggested by Junping Du we have container metrics published as optional feature in RM side and have also coded accordingly, and even Sangjin Lee recent comment was inline with this idea. Hence earlier added some code in NM side to get all the required information about container which are currently got it in RM.

Good to know. Will look at that JIRA. My point is that we need to do it in only one place, either the RM or the NM. On first reaction, writing from the NMs is more distributed, but we'll see.

bq. Gather all timeline publishing code in one place and have a single async dispatcher taking caring of publishing timeline entities/events.

I am inclined towards this approach. It is similar to what happens in the MapReduce AM. There is no point in spraying TimelineClient all over the NM?
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520332#comment-14520332 ] Sangjin Lee commented on YARN-3045: --- Hi [~Naganarasimha], I do have one quick question on the naming. I see a lot of names that include metrics, such as NMMetricsPublisher, NMMetricsEvent, NMMetricsEventType, and so on. And yet, they don't seem to involve metrics in the sense of timeline metrics. This is a source of confusion to me. Do we need metrics in these? They seem to be capturing purely lifecycle events. Could we change them to better names?
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520902#comment-14520902 ] Naganarasimha G R commented on YARN-3045: - Thanks for the reviews, [~sjlee0] and [~vinodkv]. +1 for {{NMTimelinePublisher, NMTimelineEvent, NMTimelineEventType}}.

bq. The allocated memory/vcore/host/port/http-port/priority information per container are also getting captured by the ResourceManager? Similarly the exit-status, diagnostics etc. We definitely don't need these in two places.

IIUC from the discussions in YARN-3044, as suggested by [~djp], container metrics are published as an optional feature on the RM side, and we have coded accordingly; even [~sjlee0]'s recent [comment|https://issues.apache.org/jira/browse/YARN-3044?focusedCommentId=14518761page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14518761] was in line with this idea. Hence I had earlier added some code on the NM side to gather all the container information that is currently obtained in the RM.

bq. YARN-3334 added the notion of a timelineClient object embedded inside Nodemanager's ApplicationImpl - we need to reconcile these things with what you are doing here.

I had an offline discussion with [~djp] regarding this. Currently, [~djp]'s work has the life cycle of the timeline client and some code in NMContainerMonitor to publish container resource metrics. So the possible approaches here are:
# For other life cycle events, have inline code in the transitions that gets hold of the timeline client from the context and publishes the timeline entity/event.
# Gather all timeline publishing code in one place and have a single async dispatcher take care of publishing timeline entities/events.

The advantage of the latter approach is that there is no need to maintain a thread pool (introduced as part of YARN-3334) in each place where we want to publish timeline entities. I also felt it would be better organised to have all the timeline publishing code in one place, as in the RM.

[~djp]'s point was ??Using AsyncDispatcher is also a good way. However, I think this is just a temporarily solution before YARN-3367 where we will do non-blocking in TimelineClient later. So please feel free to choose AsyncDispatcher or thread pool - whatever works end-2-end in current phase. But I would recommend not trying to replace existing thread pool with AsyncDispatcher - no need to waste time to fix something which will get removed later.?? So I am a little confused about the scope of this JIRA; once we conclude on it, I can come up with the updated patch.
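The second approach in the comment above (one place for publishing, one dispatcher thread) can be sketched with plain java.util.concurrent classes. This is a stand-in for illustration only: it does not use Hadoop's AsyncDispatcher or TimelineClient, and `NMPublisherSketch`, its methods, and the `Consumer<String>` sink are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

// Hypothetical single-queue publisher; the sink stands in for whatever
// client actually writes timeline entities.
class NMPublisherSketch implements Runnable {
  private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
  private final Consumer<String> sink;

  NMPublisherSketch(Consumer<String> sink) {
    this.sink = sink;
  }

  // Called from any state transition; only enqueues, never does I/O.
  void publish(String event) {
    queue.add(event);
  }

  // Synchronously writes everything queued so far, in arrival order,
  // and returns the number of events written.
  int drain() {
    List<String> batch = new ArrayList<>();
    queue.drainTo(batch);
    for (String event : batch) {
      sink.accept(event);
    }
    return batch.size();
  }

  // Body of the single dispatcher thread: block for the next event and
  // write it, until the thread is interrupted.
  @Override
  public void run() {
    try {
      while (!Thread.currentThread().isInterrupted()) {
        sink.accept(queue.take());
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }
}
```

In a service, `new Thread(publisher, "nm-timeline-publisher").start()` would run the loop, and callers would only ever touch `publish`. Compared with the first approach, no transition needs its own thread pool or its own reference to the writing client.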
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520782#comment-14520782 ] Naganarasimha G R commented on YARN-3045: - Hi [~sjlee0], I have modeled it similarly to the RM side's SystemMetricsPublisher, and initially even I was a little skeptical about having Metrics in the name here. I might rename {{NMMetricsPublisher, NMMetricsEvent, NMMetricsEventType, and so on}} to {{NMLifeCycleEntitiesPublisher, NMLifeCycleEvent, NMLifeCycleEventType, and so on}}. I am bad at naming classes, so I am hoping for some good name suggestions for the NM side :).
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520810#comment-14520810 ] Sangjin Lee commented on YARN-3045: --- I'm not good at it either. :) My basic idea is, if it is not limited to metrics or events (i.e. could be used for both), then we could pick a generic timeline name. For example, NMTimelinePublisher. On the other hand, if it is specific to events, then we could say NMTimelineEventsPublisher. The word lifecycle also feels unnecessarily verbose to me, so I'm kind of -1 on it.
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520843#comment-14520843 ] Vinod Kumar Vavilapalli commented on YARN-3045: --- The allocated memory/vcore/host/port/http-port/priority information per container are also getting captured by the ResourceManager? Similarly the exit-status, diagnostics etc. We definitely don't need these in two places. Also, YARN-3334 added the notion of a timelineClient object embedded inside Nodemanager's ApplicationImpl - we need to reconcile these things with what you are doing here. /cc [~djp]
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14503429#comment-14503429 ] Hadoop QA commented on YARN-3045: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12726626/YARN-3045.20150420-1.patch against trunk revision f967fd2. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7403//console This message is automatically generated.
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14481412#comment-14481412 ] Naganarasimha G R commented on YARN-3045: - [~djp], As YARN-3334 is in, can I start with this jira ?
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14481410#comment-14481410 ] Zhijie Shen commented on YARN-3045: --- Container metrics publishing has been completed in YARN-3334, please continue the work around NM lifecycle events here. Change the title accordingly.
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14481425#comment-14481425 ] Naganarasimha G R commented on YARN-3045: - :) We commented in parallel ... Will start working on this!