[jira] [Commented] (YARN-2729) Support script based NodeLabelsProvider Interface in Distributed Node Label Configuration Setup
[ https://issues.apache.org/jira/browse/YARN-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14394149#comment-14394149 ] Naganarasimha G R commented on YARN-2729: - bq. Revisited interval, I think it's better to make it a provider configuration instead of a script-provider-only configuration, since config/script will share it (I remember I have some back-and-forth opinions here). :) Agree, I don't mind redoing it as long as it's for a good reason, and I was expecting changes here anyway. The other configuration comments will be addressed. bq. I feel like ScriptBased and ConfigBased can share some implementations; they will all init a timer task, get the interval and run, check timeout (meaningless for config-based), etc. Can you make an abstract class that is inherited by ScriptBased? I can do this (which I feel is correct), but if we do, it may not be possible to generalize much between NodeHealthScriptRunner and ScriptBasedNodeLabelsProvider, which I feel should be OK. bq. checkAndThrowLabelName should be called in NodeStatusUpdaterImpl In a way it would be better in NodeStatusUpdaterImpl, as we support an external class as a provider, but I earlier thought it would not be good to add extra checks as part of the heartbeat flow. bq. label needs to be trim()'d when checkAndThrowLabelName(...) is called
Not required, as checkAndThrowLabelName takes care of it, but the test case is missing; will add it for NodeStatusUpdaterImpl. Other issues will be reworked in the next patch. > Support script based NodeLabelsProvider Interface in Distributed Node Label > Configuration Setup > --- > > Key: YARN-2729 > URL: https://issues.apache.org/jira/browse/YARN-2729 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R > Fix For: 2.8.0 > > Attachments: YARN-2729.20141023-1.patch, YARN-2729.20141024-1.patch, > YARN-2729.20141031-1.patch, YARN-2729.20141120-1.patch, > YARN-2729.20141210-1.patch, YARN-2729.20150309-1.patch, > YARN-2729.20150322-1.patch, YARN-2729.20150401-1.patch, > YARN-2729.20150402-1.patch > > > Support script based NodeLabelsProvider Interface in Distributed Node Label > Configuration Setup. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
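The abstract-class refactor discussed in the comment above (a shared timer task, a shared interval configuration, a timeout check that only matters for the script case) could be sketched roughly as follows. This is an illustrative sketch only, with hypothetical class and method names; it is not the committed YARN implementation.

```java
import java.util.Set;
import java.util.Timer;
import java.util.TimerTask;

// Illustrative sketch only: these class/method names are hypothetical and do
// not match the committed YARN code.
abstract class AbstractNodeLabelsProvider {
  private final long intervalMs;          // shared provider-level interval config
  private volatile Set<String> labels;    // last successfully fetched labels
  private Timer timer;

  AbstractNodeLabelsProvider(long intervalMs) {
    this.intervalMs = intervalMs;
  }

  // A ScriptBased subclass would run an external script (enforcing a timeout);
  // a ConfigBased subclass would re-read configuration, where a timeout check
  // is meaningless.
  protected abstract Set<String> fetchLabels();

  public void start() {
    timer = new Timer("NodeLabelsProvider", true);
    timer.scheduleAtFixedRate(new TimerTask() {
      @Override
      public void run() {
        labels = fetchLabels();
      }
    }, 0, intervalMs);
  }

  public void stop() {
    if (timer != null) {
      timer.cancel();
    }
  }

  public Set<String> getNodeLabels() {
    return labels;
  }
}
```

With this split, the timer/interval plumbing lives once in the base class, and only the label-fetching strategy differs per subclass.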
[jira] [Updated] (YARN-3334) [Event Producers] NM TimelineClient life cycle handling and container metrics posting to new timeline service.
[ https://issues.apache.org/jira/browse/YARN-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-3334: - Attachment: YARN-3334-v8.patch Uploaded the v8 patch to address minor comments on logging in TimelineClientImpl. > [Event Producers] NM TimelineClient life cycle handling and container metrics > posting to new timeline service. > -- > > Key: YARN-3334 > URL: https://issues.apache.org/jira/browse/YARN-3334 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: YARN-2928 >Reporter: Junping Du >Assignee: Junping Du > Attachments: YARN-3334-demo.patch, YARN-3334-v1.patch, > YARN-3334-v2.patch, YARN-3334-v3.patch, YARN-3334-v4.patch, > YARN-3334-v5.patch, YARN-3334-v6.patch, YARN-3334-v8.patch, YARN-3334.7.patch > > > After YARN-3039, we have a service discovery mechanism to pass the app-collector > service address among collectors, NMs and RM. In this JIRA, we will handle > service address setting for TimelineClients in the NodeManager, and put container > metrics to the backend storage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3390) RMTimelineCollector should have the context info of each app
[ https://issues.apache.org/jira/browse/YARN-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14394059#comment-14394059 ] Naganarasimha G R commented on YARN-3390: - Thanks for the feedback [~zjshen] & [~sjlee0], bq. either pass in the context per call or have a map of app id to context. I would favor the latter approach because it'd be easier on the perspective of callers of putEntities(). I too agree it would be easier from the perspective of callers of putEntities(), but if we favor a map of {{app id to context}}: * the implicit assumption would be that {{putEntities(TimelineEntities)}} is always for the same appId (/will have the same context) * TimelineEntities as such does not carry the appId explicitly, so I plan to change {{TimelineCollector.getTimelineEntityContext()}} to {{TimelineCollector.getTimelineEntityContext(TimelineEntity.Identifier id)}}, and subclasses of TimelineCollector can take care of mapping the Id to the Context (via AppId) if required. * the code of {{putEntities(TimelineEntities)}} would look something like {code} Iterator<TimelineEntity> iterator = entities.getEntities().iterator(); TimelineEntity next = iterator.hasNext() ? iterator.next() : null; if (next != null) { TimelineCollectorContext context = getTimelineEntityContext(next.getIdentifier()); return writer.write(context.getClusterId(), context.getUserId(), context.getFlowId(), context.getFlowRunId(), context.getAppId(), entities); } {code} If that's OK, shall I work on it? > RMTimelineCollector should have the context info of each app > > > Key: YARN-3390 > URL: https://issues.apache.org/jira/browse/YARN-3390 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Zhijie Shen > > RMTimelineCollector should have the context info of each app whose entity > has been put -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3334) [Event Producers] NM TimelineClient life cycle handling and container metrics posting to new timeline service.
[ https://issues.apache.org/jira/browse/YARN-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14394052#comment-14394052 ] Junping Du commented on YARN-3334: -- Thanks [~zjshen] and [~sjlee0] for the comments! bq. If so, I suggest combining the two messages together and recording an error-level log (the first message is actually useless if we always report the second one). That sounds OK. Will update with a quick fix. bq. However, I do worry about the size of the map produced in the response in ResourceTrackerService. It can be potentially quite large every time and has a potential impact on many things as it is part of the NM heartbeat handling. It's OK for now, but we should try to address it sooner rather than later. Just filed YARN-3445 to track this issue. This is also needed for graceful decommission (YARN-914): a decommissioning node can be terminated earlier by the RM if it has no running apps. > [Event Producers] NM TimelineClient life cycle handling and container metrics > posting to new timeline service. > -- > > Key: YARN-3334 > URL: https://issues.apache.org/jira/browse/YARN-3334 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: YARN-2928 >Reporter: Junping Du >Assignee: Junping Du > Attachments: YARN-3334-demo.patch, YARN-3334-v1.patch, > YARN-3334-v2.patch, YARN-3334-v3.patch, YARN-3334-v4.patch, > YARN-3334-v5.patch, YARN-3334-v6.patch, YARN-3334.7.patch > > > After YARN-3039, we have a service discovery mechanism to pass the app-collector > service address among collectors, NMs and RM. In this JIRA, we will handle > service address setting for TimelineClients in the NodeManager, and put container > metrics to the backend storage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3445) NM notify RM on running Apps in NM-RM heartbeat
Junping Du created YARN-3445: Summary: NM notify RM on running Apps in NM-RM heartbeat Key: YARN-3445 URL: https://issues.apache.org/jira/browse/YARN-3445 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Affects Versions: 2.7.0 Reporter: Junping Du Assignee: Junping Du Per discussion in YARN-3334, we need to filter out unnecessary collector info sent from the RM in the heartbeat response. Our proposal is to add an additional field for running apps in the NM heartbeat request, so the RM sends back only the collectors for locally running apps. This is also needed in YARN-914 (graceful decommission): if an NM in the decommissioning stage has no running apps, it can be decommissioned immediately. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
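The proposal above (NM reports its running apps, RM returns only the matching collectors) amounts to a simple filtering step on the RM side. A minimal sketch, with hypothetical names that do not match the actual YARN protocol records:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only: names here are hypothetical and do not match the
// actual YARN protocol records.
class CollectorFilter {
  // RM side: given all known appId -> collector-address mappings and the apps
  // the NM reported as running, return only the entries that NM needs.
  static Map<String, String> collectorsFor(Map<String, String> allCollectors,
                                           Set<String> runningAppsOnNode) {
    Map<String, String> result = new HashMap<>();
    for (String appId : runningAppsOnNode) {
      String addr = allCollectors.get(appId);
      if (addr != null) {
        result.put(appId, addr);
      }
    }
    return result;
  }
}
```

The response map then scales with the number of apps running on that node rather than with the number of apps in the cluster, which is the size concern raised in the YARN-3334 discussion.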
[jira] [Commented] (YARN-3443) Create a 'ResourceHandler' subsystem to ease addition of support for new resource types on the NM
[ https://issues.apache.org/jira/browse/YARN-3443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14394003#comment-14394003 ] Hadoop QA commented on YARN-3443: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12709150/YARN-3443.001.patch against trunk revision bad070f. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:red}-1 javac{color}. The applied patch generated 1150 javac compiler warnings (more than the trunk's current 1148 warnings). {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7210//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/7210//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html Javac warnings: https://builds.apache.org/job/PreCommit-YARN-Build/7210//artifact/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7210//console This message is automatically generated. 
> Create a 'ResourceHandler' subsystem to ease addition of support for new > resource types on the NM > - > > Key: YARN-3443 > URL: https://issues.apache.org/jira/browse/YARN-3443 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Sidharta Seethana >Assignee: Sidharta Seethana > Attachments: YARN-3443.001.patch > > > The current cgroups implementation is closely tied to supporting CPU as a > resource. We need to separate out CGroups support as well as provide a simple > ResourceHandler subsystem that will enable us to add support for new resource > types on the NM - e.g. Network, Disk etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2666) TestFairScheduler.testContinuousScheduling fails Intermittently
[ https://issues.apache.org/jira/browse/YARN-2666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393976#comment-14393976 ] Tsuyoshi Ozawa commented on YARN-2666: -- OK, I'll check it. > TestFairScheduler.testContinuousScheduling fails Intermittently > --- > > Key: YARN-2666 > URL: https://issues.apache.org/jira/browse/YARN-2666 > Project: Hadoop YARN > Issue Type: Test > Components: scheduler >Reporter: Tsuyoshi Ozawa >Assignee: zhihai xu > Attachments: YARN-2666.000.patch > > > The test fails on trunk. > {code} > Tests run: 79, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 8.698 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler > testContinuousScheduling(org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler) > Time elapsed: 0.582 sec <<< FAILURE! > java.lang.AssertionError: expected:<2> but was:<1> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler.testContinuousScheduling(TestFairScheduler.java:3372) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3435) AM container to be allocated Appattempt AM container shown as null
[ https://issues.apache.org/jira/browse/YARN-3435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393975#comment-14393975 ] Hadoop QA commented on YARN-3435: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12709003/YARN-3435.001.patch against trunk revision bad070f. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7208//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7208//console This message is automatically generated. 
> AM container to be allocated Appattempt AM container shown as null > -- > > Key: YARN-3435 > URL: https://issues.apache.org/jira/browse/YARN-3435 > Project: Hadoop YARN > Issue Type: Bug > Environment: 1RM,1DN >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Trivial > Attachments: Screenshot.png, YARN-3435.001.patch > > > Submit yarn application > Open http://:8088/cluster/appattempt/appattempt_1427984982805_0003_01 > Before the AM container is allocated -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3366) Outbound network bandwidth : classify/shape traffic originating from YARN containers
[ https://issues.apache.org/jira/browse/YARN-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393971#comment-14393971 ] Sidharta Seethana commented on YARN-3366: - Since this patch requires uncommitted changes from https://issues.apache.org/jira/browse/YARN-3443, I am not submitting this patch to a pre-commit build for the time being. > Outbound network bandwidth : classify/shape traffic originating from YARN > containers > > > Key: YARN-3366 > URL: https://issues.apache.org/jira/browse/YARN-3366 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Sidharta Seethana >Assignee: Sidharta Seethana > Attachments: YARN-3366.001.patch > > > In order to be able to isolate based on/enforce outbound traffic bandwidth > limits, we need a mechanism to classify/shape network traffic in the > nodemanager. For more information on the design, please see the attached > design document in the parent JIRA. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3366) Outbound network bandwidth : classify/shape traffic originating from YARN containers
[ https://issues.apache.org/jira/browse/YARN-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sidharta Seethana updated YARN-3366: Attachment: YARN-3366.001.patch Attaching a patch with an implementation of traffic classification/shaping for traffic originating from YARN containers. This patch depends on changes/patches from https://issues.apache.org/jira/browse/YARN-3365 and https://issues.apache.org/jira/browse/YARN-3443 > Outbound network bandwidth : classify/shape traffic originating from YARN > containers > > > Key: YARN-3366 > URL: https://issues.apache.org/jira/browse/YARN-3366 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Sidharta Seethana >Assignee: Sidharta Seethana > Attachments: YARN-3366.001.patch > > > In order to be able to isolate based on/enforce outbound traffic bandwidth > limits, we need a mechanism to classify/shape network traffic in the > nodemanager. For more information on the design, please see the attached > design document in the parent JIRA. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3443) Create a 'ResourceHandler' subsystem to ease addition of support for new resource types on the NM
[ https://issues.apache.org/jira/browse/YARN-3443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sidharta Seethana updated YARN-3443: Attachment: YARN-3443.001.patch Attaching a patch that 1) separates out the CGroup implementation into a reusable class, 2) creates a 'PrivilegedContainerExecutor' that wraps the container-executor binary and can be used for operations that require elevated privileges, and 3) creates a simple ResourceHandler interface that can be used to plug in support for new resource types. > Create a 'ResourceHandler' subsystem to ease addition of support for new > resource types on the NM > - > > Key: YARN-3443 > URL: https://issues.apache.org/jira/browse/YARN-3443 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Sidharta Seethana >Assignee: Sidharta Seethana > Attachments: YARN-3443.001.patch > > > The current cgroups implementation is closely tied to supporting CPU as a > resource. We need to separate out CGroups support as well as provide a simple > ResourceHandler subsystem that will enable us to add support for new resource > types on the NM - e.g. Network, Disk etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2424) LCE should support non-cgroups, non-secure mode
[ https://issues.apache.org/jira/browse/YARN-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393938#comment-14393938 ] Sidharta Seethana commented on YARN-2424: - It looks like different versions of the patch to fix this were committed to branch-2 and trunk? The corresponding changes to LinuxContainerExecutor.java look different. > LCE should support non-cgroups, non-secure mode > --- > > Key: YARN-2424 > URL: https://issues.apache.org/jira/browse/YARN-2424 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.3.0, 2.4.0, 2.5.0, 2.4.1 >Reporter: Allen Wittenauer >Assignee: Allen Wittenauer >Priority: Blocker > Fix For: 2.6.0 > > Attachments: Y2424-1.patch, YARN-2424.patch > > > After YARN-1253, LCE no longer works for non-secure, non-cgroup scenarios. > This is a fairly serious regression, as turning on LCE prior to turning on > full-blown security is a fairly standard procedure. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3436) Doc WebServicesIntro.html Example Rest API url wrong
[ https://issues.apache.org/jira/browse/YARN-3436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393935#comment-14393935 ] Hadoop QA commented on YARN-3436: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12709010/YARN-3436.001.patch against trunk revision bad070f. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7209//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7209//console This message is automatically generated. 
> Doc WebServicesIntro.html Example Rest API url wrong > > > Key: YARN-3436 > URL: https://issues.apache.org/jira/browse/YARN-3436 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Minor > Attachments: YARN-3436.001.patch > > > /docs/current/hadoop-yarn/hadoop-yarn-site/WebServicesIntro.html > {quote} > Response Examples > JSON response with single resource > HTTP Request: GET > http://rmhost.domain:8088/ws/v1/cluster/{color:red}app{color}/application_1324057493980_0001 > Response Status Line: HTTP/1.1 200 OK > {quote} > The URL should be ws/v1/cluster/{color:red}apps{color}. > Two examples on the same page are wrong -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3444) Fixed typo (capability)
[ https://issues.apache.org/jira/browse/YARN-3444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393861#comment-14393861 ] Gabor Liptak commented on YARN-3444: Pull request at https://github.com/apache/hadoop/pull/15 > Fixed typo (capability) > --- > > Key: YARN-3444 > URL: https://issues.apache.org/jira/browse/YARN-3444 > Project: Hadoop YARN > Issue Type: Improvement > Components: applications/distributed-shell >Reporter: Gabor Liptak >Priority: Minor > > Fixed typo (capability) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3444) Fixed typo (capability)
Gabor Liptak created YARN-3444: -- Summary: Fixed typo (capability) Key: YARN-3444 URL: https://issues.apache.org/jira/browse/YARN-3444 Project: Hadoop YARN Issue Type: Improvement Components: applications/distributed-shell Reporter: Gabor Liptak Priority: Minor Fixed typo (capability) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3443) Create a 'ResourceHandler' subsystem to ease addition of support for new resource types on the NM
Sidharta Seethana created YARN-3443: --- Summary: Create a 'ResourceHandler' subsystem to ease addition of support for new resource types on the NM Key: YARN-3443 URL: https://issues.apache.org/jira/browse/YARN-3443 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Sidharta Seethana Assignee: Sidharta Seethana The current cgroups implementation is closely tied to supporting CPU as a resource. We need to separate out CGroups support as well as provide a simple ResourceHandler subsystem that will enable us to add support for new resource types on the NM - e.g. Network, Disk etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2901) Add errors and warning metrics page to RM, NM web UI
[ https://issues.apache.org/jira/browse/YARN-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393810#comment-14393810 ] Hudson commented on YARN-2901: -- FAILURE: Integrated in Hadoop-trunk-Commit #7501 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7501/]) YARN-2901. Add errors and warning metrics page to RM, NM web UI. (Varun Vasudev via wangda) (wangda: rev bad070fe15a642cc6f3a165612fbd272187e03cb) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/NavBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/NMController.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/ErrorsAndWarningsBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/WebServer.java * hadoop-common-project/hadoop-common/src/main/conf/log4j.properties * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/NMErrorsAndWarningsPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RmController.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/Log4jWarningErrorMetricsAppender.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMErrorsAndWarningsPage.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/NavBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestLog4jWarningErrorMetricsAppender.java > Add errors and warning metrics page to RM, NM web UI > > > Key: YARN-2901 > URL: https://issues.apache.org/jira/browse/YARN-2901 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager, resourcemanager >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Fix For: 2.8.0 > > Attachments: Exception collapsed.png, Exception expanded.jpg, Screen > Shot 2015-03-19 at 7.40.02 PM.png, apache-yarn-2901.0.patch, > apache-yarn-2901.1.patch, apache-yarn-2901.2.patch, apache-yarn-2901.3.patch, > apache-yarn-2901.4.patch, apache-yarn-2901.5.patch > > > It would be really useful to have statistics on the number of errors and > warnings in the RM and NM web UI. I'm thinking about - > 1. The number of errors and warnings in the past 5 min/1 hour/12 hours/day > 2. The top 'n' (20?) most common exceptions in the past 5 min/1 hour/12 > hours/day > By errors and warnings I'm referring to the log level. > I suspect we can probably achieve this by writing a custom appender? (I'm open > to suggestions on alternate mechanisms for implementing this). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
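The "custom appender" idea in the description boils down to counting log events by level, and by message for the top-'n' view. Below is a minimal sketch of just that bookkeeping core, with hypothetical names; the committed Log4jWarningErrorMetricsAppender (which hooks into log4j) is the authoritative implementation.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketch only: the committed Log4jWarningErrorMetricsAppender
// hooks into log4j; this stand-in shows just the counting bookkeeping.
class ErrorWarningCounter {
  private final Map<String, AtomicLong> errorsByMessage = new ConcurrentHashMap<>();
  private final AtomicLong errors = new AtomicLong();
  private final AtomicLong warnings = new AtomicLong();

  // A real appender would call this from its append(LoggingEvent) hook.
  void record(String level, String message) {
    if ("ERROR".equals(level)) {
      errors.incrementAndGet();
      errorsByMessage.computeIfAbsent(message, k -> new AtomicLong())
          .incrementAndGet();
    } else if ("WARN".equals(level)) {
      warnings.incrementAndGet();
    }
  }

  long errorCount() { return errors.get(); }
  long warningCount() { return warnings.get(); }

  // Per-message counts back the "top 'n' most common exceptions" view.
  long errorOccurrences(String message) {
    AtomicLong v = errorsByMessage.get(message);
    return v == null ? 0 : v.get();
  }
}
```

The time windows from the description (5 min/1 hour/12 hours/day) would additionally require timestamped buckets rather than plain totals.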
[jira] [Updated] (YARN-2901) Add errors and warning metrics page to RM, NM web UI
[ https://issues.apache.org/jira/browse/YARN-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2901: - Summary: Add errors and warning metrics page to RM, NM web UI (was: Add errors and warning stats to RM, NM web UI) > Add errors and warning metrics page to RM, NM web UI > > > Key: YARN-2901 > URL: https://issues.apache.org/jira/browse/YARN-2901 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager, resourcemanager >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: Exception collapsed.png, Exception expanded.jpg, Screen > Shot 2015-03-19 at 7.40.02 PM.png, apache-yarn-2901.0.patch, > apache-yarn-2901.1.patch, apache-yarn-2901.2.patch, apache-yarn-2901.3.patch, > apache-yarn-2901.4.patch, apache-yarn-2901.5.patch > > > It would be really useful to have statistics on the number of errors and > warnings in the RM and NM web UI. I'm thinking about - > 1. The number of errors and warnings in the past 5 min/1 hour/12 hours/day > 2. The top 'n' (20?) most common exceptions in the past 5 min/1 hour/12 > hours/day > By errors and warnings I'm referring to the log level. > I suspect we can probably achieve this by writing a custom appender? (I'm open > to suggestions on alternate mechanisms for implementing this). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3365) Add support for using the 'tc' tool via container-executor
[ https://issues.apache.org/jira/browse/YARN-3365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393789#comment-14393789 ] Sidharta Seethana commented on YARN-3365: - Actually, never mind - it seems like the banned user list wasn't affected. -Sid > Add support for using the 'tc' tool via container-executor > -- > > Key: YARN-3365 > URL: https://issues.apache.org/jira/browse/YARN-3365 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Sidharta Seethana >Assignee: Sidharta Seethana > Fix For: 2.8.0 > > Attachments: YARN-3365.001.patch, YARN-3365.002.patch, > YARN-3365.003.patch > > > We need the following functionality : > 1) modify network interface traffic shaping rules - to be able to attach a > qdisc, create child classes etc > 2) read existing rules in place > 3) read stats for the various classes > Using tc requires elevated privileges - hence this functionality is to be > made available via container-executor. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2942) Aggregated Log Files should be combined
[ https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated YARN-2942: Attachment: ConcatableAggregatedLogsProposal_v5.pdf I've uploaded a v5 doc which addresses those changes. I also clarified a few other things there. > Aggregated Log Files should be combined > --- > > Key: YARN-2942 > URL: https://issues.apache.org/jira/browse/YARN-2942 > Project: Hadoop YARN > Issue Type: New Feature >Affects Versions: 2.6.0 >Reporter: Robert Kanter >Assignee: Robert Kanter > Attachments: CombinedAggregatedLogsProposal_v3.pdf, > CompactedAggregatedLogsProposal_v1.pdf, > CompactedAggregatedLogsProposal_v2.pdf, > ConcatableAggregatedLogsProposal_v4.pdf, > ConcatableAggregatedLogsProposal_v5.pdf, YARN-2942-preliminary.001.patch, > YARN-2942-preliminary.002.patch, YARN-2942.001.patch, YARN-2942.002.patch, > YARN-2942.003.patch > > > Turning on log aggregation allows users to easily store container logs in > HDFS and subsequently view them in the YARN web UIs from a central place. > Currently, there is a separate log file for each Node Manager. This can be a > problem for HDFS if you have a cluster with many nodes as you’ll slowly start > accumulating many (possibly small) files per YARN application. The current > “solution” for this problem is to configure YARN (actually the JHS) to > automatically delete these files after some amount of time. > We should improve this by compacting the per-node aggregated log files into > one log file per application. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3365) Add support for using the 'tc' tool via container-executor
[ https://issues.apache.org/jira/browse/YARN-3365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393775#comment-14393775 ] Sidharta Seethana commented on YARN-3365: - Thanks, Vinod! We'll need a small patch to undo the banned users change in branch-2. > Add support for using the 'tc' tool via container-executor > -- > > Key: YARN-3365 > URL: https://issues.apache.org/jira/browse/YARN-3365 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Sidharta Seethana >Assignee: Sidharta Seethana > Fix For: 2.8.0 > > Attachments: YARN-3365.001.patch, YARN-3365.002.patch, > YARN-3365.003.patch > > > We need the following functionality : > 1) modify network interface traffic shaping rules - to be able to attach a > qdisc, create child classes etc > 2) read existing rules in place > 3) read stats for the various classes > Using tc requires elevated privileges - hence this functionality is to be > made available via container-executor. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3365) Add support for using the 'tc' tool via container-executor
[ https://issues.apache.org/jira/browse/YARN-3365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393773#comment-14393773 ] Hudson commented on YARN-3365: -- FAILURE: Integrated in Hadoop-trunk-Commit #7500 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7500/]) YARN-3365. Enhanced NodeManager to support using the 'tc' tool via container-executor for outbound network traffic control. Contributed by Sidharta Seethana. (vinodkv: rev b21c72777ae664b08fd1a93b4f88fa43f2478d94) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.h * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/main.c * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLinuxContainerExecutor.java > Add support for using the 'tc' tool via container-executor > -- > > Key: YARN-3365 > URL: https://issues.apache.org/jira/browse/YARN-3365 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Sidharta Seethana >Assignee: Sidharta Seethana > Fix For: 2.8.0 > > Attachments: YARN-3365.001.patch, YARN-3365.002.patch, > YARN-3365.003.patch > > > We need the following functionality : > 1) modify network interface traffic shaping rules - to be able to attach a > qdisc, create child classes etc > 2) read existing rules in place > 3) read stats for the various classes > Using tc requires elevated privileges - hence this functionality is to be > made available via container-executor. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3365) Add support for using the 'tc' tool via container-executor
[ https://issues.apache.org/jira/browse/YARN-3365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-3365: -- Fix Version/s: 2.8.0 > Add support for using the 'tc' tool via container-executor > -- > > Key: YARN-3365 > URL: https://issues.apache.org/jira/browse/YARN-3365 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Sidharta Seethana >Assignee: Sidharta Seethana > Fix For: 2.8.0 > > Attachments: YARN-3365.001.patch, YARN-3365.002.patch, > YARN-3365.003.patch > > > We need the following functionality : > 1) modify network interface traffic shaping rules - to be able to attach a > qdisc, create child classes etc > 2) read existing rules in place > 3) read stats for the various classes > Using tc requires elevated privileges - hence this functionality is to be > made available via container-executor. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3051) [Storage abstraction] Create backing storage read interface for ATS readers
[ https://issues.apache.org/jira/browse/YARN-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393750#comment-14393750 ] Sangjin Lee commented on YARN-3051: --- bq. To plot graphs based on timeseries data, we may need to provide a time window for metrics too. This would be useful in case of the getEntity() API. So do we specify this time window separately for each metric to be retrieved, or the same for all metrics? My sense is that it should be fine to use the same time window for all metrics. [~gtCarrera9]? [~zjshen]? bq. Queries based on relations, i.e. queries such as get all containers for an app. We can return the relatesto field while querying for an app, and then the client can use this result to fetch detailed info about related entities. Is that fine? Or do we have to handle it as part of a single query? For now, let's assume 2 queries from the client side. My thinking was that this is an optimization. If the storage can return two levels of entities efficiently, we could potentially exploit it. But maybe that's just a nice-to-have at the moment. bq. Some understanding on how flow id and flow run id will be stored is required. Li just posted the schema design in YARN-3134. That should be helpful. > [Storage abstraction] Create backing storage read interface for ATS readers > --- > > Key: YARN-3051 > URL: https://issues.apache.org/jira/browse/YARN-3051 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Varun Saxena > Attachments: YARN-3051_temp.patch > > > Per design in YARN-2928, create backing storage read interface that can be > implemented by multiple backing storage implementations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
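The shared-time-window idea discussed above could look roughly like the following. This is a toy sketch, not the actual YARN-3051 reader interface: the names (`TimelineReader`, `getMetrics`) and the in-memory store are assumptions for illustration only.

```java
import java.util.*;

// Hypothetical reader call where one time window applies to ALL requested
// metrics (as Sangjin suggests), rather than a window per metric.
interface TimelineReader {
    Map<String, List<long[]>> getMetrics(String entityId,
        Set<String> metricNames, long windowStart, long windowEnd);
}

public class ReaderDemo {
    public static void main(String[] args) {
        // Toy in-memory store of (timestamp, value) points per metric name.
        Map<String, List<long[]>> store = new HashMap<>();
        store.put("cpu", Arrays.asList(
            new long[]{100, 5}, new long[]{200, 7}, new long[]{300, 9}));

        // Trivial implementation: filter every metric by the shared window.
        TimelineReader reader = (id, names, start, end) -> {
            Map<String, List<long[]>> out = new HashMap<>();
            for (String name : names) {
                List<long[]> pts = new ArrayList<>();
                for (long[] p : store.getOrDefault(name, Collections.emptyList())) {
                    if (p[0] >= start && p[0] <= end) pts.add(p);
                }
                out.put(name, pts);
            }
            return out;
        };

        Map<String, List<long[]>> res =
            reader.getMetrics("container_1", Collections.singleton("cpu"), 150, 250);
        System.out.println(res.get("cpu").size()); // only the point at t=200 falls in [150, 250]
    }
}
```

The design point is that one `(windowStart, windowEnd)` pair is threaded through the whole call, so the storage layer can apply a single scan range to every metric.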
[jira] [Commented] (YARN-2666) TestFairScheduler.testContinuousScheduling fails Intermittently
[ https://issues.apache.org/jira/browse/YARN-2666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393743#comment-14393743 ] zhihai xu commented on YARN-2666: - Hi [~ozawa], I rebased the patch YARN-2666.000.patch on the latest code base and it passed the Jenkins test. Do you have time to review/commit the patch? Many thanks! > TestFairScheduler.testContinuousScheduling fails Intermittently > --- > > Key: YARN-2666 > URL: https://issues.apache.org/jira/browse/YARN-2666 > Project: Hadoop YARN > Issue Type: Test > Components: scheduler >Reporter: Tsuyoshi Ozawa >Assignee: zhihai xu > Attachments: YARN-2666.000.patch > > > The test fails on trunk. > {code} > Tests run: 79, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 8.698 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler > testContinuousScheduling(org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler) > Time elapsed: 0.582 sec <<< FAILURE! > java.lang.AssertionError: expected:<2> but was:<1> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler.testContinuousScheduling(TestFairScheduler.java:3372) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-685) Capacity Scheduler is not distributing the reducers tasks across the cluster
[ https://issues.apache.org/jira/browse/YARN-685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan resolved YARN-685. - Resolution: Invalid According to the test results from [~raviprak], CS fairly distributes reducers to NMs in the cluster. Resolving this as invalid; please reopen it if you still think this is a problem. > Capacity Scheduler is not distributing the reducers tasks across the cluster > > > Key: YARN-685 > URL: https://issues.apache.org/jira/browse/YARN-685 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 2.0.4-alpha >Reporter: Devaraj K > > If we have reducers whose total memory required to complete is less than the > total cluster memory, it is not assigning the reducers to all the nodes > uniformly(~uniformly). Also at that time there are no other jobs or job tasks > running in the cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2666) TestFairScheduler.testContinuousScheduling fails Intermittently
[ https://issues.apache.org/jira/browse/YARN-2666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393725#comment-14393725 ] Hadoop QA commented on YARN-2666: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12709083/YARN-2666.000.patch against trunk revision 6a6a59d. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7207//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7207//console This message is automatically generated. > TestFairScheduler.testContinuousScheduling fails Intermittently > --- > > Key: YARN-2666 > URL: https://issues.apache.org/jira/browse/YARN-2666 > Project: Hadoop YARN > Issue Type: Test > Components: scheduler >Reporter: Tsuyoshi Ozawa >Assignee: zhihai xu > Attachments: YARN-2666.000.patch > > > The test fails on trunk. > {code} > Tests run: 79, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 8.698 sec > <<< FAILURE! 
- in > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler > testContinuousScheduling(org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler) > Time elapsed: 0.582 sec <<< FAILURE! > java.lang.AssertionError: expected:<2> but was:<1> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler.testContinuousScheduling(TestFairScheduler.java:3372) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3334) [Event Producers] NM TimelineClient life cycle handling and container metrics posting to new timeline service.
[ https://issues.apache.org/jira/browse/YARN-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393723#comment-14393723 ] Sangjin Lee commented on YARN-3334: --- I took a quick look at the latest patch, and it looks good for the most part. However, I do worry about the size of the map produced in the response in ResourceTrackerService. It can potentially be quite large every time, and it has a potential impact on many things as it is part of the NM heartbeat handling. It's OK for now, but we should try to address it sooner rather than later. > [Event Producers] NM TimelineClient life cycle handling and container metrics > posting to new timeline service. > -- > > Key: YARN-3334 > URL: https://issues.apache.org/jira/browse/YARN-3334 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: YARN-2928 >Reporter: Junping Du >Assignee: Junping Du > Attachments: YARN-3334-demo.patch, YARN-3334-v1.patch, > YARN-3334-v2.patch, YARN-3334-v3.patch, YARN-3334-v4.patch, > YARN-3334-v5.patch, YARN-3334-v6.patch, YARN-3334.7.patch > > > After YARN-3039, we have service discovery mechanism to pass app-collector > service address among collectors, NMs and RM. In this JIRA, we will handle > service address setting for TimelineClients in NodeManager, and put container > metrics to the backend storage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2942) Aggregated Log Files should be combined
[ https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393717#comment-14393717 ] Robert Kanter commented on YARN-2942: - Yes, it does a blocking wait. I think this will end up being in a separate thread anyway because it's being done after uploading the logs to HDFS. However, I think making it a separate service is a good idea anyway. As you said, this handles NM restart, and allows us to add more flexibility later. If you upgrade the JHS before the NM, it's not the end of the world. New logs wouldn't be found by the JHS, but that only hurts users trying to view those logs through the JHS. Once the JHS is updated, they would be viewable. In any case, having the two configs is probably more confusing than it needs to be for the user, and we'd have to take care of the case where the new format is disabled but concatenation is enabled (which is invalid). I think we should just make this one config: either both the new format and concatenation are enabled, or neither is. I'll post an updated doc shortly. > Aggregated Log Files should be combined > --- > > Key: YARN-2942 > URL: https://issues.apache.org/jira/browse/YARN-2942 > Project: Hadoop YARN > Issue Type: New Feature >Affects Versions: 2.6.0 >Reporter: Robert Kanter >Assignee: Robert Kanter > Attachments: CombinedAggregatedLogsProposal_v3.pdf, > CompactedAggregatedLogsProposal_v1.pdf, > CompactedAggregatedLogsProposal_v2.pdf, > ConcatableAggregatedLogsProposal_v4.pdf, YARN-2942-preliminary.001.patch, > YARN-2942-preliminary.002.patch, YARN-2942.001.patch, YARN-2942.002.patch, > YARN-2942.003.patch > > > Turning on log aggregation allows users to easily store container logs in > HDFS and subsequently view them in the YARN web UIs from a central place. > Currently, there is a separate log file for each Node Manager. 
This can be a > problem for HDFS if you have a cluster with many nodes as you’ll slowly start > accumulating many (possibly small) files per YARN application. The current > “solution” for this problem is to configure YARN (actually the JHS) to > automatically delete these files after some amount of time. > We should improve this by compacting the per-node aggregated log files into > one log file per application. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3434) Interaction between reservations and userlimit can result in significant ULF violation
[ https://issues.apache.org/jira/browse/YARN-3434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393711#comment-14393711 ] Wangda Tan commented on YARN-3434: -- [~tgraves], I feel like this issue and several related issues are solved by YARN-3243 already. Could you please check if this problem is already solved? Thanks, > Interaction between reservations and userlimit can result in significant ULF > violation > -- > > Key: YARN-3434 > URL: https://issues.apache.org/jira/browse/YARN-3434 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 2.6.0 >Reporter: Thomas Graves >Assignee: Thomas Graves > > ULF was set to 1.0 > User was able to consume 1.4X queue capacity. > It looks like when this application launched, it reserved about 1000 > containers, each 8G each, within about 5 seconds. I think this allowed the > logic in assignToUser() to allow the userlimit to be surpassed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2942) Aggregated Log Files should be combined
[ https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393700#comment-14393700 ] Karthik Kambatla commented on YARN-2942: (Canceled the patch to stop Jenkins from evaluating the design doc :) ) [~rkanter] - thanks for updating the design doc. A couple of comments: # If there is an NM X actively concatenating its logs and NM Y can't acquire the lock, what happens? ## Does it do a blocking-wait? If yes, this should likely be in a separate thread. ## I would like for it to be non-blocking. How about a LogConcatenationService in the NM? This service is brought up if you enable log concatenation. This service would periodically go through all of its past aggregated logs and concatenate those that it can acquire a lock for. Delayed concatenation should be okay because we are doing this primarily to handle the problem HDFS has with small files. Also, this way, we don't have to do anything different for NM restart. Forward looking, this concat service could potentially take input on how busy HDFS is. # I didn't completely understand the point about a config to specify the format. Are you suggesting we have two different on/off configs - one to turn on concatenation and one to specify the format the JHS should be reading? I think we need just one config that clearly states that turning this on on an NM (writer) requires that the JHS (reader) already have it enabled. In case of rolling upgrades, this translates to requiring a JHS upgrade prior to NM upgrade. 
> Aggregated Log Files should be combined > --- > > Key: YARN-2942 > URL: https://issues.apache.org/jira/browse/YARN-2942 > Project: Hadoop YARN > Issue Type: New Feature >Affects Versions: 2.6.0 >Reporter: Robert Kanter >Assignee: Robert Kanter > Attachments: CombinedAggregatedLogsProposal_v3.pdf, > CompactedAggregatedLogsProposal_v1.pdf, > CompactedAggregatedLogsProposal_v2.pdf, > ConcatableAggregatedLogsProposal_v4.pdf, YARN-2942-preliminary.001.patch, > YARN-2942-preliminary.002.patch, YARN-2942.001.patch, YARN-2942.002.patch, > YARN-2942.003.patch > > > Turning on log aggregation allows users to easily store container logs in > HDFS and subsequently view them in the YARN web UIs from a central place. > Currently, there is a separate log file for each Node Manager. This can be a > problem for HDFS if you have a cluster with many nodes as you’ll slowly start > accumulating many (possibly small) files per YARN application. The current > “solution” for this problem is to configure YARN (actually the JHS) to > automatically delete these files after some amount of time. > We should improve this by compacting the per-node aggregated log files into > one log file per application. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
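Karthik's non-blocking suggestion above can be sketched as follows. This is a minimal illustration only: the real design would use an HDFS-level lock between NMs rather than the in-process `ReentrantLock` shown here, and `tryConcat` is a hypothetical helper, not code from the patch.

```java
import java.util.concurrent.locks.ReentrantLock;

// Sketch of a non-blocking concatenation pass: if the lock for an app's
// logs is held (e.g. by another NM), skip it and retry on the next
// periodic pass instead of blocking the caller.
public class ConcatPassDemo {
    static final ReentrantLock appLogLock = new ReentrantLock();

    static boolean tryConcat(String app) {
        if (!appLogLock.tryLock()) {   // lock busy: do not wait
            return false;              // leave it for the next pass
        }
        try {
            // ... concatenate this app's aggregated log files here ...
            return true;
        } finally {
            appLogLock.unlock();
        }
    }

    public static void main(String[] args) {
        // Lock is free, so the pass succeeds immediately.
        System.out.println(tryConcat("application_1"));
    }
}
```

The key property is that a busy lock costs only a skipped iteration, which is acceptable because (as noted above) delayed concatenation is fine for the small-files problem.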
[jira] [Updated] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-3134: Attachment: YARN-3134DataSchema.pdf After some community discussion, we're finalizing the Phoenix data schema design for the very first phase. In this phase we focus on storing basic entities and their metrics, configs, and events. The attached document is a summary of our discussion results. Comments are more than welcome. > [Storage implementation] Exploiting the option of using Phoenix to access > HBase backend > --- > > Key: YARN-3134 > URL: https://issues.apache.org/jira/browse/YARN-3134 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Zhijie Shen > Attachments: YARN-3134DataSchema.pdf > > > Quote the introduction on the Phoenix web page: > {code} > Apache Phoenix is a relational database layer over HBase delivered as a > client-embedded JDBC driver targeting low latency queries over HBase data. > Apache Phoenix takes your SQL query, compiles it into a series of HBase > scans, and orchestrates the running of those scans to produce regular JDBC > result sets. The table metadata is stored in an HBase table and versioned, > such that snapshot queries over prior versions will automatically use the > correct schema. Direct use of the HBase API, along with coprocessors and > custom filters, results in performance on the order of milliseconds for small > queries, or seconds for tens of millions of rows. > {code} > It may simplify how our implementation reads/writes data from/to HBase, and > make it easy to build indexes and compose complex queries. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3391) Clearly define flow ID/ flow run / flow version in API and storage
[ https://issues.apache.org/jira/browse/YARN-3391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393645#comment-14393645 ] Sangjin Lee commented on YARN-3391: --- I am fine with tabling this discussion and revisiting it later in the interest of making progress. I just wanted to add my 2 cents that this is something we already see and experience with hRaven so it's not theoretical. That's the context from our side. The way I see it is that apps that do not have the flow name are basically a degenerate case of a single-app flow. This is unrelated to the app-to-flow aggregation. It has to do with the flowRun-to-flow aggregation. And it's something we want the users to do when they can set the flow name. FWIW... > Clearly define flow ID/ flow run / flow version in API and storage > -- > > Key: YARN-3391 > URL: https://issues.apache.org/jira/browse/YARN-3391 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Zhijie Shen > Attachments: YARN-3391.1.patch > > > To continue the discussion in YARN-3040, let's figure out the best way to > describe the flow. > Some key issues that we need to conclude on: > - How do we include the flow version in the context so that it gets passed > into the collector and to the storage eventually? > - Flow run id should be a number as opposed to a generic string? > - Default behavior for the flow run id if it is missing (i.e. client did not > set it) > - How do we handle flow attributes in case of nested levels of flows? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
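The "degenerate case of a single-app flow" above can be made concrete with a tiny sketch. The defaulting rule and the names below are illustrative assumptions, not the YARN-3391 API: when the client does not set a flow name, the app falls back to being its own one-app flow.

```java
// Hypothetical defaulting: an app without an explicit flow name becomes
// a single-app flow named after the app itself, so flowRun-to-flow
// aggregation still has something to aggregate over.
public class FlowDefaultsDemo {
    static String flowName(String explicitFlowName, String appName) {
        return (explicitFlowName != null && !explicitFlowName.isEmpty())
            ? explicitFlowName
            : appName;   // degenerate single-app flow
    }

    public static void main(String[] args) {
        System.out.println(flowName(null, "wordcount_2015"));   // falls back to the app name
        System.out.println(flowName("daily-etl", "stage-17"));  // explicit flow wins
    }
}
```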
[jira] [Commented] (YARN-3390) RMTimelineCollector should have the context info of each app
[ https://issues.apache.org/jira/browse/YARN-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393643#comment-14393643 ] Zhijie Shen commented on YARN-3390: --- bq. I would favor the latter approach +1 > RMTimelineCollector should have the context info of each app > > > Key: YARN-3390 > URL: https://issues.apache.org/jira/browse/YARN-3390 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Zhijie Shen > > RMTimelineCollector should have the context info of each app whose entity > has been put -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3334) [Event Producers] NM TimelineClient life cycle handling and container metrics posting to new timeline service.
[ https://issues.apache.org/jira/browse/YARN-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393631#comment-14393631 ] Zhijie Shen commented on YARN-3334: --- If so, I suggest combining the two messages and recording an error-level log (the first message is actually useless if we always report the second one). > [Event Producers] NM TimelineClient life cycle handling and container metrics > posting to new timeline service. > -- > > Key: YARN-3334 > URL: https://issues.apache.org/jira/browse/YARN-3334 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: YARN-2928 >Reporter: Junping Du >Assignee: Junping Du > Attachments: YARN-3334-demo.patch, YARN-3334-v1.patch, > YARN-3334-v2.patch, YARN-3334-v3.patch, YARN-3334-v4.patch, > YARN-3334-v5.patch, YARN-3334-v6.patch, YARN-3334.7.patch > > > After YARN-3039, we have service discovery mechanism to pass app-collector > service address among collectors, NMs and RM. In this JIRA, we will handle > service address setting for TimelineClients in NodeManager, and put container > metrics to the backend storage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3390) RMTimelineCollector should have the context info of each app
[ https://issues.apache.org/jira/browse/YARN-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393615#comment-14393615 ] Sangjin Lee commented on YARN-3390: --- I think we need to either pass in the context per call or have a map of app id to context. I would favor the latter approach because it'd be easier on the perspective of callers of putEntities(). > RMTimelineCollector should have the context info of each app > > > Key: YARN-3390 > URL: https://issues.apache.org/jira/browse/YARN-3390 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Zhijie Shen > > RMTimelineCollector should have the context info of each app whose entity > has been put -- This message was sent by Atlassian JIRA (v6.3.4#6332)
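The second option Sangjin describes (a map of app id to context, so callers of putEntities() need not pass the context on every call) might be sketched like this. The class and method names are hypothetical, not the actual RMTimelineCollector API:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: the collector keeps per-app context in a map, registered once
// per app, and looks it up inside putEntities() by app id.
public class CollectorContextDemo {
    static final Map<String, String> appContexts = new ConcurrentHashMap<>();

    static void registerApp(String appId, String context) {
        appContexts.put(appId, context);
    }

    static void putEntities(String appId /* , entities... */) {
        // Callers pass only the app id; the context is resolved here
        // instead of being threaded through every call.
        String ctx = appContexts.get(appId);
        System.out.println(appId + " -> " + ctx);
    }

    public static void main(String[] args) {
        registerApp("application_1", "flow=etl,user=alice");
        putEntities("application_1");
    }
}
```

A `ConcurrentHashMap` is used because registration and puts would come from different threads in an RM-side collector.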
[jira] [Commented] (YARN-3293) Track and display capacity scheduler health metrics in web UI
[ https://issues.apache.org/jira/browse/YARN-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393587#comment-14393587 ] Craig Welch commented on YARN-3293: --- General - it looks like the counters could overflow and produce negative values; perhaps this could never happen in the lifetime of a typical cluster, but on a large, long-running cluster is it a possibility/concern? This presently looks to be CapacityScheduler-only; I had a suggestion below to make it slightly more general, and [~vinodkv] also mentioned "not specific to scheduler". Perhaps it's fine to go CapacityScheduler-only for the first iteration, but I wanted to verify (perhaps we need a follow-on jira for other schedulers). On the web page: it's a nit, but I don't like the look of the / between the counter and the resource expression where that occurs; maybe - instead of / for those (allocations/reservations/releases)? TestSchedulerHealth: can we import NodeManager and get rid of the package references in the code? CapacitySchedulerHealthInfo: it looks like there is no need to keep a reference to the CapacityScheduler instance after construction, so can we drop it from being a member? It looks like the line changes in the info log are just whitespace; can you drop them? LeafQueue: L884 looks to be just whitespace, can you revert? CSAssignment: I think there should be a new class, sharable between schedulers, which incorporates all the new assignment info, and it should be a member of CSAssignment instead of adding all of the details directly to CSAssignment. 
You would still pack the info into CSAssignment (as an instance of that type), but it would now take a form that can be shared across schedulers. > Track and display capacity scheduler health metrics in web UI > - > > Key: YARN-3293 > URL: https://issues.apache.org/jira/browse/YARN-3293 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacityscheduler >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: Screen Shot 2015-03-30 at 4.30.14 PM.png, > apache-yarn-3293.0.patch, apache-yarn-3293.1.patch, apache-yarn-3293.2.patch > > > It would be good to display metrics that let users know about the health of > the capacity scheduler in the web UI. Today it is hard to get an idea if the > capacity scheduler is functioning correctly. Metrics such as the time for the > last allocation, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
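The overflow question above is easy to demonstrate: a monotonically incremented `long` counter eventually wraps to a negative value (at one increment per nanosecond it would take roughly 292 years to exhaust a `long`, so it is mostly a theoretical concern, but the wrap behavior itself is shown below).

```java
// Java long arithmetic wraps silently on overflow rather than throwing,
// so a counter that is only ever incremented can read as negative.
public class OverflowDemo {
    public static void main(String[] args) {
        long allocations = Long.MAX_VALUE;  // counter at its ceiling
        allocations++;                      // wraps to Long.MIN_VALUE
        System.out.println(allocations < 0);
    }
}
```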
[jira] [Updated] (YARN-3437) convert load test driver to timeline service v.2
[ https://issues.apache.org/jira/browse/YARN-3437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-3437: - Target Version/s: YARN-2928 > convert load test driver to timeline service v.2 > > > Key: YARN-3437 > URL: https://issues.apache.org/jira/browse/YARN-3437 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Sangjin Lee > Attachments: YARN-3437.001.patch > > > This subtask covers the work for converting the proposed patch for the load > test driver (YARN-2556) to work with the timeline service v.2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3410) YARN admin should be able to remove individual application records from RMStateStore
[ https://issues.apache.org/jira/browse/YARN-3410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393563#comment-14393563 ] Wangda Tan commented on YARN-3410: -- Thanks for your comment, [~rohithsharma]. But what's the use case for using rmadmin to remove a state while RM is running? The command is just a way to recover when an app has entered an unexpected state and RM cannot start; unless there's a use case for doing that, I suggest scoping this to an RM startup option like YARN-2131. > YARN admin should be able to remove individual application records from > RMStateStore > > > Key: YARN-3410 > URL: https://issues.apache.org/jira/browse/YARN-3410 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager, yarn >Reporter: Wangda Tan >Assignee: Rohith >Priority: Critical > > When RM state store entered an unexpected state, one example is YARN-2340, > when an attempt is not in final state but app already completed, RM can never > get up unless format RMStateStore. > I think we should support remove individual application records from > RMStateStore to unblock RM admin make choice of either waiting for a fix or > format state store. > In addition, RM should be able to report all fatal errors (which will > shutdown RM) when doing app recovery, this can save admin some time to remove > apps in bad state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2729) Support script based NodeLabelsProvider Interface in Distributed Node Label Configuration Setup
[ https://issues.apache.org/jira/browse/YARN-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393556#comment-14393556 ] Hadoop QA commented on YARN-2729: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12708788/YARN-2729.20150402-1.patch against trunk revision 6a6a59d. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7205//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7205//console This message is automatically generated. 
> Support script based NodeLabelsProvider Interface in Distributed Node Label > Configuration Setup > --- > > Key: YARN-2729 > URL: https://issues.apache.org/jira/browse/YARN-2729 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R > Fix For: 2.8.0 > > Attachments: YARN-2729.20141023-1.patch, YARN-2729.20141024-1.patch, > YARN-2729.20141031-1.patch, YARN-2729.20141120-1.patch, > YARN-2729.20141210-1.patch, YARN-2729.20150309-1.patch, > YARN-2729.20150322-1.patch, YARN-2729.20150401-1.patch, > YARN-2729.20150402-1.patch > > > Support script based NodeLabelsProvider Interface in Distributed Node Label > Configuration Setup . -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3437) convert load test driver to timeline service v.2
[ https://issues.apache.org/jira/browse/YARN-3437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393554#comment-14393554 ] Hadoop QA commented on YARN-3437: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12709078/YARN-3437.001.patch against trunk revision 6a6a59d. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7206//console This message is automatically generated. > convert load test driver to timeline service v.2 > > > Key: YARN-3437 > URL: https://issues.apache.org/jira/browse/YARN-3437 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Sangjin Lee > Attachments: YARN-3437.001.patch > > > This subtask covers the work for converting the proposed patch for the load > test driver (YARN-2556) to work with the timeline service v.2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2729) Support script based NodeLabelsProvider Interface in Distributed Node Label Configuration Setup
[ https://issues.apache.org/jira/browse/YARN-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393545#comment-14393545 ] Wangda Tan commented on YARN-2729: -- Some comments: *1) Configuration:* Instead of distributed_node_labels_prefix, do you think it is better to name it "yarn.node-labels.nm.provider"? The "distributed.node-labels-provider" name doesn't clearly convey that it runs on the NM side. I don't want to expose a class to config unless it is necessary; right now we have two options, one script-based and the other config-based. We can treat the two as a "white-list": if a given value is not in the whitelist, try to load a class with that name. So the option would be: yarn.node-labels.nm.provider = "config/script/other-class-name". Revisiting the interval, I think it's better to make it a provider configuration instead of a script-provider-only configuration, since config/script will share it (I remember I had some back-and-forth opinions here). If you agree with the above, the name could be: yarn.node-labels.nm.provider-fetch-interval-ms (and provider-fetch-timeout-ms). And the script-related options could be: yarn.node-labels.nm.provider.script.path/opts *2) Implementation of ScriptBasedNodeLabelsProvider* I feel like ScriptBased and ConfigBased can share some implementation: they both init a timer task, get the interval and run, check for a timeout (meaningless for config-based), etc. Can you make an abstract class and have ScriptBased inherit from it? DISABLE_TIMER_CONFIG should be part of YarnConfiguration; all configuration defaults should be part of YarnConfiguration. canRun -> something like verifyConfiguredScript, which should directly throw an exception when something is wrong (so that admins can know what really happened, such as file not found, missing execution permission, etc.), and it should be private and non-static. checkAndThrowLabelName should be called in NodeStatusUpdaterImpl. Labels need to be trim()'d when calling checkAndThrowLabelName(...) 
> Support script based NodeLabelsProvider Interface in Distributed Node Label > Configuration Setup > --- > > Key: YARN-2729 > URL: https://issues.apache.org/jira/browse/YARN-2729 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R > Fix For: 2.8.0 > > Attachments: YARN-2729.20141023-1.patch, YARN-2729.20141024-1.patch, > YARN-2729.20141031-1.patch, YARN-2729.20141120-1.patch, > YARN-2729.20141210-1.patch, YARN-2729.20150309-1.patch, > YARN-2729.20150322-1.patch, YARN-2729.20150401-1.patch, > YARN-2729.20150402-1.patch > > > Support script based NodeLabelsProvider Interface in Distributed Node Label > Configuration Setup . -- This message was sent by Atlassian JIRA (v6.3.4#6332)
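The timer-driven abstract provider Wangda asks for above could be sketched roughly as below. This is a hypothetical illustration, not the actual YARN-2729 patch: the class, method, and config-key names (AbstractNodeLabelsProvider, fetchLabels, the disable-timer sentinel) are invented for the example.

```java
import java.util.Collections;
import java.util.Set;
import java.util.Timer;
import java.util.TimerTask;

// Hypothetical sketch of a shared base class: a timer task periodically
// refreshes the label set; script-based and config-based providers only
// supply fetchLabels(). Names are illustrative, not from the real patch.
abstract class AbstractNodeLabelsProvider {
  private final long fetchIntervalMs; // e.g. yarn.node-labels.nm.provider-fetch-interval-ms
  private volatile Set<String> labels = Collections.emptySet();
  private Timer timer;

  AbstractNodeLabelsProvider(long fetchIntervalMs) {
    this.fetchIntervalMs = fetchIntervalMs;
  }

  /** Subclasses fetch labels from a script, a config file, etc. */
  protected abstract Set<String> fetchLabels();

  void start() {
    if (fetchIntervalMs <= 0) {   // a disable-timer sentinel, e.g. -1
      labels = fetchLabels();     // fetch once, no periodic refresh
      return;
    }
    timer = new Timer("NodeLabelsProviderTimer", true);
    timer.schedule(new TimerTask() {
      @Override public void run() { labels = fetchLabels(); }
    }, 0, fetchIntervalMs);
  }

  void stop() { if (timer != null) timer.cancel(); }

  Set<String> getNodeLabels() { return labels; }
}

public class ProviderSketch {
  public static void main(String[] args) {
    // A "script-based" provider stub with the timer disabled.
    AbstractNodeLabelsProvider p = new AbstractNodeLabelsProvider(-1) {
      @Override protected Set<String> fetchLabels() {
        return Collections.singleton("GPU");
      }
    };
    p.start();
    System.out.println(p.getNodeLabels().contains("GPU"));
    p.stop();
  }
}
```

A timeout check around fetchLabels() would live in the base class too, which is why it ends up meaningless (but harmless) for the config-based subclass.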
[jira] [Updated] (YARN-2666) TestFairScheduler.testContinuousScheduling fails Intermittently
[ https://issues.apache.org/jira/browse/YARN-2666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-2666: Attachment: YARN-2666.000.patch > TestFairScheduler.testContinuousScheduling fails Intermittently > --- > > Key: YARN-2666 > URL: https://issues.apache.org/jira/browse/YARN-2666 > Project: Hadoop YARN > Issue Type: Test > Components: scheduler >Reporter: Tsuyoshi Ozawa >Assignee: zhihai xu > Attachments: YARN-2666.000.patch > > > The test fails on trunk. > {code} > Tests run: 79, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 8.698 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler > testContinuousScheduling(org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler) > Time elapsed: 0.582 sec <<< FAILURE! > java.lang.AssertionError: expected:<2> but was:<1> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler.testContinuousScheduling(TestFairScheduler.java:3372) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2666) TestFairScheduler.testContinuousScheduling fails Intermittently
[ https://issues.apache.org/jira/browse/YARN-2666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-2666: Attachment: (was: YARN-2666.000.patch) > TestFairScheduler.testContinuousScheduling fails Intermittently > --- > > Key: YARN-2666 > URL: https://issues.apache.org/jira/browse/YARN-2666 > Project: Hadoop YARN > Issue Type: Test > Components: scheduler >Reporter: Tsuyoshi Ozawa >Assignee: zhihai xu > > The test fails on trunk. > {code} > Tests run: 79, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 8.698 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler > testContinuousScheduling(org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler) > Time elapsed: 0.582 sec <<< FAILURE! > java.lang.AssertionError: expected:<2> but was:<1> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler.testContinuousScheduling(TestFairScheduler.java:3372) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3437) convert load test driver to timeline service v.2
[ https://issues.apache.org/jira/browse/YARN-3437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393530#comment-14393530 ] Sangjin Lee commented on YARN-3437: --- Added a few folks for review. > convert load test driver to timeline service v.2 > > > Key: YARN-3437 > URL: https://issues.apache.org/jira/browse/YARN-3437 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Sangjin Lee > Attachments: YARN-3437.001.patch > > > This subtask covers the work for converting the proposed patch for the load > test driver (YARN-2556) to work with the timeline service v.2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3437) convert load test driver to timeline service v.2
[ https://issues.apache.org/jira/browse/YARN-3437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-3437: -- Attachment: YARN-3437.001.patch Patch v.1 posted. This is basically a modification of the YARN-2556 patch (and clean-up of issues etc.) to work against the timeline service v.2. Since the new distributed timeline service collectors are tied to applications, I chose the approach of instantiating the base timeline collector within the mapper task, rather than going through the timeline client. Making it go through the timeline client has a number of challenges (see YARN-3378). But this should still be effective as a way to exercise the bulk of the write performance and scalability. You can try this out by doing, for example, {code} hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.0.0-SNAPSHOT-tests.jar timelineperformance -m 10 -t 1000 {code} You'll get output like {noformat} TRANSACTION RATE (per mapper): 5027.652086 ops/s IO RATE (per mapper): 5027.652086 KB/s TRANSACTION RATE (total): 50276.520865 ops/s IO RATE (total): 50276.520865 KB/s {noformat} It is still using pretty simple entities to write to the storage. I'll work on adding handling of job history files later in a different JIRA. I would greatly appreciate your review. Thanks! > convert load test driver to timeline service v.2 > > > Key: YARN-3437 > URL: https://issues.apache.org/jira/browse/YARN-3437 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Sangjin Lee > Attachments: YARN-3437.001.patch > > > This subtask covers the work for converting the proposed patch for the load > test driver (YARN-2556) to work with the timeline service v.2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3365) Add support for using the 'tc' tool via container-executor
[ https://issues.apache.org/jira/browse/YARN-3365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393496#comment-14393496 ] Hadoop QA commented on YARN-3365: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12707355/YARN-3365.003.patch against trunk revision 6a6a59d. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7203//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7203//console This message is automatically generated. 
> Add support for using the 'tc' tool via container-executor > -- > > Key: YARN-3365 > URL: https://issues.apache.org/jira/browse/YARN-3365 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Sidharta Seethana >Assignee: Sidharta Seethana > Attachments: YARN-3365.001.patch, YARN-3365.002.patch, > YARN-3365.003.patch > > > We need the following functionality : > 1) modify network interface traffic shaping rules - to be able to attach a > qdisc, create child classes etc > 2) read existing rules in place > 3) read stats for the various classes > Using tc requires elevated privileges - hence this functionality is to be > made available via container-executor. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2901) Add errors and warning stats to RM, NM web UI
[ https://issues.apache.org/jira/browse/YARN-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393494#comment-14393494 ] Wangda Tan commented on YARN-2901: -- +1 for the patch. Will commit it today if no opposite opinions. > Add errors and warning stats to RM, NM web UI > - > > Key: YARN-2901 > URL: https://issues.apache.org/jira/browse/YARN-2901 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager, resourcemanager >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: Exception collapsed.png, Exception expanded.jpg, Screen > Shot 2015-03-19 at 7.40.02 PM.png, apache-yarn-2901.0.patch, > apache-yarn-2901.1.patch, apache-yarn-2901.2.patch, apache-yarn-2901.3.patch, > apache-yarn-2901.4.patch, apache-yarn-2901.5.patch > > > It would be really useful to have statistics on the number of errors and > warnings in the RM and NM web UI. I'm thinking about - > 1. The number of errors and warnings in the past 5 min/1 hour/12 hours/day > 2. The top 'n'(20?) most common exceptions in the past 5 min/1 hour/12 > hours/day > By errors and warnings I'm referring to the log level. > I suspect we can probably achieve this by writing a custom appender?(I'm open > to suggestions on alternate mechanisms for implementing this). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
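The "custom appender" idea Varun floats in YARN-2901 amounts to a log handler that counts WARN/ERROR events and tallies the most common messages. The sketch below illustrates the idea with a java.util.logging Handler rather than the log4j appender an actual patch would use; all class and field names here are invented for the example.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

// Hypothetical handler: counts errors and warnings, and tracks counts per
// message so a web UI could render the "top n most common" view.
class ErrorWarningCountingHandler extends Handler {
  final AtomicLong errors = new AtomicLong();
  final AtomicLong warnings = new AtomicLong();
  final ConcurrentHashMap<String, AtomicLong> byMessage = new ConcurrentHashMap<>();

  @Override public void publish(LogRecord record) {
    if (record.getLevel().intValue() >= Level.SEVERE.intValue()) {
      errors.incrementAndGet();
    } else if (record.getLevel().intValue() >= Level.WARNING.intValue()) {
      warnings.incrementAndGet();
    } else {
      return; // ignore INFO and below
    }
    byMessage.computeIfAbsent(record.getMessage(), k -> new AtomicLong())
             .incrementAndGet();
  }
  @Override public void flush() {}
  @Override public void close() {}
}

public class CountingDemo {
  public static void main(String[] args) {
    Logger log = Logger.getLogger("demo");
    log.setUseParentHandlers(false); // keep demo output clean
    ErrorWarningCountingHandler h = new ErrorWarningCountingHandler();
    log.addHandler(h);
    log.warning("disk almost full");
    log.severe("container launch failed");
    log.severe("container launch failed");
    System.out.println(h.warnings.get() + " " + h.errors.get());
  }
}
```

Time-windowed stats (5 min/1 hour/12 hours/day) would need per-bucket counters on top of this; the per-message map already supports the "top n exceptions" ranking.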
[jira] [Commented] (YARN-3388) Allocation in LeafQueue could get stuck because DRF calculator isn't well supported when computing user-limit
[ https://issues.apache.org/jira/browse/YARN-3388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393493#comment-14393493 ] Hadoop QA commented on YARN-3388: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12709050/YARN-3388-v1.patch against trunk revision eccb7d4. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps org.apache.hadoop.yarn.server.resourcemanager.TestRM The following test timeouts occurred in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7201//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7201//console This message is automatically generated. 
> Allocation in LeafQueue could get stuck because DRF calculator isn't well > supported when computing user-limit > - > > Key: YARN-3388 > URL: https://issues.apache.org/jira/browse/YARN-3388 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 2.6.0 >Reporter: Nathan Roberts >Assignee: Nathan Roberts > Attachments: YARN-3388-v0.patch, YARN-3388-v1.patch > > > When there are multiple active users in a queue, it should be possible for > those users to make use of capacity up-to max_capacity (or close). The > resources should be fairly distributed among the active users in the queue. > This works pretty well when there is a single resource being scheduled. > However, when there are multiple resources the situation gets more complex > and the current algorithm tends to get stuck at Capacity. > Example illustrated in subsequent comment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3415) Non-AM containers can be counted towards amResourceUsage of a Fair Scheduler queue
[ https://issues.apache.org/jira/browse/YARN-3415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393476#comment-14393476 ] zhihai xu commented on YARN-3415: - Thanks [~ragarwal] for valuable feedback and filing this issue. Thanks [~sandyr] for valuable feedback and committing the patch! Greatly appreciated. > Non-AM containers can be counted towards amResourceUsage of a Fair Scheduler > queue > -- > > Key: YARN-3415 > URL: https://issues.apache.org/jira/browse/YARN-3415 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.6.0 >Reporter: Rohit Agarwal >Assignee: zhihai xu >Priority: Critical > Fix For: 2.8.0 > > Attachments: YARN-3415.000.patch, YARN-3415.001.patch, > YARN-3415.002.patch > > > We encountered this problem while running a spark cluster. The > amResourceUsage for a queue became artificially high and then the cluster got > deadlocked because the maxAMShare constrain kicked in and no new AM got > admitted to the cluster. > I have described the problem in detail here: > https://github.com/apache/spark/pull/5233#issuecomment-87160289 > In summary - the condition for adding the container's memory towards > amResourceUsage is fragile. It depends on the number of live containers > belonging to the app. We saw that the spark AM went down without explicitly > releasing its requested containers and then one of those containers memory > was counted towards amResource. > cc - [~sandyr] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
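The fragile condition YARN-3415 describes — inferring "this is the AM container" from the app's live-container count — can be contrasted with explicit AM-container tracking. The sketch below is an invented illustration of that idea, not the actual FSAppAttempt change; class and method names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration: count a container towards amResourceUsage only
// when it is explicitly the AM container, not when it merely happens to be
// the app's first live container (the fragile condition in the issue).
class AppAttempt {
  private final Set<String> liveContainers = new HashSet<>();
  private String amContainerId;      // set once, when the AM launches
  private long amResourceUsageMb = 0;

  void containerStarted(String containerId, long memoryMb, boolean isAm) {
    liveContainers.add(containerId);
    if (isAm && amContainerId == null) {
      amContainerId = containerId;   // explicit AM tracking
      amResourceUsageMb += memoryMb;
    }
    // fragile alternative: if (liveContainers.size() == 1) { count it }
  }

  long getAmResourceUsageMb() { return amResourceUsageMb; }
}

public class AmShareDemo {
  public static void main(String[] args) {
    AppAttempt app = new AppAttempt();
    // Scenario from the issue: the AM died without releasing its requested
    // containers, and a leftover non-AM container shows up first. It must
    // NOT be charged to amResourceUsage.
    app.containerStarted("c1", 4096, false);
    app.containerStarted("c2", 1024, true);
    System.out.println(app.getAmResourceUsageMb());
  }
}
```

With the count-based heuristic, c1 (4096 MB) would have been charged as AM usage, inflating the queue's amResourceUsage until the maxAMShare constraint deadlocks new AMs.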
[jira] [Commented] (YARN-2942) Aggregated Log Files should be combined
[ https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393450#comment-14393450 ] Hadoop QA commented on YARN-2942: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12709065/ConcatableAggregatedLogsProposal_v4.pdf against trunk revision 6a6a59d. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7204//console This message is automatically generated. > Aggregated Log Files should be combined > --- > > Key: YARN-2942 > URL: https://issues.apache.org/jira/browse/YARN-2942 > Project: Hadoop YARN > Issue Type: New Feature >Affects Versions: 2.6.0 >Reporter: Robert Kanter >Assignee: Robert Kanter > Attachments: CombinedAggregatedLogsProposal_v3.pdf, > CompactedAggregatedLogsProposal_v1.pdf, > CompactedAggregatedLogsProposal_v2.pdf, > ConcatableAggregatedLogsProposal_v4.pdf, YARN-2942-preliminary.001.patch, > YARN-2942-preliminary.002.patch, YARN-2942.001.patch, YARN-2942.002.patch, > YARN-2942.003.patch > > > Turning on log aggregation allows users to easily store container logs in > HDFS and subsequently view them in the YARN web UIs from a central place. > Currently, there is a separate log file for each Node Manager. This can be a > problem for HDFS if you have a cluster with many nodes as you’ll slowly start > accumulating many (possibly small) files per YARN application. The current > “solution” for this problem is to configure YARN (actually the JHS) to > automatically delete these files after some amount of time. > We should improve this by compacting the per-node aggregated log files into > one log file per application. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2729) Support script based NodeLabelsProvider Interface in Distributed Node Label Configuration Setup
[ https://issues.apache.org/jira/browse/YARN-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393451#comment-14393451 ] Wangda Tan commented on YARN-2729: -- Apparently Jenkins ran wrong tests, rekicked Jenkins. > Support script based NodeLabelsProvider Interface in Distributed Node Label > Configuration Setup > --- > > Key: YARN-2729 > URL: https://issues.apache.org/jira/browse/YARN-2729 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R > Fix For: 2.8.0 > > Attachments: YARN-2729.20141023-1.patch, YARN-2729.20141024-1.patch, > YARN-2729.20141031-1.patch, YARN-2729.20141120-1.patch, > YARN-2729.20141210-1.patch, YARN-2729.20150309-1.patch, > YARN-2729.20150322-1.patch, YARN-2729.20150401-1.patch, > YARN-2729.20150402-1.patch > > > Support script based NodeLabelsProvider Interface in Distributed Node Label > Configuration Setup . -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2890) MiniMRYarnCluster should turn on timeline service if configured to do so
[ https://issues.apache.org/jira/browse/YARN-2890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393445#comment-14393445 ] Hitesh Shah commented on YARN-2890: --- Sorry, I did not check the last update. Minor nit: some of the test changes in TestMRTimelineEventHandling probably belong in TestMiniYarnCluster, if that exists, as yarn timeline flag behaviour checks should ideally be tested in yarn code and not MR code. > MiniMRYarnCluster should turn on timeline service if configured to do so > > > Key: YARN-2890 > URL: https://issues.apache.org/jira/browse/YARN-2890 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Mit Desai >Assignee: Mit Desai > Attachments: YARN-2890.1.patch, YARN-2890.2.patch, YARN-2890.3.patch, > YARN-2890.patch, YARN-2890.patch, YARN-2890.patch, YARN-2890.patch, > YARN-2890.patch > > > Currently the MiniMRYarnCluster does not consider the configuration value for > enabling timeline service before starting. The MiniYarnCluster should only > start the timeline service if it is configured to do so. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3334) [Event Producers] NM TimelineClient life cycle handling and container metrics posting to new timeline service.
[ https://issues.apache.org/jira/browse/YARN-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393441#comment-14393441 ] Junping Du commented on YARN-3334: -- Thanks [~zjshen] for the review and comments! bq. but I undid some unnecessary changes in TimelineClientImpl (which seem to have been added for code debugging). I think that is a necessary change. The previous message could not tell much, especially since it returned the same message for no response and for a response with failure. Also, the error code should be logged even when debug is not on, because this is a serious failure and should be reported in production environments. Thoughts? > [Event Producers] NM TimelineClient life cycle handling and container metrics > posting to new timeline service. > -- > > Key: YARN-3334 > URL: https://issues.apache.org/jira/browse/YARN-3334 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: YARN-2928 >Reporter: Junping Du >Assignee: Junping Du > Attachments: YARN-3334-demo.patch, YARN-3334-v1.patch, > YARN-3334-v2.patch, YARN-3334-v3.patch, YARN-3334-v4.patch, > YARN-3334-v5.patch, YARN-3334-v6.patch, YARN-3334.7.patch > > > After YARN-3039, we have service discovery mechanism to pass app-collector > service address among collectors, NMs and RM. In this JIRA, we will handle > service address setting for TimelineClients in NodeManager, and put container > metrics to the backend storage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3318) Create Initial OrderingPolicy Framework, integrate with CapacityScheduler LeafQueue supporting present behavior
[ https://issues.apache.org/jira/browse/YARN-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393440#comment-14393440 ] Vinod Kumar Vavilapalli commented on YARN-3318: --- Filed YARN-3441 and YARN-3442 for parent queues and for limits. > Create Initial OrderingPolicy Framework, integrate with CapacityScheduler > LeafQueue supporting present behavior > --- > > Key: YARN-3318 > URL: https://issues.apache.org/jira/browse/YARN-3318 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler >Reporter: Craig Welch >Assignee: Craig Welch > Attachments: YARN-3318.13.patch, YARN-3318.14.patch, > YARN-3318.17.patch, YARN-3318.34.patch, YARN-3318.35.patch, > YARN-3318.36.patch, YARN-3318.39.patch > > > Create the initial framework required for using OrderingPolicies with > SchedulerApplicaitonAttempts and integrate with the CapacityScheduler. This > will include an implementation which is compatible with current FIFO behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3442) Consider abstracting out user, app limits etc into some sort of a LimitPolicy
Vinod Kumar Vavilapalli created YARN-3442: - Summary: Consider abstracting out user, app limits etc into some sort of a LimitPolicy Key: YARN-3442 URL: https://issues.apache.org/jira/browse/YARN-3442 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Similar to the policies being added in YARN-3318 and YARN-3441 for leaf and parent queues, we should consider extracting an abstraction for limits too. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2942) Aggregated Log Files should be combined
[ https://issues.apache.org/jira/browse/YARN-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated YARN-2942: Attachment: ConcatableAggregatedLogsProposal_v4.pdf I've just uploaded ConcatableAggregatedLogsProposal_v4.pdf, with an updated design that uses a slightly modified version of the CombinedAggregatedLogFormat (now ConcatableAggregatedLogFormat) I already wrote and would use HDFS concat to combine the files. [~zjshen], [~kasha], and [~vinodkv], can you take a look at it? > Aggregated Log Files should be combined > --- > > Key: YARN-2942 > URL: https://issues.apache.org/jira/browse/YARN-2942 > Project: Hadoop YARN > Issue Type: New Feature >Affects Versions: 2.6.0 >Reporter: Robert Kanter >Assignee: Robert Kanter > Attachments: CombinedAggregatedLogsProposal_v3.pdf, > CompactedAggregatedLogsProposal_v1.pdf, > CompactedAggregatedLogsProposal_v2.pdf, > ConcatableAggregatedLogsProposal_v4.pdf, YARN-2942-preliminary.001.patch, > YARN-2942-preliminary.002.patch, YARN-2942.001.patch, YARN-2942.002.patch, > YARN-2942.003.patch > > > Turning on log aggregation allows users to easily store container logs in > HDFS and subsequently view them in the YARN web UIs from a central place. > Currently, there is a separate log file for each Node Manager. This can be a > problem for HDFS if you have a cluster with many nodes as you’ll slowly start > accumulating many (possibly small) files per YARN application. The current > “solution” for this problem is to configure YARN (actually the JHS) to > automatically delete these files after some amount of time. > We should improve this by compacting the per-node aggregated log files into > one log file per application. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3439) RM fails to renew token when Oozie launcher leaves before sub-job finishes
[ https://issues.apache.org/jira/browse/YARN-3439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393433#comment-14393433 ] Rohini Palaniswamy commented on YARN-3439: -- bq. Essentially the idea is to reference count the tokens and only attempt to cancel them when the token is no longer referenced. Would be a good idea. I think this is the third time we have had delegation token renewal broken for Oozie with the Hadoop 2.x line. > RM fails to renew token when Oozie launcher leaves before sub-job finishes > -- > > Key: YARN-3439 > URL: https://issues.apache.org/jira/browse/YARN-3439 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.0 >Reporter: Jason Lowe >Assignee: Daryn Sharp >Priority: Blocker > Attachments: YARN-3439.001.patch > > > When the Oozie launcher runs a standard MapReduce job (not Pig) it doesn't > linger waiting for the sub-job to finish. At that point the RM stops > renewing delegation tokens for the launcher job which wreaks havoc on the > sub-job if the sub-job runs long enough for the tokens to expire. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
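The reference-counting idea Rohini endorses above — cancel a delegation token only when no application still references it — can be sketched as below. This is an invented illustration of the concept, not Daryn's actual YARN-3439 patch; the class name and string-keyed tokens are hypothetical simplifications (the RM keys on real Token objects).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: count how many running apps share each token and
// only report "safe to cancel" when the last reference is released.
class TokenRefCounter {
  private final Map<String, Integer> refs = new HashMap<>();

  synchronized void register(String token) {
    refs.merge(token, 1, Integer::sum);
  }

  /** Returns true only when the token is no longer referenced by any app. */
  synchronized boolean release(String token) {
    Integer c = refs.get(token);
    if (c == null) return false;            // unknown token, nothing to do
    if (c == 1) { refs.remove(token); return true; }
    refs.put(token, c - 1);                 // another app still needs it
    return false;
  }
}

public class TokenRefDemo {
  public static void main(String[] args) {
    TokenRefCounter counter = new TokenRefCounter();
    counter.register("hdfs-dt-42");   // Oozie launcher job
    counter.register("hdfs-dt-42");   // sub-job shares the same token
    System.out.println(counter.release("hdfs-dt-42")); // launcher exits early
    System.out.println(counter.release("hdfs-dt-42")); // sub-job finishes
  }
}
```

In the Oozie scenario from the issue, the launcher's early exit releases one reference but leaves the sub-job's reference in place, so renewal continues until the sub-job itself completes.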
[jira] [Created] (YARN-3441) Introduce the notion of policies for a parent queue
Vinod Kumar Vavilapalli created YARN-3441: - Summary: Introduce the notion of policies for a parent queue Key: YARN-3441 URL: https://issues.apache.org/jira/browse/YARN-3441 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Similar to the policy being added in YARN-3318 for leaf-queues, we need to extend this notion to parent-queue too. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3439) RM fails to renew token when Oozie launcher leaves before sub-job finishes
[ https://issues.apache.org/jira/browse/YARN-3439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393410#comment-14393410 ] Hadoop QA commented on YARN-3439: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12709044/YARN-3439.001.patch against trunk revision eccb7d4. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7200//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/7200//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7200//console This message is automatically generated. 
> RM fails to renew token when Oozie launcher leaves before sub-job finishes > -- > > Key: YARN-3439 > URL: https://issues.apache.org/jira/browse/YARN-3439 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.0 >Reporter: Jason Lowe >Assignee: Daryn Sharp >Priority: Blocker > Attachments: YARN-3439.001.patch > > > When the Oozie launcher runs a standard MapReduce job (not Pig) it doesn't > linger waiting for the sub-job to finish. At that point the RM stops > renewing delegation tokens for the launcher job which wreaks havoc on the > sub-job if the sub-job runs long enough for the tokens to expire. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3415) Non-AM containers can be counted towards amResourceUsage of a Fair Scheduler queue
[ https://issues.apache.org/jira/browse/YARN-3415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393412#comment-14393412 ] Hudson commented on YARN-3415: -- FAILURE: Integrated in Hadoop-trunk-Commit #7497 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7497/]) YARN-3415. Non-AM containers can be counted towards amResourceUsage of a fairscheduler queue (Zhihai Xu via Sandy Ryza) (sandy: rev 6a6a59db7f1bfda47c3c14fb49676a7b22d2eb06) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java * hadoop-yarn-project/CHANGES.txt > Non-AM containers can be counted towards amResourceUsage of a Fair Scheduler > queue > -- > > Key: YARN-3415 > URL: https://issues.apache.org/jira/browse/YARN-3415 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.6.0 >Reporter: Rohit Agarwal >Assignee: zhihai xu >Priority: Critical > Fix For: 2.8.0 > > Attachments: YARN-3415.000.patch, YARN-3415.001.patch, > YARN-3415.002.patch > > > We encountered this problem while running a spark cluster. The > amResourceUsage for a queue became artificially high and then the cluster got > deadlocked because the maxAMShare constrain kicked in and no new AM got > admitted to the cluster. 
> I have described the problem in detail here: > https://github.com/apache/spark/pull/5233#issuecomment-87160289 > In summary - the condition for adding the container's memory towards > amResourceUsage is fragile. It depends on the number of live containers > belonging to the app. We saw that the spark AM went down without explicitly > releasing its requested containers and then one of those containers memory > was counted towards amResource. > cc - [~sandyr] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
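The fragile condition described above can be contrasted with a minimal sketch of a more robust accounting idea (all class and method names here are hypothetical stand-ins, not the actual YARN-3415 patch): charge the AM resource to the queue exactly once, when the first container of a managed AM is allocated, instead of inferring "this is the AM container" from the app's live-container count.

```java
import java.util.Objects;

class Resource {
    final int memoryMb;
    Resource(int memoryMb) { this.memoryMb = memoryMb; }
}

class Queue {
    int amResourceUsageMb = 0;
    void addAmResourceUsage(Resource r) { amResourceUsageMb += r.memoryMb; }
}

class AppAttempt {
    private final Queue queue;
    private final boolean unmanagedAm;
    private boolean amRunning = false; // flipped once, on the AM container's allocation

    AppAttempt(Queue queue, boolean unmanagedAm) {
        this.queue = Objects.requireNonNull(queue);
        this.unmanagedAm = unmanagedAm;
    }

    void onContainerAllocated(Resource r) {
        // Charge amResourceUsage exactly once, for the first allocation of a
        // managed AM; later (non-AM) containers are never charged, even if the
        // app's live-container count drops back to zero in between.
        if (!unmanagedAm && !amRunning) {
            amRunning = true;
            queue.addAmResourceUsage(r);
        }
    }
}
```

With this shape, an AM that dies without releasing its requested containers cannot cause a later task container's memory to be miscounted as AM resource, which was the deadlock trigger above.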
[jira] [Commented] (YARN-2890) MiniMRYarnCluster should turn on timeline service if configured to do so
[ https://issues.apache.org/jira/browse/YARN-2890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393408#comment-14393408 ] Mit Desai commented on YARN-2890: - [~hitesh], did you have any comments on the patch? > MiniMRYarnCluster should turn on timeline service if configured to do so > > > Key: YARN-2890 > URL: https://issues.apache.org/jira/browse/YARN-2890 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Mit Desai >Assignee: Mit Desai > Attachments: YARN-2890.1.patch, YARN-2890.2.patch, YARN-2890.3.patch, > YARN-2890.patch, YARN-2890.patch, YARN-2890.patch, YARN-2890.patch, > YARN-2890.patch > > > Currently the MiniMRYarnCluster does not consider the configuration value for > enabling timeline service before starting. The MiniYarnCluster should only > start the timeline service if it is configured to do so. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3318) Create Initial OrderingPolicy Framework, integrate with CapacityScheduler LeafQueue supporting present behavior
[ https://issues.apache.org/jira/browse/YARN-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393388#comment-14393388 ] Vinod Kumar Vavilapalli commented on YARN-3318: --- bq. I think it should be fine to make policy interfaces define as well as CapacityScheduler changes together with this patch (only for FifoOrderingPolicy), it's good to see how interfaces and policies work in CS, is it easy or not, etc. = We can still do this with patches on two JIRAs - one for the framework, one for CS, one for FS etc. The Fifo one can be here for demonstration, no problem with that. Why is it so hard to focus one thing in one JIRA? > Create Initial OrderingPolicy Framework, integrate with CapacityScheduler > LeafQueue supporting present behavior > --- > > Key: YARN-3318 > URL: https://issues.apache.org/jira/browse/YARN-3318 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler >Reporter: Craig Welch >Assignee: Craig Welch > Attachments: YARN-3318.13.patch, YARN-3318.14.patch, > YARN-3318.17.patch, YARN-3318.34.patch, YARN-3318.35.patch, > YARN-3318.36.patch, YARN-3318.39.patch > > > Create the initial framework required for using OrderingPolicies with > SchedulerApplicaitonAttempts and integrate with the CapacityScheduler. This > will include an implementation which is compatible with current FIFO behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3415) Non-AM containers can be counted towards amResourceUsage of a Fair Scheduler queue
[ https://issues.apache.org/jira/browse/YARN-3415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-3415: - Summary: Non-AM containers can be counted towards amResourceUsage of a Fair Scheduler queue (was: Non-AM containers can be counted towards amResourceUsage of a fairscheduler queue) > Non-AM containers can be counted towards amResourceUsage of a Fair Scheduler > queue > -- > > Key: YARN-3415 > URL: https://issues.apache.org/jira/browse/YARN-3415 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.6.0 >Reporter: Rohit Agarwal >Assignee: zhihai xu >Priority: Critical > Attachments: YARN-3415.000.patch, YARN-3415.001.patch, > YARN-3415.002.patch > > > We encountered this problem while running a spark cluster. The > amResourceUsage for a queue became artificially high and then the cluster got > deadlocked because the maxAMShare constrain kicked in and no new AM got > admitted to the cluster. > I have described the problem in detail here: > https://github.com/apache/spark/pull/5233#issuecomment-87160289 > In summary - the condition for adding the container's memory towards > amResourceUsage is fragile. It depends on the number of live containers > belonging to the app. We saw that the spark AM went down without explicitly > releasing its requested containers and then one of those containers memory > was counted towards amResource. > cc - [~sandyr] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3318) Create Initial OrderingPolicy Framework, integrate with CapacityScheduler LeafQueue supporting present behavior
[ https://issues.apache.org/jira/browse/YARN-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393374#comment-14393374 ] Wangda Tan commented on YARN-3318: -- [~cwelch], I took a look at your latest patch as well as [~vinodkv]'s suggestions. Comments: *1. I prefer what Vinod suggested: split "SchedulerProcess" into "QueueSchedulable" and "AppSchedulable", to avoid notes in the FairScheduler "Schedulable" interface like:* {code} /** Start time for jobs in FIFO queues; meaningless for QueueSchedulables.*/ {code} They can both inherit {{Schedulable}}. With this patch, we can limit ourselves to the AppSchedulable and Schedulable definitions. Also, regarding the schedulable comparator, not every "Schedulable" fits every comparator; it's meaningless to do "FIFO" scheduling at the parent-queue level. I think: {code} Schedulable contains ResourceUsage (class), and name In addition, AppSchedulable contains compareSubmissionOrderTo(AppSchedulable) and Priority {code} *2. The inheritance relationships between interfaces/classes are not very clear to me now; I spent some time figuring out what they're doing. My suggestion is:* {code} FairOrderingPolicy/FifoOrderingPolicy > OrderingPolicy (implements) FairOrderingPolicy and FifoOrderingPolicy could inherit from AbstractOrderingPolicy with common implementations FairOrderingPolicy/FifoOrderingPolicy > FairSchedulableComparator/FifoSchedulableComparator (uses) There's no need to invent a "SchedulerComparator" interface; using the existing Java Comparator interface should be simple and sufficient. {code} *3. Regarding the relationship between OrderingPolicy and comparator:* I understand the purpose of SchedulerComparator is to reduce unnecessary re-sorting of Schedulables being added/modified in OrderingPolicy, but actually we can: 1) Do this in OrderingPolicy itself. For example, with my above suggestion, FifoOrderingPolicy will simply ignore container-changed notifications.
2) The Comparator doesn't know about global info; only the OrderingPolicy knows how combinations of Comparators interact, and I don't want containerAllocate/Release coupled into the Comparator interface. And we don't need a separate "CompoundComparator"; this can be put in AbstractOrderingPolicy. *4. Regarding configuration (CapacitySchedulerConfiguration):* I think we don't need ORDERING_POLICY_CLASS; two options for very similar purposes can confuse users. I suggest keeping only ordering-policy, and its name can be "fifo" or "fair", regardless of its internal "comparator" implementation. And in the future we can add "priority-fifo", "priority-fair". (Note the "-" in the name doesn't mean "AND" only; it could be a collaboration of the two rather than a simple combination.) If the user specifies a name not in the short-name whitelist we provide, we will try to load a class with that name. *5. Regarding the longer-term plan, LimitPolicy:* This part hasn't been discussed thoroughly, so to limit the scope of this JIRA I think its definition and implementation should happen in a separate ticket. For the longer-term plan, considering YARN-2986 as well, we may configure a queue like the following: {code} fair true 50 .. .. {code} The changes of this patch in CapacitySchedulerConfiguration seem reasonable; as Craig mentioned, simply marking them unstable or experimental should be enough. The longer-term goal is to define and stabilize YARN-2986 to make a truly unified scheduler. *6. Regarding the scope of this JIRA:* I think it should be fine to define the policy interfaces as well as the CapacityScheduler changes together in this patch (only for FifoOrderingPolicy); it's good to see how the interfaces and policies work in CS, whether it's easy or not, etc. And I suggest moving the following to a separate ticket: 1) UI (Web and CLI) 2) REST 3) PB-related changes As the patch evolves, you won't have to maintain the above changes together with it. Please feel free to let me know your thoughts.
> Create Initial OrderingPolicy Framework, integrate with CapacityScheduler > LeafQueue supporting present behavior > --- > > Key: YARN-3318 > URL: https://issues.apache.org/jira/browse/YARN-3318 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler >Reporter: Craig Welch >Assignee: Craig Welch > Attachments: YARN-3318.13.patch, YARN-3318.14.patch, > YARN-3318.17.patch, YARN-3318.34.patch, YARN-3318.35.patch, > YARN-3318.36.patch, YARN-3318.39.patch > >
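Sketched in Java, the hierarchy and comparator arrangement suggested in the comment above might look like the following (illustrative names and simplified types only; the final YARN-3318 classes may well differ):

```java
import java.util.Comparator;
import java.util.TreeSet;

interface Schedulable {
    String getName();
    long getUsedMemoryMb();      // simplified stand-in for the ResourceUsage class
}

interface AppSchedulable extends Schedulable {
    long getSubmissionOrder();   // basis of FIFO ordering; meaningful for apps only
}

// A plain java.util.Comparator instead of a new SchedulerComparator interface.
class FifoComparator implements Comparator<AppSchedulable> {
    @Override
    public int compare(AppSchedulable a, AppSchedulable b) {
        return Long.compare(a.getSubmissionOrder(), b.getSubmissionOrder());
    }
}

// The OrderingPolicy wraps the comparator; a FIFO policy can simply ignore
// container-allocation/release notifications, since they never change order.
class FifoOrderingPolicy {
    private final TreeSet<AppSchedulable> apps = new TreeSet<>(new FifoComparator());
    void addSchedulable(AppSchedulable s) { apps.add(s); }
    void containerChanged(AppSchedulable s) { /* no-op: FIFO order is static */ }
    AppSchedulable getNext() { return apps.first(); }
}
```

The design point this illustrates: re-sort bookkeeping lives in the policy, not the comparator, so the comparator stays a pure ordering function.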
[jira] [Updated] (YARN-3334) [Event Producers] NM TimelineClient life cycle handling and container metrics posting to new timeline service.
[ https://issues.apache.org/jira/browse/YARN-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-3334: -- Attachment: YARN-3334.7.patch Last patch looks good to me, but I undid some unnecessary changes in TimelineClientImpl (which seem to have been added for debugging). Will hold the patch for a while before committing, in case other folks want to take a look. > [Event Producers] NM TimelineClient life cycle handling and container metrics > posting to new timeline service. > -- > > Key: YARN-3334 > URL: https://issues.apache.org/jira/browse/YARN-3334 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: YARN-2928 >Reporter: Junping Du >Assignee: Junping Du > Attachments: YARN-3334-demo.patch, YARN-3334-v1.patch, > YARN-3334-v2.patch, YARN-3334-v3.patch, YARN-3334-v4.patch, > YARN-3334-v5.patch, YARN-3334.7.patch > > > After YARN-3039, we have service discovery mechanism to pass app-collector > service address among collectors, NMs and RM. In this JIRA, we will handle > service address setting for TimelineClients in NodeManager, and put container > metrics to the backend storage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3318) Create Initial OrderingPolicy Framework, integrate with CapacityScheduler LeafQueue supporting present behavior
[ https://issues.apache.org/jira/browse/YARN-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393347#comment-14393347 ] Vinod Kumar Vavilapalli commented on YARN-3318: --- bq. I think it is useful to split off CS changes into their own JIRA. We can strictly focus on the policy framework here. You missed this; let's please do this. bq. well, I'd actually talked Wangda Tan into SchedulerProcess So, we can chew on this a bit more & see where we go SchedulerProcess is definitely misleading. It seems to point to a process that is doing scheduling. What you need is a Schedulable / SchedulableEntity / Consumer etc. You could also say SchedulableProcess, but Process is way too overloaded. bq. The goal is to make this available quickly but iteratively, keeping the changes small but making them available for use and feedback. (..) We should grow it organically, gradually, iteratively; think of it as a facet of the policy framework hooked up and available but with more to follow I agree with this, but we are not in a position to support the APIs, CLI, and config names in a supportable manner yet. They may or may not change depending on how parent-queue policies and limit policies evolve. For that reason alone, I am saying that (1) don't make the configurations public yet, or put a warning saying that they are unstable, and (2) don't expose them in the CLI or REST APIs yet. It's okay to put them in the web UI; web UI scraping is not a contract. bq. You add/remove applications to/from LeafQueue's policy but addition/removal of containers is an event... bq. This has been factored differently along Wangda Tan's suggestion, it should now be consistent It's a bit better now, although we are hard-coding Containers. Can revisit this later. Other comments - SchedulerApplicationAttempt.getDemand() should be private. - SchedulerProcess -- updateCaches() -> updateState() / updateSchedulingState(), as that is what it is doing?
-- getCachedConsumption() / getCachedDemand(): simply getCurrent*()? - SchedulerComparator -- We aren't comparing Schedulers. Given the current name, it should have been SchedulerProcessComparator, but SchedulerProcess itself should be renamed as mentioned before. -- What is the need for reorderOnContainerAllocate() / reorderOnContainerRelease()? - Move all the comparator-related classes into their own package. - SchedulerComparatorPolicy -- This is really a ComparatorBasedOrderingPolicy. Do we really foresee a non-comparator-based ordering policy? We are unnecessarily adding two abstractions: policies and comparators. -- Use className.getName() instead of hardcoded strings like "org.apache.hadoop.yarn.server.resourcemanager.scheduler.policy.FifoComparator" > Create Initial OrderingPolicy Framework, integrate with CapacityScheduler > LeafQueue supporting present behavior > --- > > Key: YARN-3318 > URL: https://issues.apache.org/jira/browse/YARN-3318 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler >Reporter: Craig Welch >Assignee: Craig Welch > Attachments: YARN-3318.13.patch, YARN-3318.14.patch, > YARN-3318.17.patch, YARN-3318.34.patch, YARN-3318.35.patch, > YARN-3318.36.patch, YARN-3318.39.patch > > > Create the initial framework required for using OrderingPolicies with > SchedulerApplicaitonAttempts and integrate with the CapacityScheduler. This > will include an implementation which is compatible with current FIFO behavior. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3415) Non-AM containers can be counted towards amResourceUsage of a fairscheduler queue
[ https://issues.apache.org/jira/browse/YARN-3415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393351#comment-14393351 ] zhihai xu commented on YARN-3415: - [~sandyr], thanks for the review. The latest patch YARN-3415.002.patch is rebased on the latest code base and it passed the Jenkins test. Let me know whether you have more comments on the patch. > Non-AM containers can be counted towards amResourceUsage of a fairscheduler > queue > - > > Key: YARN-3415 > URL: https://issues.apache.org/jira/browse/YARN-3415 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Affects Versions: 2.6.0 >Reporter: Rohit Agarwal >Assignee: zhihai xu >Priority: Critical > Attachments: YARN-3415.000.patch, YARN-3415.001.patch, > YARN-3415.002.patch > > > We encountered this problem while running a spark cluster. The > amResourceUsage for a queue became artificially high and then the cluster got > deadlocked because the maxAMShare constraint kicked in and no new AM got > admitted to the cluster. > I have described the problem in detail here: > https://github.com/apache/spark/pull/5233#issuecomment-87160289 > In summary - the condition for adding the container's memory towards > amResourceUsage is fragile. It depends on the number of live containers > belonging to the app. We saw that the spark AM went down without explicitly > releasing its requested containers and then one of those containers' memory > was counted towards amResource. > cc - [~sandyr] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3388) Allocation in LeafQueue could get stuck because DRF calculator isn't well supported when computing user-limit
[ https://issues.apache.org/jira/browse/YARN-3388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nathan Roberts updated YARN-3388: - Attachment: YARN-3388-v1.patch Hi [~leftnoteasy]. Uploaded a new version of patch that addresses the inefficiency and adds unit tests. I think label support is better left for a separate jira when labels are fully working with userlimits. > Allocation in LeafQueue could get stuck because DRF calculator isn't well > supported when computing user-limit > - > > Key: YARN-3388 > URL: https://issues.apache.org/jira/browse/YARN-3388 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 2.6.0 >Reporter: Nathan Roberts >Assignee: Nathan Roberts > Attachments: YARN-3388-v0.patch, YARN-3388-v1.patch > > > When there are multiple active users in a queue, it should be possible for > those users to make use of capacity up-to max_capacity (or close). The > resources should be fairly distributed among the active users in the queue. > This works pretty well when there is a single resource being scheduled. > However, when there are multiple resources the situation gets more complex > and the current algorithm tends to get stuck at Capacity. > Example illustrated in subsequent comment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3391) Clearly define flow ID/ flow run / flow version in API and storage
[ https://issues.apache.org/jira/browse/YARN-3391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393264#comment-14393264 ] Zhijie Shen commented on YARN-3391: --- [~vrushalic], it sounds good to me to set aside the disagreement on the flow name default and move on. As far as I can tell, with the current context info data flow, it's quite simple to change the default value if we figure out a better one later. In addition, the previous debate is also related to how we show flows on the web UI by default. I think we can revisit the defaults once we reach the web UI work, when we should have a better idea about it. > Clearly define flow ID/ flow run / flow version in API and storage > -- > > Key: YARN-3391 > URL: https://issues.apache.org/jira/browse/YARN-3391 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Zhijie Shen > Attachments: YARN-3391.1.patch > > > To continue the discussion in YARN-3040, let's figure out the best way to > describe the flow. > Some key issues that we need to conclude on: > - How do we include the flow version in the context so that it gets passed > into the collector and to the storage eventually? > - Flow run id should be a number as opposed to a generic string? > - Default behavior for the flow run id if it is missing (i.e. client did not > set it) > - How do we handle flow attributes in case of nested levels of flows? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3440) ResourceUsage should be copy-on-write
[ https://issues.apache.org/jira/browse/YARN-3440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-3440: Description: In {{org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceUsage}}, even if it is thread-safe, but Resource returned by getters could be updated by another thread. All Resource objects in ResourceUsage should be copy-on-write, reader will always get a non-changed Resource. And changes apply on Resource acquired by caller will not affect original Resource. was: In {{org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceUsage }}, even if it is thread-safe, but Resource returned by getters could be updated by another thread. All Resource objects in ResourceUsage should be copy-on-write, reader will always get a non-changed Resource. And changes apply on Resource acquired by caller will not affect original Resource. > ResourceUsage should be copy-on-write > - > > Key: YARN-3440 > URL: https://issues.apache.org/jira/browse/YARN-3440 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager, scheduler, yarn >Reporter: Wangda Tan >Assignee: Li Lu > > In {{org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceUsage}}, > even if it is thread-safe, but Resource returned by getters could be updated > by another thread. > All Resource objects in ResourceUsage should be copy-on-write, reader will > always get a non-changed Resource. And changes apply on Resource acquired by > caller will not affect original Resource. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
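The copy-on-write requirement described above can be sketched as follows (a simplified stand-in, not the real ResourceUsage class or the eventual patch): getters hand back a defensive copy under the read lock, and writers replace rather than mutate the shared object, so a caller always sees a stable snapshot and its own mutations never leak back into shared state.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

class Resource {
    int memoryMb;
    Resource(int memoryMb) { this.memoryMb = memoryMb; }
    Resource copy() { return new Resource(memoryMb); }
}

class ResourceUsage {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private Resource used = new Resource(0);

    Resource getUsed() {
        lock.readLock().lock();
        try {
            return used.copy();  // caller gets an isolated snapshot
        } finally {
            lock.readLock().unlock();
        }
    }

    void incUsed(int memoryMb) {
        lock.writeLock().lock();
        try {
            // Replace the shared Resource instead of mutating it in place,
            // so previously handed-out snapshots stay unchanged.
            used = new Resource(used.memoryMb + memoryMb);
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```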
[jira] [Updated] (YARN-3439) RM fails to renew token when Oozie launcher leaves before sub-job finishes
[ https://issues.apache.org/jira/browse/YARN-3439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-3439: - Attachment: YARN-3439.001.patch Daryn is out so posting a prototype patch he developed to get some early feedback. Note that this patch can't go in as-is, as it's a work-in-progress that hacks out the automatic HDFS delegation token logic that was added as part of YARN-2704. Essentially the idea is to reference count the tokens and only attempt to cancel them when the token is no longer referenced. Since the launcher job won't complete until it has successfully submitted the sub-job(s), the token will remain referenced throughout the lifespan of the workflow even if the launcher job exits early. > RM fails to renew token when Oozie launcher leaves before sub-job finishes > -- > > Key: YARN-3439 > URL: https://issues.apache.org/jira/browse/YARN-3439 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.0 >Reporter: Jason Lowe >Assignee: Daryn Sharp >Priority: Blocker > Attachments: YARN-3439.001.patch > > > When the Oozie launcher runs a standard MapReduce job (not Pig) it doesn't > linger waiting for the sub-job to finish. At that point the RM stops > renewing delegation tokens for the launcher job which wreaks havoc on the > sub-job if the sub-job runs long enough for the tokens to expire. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
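The reference-counting idea described above can be sketched like this (hypothetical names, not Daryn's actual work-in-progress patch): each job that depends on a delegation token increments a count at submission, and cancellation is only attempted once the count drops to zero, so a launcher exiting before its sub-job does not trigger premature cancellation.

```java
import java.util.HashMap;
import java.util.Map;

class TokenTracker {
    private final Map<String, Integer> refs = new HashMap<>();

    /** A job that uses this token registers it at submission time. */
    void register(String tokenId) {
        refs.merge(tokenId, 1, Integer::sum);
    }

    /**
     * Called when a job finishes. Returns true only when the token has
     * become unreferenced and may safely be cancelled.
     */
    boolean release(String tokenId) {
        Integer n = refs.get(tokenId);
        if (n == null) return false;          // unknown token: nothing to do
        if (n == 1) {
            refs.remove(tokenId);
            return true;                      // last reference gone: cancel now
        }
        refs.put(tokenId, n - 1);             // still referenced by another job
        return false;
    }
}
```

In the Oozie scenario, the launcher and the sub-job would each hold a reference, so the launcher finishing early merely decrements the count while the sub-job keeps the token alive.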
[jira] [Commented] (YARN-3293) Track and display capacity scheduler health metrics in web UI
[ https://issues.apache.org/jira/browse/YARN-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393194#comment-14393194 ] Craig Welch commented on YARN-3293: --- Hey [~vvasudev], it seems that the patch doesn't apply cleanly, can you update to latest trunk? > Track and display capacity scheduler health metrics in web UI > - > > Key: YARN-3293 > URL: https://issues.apache.org/jira/browse/YARN-3293 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacityscheduler >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: Screen Shot 2015-03-30 at 4.30.14 PM.png, > apache-yarn-3293.0.patch, apache-yarn-3293.1.patch, apache-yarn-3293.2.patch > > > It would be good to display metrics that let users know about the health of > the capacity scheduler in the web UI. Today it is hard to get an idea if the > capacity scheduler is functioning correctly. Metrics such as the time for the > last allocation, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3334) [Event Producers] NM TimelineClient life cycle handling and container metrics posting to new timeline service.
[ https://issues.apache.org/jira/browse/YARN-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-3334: - Attachment: YARN-3334-v6.patch Incorporated [~zjshen]'s comments in the v6 patch. Rebased it to the latest YARN-2928 and verified the e2e test passes. [~zjshen], can you take a look again? Thanks! > [Event Producers] NM TimelineClient life cycle handling and container metrics > posting to new timeline service. > -- > > Key: YARN-3334 > URL: https://issues.apache.org/jira/browse/YARN-3334 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: YARN-2928 >Reporter: Junping Du >Assignee: Junping Du > Attachments: YARN-3334-demo.patch, YARN-3334-v1.patch, > YARN-3334-v2.patch, YARN-3334-v3.patch, YARN-3334-v4.patch, > YARN-3334-v5.patch, YARN-3334-v6.patch > > > After YARN-3039, we have service discovery mechanism to pass app-collector > service address among collectors, NMs and RM. In this JIRA, we will handle > service address setting for TimelineClients in NodeManager, and put container > metrics to the backend storage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3440) ResourceUsage should be copy-on-write
Wangda Tan created YARN-3440: Summary: ResourceUsage should be copy-on-write Key: YARN-3440 URL: https://issues.apache.org/jira/browse/YARN-3440 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler, yarn Reporter: Wangda Tan In {{org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceUsage}}, even though the class itself is thread-safe, the Resource returned by its getters could be updated by another thread. All Resource objects in ResourceUsage should be copy-on-write: a reader will always get an unchanged Resource, and changes applied to a Resource acquired by a caller will not affect the original Resource. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3391) Clearly define flow ID/ flow run / flow version in API and storage
[ https://issues.apache.org/jira/browse/YARN-3391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393150#comment-14393150 ] Vrushali C commented on YARN-3391: -- Hi [~zjshen] In the interest of time, let's set these disagreements aside and move forward with your defaults. If the need arises, we could revisit the defaults in the future. What do you all think? cc [~sjlee0] thanks Vrushali > Clearly define flow ID/ flow run / flow version in API and storage > -- > > Key: YARN-3391 > URL: https://issues.apache.org/jira/browse/YARN-3391 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Zhijie Shen > Attachments: YARN-3391.1.patch > > > To continue the discussion in YARN-3040, let's figure out the best way to > describe the flow. > Some key issues that we need to conclude on: > - How do we include the flow version in the context so that it gets passed > into the collector and to the storage eventually? > - Flow run id should be a number as opposed to a generic string? > - Default behavior for the flow run id if it is missing (i.e. client did not > set it) > - How do we handle flow attributes in case of nested levels of flows? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2901) Add errors and warning stats to RM, NM web UI
[ https://issues.apache.org/jira/browse/YARN-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393148#comment-14393148 ] Hadoop QA commented on YARN-2901: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12709021/apache-yarn-2901.5.patch against trunk revision 9ed43f2. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7199//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7199//console This message is automatically generated. > Add errors and warning stats to RM, NM web UI > - > > Key: YARN-2901 > URL: https://issues.apache.org/jira/browse/YARN-2901 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager, resourcemanager >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: Exception collapsed.png, Exception expanded.jpg, Screen > Shot 2015-03-19 at 7.40.02 PM.png, apache-yarn-2901.0.patch, > apache-yarn-2901.1.patch, apache-yarn-2901.2.patch, apache-yarn-2901.3.patch, > apache-yarn-2901.4.patch, apache-yarn-2901.5.patch > > > It would be really useful to have statistics on the number of errors and > warnings in the RM and NM web UI. 
I'm thinking about - > 1. The number of errors and warnings in the past 5 min/1 hour/12 hours/day > 2. The top 'n'(20?) most common exceptions in the past 5 min/1 hour/12 > hours/day > By errors and warnings I'm referring to the log level. > I suspect we can probably achieve this by writing a custom appender?(I'm open > to suggestions on alternate mechanisms for implementing this). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
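As a rough illustration of the custom-appender idea suggested above (Hadoop would implement this against log4j; this stand-in uses only java.util.logging so it is self-contained, and all names are hypothetical), a handler can simply count records at or above the WARN level:

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

// Counts warnings and errors as they are logged; a real implementation would
// additionally bucket counts by time window and track top-N exception types.
class CountingHandler extends Handler {
    final AtomicLong warnings = new AtomicLong();
    final AtomicLong errors = new AtomicLong();

    @Override
    public void publish(LogRecord record) {
        int level = record.getLevel().intValue();
        if (level >= Level.SEVERE.intValue()) {
            errors.incrementAndGet();        // SEVERE roughly maps to log4j ERROR
        } else if (level >= Level.WARNING.intValue()) {
            warnings.incrementAndGet();
        }
    }

    @Override public void flush() {}
    @Override public void close() {}
}
```

The web UI would then read these counters (or their windowed equivalents) to render the error/warning statistics.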
[jira] [Resolved] (YARN-3374) Collector's web server should randomly bind an available port
[ https://issues.apache.org/jira/browse/YARN-3374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du resolved YARN-3374. -- Resolution: Fixed Fix Version/s: YARN-2928 Hadoop Flags: Reviewed Committed it to branch YARN-2928. Thanks [~zjshen] for the patch! > Collector's web server should randomly bind an available port > - > > Key: YARN-3374 > URL: https://issues.apache.org/jira/browse/YARN-3374 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Zhijie Shen > Fix For: YARN-2928 > > Attachments: YARN-3347.1.patch > > > It's based on the configuration now. The approach won't work if we move to > the app-level aggregator container solution. One NM may start multiple such > aggregators, which cannot bind to the same configured port. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3440) ResourceUsage should be copy-on-write
[ https://issues.apache.org/jira/browse/YARN-3440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu reassigned YARN-3440: --- Assignee: Li Lu > ResourceUsage should be copy-on-write > - > > Key: YARN-3440 > URL: https://issues.apache.org/jira/browse/YARN-3440 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager, scheduler, yarn >Reporter: Wangda Tan >Assignee: Li Lu > > In {{org.apache.hadoop.yarn.server.resourcemanager.scheduler.ResourceUsage > }}, even if it is thread-safe, but Resource returned by getters could be > updated by another thread. > All Resource objects in ResourceUsage should be copy-on-write, reader will > always get a non-changed Resource. And changes apply on Resource acquired by > caller will not affect original Resource. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3390) RMTimelineCollector should have the context info of each app
[ https://issues.apache.org/jira/browse/YARN-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393139#comment-14393139 ] Zhijie Shen commented on YARN-3390: --- I think it makes sense to generalize TimelineEntityContext from a single app's context to the app -> context map. Reader may need this map too. I'll fix the problem after YARN-3391 is done. > RMTimelineCollector should have the context info of each app > > > Key: YARN-3390 > URL: https://issues.apache.org/jira/browse/YARN-3390 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Zhijie Shen > > RMTimelineCollector should have the context info of each app whose entity > has been put -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3430) RMAppAttempt headroom data is missing in RM Web UI
[ https://issues.apache.org/jira/browse/YARN-3430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393116#comment-14393116 ] Hudson commented on YARN-3430: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2101 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2101/]) YARN-3430. Made headroom data available on app attempt page of RM WebUI. Contributed by Xuan Gong. (zjshen: rev 8366a36ad356e6318b8ce6c5c96e201149f811bd) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMAppAttemptBlock.java * hadoop-yarn-project/CHANGES.txt > RMAppAttempt headroom data is missing in RM Web UI > -- > > Key: YARN-3430 > URL: https://issues.apache.org/jira/browse/YARN-3430 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager, webapp, yarn >Reporter: Xuan Gong >Assignee: Xuan Gong >Priority: Blocker > Fix For: 2.7.0 > > Attachments: YARN-3430.1.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3425) NPE from RMNodeLabelsManager.serviceStop when NodeLabelsManager.serviceInit failed
[ https://issues.apache.org/jira/browse/YARN-3425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393122#comment-14393122 ] Hudson commented on YARN-3425: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2101 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2101/]) YARN-3425. NPE from RMNodeLabelsManager.serviceStop when NodeLabelsManager.serviceInit failed. (Bibin A Chundatt via wangda) (wangda: rev 492239424a3ace9868b6154f44a0f18fa5318235) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/CommonNodeLabelsManager.java * hadoop-yarn-project/CHANGES.txt > NPE from RMNodeLabelsManager.serviceStop when NodeLabelsManager.serviceInit > failed > -- > > Key: YARN-3425 > URL: https://issues.apache.org/jira/browse/YARN-3425 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager > Environment: 1 RM, 1 NM , 1 NN , I DN >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Minor > Fix For: 2.8.0 > > Attachments: YARN-3425.001.patch > > > Configure yarn.node-labels.enabled to true > and yarn.node-labels.fs-store.root-dir /node-labels > Start resource manager without starting DN/NM > {quote} > 2015-03-31 16:44:13,782 WARN org.apache.hadoop.service.AbstractService: When > stopping the service > org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager : > java.lang.NullPointerException > java.lang.NullPointerException > at > org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.stopDispatcher(CommonNodeLabelsManager.java:261) > at > org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStop(CommonNodeLabelsManager.java:267) > at > org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221) > at > org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52) > at > org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80) > at > 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:171) > at > org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:556) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:984) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:251) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1207) > {quote} > {code} > protected void stopDispatcher() { > AsyncDispatcher asyncDispatcher = (AsyncDispatcher) dispatcher; >asyncDispatcher.stop(); > } > {code} > Null check missing during stop -- This message was sent by Atlassian JIRA (v6.3.4#6332)
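The "null check missing during stop" in the report above can be sketched as below. This is an illustrative simplification, not the committed patch: `AsyncDispatcher` here is a stand-in for the real YARN class, and the point is only that `serviceStop` must tolerate a dispatcher that was never created because `serviceInit` failed early.

```java
// Sketch of the null guard missing from stopDispatcher(). If serviceInit
// fails before the dispatcher is constructed, serviceStop still runs and
// must not dereference a null field. AsyncDispatcher is a stand-in class.
public class StopDispatcherSketch {
    static class AsyncDispatcher {
        void stop() { /* drain events and stop the handling thread */ }
    }

    private AsyncDispatcher dispatcher; // null when init failed early

    protected void stopDispatcher() {
        AsyncDispatcher asyncDispatcher = dispatcher;
        if (asyncDispatcher != null) { // the guard the report asks for
            asyncDispatcher.stop();
        }
    }

    public static void main(String[] args) {
        // dispatcher is null, mimicking a failed serviceInit; no NPE here
        new StopDispatcherSketch().stopDispatcher();
        System.out.println("ok");
    }
}
```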
[jira] [Commented] (YARN-3439) RM fails to renew token when Oozie launcher leaves before sub-job finishes
[ https://issues.apache.org/jira/browse/YARN-3439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393105#comment-14393105 ] Jason Lowe commented on YARN-3439: -- This appears to be caused by YARN-2704. YARN-2964 tried to fix it but assumed that Oozie launcher jobs will always run for the duration of the sub-jobs. This is true when the launcher runs a Pig job, but apparently is not true when it runs a standard MapReduce job. > RM fails to renew token when Oozie launcher leaves before sub-job finishes > -- > > Key: YARN-3439 > URL: https://issues.apache.org/jira/browse/YARN-3439 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.0 >Reporter: Jason Lowe >Assignee: Daryn Sharp >Priority: Blocker > > When the Oozie launcher runs a standard MapReduce job (not Pig) it doesn't > linger waiting for the sub-job to finish. At that point the RM stops > renewing delegation tokens for the launcher job which wreaks havoc on the > sub-job if the sub-job runs long enough for the tokens to expire. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3439) RM fails to renew token when Oozie launcher leaves before sub-job finishes
Jason Lowe created YARN-3439: Summary: RM fails to renew token when Oozie launcher leaves before sub-job finishes Key: YARN-3439 URL: https://issues.apache.org/jira/browse/YARN-3439 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.0 Reporter: Jason Lowe Assignee: Daryn Sharp Priority: Blocker When the Oozie launcher runs a standard MapReduce job (not Pig) it doesn't linger waiting for the sub-job to finish. At that point the RM stops renewing delegation tokens for the launcher job which wreaks havoc on the sub-job if the sub-job runs long enough for the tokens to expire. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3390) RMTimelineCollector should have the context info of each app
[ https://issues.apache.org/jira/browse/YARN-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393095#comment-14393095 ] Naganarasimha G R commented on YARN-3390: - Hi [~zjshen], [~djp] & [~sjlee0], the {{TimelineCollector.getTimelineEntityContext()}} interface will not be suitable for RMTimelineCollector, as it will be posting/putting entities for multiple apps, app attempts and containers. Hence I was initially planning to modify this method to take a {{TimelineEntity.Identifier}} as a parameter, and in RMTimelineCollector to hold a map from {{TimelineEntity.Identifier}} to {{AppId}} and another map from {{AppId}} to {{TimelineEntityContext}} (required as the context is created per app when appCreatedEvent occurs). One more conflict I can see is that AppLevelTimelineCollector is specific to a single app, so invoking {{getTimelineEntityContext}} in {{putEntities(TimelineEntities, Ugi)}} is fine there, because all the entities posted can be assumed to have the same context as they belong to a single app. But in the general case (like RMTimelineCollector) it is not guaranteed that all TimelineEntities belong to the same app (i.e. the TimelineEntities might have different contexts). So would it be better to change the interface of {{TimelineCollector.putEntities()}} to accept the {{TimelineEntityContext}} as a parameter and remove the {{TimelineCollector.getTimelineEntityContext()}} method from the interface? Please share your opinion. > RMTimelineCollector should have the context info of each app > > > Key: YARN-3390 > URL: https://issues.apache.org/jira/browse/YARN-3390 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Zhijie Shen > > RMTimelineCollector should have the context info of each app whose entity > has been put -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3046) [Event producers] Implement MapReduce AM writing some MR metrics to ATS
[ https://issues.apache.org/jira/browse/YARN-3046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393089#comment-14393089 ] Junping Du commented on YARN-3046: -- Also, it looks like no YARN changes are involved in this JIRA; will migrate it to the MAPREDUCE project later. > [Event producers] Implement MapReduce AM writing some MR metrics to ATS > --- > > Key: YARN-3046 > URL: https://issues.apache.org/jira/browse/YARN-3046 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Junping Du > Attachments: YARN-3046-no-test.patch > > > Per design in YARN-2928, select a handful of MR metrics (e.g. HDFS bytes > written) and have the MR AM write the framework-specific metrics to ATS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3046) [Event producers] Implement MapReduce AM writing some MR metrics to ATS
[ https://issues.apache.org/jira/browse/YARN-3046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-3046: - Attachment: YARN-3046-no-test.patch Uploaded the first patch - not including any tests yet. Calling for an early review; will add tests soon. > [Event producers] Implement MapReduce AM writing some MR metrics to ATS > --- > > Key: YARN-3046 > URL: https://issues.apache.org/jira/browse/YARN-3046 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Junping Du > Attachments: YARN-3046-no-test.patch > > > Per design in YARN-2928, select a handful of MR metrics (e.g. HDFS bytes > written) and have the MR AM write the framework-specific metrics to ATS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2901) Add errors and warning stats to RM, NM web UI
[ https://issues.apache.org/jira/browse/YARN-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-2901: Attachment: apache-yarn-2901.5.patch {quote} I realized if we set clean-up-threshold > maxUniqueMessages, user can see it, how about doing clean-up in two conditions: 1) User get message, and #message > maxUniqueMessages 2) #messages > message-threshold, we can set the message-threshold to higher to avoid too frequent cleanup. Sounds good? {quote} Makes sense; made the change. bq. I just tried to move that, it seems no more issues happen, could you check that? Moved ErrorAndWarningsBlock to hadoop-yarn-server-common. Renamed ErrorsAndWarningsPage in RM and NM to RMErrorsAndWarningsPage and NMErrorsAndWarningsPage. > Add errors and warning stats to RM, NM web UI > - > > Key: YARN-2901 > URL: https://issues.apache.org/jira/browse/YARN-2901 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager, resourcemanager >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: Exception collapsed.png, Exception expanded.jpg, Screen > Shot 2015-03-19 at 7.40.02 PM.png, apache-yarn-2901.0.patch, > apache-yarn-2901.1.patch, apache-yarn-2901.2.patch, apache-yarn-2901.3.patch, > apache-yarn-2901.4.patch, apache-yarn-2901.5.patch > > > It would be really useful to have statistics on the number of errors and > warnings in the RM and NM web UI. I'm thinking about - > 1. The number of errors and warnings in the past 5 min/1 hour/12 hours/day > 2. The top 'n'(20?) most common exceptions in the past 5 min/1 hour/12 > hours/day > By errors and warnings I'm referring to the log level. > I suspect we can probably achieve this by writing a custom appender?(I'm open > to suggestions on alternate mechanisms for implementing this). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
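The "custom appender" idea floated in the YARN-2901 description could look roughly like the sketch below. This is a hypothetical illustration, not the patch's actual code: it uses a `java.util.logging.Handler` purely so the sketch is self-contained (YARN itself uses log4j, where the analogous hook is an appender's `append` method), and the class and method names are made up. The idea is the same either way: intercept every log event, and keep counts for WARNING-and-above levels that a web UI block could later render.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

// Hypothetical sketch of an error/warning-counting log hook. A j.u.l
// Handler stands in for a log4j appender so the example runs with only
// the JDK; names are illustrative, not YARN-2901's actual API.
public class ErrorWarningCountingHandler extends Handler {
    private final Map<String, Integer> counts = new ConcurrentHashMap<>();

    @Override
    public void publish(LogRecord record) {
        // Count only WARNING and above; INFO and below pass through.
        if (record.getLevel().intValue() >= Level.WARNING.intValue()) {
            counts.merge(record.getLevel().getName(), 1, Integer::sum);
        }
    }

    @Override public void flush() {}
    @Override public void close() {}

    public int count(String levelName) {
        return counts.getOrDefault(levelName, 0);
    }

    public static void main(String[] args) {
        Logger log = Logger.getLogger("demo");
        log.setUseParentHandlers(false); // keep demo output clean
        ErrorWarningCountingHandler handler = new ErrorWarningCountingHandler();
        log.addHandler(handler);
        log.info("routine message, not counted");
        log.warning("disk almost full");
        log.severe("connection lost");
        System.out.println("WARNING=" + handler.count("WARNING")
            + " SEVERE=" + handler.count("SEVERE"));
    }
}
```

A real implementation would also need the time-windowed stats (5 min/1 hour/...) the description asks for, e.g. by storing timestamps per message rather than plain counters.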
[jira] [Commented] (YARN-3334) [Event Producers] NM TimelineClient life cycle handling and container metrics posting to new timeline service.
[ https://issues.apache.org/jira/browse/YARN-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393015#comment-14393015 ] Zhijie Shen commented on YARN-3334: --- Junping, did you have a chance to look at items 3 and 4 of my last patch comment? One more nit: newTimelineServiceEnabled(config) -> systemMetricsPublisherEnabled? > [Event Producers] NM TimelineClient life cycle handling and container metrics > posting to new timeline service. > -- > > Key: YARN-3334 > URL: https://issues.apache.org/jira/browse/YARN-3334 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: YARN-2928 >Reporter: Junping Du >Assignee: Junping Du > Attachments: YARN-3334-demo.patch, YARN-3334-v1.patch, > YARN-3334-v2.patch, YARN-3334-v3.patch, YARN-3334-v4.patch, YARN-3334-v5.patch > > > After YARN-3039, we have a service discovery mechanism to pass the app-collector > service address among collectors, NMs and RM. In this JIRA, we will handle > service address setting for TimelineClients in NodeManager, and put container > metrics to the backend storage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3438) add a mode to replay MR job history files to the timeline service
Sangjin Lee created YARN-3438: - Summary: add a mode to replay MR job history files to the timeline service Key: YARN-3438 URL: https://issues.apache.org/jira/browse/YARN-3438 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Sangjin Lee The subtask covers the work on top of YARN-3437 to add a mode to replay MR job history files to the timeline service storage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3437) convert load test driver to timeline service v.2
Sangjin Lee created YARN-3437: - Summary: convert load test driver to timeline service v.2 Key: YARN-3437 URL: https://issues.apache.org/jira/browse/YARN-3437 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Sangjin Lee This subtask covers the work for converting the proposed patch for the load test driver (YARN-2556) to work with the timeline service v.2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3378) a load test client that can replay a volume of history files
[ https://issues.apache.org/jira/browse/YARN-3378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-3378: -- Issue Type: New Feature (was: Sub-task) Parent: (was: YARN-2928) > a load test client that can replay a volume of history files > > > Key: YARN-3378 > URL: https://issues.apache.org/jira/browse/YARN-3378 > Project: Hadoop YARN > Issue Type: New Feature > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Sangjin Lee > > It might be good to create a load test client that can replay a large volume > of history files into the timeline service. One can envision running such a > load test client as a mapreduce job and generate a fair amount of load. It > would be useful to spot check correctness, and more importantly observe > performance characteristic. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3436) Doc WebServicesIntro.html Example Rest API url wrong
[ https://issues.apache.org/jira/browse/YARN-3436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-3436: --- Attachment: YARN-3436.001.patch Attaching a patch for the same; please review. > Doc WebServicesIntro.html Example Rest API url wrong > > > Key: YARN-3436 > URL: https://issues.apache.org/jira/browse/YARN-3436 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Minor > Attachments: YARN-3436.001.patch > > > /docs/current/hadoop-yarn/hadoop-yarn-site/WebServicesIntro.html > {quote} > Response Examples > JSON response with single resource > HTTP Request: GET > http://rmhost.domain:8088/ws/v1/cluster/{color:red}app{color}/application_1324057493980_0001 > Response Status Line: HTTP/1.1 200 OK > {quote} > URL should be ws/v1/cluster/{color:red}apps{color} . > 2 examples on the same page are wrong -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3436) Doc WebServicesIntro.html Example Rest API url wrong
Bibin A Chundatt created YARN-3436: -- Summary: Doc WebServicesIntro.html Example Rest API url wrong Key: YARN-3436 URL: https://issues.apache.org/jira/browse/YARN-3436 Project: Hadoop YARN Issue Type: Bug Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Minor /docs/current/hadoop-yarn/hadoop-yarn-site/WebServicesIntro.html {quote} Response Examples JSON response with single resource HTTP Request: GET http://rmhost.domain:8088/ws/v1/cluster/{color:red}app{color}/application_1324057493980_0001 Response Status Line: HTTP/1.1 200 OK {quote} Url should be ws/v1/cluster/{color:red}apps{color} . 2 examples on same page are wrong -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3435) AM container to be allocated Appattempt AM container shown as null
[ https://issues.apache.org/jira/browse/YARN-3435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-3435: --- Attachment: Screenshot.png Attaching a screenshot of the bug > AM container to be allocated Appattempt AM container shown as null > -- > > Key: YARN-3435 > URL: https://issues.apache.org/jira/browse/YARN-3435 > Project: Hadoop YARN > Issue Type: Bug > Environment: 1RM,1DN >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Trivial > Attachments: Screenshot.png > > > Submit yarn application > Open http://:8088/cluster/appattempt/appattempt_1427984982805_0003_01 > Before the AM container is allocated -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3435) AM container to be allocated Appattempt AM container shown as null
[ https://issues.apache.org/jira/browse/YARN-3435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-3435: --- Attachment: YARN-3435.001.patch > AM container to be allocated Appattempt AM container shown as null > -- > > Key: YARN-3435 > URL: https://issues.apache.org/jira/browse/YARN-3435 > Project: Hadoop YARN > Issue Type: Bug > Environment: 1RM,1DN >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Trivial > Attachments: Screenshot.png, YARN-3435.001.patch > > > Submit yarn application > Open http://:8088/cluster/appattempt/appattempt_1427984982805_0003_01 > Before the AM container is allocated -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3435) AM container to be allocated Appattempt AM container shown as null
Bibin A Chundatt created YARN-3435: -- Summary: AM container to be allocated Appattempt AM container shown as null Key: YARN-3435 URL: https://issues.apache.org/jira/browse/YARN-3435 Project: Hadoop YARN Issue Type: Bug Environment: 1RM,1DN Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Trivial Submit yarn application Open http://:8088/cluster/appattempt/appattempt_1427984982805_0003_01 Before the AM container is allocated -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3425) NPE from RMNodeLabelsManager.serviceStop when NodeLabelsManager.serviceInit failed
[ https://issues.apache.org/jira/browse/YARN-3425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14392872#comment-14392872 ] Hudson commented on YARN-3425: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #151 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/151/]) YARN-3425. NPE from RMNodeLabelsManager.serviceStop when NodeLabelsManager.serviceInit failed. (Bibin A Chundatt via wangda) (wangda: rev 492239424a3ace9868b6154f44a0f18fa5318235) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/CommonNodeLabelsManager.java * hadoop-yarn-project/CHANGES.txt > NPE from RMNodeLabelsManager.serviceStop when NodeLabelsManager.serviceInit > failed > -- > > Key: YARN-3425 > URL: https://issues.apache.org/jira/browse/YARN-3425 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager > Environment: 1 RM, 1 NM , 1 NN , I DN >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Minor > Fix For: 2.8.0 > > Attachments: YARN-3425.001.patch > > > Configure yarn.node-labels.enabled to true > and yarn.node-labels.fs-store.root-dir /node-labels > Start resource manager without starting DN/NM > {quote} > 2015-03-31 16:44:13,782 WARN org.apache.hadoop.service.AbstractService: When > stopping the service > org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager : > java.lang.NullPointerException > java.lang.NullPointerException > at > org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.stopDispatcher(CommonNodeLabelsManager.java:261) > at > org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStop(CommonNodeLabelsManager.java:267) > at > org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221) > at > org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52) > at > org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80) > at > 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:171) > at > org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:556) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:984) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:251) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1207) > {quote} > {code} > protected void stopDispatcher() { > AsyncDispatcher asyncDispatcher = (AsyncDispatcher) dispatcher; >asyncDispatcher.stop(); > } > {code} > Null check missing during stop -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3430) RMAppAttempt headroom data is missing in RM Web UI
[ https://issues.apache.org/jira/browse/YARN-3430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14392866#comment-14392866 ] Hudson commented on YARN-3430: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #151 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/151/]) YARN-3430. Made headroom data available on app attempt page of RM WebUI. Contributed by Xuan Gong. (zjshen: rev 8366a36ad356e6318b8ce6c5c96e201149f811bd) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMAppAttemptBlock.java > RMAppAttempt headroom data is missing in RM Web UI > -- > > Key: YARN-3430 > URL: https://issues.apache.org/jira/browse/YARN-3430 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager, webapp, yarn >Reporter: Xuan Gong >Assignee: Xuan Gong >Priority: Blocker > Fix For: 2.7.0 > > Attachments: YARN-3430.1.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3430) RMAppAttempt headroom data is missing in RM Web UI
[ https://issues.apache.org/jira/browse/YARN-3430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14392784#comment-14392784 ] Hudson commented on YARN-3430: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #142 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/142/]) YARN-3430. Made headroom data available on app attempt page of RM WebUI. Contributed by Xuan Gong. (zjshen: rev 8366a36ad356e6318b8ce6c5c96e201149f811bd) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMAppAttemptBlock.java > RMAppAttempt headroom data is missing in RM Web UI > -- > > Key: YARN-3430 > URL: https://issues.apache.org/jira/browse/YARN-3430 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager, webapp, yarn >Reporter: Xuan Gong >Assignee: Xuan Gong >Priority: Blocker > Fix For: 2.7.0 > > Attachments: YARN-3430.1.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)