[jira] [Commented] (YARN-3457) NPE when NodeManager.serviceInit fails and stopRecoveryStore called
[ https://issues.apache.org/jira/browse/YARN-3457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484845#comment-14484845 ]

Tsuyoshi Ozawa commented on YARN-3457:
--------------------------------------

+1, committing this shortly.

> NPE when NodeManager.serviceInit fails and stopRecoveryStore called
> -------------------------------------------------------------------
>
> Key: YARN-3457
> URL: https://issues.apache.org/jira/browse/YARN-3457
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Reporter: Bibin A Chundatt
> Assignee: Bibin A Chundatt
> Priority: Minor
> Attachments: YARN-3457.001.patch
>
>
> When NodeManager service init fails, a null pointer exception is thrown
> during stopRecoveryStore
> {code}
> @Override
> protected void serviceInit(Configuration conf) throws Exception {
>   ..
>   try {
>     exec.init();
>   } catch (IOException e) {
>     throw new YarnRuntimeException("Failed to initialize container executor", e);
>   }
>   this.context = createNMContext(containerTokenSecretManager,
>       nmTokenSecretManager, nmStore);
>
> {code}
> context is null when service init fails
> {code}
> private void stopRecoveryStore() throws IOException {
>   nmStore.stop();
>   if (context.getDecommissioned() && nmStore.canRecover()) {
>     ..
>   }
> }
> {code}
> Null pointer exception thrown
> {quote}
> 015-04-07 17:31:45,807 WARN org.apache.hadoop.service.AbstractService: When stopping the service NodeManager : java.lang.NullPointerException
> java.lang.NullPointerException
>   at org.apache.hadoop.yarn.server.nodemanager.NodeManager.stopRecoveryStore(NodeManager.java:168)
>   at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStop(NodeManager.java:280)
>   at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>   at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
>   at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
>   at org.apache.hadoop.service.AbstractService.init(AbstractService.java:171)
>   at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:484)
>   at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:534)
> {quote}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
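The failure mode described in YARN-3457 can be sketched outside Hadoop. The block below is only an illustration of one possible null guard, not the contents of YARN-3457.001.patch. RecoveryStore, NodeContext, FakeRecoveryStore and NodeManagerSketch are hypothetical stand-ins for the real classes, and the guard on nmStore is an extra assumption (the report only states that context is null).

```java
import java.io.IOException;

/** Hypothetical stand-in for the NM state store; not the real Hadoop class. */
interface RecoveryStore {
  void stop() throws IOException;
  boolean canRecover();
}

/** Hypothetical stand-in for the NodeManager context. */
interface NodeContext {
  boolean getDecommissioned();
}

/** Trivial in-memory store, used only to exercise the sketch. */
class FakeRecoveryStore implements RecoveryStore {
  final boolean recoverable;
  boolean stopped = false;

  FakeRecoveryStore(boolean recoverable) {
    this.recoverable = recoverable;
  }

  @Override
  public void stop() {
    stopped = true;
  }

  @Override
  public boolean canRecover() {
    return recoverable;
  }
}

class NodeManagerSketch {
  private final RecoveryStore nmStore;   // assumed: may be null if init failed very early
  private final NodeContext context;     // null if init failed before the context was created
  boolean recoveryStateRemoved = false;  // placeholder for the elided cleanup in the report

  NodeManagerSketch(RecoveryStore nmStore, NodeContext context) {
    this.nmStore = nmStore;
    this.context = context;
  }

  void stopRecoveryStore() throws IOException {
    if (nmStore == null) {
      return; // nothing was started, so there is nothing to stop
    }
    nmStore.stop();
    // Only consult the context if init got far enough to create it.
    if (context != null && context.getDecommissioned() && nmStore.canRecover()) {
      recoveryStateRemoved = true;
    }
  }
}
```

With a null context the store is still stopped and the decommission branch is skipped, so the stop path no longer masks the original init failure with a NullPointerException.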
[jira] [Commented] (YARN-3391) Clearly define flow ID/ flow run / flow version in API and storage
[ https://issues.apache.org/jira/browse/YARN-3391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484815#comment-14484815 ]

Zhijie Shen commented on YARN-3391:
-----------------------------------

I created a new patch:

bq. So in general, I think we should use as much javadoc comments instead of inline comments for public APIs.
Moved the comments into TimelineUtils and made them javadoc.

bq. We should add more info to LOG.warn messages, at least to tell user flow run should be numeric.
Improved the warn message.

bq. In addition, do we need to check negative value for flow run here?
According to Sangjin's example, we usually want to identify a flow run by timestamp, which can theoretically be negative to represent some time before 1970.

> Clearly define flow ID/ flow run / flow version in API and storage
> -------------------------------------------------------------------
>
> Key: YARN-3391
> URL: https://issues.apache.org/jira/browse/YARN-3391
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Reporter: Zhijie Shen
> Assignee: Zhijie Shen
> Attachments: YARN-3391.1.patch, YARN-3391.2.patch, YARN-3391.3.patch, YARN-3391.4.patch
>
>
> To continue the discussion in YARN-3040, let's figure out the best way to
> describe the flow.
> Some key issues that we need to conclude on:
> - How do we include the flow version in the context so that it gets passed
> into the collector and to the storage eventually?
> - Flow run id should be a number as opposed to a generic string?
> - Default behavior for the flow run id if it is missing (i.e. client did not
> set it)
> - How do we handle flow attributes in case of nested levels of flows?
[jira] [Commented] (YARN-3457) NPE when NodeManager.serviceInit fails and stopRecoveryStore called
[ https://issues.apache.org/jira/browse/YARN-3457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484813#comment-14484813 ] Hadoop QA commented on YARN-3457: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12723815/YARN-3457.001.patch against trunk revision ab04ff9. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7252//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7252//console This message is automatically generated. 
[jira] [Updated] (YARN-3391) Clearly define flow ID/ flow run / flow version in API and storage
[ https://issues.apache.org/jira/browse/YARN-3391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhijie Shen updated YARN-3391:
------------------------------
Attachment: YARN-3391.4.patch
[jira] [Commented] (YARN-3459) TestLog4jWarningErrorMetricsAppender breaks in trunk
[ https://issues.apache.org/jira/browse/YARN-3459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484798#comment-14484798 ] Hadoop QA commented on YARN-3459: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12723835/apache-yarn-3459.0.patch against trunk revision ab04ff9. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7251//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/7251//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7251//console This message is automatically generated. 
> TestLog4jWarningErrorMetricsAppender breaks in trunk > > > Key: YARN-3459 > URL: https://issues.apache.org/jira/browse/YARN-3459 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Li Lu >Assignee: Li Lu >Priority: Blocker > Fix For: 2.7.0 > > Attachments: apache-yarn-3459.0.patch > > > TestLog4jWarningErrorMetricsAppender fails with the following message: > {code} > Running org.apache.hadoop.yarn.util.TestLog4jWarningErrorMetricsAppender > Tests run: 6, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 6.214 sec <<< > FAILURE! - in org.apache.hadoop.yarn.util.TestLog4jWarningErrorMetricsAppender > testPurge(org.apache.hadoop.yarn.util.TestLog4jWarningErrorMetricsAppender) > Time elapsed: 2.01 sec <<< FAILURE! > java.lang.AssertionError: expected:<0> but was:<1> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.yarn.util.TestLog4jWarningErrorMetricsAppender.testPurge(TestLog4jWarningErrorMetricsAppender.java:89) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3459) TestLog4jWarningErrorMetricsAppender breaks in trunk
[ https://issues.apache.org/jira/browse/YARN-3459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Varun Vasudev updated YARN-3459:
--------------------------------
Attachment: apache-yarn-3459.0.patch

My apologies for the failing test. I suspect it's a timing issue since it passed the pre-commit builds and is passing on my machine. Can you try the attached patch and +1 if it works?
[jira] [Commented] (YARN-3326) ReST support for getLabelsToNodes
[ https://issues.apache.org/jira/browse/YARN-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484738#comment-14484738 ] Hadoop QA commented on YARN-3326: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12723819/YARN-3326.20150408-1.patch against trunk revision 4be648b. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7250//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7250//console This message is automatically generated. > ReST support for getLabelsToNodes > -- > > Key: YARN-3326 > URL: https://issues.apache.org/jira/browse/YARN-3326 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R >Priority: Minor > Attachments: YARN-3326.20150310-1.patch, YARN-3326.20150407-1.patch, > YARN-3326.20150408-1.patch > > > REST to support to retrieve LabelsToNodes Mapping -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3448) Add Rolling Time To Lives Level DB Plugin Capabilities
[ https://issues.apache.org/jira/browse/YARN-3448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484717#comment-14484717 ]

Jonathan Eagles commented on YARN-3448:
---------------------------------------

[~zjshen], Interesting idea about the index just being pointers into the entity db. I'll have to investigate what the write and read performance implications are.

As for rolling period vs ttl, I think the rolling period should always be smaller than the ttl. One thing to consider is that, unlike traditional rolling files, more than one db is active at a time. In fact, all rolling dbs within the ttl window may be active. That is due to the stitching of data back together on reads. All events for the same entity id will go into the same database. My current setup rolls every hour with a ttl of one day.

As for what a roll does, it only schedules the db to be deleted and removes the old entity and index dbs from being found. This does mean that some old start times will still be active. That will become eventually consistent once the ttl eviction period finishes.

> Add Rolling Time To Lives Level DB Plugin Capabilities
> ------------------------------------------------------
>
> Key: YARN-3448
> URL: https://issues.apache.org/jira/browse/YARN-3448
> Project: Hadoop YARN
> Issue Type: Improvement
> Reporter: Jonathan Eagles
> Assignee: Jonathan Eagles
> Attachments: YARN-3448.1.patch, YARN-3448.2.patch, YARN-3448.3.patch
>
>
> For large applications, the majority of the time in LeveldbTimelineStore is
> spent deleting old entities one record at a time. An exclusive write lock is
> held during the entire deletion phase, which in practice can be hours. If we
> are to relax some of the consistency constraints, other performance enhancing
> techniques can be employed to maximize the throughput and minimize locking
> time.
> Split the 5 sections of the leveldb database (domain, owner, start time,
> entity, index) into 5 separate databases. This allows each database to
> maximize the read cache effectiveness based on the unique usage patterns of
> each database. With 5 separate databases each lookup is much faster. This can
> also help with I/O to have the entity and index databases on separate disks.
> Rolling DBs for entity and index DBs. 99.9% of the data are in these two
> sections, in a 4:1 ratio (index to entity), at least for tez. We replace DB
> record removal with file system removal if we create a rolling set of
> databases that age out and can be efficiently removed. To do this we must
> place a constraint to always place an entity's events into its correct
> rolling db instance based on start time. This allows us to stitch the data
> back together while reading and artificial paging.
> Relax the synchronous writes constraints. If we are willing to accept losing
> some records that were not flushed by the operating system during a crash, we
> can use async writes that can be much faster.
> Prefer sequential writes. Sequential writes can be several times faster than
> random writes. Spend some small effort arranging the writes in such a way
> that will trend towards sequential write performance over random write
> performance.
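The rolling scheme described for YARN-3448 (place an entity by start time, then drop whole databases once they age out) can be sketched as follows. This is a minimal illustration under stated assumptions, not code from the attached patches. RollingBuckets and its methods are hypothetical names, and a String stands in for what would be a per-period LevelDB handle.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical sketch: maps entity start times to rolling buckets and evicts by TTL. */
class RollingBuckets {
  private final long rollMillis;
  private final long ttlMillis;
  // Bucket start time -> placeholder for a per-period database handle.
  private final TreeMap<Long, String> buckets = new TreeMap<>();

  RollingBuckets(long rollMillis, long ttlMillis) {
    // The thread suggests the rolling period should be smaller than the ttl.
    if (rollMillis <= 0 || rollMillis >= ttlMillis) {
      throw new IllegalArgumentException("rolling period must be positive and smaller than ttl");
    }
    this.rollMillis = rollMillis;
    this.ttlMillis = ttlMillis;
  }

  /** Start of the rolling period that contains the given entity start time. */
  long bucketStart(long entityStartTime) {
    return Math.floorDiv(entityStartTime, rollMillis) * rollMillis;
  }

  /** All events of an entity share its start time, so they land in the same bucket. */
  String bucketFor(long entityStartTime) {
    return buckets.computeIfAbsent(bucketStart(entityStartTime), s -> "db-" + s);
  }

  /** Removes every bucket whose whole period ended at or before now - ttl. */
  List<String> evictExpired(long now) {
    // A bucket covers [start, start + roll); it is expired when start + roll <= now - ttl.
    NavigableMap<Long, String> expired = buckets.headMap(now - ttlMillis - rollMillis, true);
    List<String> removed = new ArrayList<>(expired.values());
    expired.clear(); // the view is backed by the map, so this drops the buckets
    return removed;
  }

  int activeCount() {
    return buckets.size();
  }
}
```

In a real store, evicting a bucket would close and delete the database directory, which is the file system removal the description contrasts with record-at-a-time deletion.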
[jira] [Updated] (YARN-3448) Add Rolling Time To Lives Level DB Plugin Capabilities
[ https://issues.apache.org/jira/browse/YARN-3448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Eagles updated YARN-3448:
----------------------------------
Attachment: YARN-3448.3.patch
[jira] [Commented] (YARN-3462) Patches applied for YARN-2424 are inconsistent between trunk and branch-2
[ https://issues.apache.org/jira/browse/YARN-3462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484668#comment-14484668 ]

Naganarasimha G R commented on YARN-3462:
-----------------------------------------

Thanks for reviewing, [~sidharta-s]. Yes, I have compiled the patch on branch-2 and it compiled fine.

> Patches applied for YARN-2424 are inconsistent between trunk and branch-2
> -------------------------------------------------------------------------
>
> Key: YARN-3462
> URL: https://issues.apache.org/jira/browse/YARN-3462
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 2.6.0
> Reporter: Sidharta Seethana
> Assignee: Naganarasimha G R
> Attachments: YARN-3462.20150508-1.patch
>
>
> It looks like the changes for YARN-2424 are not the same for trunk (commit
> 7e75226e68715c3eca9d346c8eaf2f265aa70d23) and branch-2 (commit
> 5d965f2f3cf97a87603720948aacd4f7877d73c4). Branch-2 has a missing warning
> and the documentation is a bit different as well.
[jira] [Updated] (YARN-3326) ReST support for getLabelsToNodes
[ https://issues.apache.org/jira/browse/YARN-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-3326: Attachment: YARN-3326.20150408-1.patch Thanks for reviewing [~ozawa] , have updated the patch with your review comment > ReST support for getLabelsToNodes > -- > > Key: YARN-3326 > URL: https://issues.apache.org/jira/browse/YARN-3326 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R >Priority: Minor > Attachments: YARN-3326.20150310-1.patch, YARN-3326.20150407-1.patch, > YARN-3326.20150408-1.patch > > > REST to support to retrieve LabelsToNodes Mapping -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3457) NPE when NodeManager.serviceInit fails and stopRecoveryStore called
[ https://issues.apache.org/jira/browse/YARN-3457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bibin A Chundatt updated YARN-3457:
-----------------------------------
Attachment: YARN-3457.001.patch
[jira] [Commented] (YARN-3326) ReST support for getLabelsToNodes
[ https://issues.apache.org/jira/browse/YARN-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484608#comment-14484608 ] Tsuyoshi Ozawa commented on YARN-3326: -- [~Naganarasimha] thank you for updating. LGTM overall. Minor nits: let's avoid using * import. {code} +import java.util.*; {code} > ReST support for getLabelsToNodes > -- > > Key: YARN-3326 > URL: https://issues.apache.org/jira/browse/YARN-3326 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R >Priority: Minor > Attachments: YARN-3326.20150310-1.patch, YARN-3326.20150407-1.patch > > > REST to support to retrieve LabelsToNodes Mapping -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3348) Add a 'yarn top' tool to help understand cluster usage
[ https://issues.apache.org/jira/browse/YARN-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484548#comment-14484548 ]

Jian He commented on YARN-3348:
-------------------------------

Thanks Varun, some comments:
- “Unable to fetach cluster metrics” - typo
- This exceeds the 80-column limit:
{code}
opts
    .addOption(
        "types",
        true,
        "Comma separated list of types to restrict applications, case sensitive(though the display is lower case)");
{code}
- The -rows and -cols options do not seem to have any effect on my screen when I tried them. Could you double check?
- The ‘yarn top’ output is repeatedly printed to the terminal every $delay seconds. It would be better to show it only once.
- Does the patch only show root queue info? Should we show info for all queues?
- “F + Enter : Select sort field”; maybe use ‘S’ for sorting?
- “Memory seconds(in GBseconds” - missing “)”
- It seems a bit odd to have this method in a public API record. Do you know why the hashcode is not correct without this method? Or we can just cast it to GetApplicationsRequestPBImpl and use the method from there.
{code}
// need this otherwise the hashcode doesn't get generated correctly
request.initAllFields();
{code}
- For the caching in ClientRMService: do you think we can do the caching on the client side? That would save RPCs, especially if we have many top commands running on the client side.

> Add a 'yarn top' tool to help understand cluster usage
> ------------------------------------------------------
>
> Key: YARN-3348
> URL: https://issues.apache.org/jira/browse/YARN-3348
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: resourcemanager
> Reporter: Varun Vasudev
> Assignee: Varun Vasudev
> Attachments: apache-yarn-3348.0.patch, apache-yarn-3348.1.patch
>
>
> It would be helpful to have a 'yarn top' tool that would allow administrators
> to understand which apps are consuming resources.
> Ideally the tool would allow you to filter by queue, user, maybe labels, etc > and show you statistics on container allocation across the cluster to find > out which apps are consuming the most resources on the cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3391) Clearly define flow ID/ flow run / flow version in API and storage
[ https://issues.apache.org/jira/browse/YARN-3391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484514#comment-14484514 ]

Junping Du commented on YARN-3391:
----------------------------------

Sorry, the 2nd comment above has a formatting issue and may be hard to read. Reposting it below:

In ClientRMService.java,
{code}
+// Sanity check for flow run
+try {
+  for (String tag : submissionContext.getApplicationTags()) {
+    if (tag.startsWith(TimelineUtils.FLOW_RUN_TAG_PREFIX + ":") ||
+        tag.startsWith(
+            TimelineUtils.FLOW_RUN_TAG_PREFIX.toLowerCase() + ":")) {
+      String value =
+          tag.substring(TimelineUtils.FLOW_RUN_TAG_PREFIX.length() + 1);
+      Long.valueOf(value);
+    }
+  }
+} catch (NumberFormatException e) {
+  LOG.warn("Invalid to flow run.", e);
+  RMAuditLogger.logFailure(user, AuditConstants.SUBMIT_APP_REQUEST,
+      e.getMessage(), "ClientRMService",
+      "Exception in submitting application", applicationId);
+  throw RPCUtil.getRemoteException(e);
+}
{code}
We should add more info to LOG.warn messages, at least to tell user flow run should be numeric. In addition, do we need to check negative value for flow run here? If not, why are we accepting negative long values but rejecting characters other than digits?

> Clearly define flow ID/ flow run / flow version in API and storage
> -------------------------------------------------------------------
>
> Key: YARN-3391
> URL: https://issues.apache.org/jira/browse/YARN-3391
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Reporter: Zhijie Shen
> Assignee: Zhijie Shen
> Attachments: YARN-3391.1.patch, YARN-3391.2.patch, YARN-3391.3.patch
>
>
> To continue the discussion in YARN-3040, let's figure out the best way to
> describe the flow.
> Some key issues that we need to conclude on:
> - How do we include the flow version in the context so that it gets passed
> into the collector and to the storage eventually?
> - Flow run id should be a number as opposed to a generic string?
> - Default behavior for the flow run id if it is missing (i.e. client did not
> set it)
> - How do we handle flow attributes in case of nested levels of flows?
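The review point about the warn message can be illustrated with a standalone sketch. It is not the code from any YARN-3391 patch: FlowRunTagCheck and parseFlowRun are hypothetical names, and the value of FLOW_RUN_TAG_PREFIX here is a placeholder (the thread only shows the constant's name). Negative values are accepted, following the reasoning elsewhere in this thread that a timestamp-based flow run may precede 1970.

```java
import java.util.Collection;
import java.util.Locale;

/** Hypothetical sketch of a flow run sanity check with a more descriptive error. */
class FlowRunTagCheck {
  // Placeholder value; the real constant lives in TimelineUtils and may differ.
  static final String FLOW_RUN_TAG_PREFIX = "FLOW_RUN";

  /**
   * Returns the flow run found in the tags, or null if no tag carries one.
   * Throws IllegalArgumentException with an explanatory message if the value
   * is not a valid long.
   */
  static Long parseFlowRun(Collection<String> tags) {
    String upper = FLOW_RUN_TAG_PREFIX + ":";
    String lower = FLOW_RUN_TAG_PREFIX.toLowerCase(Locale.ROOT) + ":";
    Long flowRun = null;
    for (String tag : tags) {
      if (tag.startsWith(upper) || tag.startsWith(lower)) {
        String value = tag.substring(upper.length());
        try {
          flowRun = Long.valueOf(value); // negative values are allowed on purpose
        } catch (NumberFormatException e) {
          throw new IllegalArgumentException(
              "Invalid flow run '" + value
                  + "': the flow run must be a numeric (long) value, e.g. a timestamp", e);
        }
      }
    }
    return flowRun;
  }
}
```

A caller could log the exception message as-is, which already tells the user that the flow run must be numeric.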
[jira] [Commented] (YARN-3391) Clearly define flow ID/ flow run / flow version in API and storage
[ https://issues.apache.org/jira/browse/YARN-3391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484512#comment-14484512 ]

Junping Du commented on YARN-3391:
----------------------------------

bq. I make use of Sangjin's previous comments to add some inline code comments about their definitions in TimelineCollectorContext.
I would expect the definitions to show up in the Javadoc of the related methods in TimelineCollectorContext. This sounds like a bit of a nitpick, but the key difference between inline comments and javadoc is that developers who only use the jar instead of the source code can still read these key definitions and use them correctly (via IDE hints or generated Javadoc). So in general, I think we should use as much javadoc comments instead of inline comments for public APIs.
{code}
+// Sanity check for flow run
+try {
+  for (String tag : submissionContext.getApplicationTags()) {
+    if (tag.startsWith(TimelineUtils.FLOW_RUN_TAG_PREFIX + ":") ||
+        tag.startsWith(
+            TimelineUtils.FLOW_RUN_TAG_PREFIX.toLowerCase() + ":")) {
+      String value =
+          tag.substring(TimelineUtils.FLOW_RUN_TAG_PREFIX.length() + 1);
+      Long.valueOf(value);
+    }
+  }
+} catch (NumberFormatException e) {
+  LOG.warn("Invalid to flow run.", e);
+  RMAuditLogger.logFailure(user, AuditConstants.SUBMIT_APP_REQUEST,
+      e.getMessage(), "ClientRMService",
+      "Exception in submitting application", applicationId);
+  throw RPCUtil.getRemoteException(e);
+}
{code}
We should add more info to LOG.warn messages, at least to tell user flow run should be numeric. In addition, do we need to check negative value for flow run here? If not, why are we accepting negative long values but rejecting characters other than digits?
> Clearly define flow ID/ flow run / flow version in API and storage > -- > > Key: YARN-3391 > URL: https://issues.apache.org/jira/browse/YARN-3391 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Zhijie Shen > Attachments: YARN-3391.1.patch, YARN-3391.2.patch, YARN-3391.3.patch > > > To continue the discussion in YARN-3040, let's figure out the best way to > describe the flow. > Some key issues that we need to conclude on: > - How do we include the flow version in the context so that it gets passed > into the collector and to the storage eventually? > - Flow run id should be a number as opposed to a generic string? > - Default behavior for the flow run id if it is missing (i.e. client did not > set it) > - How do we handle flow attributes in case of nested levels of flows? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3426) Add jdiff support to YARN
[ https://issues.apache.org/jira/browse/YARN-3426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484511#comment-14484511 ] Li Lu commented on YARN-3426: - Thanks [~vinodkv] for the review! For the second point, I traced into the code of our ExcludePrivateAnnotationsJDiffDoclet, and found this may actually be a bug in RootDocProcessor. Specifically, we are instrumenting each methods(true) call on a Doc entry, but we're not instrumenting methods() calls. methods() calls have exactly the same meaning as methods(true) according to http://docs.oracle.com/javase/7/docs/jdk/api/javadoc/doclet/com/sun/javadoc/ClassDoc.html . I'll post a patch to fix this soon. > Add jdiff support to YARN > - > > Key: YARN-3426 > URL: https://issues.apache.org/jira/browse/YARN-3426 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Li Lu >Assignee: Li Lu >Priority: Blocker > Attachments: YARN-3426-040615-1.patch, YARN-3426-040615.patch, > YARN-3426-040715.patch > > > Maybe we'd like to extend our current jdiff tool for hadoop-common and hdfs > to YARN as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
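The bug described in this comment can be illustrated with a small dynamic-proxy sketch that filters both overloads. It does not reproduce RootDocProcessor: `ClassDocLike` is a hypothetical stand-in for the two `ClassDoc` overloads, and the "private" name prefix is an assumed placeholder for the annotation-based exclusion.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

final class MethodsFilterSketch {
  /** Hypothetical stand-in for the two overloads discussed in the comment. */
  interface ClassDocLike {
    String[] methods();               // same meaning as methods(true)
    String[] methods(boolean filter);
  }

  /** Wraps a target so that methods() and methods(true) are both filtered. */
  static ClassDocLike instrument(final ClassDocLike target) {
    InvocationHandler handler = new InvocationHandler() {
      @Override
      public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        Object result = method.invoke(target, args);
        if (!method.getName().equals("methods")) {
          return result;
        }
        // A no-argument call arrives with args == null and must be treated like methods(true).
        boolean filter = args == null || args.length == 0 || Boolean.TRUE.equals(args[0]);
        return filter ? dropPrivate((String[]) result) : result;
      }
    };
    return (ClassDocLike) Proxy.newProxyInstance(
        ClassDocLike.class.getClassLoader(), new Class<?>[] {ClassDocLike.class}, handler);
  }

  /** Placeholder exclusion rule: drop names that start with "private". */
  static String[] dropPrivate(String[] names) {
    List<String> kept = new ArrayList<String>();
    for (String name : names) {
      if (!name.startsWith("private")) {
        kept.add(name);
      }
    }
    return kept.toArray(new String[0]);
  }

  /** Sample target that exposes one public and one private method name. */
  static ClassDocLike sample() {
    return new ClassDocLike() {
      @Override
      public String[] methods() {
        return methods(true);
      }

      @Override
      public String[] methods(boolean filter) {
        return new String[] {"publicApi", "privateHelper"};
      }
    };
  }
}
```

If the handler only checked for an explicit `true` argument, the no-argument call would pass through unfiltered, which matches the symptom described above.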
[jira] [Resolved] (YARN-2349) InvalidStateTransitonException after RM switch
[ https://issues.apache.org/jira/browse/YARN-2349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith resolved YARN-2349. -- Resolution: Cannot Reproduce Closing the issue as 'cant reproduce'. Feel free to reopen if you find issue in latest release or trunk. > InvalidStateTransitonException after RM switch > -- > > Key: YARN-2349 > URL: https://issues.apache.org/jira/browse/YARN-2349 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.4.1 >Reporter: Nishan Shetty >Assignee: Rohith > > {code} > 2014-07-23 19:22:28,272 INFO org.apache.hadoop.ipc.Server: IPC Server > Responder: starting > 2014-07-23 19:22:28,273 INFO org.apache.hadoop.ipc.Server: IPC Server > listener on 45018: starting > 2014-07-23 19:22:28,266 ERROR > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Can't handle > this event at current state > org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: > APP_REJECTED at ACCEPTED > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:635) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:83) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:706) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:690) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) > at java.lang.Thread.run(Thread.java:662) > 2014-07-23 
19:22:28,283 INFO org.mortbay.log: Stopped > SelectChannelConnector@10.18.40.84:45020 > 2014-07-23 19:22:28,291 ERROR > org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore: > Error when openning history file of application > application_1406116264351_0007 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3326) ReST support for getLabelsToNodes
[ https://issues.apache.org/jira/browse/YARN-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484437#comment-14484437 ] Hadoop QA commented on YARN-3326: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12723771/YARN-3326.20150407-1.patch against trunk revision bd77a7c. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7248//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7248//console This message is automatically generated. > ReST support for getLabelsToNodes > -- > > Key: YARN-3326 > URL: https://issues.apache.org/jira/browse/YARN-3326 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R >Priority: Minor > Attachments: YARN-3326.20150310-1.patch, YARN-3326.20150407-1.patch > > > REST to support to retrieve LabelsToNodes Mapping -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3462) Patches applied for YARN-2424 are inconsistent between trunk and branch-2
[ https://issues.apache.org/jira/browse/YARN-3462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484436#comment-14484436 ] Sidharta Seethana commented on YARN-3462: - [~Naganarasimha], Thanks for the patch. I am assuming the patch application failure is because it got applied to trunk. The patch looks good to me. > Patches applied for YARN-2424 are inconsistent between trunk and branch-2 > - > > Key: YARN-3462 > URL: https://issues.apache.org/jira/browse/YARN-3462 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Sidharta Seethana >Assignee: Naganarasimha G R > Attachments: YARN-3462.20150508-1.patch > > > It looks like the changes for YARN-2424 are not the same for trunk (commit > 7e75226e68715c3eca9d346c8eaf2f265aa70d23) and branch-2 (commit > 5d965f2f3cf97a87603720948aacd4f7877d73c4) . Branch-2 has a missing warning > and documentation is a bit different as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3461) Consolidate flow name/version/run defaults
[ https://issues.apache.org/jira/browse/YARN-3461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484435#comment-14484435 ] Junping Du commented on YARN-3461: -- I agree with most of the comments in YARN-3391. As I proposed in that JIRA, can we have a configurable policy to group applications into flows by default if the user doesn't specify a flow name for an application? For example, assume we have 3 policies that can be configured (no matter which one is the default): 1. group applications into flows by application name; 2. group each application into its own isolated flow; 3. group all applications into a single default flow (mostly for testing purposes). Developers/users can then choose or extend these policies to match their scenarios more closely. Thoughts? > Consolidate flow name/version/run defaults > -- > > Key: YARN-3461 > URL: https://issues.apache.org/jira/browse/YARN-3461 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Zhijie Shen > > In YARN-3391, it's not resolved what should be the defaults for flow > name/version/run. Let's continue the discussion here and unblock YARN-3391 > from moving forward. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
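The three grouping policies proposed in this comment could be expressed roughly as below. This is a hedged sketch of the idea only: the enum, the method name `defaultFlowName`, the `flow_` prefix, and the `default_flow` constant are all hypothetical and do not come from any patch.

```java
final class FlowGroupingSketch {
  /** Hypothetical policy names mirroring the three options in the comment. */
  enum Policy { BY_APPLICATION_NAME, ISOLATED_PER_APPLICATION, SINGLE_DEFAULT_FLOW }

  // Assumed name of the shared flow used by the third policy.
  static final String DEFAULT_FLOW = "default_flow";

  /** Picks a flow name for an application whose user did not specify one. */
  static String defaultFlowName(Policy policy, String appName, String appId) {
    switch (policy) {
      case BY_APPLICATION_NAME:
        return appName;                 // applications with the same name share a flow
      case ISOLATED_PER_APPLICATION:
        return "flow_" + appId;         // every application gets its own flow
      case SINGLE_DEFAULT_FLOW:
        return DEFAULT_FLOW;            // everything lands in one flow
      default:
        throw new IllegalArgumentException("Unknown policy: " + policy);
    }
  }
}
```

A configuration key would select the `Policy` value; extending the scheme would then mean adding an enum constant or replacing the enum with a pluggable interface.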
[jira] [Commented] (YARN-3462) Patches applied for YARN-2424 are inconsistent between trunk and branch-2
[ https://issues.apache.org/jira/browse/YARN-3462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484411#comment-14484411 ] Hadoop QA commented on YARN-3462: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12723784/YARN-3462.20150508-1.patch against trunk revision 5b8a3ae. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7249//console This message is automatically generated. > Patches applied for YARN-2424 are inconsistent between trunk and branch-2 > - > > Key: YARN-3462 > URL: https://issues.apache.org/jira/browse/YARN-3462 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Sidharta Seethana >Assignee: Naganarasimha G R > Attachments: YARN-3462.20150508-1.patch > > > It looks like the changes for YARN-2424 are not the same for trunk (commit > 7e75226e68715c3eca9d346c8eaf2f265aa70d23) and branch-2 (commit > 5d965f2f3cf97a87603720948aacd4f7877d73c4) . Branch-2 has a missing warning > and documentation is a bit different as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3462) Patches applied for YARN-2424 are inconsistent between trunk and branch-2
[ https://issues.apache.org/jira/browse/YARN-3462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484407#comment-14484407 ] Naganarasimha G R commented on YARN-3462: - [~sidharta-s] , can you take a look at the patch ? > Patches applied for YARN-2424 are inconsistent between trunk and branch-2 > - > > Key: YARN-3462 > URL: https://issues.apache.org/jira/browse/YARN-3462 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Sidharta Seethana >Assignee: Naganarasimha G R > Attachments: YARN-3462.20150508-1.patch > > > It looks like the changes for YARN-2424 are not the same for trunk (commit > 7e75226e68715c3eca9d346c8eaf2f265aa70d23) and branch-2 (commit > 5d965f2f3cf97a87603720948aacd4f7877d73c4) . Branch-2 has a missing warning > and documentation is a bit different as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3462) Patches applied for YARN-2424 are inconsistent between trunk and branch-2
[ https://issues.apache.org/jira/browse/YARN-3462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-3462: Attachment: YARN-3462.20150508-1.patch Attaching a patch with corrections for branch-2 > Patches applied for YARN-2424 are inconsistent between trunk and branch-2 > - > > Key: YARN-3462 > URL: https://issues.apache.org/jira/browse/YARN-3462 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Sidharta Seethana >Assignee: Naganarasimha G R > Attachments: YARN-3462.20150508-1.patch > > > It looks like the changes for YARN-2424 are not the same for trunk (commit > 7e75226e68715c3eca9d346c8eaf2f265aa70d23) and branch-2 (commit > 5d965f2f3cf97a87603720948aacd4f7877d73c4) . Branch-2 has a missing warning > and documentation is a bit different as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (YARN-2980) Move health check script related functionality to hadoop-common
[ https://issues.apache.org/jira/browse/YARN-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du reopened YARN-2980: -- The target version is 2.x, so why did we commit to trunk only? This doesn't sound like an incompatible change. Reopening it until we commit it to branch-2 together with YARN-3375. > Move health check script related functionality to hadoop-common > --- > > Key: YARN-2980 > URL: https://issues.apache.org/jira/browse/YARN-2980 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Ming Ma >Assignee: Varun Saxena > Fix For: 3.0.0 > > Attachments: YARN-2980.001.patch, YARN-2980.002.patch, > YARN-2980.003.patch, YARN-2980.004.patch > > > HDFS might want to leverage health check functionality available in YARN in > both namenode https://issues.apache.org/jira/browse/HDFS-7400 and datanode > https://issues.apache.org/jira/browse/HDFS-7441. > We can move health check functionality including the protocol between hadoop > daemons and health check script to hadoop-common. That will simplify the > development and maintenance for both hadoop source code and health check > script. > Thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (YARN-3448) Add Rolling Time To Lives Level DB Plugin Capabilities
[ https://issues.apache.org/jira/browse/YARN-3448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484363#comment-14484363 ] Zhijie Shen edited comment on YARN-3448 at 4/7/15 11:56 PM: Jonathan, I've several high questions about the design and the implementation: bq. Split the 5 sections of the leveldb database (domain, owner, start time, entity, index) into 5 separate databases. According to the official [document|https://github.com/google/leveldb], LevelDb a single process (possibly multi-threaded). Therefore, instead of 5 separate (logic) tables, 5 separate databases is used to increase concurrency, isn't it? However, this approach may raise the inconsistency issue. For example, if I upload an entity with primary filter defined, I may run into a scenario that some I/O exception happens when timeline server tries to write into entity db, while the index record is persisted without any problem. In scenario, the entity is searchable by primary filter, but cannot be got by its identifier. bq. Rolling DBs for entity and index DBs. 99.9% of the data are in these two sections 4:1 ration (index to entity) at least for tez. If I understand it correct, ownerdb can be treated as the secondary index of domaindb. If we want to lookup for the domains of one owner, we have two steps: 1) get all domain IDs from ownerdb and then 2) pull each individual domains from domaindb. I think we could adopt the similar approach for entitydb and indexdb. Instead of a full copy of entity content in indexdb, we could just record the entity identifier there, and do two-step lookup to answer the query. By doing this, we should be able to significantly shrink indexdb size, and improve write performance. In contrast, the previous leveldb index implementation seems to optimize towards the query. 3. I'm wondering if we need a separate configuration of rolling period or we should use ttl as the rolling period. 
The reason is if we set ttl smaller than the rolling period, in the most recent database, there will still exist old data. Therefore, we still need the deletion thread to remove these entities/index entries, or the query has to exclude them from result set. On the other side, it may be also not good to set ttl greater than rolling period. This is because if period now is smaller than ttl, we still need to wait until ttl to delete the database. Therefore, setting small rolling period along won't shrink the total database size if ttl is kept large. Combining the two points above, it seems to be better to let rolling period = ttl. And I think it may simplify the implementation with it, because we know current database will have all the live data, and previous databases are sure to have the old data to be discarded. Thoughts? 4. I assume that {{roll()}} method is going to be processed quickly, right? Otherwise, during the transit state of rolling a database, write performance will degrade somehow. was (Author: zjshen): Jonathan, I've several high questions about the design: bq. Split the 5 sections of the leveldb database (domain, owner, start time, entity, index) into 5 separate databases. According to the official [document|https://github.com/google/leveldb], LevelDb a single process (possibly multi-threaded). Therefore, instead of 5 separate (logic) tables, 5 separate databases is used to increase concurrency, isn't it? However, this approach may raise the inconsistency issue. For example, if I upload an entity with primary filter defined, I may run into a scenario that some I/O exception happens when timeline server tries to write into entity db, while the index record is persisted without any problem. In scenario, the entity is searchable by primary filter, but cannot be got by its identifier. bq. Rolling DBs for entity and index DBs. 99.9% of the data are in these two sections 4:1 ration (index to entity) at least for tez. 
If I understand it correct, ownerdb can be treated as the secondary index of domaindb. If we want to lookup for the domains of one owner, we have two steps: 1) get all domain IDs from ownerdb and then 2) pull each individual domains from domaindb. I think we could adopt the similar approach for entitydb and indexdb. Instead of a full copy of entity content in indexdb, we could just record the entity identifier there, and do two-step lookup to answer the query. By doing this, we should be able to significantly shrink indexdb size, and improve write performance. In contrast, the previous leveldb index implementation seems to optimize towards the query. 3. I'm wondering if we need a separate configuration of rolling period or we should use ttl as the rolling period. The reason is if we set ttl smaller than the rolling period, in the most recent database, there will still exist the old data. Therefore, we still need the deletion thread to re
[jira] [Commented] (YARN-3055) The token is not renewed properly if it's shared by jobs (oozie) in DelegationTokenRenewer
[ https://issues.apache.org/jira/browse/YARN-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484374#comment-14484374 ] Jian He commented on YARN-3055: --- bq. does it remove tokens from data structures in all cases or can a token get left in allTokens? I think it does not have a leak. removeFailedDelegationToken will remove the token if the renewal fails. If there is a leak, it existed before YARN-2704. bq. The renewer looks like it may turn into a DOS weapon. It does seem odd to get the expiration date by renewing the token, but there is currently no way to get the expiration date other than the renew method. bq. Any sub-job with the default of canceling tokens will kill the overall workflow. Am I missing something ? I think currently the sub-job won't kill the overall workflow. The sub-job flag will be ignored if the first job sets the flag. Overall, I think the current patch will work, apart from a few comments I have. [~daryn], you mentioned you have another patch. Could you share it, or do you think the current patch is fine? > The token is not renewed properly if it's shared by jobs (oozie) in > DelegationTokenRenewer > -- > > Key: YARN-3055 > URL: https://issues.apache.org/jira/browse/YARN-3055 > Project: Hadoop YARN > Issue Type: Bug > Components: security >Reporter: Yi Liu >Assignee: Yi Liu >Priority: Blocker > Attachments: YARN-3055.001.patch, YARN-3055.002.patch > > > After YARN-2964, there is only one timer to renew the token if it's shared by > jobs. > In {{removeApplicationFromRenewal}}, when going to remove a token, and the > token is shared by other jobs, we will not cancel the token. > Meanwhile, we should not cancel the _timerTask_, also we should not remove it > from {{allTokens}}. Otherwise for the existing submitted applications which > share this token will not get renew any more, and for new submitted > applications which share this token, the token will be renew immediately. 
> For example, we have 3 applications: app1, app2, app3. And they share the > token1. See following scenario: > *1).* app1 is submitted firstly, then app2, and then app3. In this case, > there is only one token renewal timer for token1, and is scheduled when app1 > is submitted > *2).* app1 is finished, then the renewal timer is cancelled. token1 will not > be renewed any more, but app2 and app3 still use it, so there is problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
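The app1/app2/app3 scenario above comes down to tracking which applications still reference a shared token and stopping the single renewal timer only when the last one goes away. The sketch below models that bookkeeping with plain collections; it is not DelegationTokenRenewer code, and the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class SharedTokenRenewalSketch {
  // Applications that currently reference each token.
  private final Map<String, Set<String>> appsByToken = new HashMap<>();
  // Tokens that have an active renewal timer (stand-in for real timer tasks).
  private final Set<String> activeTimers = new HashSet<>();

  /** Registers an application; the timer is scheduled once, on first use of the token. */
  synchronized void addApplication(String appId, String token) {
    Set<String> apps = appsByToken.get(token);
    if (apps == null) {
      apps = new HashSet<>();
      appsByToken.put(token, apps);
      activeTimers.add(token);
    }
    apps.add(appId);
  }

  /** Unregisters an application; the timer is cancelled only when no application is left. */
  synchronized void removeApplication(String appId, String token) {
    Set<String> apps = appsByToken.get(token);
    if (apps == null) {
      return;
    }
    apps.remove(appId);
    if (apps.isEmpty()) {
      appsByToken.remove(token);
      activeTimers.remove(token);
    }
  }

  synchronized boolean isRenewing(String token) {
    return activeTimers.contains(token);
  }
}
```

With this bookkeeping, finishing app1 leaves the timer for token1 running because app2 and app3 still reference it.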
[jira] [Commented] (YARN-3319) Implement a FairOrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484368#comment-14484368 ] Craig Welch commented on YARN-3319: --- Apply after applying YARN-3318 and YARN-3463 > Implement a FairOrderingPolicy > -- > > Key: YARN-3319 > URL: https://issues.apache.org/jira/browse/YARN-3319 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler >Reporter: Craig Welch >Assignee: Craig Welch > Attachments: YARN-3319.13.patch, YARN-3319.14.patch, > YARN-3319.17.patch, YARN-3319.35.patch, YARN-3319.39.patch, > YARN-3319.45.patch, YARN-3319.47.patch > > > Implement a FairOrderingPolicy which prefers to allocate to > SchedulerProcesses with least current usage, very similar to the > FairScheduler's FairSharePolicy. > The Policy will offer allocations to applications in a queue in order of > least resources used, and preempt applications in reverse order (from most > resources used). This will include conditional support for sizeBasedWeight > style adjustment > Optionally, based on a conditional configuration to enable sizeBasedWeight > (default false), an adjustment to boost larger applications (to offset the > natural preference for smaller applications) will adjust the resource usage > value based on demand, dividing it by the below value: > Math.log1p(app memory demand) / Math.log(2); > In cases where the above is indeterminate (two applications are equal after > this comparison), behavior falls back to comparison based on the application > id, which is generally lexically FIFO for that comparison -- This message was sent by Atlassian JIRA (v6.3.4#6332)
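The ordering rule in the description (least usage first, optional sizeBasedWeight adjustment, application id as tie-breaker) can be sketched as a comparator. This is an illustration under assumptions, not the patch: the `App` class and its fields are hypothetical, and skipping the adjustment when the demand is zero is an added guard against dividing by zero.

```java
import java.util.Comparator;

final class FairOrderingSketch {
  /** Hypothetical view of an application: id, memory in use, memory demanded. */
  static final class App {
    final String id;
    final long usedMemory;
    final long demandMemory;

    App(String id, long usedMemory, long demandMemory) {
      this.id = id;
      this.usedMemory = usedMemory;
      this.demandMemory = demandMemory;
    }
  }

  /** Usage value used for ordering, optionally divided by log2(1 + demand). */
  static double magnitude(App app, boolean sizeBasedWeight) {
    double used = app.usedMemory;
    if (sizeBasedWeight) {
      double weight = Math.log1p(app.demandMemory) / Math.log(2);
      if (weight > 0) { // assumption: leave usage unadjusted when demand is zero
        used = used / weight;
      }
    }
    return used;
  }

  /** Orders by adjusted usage, falling back to the application id on ties. */
  static Comparator<App> comparator(final boolean sizeBasedWeight) {
    return new Comparator<App>() {
      @Override
      public int compare(App a, App b) {
        int byUsage = Double.compare(
            magnitude(a, sizeBasedWeight), magnitude(b, sizeBasedWeight));
        return byUsage != 0 ? byUsage : a.id.compareTo(b.id);
      }
    };
  }
}
```

Allocation would walk applications in this order, and preemption would walk them in the reverse order, as the description states.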
[jira] [Updated] (YARN-3463) Integrate OrderingPolicy Framework with CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-3463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3463: -- Attachment: YARN-3463.50.patch Must apply YARN-3318 patch first > Integrate OrderingPolicy Framework with CapacityScheduler > - > > Key: YARN-3463 > URL: https://issues.apache.org/jira/browse/YARN-3463 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Craig Welch >Assignee: Craig Welch > Attachments: YARN-3463.50.patch > > > Integrate the OrderingPolicy Framework with the CapacityScheduler -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3448) Add Rolling Time To Lives Level DB Plugin Capabilities
[ https://issues.apache.org/jira/browse/YARN-3448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484363#comment-14484363 ] Zhijie Shen commented on YARN-3448: --- Jonathan, I have several high-level questions about the design: bq. Split the 5 sections of the leveldb database (domain, owner, start time, entity, index) into 5 separate databases. According to the official [document|https://github.com/google/leveldb], a LevelDB database can be accessed by only a single process (possibly multi-threaded) at a time. Therefore, instead of 5 separate (logical) tables, 5 separate databases are used to increase concurrency, is that right? However, this approach may raise an inconsistency issue. For example, if I upload an entity with a primary filter defined, I may run into a scenario where an I/O exception happens when the timeline server tries to write into the entity db, while the index record is persisted without any problem. In that scenario, the entity is searchable by primary filter, but cannot be retrieved by its identifier. bq. Rolling DBs for entity and index DBs. 99.9% of the data are in these two sections 4:1 ration (index to entity) at least for tez. If I understand it correctly, ownerdb can be treated as the secondary index of domaindb. If we want to look up the domains of one owner, we have two steps: 1) get all domain IDs from ownerdb and then 2) pull each individual domain from domaindb. I think we could adopt a similar approach for entitydb and indexdb. Instead of a full copy of the entity content in indexdb, we could just record the entity identifier there, and do a two-step lookup to answer the query. By doing this, we should be able to significantly shrink the indexdb size and improve write performance. In contrast, the previous leveldb index implementation seems to be optimized for queries. 3. I'm wondering if we need a separate configuration for the rolling period or whether we should use ttl as the rolling period. 
The reason is that if we set ttl smaller than the rolling period, the most recent database will still contain old data. Therefore, we still need the deletion thread to remove these entities/index entries, or the query has to exclude them from the result set. On the other hand, it may also not be good to set ttl greater than the rolling period. This is because if the period is smaller than ttl, we still need to wait until ttl to delete the database. Therefore, setting a small rolling period alone won't shrink the total database size if ttl is kept large. Combining the two points above, it seems better to let rolling period = ttl. I think it may also simplify the implementation, because we know the current database will have all the live data, and previous databases are sure to have the old data to be discarded. Thoughts? 4. I assume that the {{roll()}} method is going to be processed quickly, right? Otherwise, during the transition state of rolling a database, write performance will degrade somewhat. > Add Rolling Time To Lives Level DB Plugin Capabilities > -- > > Key: YARN-3448 > URL: https://issues.apache.org/jira/browse/YARN-3448 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jonathan Eagles >Assignee: Jonathan Eagles > Attachments: YARN-3448.1.patch, YARN-3448.2.patch > > > For large applications, the majority of the time in LeveldbTimelineStore is > spent deleting old entities record at a time. An exclusive write lock is held > during the entire deletion phase which in practice can be hours. If we are to > relax some of the consistency constraints, other performance enhancing > techniques can be employed to maximize the throughput and minimize locking > time. > Split the 5 sections of the leveldb database (domain, owner, start time, > entity, index) into 5 separate databases. This allows each database to > maximize the read cache effectiveness based on the unique usage patterns of > each database. With 5 separate databases each lookup is much faster. 
This can > also help with I/O to have the entity and index databases on separate disks. > Rolling DBs for entity and index DBs. 99.9% of the data are in these two > sections 4:1 ration (index to entity) at least for tez. We replace DB record > removal with file system removal if we create a rolling set of databases that > age out and can be efficiently removed. To do this we must place a constraint > to always place an entity's events into it's correct rolling db instance > based on start time. This allows us to stitching the data back together while > reading and artificial paging. > Relax the synchronous writes constraints. If we are willing to accept losing > some records that we not flushed in the operating system during a crash, we > can use async writes that can be much faster. > Prefer Sequential writes. sequential writes can be several times
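The interaction between the rolling period and the ttl discussed in this thread can be made concrete with a little arithmetic. The sketch below assumes that a rolling database is identified by the start of its period and holds entities whose start time falls inside that period; the class and method names are hypothetical and not taken from the patch.

```java
final class RollingDbSketch {
  /** Start of the rolling period that contains the given entity start time. */
  static long bucketStart(long entityStartMs, long periodMs) {
    return entityStartMs - Math.floorMod(entityStartMs, periodMs);
  }

  /**
   * A whole rolling database can be removed from the file system once even the
   * newest entity it may hold (start time just before bucketStart + period)
   * is older than the ttl.
   */
  static boolean canDrop(long bucketStartMs, long periodMs, long ttlMs, long nowMs) {
    return bucketStartMs + periodMs + ttlMs <= nowMs;
  }
}
```

Under these assumptions a database becomes removable at bucket start + period + ttl, so shrinking the period while keeping a large ttl changes the retention window very little, which is the point made in item 3 of the comment.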
[jira] [Created] (YARN-3463) Integrate OrderingPolicy Framework with CapacityScheduler
Craig Welch created YARN-3463: - Summary: Integrate OrderingPolicy Framework with CapacityScheduler Key: YARN-3463 URL: https://issues.apache.org/jira/browse/YARN-3463 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Craig Welch Assignee: Craig Welch Integrate the OrderingPolicy Framework with the CapacityScheduler -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2980) Move health check script related functionality to hadoop-common
[ https://issues.apache.org/jira/browse/YARN-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484347#comment-14484347 ] Allen Wittenauer commented on YARN-2980: Or, you could go look at YARN-3375 . > Move health check script related functionality to hadoop-common > --- > > Key: YARN-2980 > URL: https://issues.apache.org/jira/browse/YARN-2980 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Ming Ma >Assignee: Varun Saxena > Fix For: 3.0.0 > > Attachments: YARN-2980.001.patch, YARN-2980.002.patch, > YARN-2980.003.patch, YARN-2980.004.patch > > > HDFS might want to leverage health check functionality available in YARN in > both namenode https://issues.apache.org/jira/browse/HDFS-7400 and datanode > https://issues.apache.org/jira/browse/HDFS-7441. > We can move health check functionality including the protocol between hadoop > daemons and health check script to hadoop-common. That will simplify the > development and maintenance for both hadoop source code and health check > script. > Thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-2980) Move health check script related functionality to hadoop-common
[ https://issues.apache.org/jira/browse/YARN-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved YARN-2980. Resolution: Fixed > Move health check script related functionality to hadoop-common > --- > > Key: YARN-2980 > URL: https://issues.apache.org/jira/browse/YARN-2980 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Ming Ma >Assignee: Varun Saxena > Fix For: 3.0.0 > > Attachments: YARN-2980.001.patch, YARN-2980.002.patch, > YARN-2980.003.patch, YARN-2980.004.patch > > > HDFS might want to leverage health check functionality available in YARN in > both namenode https://issues.apache.org/jira/browse/HDFS-7400 and datanode > https://issues.apache.org/jira/browse/HDFS-7441. > We can move health check functionality including the protocol between hadoop > daemons and health check script to hadoop-common. That will simplify the > development and maintenance for both hadoop source code and health check > script. > Thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3459) TestLog4jWarningErrorMetricsAppender breaks in trunk
[ https://issues.apache.org/jira/browse/YARN-3459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484346#comment-14484346 ] Li Lu commented on YARN-3459: - [~vvasudev] feel free to take it or directly patch it back to YARN-2901. Thanks! > TestLog4jWarningErrorMetricsAppender breaks in trunk > > > Key: YARN-3459 > URL: https://issues.apache.org/jira/browse/YARN-3459 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Li Lu >Assignee: Li Lu >Priority: Blocker > Fix For: 2.7.0 > > > TestLog4jWarningErrorMetricsAppender fails with the following message: > {code} > Running org.apache.hadoop.yarn.util.TestLog4jWarningErrorMetricsAppender > Tests run: 6, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 6.214 sec <<< > FAILURE! - in org.apache.hadoop.yarn.util.TestLog4jWarningErrorMetricsAppender > testPurge(org.apache.hadoop.yarn.util.TestLog4jWarningErrorMetricsAppender) > Time elapsed: 2.01 sec <<< FAILURE! > java.lang.AssertionError: expected:<0> but was:<1> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.yarn.util.TestLog4jWarningErrorMetricsAppender.testPurge(TestLog4jWarningErrorMetricsAppender.java:89) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (YARN-2980) Move health check script related functionality to hadoop-common
[ https://issues.apache.org/jira/browse/YARN-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du reopened YARN-2980: -- > Move health check script related functionality to hadoop-common > --- > > Key: YARN-2980 > URL: https://issues.apache.org/jira/browse/YARN-2980 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Ming Ma >Assignee: Varun Saxena > Fix For: 3.0.0 > > Attachments: YARN-2980.001.patch, YARN-2980.002.patch, > YARN-2980.003.patch, YARN-2980.004.patch > > > HDFS might want to leverage health check functionality available in YARN in > both namenode https://issues.apache.org/jira/browse/HDFS-7400 and datanode > https://issues.apache.org/jira/browse/HDFS-7441. > We can move health check functionality including the protocol between hadoop > daemons and health check script to hadoop-common. That will simplify the > development and maintenance for both hadoop source code and health check > script. > Thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3459) TestLog4jWarningErrorMetricsAppender breaks in trunk
[ https://issues.apache.org/jira/browse/YARN-3459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484335#comment-14484335 ] Wangda Tan commented on YARN-3459: -- I can reproduce this locally as well, [~vvasudev], do you have any ideas on this? > TestLog4jWarningErrorMetricsAppender breaks in trunk > > > Key: YARN-3459 > URL: https://issues.apache.org/jira/browse/YARN-3459 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Li Lu >Assignee: Li Lu >Priority: Blocker > Fix For: 2.7.0 > > > TestLog4jWarningErrorMetricsAppender fails with the following message: > {code} > Running org.apache.hadoop.yarn.util.TestLog4jWarningErrorMetricsAppender > Tests run: 6, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 6.214 sec <<< > FAILURE! - in org.apache.hadoop.yarn.util.TestLog4jWarningErrorMetricsAppender > testPurge(org.apache.hadoop.yarn.util.TestLog4jWarningErrorMetricsAppender) > Time elapsed: 2.01 sec <<< FAILURE! > java.lang.AssertionError: expected:<0> but was:<1> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.yarn.util.TestLog4jWarningErrorMetricsAppender.testPurge(TestLog4jWarningErrorMetricsAppender.java:89) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3044) [Event producers] Implement RM writing app lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484338#comment-14484338 ] Zhijie Shen commented on YARN-3044: --- Naga, thanks for the patch. Will take a look at the patch. > [Event producers] Implement RM writing app lifecycle events to ATS > -- > > Key: YARN-3044 > URL: https://issues.apache.org/jira/browse/YARN-3044 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Naganarasimha G R > Attachments: YARN-3044.20150325-1.patch, YARN-3044.20150406-1.patch > > > Per design in YARN-2928, implement RM writing app lifecycle events to ATS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2980) Move health check script related functionality to hadoop-common
[ https://issues.apache.org/jira/browse/YARN-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484334#comment-14484334 ] Junping Du commented on YARN-2980: -- What does "Abey khali" mean? {code} +if(!NodeHealthScriptRunner.shouldRun(nodeHealthScript)) { + LOG.info("Abey khali"); + return null; +} {code} If it is meaningless, I will reopen this JIRA until we have a fix. > Move health check script related functionality to hadoop-common > --- > > Key: YARN-2980 > URL: https://issues.apache.org/jira/browse/YARN-2980 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Ming Ma >Assignee: Varun Saxena > Fix For: 3.0.0 > > Attachments: YARN-2980.001.patch, YARN-2980.002.patch, > YARN-2980.003.patch, YARN-2980.004.patch > > > HDFS might want to leverage health check functionality available in YARN in > both namenode https://issues.apache.org/jira/browse/HDFS-7400 and datanode > https://issues.apache.org/jira/browse/HDFS-7441. > We can move health check functionality including the protocol between hadoop > daemons and health check script to hadoop-common. That will simplify the > development and maintenance for both hadoop source code and health check > script. > Thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3429) TestAMRMTokens.testTokenExpiry fails Intermittently with error message:Invalid AMRMToken
[ https://issues.apache.org/jira/browse/YARN-3429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484320#comment-14484320 ] Hudson commented on YARN-3429: -- FAILURE: Integrated in Hadoop-trunk-Commit #7525 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7525/]) YARN-3429. Fix incorrect CHANGES.txt (rkanter: rev 5b8a3ae366294aec492f69f1a429aa7fce5d13be) * hadoop-yarn-project/CHANGES.txt > TestAMRMTokens.testTokenExpiry fails Intermittently with error > message:Invalid AMRMToken > > > Key: YARN-3429 > URL: https://issues.apache.org/jira/browse/YARN-3429 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Reporter: zhihai xu >Assignee: zhihai xu > Fix For: 2.8.0 > > Attachments: YARN-3429.000.patch > > > TestAMRMTokens.testTokenExpiry fails intermittently with the error > message: Invalid AMRMToken from appattempt_1427804754787_0001_01 > The error log is at > https://builds.apache.org/job/PreCommit-YARN-Build/7172//testReport/org.apache.hadoop.yarn.server.resourcemanager.security/TestAMRMTokens/testTokenExpiry_1_/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3462) Patches applied for YARN-2424 are inconsistent between trunk and branch-2
[ https://issues.apache.org/jira/browse/YARN-3462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R reassigned YARN-3462: --- Assignee: Naganarasimha G R > Patches applied for YARN-2424 are inconsistent between trunk and branch-2 > - > > Key: YARN-3462 > URL: https://issues.apache.org/jira/browse/YARN-3462 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Sidharta Seethana >Assignee: Naganarasimha G R > > It looks like the changes for YARN-2424 are not the same for trunk (commit > 7e75226e68715c3eca9d346c8eaf2f265aa70d23) and branch-2 (commit > 5d965f2f3cf97a87603720948aacd4f7877d73c4) . Branch-2 has a missing warning > and documentation is a bit different as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3326) ReST support for getLabelsToNodes
[ https://issues.apache.org/jira/browse/YARN-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-3326: Attachment: YARN-3326.20150407-1.patch I have modified the patch, assuming that "/label-mappings" is good enough. Please provide feedback. > ReST support for getLabelsToNodes > -- > > Key: YARN-3326 > URL: https://issues.apache.org/jira/browse/YARN-3326 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R >Priority: Minor > Attachments: YARN-3326.20150310-1.patch, YARN-3326.20150407-1.patch > > > REST support to retrieve the LabelsToNodes mapping -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3448) Add Rolling Time To Lives Level DB Plugin Capabilities
[ https://issues.apache.org/jira/browse/YARN-3448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484295#comment-14484295 ] Jonathan Eagles commented on YARN-3448: --- [~jlowe], I addressed your comments. One thing to note is that I didn't add the entity-based read/write lock back in. Theoretically it is possible for applications that have multiple writers to both get and set the start times for an entity or related entity. For applications like Tez (one writer) this is not possible AFAIK. It probably isn't a huge overhead; I just haven't had the time to benchmark before and after with entity locking. > Add Rolling Time To Lives Level DB Plugin Capabilities > -- > > Key: YARN-3448 > URL: https://issues.apache.org/jira/browse/YARN-3448 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jonathan Eagles >Assignee: Jonathan Eagles > Attachments: YARN-3448.1.patch, YARN-3448.2.patch > > > For large applications, the majority of the time in LeveldbTimelineStore is > spent deleting old entities one record at a time. An exclusive write lock is held > during the entire deletion phase which in practice can be hours. If we are to > relax some of the consistency constraints, other performance enhancing > techniques can be employed to maximize the throughput and minimize locking > time. > Split the 5 sections of the leveldb database (domain, owner, start time, > entity, index) into 5 separate databases. This allows each database to > maximize the read cache effectiveness based on the unique usage patterns of > each database. With 5 separate databases each lookup is much faster. This can > also help with I/O to have the entity and index databases on separate disks. > Rolling DBs for entity and index DBs. 99.9% of the data is in these two > sections, in a 4:1 ratio (index to entity), at least for Tez. 
We replace DB record > removal with file system removal if we create a rolling set of databases that > age out and can be efficiently removed. To do this we must place a constraint > to always place an entity's events into its correct rolling db instance > based on start time. This allows us to stitch the data back together while > reading and artificially paging. > Relax the synchronous writes constraints. If we are willing to accept losing > some records that were not flushed by the operating system during a crash, we > can use async writes that can be much faster. > Prefer sequential writes. Sequential writes can be several times faster than > random writes. Spend some small effort arranging the writes in such a way > that will trend towards sequential write performance over random write > performance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
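The rolling-database idea in the description above can be sketched roughly as follows. This is an illustration under stated assumptions, not code from the attached patches: the class and method names (`RollingStoreSketch`, `bucketFor`, `evictExpired`) are hypothetical, an in-memory list stands in for one leveldb instance per period, and the period length is an arbitrary parameter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical sketch: route entities to time-bucketed stores and expire whole buckets. */
class RollingStoreSketch {
    private final long periodMs;
    // Bucket start time -> entity ids in that bucket (stand-in for one rolling DB instance).
    private final TreeMap<Long, List<String>> buckets = new TreeMap<>();

    RollingStoreSketch(long periodMs) {
        this.periodMs = periodMs;
    }

    /** Start of the rolling period that contains the given entity start time. */
    long bucketFor(long startTimeMs) {
        return Math.floorDiv(startTimeMs, periodMs) * periodMs;
    }

    /** An entity always lands in the bucket chosen by its start time. */
    void put(String entityId, long startTimeMs) {
        buckets.computeIfAbsent(bucketFor(startTimeMs), k -> new ArrayList<>()).add(entityId);
    }

    /**
     * Drops every bucket whose period ended at or before (now - ttl).
     * Dropping a bucket models removing a database directory instead of deleting records.
     */
    int evictExpired(long nowMs, long ttlMs) {
        long cutoff = nowMs - ttlMs;
        int dropped = 0;
        while (!buckets.isEmpty() && buckets.firstKey() + periodMs <= cutoff) {
            buckets.pollFirstEntry();
            dropped++;
        }
        return dropped;
    }

    int bucketCount() {
        return buckets.size();
    }
}
```

The point of the sketch is that expiry cost depends on the number of buckets rather than the number of records, which is the trade the description proposes in exchange for the placement constraint.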
[jira] [Updated] (YARN-3448) Add Rolling Time To Lives Level DB Plugin Capabilities
[ https://issues.apache.org/jira/browse/YARN-3448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles updated YARN-3448: -- Attachment: YARN-3448.2.patch > Add Rolling Time To Lives Level DB Plugin Capabilities > -- > > Key: YARN-3448 > URL: https://issues.apache.org/jira/browse/YARN-3448 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jonathan Eagles >Assignee: Jonathan Eagles > Attachments: YARN-3448.1.patch, YARN-3448.2.patch > > > For large applications, the majority of the time in LeveldbTimelineStore is > spent deleting old entities one record at a time. An exclusive write lock is held > during the entire deletion phase which in practice can be hours. If we are to > relax some of the consistency constraints, other performance enhancing > techniques can be employed to maximize the throughput and minimize locking > time. > Split the 5 sections of the leveldb database (domain, owner, start time, > entity, index) into 5 separate databases. This allows each database to > maximize the read cache effectiveness based on the unique usage patterns of > each database. With 5 separate databases each lookup is much faster. This can > also help with I/O to have the entity and index databases on separate disks. > Rolling DBs for entity and index DBs. 99.9% of the data is in these two > sections, in a 4:1 ratio (index to entity), at least for Tez. We replace DB record > removal with file system removal if we create a rolling set of databases that > age out and can be efficiently removed. To do this we must place a constraint > to always place an entity's events into its correct rolling db instance > based on start time. This allows us to stitch the data back together while > reading and artificially paging. > Relax the synchronous writes constraints. If we are willing to accept losing > some records that were not flushed by the operating system during a crash, we > can use async writes that can be much faster. > Prefer sequential writes. 
Sequential writes can be several times faster than > random writes. Spend some small effort arranging the writes in such a way > that will trend towards sequential write performance over random write > performance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3044) [Event producers] Implement RM writing app lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484265#comment-14484265 ] Naganarasimha G R commented on YARN-3044: - Hi [~zjshen], [~djp] & [~sjlee0], Could any of you have a look at my last patch? > [Event producers] Implement RM writing app lifecycle events to ATS > -- > > Key: YARN-3044 > URL: https://issues.apache.org/jira/browse/YARN-3044 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Naganarasimha G R > Attachments: YARN-3044.20150325-1.patch, YARN-3044.20150406-1.patch > > > Per design in YARN-2928, implement RM writing app lifecycle events to ATS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3055) The token is not renewed properly if it's shared by jobs (oozie) in DelegationTokenRenewer
[ https://issues.apache.org/jira/browse/YARN-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484201#comment-14484201 ] Daryn Sharp commented on YARN-3055: --- This appears to go back to the really old days of renewing the token for its entire lifetime. Most unfortunate. The renewer looks like it may turn into a DOS weapon. Renewing a token returns the next expiration. The renewer uses a timer to renew 90% before expiration. After the last renewal, the same expiration ("the wall") will be returned as before. 90% of "the wall" eventually becomes a rapid fire renewal. There's an army of 50 threads prepared to fire concurrently. My other concern is that it used to be the first job submitted with a given token that determined if the token is to be cancelled. Now any job can influence the cancelling. This patch didn't specifically break that behavior, but the original YARN-2704 did, which precipitated YARN-2964 to break it differently, and now this jira. The ramification is we used to tell users to make sure the first job set the conf correctly, and essentially don't worry after that. Now they do have to worry. Any sub-job with the default of canceling tokens will kill the overall workflow. Sub-jobs should not have jurisdiction over the tokens. > The token is not renewed properly if it's shared by jobs (oozie) in > DelegationTokenRenewer > -- > > Key: YARN-3055 > URL: https://issues.apache.org/jira/browse/YARN-3055 > Project: Hadoop YARN > Issue Type: Bug > Components: security >Reporter: Yi Liu >Assignee: Yi Liu >Priority: Blocker > Attachments: YARN-3055.001.patch, YARN-3055.002.patch > > > After YARN-2964, there is only one timer to renew the token if it's shared by > jobs. > In {{removeApplicationFromRenewal}}, when going to remove a token, and the > token is shared by other jobs, we will not cancel the token. > Meanwhile, we should not cancel the _timerTask_, also we should not remove it > from {{allTokens}}. 
Otherwise, the existing submitted applications which > share this token will not get it renewed any more, and for newly submitted > applications which share this token, the token will be renewed immediately. > For example, we have 3 applications: app1, app2, app3, and they share > token1. See the following scenario: > *1).* app1 is submitted first, then app2, and then app3. In this case, > there is only one token renewal timer for token1, and it is scheduled when app1 > is submitted. > *2).* app1 is finished, then the renewal timer is cancelled. token1 will not > be renewed any more, but app2 and app3 still use it, so there is a problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
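The "wall" concern raised in the comment above can be illustrated with a small simulation. This is a sketch under assumptions, not the DelegationTokenRenewer code: the names `RenewalDelaySketch`, `nextDelayMs`, and `delaysAgainstFixedExpiration` are hypothetical, and the 90% rule is modeled as "wait 90% of the time remaining until expiration" using integer arithmetic.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a renewal timer that waits 90% of the remaining lifetime. */
class RenewalDelaySketch {
    /** Delay before the next renewal attempt: 90% of the time left until expiration. */
    static long nextDelayMs(long nowMs, long expirationMs) {
        return (expirationMs - nowMs) * 9 / 10;
    }

    /**
     * Simulates repeated renewals when the renew call keeps returning the same
     * expiration (the token has reached its maximum lifetime).
     */
    static List<Long> delaysAgainstFixedExpiration(long nowMs, long wallMs, int renewals) {
        List<Long> delays = new ArrayList<>();
        long now = nowMs;
        for (int i = 0; i < renewals; i++) {
            long delay = nextDelayMs(now, wallMs);
            delays.add(delay);
            now += delay; // the timer fires and the renewal returns wallMs again
        }
        return delays;
    }
}
```

With a fixed expiration each delay is a tenth of the previous one, so the interval between renewal attempts collapses toward zero as the expiration approaches, which is the rapid-fire behavior the comment warns about.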
[jira] [Commented] (YARN-3055) The token is not renewed properly if it's shared by jobs (oozie) in DelegationTokenRenewer
[ https://issues.apache.org/jira/browse/YARN-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484117#comment-14484117 ] Daryn Sharp commented on YARN-3055: --- On cursory glance, are you sure this isn't going to leak tokens? I.e., does it remove tokens from data structures in all cases or can a token get left in allTokens? > The token is not renewed properly if it's shared by jobs (oozie) in > DelegationTokenRenewer > -- > > Key: YARN-3055 > URL: https://issues.apache.org/jira/browse/YARN-3055 > Project: Hadoop YARN > Issue Type: Bug > Components: security >Reporter: Yi Liu >Assignee: Yi Liu >Priority: Blocker > Attachments: YARN-3055.001.patch, YARN-3055.002.patch > > > After YARN-2964, there is only one timer to renew the token if it's shared by > jobs. > In {{removeApplicationFromRenewal}}, when going to remove a token, and the > token is shared by other jobs, we will not cancel the token. > Meanwhile, we should not cancel the _timerTask_, also we should not remove it > from {{allTokens}}. Otherwise, the existing submitted applications which > share this token will not get it renewed any more, and for newly submitted > applications which share this token, the token will be renewed immediately. > For example, we have 3 applications: app1, app2, app3, and they share > token1. See the following scenario: > *1).* app1 is submitted first, then app2, and then app3. In this case, > there is only one token renewal timer for token1, and it is scheduled when app1 > is submitted. > *2).* app1 is finished, then the renewal timer is cancelled. token1 will not > be renewed any more, but app2 and app3 still use it, so there is a problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
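One way to reason about both the scenario in the description and the leak question above is a per-token set of referencing applications. The following is a minimal sketch, assuming string ids and hypothetical names (`SharedTokenTrackerSketch`, `addApplication`, `removeApplication`); it is not the data structure used by the patch.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: keep a shared token alive until its last application finishes. */
class SharedTokenTrackerSketch {
    // Token id -> applications that still depend on the token.
    private final Map<String, Set<String>> appsByToken = new HashMap<>();

    void addApplication(String appId, String tokenId) {
        appsByToken.computeIfAbsent(tokenId, k -> new HashSet<>()).add(appId);
    }

    /**
     * Removes one application's reference to the token.
     * Returns true only when no application references the token any more,
     * which is the point at which a caller could cancel the renewal timer.
     */
    boolean removeApplication(String appId, String tokenId) {
        Set<String> apps = appsByToken.get(tokenId);
        if (apps == null) {
            return false;
        }
        apps.remove(appId);
        if (apps.isEmpty()) {
            appsByToken.remove(tokenId); // drop the entry so the token is not leaked
            return true;
        }
        return false;
    }

    boolean isTracked(String tokenId) {
        return appsByToken.containsKey(tokenId);
    }
}
```

In the app1/app2/app3 example, finishing app1 and app2 leaves token1 tracked and its timer untouched; only finishing app3 releases it, and the map entry is removed at the same moment.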
[jira] [Commented] (YARN-3055) The token is not renewed properly if it's shared by jobs (oozie) in DelegationTokenRenewer
[ https://issues.apache.org/jira/browse/YARN-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484098#comment-14484098 ] Vinod Kumar Vavilapalli commented on YARN-3055: --- [~daryn]/[~jianhe], I briefly looked at the existing patch on this JIRA and it seems like it will work. Can you also take a look? [~hitliuyi], can you see if you can add a test for this in TestDelegationTokenRenewer.java? This is the last blocker on 2.7.0 as of today. Appreciate all the help I can get, thanks all. > The token is not renewed properly if it's shared by jobs (oozie) in > DelegationTokenRenewer > -- > > Key: YARN-3055 > URL: https://issues.apache.org/jira/browse/YARN-3055 > Project: Hadoop YARN > Issue Type: Bug > Components: security >Reporter: Yi Liu >Assignee: Yi Liu >Priority: Blocker > Attachments: YARN-3055.001.patch, YARN-3055.002.patch > > > After YARN-2964, there is only one timer to renew the token if it's shared by > jobs. > In {{removeApplicationFromRenewal}}, when going to remove a token, and the > token is shared by other jobs, we will not cancel the token. > Meanwhile, we should not cancel the _timerTask_, also we should not remove it > from {{allTokens}}. Otherwise, the existing submitted applications which > share this token will not get it renewed any more, and for newly submitted > applications which share this token, the token will be renewed immediately. > For example, we have 3 applications: app1, app2, app3, and they share > token1. See the following scenario: > *1).* app1 is submitted first, then app2, and then app3. In this case, > there is only one token renewal timer for token1, and it is scheduled when app1 > is submitted. > *2).* app1 is finished, then the renewal timer is cancelled. token1 will not > be renewed any more, but app2 and app3 still use it, so there is a problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2424) LCE should support non-cgroups, non-secure mode
[ https://issues.apache.org/jira/browse/YARN-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484058#comment-14484058 ] Sidharta Seethana commented on YARN-2424: - Here it is : https://issues.apache.org/jira/browse/YARN-3462 > LCE should support non-cgroups, non-secure mode > --- > > Key: YARN-2424 > URL: https://issues.apache.org/jira/browse/YARN-2424 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.3.0, 2.4.0, 2.5.0, 2.4.1 >Reporter: Allen Wittenauer >Assignee: Allen Wittenauer >Priority: Blocker > Fix For: 2.6.0 > > Attachments: Y2424-1.patch, YARN-2424.patch > > > After YARN-1253, LCE no longer works for non-secure, non-cgroup scenarios. > This is a fairly serious regression, as turning on LCE prior to turning on > full-blown security is a fairly standard procedure. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3462) Patches applied for YARN-2424 are inconsistent between trunk and branch-2
Sidharta Seethana created YARN-3462: --- Summary: Patches applied for YARN-2424 are inconsistent between trunk and branch-2 Key: YARN-3462 URL: https://issues.apache.org/jira/browse/YARN-3462 Project: Hadoop YARN Issue Type: Bug Reporter: Sidharta Seethana It looks like the changes for YARN-2424 are not the same for trunk (commit 7e75226e68715c3eca9d346c8eaf2f265aa70d23) and branch-2 (commit 5d965f2f3cf97a87603720948aacd4f7877d73c4) . Branch-2 has a missing warning and documentation is a bit different as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3426) Add jdiff support to YARN
[ https://issues.apache.org/jira/browse/YARN-3426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-3426: -- Target Version/s: 2.8.0 (was: 2.7.0) bq. The bigger question is the duplication of the maven code across Common, YARN and MAPREDUCE. But this may take more time to cleanup. Removing it from 2.7.0 as the effort needed for this cleanup is huge. > Add jdiff support to YARN > - > > Key: YARN-3426 > URL: https://issues.apache.org/jira/browse/YARN-3426 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Li Lu >Assignee: Li Lu >Priority: Blocker > Attachments: YARN-3426-040615-1.patch, YARN-3426-040615.patch, > YARN-3426-040715.patch > > > Maybe we'd like to extend our current jdiff tool for hadoop-common and hdfs > to YARN as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3426) Add jdiff support to YARN
[ https://issues.apache.org/jira/browse/YARN-3426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484051#comment-14484051 ] Vinod Kumar Vavilapalli commented on YARN-3426: --- Comments on the patch - The API links are broken. Changing it to apidocs works. - Many private APIs are showing up in javadoc. For e.g., ContainerReport.newInstance is private but it shows up in jdiff and API docs. The bigger question is the duplication of the maven code across Common, YARN and MAPREDUCE. But this may take more time to cleanup. I'll use the output from this patch to figure out compatibility issues with 2.7.0, but remove this patch itself from 2.7.0. > Add jdiff support to YARN > - > > Key: YARN-3426 > URL: https://issues.apache.org/jira/browse/YARN-3426 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Li Lu >Assignee: Li Lu >Priority: Blocker > Attachments: YARN-3426-040615-1.patch, YARN-3426-040615.patch, > YARN-3426-040715.patch > > > Maybe we'd like to extend our current jdiff tool for hadoop-common and hdfs > to YARN as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3448) Add Rolling Time To Lives Level DB Plugin Capabilities
[ https://issues.apache.org/jira/browse/YARN-3448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484032#comment-14484032 ] Zhijie Shen commented on YARN-3448: --- Jonathan, thanks for your contribution. It sounds like an interesting proposal. I'd like to take a look at the patch too. > Add Rolling Time To Lives Level DB Plugin Capabilities > -- > > Key: YARN-3448 > URL: https://issues.apache.org/jira/browse/YARN-3448 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jonathan Eagles >Assignee: Jonathan Eagles > Attachments: YARN-3448.1.patch > > > For large applications, the majority of the time in LeveldbTimelineStore is > spent deleting old entities one record at a time. An exclusive write lock is held > during the entire deletion phase which in practice can be hours. If we are to > relax some of the consistency constraints, other performance enhancing > techniques can be employed to maximize the throughput and minimize locking > time. > Split the 5 sections of the leveldb database (domain, owner, start time, > entity, index) into 5 separate databases. This allows each database to > maximize the read cache effectiveness based on the unique usage patterns of > each database. With 5 separate databases each lookup is much faster. This can > also help with I/O to have the entity and index databases on separate disks. > Rolling DBs for entity and index DBs. 99.9% of the data is in these two > sections, in a 4:1 ratio (index to entity), at least for Tez. We replace DB record > removal with file system removal if we create a rolling set of databases that > age out and can be efficiently removed. To do this we must place a constraint > to always place an entity's events into its correct rolling db instance > based on start time. This allows us to stitch the data back together while > reading and artificially paging. > Relax the synchronous writes constraints. 
If we are willing to accept losing > some records that were not flushed by the operating system during a crash, we > can use async writes that can be much faster. > Prefer sequential writes. Sequential writes can be several times faster than > random writes. Spend some small effort arranging the writes in such a way > that will trend towards sequential write performance over random write > performance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2901) Add errors and warning metrics page to RM, NM web UI
[ https://issues.apache.org/jira/browse/YARN-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484030#comment-14484030 ] Sidharta Seethana commented on YARN-2901: - IMHO, using calls to LOG.error()/LOG.warn() as proxies for counting errors/warnings is flaky at best. It places cumbersome restrictions on code, requiring that a given error/warning correspond to a single error()/warn() call. This is tough to enforce even within a single block of code, let alone across multiple functions (e.g., when an exception is thrown/re-thrown and an error/warning is logged in multiple locations). I hope this will not lead to a restriction on new code in YARN that each error/warning should correspond to a single error()/warn() call. > Add errors and warning metrics page to RM, NM web UI > > > Key: YARN-2901 > URL: https://issues.apache.org/jira/browse/YARN-2901 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager, resourcemanager >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Fix For: 2.8.0 > > Attachments: Exception collapsed.png, Exception expanded.jpg, Screen > Shot 2015-03-19 at 7.40.02 PM.png, YARN-2901.addendem.1.patch, > apache-yarn-2901.0.patch, apache-yarn-2901.1.patch, apache-yarn-2901.2.patch, > apache-yarn-2901.3.patch, apache-yarn-2901.4.patch, apache-yarn-2901.5.patch > > > It would be really useful to have statistics on the number of errors and > warnings in the RM and NM web UI. I'm thinking about - > 1. The number of errors and warnings in the past 5 min/1 hour/12 hours/day > 2. The top 'n'(20?) most common exceptions in the past 5 min/1 hour/12 > hours/day > By errors and warnings I'm referring to the log level. > I suspect we can probably achieve this by writing a custom appender?(I'm open > to suggestions on alternate mechanisms for implementing this). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
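The "errors in the past 5 min/1 hour" statistic from the description can be sketched independently of any logging framework. The class below is a hypothetical illustration (`WindowedCounterSketch` and its methods are invented names, not part of the YARN-2901 appender); it assumes one recorded timestamp per error()/warn() call, arriving in non-decreasing order, which is exactly the one-call-per-error assumption the comment questions.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch: count log events that fall inside a trailing time window. */
class WindowedCounterSketch {
    // Event timestamps in arrival order; assumed to be non-decreasing.
    private final Deque<Long> timestamps = new ArrayDeque<>();

    /** Called once per logged error or warning. */
    void record(long timestampMs) {
        timestamps.addLast(timestampMs);
    }

    /** Number of events with a timestamp in [now - window, now]. */
    int countWithin(long nowMs, long windowMs) {
        long cutoff = nowMs - windowMs;
        int count = 0;
        for (long t : timestamps) {
            if (t >= cutoff && t <= nowMs) {
                count++;
            }
        }
        return count;
    }

    /** Discards events older than the retention period; returns how many were removed. */
    int purge(long nowMs, long retentionMs) {
        long cutoff = nowMs - retentionMs;
        int removed = 0;
        while (!timestamps.isEmpty() && timestamps.peekFirst() < cutoff) {
            timestamps.pollFirst();
            removed++;
        }
        return removed;
    }

    int size() {
        return timestamps.size();
    }
}
```

If the same failure is logged at two call sites, `record` runs twice and the window count doubles, so the counter measures log calls rather than distinct errors.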
[jira] [Commented] (YARN-3426) Add jdiff support to YARN
[ https://issues.apache.org/jira/browse/YARN-3426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484008#comment-14484008 ] Li Lu commented on YARN-3426: - Could not reproduce the mvn eclipse:eclipse failure locally. The failure looks to be irrelevant. > Add jdiff support to YARN > - > > Key: YARN-3426 > URL: https://issues.apache.org/jira/browse/YARN-3426 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Li Lu >Assignee: Li Lu >Priority: Blocker > Attachments: YARN-3426-040615-1.patch, YARN-3426-040615.patch, > YARN-3426-040715.patch > > > Maybe we'd like to extend our current jdiff tool for hadoop-common and hdfs > to YARN as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3458) CPU resource monitoring in Windows
[ https://issues.apache.org/jira/browse/YARN-3458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14483874#comment-14483874 ] Inigo Goiri commented on YARN-3458: --- For the tests, I checked the original TestWindowsBasedProcessTree and it didn't have anything related to actually testing the resource monitoring; I'm open to suggestions. Regarding the two warnings, I'm not able to understand what they are complaining about; they say that I have fields that are not accessed, but the ones I added are referenced. I think it refers to Log, but I'm not able to parse the error. > CPU resource monitoring in Windows > -- > > Key: YARN-3458 > URL: https://issues.apache.org/jira/browse/YARN-3458 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager >Affects Versions: 2.7.0 > Environment: Windows >Reporter: Inigo Goiri >Priority: Minor > Labels: containers, metrics, windows > Attachments: YARN-3458-1.patch, YARN-3458-2.patch, YARN-3458-3.patch > > Original Estimate: 168h > Remaining Estimate: 168h > > The current implementation of getCpuUsagePercent() for > WindowsBasedProcessTree is left as unavailable. Attached a proposal of how to > do it. I reused the CpuTimeTracker using 1 jiffy=1ms. > This was left open by YARN-3122. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
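The general technique behind a CPU usage percentage, deriving it from two samples of cumulative CPU time, can be sketched as below. This is not the CpuTimeTracker API or the attached patch; `CpuUsageSketch` and `update` are hypothetical names, and the sketch assumes cumulative CPU time is already available in milliseconds (the "1 jiffy = 1 ms" convention mentioned in the description).

```java
/** Hypothetical sketch: CPU usage percent from successive cumulative CPU time samples. */
class CpuUsageSketch {
    static final float UNAVAILABLE = -1f;

    private long lastCpuMs = -1;    // cumulative CPU time at the previous sample
    private long lastSampleMs = -1; // wall-clock time of the previous sample

    /**
     * Feeds one sample and returns the usage since the previous sample,
     * or UNAVAILABLE when there is no earlier sample to compare against.
     * Values above 100 mean that more than one core was busy.
     */
    float update(long cumulativeCpuMs, long sampleTimeMs) {
        float percent = UNAVAILABLE;
        if (lastSampleMs >= 0 && sampleTimeMs > lastSampleMs) {
            long cpuDelta = cumulativeCpuMs - lastCpuMs;
            long wallDelta = sampleTimeMs - lastSampleMs;
            percent = 100f * cpuDelta / wallDelta;
        }
        lastCpuMs = cumulativeCpuMs;
        lastSampleMs = sampleTimeMs;
        return percent;
    }
}
```

A test for the monitoring path could feed synthetic samples like these and assert on the returned percentages, which may be one answer to the open question about how to test it.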
[jira] [Resolved] (YARN-3439) RM fails to renew token when Oozie launcher leaves before sub-job finishes
[ https://issues.apache.org/jira/browse/YARN-3439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe resolved YARN-3439. -- Resolution: Duplicate bq. IAC, this is a dup of YARN-3055. Agreed, closing as a duplicate. > RM fails to renew token when Oozie launcher leaves before sub-job finishes > -- > > Key: YARN-3439 > URL: https://issues.apache.org/jira/browse/YARN-3439 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.0 >Reporter: Jason Lowe >Assignee: Daryn Sharp >Priority: Blocker > Attachments: YARN-3439.001.patch > > > When the Oozie launcher runs a standard MapReduce job (not Pig) it doesn't > linger waiting for the sub-job to finish. At that point the RM stops > renewing delegation tokens for the launcher job which wreaks havoc on the > sub-job if the sub-job runs long enough for the tokens to expire. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3452) Bogus token usernames cause many invalid group lookups
[ https://issues.apache.org/jira/browse/YARN-3452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14483861#comment-14483861 ] Jason Lowe commented on YARN-3452: -- The extra lookups started in the 2.6 releases, and they appear to be caused by HADOOP-10650. However, YARN really should not be using bogus users on tokens anyway, in case the RPC layer (or another non-YARN system) tries to do something with those users, as HADOOP-10650 did. > Bogus token usernames cause many invalid group lookups > -- > > Key: YARN-3452 > URL: https://issues.apache.org/jira/browse/YARN-3452 > Project: Hadoop YARN > Issue Type: Bug > Components: security >Reporter: Jason Lowe > > YARN uses a number of bogus usernames for tokens, like application attempt > IDs for NM tokens or even the hardcoded "testing" for the container localizer > token. These tokens cause the RPC layer to do group lookups on these bogus > usernames which will never succeed but can take a long time to perform. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3366) Outbound network bandwidth : classify/shape traffic originating from YARN containers
[ https://issues.apache.org/jira/browse/YARN-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sidharta Seethana updated YARN-3366: Attachment: YARN-3366.003.patch Uploading patch incorporating code review feedback. > Outbound network bandwidth : classify/shape traffic originating from YARN > containers > > > Key: YARN-3366 > URL: https://issues.apache.org/jira/browse/YARN-3366 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Sidharta Seethana >Assignee: Sidharta Seethana > Attachments: YARN-3366.001.patch, YARN-3366.002.patch, > YARN-3366.003.patch > > > In order to be able to isolate based on/enforce outbound traffic bandwidth > limits, we need a mechanism to classify/shape network traffic in the > nodemanager. For more information on the design, please see the attached > design document in the parent JIRA. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3426) Add jdiff support to YARN
[ https://issues.apache.org/jira/browse/YARN-3426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14483837#comment-14483837 ] Hadoop QA commented on YARN-3426: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12723684/YARN-3426-040715.patch against trunk revision d27e924. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:red}-1 eclipse:eclipse{color}. The patch failed to build with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7247//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7247//console This message is automatically generated. > Add jdiff support to YARN > - > > Key: YARN-3426 > URL: https://issues.apache.org/jira/browse/YARN-3426 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Li Lu >Assignee: Li Lu >Priority: Blocker > Attachments: YARN-3426-040615-1.patch, YARN-3426-040615.patch, > YARN-3426-040715.patch > > > Maybe we'd like to extend our current jdiff tool for hadoop-common and hdfs > to YARN as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3460) Test TestSecureRMRegistryOperations failed with IBM_JAVA JVM
[ https://issues.apache.org/jira/browse/YARN-3460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14483827#comment-14483827 ] Hadoop QA commented on YARN-3460: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12723668/HADOOP-11810-1.patch against trunk revision d27e924. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:red}-1 javac{color}. The applied patch generated 1148 javac compiler warnings (more than the trunk's current 209 warnings). {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 43 warning messages. See https://builds.apache.org/job/PreCommit-YARN-Build/7245//artifact/patchprocess/diffJavadocWarnings.txt for details. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7245//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/7245//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html Javac warnings: https://builds.apache.org/job/PreCommit-YARN-Build/7245//artifact/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7245//console This message is automatically generated. 
> Test TestSecureRMRegistryOperations failed with IBM_JAVA JVM > > > Key: YARN-3460 > URL: https://issues.apache.org/jira/browse/YARN-3460 > Project: Hadoop YARN > Issue Type: Test >Affects Versions: 3.0.0, 2.6.0 > Environment: $ mvn -version > Apache Maven 3.2.1 (ea8b2b07643dbb1b84b6d16e1f08391b666bc1e9; > 2014-02-14T11:37:52-06:00) > Maven home: /opt/apache-maven-3.2.1 > Java version: 1.7.0, vendor: IBM Corporation > Java home: /usr/lib/jvm/ibm-java-ppc64le-71/jre > Default locale: en_US, platform encoding: UTF-8 > OS name: "linux", version: "3.10.0-229.ael7b.ppc64le", arch: "ppc64le", > family: "unix" >Reporter: pascal oliva > Attachments: HADOOP-11810-1.patch > > > TestSecureRMRegistryOperations failed with IBM JAVA > mvn test -X > -Dtest=org.apache.hadoop.registry.secure.TestSecureRMRegistryOperations > Module Total Failure Error Skipped > - > hadoop-yarn-registry 12 0 12 0 > - > Total 12 0 12 0 > With > javax.security.auth.login.LoginException: Bad JAAS configuration: > unrecognized option: isInitiator > and > Bad JAAS configuration: unrecognized option: storeKey -- This message was sent by Atlassian JIRA (v6.3.4#6332)
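The two LoginException messages in the YARN-3460 description suggest that the test's JAAS configuration passes options that the IBM JDK's Kerberos login module does not recognize. One possible way to handle this, sketched below, is to choose the login module class and the option names by JVM vendor. This is an illustration under stated assumptions, not the attached patch: the class Krb5OptionsSketch and its methods are hypothetical, the keytab path and principal are placeholders, and the IBM option names used here (useKeytab with a file:// URL, credsType) are, to the best of my knowledge, the ones Hadoop's UserGroupInformation uses on IBM JVMs. They should be checked against the IBM JDK documentation before being relied on.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for illustration; not part of Hadoop or the attached patch. */
final class Krb5OptionsSketch {

    /** Assumption: IBM JVMs report a java.vendor string containing "IBM". */
    static boolean isIbmJvm(String javaVendor) {
        return javaVendor != null && javaVendor.contains("IBM");
    }

    static String loginModuleClass(String javaVendor) {
        return isIbmJvm(javaVendor)
            ? "com.ibm.security.auth.module.Krb5LoginModule"
            : "com.sun.security.auth.module.Krb5LoginModule";
    }

    /** Builds keytab login options, avoiding storeKey and isInitiator on IBM JVMs. */
    static Map<String, String> keytabOptions(String javaVendor, String keytabPath, String principal) {
        Map<String, String> options = new LinkedHashMap<>();
        options.put("principal", principal);
        if (isIbmJvm(javaVendor)) {
            // Assumed IBM option names; the keytab is given as a file:// URL.
            String url = keytabPath.startsWith("file://") ? keytabPath : "file://" + keytabPath;
            options.put("useKeytab", url);
            options.put("credsType", "both");
        } else {
            options.put("keyTab", keytabPath);
            options.put("useKeyTab", "true");
            options.put("storeKey", "true");
            options.put("doNotPrompt", "true");
            options.put("isInitiator", "true");
        }
        return options;
    }

    public static void main(String[] args) {
        String vendor = System.getProperty("java.vendor");
        // Placeholder keytab path and principal, for demonstration only.
        System.out.println(loginModuleClass(vendor) + " "
            + keytabOptions(vendor, "/path/to/service.keytab", "service/host@EXAMPLE.COM"));
    }
}
```

Keeping the vendor check in one place means that test configurations and production code can share the same option-building logic.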
[jira] [Commented] (YARN-3458) CPU resource monitoring in Windows
[ https://issues.apache.org/jira/browse/YARN-3458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14483822#comment-14483822 ] Hadoop QA commented on YARN-3458: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12723681/YARN-3458-3.patch against trunk revision d27e924. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7246//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/7246//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7246//console This message is automatically generated. 
> CPU resource monitoring in Windows > -- > > Key: YARN-3458 > URL: https://issues.apache.org/jira/browse/YARN-3458 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager >Affects Versions: 2.7.0 > Environment: Windows >Reporter: Inigo Goiri >Priority: Minor > Labels: containers, metrics, windows > Attachments: YARN-3458-1.patch, YARN-3458-2.patch, YARN-3458-3.patch > > Original Estimate: 168h > Remaining Estimate: 168h > > The current implementation of getCpuUsagePercent() for > WindowsBasedProcessTree is left as unavailable. Attached a proposal of how to > do it. I reused the CpuTimeTracker using 1 jiffy=1ms. > This was left open by YARN-3122. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3391) Clearly define flow ID/ flow run / flow version in API and storage
[ https://issues.apache.org/jira/browse/YARN-3391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14483812#comment-14483812 ] Zhijie Shen commented on YARN-3391: --- bq. let's continue the discussion on a separated JIRA for figuring it out later. Agreed. Let's unblock this JIRA, which will in turn unblock the writer implementation. I filed YARN-3461 to continue the defaults discussion there. bq. I just wanted to add my 2 cents that this is something we already see and experience with hRaven so it's not theoretical. Sangjin, thanks for sharing the use case in hRaven. It's helpful for understanding the proper defaults. To generalize it, we need to consider different use cases, such as ad hoc applications only. Shall we continue the discussion on YARN-3461? bq. As I mentioned earlier, it should be useful for developers I made use of Sangjin's previous comments to add some inline code comments about their definitions in TimelineCollectorContext. > Clearly define flow ID/ flow run / flow version in API and storage > -- > > Key: YARN-3391 > URL: https://issues.apache.org/jira/browse/YARN-3391 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Zhijie Shen > Attachments: YARN-3391.1.patch, YARN-3391.2.patch, YARN-3391.3.patch > > > To continue the discussion in YARN-3040, let's figure out the best way to > describe the flow. > Some key issues that we need to conclude on: > - How do we include the flow version in the context so that it gets passed > into the collector and to the storage eventually? > - Flow run id should be a number as opposed to a generic string? > - Default behavior for the flow run id if it is missing (i.e. client did not > set it) > - How do we handle flow attributes in case of nested levels of flows? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3361) CapacityScheduler side changes to support non-exclusive node labels
[ https://issues.apache.org/jira/browse/YARN-3361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14483808#comment-14483808 ] Vinod Kumar Vavilapalli commented on YARN-3361: --- Review of the tests - testNonExclusiveNodeLabelsAllocationIgnoreAppSubmitOrder -- -> testPreferenceOfNeedyAppsTowardsNodePartitions ? -- This doesn't really guarantee if app2 is getting preference or not. How about changing it to say app2 has enough requests to fill the entire node? - testNonExclusiveNodeLabelsAllocationIgnorePriority -- -> testPreferenceOfNeedyContainersTowardsNodePartitions ? -- Actually, now that I rename it that way, this may not be the right behavior. Not respecting priorities within an app can result in scheduling deadlocks. - testLabeledResourceRequestsGetPreferrenceInHierarchyOfQueue: This is really testQueuesWithAccessGetPreferrenceInPartitionedNodes? - testNonLabeledQueueUsesLabeledResource -- -> testQueuesWithoutAccessUsingPartitionedNodes -- Also validate that the wait for non-labeled requests not getting allocated on non-partitioned nodes is only for one cycle through all nodes in the cluster - Let's move all these node-label related tests into their own test-case. - More tests? 
-- AMs with labeled requirement not getting allocated on non-exclusive partitions -- To verify that we are not putting absolute max-capacities on the individual queues when not-respecting-partitions > CapacityScheduler side changes to support non-exclusive node labels > --- > > Key: YARN-3361 > URL: https://issues.apache.org/jira/browse/YARN-3361 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-3361.1.patch, YARN-3361.2.patch > > > According to design doc attached in YARN-3214, we need implement following > logic in CapacityScheduler: > 1) When allocate a resource request with no node-label specified, it should > get preferentially allocated to node without labels. > 2) When there're some available resource in a node with label, they can be > used by applications with following order: > - Applications under queues which can access the label and ask for same > labeled resource. > - Applications under queues which can access the label and ask for > non-labeled resource. > - Applications under queues cannot access the label and ask for non-labeled > resource. > 3) Expose necessary information that can be used by preemption policy to make > preemption decisions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
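The ordering in item 2 of the YARN-3361 description (who may use spare resources on a labeled node, and in what order) can be sketched as a simple ranking function. This is only an illustration of the stated preference order, not the CapacityScheduler implementation or the attached patch: the class LabeledNodeOrderSketch, the App type, and its fields are hypothetical, and the sketch assumes that an application asking for the label from a queue without access to it is not eligible at all.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical illustration of the preference order; not CapacityScheduler code. */
final class LabeledNodeOrderSketch {
    static final int INELIGIBLE = Integer.MAX_VALUE;

    /** Minimal stand-in for an application competing for a labeled node. */
    static final class App {
        final String name;
        final boolean queueCanAccessLabel;
        final boolean asksForLabel;

        App(String name, boolean queueCanAccessLabel, boolean asksForLabel) {
            this.name = name;
            this.queueCanAccessLabel = queueCanAccessLabel;
            this.asksForLabel = asksForLabel;
        }
    }

    /** Lower rank means higher preference on the labeled node. */
    static int rank(App app) {
        if (app.queueCanAccessLabel && app.asksForLabel) {
            return 0; // queue can access the label, request asks for the same label
        }
        if (app.queueCanAccessLabel) {
            return 1; // queue can access the label, request is non-labeled
        }
        if (!app.asksForLabel) {
            return 2; // queue cannot access the label, request is non-labeled
        }
        return INELIGIBLE; // assumed: labeled request from a queue without access
    }

    /** Returns eligible application names in preference order (stable for ties). */
    static List<String> order(List<App> apps) {
        List<App> eligible = new ArrayList<>();
        for (App app : apps) {
            if (rank(app) != INELIGIBLE) {
                eligible.add(app);
            }
        }
        eligible.sort(Comparator.comparingInt(LabeledNodeOrderSketch::rank));
        List<String> names = new ArrayList<>();
        for (App app : eligible) {
            names.add(app.name);
        }
        return names;
    }

    public static void main(String[] args) {
        List<App> apps = new ArrayList<>();
        apps.add(new App("noAccessNonLabeled", false, false));
        apps.add(new App("accessNonLabeled", true, false));
        apps.add(new App("accessLabeled", true, true));
        System.out.println(order(apps));
    }
}
```

Because the sort is stable, applications with the same rank keep their incoming order, so any existing ordering within a tier (for example, by submission time) is preserved.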
[jira] [Created] (YARN-3461) Consolidate flow name/version/run defaults
Zhijie Shen created YARN-3461: - Summary: Consolidate flow name/version/run defaults Key: YARN-3461 URL: https://issues.apache.org/jira/browse/YARN-3461 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen In YARN-3391, it's not resolved what should be the defaults for flow name/version/run. Let's continue the discussion here and unblock YARN-3391 from moving forward. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3459) TestLog4jWarningErrorMetricsAppender breaks in trunk
[ https://issues.apache.org/jira/browse/YARN-3459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14483779#comment-14483779 ] Li Lu commented on YARN-3459: - Reproduced this failure on my local machine as well as in the Jenkins run for YARN-3426. It seems the test failure was introduced by YARN-2901. [~wangda] [~vvasudev], can either of you take a look at it? Thanks! > TestLog4jWarningErrorMetricsAppender breaks in trunk > > > Key: YARN-3459 > URL: https://issues.apache.org/jira/browse/YARN-3459 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Li Lu >Assignee: Li Lu >Priority: Blocker > Fix For: 2.7.0 > > > TestLog4jWarningErrorMetricsAppender fails with the following message: > {code} > Running org.apache.hadoop.yarn.util.TestLog4jWarningErrorMetricsAppender > Tests run: 6, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 6.214 sec <<< > FAILURE! - in org.apache.hadoop.yarn.util.TestLog4jWarningErrorMetricsAppender > testPurge(org.apache.hadoop.yarn.util.TestLog4jWarningErrorMetricsAppender) > Time elapsed: 2.01 sec <<< FAILURE! > java.lang.AssertionError: expected:<0> but was:<1> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.yarn.util.TestLog4jWarningErrorMetricsAppender.testPurge(TestLog4jWarningErrorMetricsAppender.java:89) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3391) Clearly define flow ID/ flow run / flow version in API and storage
[ https://issues.apache.org/jira/browse/YARN-3391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-3391: -- Attachment: YARN-3391.3.patch > Clearly define flow ID/ flow run / flow version in API and storage > -- > > Key: YARN-3391 > URL: https://issues.apache.org/jira/browse/YARN-3391 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Zhijie Shen > Attachments: YARN-3391.1.patch, YARN-3391.2.patch, YARN-3391.3.patch > > > To continue the discussion in YARN-3040, let's figure out the best way to > describe the flow. > Some key issues that we need to conclude on: > - How do we include the flow version in the context so that it gets passed > into the collector and to the storage eventually? > - Flow run id should be a number as opposed to a generic string? > - Default behavior for the flow run id if it is missing (i.e. client did not > set it) > - How do we handle flow attributes in case of nested levels of flows? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3426) Add jdiff support to YARN
[ https://issues.apache.org/jira/browse/YARN-3426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-3426: Attachment: YARN-3426-040715.patch Added license information to the four .xml API files. > Add jdiff support to YARN > - > > Key: YARN-3426 > URL: https://issues.apache.org/jira/browse/YARN-3426 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Li Lu >Assignee: Li Lu >Priority: Blocker > Attachments: YARN-3426-040615-1.patch, YARN-3426-040615.patch, > YARN-3426-040715.patch > > > Maybe we'd like to extend our current jdiff tool for hadoop-common and hdfs > to YARN as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3273) Improve web UI to facilitate scheduling analysis and debugging
[ https://issues.apache.org/jira/browse/YARN-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14483714#comment-14483714 ] Hudson commented on YARN-3273: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2106 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2106/]) Move YARN-3273 from 2.8 to 2.7. (zjshen: rev 3fb5abfc87953377f86e06578518801a181d7697) * hadoop-yarn-project/CHANGES.txt > Improve web UI to facilitate scheduling analysis and debugging > -- > > Key: YARN-3273 > URL: https://issues.apache.org/jira/browse/YARN-3273 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jian He >Assignee: Rohith > Fix For: 2.7.0 > > Attachments: 0001-YARN-3273-v1.patch, 0001-YARN-3273-v2.patch, > 0002-YARN-3273.patch, 0003-YARN-3273.patch, 0003-YARN-3273.patch, > 0004-YARN-3273.patch, YARN-3273-am-resource-used-AND-User-limit-v2.PNG, > YARN-3273-am-resource-used-AND-User-limit.PNG, > YARN-3273-application-headroom-v2.PNG, YARN-3273-application-headroom.PNG > > > Job may be stuck for reasons such as: > - hitting queue capacity > - hitting user-limit, > - hitting AM-resource-percentage > The first queueCapacity is already shown on the UI. > We may surface things like: > - what is user's current usage and user-limit; > - what is the AM resource usage and limit; > - what is the application's current HeadRoom; > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2429) LCE should blacklist based upon group
[ https://issues.apache.org/jira/browse/YARN-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14483716#comment-14483716 ] Hudson commented on YARN-2429: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2106 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2106/]) YARN-2429. TestAMRMTokens.testTokenExpiry fails Intermittently with error message:Invalid AMRMToken (zxu via rkanter) (rkanter: rev 99b08a748e7b00a58b63330b353902a6da6aae27) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestAMRMTokens.java * hadoop-yarn-project/CHANGES.txt > LCE should blacklist based upon group > - > > Key: YARN-2429 > URL: https://issues.apache.org/jira/browse/YARN-2429 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Allen Wittenauer > > It should be possible to list a group to ban, not just individual users. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3458) CPU resource monitoring in Windows
[ https://issues.apache.org/jira/browse/YARN-3458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Inigo Goiri updated YARN-3458: -- Attachment: YARN-3458-3.patch Git and I are going through a rough patch; let's see if it works now... > CPU resource monitoring in Windows > -- > > Key: YARN-3458 > URL: https://issues.apache.org/jira/browse/YARN-3458 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager >Affects Versions: 2.7.0 > Environment: Windows >Reporter: Inigo Goiri >Priority: Minor > Labels: containers, metrics, windows > Attachments: YARN-3458-1.patch, YARN-3458-2.patch, YARN-3458-3.patch > > Original Estimate: 168h > Remaining Estimate: 168h > > The current implementation of getCpuUsagePercent() for > WindowsBasedProcessTree is left as unavailable. Attached a proposal of how to > do it. I reused the CpuTimeTracker using 1 jiffy=1ms. > This was left open by YARN-3122. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3439) RM fails to renew token when Oozie launcher leaves before sub-job finishes
[ https://issues.apache.org/jira/browse/YARN-3439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14483698#comment-14483698 ] Jason Lowe commented on YARN-3439: -- I believe it is setting that to false, as that behavior hasn't changed on the Oozie side. However this isn't an issue of the token being cancelled but rather expiring. The RM properly avoids cancelling the token when the launcher job exits, but it then forgets to keep renewing it as well. Eventually the token expires and downstream jobs fail (if they run long enough). > RM fails to renew token when Oozie launcher leaves before sub-job finishes > -- > > Key: YARN-3439 > URL: https://issues.apache.org/jira/browse/YARN-3439 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.0 >Reporter: Jason Lowe >Assignee: Daryn Sharp >Priority: Blocker > Attachments: YARN-3439.001.patch > > > When the Oozie launcher runs a standard MapReduce job (not Pig) it doesn't > linger waiting for the sub-job to finish. At that point the RM stops > renewing delegation tokens for the launcher job which wreaks havoc on the > sub-job if the sub-job runs long enough for the tokens to expire. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3361) CapacityScheduler side changes to support non-exclusive node labels
[ https://issues.apache.org/jira/browse/YARN-3361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14483693#comment-14483693 ] Jian He commented on YARN-3361: --- Some comments on my side - should treat each limit differently for different labeled requests? {code} // Otherwise, if any of the label of this node beyond queue limit, we // cannot allocate on this node. Consider a small epsilon here. {code} - Merge queue#needResource and application#needResource - needResource -> hasPendingResourceRequest; needResource can also be simplified if pass in partionToAllocate - Some methods like canAssignToThisQueue where both nodeLabels and exclusiveType are passed, it may be simplified by passing the current partitionToAllocate to simplify the internal if/else check. - The following may be incorrect, as the current request may be not the AM container request, though null == rmAppAttempt.getMasterContainer() {code} // AM container allocation doesn't support non-exclusive allocation to // avoid painful of preempt an AM container if {code} - below if/else can be avoided if passing the nodePartition into queueCapacities.getAbsoluteCapacity(nodePartition), {code} if (!nodePartition.equals(RMNodeLabelsManager.NO_LABEL)) { queueCapacity = Resources .max(resourceCalculator, clusterResource, queueCapacity, Resources.multiplyAndNormalizeUp( resourceCalculator, labelManager.getResourceByLabel(nodePartition, clusterResource), queueCapacities.getAbsoluteCapacity(nodePartition), minimumAllocation)); } else { // else there's no label on request, just to use absolute capacity as // capacity for nodes without label queueCapacity = Resources.multiplyAndNormalizeUp(resourceCalculator, labelManager .getResourceByLabel(CommonNodeLabelsManager.NO_LABEL, clusterResource), queueCapacities.getAbsoluteCapacity(), minimumAllocation); } {code} - the second limit won’t be hit? 
{code} if (exclusiveType == ExclusiveType.EXCLUSIVE) { maxUserLimit = Resources.multiplyAndRoundDown(queueCapacity, userLimitFactor); } else if (exclusiveType == ExclusiveType.NON_EXECLUSIVE) { maxUserLimit = labelManager.getResourceByLabel(nodePartition, clusterResource); } {code} - nonExclusiveSchedulingOpportunities#setCount -> add(Priority) > CapacityScheduler side changes to support non-exclusive node labels > --- > > Key: YARN-3361 > URL: https://issues.apache.org/jira/browse/YARN-3361 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-3361.1.patch, YARN-3361.2.patch > > > According to design doc attached in YARN-3214, we need implement following > logic in CapacityScheduler: > 1) When allocate a resource request with no node-label specified, it should > get preferentially allocated to node without labels. > 2) When there're some available resource in a node with label, they can be > used by applications with following order: > - Applications under queues which can access the label and ask for same > labeled resource. > - Applications under queues which can access the label and ask for > non-labeled resource. > - Applications under queues cannot access the label and ask for non-labeled > resource. > 3) Expose necessary information that can be used by preemption policy to make > preemption decisions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3458) CPU resource monitoring in Windows
[ https://issues.apache.org/jira/browse/YARN-3458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14483692#comment-14483692 ] Hadoop QA commented on YARN-3458: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12723677/YARN-3458-2.patch against trunk revision d27e924. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7244//console This message is automatically generated. > CPU resource monitoring in Windows > -- > > Key: YARN-3458 > URL: https://issues.apache.org/jira/browse/YARN-3458 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager >Affects Versions: 2.7.0 > Environment: Windows >Reporter: Inigo Goiri >Priority: Minor > Labels: containers, metrics, windows > Attachments: YARN-3458-1.patch, YARN-3458-2.patch > > Original Estimate: 168h > Remaining Estimate: 168h > > The current implementation of getCpuUsagePercent() for > WindowsBasedProcessTree is left as unavailable. Attached a proposal of how to > do it. I reused the CpuTimeTracker using 1 jiffy=1ms. > This was left open by YARN-3122. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Moved] (YARN-3460) Test TestSecureRMRegistryOperations failed with IBM_JAVA JVM
[ https://issues.apache.org/jira/browse/YARN-3460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran moved HADOOP-11810 to YARN-3460: --- Fix Version/s: (was: 3.0.0) Target Version/s: 2.8.0 (was: 2.6.0) Affects Version/s: (was: 2.6.0) (was: 3.0.0) 3.0.0 2.6.0 Key: YARN-3460 (was: HADOOP-11810) Project: Hadoop YARN (was: Hadoop Common) > Test TestSecureRMRegistryOperations failed with IBM_JAVA JVM > > > Key: YARN-3460 > URL: https://issues.apache.org/jira/browse/YARN-3460 > Project: Hadoop YARN > Issue Type: Test >Affects Versions: 2.6.0, 3.0.0 > Environment: $ mvn -version > Apache Maven 3.2.1 (ea8b2b07643dbb1b84b6d16e1f08391b666bc1e9; > 2014-02-14T11:37:52-06:00) > Maven home: /opt/apache-maven-3.2.1 > Java version: 1.7.0, vendor: IBM Corporation > Java home: /usr/lib/jvm/ibm-java-ppc64le-71/jre > Default locale: en_US, platform encoding: UTF-8 > OS name: "linux", version: "3.10.0-229.ael7b.ppc64le", arch: "ppc64le", > family: "unix" >Reporter: pascal oliva > Attachments: HADOOP-11810-1.patch > > > TestSecureRMRegistryOperations failed with IBM JAVA > mvn test -X > -Dtest=org.apache.hadoop.registry.secure.TestSecureRMRegistryOperations > Module Total Failure Error Skipped > - > hadoop-yarn-registry 12 0 12 0 > - > Total 12 0 12 0 > With > javax.security.auth.login.LoginException: Bad JAAS configuration: > unrecognized option: isInitiator > and > Bad JAAS configuration: unrecognized option: storeKey -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3439) RM fails to renew token when Oozie launcher leaves before sub-job finishes
[ https://issues.apache.org/jira/browse/YARN-3439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14483685#comment-14483685 ] Jian He commented on YARN-3439: --- IIUC, isn't this a long-standing issue that Oozie doesn't set "mapreduce.job.complete.cancel.delegation.tokens" to false for standard MR jobs, according to [here | https://issues.apache.org/jira/browse/YARN-2964?focusedCommentId=14250926&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14250926]? Should we set it to false on the Oozie side? > RM fails to renew token when Oozie launcher leaves before sub-job finishes > -- > > Key: YARN-3439 > URL: https://issues.apache.org/jira/browse/YARN-3439 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.0 >Reporter: Jason Lowe >Assignee: Daryn Sharp >Priority: Blocker > Attachments: YARN-3439.001.patch > > > When the Oozie launcher runs a standard MapReduce job (not Pig) it doesn't > linger waiting for the sub-job to finish. At that point the RM stops > renewing delegation tokens for the launcher job which wreaks havoc on the > sub-job if the sub-job runs long enough for the tokens to expire. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3459) TestLog4jWarningErrorMetricsAppender breaks in trunk
Li Lu created YARN-3459: --- Summary: TestLog4jWarningErrorMetricsAppender breaks in trunk Key: YARN-3459 URL: https://issues.apache.org/jira/browse/YARN-3459 Project: Hadoop YARN Issue Type: Bug Reporter: Li Lu Assignee: Li Lu Priority: Blocker Fix For: 2.7.0 TestLog4jWarningErrorMetricsAppender fails with the following message: {code} Running org.apache.hadoop.yarn.util.TestLog4jWarningErrorMetricsAppender Tests run: 6, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 6.214 sec <<< FAILURE! - in org.apache.hadoop.yarn.util.TestLog4jWarningErrorMetricsAppender testPurge(org.apache.hadoop.yarn.util.TestLog4jWarningErrorMetricsAppender) Time elapsed: 2.01 sec <<< FAILURE! java.lang.AssertionError: expected:<0> but was:<1> at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.junit.Assert.assertEquals(Assert.java:542) at org.apache.hadoop.yarn.util.TestLog4jWarningErrorMetricsAppender.testPurge(TestLog4jWarningErrorMetricsAppender.java:89) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3426) Add jdiff support to YARN
[ https://issues.apache.org/jira/browse/YARN-3426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14483681#comment-14483681 ] Li Lu commented on YARN-3426: - The failed unit test also breaks in trunk. Will file a blocker on this. > Add jdiff support to YARN > - > > Key: YARN-3426 > URL: https://issues.apache.org/jira/browse/YARN-3426 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Li Lu >Assignee: Li Lu >Priority: Blocker > Attachments: YARN-3426-040615-1.patch, YARN-3426-040615.patch > > > Maybe we'd like to extend our current jdiff tool for hadoop-common and hdfs > to YARN as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1376) NM need to notify the log aggregation status to RM through Node heartbeat
[ https://issues.apache.org/jira/browse/YARN-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14483660#comment-14483660 ] Hadoop QA commented on YARN-1376: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12723661/YARN-1376.2015-04-07.patch against trunk revision 0b5d7d2. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7241//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7241//console This message is automatically generated. 
> NM need to notify the log aggregation status to RM through Node heartbeat > - > > Key: YARN-1376 > URL: https://issues.apache.org/jira/browse/YARN-1376 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: Screen Shot 2015-04-07 at 9.30.42 AM.png, > YARN-1376.1.patch, YARN-1376.2.patch, YARN-1376.2.patch, > YARN-1376.2015-04-04.patch, YARN-1376.2015-04-06.patch, > YARN-1376.2015-04-07.patch, YARN-1376.3.patch, YARN-1376.4.patch > > > Expose a client API to allow clients to figure if log aggregation is > complete. The ticket is used to track the changes on NM side -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3458) CPU resource monitoring in Windows
[ https://issues.apache.org/jira/browse/YARN-3458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Inigo Goiri updated YARN-3458: -- Attachment: YARN-3458-2.patch Patch based on trunk. Let's see if Jenkins likes it. > CPU resource monitoring in Windows > -- > > Key: YARN-3458 > URL: https://issues.apache.org/jira/browse/YARN-3458 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager >Affects Versions: 2.7.0 > Environment: Windows >Reporter: Inigo Goiri >Priority: Minor > Labels: containers, metrics, windows > Attachments: YARN-3458-1.patch, YARN-3458-2.patch > > Original Estimate: 168h > Remaining Estimate: 168h > > The current implementation of getCpuUsagePercent() for > WindowsBasedProcessTree is left as unavailable. Attached a proposal of how to > do it. I reused the CpuTimeTracker using 1 jiffy=1ms. > This was left open by YARN-3122. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2429) LCE should blacklist based upon group
[ https://issues.apache.org/jira/browse/YARN-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14483666#comment-14483666 ] Hudson commented on YARN-2429: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #157 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/157/]) YARN-2429. TestAMRMTokens.testTokenExpiry fails Intermittently with error message:Invalid AMRMToken (zxu via rkanter) (rkanter: rev 99b08a748e7b00a58b63330b353902a6da6aae27) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestAMRMTokens.java > LCE should blacklist based upon group > - > > Key: YARN-2429 > URL: https://issues.apache.org/jira/browse/YARN-2429 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Allen Wittenauer > > It should be possible to list a group to ban, not just individual users. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3293) Track and display capacity scheduler health metrics in web UI
[ https://issues.apache.org/jira/browse/YARN-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14483668#comment-14483668 ] Hadoop QA commented on YARN-3293: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12723665/apache-yarn-3293.5.patch against trunk revision 0b5d7d2. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 6 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7242//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/7242//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7242//console This message is automatically generated. 
> Track and display capacity scheduler health metrics in web UI > - > > Key: YARN-3293 > URL: https://issues.apache.org/jira/browse/YARN-3293 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacityscheduler >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: Screen Shot 2015-03-30 at 4.30.14 PM.png, > apache-yarn-3293.0.patch, apache-yarn-3293.1.patch, apache-yarn-3293.2.patch, > apache-yarn-3293.4.patch, apache-yarn-3293.5.patch > > > It would be good to display metrics that let users know about the health of > the capacity scheduler in the web UI. Today it is hard to get an idea if the > capacity scheduler is functioning correctly. Metrics such as the time for the > last allocation, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3273) Improve web UI to facilitate scheduling analysis and debugging
[ https://issues.apache.org/jira/browse/YARN-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14483664#comment-14483664 ] Hudson commented on YARN-3273: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #157 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/157/]) Move YARN-3273 from 2.8 to 2.7. (zjshen: rev 3fb5abfc87953377f86e06578518801a181d7697) * hadoop-yarn-project/CHANGES.txt > Improve web UI to facilitate scheduling analysis and debugging > -- > > Key: YARN-3273 > URL: https://issues.apache.org/jira/browse/YARN-3273 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jian He >Assignee: Rohith > Fix For: 2.7.0 > > Attachments: 0001-YARN-3273-v1.patch, 0001-YARN-3273-v2.patch, > 0002-YARN-3273.patch, 0003-YARN-3273.patch, 0003-YARN-3273.patch, > 0004-YARN-3273.patch, YARN-3273-am-resource-used-AND-User-limit-v2.PNG, > YARN-3273-am-resource-used-AND-User-limit.PNG, > YARN-3273-application-headroom-v2.PNG, YARN-3273-application-headroom.PNG > > > Job may be stuck for reasons such as: > - hitting queue capacity > - hitting user-limit, > - hitting AM-resource-percentage > The first queueCapacity is already shown on the UI. > We may surface things like: > - what is user's current usage and user-limit; > - what is the AM resource usage and limit; > - what is the application's current HeadRoom; > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3458) CPU resource monitoring in Windows
[ https://issues.apache.org/jira/browse/YARN-3458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Inigo Goiri updated YARN-3458: -- Labels: containers metrics windows (was: ) > CPU resource monitoring in Windows > -- > > Key: YARN-3458 > URL: https://issues.apache.org/jira/browse/YARN-3458 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager >Affects Versions: 2.7.0 > Environment: Windows >Reporter: Inigo Goiri >Priority: Minor > Labels: containers, metrics, windows > Attachments: YARN-3458-1.patch > > Original Estimate: 168h > Remaining Estimate: 168h > > The current implementation of getCpuUsagePercent() for > WindowsBasedProcessTree is left as unavailable. Attached a proposal of how to > do it. I reused the CpuTimeTracker using 1 jiffy=1ms. > This was left open by YARN-3122. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3458) CPU resource monitoring in Windows
[ https://issues.apache.org/jira/browse/YARN-3458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14483631#comment-14483631 ] Hadoop QA commented on YARN-3458: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12723671/YARN-3458-1.patch against trunk revision d27e924. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7243//console This message is automatically generated. > CPU resource monitoring in Windows > -- > > Key: YARN-3458 > URL: https://issues.apache.org/jira/browse/YARN-3458 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager >Affects Versions: 2.7.0 > Environment: Windows >Reporter: Inigo Goiri >Priority: Minor > Labels: containers, metrics, windows > Attachments: YARN-3458-1.patch > > Original Estimate: 168h > Remaining Estimate: 168h > > The current implementation of getCpuUsagePercent() for > WindowsBasedProcessTree is left as unavailable. Attached a proposal of how to > do it. I reused the CpuTimeTracker using 1 jiffy=1ms. > This was left open by YARN-3122. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3458) CPU resource monitoring in Windows
[ https://issues.apache.org/jira/browse/YARN-3458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14483627#comment-14483627 ] Inigo Goiri commented on YARN-3458: --- Not sure if the patch has been created properly, as I'm in between a couple of versions. I will create one based on trunk if this doesn't work. > CPU resource monitoring in Windows > -- > > Key: YARN-3458 > URL: https://issues.apache.org/jira/browse/YARN-3458 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager >Affects Versions: 2.7.0 > Environment: Windows >Reporter: Inigo Goiri >Priority: Minor > Attachments: YARN-3458-1.patch > > Original Estimate: 168h > Remaining Estimate: 168h > > The current implementation of getCpuUsagePercent() for > WindowsBasedProcessTree is left as unavailable. Attached a proposal of how to > do it. I reused the CpuTimeTracker using 1 jiffy=1ms. > This was left open by YARN-3122. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3458) CPU resource monitoring in Windows
[ https://issues.apache.org/jira/browse/YARN-3458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Inigo Goiri updated YARN-3458: -- Attachment: YARN-3458-1.patch > CPU resource monitoring in Windows > -- > > Key: YARN-3458 > URL: https://issues.apache.org/jira/browse/YARN-3458 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager >Affects Versions: 2.7.0 > Environment: Windows >Reporter: Inigo Goiri >Priority: Minor > Attachments: YARN-3458-1.patch > > Original Estimate: 168h > Remaining Estimate: 168h > > The current implementation of getCpuUsagePercent() for > WindowsBasedProcessTree is left as unavailable. Attached a proposal of how to > do it. I reused the CpuTimeTracker using 1 jiffy=1ms. > This was left open by YARN-3122. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3458) CPU resource monitoring in Windows
Inigo Goiri created YARN-3458: - Summary: CPU resource monitoring in Windows Key: YARN-3458 URL: https://issues.apache.org/jira/browse/YARN-3458 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager Affects Versions: 2.7.0 Environment: Windows Reporter: Inigo Goiri Priority: Minor The current implementation of getCpuUsagePercent() for WindowsBasedProcessTree is left as unavailable. Attached a proposal of how to do it. I reused the CpuTimeTracker using 1 jiffy=1ms. This was left open by YARN-3122. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
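Editorial sketch for the description above, which proposes deriving getCpuUsagePercent() from cumulative CPU time with 1 jiffy = 1 ms. The class below is hypothetical and is not Hadoop's CpuTimeTracker or the attached patch; it only illustrates, under the assumption that the process tree reports cumulative CPU time in milliseconds, how a percentage could be computed from two samples. The name CpuUsageSketch and the UNAVAILABLE sentinel are assumptions made for illustration.

```java
/**
 * Hypothetical helper (not Hadoop's CpuTimeTracker): derives a CPU usage
 * percentage from consecutive samples of cumulative CPU time, treating one
 * jiffy as one millisecond.
 */
final class CpuUsageSketch {
    /** Returned when there is not enough data to compute a percentage. */
    static final float UNAVAILABLE = -1.0f;

    private long lastCpuTimeMs = -1;    // cumulative CPU time at the previous sample
    private long lastSampleTimeMs = -1; // wall-clock time of the previous sample

    /**
     * Records a new sample and returns the CPU usage, in percent of one core,
     * since the previous sample. Values above 100 mean more than one core was busy.
     */
    float update(long cumulativeCpuTimeMs, long sampleTimeMs) {
        float percent = UNAVAILABLE;
        boolean hasPrevious = lastSampleTimeMs >= 0;
        if (hasPrevious
                && sampleTimeMs > lastSampleTimeMs
                && cumulativeCpuTimeMs >= lastCpuTimeMs) {
            long cpuDeltaMs = cumulativeCpuTimeMs - lastCpuTimeMs;
            long wallDeltaMs = sampleTimeMs - lastSampleTimeMs;
            percent = cpuDeltaMs * 100.0f / wallDeltaMs;
        }
        // Always remember the latest sample, even when no percentage was produced.
        lastCpuTimeMs = cumulativeCpuTimeMs;
        lastSampleTimeMs = sampleTimeMs;
        return percent;
    }
}
```

The first call can only return UNAVAILABLE because there is no earlier sample to diff against, which matches the idea of the value being "left as unavailable" until enough data exists.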
[jira] [Commented] (YARN-3366) Outbound network bandwidth : classify/shape traffic originating from YARN containers
[ https://issues.apache.org/jira/browse/YARN-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14483605#comment-14483605 ] Sidharta Seethana commented on YARN-3366: - Thanks for the review, [~vvasudev]. Responses inline: 1. I'll fix this. This is an artifact of differences between trunk/branch-2 (repeated) 1. I think these are useful log lines that specify a change in behavior due to settings/system state etc. I'll clarify/improve the log messages. 2. Good catch, I'll fix it. Tests ran fine because WARN logging was enabled. 3. I'll fix the comments' location. The exception used to exist before but was causing bootstrapping issues. I left it in there along with an explanation for why it shouldn't be thrown. I'll remove it and modify the comments. 4. IntelliJ warns me about this too, but I had left it in there for clarity/consistency with the earlier code block. I believe it makes the code a bit more readable. I would prefer to leave it in place. 5. I'll fix this. 6. I'll fix this. 7. Why? Compiler optimization? 8. I'll fix this. 9. I'll fix this. 10. I'll fix this. 11. I'll fix this, though I don't believe the merging always helps for error/warn metrics. 12. I'll fix this. 13. Not trivially; it would require refactoring launchContainer. > Outbound network bandwidth : classify/shape traffic originating from YARN > containers > > > Key: YARN-3366 > URL: https://issues.apache.org/jira/browse/YARN-3366 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Sidharta Seethana >Assignee: Sidharta Seethana > Attachments: YARN-3366.001.patch, YARN-3366.002.patch > > > In order to be able to isolate based on/enforce outbound traffic bandwidth > limits, we need a mechanism to classify/shape network traffic in the > nodemanager. For more information on the design, please see the attached > design document in the parent JIRA. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3046) [Event producers] Implement MapReduce AM writing some MR metrics to ATS
[ https://issues.apache.org/jira/browse/YARN-3046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14483588#comment-14483588 ] Junping Du commented on YARN-3046: -- Linked with MAPREDUCE-6189 - the test failure on trunk is solid, not only on my local test bed. > [Event producers] Implement MapReduce AM writing some MR metrics to ATS > --- > > Key: YARN-3046 > URL: https://issues.apache.org/jira/browse/YARN-3046 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Junping Du > Attachments: YARN-3046-no-test-v2.patch, YARN-3046-no-test.patch > > > Per design in YARN-2928, select a handful of MR metrics (e.g. HDFS bytes > written) and have the MR AM write the framework-specific metrics to ATS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3294) Allow dumping of Capacity Scheduler debug logs via web UI for a fixed time period
[ https://issues.apache.org/jira/browse/YARN-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14483518#comment-14483518 ] Hudson commented on YARN-3294: -- FAILURE: Integrated in Hadoop-trunk-Commit #7521 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7521/]) YARN-3294. Allow dumping of Capacity Scheduler debug logs via web UI for (xgong: rev d27e9241e8676a0edb2d35453cac5f9495fcd605) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/CapacitySchedulerPage.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestAdHocLogDumper.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/AdHocLogDumper.java > Allow dumping of Capacity Scheduler debug logs via web UI for a fixed time > period > - > > Key: YARN-3294 > URL: https://issues.apache.org/jira/browse/YARN-3294 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacityscheduler >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Fix For: 2.8.0 > > Attachments: Screen Shot 2015-03-12 at 8.51.25 PM.png, > apache-yarn-3294.0.patch, apache-yarn-3294.1.patch, apache-yarn-3294.2.patch, > apache-yarn-3294.3.patch, apache-yarn-3294.4.patch > > > It would be nice to have a button on the web UI that would allow dumping of > debug logs for just the capacity scheduler for a fixed period of time(1 min, > 5 min or so) in a separate log file. It would be useful when debugging > scheduler behavior without affecting the rest of the resourcemanager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3391) Clearly define flow ID/ flow run / flow version in API and storage
[ https://issues.apache.org/jira/browse/YARN-3391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14483498#comment-14483498 ] Junping Du commented on YARN-3391: -- Sorry for coming a little late. Thanks guys for the good discussion here and [~zjshen] for updating the patch! bq. I just wanted to add my 2 cents that this is something we already see and experience with hRaven so it's not theoretical. +1, [~sjlee0]! I think that's very important feedback for improving the user experience of the new feature here. Let's try to strike a good balance between addressing these concrete scenarios and providing flexibility for possible new scenarios, e.g. we can provide different flow group policies that users can use to group applications into a flow by name or keep them as isolated flows, etc. Anyway, as everyone has agreed so far, let's continue the discussion on a separate JIRA and figure it out later. The patch looks good overall. However, I still haven't seen the definitions of "flow", "flow run" and "flow version" in any Javadoc. As I mentioned earlier, they would be useful for developers. The official Apache feature doc is more user-oriented, and we can address it later when the feature is completed. > Clearly define flow ID/ flow run / flow version in API and storage > -- > > Key: YARN-3391 > URL: https://issues.apache.org/jira/browse/YARN-3391 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Zhijie Shen > Attachments: YARN-3391.1.patch, YARN-3391.2.patch > > > To continue the discussion in YARN-3040, let's figure out the best way to > describe the flow. > Some key issues that we need to conclude on: > - How do we include the flow version in the context so that it gets passed > into the collector and to the storage eventually? > - Flow run id should be a number as opposed to a generic string? > - Default behavior for the flow run id if it is missing (i.e. client did not > set it) > - How do we handle flow attributes in case of nested levels of flows? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3443) Create a 'ResourceHandler' subsystem to ease addition of support for new resource types on the NM
[ https://issues.apache.org/jira/browse/YARN-3443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14483489#comment-14483489 ] Varun Vasudev commented on YARN-3443: - +1, lgtm for the latest patch. > Create a 'ResourceHandler' subsystem to ease addition of support for new > resource types on the NM > - > > Key: YARN-3443 > URL: https://issues.apache.org/jira/browse/YARN-3443 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Sidharta Seethana >Assignee: Sidharta Seethana > Attachments: YARN-3443.001.patch, YARN-3443.002.patch, > YARN-3443.003.patch, YARN-3443.004.patch > > > The current cgroups implementation is closely tied to supporting CPU as a > resource . We need to separate out CGroups support as well a provide a simple > ResourceHandler subsystem that will enable us to add support for new resource > types on the NM - e.g Network, Disk etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3294) Allow dumping of Capacity Scheduler debug logs via web UI for a fixed time period
[ https://issues.apache.org/jira/browse/YARN-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14483490#comment-14483490 ] Xuan Gong commented on YARN-3294: - Committed into trunk/branch-2. Thanks, varun. > Allow dumping of Capacity Scheduler debug logs via web UI for a fixed time > period > - > > Key: YARN-3294 > URL: https://issues.apache.org/jira/browse/YARN-3294 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacityscheduler >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: Screen Shot 2015-03-12 at 8.51.25 PM.png, > apache-yarn-3294.0.patch, apache-yarn-3294.1.patch, apache-yarn-3294.2.patch, > apache-yarn-3294.3.patch, apache-yarn-3294.4.patch > > > It would be nice to have a button on the web UI that would allow dumping of > debug logs for just the capacity scheduler for a fixed period of time(1 min, > 5 min or so) in a separate log file. It would be useful when debugging > scheduler behavior without affecting the rest of the resourcemanager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3294) Allow dumping of Capacity Scheduler debug logs via web UI for a fixed time period
[ https://issues.apache.org/jira/browse/YARN-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14483486#comment-14483486 ] Xuan Gong commented on YARN-3294: - +1 lgtm. Will commit > Allow dumping of Capacity Scheduler debug logs via web UI for a fixed time > period > - > > Key: YARN-3294 > URL: https://issues.apache.org/jira/browse/YARN-3294 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacityscheduler >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: Screen Shot 2015-03-12 at 8.51.25 PM.png, > apache-yarn-3294.0.patch, apache-yarn-3294.1.patch, apache-yarn-3294.2.patch, > apache-yarn-3294.3.patch, apache-yarn-3294.4.patch > > > It would be nice to have a button on the web UI that would allow dumping of > debug logs for just the capacity scheduler for a fixed period of time(1 min, > 5 min or so) in a separate log file. It would be useful when debugging > scheduler behavior without affecting the rest of the resourcemanager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3293) Track and display capacity scheduler health metrics in web UI
[ https://issues.apache.org/jira/browse/YARN-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-3293: Attachment: apache-yarn-3293.5.patch The findbug warnings are incorrect - the fields are used by JAXB. Updated patch to exclude them. The failing test is unrelated. > Track and display capacity scheduler health metrics in web UI > - > > Key: YARN-3293 > URL: https://issues.apache.org/jira/browse/YARN-3293 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacityscheduler >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: Screen Shot 2015-03-30 at 4.30.14 PM.png, > apache-yarn-3293.0.patch, apache-yarn-3293.1.patch, apache-yarn-3293.2.patch, > apache-yarn-3293.4.patch, apache-yarn-3293.5.patch > > > It would be good to display metrics that let users know about the health of > the capacity scheduler in the web UI. Today it is hard to get an idea if the > capacity scheduler is functioning correctly. Metrics such as the time for the > last allocation, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1376) NM need to notify the log aggregation status to RM through Node heartbeat
[ https://issues.apache.org/jira/browse/YARN-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1376: Attachment: YARN-1376.2015-04-07.patch Address all the latest comments. > NM need to notify the log aggregation status to RM through Node heartbeat > - > > Key: YARN-1376 > URL: https://issues.apache.org/jira/browse/YARN-1376 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: Screen Shot 2015-04-07 at 9.30.42 AM.png, > YARN-1376.1.patch, YARN-1376.2.patch, YARN-1376.2.patch, > YARN-1376.2015-04-04.patch, YARN-1376.2015-04-06.patch, > YARN-1376.2015-04-07.patch, YARN-1376.3.patch, YARN-1376.4.patch > > > Expose a client API to allow clients to figure if log aggregation is > complete. The ticket is used to track the changes on NM side -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1376) NM need to notify the log aggregation status to RM through Node heartbeat
[ https://issues.apache.org/jira/browse/YARN-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1376: Attachment: Screen Shot 2015-04-07 at 9.30.42 AM.png > NM need to notify the log aggregation status to RM through Node heartbeat > - > > Key: YARN-1376 > URL: https://issues.apache.org/jira/browse/YARN-1376 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: Screen Shot 2015-04-07 at 9.30.42 AM.png, > YARN-1376.1.patch, YARN-1376.2.patch, YARN-1376.2.patch, > YARN-1376.2015-04-04.patch, YARN-1376.2015-04-06.patch, > YARN-1376.2015-04-07.patch, YARN-1376.3.patch, YARN-1376.4.patch > > > Expose a client API to allow clients to figure if log aggregation is > complete. The ticket is used to track the changes on NM side -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3293) Track and display capacity scheduler health metrics in web UI
[ https://issues.apache.org/jira/browse/YARN-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14483445#comment-14483445 ] Hadoop QA commented on YARN-3293: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12723649/apache-yarn-3293.4.patch against trunk revision 75c5454. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 6 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.security.TestRMDelegationTokens Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7239//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/7239//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7239//console This message is automatically generated. 
> Track and display capacity scheduler health metrics in web UI > - > > Key: YARN-3293 > URL: https://issues.apache.org/jira/browse/YARN-3293 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacityscheduler >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: Screen Shot 2015-03-30 at 4.30.14 PM.png, > apache-yarn-3293.0.patch, apache-yarn-3293.1.patch, apache-yarn-3293.2.patch, > apache-yarn-3293.4.patch > > > It would be good to display metrics that let users know about the health of > the capacity scheduler in the web UI. Today it is hard to get an idea if the > capacity scheduler is functioning correctly. Metrics such as the time for the > last allocation, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3431) Sub resources of timeline entity needs to be passed to a separate endpoint.
[ https://issues.apache.org/jira/browse/YARN-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14483432#comment-14483432 ] Junping Du commented on YARN-3431: -- Thanks [~zjshen] for the patch and [~gtCarrera9] for the review and comments. bq. However, I'm a little bit confused about the big picture of this patch. I put some context and background in the JIRA description. Hope it helps. {code} -putObjects("entities", params, entitiesContainer); +for (org.apache.hadoop.yarn.api.records.timelineservice.TimelineEntity entity : entities) { + String path = "entities"; + try { +path += "/" + TimelineEntityType.valueOf(entity.getType()).toString(); + } catch (IllegalArgumentException e) { +// Do nothing, generic entity type + } + putObjects(path, params, entity); +} {code} It looks like we are breaking one put operation into pieces, which doesn't make sense from a performance perspective. Do we have to do this? BTW, we should handle the IllegalArgumentException instead of ignoring it, shouldn't we? > Sub resources of timeline entity needs to be passed to a separate endpoint. > --- > > Key: YARN-3431 > URL: https://issues.apache.org/jira/browse/YARN-3431 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Zhijie Shen > Attachments: YARN-3431.1.patch, YARN-3431.2.patch > > > We have TimelineEntity and some other entities as subclass that inherit from > it. However, we only have a single endpoint, which consume TimelineEntity > rather than sub-classes and this endpoint will check the incoming request > body contains exactly TimelineEntity object. 
However, the json data which is > serialized from sub-class object seems not to be treated as an TimelineEntity > object, and won't be deserialized into the corresponding sub-class object > which cause deserialization failure as some discussions in YARN-3334 : > https://issues.apache.org/jira/browse/YARN-3334?focusedCommentId=14391059&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14391059. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3348) Add a 'yarn top' tool to help understand cluster usage
[ https://issues.apache.org/jira/browse/YARN-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14483429#comment-14483429 ]

Allen Wittenauer commented on YARN-3348:

{code}
+doNotSetCols=0
+doNotSetRows=0
+for i in "$@"; do
+  if [[ $i == "-cols" ]]; then
+    doNotSetCols=1
+  fi
+  if [[ $i == "-rows" ]]; then
+    doNotSetRows=1
+  fi
+done
+if [[ $doNotSetCols == 0 ]]; then
+  cols=`tput cols`
+  args=( $@ )
+  args=("${args[@]}" "-cols" "$cols")
+  set -- "${args[@]}"
+fi
+if [[ $doNotSetRows == 0 ]]; then
+  rows=`tput lines`
+  args=( $@ )
+  args=("${args[@]}" "-rows" "$rows")
+  set -- "${args[@]}"
+fi
{code}

* Why are we doing this manipulation here and not in the Java code?
* Backticks are antiquated in modern bash. Use the {{$()}} construction instead.
* What happens if tput gives you zero or an error because you are on a non-addressable terminal? (You can generally simulate this by unsetting TERM or an equivalent env var.)

> Add a 'yarn top' tool to help understand cluster usage
> --
>
> Key: YARN-3348
> URL: https://issues.apache.org/jira/browse/YARN-3348
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: resourcemanager
> Reporter: Varun Vasudev
> Assignee: Varun Vasudev
> Attachments: apache-yarn-3348.0.patch, apache-yarn-3348.1.patch
>
>
> It would be helpful to have a 'yarn top' tool that would allow administrators
> to understand which apps are consuming resources.
> Ideally the tool would allow you to filter by queue, user, maybe labels, etc.,
> and show you statistics on container allocation across the cluster to find
> out which apps are consuming the most resources on the cluster.
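The two shell points in the YARN-3348 review above (prefer $() over backticks, and cope with tput failing or reporting zero) could be sketched roughly as follows. This is only an illustration, not the patch under review: the helper names term_dim, has_flag and with_size_args, the SIZED_ARGS array, and the 80x24 fallback are all assumptions made for the example. It also copies the arguments with a quoted "$@", since an unquoted $@ is word-split and would break arguments containing spaces.

```shell
# Bash sketch. All names below are hypothetical, not from the YARN-3348 patch.

# Print the terminal size for a tput capability ("cols" or "lines").
# Falls back to $2 when tput fails (e.g. TERM unset) or does not report
# a positive integer.
term_dim() {
  local cap=$1 fallback=$2 value
  value=$(tput "$cap" 2>/dev/null) || value=""
  if [[ $value =~ ^[1-9][0-9]*$ ]]; then
    echo "$value"
  else
    echo "$fallback"
  fi
}

# Succeed if the flag in $1 appears among the remaining arguments.
has_flag() {
  local flag=$1 arg
  shift
  for arg in "$@"; do
    [[ $arg == "$flag" ]] && return 0
  done
  return 1
}

# Copy the arguments into SIZED_ARGS, appending -cols/-rows only when the
# caller did not pass them. The 80x24 fallback is an assumed default.
with_size_args() {
  SIZED_ARGS=("$@")
  has_flag -cols "$@" || SIZED_ARGS+=(-cols "$(term_dim cols 80)")
  has_flag -rows "$@" || SIZED_ARGS+=(-rows "$(term_dim lines 24)")
}
```

A launcher script could call with_size_args "$@" and then set -- "${SIZED_ARGS[@]}" before invoking the Java class. Whether this defaulting belongs in the shell wrapper at all, rather than in the Java code, is the first question raised in the review and is not settled by this sketch.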