date:20150407


[ 
https://issues.apache.org/jira/browse/YARN-3391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484815#comment-14484815
 ] 

Zhijie Shen commented on YARN-3391:
---

I created a new patch:

bq.  So in general, I think we should use as much javadoc comments instead of 
inline comments for public APIs.

Move the comments into TimelineUtils and make them javadoc.

bq. We should add more info to LOG.warn messages, at least to tell user flow 
run should be numeric.

Improve the warn message

bq. In addition, do we need to check negative value for flow run here?

According to the example Sangjin gave, we usually want to identify a flow run by 
timestamp, which can theoretically be negative to represent some time before 
1970.
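
As a small illustration of why a negative flow run value can still be meaningful, the sketch below parses a flow run string as a long and reads it as milliseconds since the epoch. Treating the flow run as epoch milliseconds is an assumption based on the timestamp example mentioned above, and the class and method names are made up for illustration; they are not part of the patch.

```java
import java.time.Instant;

// Hypothetical helper, not part of the patch: shows that long parsing
// accepts negative values and that they map to instants before 1970.
final class FlowRunTimestampDemo {
  // Parses a flow run value; throws NumberFormatException if not numeric.
  static long parseFlowRun(String value) {
    return Long.parseLong(value);
  }

  // Assumption: the flow run id is a timestamp in epoch milliseconds.
  static Instant toInstant(long flowRun) {
    return Instant.ofEpochMilli(flowRun);
  }

  public static void main(String[] args) {
    long run = parseFlowRun("-86400000"); // one day before the epoch
    System.out.println(toInstant(run));
  }
}
```

A check that rejected negative values would therefore also reject legitimate timestamps for dates before 1970, which is the trade-off being discussed.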

> Clearly define flow ID/ flow run / flow version in API and storage
> --
>
> Key: YARN-3391
> URL: https://issues.apache.org/jira/browse/YARN-3391
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
> Attachments: YARN-3391.1.patch, YARN-3391.2.patch, YARN-3391.3.patch, 
> YARN-3391.4.patch
>
>
> To continue the discussion in YARN-3040, let's figure out the best way to 
> describe the flow.
> Some key issues that we need to conclude on:
> - How do we include the flow version in the context so that it gets passed 
> into the collector and to the storage eventually?
> - Flow run id should be a number as opposed to a generic string?
> - Default behavior for the flow run id if it is missing (i.e. client did not 
> set it)
> - How do we handle flow attributes in case of nested levels of flows?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3457) NPE when NodeManager.serviceInit fails and stopRecoveryStore called


[ 
https://issues.apache.org/jira/browse/YARN-3457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484813#comment-14484813
 ] 

Hadoop QA commented on YARN-3457:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12723815/YARN-3457.001.patch
  against trunk revision ab04ff9.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7252//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7252//console

This message is automatically generated.

> NPE when NodeManager.serviceInit fails and stopRecoveryStore called
> ---
>
> Key: YARN-3457
> URL: https://issues.apache.org/jira/browse/YARN-3457
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Minor
> Attachments: YARN-3457.001.patch
>
>
> When NodeManager service init fails, a null pointer exception is thrown 
> during stopRecoveryStore
> {code}
>   @Override
>   protected void serviceInit(Configuration conf) throws Exception {
>     ..
>     try {
>       exec.init();
>     } catch (IOException e) {
>       throw new YarnRuntimeException(
>           "Failed to initialize container executor", e);
>     }
>     this.context = createNMContext(containerTokenSecretManager,
>         nmTokenSecretManager, nmStore);
> 
> {code}
> context is null when service init fails
> {code}
>   private void stopRecoveryStore() throws IOException {
>     nmStore.stop();
>     if (context.getDecommissioned() && nmStore.canRecover()) {
>       ..
>     }
>   }
> {code}
> Null pointer exception thrown
> {quote}
> 2015-04-07 17:31:45,807 WARN org.apache.hadoop.service.AbstractService: When stopping the service NodeManager : java.lang.NullPointerException
> java.lang.NullPointerException
>   at org.apache.hadoop.yarn.server.nodemanager.NodeManager.stopRecoveryStore(NodeManager.java:168)
>   at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStop(NodeManager.java:280)
>   at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>   at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
>   at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
>   at org.apache.hadoop.service.AbstractService.init(AbstractService.java:171)
>   at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:484)
>   at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:534)
> {quote}
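
One way to avoid this kind of failure is to make the stop path tolerate fields that were never assigned because init threw early. The sketch below is only an illustration under that assumption, not the attached patch: RecoveryStore, NodeContext and GuardedStopSketch are hypothetical stand-ins for the real NodeManager types, and the cleanedUp flag stands in for the elided cleanup code.

```java
import java.io.IOException;

// Minimal stand-ins (hypothetical) for the NodeManager collaborators.
interface RecoveryStore {
  void stop() throws IOException;
  boolean canRecover();
}

interface NodeContext {
  boolean getDecommissioned();
}

final class GuardedStopSketch {
  private final RecoveryStore nmStore;
  private NodeContext context; // stays null if init fails early
  boolean cleanedUp = false;

  GuardedStopSketch(RecoveryStore nmStore) {
    this.nmStore = nmStore;
  }

  void setContext(NodeContext context) {
    this.context = context;
  }

  // Null-safe variant of stopRecoveryStore: skips work on fields
  // that were never assigned because init threw.
  void stopRecoveryStore() throws IOException {
    if (nmStore == null) {
      return;
    }
    nmStore.stop();
    if (context != null && context.getDecommissioned()
        && nmStore.canRecover()) {
      cleanedUp = true; // placeholder for the elided cleanup
    }
  }
}
```

With the guard in place, stopping a service whose init failed before the context was created no longer raises a second exception that hides the original failure.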




[jira] [Updated] (YARN-3391) Clearly define flow ID/ flow run / flow version in API and storage


 [ 
https://issues.apache.org/jira/browse/YARN-3391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-3391:
--
Attachment: YARN-3391.4.patch





[jira] [Commented] (YARN-3459) TestLog4jWarningErrorMetricsAppender breaks in trunk


[ 
https://issues.apache.org/jira/browse/YARN-3459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484798#comment-14484798
 ] 

Hadoop QA commented on YARN-3459:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12723835/apache-yarn-3459.0.patch
  against trunk revision ab04ff9.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 2 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7251//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/7251//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7251//console

This message is automatically generated.

> TestLog4jWarningErrorMetricsAppender breaks in trunk
> 
>
> Key: YARN-3459
> URL: https://issues.apache.org/jira/browse/YARN-3459
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Li Lu
>Assignee: Li Lu
>Priority: Blocker
> Fix For: 2.7.0
>
> Attachments: apache-yarn-3459.0.patch
>
>
> TestLog4jWarningErrorMetricsAppender fails with the following message:
> {code}
> Running org.apache.hadoop.yarn.util.TestLog4jWarningErrorMetricsAppender
> Tests run: 6, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 6.214 sec <<< FAILURE! - in org.apache.hadoop.yarn.util.TestLog4jWarningErrorMetricsAppender
> testPurge(org.apache.hadoop.yarn.util.TestLog4jWarningErrorMetricsAppender)  Time elapsed: 2.01 sec  <<< FAILURE!
> java.lang.AssertionError: expected:<0> but was:<1>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at org.junit.Assert.assertEquals(Assert.java:542)
>   at org.apache.hadoop.yarn.util.TestLog4jWarningErrorMetricsAppender.testPurge(TestLog4jWarningErrorMetricsAppender.java:89)
> {code}




[jira] [Updated] (YARN-3459) TestLog4jWarningErrorMetricsAppender breaks in trunk

2015-04-07 Thread Varun Vasudev (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-3459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-3459:

Attachment: apache-yarn-3459.0.patch

My apologies for the failing test. I suspect it's a timing issue since it 
passed the pre-commit builds and is passing on my machine. Can you try the 
attached patch and +1 if it works?





[jira] [Commented] (YARN-3326) ReST support for getLabelsToNodes


[ 
https://issues.apache.org/jira/browse/YARN-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484738#comment-14484738
 ] 

Hadoop QA commented on YARN-3326:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12723819/YARN-3326.20150408-1.patch
  against trunk revision 4be648b.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7250//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7250//console

This message is automatically generated.

> ReST support for getLabelsToNodes 
> --
>
> Key: YARN-3326
> URL: https://issues.apache.org/jira/browse/YARN-3326
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Minor
> Attachments: YARN-3326.20150310-1.patch, YARN-3326.20150407-1.patch, 
> YARN-3326.20150408-1.patch
>
>
> REST support to retrieve the LabelsToNodes mapping




[jira] [Commented] (YARN-3448) Add Rolling Time To Lives Level DB Plugin Capabilities


[ 
https://issues.apache.org/jira/browse/YARN-3448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484717#comment-14484717
 ] 

Jonathan Eagles commented on YARN-3448:
---

[~zjshen], interesting idea about the index just being pointers into the entity 
db. I'll have to investigate what the write and read performance implications 
are.

As for rolling period vs. ttl: I think the rolling period should always be 
smaller than the ttl. One thing to consider is that, unlike traditional rolling 
files, more than one db is active at a time. In fact, all rolling dbs between 
now and the ttl may be active. That is due to the stitching of data back 
together on reads. All events for the same entity id will go into the same 
database.

My current setup rolls every hour with a ttl of one day. 

As for what a roll does: it only schedules the db to be deleted and removes 
the old entity and index dbs from being found. This does mean that there will 
be some associated start times that are old but still active. That will become 
eventually consistent once the ttl eviction period finishes.
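
The relationship between the rolling period and the ttl can be sketched as follows. This is only an illustration of the idea described above, not code from the patch: the class RollingBuckets and its methods are hypothetical, and it assumes that a rolling db is identified by the start of its time window in milliseconds.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: maps an entity start time to a rolling db bucket.
final class RollingBuckets {
  private final long rollingPeriodMs;
  private final long ttlMs;

  RollingBuckets(long rollingPeriodMs, long ttlMs) {
    if (rollingPeriodMs <= 0 || rollingPeriodMs >= ttlMs) {
      throw new IllegalArgumentException(
          "rolling period must be positive and smaller than ttl");
    }
    this.rollingPeriodMs = rollingPeriodMs;
    this.ttlMs = ttlMs;
  }

  // Start of the window that holds every event of an entity with this
  // start time, so all events for one entity land in the same db.
  long bucketFor(long entityStartTimeMs) {
    return Math.floorDiv(entityStartTimeMs, rollingPeriodMs) * rollingPeriodMs;
  }

  // Buckets whose time window may still overlap the ttl window at nowMs,
  // listed from newest to oldest. Reads stitch data from all of them.
  List<Long> activeBuckets(long nowMs) {
    List<Long> buckets = new ArrayList<>();
    long newest = bucketFor(nowMs);
    long oldest = bucketFor(nowMs - ttlMs);
    for (long b = newest; b >= oldest; b -= rollingPeriodMs) {
      buckets.add(b);
    }
    return buckets;
  }
}
```

With an hourly roll and a one-day ttl, as in the setup mentioned above, this gives on the order of 24 to 25 active buckets at any time, and a whole bucket can be dropped once it falls out of that range.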


> Add Rolling Time To Lives Level DB Plugin Capabilities
> --
>
> Key: YARN-3448
> URL: https://issues.apache.org/jira/browse/YARN-3448
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Attachments: YARN-3448.1.patch, YARN-3448.2.patch, YARN-3448.3.patch
>
>
> For large applications, the majority of the time in LeveldbTimelineStore is 
> spent deleting old entities one record at a time. An exclusive write lock is 
> held during the entire deletion phase, which in practice can be hours. If we 
> are to relax some of the consistency constraints, other performance-enhancing 
> techniques can be employed to maximize the throughput and minimize locking 
> time.
> Split the 5 sections of the leveldb database (domain, owner, start time, 
> entity, index) into 5 separate databases. This allows each database to 
> maximize the read cache effectiveness based on the unique usage patterns of 
> each database. With 5 separate databases each lookup is much faster. Having 
> the entity and index databases on separate disks can also help with I/O.
> Rolling DBs for entity and index DBs. 99.9% of the data are in these two 
> sections, in a 4:1 ratio (index to entity), at least for tez. We replace DB 
> record removal with file system removal if we create a rolling set of 
> databases that age out and can be efficiently removed. To do this we must 
> place a constraint to always place an entity's events into its correct 
> rolling db instance based on start time. This allows us to stitch the data 
> back together while reading and artificial paging.
> Relax the synchronous writes constraints. If we are willing to accept losing 
> some records that were not flushed by the operating system during a crash, we 
> can use async writes that can be much faster.
> Prefer sequential writes. Sequential writes can be several times faster than 
> random writes. Spend some small effort arranging the writes in such a way 
> that will trend towards sequential write performance over random write 
> performance.




[jira] [Updated] (YARN-3448) Add Rolling Time To Lives Level DB Plugin Capabilities


 [ 
https://issues.apache.org/jira/browse/YARN-3448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated YARN-3448:
--
Attachment: YARN-3448.3.patch





[jira] [Commented] (YARN-3462) Patches applied for YARN-2424 are inconsistent between trunk and branch-2


[ 
https://issues.apache.org/jira/browse/YARN-3462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484668#comment-14484668
 ] 

Naganarasimha G R commented on YARN-3462:
-

Thanks for reviewing, [~sidharta-s]. Yes, I have compiled the patch on branch-2 
and it compiles fine. 

> Patches applied for YARN-2424 are inconsistent between trunk and branch-2
> -
>
> Key: YARN-3462
> URL: https://issues.apache.org/jira/browse/YARN-3462
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Sidharta Seethana
>Assignee: Naganarasimha G R
> Attachments: YARN-3462.20150508-1.patch
>
>
> It looks like the changes for YARN-2424 are not the same for trunk (commit 
> 7e75226e68715c3eca9d346c8eaf2f265aa70d23) and branch-2 (commit 
> 5d965f2f3cf97a87603720948aacd4f7877d73c4) . Branch-2 has a missing warning 
> and documentation is a bit different as well. 




[jira] [Updated] (YARN-3326) ReST support for getLabelsToNodes


 [ 
https://issues.apache.org/jira/browse/YARN-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-3326:

Attachment: YARN-3326.20150408-1.patch

Thanks for reviewing, [~ozawa]. I have updated the patch to address your review comment.





[jira] [Updated] (YARN-3457) NPE when NodeManager.serviceInit fails and stopRecoveryStore called

2015-04-07 Thread Bibin A Chundatt (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-3457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-3457:
---
Attachment: YARN-3457.001.patch





[jira] [Commented] (YARN-3326) ReST support for getLabelsToNodes

2015-04-07 Thread Tsuyoshi Ozawa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484608#comment-14484608
 ] 

Tsuyoshi Ozawa commented on YARN-3326:
--

[~Naganarasimha] thank you for updating. LGTM overall.

Minor nits: let's avoid using * import.
{code}
+import java.util.*;
{code}


> ReST support for getLabelsToNodes 
> --
>
> Key: YARN-3326
> URL: https://issues.apache.org/jira/browse/YARN-3326
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Minor
> Attachments: YARN-3326.20150310-1.patch, YARN-3326.20150407-1.patch
>
>
> REST support to retrieve the LabelsToNodes mapping




[jira] [Commented] (YARN-3348) Add a 'yarn top' tool to help understand cluster usage


[ 
https://issues.apache.org/jira/browse/YARN-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484548#comment-14484548
 ] 

Jian He commented on YARN-3348:
---

Thanks Varun, some comments:
- “Unable to fetach cluster metrics” - typo
- exceeding 80 Column limit,
{code}
opts.addOption("types", true, "Comma separated list of types to restrict applications, case sensitive(though the display is lower case)");
{code}
- the -rows and -cols options seem to have no effect on my screen when I tried 
them; could you double check?
- the ‘yarn top’ output shows up repeatedly on the terminal every $delay 
seconds. It would be better to show it only once. 
- Does the patch only show root queue info? Should we show info for all queues?
- “F + Enter : Select sort field”; maybe use ’S’ for sorting?
- “Memory seconds(in GBseconds” - missing “)”
- It seems a bit odd to have this method in a public API record. Do you know 
why the hashcode is not correct without this method? Or we could just cast it 
to GetApplicationsRequestPBImpl and use the method from there.  
{code}
// need this otherwise the hashcode doesn't get generated correctly 
request.initAllFields();
{code}
- for the caching in ClientRMService: do you think we could do the caching on 
the client side? That would save RPCs, especially if we have many top commands 
running on the client side.
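
The client-side caching idea in the last point could look roughly like the sketch below. It is a generic illustration under stated assumptions, not code from the patch: TimedCache is a hypothetical class, the fetcher stands in for whatever RPC the tool issues, and the cache only helps repeated refreshes within a single client process.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical client-side cache: reuses a fetched value until it is
// at least maxAgeMs old, then fetches again.
final class TimedCache<T> {
  private final Supplier<T> fetcher;   // stands in for the remote call
  private final LongSupplier clockMs;  // injectable clock, eases testing
  private final long maxAgeMs;
  private T value;
  private long fetchedAtMs;
  private boolean loaded = false;

  TimedCache(Supplier<T> fetcher, LongSupplier clockMs, long maxAgeMs) {
    this.fetcher = fetcher;
    this.clockMs = clockMs;
    this.maxAgeMs = maxAgeMs;
  }

  synchronized T get() {
    long now = clockMs.getAsLong();
    if (!loaded || now - fetchedAtMs >= maxAgeMs) {
      value = fetcher.get();
      fetchedAtMs = now;
      loaded = true;
    }
    return value;
  }
}
```

A caller would wrap the expensive request once, for example with `System::currentTimeMillis` as the clock, and call `get()` on every screen refresh.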

> Add a 'yarn top' tool to help understand cluster usage
> --
>
> Key: YARN-3348
> URL: https://issues.apache.org/jira/browse/YARN-3348
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: apache-yarn-3348.0.patch, apache-yarn-3348.1.patch
>
>
> It would be helpful to have a 'yarn top' tool that would allow administrators 
> to understand which apps are consuming resources.
> Ideally the tool would allow you to filter by queue, user, maybe labels, etc 
> and show you statistics on container allocation across the cluster to find 
> out which apps are consuming the most resources on the cluster.




[jira] [Commented] (YARN-3391) Clearly define flow ID/ flow run / flow version in API and storage


[ 
https://issues.apache.org/jira/browse/YARN-3391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484514#comment-14484514
 ] 

Junping Du commented on YARN-3391:
--

Sorry, the 2nd comment above has a formatting issue and may be hard to read. The 
comments are fixed as below:
In ClientRMService.java, 
{code}
+    // Sanity check for flow run
+    try {
+      for (String tag : submissionContext.getApplicationTags()) {
+        if (tag.startsWith(TimelineUtils.FLOW_RUN_TAG_PREFIX + ":") ||
+            tag.startsWith(
+                TimelineUtils.FLOW_RUN_TAG_PREFIX.toLowerCase() + ":")) {
+          String value =
+              tag.substring(TimelineUtils.FLOW_RUN_TAG_PREFIX.length() + 1);
+          Long.valueOf(value);
+        }
+      }
+    } catch (NumberFormatException e) {
+      LOG.warn("Invalid to flow run.", e);
+      RMAuditLogger.logFailure(user, AuditConstants.SUBMIT_APP_REQUEST,
+          e.getMessage(), "ClientRMService",
+          "Exception in submitting application", applicationId);
+      throw RPCUtil.getRemoteException(e);
+    }
{code}
We should add more info to the LOG.warn message, at least to tell the user that 
the flow run should be numeric. In addition, do we need to check for negative 
values for the flow run here? If not, why are we accepting negative long values 
but rejecting non-numeric characters?
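
A more informative check might look like the following sketch. It is not the patch itself: FlowRunTagCheck is a hypothetical class, the value given to FLOW_RUN_TAG_PREFIX is an assumption (the real constant lives in TimelineUtils and may differ), and an IllegalArgumentException stands in for the logging and remote exception handling shown above.

```java
import java.util.Collection;
import java.util.Locale;

final class FlowRunTagCheck {
  // Assumed value for illustration; the real constant may differ.
  static final String FLOW_RUN_TAG_PREFIX = "TIMELINE_FLOW_RUN_TAG";

  // Throws IllegalArgumentException with a descriptive message when a
  // flow run tag does not carry a numeric (long) value. Negative values
  // are accepted, as Long parsing allows them.
  static void validate(Collection<String> tags) {
    String upper = FLOW_RUN_TAG_PREFIX + ":";
    String lower = FLOW_RUN_TAG_PREFIX.toLowerCase(Locale.ROOT) + ":";
    for (String tag : tags) {
      if (tag.startsWith(upper) || tag.startsWith(lower)) {
        String value = tag.substring(upper.length());
        try {
          Long.parseLong(value);
        } catch (NumberFormatException e) {
          throw new IllegalArgumentException("Invalid flow run '" + value
              + "' in tag '" + tag + "': flow run must be a number (long)", e);
        }
      }
    }
  }
}
```

The message names both the offending value and the expected type, so a user can see what to fix without reading the stack trace.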

> Clearly define flow ID/ flow run / flow version in API and storage
> --
>
> Key: YARN-3391
> URL: https://issues.apache.org/jira/browse/YARN-3391
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
> Attachments: YARN-3391.1.patch, YARN-3391.2.patch, YARN-3391.3.patch
>
>
> To continue the discussion in YARN-3040, let's figure out the best way to 
> describe the flow.
> Some key issues that we need to conclude on:
> - How do we include the flow version in the context so that it gets passed 
> into the collector and to the storage eventually?
> - Flow run id should be a number as opposed to a generic string?
> - Default behavior for the flow run id if it is missing (i.e. client did not 
> set it)
> - How do we handle flow attributes in case of nested levels of flows?




[jira] [Commented] (YARN-3391) Clearly define flow ID/ flow run / flow version in API and storage


[ 
https://issues.apache.org/jira/browse/YARN-3391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484512#comment-14484512
 ] 

Junping Du commented on YARN-3391:
--

bq. I make use of Sangjin's previous comments to add some inline code comments 
about their definitions in TimelineCollectorContext.
I would expect the definitions to show up in the Javadoc of the related methods 
in TimelineCollectorContext. This sounds like a little nitpick, but the key 
difference between inline comments and javadoc is that developers who only use 
the jar instead of the source code can still read these key definitions and use 
them correctly (via IDE hints or generated Javadoc). So in general, I think we 
should use javadoc comments instead of inline comments as much as possible for 
public APIs. 

{code}
+    // Sanity check for flow run
+    try {
+      for (String tag : submissionContext.getApplicationTags()) {
+        if (tag.startsWith(TimelineUtils.FLOW_RUN_TAG_PREFIX + ":") ||
+            tag.startsWith(
+                TimelineUtils.FLOW_RUN_TAG_PREFIX.toLowerCase() + ":")) {
+          String value =
+              tag.substring(TimelineUtils.FLOW_RUN_TAG_PREFIX.length() + 1);
+          Long.valueOf(value);
+        }
+      }
+    } catch (NumberFormatException e) {
+      LOG.warn("Invalid to flow run.", e);
+      RMAuditLogger.logFailure(user, AuditConstants.SUBMIT_APP_REQUEST,
+          e.getMessage(), "ClientRMService",
+          "Exception in submitting application", applicationId);
+      throw RPCUtil.getRemoteException(e);
+    }
{code}
We should add more info to the LOG.warn message, at least to tell the user that 
the flow run should be numeric. In addition, do we need to check for negative 
values for the flow run here? If not, why are we accepting negative long values 
but rejecting non-numeric characters?





[jira] [Commented] (YARN-3426) Add jdiff support to YARN


[ 
https://issues.apache.org/jira/browse/YARN-3426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484511#comment-14484511
 ] 

Li Lu commented on YARN-3426:
-

Thanks [~vinodkv] for the review! For the second point, I traced into the code 
of our ExcludePrivateAnnotationsJDiffDoclet, and found this may actually be a 
bug in RootDocProcessor. Specifically, we are instrumenting each methods(true) 
call to a Doc entry, but we're not instrumenting methods() calls. methods() 
calls have exactly the same meaning as methods(true) according to 
http://docs.oracle.com/javase/7/docs/jdk/api/javadoc/doclet/com/sun/javadoc/ClassDoc.html
 . I'll post a patch to fix this soon. 

> Add jdiff support to YARN
> -
>
> Key: YARN-3426
> URL: https://issues.apache.org/jira/browse/YARN-3426
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Li Lu
>Assignee: Li Lu
>Priority: Blocker
> Attachments: YARN-3426-040615-1.patch, YARN-3426-040615.patch, 
> YARN-3426-040715.patch
>
>
> Maybe we'd like to extend our current jdiff tool for hadoop-common and hdfs 
> to YARN as well. 




[jira] [Resolved] (YARN-2349) InvalidStateTransitonException after RM switch

2015-04-07 Thread Rohith (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-2349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith resolved YARN-2349.
--
Resolution: Cannot Reproduce

Closing the issue as 'Cannot Reproduce'. Feel free to reopen if you find the
issue in the latest release or trunk.

> InvalidStateTransitonException after RM switch
> --
>
> Key: YARN-2349
> URL: https://issues.apache.org/jira/browse/YARN-2349
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.4.1
>Reporter: Nishan Shetty
>Assignee: Rohith
>
> {code}
> 2014-07-23 19:22:28,272 INFO org.apache.hadoop.ipc.Server: IPC Server 
> Responder: starting
> 2014-07-23 19:22:28,273 INFO org.apache.hadoop.ipc.Server: IPC Server 
> listener on 45018: starting
> 2014-07-23 19:22:28,266 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Can't handle 
> this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
> APP_REJECTED at ACCEPTED
>  at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
>  at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>  at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:635)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:83)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:706)
>  at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:690)
>  at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
>  at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
>  at java.lang.Thread.run(Thread.java:662)
> 2014-07-23 19:22:28,283 INFO org.mortbay.log: Stopped 
> SelectChannelConnector@10.18.40.84:45020
> 2014-07-23 19:22:28,291 ERROR 
> org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore:
>  Error when openning history file of application 
> application_1406116264351_0007
> {code}




[jira] [Commented] (YARN-3326) ReST support for getLabelsToNodes


[ 
https://issues.apache.org/jira/browse/YARN-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484437#comment-14484437
 ] 

Hadoop QA commented on YARN-3326:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12723771/YARN-3326.20150407-1.patch
  against trunk revision bd77a7c.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7248//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7248//console

This message is automatically generated.

> ReST support for getLabelsToNodes 
> --
>
> Key: YARN-3326
> URL: https://issues.apache.org/jira/browse/YARN-3326
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Minor
> Attachments: YARN-3326.20150310-1.patch, YARN-3326.20150407-1.patch
>
>
> REST support to retrieve the LabelsToNodes mapping




[jira] [Commented] (YARN-3462) Patches applied for YARN-2424 are inconsistent between trunk and branch-2


[ 
https://issues.apache.org/jira/browse/YARN-3462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484436#comment-14484436
 ] 

Sidharta Seethana commented on YARN-3462:
-

[~Naganarasimha], Thanks for the patch. I am assuming the patch application 
failure is because it got applied to trunk.

The patch looks good to me.

> Patches applied for YARN-2424 are inconsistent between trunk and branch-2
> -
>
> Key: YARN-3462
> URL: https://issues.apache.org/jira/browse/YARN-3462
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Sidharta Seethana
>Assignee: Naganarasimha G R
> Attachments: YARN-3462.20150508-1.patch
>
>
> It looks like the changes for YARN-2424 are not the same for trunk (commit 
> 7e75226e68715c3eca9d346c8eaf2f265aa70d23) and branch-2 (commit 
> 5d965f2f3cf97a87603720948aacd4f7877d73c4) . Branch-2 has a missing warning 
> and documentation is a bit different as well. 




[jira] [Commented] (YARN-3461) Consolidate flow name/version/run defaults


[ 
https://issues.apache.org/jira/browse/YARN-3461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484435#comment-14484435
 ] 

Junping Du commented on YARN-3461:
--

I agree with most of the comments in YARN-3391. As I proposed in that jira, can
we have a configurable policy to group applications into flows by default if
the user doesn't specify a flow name for an application?
For example, assume we have 3 policies that can be configured (no matter which
is the default policy): 1. group applications into flows by application name;
2. group each application into its own isolated flow; 3. group all applications
into a single default flow (more for test purposes). Developers/users can in
future choose/extend these policies to meet their scenarios more closely.
Thoughts?
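
As a rough sketch of the three policies listed above (nothing here exists in YARN; the enum name, constant names and the generated flow names are all assumptions for illustration), the policy could be modelled as a small strategy enum:

```java
/** Hypothetical sketch of the three default-flow grouping policies. */
enum DefaultFlowPolicy {
  /** Applications with the same name share one flow. */
  BY_APPLICATION_NAME {
    @Override
    String flowName(String appName, String appId) {
      return appName;
    }
  },
  /** Every application gets its own flow. */
  ISOLATED {
    @Override
    String flowName(String appName, String appId) {
      return "flow_" + appId;
    }
  },
  /** All applications land in one shared flow (mostly for testing). */
  SINGLE_DEFAULT {
    @Override
    String flowName(String appName, String appId) {
      return "default_flow";
    }
  };

  /** Default flow name under this policy. */
  abstract String flowName(String appName, String appId);

  /** Uses the user-specified flow name if present, else the policy default. */
  String resolve(String userFlowName, String appName, String appId) {
    return (userFlowName != null && !userFlowName.isEmpty())
        ? userFlowName
        : flowName(appName, appId);
  }
}
```

The configured policy would only be consulted when the client did not set a flow name, which matches the "by default" wording above.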

> Consolidate flow name/version/run defaults
> --
>
> Key: YARN-3461
> URL: https://issues.apache.org/jira/browse/YARN-3461
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
>
> In YARN-3391, it's not resolved what should be the defaults for flow 
> name/version/run. Let's continue the discussion here and unblock YARN-3391 
> from moving forward.




[jira] [Commented] (YARN-3462) Patches applied for YARN-2424 are inconsistent between trunk and branch-2


[ 
https://issues.apache.org/jira/browse/YARN-3462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484411#comment-14484411
 ] 

Hadoop QA commented on YARN-3462:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12723784/YARN-3462.20150508-1.patch
  against trunk revision 5b8a3ae.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7249//console

This message is automatically generated.

> Patches applied for YARN-2424 are inconsistent between trunk and branch-2
> -
>
> Key: YARN-3462
> URL: https://issues.apache.org/jira/browse/YARN-3462
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Sidharta Seethana
>Assignee: Naganarasimha G R
> Attachments: YARN-3462.20150508-1.patch
>
>
> It looks like the changes for YARN-2424 are not the same for trunk (commit 
> 7e75226e68715c3eca9d346c8eaf2f265aa70d23) and branch-2 (commit 
> 5d965f2f3cf97a87603720948aacd4f7877d73c4) . Branch-2 has a missing warning 
> and documentation is a bit different as well. 




[jira] [Commented] (YARN-3462) Patches applied for YARN-2424 are inconsistent between trunk and branch-2


[ 
https://issues.apache.org/jira/browse/YARN-3462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484407#comment-14484407
 ] 

Naganarasimha G R commented on YARN-3462:
-

[~sidharta-s] , can you take a look at the patch ?

> Patches applied for YARN-2424 are inconsistent between trunk and branch-2
> -
>
> Key: YARN-3462
> URL: https://issues.apache.org/jira/browse/YARN-3462
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Sidharta Seethana
>Assignee: Naganarasimha G R
> Attachments: YARN-3462.20150508-1.patch
>
>
> It looks like the changes for YARN-2424 are not the same for trunk (commit 
> 7e75226e68715c3eca9d346c8eaf2f265aa70d23) and branch-2 (commit 
> 5d965f2f3cf97a87603720948aacd4f7877d73c4) . Branch-2 has a missing warning 
> and documentation is a bit different as well. 




[jira] [Updated] (YARN-3462) Patches applied for YARN-2424 are inconsistent between trunk and branch-2


 [ 
https://issues.apache.org/jira/browse/YARN-3462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-3462:

Attachment: YARN-3462.20150508-1.patch

Attaching a patch with corrections for branch-2.

> Patches applied for YARN-2424 are inconsistent between trunk and branch-2
> -
>
> Key: YARN-3462
> URL: https://issues.apache.org/jira/browse/YARN-3462
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Sidharta Seethana
>Assignee: Naganarasimha G R
> Attachments: YARN-3462.20150508-1.patch
>
>
> It looks like the changes for YARN-2424 are not the same for trunk (commit 
> 7e75226e68715c3eca9d346c8eaf2f265aa70d23) and branch-2 (commit 
> 5d965f2f3cf97a87603720948aacd4f7877d73c4) . Branch-2 has a missing warning 
> and documentation is a bit different as well. 




[jira] [Reopened] (YARN-2980) Move health check script related functionality to hadoop-common


 [ 
https://issues.apache.org/jira/browse/YARN-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du reopened YARN-2980:
--

The target version is 2.x, so why did we commit to trunk only? This doesn't
sound like an incompatible change. Reopening it until we commit it to branch-2
together with YARN-3375.

> Move health check script related functionality to hadoop-common
> ---
>
> Key: YARN-2980
> URL: https://issues.apache.org/jira/browse/YARN-2980
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Ming Ma
>Assignee: Varun Saxena
> Fix For: 3.0.0
>
> Attachments: YARN-2980.001.patch, YARN-2980.002.patch, 
> YARN-2980.003.patch, YARN-2980.004.patch
>
>
> HDFS might want to leverage health check functionality available in YARN in 
> both namenode https://issues.apache.org/jira/browse/HDFS-7400 and datanode 
> https://issues.apache.org/jira/browse/HDFS-7441.
> We can move health check functionality including the protocol between hadoop 
> daemons and health check script to hadoop-common. That will simplify the 
> development and maintenance for both hadoop source code and health check 
> script.
> Thoughts?




[jira] [Comment Edited] (YARN-3448) Add Rolling Time To Lives Level DB Plugin Capabilities

[
https://issues.apache.org/jira/browse/YARN-3448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484363#comment-14484363
]

Zhijie Shen edited comment on YARN-3448 at 4/7/15 11:56 PM:

Jonathan, I have several high-level questions about the design and the
implementation:

bq. Split the 5 sections of the leveldb database (domain, owner, start time,
entity, index) into 5 separate databases.

According to the official [document|https://github.com/google/leveldb], LevelDB
allows only a single process (possibly multi-threaded) to access a particular
database at a time. Therefore, instead of 5 separate (logical) tables, 5
separate databases are used to increase concurrency, aren't they?

However, this approach may raise a consistency issue. For example, if I
upload an entity with a primary filter defined, I may run into a scenario where
an I/O exception happens when the timeline server tries to write into the
entity db, while the index record is persisted without any problem. In that
scenario, the entity is searchable by primary filter, but cannot be retrieved
by its identifier.

bq. Rolling DBs for entity and index DBs. 99.9% of the data are in these two
sections 4:1 ratio (index to entity) at least for tez.

If I understand it correctly, ownerdb can be treated as a secondary index of
domaindb. If we want to look up the domains of one owner, we have two steps:
1) get all domain IDs from ownerdb and then 2) pull each individual domain
from domaindb.

I think we could adopt a similar approach for entitydb and indexdb. Instead
of keeping a full copy of the entity content in indexdb, we could just record
the entity identifier there, and do a two-step lookup to answer the query. By
doing this, we should be able to significantly shrink the indexdb size and
improve write performance. In contrast, the previous leveldb index
implementation seems to be optimized for queries.
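
The two-step lookup could look roughly like the sketch below. It uses in-memory maps as stand-ins for the two databases; the class TwoStepLookup and its methods are hypothetical and are not part of LeveldbTimelineStore or of the patch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory stand-in for the entity and index databases. */
final class TwoStepLookup {
  // entity id -> full entity content (stand-in for entitydb)
  private final Map<String, String> entityDb = new HashMap<>();
  // primary filter value -> entity ids only (stand-in for indexdb)
  private final Map<String, List<String>> indexDb = new HashMap<>();

  /** Stores the content once and records only the id in the index. */
  void put(String entityId, String content, String primaryFilter) {
    entityDb.put(entityId, content);
    indexDb.computeIfAbsent(primaryFilter, k -> new ArrayList<>())
        .add(entityId);
  }

  /** Step 1: read ids from the index; step 2: fetch each entity by id. */
  List<String> findByPrimaryFilter(String primaryFilter) {
    List<String> result = new ArrayList<>();
    List<String> ids =
        indexDb.getOrDefault(primaryFilter, Collections.<String>emptyList());
    for (String id : ids) {
      String content = entityDb.get(id);
      if (content != null) { // skip index entries whose entity is missing
        result.add(content);
      }
    }
    return result;
  }
}
```

The trade-off is one extra read per matching entity at query time in exchange for a much smaller index and fewer bytes written per entity.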

3. I'm wondering if we need a separate configuration for the rolling period, or
whether we should use ttl as the rolling period. The reason is that if we set
ttl smaller than the rolling period, the most recent database will still
contain old data. Therefore, we still need the deletion thread to remove these
entities/index entries, or the query has to exclude them from the result set.

On the other hand, it may also not be good to set ttl greater than the rolling
period. This is because if the period is smaller than ttl, we still need to
wait until ttl to delete the database. Therefore, setting a small rolling
period alone won't shrink the total database size if ttl is kept large.

Combining the two points above, it seems better to let rolling period =
ttl. I think it may also simplify the implementation, because we know the
current database will have all the live data, and previous databases are sure
to have the old data to be discarded. Thoughts?
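
The two cases above can be checked with a little arithmetic. The sketch below is only an illustration under my own assumptions: timestamps are in milliseconds, a rolling database covers [dbStart, dbStart + period), and an entry expires once it is older than ttl. The class and method names are hypothetical.

```java
/** Hypothetical helper for reasoning about rolling period versus ttl. */
final class RollingWindowMath {
  /**
   * The current database started at dbStartMs. It holds expired entries only
   * if something written at its start is already older than ttl, which can
   * happen only when ttl is smaller than the rolling period.
   */
  static boolean currentDbHoldsExpiredData(
      long dbStartMs, long ttlMs, long nowMs) {
    return dbStartMs < nowMs - ttlMs;
  }

  /**
   * A closed database covering [dbStartMs, dbStartMs + periodMs) can be
   * removed as a whole only after its newest possible entry has expired.
   */
  static boolean canDropWholeDb(
      long dbStartMs, long periodMs, long ttlMs, long nowMs) {
    return dbStartMs + periodMs + ttlMs <= nowMs;
  }
}
```

With period = 10 and ttl = 4, the current database already holds expired entries at now = 7, so record-level deletion or query-side filtering is still needed. With period = 2 and ttl = 10, the database that started at 0 cannot be dropped until now = 12, so the small period alone does not reduce the retained size.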

4. I assume that the {{roll()}} method is going to be processed quickly, right?
Otherwise, during the transitional state of rolling a database, write
performance will degrade somewhat.

was (Author: zjshen):
Jonathan, I have several high-level questions about the design:

bq. Split the 5 sections of the leveldb database (domain, owner, start time,
entity, index) into 5 separate databases.

bq. Rolling DBs for entity and index DBs. 99.9% of the data are in these two
sections 4:1 ratio (index to entity) at least for tez.

3. I'm wondering if we need a separate configuration of rolling period or we
should use ttl as the rolling period. The reason is if we set ttl smaller than
the rolling period, in the most recent database, there will still exist the old
data. Therefore, we still need the deletion thread to re

[jira] [Commented] (YARN-3055) The token is not renewed properly if it's shared by jobs (oozie) in DelegationTokenRenewer


[ 
https://issues.apache.org/jira/browse/YARN-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484374#comment-14484374
 ] 

Jian He commented on YARN-3055:
---

bq.  does it remove tokens from data structures in all cases or can a token get 
left in allTokens?
I don't think it has a leak. removeFailedDelegationToken will remove the
token if the renewal fails. If there is a leak, it existed before YARN-2704.
bq. The renewer looks like it may turn into a DOS weapon.
It does seem odd to get the expiration date by renewing the token. But
currently there is no way to get the expiration date other than the renew
method.
bq. Any sub-job with the default of canceling tokens will kill the overall 
workflow.
Am I missing something? I think the sub-job currently won't kill the overall
workflow. The sub-job flag will be ignored if the first job sets the flag.

Overall, I think the current patch will work, apart from the few comments I
have.
[~daryn], you mentioned you have another patch. Could you share it, or do you
think the current patch is fine?

> The token is not renewed properly if it's shared by jobs (oozie) in 
> DelegationTokenRenewer
> --
>
> Key: YARN-3055
> URL: https://issues.apache.org/jira/browse/YARN-3055
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: security
>Reporter: Yi Liu
>Assignee: Yi Liu
>Priority: Blocker
> Attachments: YARN-3055.001.patch, YARN-3055.002.patch
>
>
> After YARN-2964, there is only one timer to renew a token that is shared by
> jobs.
> In {{removeApplicationFromRenewal}}, when we are about to remove a token that
> is shared by other jobs, we do not cancel the token.
> In that case we also should not cancel the _timerTask_ or remove the token
> from {{allTokens}}. Otherwise, the token will no longer be renewed for
> already-submitted applications that share it, and it will be renewed
> immediately for newly submitted applications that share it.
> For example, we have 3 applications: app1, app2, app3, and they share
> token1. Consider the following scenario:
> *1).* app1 is submitted first, then app2, and then app3. In this case,
> there is only one token renewal timer for token1, and it is scheduled when
> app1 is submitted.
> *2).* app1 finishes, and the renewal timer is cancelled. token1 will not
> be renewed any more, but app2 and app3 still use it, so there is a problem.
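
One way to picture the intended behaviour in the scenario above is to track which applications still reference a token and to cancel the timer only when the last one goes away. This is a simplified sketch under that assumption, not the DelegationTokenRenewer code or either attached patch; SharedTokenTracker and its methods are hypothetical names.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical tracker of which applications share a token. */
final class SharedTokenTracker {
  // token -> ids of applications that still use it
  private final Map<String, Set<String>> appsByToken = new HashMap<>();

  /** Registers an application as a user of the token. */
  void addApplication(String token, String appId) {
    appsByToken.computeIfAbsent(token, k -> new HashSet<>()).add(appId);
  }

  /**
   * Removes the application from the token's users. Returns true only when
   * no application uses the token any more, i.e. when it would be safe to
   * cancel the renewal timer and forget the token.
   */
  boolean removeApplication(String token, String appId) {
    Set<String> apps = appsByToken.get(token);
    if (apps == null) {
      return false; // unknown token: nothing to cancel
    }
    apps.remove(appId);
    if (apps.isEmpty()) {
      appsByToken.remove(token);
      return true;
    }
    return false;
  }
}
```

With app1, app2 and app3 sharing token1, removing app1 and app2 returns false, so the timer keeps running; only removing app3 returns true.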




[jira] [Commented] (YARN-3319) Implement a FairOrderingPolicy

2015-04-07 Thread Craig Welch (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484368#comment-14484368
 ] 

Craig Welch commented on YARN-3319:
---

Apply after applying YARN-3318 and YARN-3463

> Implement a FairOrderingPolicy
> --
>
> Key: YARN-3319
> URL: https://issues.apache.org/jira/browse/YARN-3319
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Craig Welch
>Assignee: Craig Welch
> Attachments: YARN-3319.13.patch, YARN-3319.14.patch, 
> YARN-3319.17.patch, YARN-3319.35.patch, YARN-3319.39.patch, 
> YARN-3319.45.patch, YARN-3319.47.patch
>
>
> Implement a FairOrderingPolicy which prefers to allocate to 
> SchedulerProcesses with least current usage, very similar to the 
> FairScheduler's FairSharePolicy.  
> The Policy will offer allocations to applications in a queue in order of 
> least resources used, and preempt applications in reverse order (from most 
> resources used). This will include conditional support for sizeBasedWeight 
> style adjustment.
> Optionally, based on a conditional configuration to enable sizeBasedWeight 
> (default false), an adjustment to boost larger applications (to offset the 
> natural preference for smaller applications) will adjust the resource usage 
> value based on demand, dividing it by the below value:
> Math.log1p(app memory demand) / Math.log(2);
> In cases where the above is indeterminate (two applications are equal after 
> this comparison), behavior falls back to comparison based on the application 
> id, which is generally lexically FIFO for that comparison
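
A minimal sketch of the ordering described above, assuming memory in MB as the only resource and a plain string application id. The class FairOrderingSketch, its nested App type and the guard for zero demand are assumptions for illustration; this is not the code in the attached patches.

```java
import java.util.Comparator;

/** Hypothetical sketch of the fair ordering comparison. */
final class FairOrderingSketch {
  /** Minimal stand-in for a schedulable application. */
  static final class App {
    final String id;
    final long usedMb;
    final long demandMb;

    App(String id, long usedMb, long demandMb) {
      this.id = id;
      this.usedMb = usedMb;
      this.demandMb = demandMb;
    }
  }

  /** Usage value compared by the policy; smaller is served first. */
  static double magnitude(App app, boolean sizeBasedWeight) {
    double used = app.usedMb;
    // Assumption: skip the adjustment for zero demand, since log1p(0) is 0
    // and the division would not be meaningful.
    if (sizeBasedWeight && app.demandMb > 0) {
      used /= Math.log1p(app.demandMb) / Math.log(2);
    }
    return used;
  }

  /** Orders by least usage first, then by application id on ties. */
  static Comparator<App> comparator(boolean sizeBasedWeight) {
    return Comparator
        .comparingDouble((App a) -> magnitude(a, sizeBasedWeight))
        .thenComparing(a -> a.id);
  }
}
```

Sorting a queue's applications with this comparator gives the allocation order; the reverse order would be the preemption order described above.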




[jira] [Updated] (YARN-3463) Integrate OrderingPolicy Framework with CapacityScheduler

2015-04-07 Thread Craig Welch (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-3463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch updated YARN-3463:
--
Attachment: YARN-3463.50.patch

Must apply YARN-3318 patch first

> Integrate OrderingPolicy Framework with CapacityScheduler
> -
>
> Key: YARN-3463
> URL: https://issues.apache.org/jira/browse/YARN-3463
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Reporter: Craig Welch
>Assignee: Craig Welch
> Attachments: YARN-3463.50.patch
>
>
> Integrate the OrderingPolicy Framework with the CapacityScheduler




[jira] [Commented] (YARN-3448) Add Rolling Time To Lives Level DB Plugin Capabilities


[ 
https://issues.apache.org/jira/browse/YARN-3448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484363#comment-14484363
 ] 

Zhijie Shen commented on YARN-3448:
---

Jonathan, I have several high-level questions about the design:

bq. Split the 5 sections of the leveldb database (domain, owner, start time, 
entity, index) into 5 separate databases.

According to the official [document|https://github.com/google/leveldb], LevelDB
allows only a single process (possibly multi-threaded) to access a particular
database at a time. Therefore, instead of 5 separate (logical) tables, 5
separate databases are used to increase concurrency, aren't they?

However, this approach may raise a consistency issue. For example, if I
upload an entity with a primary filter defined, I may run into a scenario where
an I/O exception happens when the timeline server tries to write into the
entity db, while the index record is persisted without any problem. In that
scenario, the entity is searchable by primary filter, but cannot be retrieved
by its identifier.

bq. Rolling DBs for entity and index DBs. 99.9% of the data are in these two
sections 4:1 ratio (index to entity) at least for tez.

If I understand it correctly, ownerdb can be treated as a secondary index of
domaindb. If we want to look up the domains of one owner, we have two steps:
1) get all domain IDs from ownerdb and then 2) pull each individual domain
from domaindb.

I think we could adopt a similar approach for entitydb and indexdb. Instead
of keeping a full copy of the entity content in indexdb, we could just record
the entity identifier there, and do a two-step lookup to answer the query. By
doing this, we should be able to significantly shrink the indexdb size and
improve write performance. In contrast, the previous leveldb index
implementation seems to be optimized for queries.

3. I'm wondering if we need a separate configuration for the rolling period, or
whether we should use ttl as the rolling period. The reason is that if we set
ttl smaller than the rolling period, the most recent database will still
contain old data. Therefore, we still need the deletion thread to remove these
entities/index entries, or the query has to exclude them from the result set.

On the other hand, it may also not be good to set ttl greater than the rolling
period. This is because if the period is smaller than ttl, we still need to
wait until ttl to delete the database. Therefore, setting a small rolling
period alone won't shrink the total database size if ttl is kept large.

Combining the two points above, it seems better to let rolling period = ttl.
I think it may also simplify the implementation, because we know the current
database will have all the live data, and previous databases are sure to have
the old data to be discarded. Thoughts?

4. I assume that the {{roll()}} method is going to be processed quickly, right?
Otherwise, during the transitional state of rolling a database, write
performance will degrade somewhat.

> Add Rolling Time To Lives Level DB Plugin Capabilities
> --
>
> Key: YARN-3448
> URL: https://issues.apache.org/jira/browse/YARN-3448
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Attachments: YARN-3448.1.patch, YARN-3448.2.patch
>
>
> For large applications, the majority of the time in LeveldbTimelineStore is 
> spent deleting old entities one record at a time. An exclusive write lock is 
> held during the entire deletion phase, which in practice can take hours. If we 
> are willing to relax some of the consistency constraints, other 
> performance-enhancing techniques can be employed to maximize the throughput 
> and minimize locking time.
> Split the 5 sections of the leveldb database (domain, owner, start time, 
> entity, index) into 5 separate databases. This allows each database to 
> maximize the read cache effectiveness based on the unique usage patterns of 
> each database. With 5 separate databases each lookup is much faster. Placing 
> the entity and index databases on separate disks can also help with I/O.
> Rolling DBs for entity and index DBs. 99.9% of the data are in these two 
> sections 4:1 ratio (index to entity) at least for tez. We replace DB record 
> removal with file system removal if we create a rolling set of databases that 
> age out and can be efficiently removed. To do this we must place a constraint 
> to always place an entity's events into its correct rolling db instance 
> based on start time. This allows us to stitch the data back together while 
> reading, with artificial paging.
> Relax the synchronous writes constraints. If we are willing to accept losing 
> some records that were not flushed by the operating system during a crash, we 
> can use async writes that can be much faster.
> Prefer Sequential writes. sequential writes can be several times

[jira] [Created] (YARN-3463) Integrate OrderingPolicy Framework with CapacityScheduler

2015-04-07 Thread Craig Welch (JIRA)

Craig Welch created YARN-3463:
-

 Summary: Integrate OrderingPolicy Framework with CapacityScheduler
 Key: YARN-3463
 URL: https://issues.apache.org/jira/browse/YARN-3463
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Craig Welch
Assignee: Craig Welch


Integrate the OrderingPolicy Framework with the CapacityScheduler




[jira] [Commented] (YARN-2980) Move health check script related functionality to hadoop-common

2015-04-07 Thread Allen Wittenauer (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484347#comment-14484347
 ] 

Allen Wittenauer commented on YARN-2980:


Or, you could go look at YARN-3375 .

> Move health check script related functionality to hadoop-common
> ---
>
> Key: YARN-2980
> URL: https://issues.apache.org/jira/browse/YARN-2980
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Ming Ma
>Assignee: Varun Saxena
> Fix For: 3.0.0
>
> Attachments: YARN-2980.001.patch, YARN-2980.002.patch, 
> YARN-2980.003.patch, YARN-2980.004.patch
>
>
> HDFS might want to leverage health check functionality available in YARN in 
> both namenode https://issues.apache.org/jira/browse/HDFS-7400 and datanode 
> https://issues.apache.org/jira/browse/HDFS-7441.
> We can move health check functionality including the protocol between hadoop 
> daemons and health check script to hadoop-common. That will simplify the 
> development and maintenance for both hadoop source code and health check 
> script.
> Thoughts?




[jira] [Resolved] (YARN-2980) Move health check script related functionality to hadoop-common

2015-04-07 Thread Allen Wittenauer (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved YARN-2980.

Resolution: Fixed

> Move health check script related functionality to hadoop-common
> ---
>
> Key: YARN-2980
> URL: https://issues.apache.org/jira/browse/YARN-2980
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Ming Ma
>Assignee: Varun Saxena
> Fix For: 3.0.0
>
> Attachments: YARN-2980.001.patch, YARN-2980.002.patch, 
> YARN-2980.003.patch, YARN-2980.004.patch
>
>
> HDFS might want to leverage health check functionality available in YARN in 
> both namenode https://issues.apache.org/jira/browse/HDFS-7400 and datanode 
> https://issues.apache.org/jira/browse/HDFS-7441.
> We can move health check functionality including the protocol between hadoop 
> daemons and health check script to hadoop-common. That will simplify the 
> development and maintenance for both hadoop source code and health check 
> script.
> Thoughts?




[jira] [Commented] (YARN-3459) TestLog4jWarningErrorMetricsAppender breaks in trunk


[ 
https://issues.apache.org/jira/browse/YARN-3459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484346#comment-14484346
 ] 

Li Lu commented on YARN-3459:
-

[~vvasudev] feel free to take it or directly patch it back to YARN-2901. 
Thanks! 

> TestLog4jWarningErrorMetricsAppender breaks in trunk
> 
>
> Key: YARN-3459
> URL: https://issues.apache.org/jira/browse/YARN-3459
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Li Lu
>Assignee: Li Lu
>Priority: Blocker
> Fix For: 2.7.0
>
>
> TestLog4jWarningErrorMetricsAppender fails with the following message:
> {code}
> Running org.apache.hadoop.yarn.util.TestLog4jWarningErrorMetricsAppender
> Tests run: 6, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 6.214 sec <<< 
> FAILURE! - in org.apache.hadoop.yarn.util.TestLog4jWarningErrorMetricsAppender
> testPurge(org.apache.hadoop.yarn.util.TestLog4jWarningErrorMetricsAppender)  
> Time elapsed: 2.01 sec  <<< FAILURE!
> java.lang.AssertionError: expected:<0> but was:<1>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at org.junit.Assert.assertEquals(Assert.java:542)
>   at 
> org.apache.hadoop.yarn.util.TestLog4jWarningErrorMetricsAppender.testPurge(TestLog4jWarningErrorMetricsAppender.java:89)
> {code}




[jira] [Reopened] (YARN-2980) Move health check script related functionality to hadoop-common


 [ 
https://issues.apache.org/jira/browse/YARN-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du reopened YARN-2980:
--

> Move health check script related functionality to hadoop-common
> ---
>
> Key: YARN-2980
> URL: https://issues.apache.org/jira/browse/YARN-2980
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Ming Ma
>Assignee: Varun Saxena
> Fix For: 3.0.0
>
> Attachments: YARN-2980.001.patch, YARN-2980.002.patch, 
> YARN-2980.003.patch, YARN-2980.004.patch
>
>
> HDFS might want to leverage health check functionality available in YARN in 
> both namenode https://issues.apache.org/jira/browse/HDFS-7400 and datanode 
> https://issues.apache.org/jira/browse/HDFS-7441.
> We can move health check functionality including the protocol between hadoop 
> daemons and health check script to hadoop-common. That will simplify the 
> development and maintenance for both hadoop source code and health check 
> script.
> Thoughts?




[jira] [Commented] (YARN-3459) TestLog4jWarningErrorMetricsAppender breaks in trunk

2015-04-07 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484335#comment-14484335
 ] 

Wangda Tan commented on YARN-3459:
--

I can reproduce this locally as well. [~vvasudev], do you have any ideas on 
this?

> TestLog4jWarningErrorMetricsAppender breaks in trunk
> 
>
> Key: YARN-3459
> URL: https://issues.apache.org/jira/browse/YARN-3459
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Li Lu
>Assignee: Li Lu
>Priority: Blocker
> Fix For: 2.7.0
>
>
> TestLog4jWarningErrorMetricsAppender fails with the following message:
> {code}
> Running org.apache.hadoop.yarn.util.TestLog4jWarningErrorMetricsAppender
> Tests run: 6, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 6.214 sec <<< FAILURE! - in org.apache.hadoop.yarn.util.TestLog4jWarningErrorMetricsAppender
> testPurge(org.apache.hadoop.yarn.util.TestLog4jWarningErrorMetricsAppender)  Time elapsed: 2.01 sec  <<< FAILURE!
> java.lang.AssertionError: expected:<0> but was:<1>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at org.junit.Assert.assertEquals(Assert.java:542)
>   at org.apache.hadoop.yarn.util.TestLog4jWarningErrorMetricsAppender.testPurge(TestLog4jWarningErrorMetricsAppender.java:89)
> {code}




[jira] [Commented] (YARN-3044) [Event producers] Implement RM writing app lifecycle events to ATS


[ 
https://issues.apache.org/jira/browse/YARN-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484338#comment-14484338
 ] 

Zhijie Shen commented on YARN-3044:
---

Naga, thanks for the patch. I will take a look at it.

> [Event producers] Implement RM writing app lifecycle events to ATS
> --
>
> Key: YARN-3044
> URL: https://issues.apache.org/jira/browse/YARN-3044
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Naganarasimha G R
> Attachments: YARN-3044.20150325-1.patch, YARN-3044.20150406-1.patch
>
>
> Per design in YARN-2928, implement RM writing app lifecycle events to ATS.




[jira] [Commented] (YARN-2980) Move health check script related functionality to hadoop-common


[ 
https://issues.apache.org/jira/browse/YARN-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484334#comment-14484334
 ] 

Junping Du commented on YARN-2980:
--

What does "Abey khali" mean?
{code}
+if(!NodeHealthScriptRunner.shouldRun(nodeHealthScript)) {
+  LOG.info("Abey khali");
+  return null;
+}
{code}
If meaningless, I will reopen this JIRA until we have a fix.

> Move health check script related functionality to hadoop-common
> ---
>
> Key: YARN-2980
> URL: https://issues.apache.org/jira/browse/YARN-2980
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Ming Ma
>Assignee: Varun Saxena
> Fix For: 3.0.0
>
> Attachments: YARN-2980.001.patch, YARN-2980.002.patch, 
> YARN-2980.003.patch, YARN-2980.004.patch
>
>
> HDFS might want to leverage health check functionality available in YARN in 
> both namenode https://issues.apache.org/jira/browse/HDFS-7400 and datanode 
> https://issues.apache.org/jira/browse/HDFS-7441.
> We can move health check functionality including the protocol between hadoop 
> daemons and health check script to hadoop-common. That will simplify the 
> development and maintenance for both hadoop source code and health check 
> script.
> Thoughts?




[jira] [Commented] (YARN-3429) TestAMRMTokens.testTokenExpiry fails Intermittently with error message:Invalid AMRMToken


[ 
https://issues.apache.org/jira/browse/YARN-3429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484320#comment-14484320
 ] 

Hudson commented on YARN-3429:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7525 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7525/])
YARN-3429. Fix incorrect CHANGES.txt (rkanter: rev 
5b8a3ae366294aec492f69f1a429aa7fce5d13be)
* hadoop-yarn-project/CHANGES.txt


> TestAMRMTokens.testTokenExpiry fails Intermittently with error 
> message:Invalid AMRMToken
> 
>
> Key: YARN-3429
> URL: https://issues.apache.org/jira/browse/YARN-3429
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Reporter: zhihai xu
>Assignee: zhihai xu
> Fix For: 2.8.0
>
> Attachments: YARN-3429.000.patch
>
>
> TestAMRMTokens.testTokenExpiry fails intermittently with error 
> message: Invalid AMRMToken from appattempt_1427804754787_0001_01
> The error log is at 
> https://builds.apache.org/job/PreCommit-YARN-Build/7172//testReport/org.apache.hadoop.yarn.server.resourcemanager.security/TestAMRMTokens/testTokenExpiry_1_/




[jira] [Assigned] (YARN-3462) Patches applied for YARN-2424 are inconsistent between trunk and branch-2


 [ 
https://issues.apache.org/jira/browse/YARN-3462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R reassigned YARN-3462:
---

Assignee: Naganarasimha G R

> Patches applied for YARN-2424 are inconsistent between trunk and branch-2
> -
>
> Key: YARN-3462
> URL: https://issues.apache.org/jira/browse/YARN-3462
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Sidharta Seethana
>Assignee: Naganarasimha G R
>
> It looks like the changes for YARN-2424 are not the same for trunk (commit 
> 7e75226e68715c3eca9d346c8eaf2f265aa70d23) and branch-2 (commit 
> 5d965f2f3cf97a87603720948aacd4f7877d73c4) . Branch-2 has a missing warning 
> and documentation is a bit different as well. 




[jira] [Updated] (YARN-3326) ReST support for getLabelsToNodes


 [ 
https://issues.apache.org/jira/browse/YARN-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-3326:

Attachment: YARN-3326.20150407-1.patch

I have modified the patch, assuming that "/label-mappings" is good enough. 
Please provide feedback!

> ReST support for getLabelsToNodes 
> --
>
> Key: YARN-3326
> URL: https://issues.apache.org/jira/browse/YARN-3326
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Minor
> Attachments: YARN-3326.20150310-1.patch, YARN-3326.20150407-1.patch
>
>
> Add REST support to retrieve the LabelsToNodes mapping.




[jira] [Commented] (YARN-3448) Add Rolling Time To Lives Level DB Plugin Capabilities


[ 
https://issues.apache.org/jira/browse/YARN-3448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484295#comment-14484295
 ] 

Jonathan Eagles commented on YARN-3448:
---

[~jlowe], addressed your comments. One thing to note is that I didn't add the 
entity-based read-write lock back in. Theoretically, applications that have 
multiple writers could both get and set the start times for an entity or 
related entity at the same time. For applications like Tez (one writer), this 
is not possible AFAIK. It probably isn't a huge overhead; I just haven't had 
the time to benchmark before and after with entity locking.

> Add Rolling Time To Lives Level DB Plugin Capabilities
> --
>
> Key: YARN-3448
> URL: https://issues.apache.org/jira/browse/YARN-3448
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Attachments: YARN-3448.1.patch, YARN-3448.2.patch
>
>
> For large applications, the majority of the time in LeveldbTimelineStore is 
> spent deleting old entities one record at a time. An exclusive write lock is 
> held during the entire deletion phase, which in practice can take hours. If we 
> are willing to relax some of the consistency constraints, other 
> performance-enhancing techniques can be employed to maximize throughput and 
> minimize locking time.
> Split the 5 sections of the leveldb database (domain, owner, start time, 
> entity, index) into 5 separate databases. This allows each database to 
> maximize its read cache effectiveness based on its unique usage pattern. 
> With 5 separate databases, each lookup is much faster. Placing the entity and 
> index databases on separate disks can also help with I/O.
> Rolling DBs for the entity and index DBs. 99.9% of the data is in these two 
> sections, at a 4:1 ratio (index to entity), at least for Tez. We can replace 
> DB record removal with file system removal if we create a rolling set of 
> databases that age out and can be efficiently removed. To do this, we must 
> add a constraint to always place an entity's events into its correct rolling 
> DB instance based on start time. This allows us to stitch the data back 
> together while reading and to perform artificial paging.
> Relax the synchronous write constraints. If we are willing to accept losing 
> some records that were not flushed by the operating system during a crash, we 
> can use async writes, which can be much faster.
> Prefer sequential writes. Sequential writes can be several times faster than 
> random writes. Spend some small effort arranging the writes so that they 
> trend towards sequential write performance over random write performance.
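
The rolling-database idea above depends on mapping each entity's start time to one rolling period, so that a whole period can be dropped at once. The following is a minimal sketch of that arithmetic only. The class name, method names, and the fixed-length period are assumptions for illustration and are not taken from the attached patch.

```java
// Hypothetical sketch: names and the fixed-length period are assumptions,
// not the behaviour of the YARN-3448 patch.
final class RollingBucketSketch {
    private RollingBucketSketch() { }

    /** Returns the start of the rolling period that contains the given start time. */
    static long bucketStart(long startTimeMillis, long periodMillis) {
        // floorDiv keeps start times before the epoch in the correct period.
        return Math.floorDiv(startTimeMillis, periodMillis) * periodMillis;
    }

    /**
     * A whole period can be removed (for example by deleting its directory)
     * once its end is at least ttlMillis older than the current time.
     */
    static boolean isExpired(long bucketStartMillis, long periodMillis,
                             long ttlMillis, long nowMillis) {
        return bucketStartMillis + periodMillis <= nowMillis - ttlMillis;
    }
}
```

Because every event of an entity goes to the period chosen by the entity's start time, a reader only needs the start time to know which database instance to open.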




[jira] [Updated] (YARN-3448) Add Rolling Time To Lives Level DB Plugin Capabilities


 [ 
https://issues.apache.org/jira/browse/YARN-3448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated YARN-3448:
--
Attachment: YARN-3448.2.patch

> Add Rolling Time To Lives Level DB Plugin Capabilities
> --
>
> Key: YARN-3448
> URL: https://issues.apache.org/jira/browse/YARN-3448
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Attachments: YARN-3448.1.patch, YARN-3448.2.patch
>
>
> For large applications, the majority of the time in LeveldbTimelineStore is 
> spent deleting old entities one record at a time. An exclusive write lock is 
> held during the entire deletion phase, which in practice can take hours. If we 
> are willing to relax some of the consistency constraints, other 
> performance-enhancing techniques can be employed to maximize throughput and 
> minimize locking time.
> Split the 5 sections of the leveldb database (domain, owner, start time, 
> entity, index) into 5 separate databases. This allows each database to 
> maximize its read cache effectiveness based on its unique usage pattern. 
> With 5 separate databases, each lookup is much faster. Placing the entity and 
> index databases on separate disks can also help with I/O.
> Rolling DBs for the entity and index DBs. 99.9% of the data is in these two 
> sections, at a 4:1 ratio (index to entity), at least for Tez. We can replace 
> DB record removal with file system removal if we create a rolling set of 
> databases that age out and can be efficiently removed. To do this, we must 
> add a constraint to always place an entity's events into its correct rolling 
> DB instance based on start time. This allows us to stitch the data back 
> together while reading and to perform artificial paging.
> Relax the synchronous write constraints. If we are willing to accept losing 
> some records that were not flushed by the operating system during a crash, we 
> can use async writes, which can be much faster.
> Prefer sequential writes. Sequential writes can be several times faster than 
> random writes. Spend some small effort arranging the writes so that they 
> trend towards sequential write performance over random write performance.




[jira] [Commented] (YARN-3044) [Event producers] Implement RM writing app lifecycle events to ATS


[ 
https://issues.apache.org/jira/browse/YARN-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484265#comment-14484265
 ] 

Naganarasimha G R commented on YARN-3044:
-

Hi [~zjshen], [~djp] & [~sjlee0],
Could any of you take a look at my latest patch?

> [Event producers] Implement RM writing app lifecycle events to ATS
> --
>
> Key: YARN-3044
> URL: https://issues.apache.org/jira/browse/YARN-3044
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Naganarasimha G R
> Attachments: YARN-3044.20150325-1.patch, YARN-3044.20150406-1.patch
>
>
> Per design in YARN-2928, implement RM writing app lifecycle events to ATS.




[jira] [Commented] (YARN-3055) The token is not renewed properly if it's shared by jobs (oozie) in DelegationTokenRenewer

2015-04-07 Thread Daryn Sharp (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484201#comment-14484201
 ] 

Daryn Sharp commented on YARN-3055:
---

This appears to go back to the really old days of renewing the token for its 
entire lifetime.  Most unfortunate.

The renewer looks like it may turn into a DoS weapon.  Renewing a token returns 
the next expiration.  The renewer uses a timer to renew once 90% of the time 
remaining before expiration has elapsed.  After the last renewal, the same 
expiration ("the wall") will be returned as before.  90% of the time remaining 
before "the wall" eventually shrinks into rapid-fire renewal.  There's an army 
of 50 threads prepared to fire concurrently.

My other concern is that it used to be the first job submitted with a given 
token that determined if the token is to be cancelled.  Now any job can 
influence the cancelling.  This patch didn't specifically break that behavior, 
but the original YARN-2704 did, which precipitated YARN-2964 to break it 
differently, and now this jira.

The ramification is we used to tell users to make sure the first job set the 
conf correctly, and essentially don't worry after that.  Now they do have to 
worry.  Any sub-job with the default of canceling tokens will kill the overall 
workflow.  Sub-jobs should not have jurisdiction over the tokens.
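
The rapid-fire concern above can be illustrated with a small model of the timer arithmetic. This is a hypothetical sketch, not the DelegationTokenRenewer code: it assumes each renewal is scheduled after 90% of the time remaining before expiration, and that once the token reaches its maximum lifetime every renewal returns the same expiration ("the wall").

```java
// Hypothetical model of the timer arithmetic; class and method names are
// assumptions and do not exist in Hadoop.
final class RenewalDelaySketch {
    private RenewalDelaySketch() { }

    /** Delay until the next renewal: 90% of the time remaining before expiration. */
    static long nextDelayMillis(long nowMillis, long expirationMillis) {
        long remaining = Math.max(0L, expirationMillis - nowMillis);
        return remaining - remaining / 10;
    }

    /**
     * Counts the renewals fired before the clock reaches a fixed expiration
     * when every renewal returns that same expiration.
     */
    static int renewalsBeforeWall(long nowMillis, long wallMillis) {
        int renewals = 0;
        long now = nowMillis;
        while (now < wallMillis) {
            // Assume the timer cannot fire faster than once per millisecond.
            now += Math.max(1L, nextDelayMillis(now, wallMillis));
            renewals++;
        }
        return renewals;
    }
}
```

In this model each interval is roughly one tenth of the previous one, so the renewals bunch up just before the wall instead of stopping after the last successful renewal.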

> The token is not renewed properly if it's shared by jobs (oozie) in 
> DelegationTokenRenewer
> --
>
> Key: YARN-3055
> URL: https://issues.apache.org/jira/browse/YARN-3055
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: security
>Reporter: Yi Liu
>Assignee: Yi Liu
>Priority: Blocker
> Attachments: YARN-3055.001.patch, YARN-3055.002.patch
>
>
> After YARN-2964, there is only one timer to renew the token if it's shared by 
> jobs. 
> In {{removeApplicationFromRenewal}}, when going to remove a token, and the 
> token is shared by other jobs, we will not cancel the token. 
> Meanwhile, we should not cancel the _timerTask_, and we should not remove it 
> from {{allTokens}}. Otherwise, the existing submitted applications that share 
> this token will not get it renewed any more, and for newly submitted 
> applications that share this token, the token will be renewed immediately.
> For example, we have 3 applications: app1, app2, and app3, and they share 
> token1. Consider the following scenario:
> *1).* app1 is submitted first, then app2, and then app3. In this case, 
> there is only one token renewal timer for token1, and it is scheduled when 
> app1 is submitted.
> *2).* app1 finishes, and then the renewal timer is cancelled. token1 will not 
> be renewed any more, but app2 and app3 still use it, so there is a problem.
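
One way to picture the scenario above is to track which applications still reference a token and to release the timer only when the last one is removed. The sketch below is an assumption for illustration: the class, its methods, and the use of plain string IDs are hypothetical and are not the bookkeeping used by the attached patches.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical bookkeeping for a token shared by several applications.
final class SharedTokenTrackerSketch {
    private final Map<String, Set<String>> appsByToken = new HashMap<>();

    /** Records that the application uses the token. */
    void addApplication(String appId, String tokenId) {
        appsByToken.computeIfAbsent(tokenId, k -> new HashSet<>()).add(appId);
    }

    /**
     * Removes the application from the token's users. Returns true only when
     * no application references the token any more, which is the point at
     * which cancelling the renewal timer and dropping the token would be safe.
     */
    boolean removeApplication(String appId, String tokenId) {
        Set<String> apps = appsByToken.get(tokenId);
        if (apps == null || !apps.remove(appId)) {
            return false;
        }
        if (apps.isEmpty()) {
            appsByToken.remove(tokenId); // avoid leaving an unused entry behind
            return true;
        }
        return false;
    }

    /** Reports whether any application still references the token. */
    boolean isTracked(String tokenId) {
        return appsByToken.containsKey(tokenId);
    }
}
```

With app1, app2, and app3 sharing token1, removing app1 returns false and the timer keeps running; only removing the last of the three returns true.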




[jira] [Commented] (YARN-3055) The token is not renewed properly if it's shared by jobs (oozie) in DelegationTokenRenewer

2015-04-07 Thread Daryn Sharp (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484117#comment-14484117
 ] 

Daryn Sharp commented on YARN-3055:
---

At a cursory glance, are you sure this isn't going to leak tokens?  I.e., does 
it remove tokens from data structures in all cases, or can a token get left in 
allTokens?

> The token is not renewed properly if it's shared by jobs (oozie) in 
> DelegationTokenRenewer
> --
>
> Key: YARN-3055
> URL: https://issues.apache.org/jira/browse/YARN-3055
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: security
>Reporter: Yi Liu
>Assignee: Yi Liu
>Priority: Blocker
> Attachments: YARN-3055.001.patch, YARN-3055.002.patch
>
>
> After YARN-2964, there is only one timer to renew the token if it's shared by 
> jobs. 
> In {{removeApplicationFromRenewal}}, when going to remove a token, and the 
> token is shared by other jobs, we will not cancel the token. 
> Meanwhile, we should not cancel the _timerTask_, and we should not remove it 
> from {{allTokens}}. Otherwise, the existing submitted applications that share 
> this token will not get it renewed any more, and for newly submitted 
> applications that share this token, the token will be renewed immediately.
> For example, we have 3 applications: app1, app2, and app3, and they share 
> token1. Consider the following scenario:
> *1).* app1 is submitted first, then app2, and then app3. In this case, 
> there is only one token renewal timer for token1, and it is scheduled when 
> app1 is submitted.
> *2).* app1 finishes, and then the renewal timer is cancelled. token1 will not 
> be renewed any more, but app2 and app3 still use it, so there is a problem.




[jira] [Commented] (YARN-3055) The token is not renewed properly if it's shared by jobs (oozie) in DelegationTokenRenewer


[ 
https://issues.apache.org/jira/browse/YARN-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484098#comment-14484098
 ] 

Vinod Kumar Vavilapalli commented on YARN-3055:
---

[~daryn]/[~jianhe], I briefly looked at the existing patch on this JIRA and it 
seems like it will work. Can you also take a look?

[~hitliuyi], can you see if you can add a test for this in 
TestDelegationTokenRenewer.java?

This is the last blocker on 2.7.0 as of today. Appreciate all the help I can 
get, thanks all.

> The token is not renewed properly if it's shared by jobs (oozie) in 
> DelegationTokenRenewer
> --
>
> Key: YARN-3055
> URL: https://issues.apache.org/jira/browse/YARN-3055
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: security
>Reporter: Yi Liu
>Assignee: Yi Liu
>Priority: Blocker
> Attachments: YARN-3055.001.patch, YARN-3055.002.patch
>
>
> After YARN-2964, there is only one timer to renew the token if it's shared by 
> jobs. 
> In {{removeApplicationFromRenewal}}, when going to remove a token, and the 
> token is shared by other jobs, we will not cancel the token. 
> Meanwhile, we should not cancel the _timerTask_, and we should not remove it 
> from {{allTokens}}. Otherwise, the existing submitted applications that share 
> this token will not get it renewed any more, and for newly submitted 
> applications that share this token, the token will be renewed immediately.
> For example, we have 3 applications: app1, app2, and app3, and they share 
> token1. Consider the following scenario:
> *1).* app1 is submitted first, then app2, and then app3. In this case, 
> there is only one token renewal timer for token1, and it is scheduled when 
> app1 is submitted.
> *2).* app1 finishes, and then the renewal timer is cancelled. token1 will not 
> be renewed any more, but app2 and app3 still use it, so there is a problem.




[jira] [Commented] (YARN-2424) LCE should support non-cgroups, non-secure mode


[ 
https://issues.apache.org/jira/browse/YARN-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484058#comment-14484058
 ] 

Sidharta Seethana commented on YARN-2424:
-

Here it is : https://issues.apache.org/jira/browse/YARN-3462

> LCE should support non-cgroups, non-secure mode
> ---
>
> Key: YARN-2424
> URL: https://issues.apache.org/jira/browse/YARN-2424
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.3.0, 2.4.0, 2.5.0, 2.4.1
>Reporter: Allen Wittenauer
>Assignee: Allen Wittenauer
>Priority: Blocker
> Fix For: 2.6.0
>
> Attachments: Y2424-1.patch, YARN-2424.patch
>
>
> After YARN-1253, LCE no longer works for non-secure, non-cgroup scenarios.  
> This is a fairly serious regression, as turning on LCE prior to turning on 
> full-blown security is a fairly standard procedure.




[jira] [Created] (YARN-3462) Patches applied for YARN-2424 are inconsistent between trunk and branch-2

Sidharta Seethana created YARN-3462:
---

 Summary: Patches applied for YARN-2424 are inconsistent between 
trunk and branch-2
 Key: YARN-3462
 URL: https://issues.apache.org/jira/browse/YARN-3462
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Sidharta Seethana


It looks like the changes for YARN-2424 are not the same for trunk (commit 
7e75226e68715c3eca9d346c8eaf2f265aa70d23) and branch-2 (commit 
5d965f2f3cf97a87603720948aacd4f7877d73c4) . Branch-2 has a missing warning and 
documentation is a bit different as well. 




[jira] [Updated] (YARN-3426) Add jdiff support to YARN


 [ 
https://issues.apache.org/jira/browse/YARN-3426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-3426:
--
Target Version/s: 2.8.0  (was: 2.7.0)

bq. The bigger question is the duplication of the maven code across Common, 
YARN and MAPREDUCE. But this may take more time to cleanup.
Removing it from 2.7.0 as the effort needed for this cleanup is huge.


> Add jdiff support to YARN
> -
>
> Key: YARN-3426
> URL: https://issues.apache.org/jira/browse/YARN-3426
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Li Lu
>Assignee: Li Lu
>Priority: Blocker
> Attachments: YARN-3426-040615-1.patch, YARN-3426-040615.patch, 
> YARN-3426-040715.patch
>
>
> Maybe we'd like to extend our current jdiff tool for hadoop-common and hdfs 
> to YARN as well. 




[jira] [Commented] (YARN-3426) Add jdiff support to YARN


[ 
https://issues.apache.org/jira/browse/YARN-3426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484051#comment-14484051
 ] 

Vinod Kumar Vavilapalli commented on YARN-3426:
---

Comments on the patch:
 - The API links are broken. Changing them to apidocs works. 
 - Many private APIs are showing up in javadoc. For example, 
ContainerReport.newInstance is private but it shows up in jdiff and API docs.

The bigger question is the duplication of the maven code across Common, YARN 
and MAPREDUCE. But this may take more time to cleanup. I'll use the output from 
this patch to figure out compatibility issues with 2.7.0, but remove this patch 
itself from 2.7.0. 



> Add jdiff support to YARN
> -
>
> Key: YARN-3426
> URL: https://issues.apache.org/jira/browse/YARN-3426
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Li Lu
>Assignee: Li Lu
>Priority: Blocker
> Attachments: YARN-3426-040615-1.patch, YARN-3426-040615.patch, 
> YARN-3426-040715.patch
>
>
> Maybe we'd like to extend our current jdiff tool for hadoop-common and hdfs 
> to YARN as well. 




[jira] [Commented] (YARN-3448) Add Rolling Time To Lives Level DB Plugin Capabilities


[ 
https://issues.apache.org/jira/browse/YARN-3448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484032#comment-14484032
 ] 

Zhijie Shen commented on YARN-3448:
---

Jonathan, thanks for your contribution. It sounds like an interesting proposal. 
I'd like to take a look at the patch too.

> Add Rolling Time To Lives Level DB Plugin Capabilities
> --
>
> Key: YARN-3448
> URL: https://issues.apache.org/jira/browse/YARN-3448
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Attachments: YARN-3448.1.patch
>
>
> For large applications, the majority of the time in LeveldbTimelineStore is 
> spent deleting old entities one record at a time. An exclusive write lock is 
> held during the entire deletion phase, which in practice can take hours. If we 
> are willing to relax some of the consistency constraints, other 
> performance-enhancing techniques can be employed to maximize throughput and 
> minimize locking time.
> Split the 5 sections of the leveldb database (domain, owner, start time, 
> entity, index) into 5 separate databases. This allows each database to 
> maximize its read cache effectiveness based on its unique usage pattern. 
> With 5 separate databases, each lookup is much faster. Placing the entity and 
> index databases on separate disks can also help with I/O.
> Rolling DBs for the entity and index DBs. 99.9% of the data is in these two 
> sections, at a 4:1 ratio (index to entity), at least for Tez. We can replace 
> DB record removal with file system removal if we create a rolling set of 
> databases that age out and can be efficiently removed. To do this, we must 
> add a constraint to always place an entity's events into its correct rolling 
> DB instance based on start time. This allows us to stitch the data back 
> together while reading and to perform artificial paging.
> Relax the synchronous write constraints. If we are willing to accept losing 
> some records that were not flushed by the operating system during a crash, we 
> can use async writes, which can be much faster.
> Prefer sequential writes. Sequential writes can be several times faster than 
> random writes. Spend some small effort arranging the writes so that they 
> trend towards sequential write performance over random write performance.




[jira] [Commented] (YARN-2901) Add errors and warning metrics page to RM, NM web UI


[ 
https://issues.apache.org/jira/browse/YARN-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484030#comment-14484030
 ] 

Sidharta Seethana commented on YARN-2901:
-

IMHO, using calls to LOG.error()/LOG.warn() as proxies for counting 
errors/warnings is flaky at best. It places cumbersome restrictions on code, 
requiring that a given error/warning correspond to a single error()/warn() 
call. This is tough to enforce even within a single block of code, let alone 
across multiple functions (e.g., when an exception is thrown/re-thrown and an 
error/warning is logged in multiple locations). I hope this will not lead to a 
restriction on new code in YARN that each error/warning should correspond to a 
single error()/warn() call. 
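
The double-counting concern can be reproduced in a few lines. The sketch below is an illustration only: it uses java.util.logging rather than the Log4j appender from YARN-2901, and the handler and method names are hypothetical. A single failure is logged once where it is caught and re-thrown and once more by the caller, so a counter driven by log calls reports two errors.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

// Hypothetical illustration using java.util.logging, not the YARN appender.
final class ErrorCountSketch {
    private ErrorCountSketch() { }

    /** Counts every record logged at SEVERE, the way a metrics appender might. */
    static final class CountingHandler extends Handler {
        final AtomicInteger errors = new AtomicInteger();

        @Override
        public void publish(LogRecord record) {
            if (record.getLevel().intValue() >= Level.SEVERE.intValue()) {
                errors.incrementAndGet();
            }
        }

        @Override
        public void flush() { }

        @Override
        public void close() { }
    }

    /** Logs the failure and re-throws it, a common pattern in layered code. */
    private static void innerStep(Logger log) {
        try {
            throw new IllegalStateException("simulated failure");
        } catch (IllegalStateException e) {
            log.log(Level.SEVERE, "inner step failed", e);
            throw e;
        }
    }

    /** Returns how many errors the handler counted for one underlying failure. */
    static int errorsCountedForOneFailure() {
        Logger log = Logger.getAnonymousLogger();
        log.setUseParentHandlers(false); // keep the demo quiet on the console
        CountingHandler handler = new CountingHandler();
        log.addHandler(handler);
        try {
            innerStep(log);
        } catch (IllegalStateException e) {
            log.log(Level.SEVERE, "operation failed", e); // same failure, logged again
        }
        return handler.errors.get();
    }
}
```

Here one exception yields a count of two, which is the mismatch between log calls and actual errors that the comment describes.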

> Add errors and warning metrics page to RM, NM web UI
> 
>
> Key: YARN-2901
> URL: https://issues.apache.org/jira/browse/YARN-2901
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, resourcemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Fix For: 2.8.0
>
> Attachments: Exception collapsed.png, Exception expanded.jpg, Screen 
> Shot 2015-03-19 at 7.40.02 PM.png, YARN-2901.addendem.1.patch, 
> apache-yarn-2901.0.patch, apache-yarn-2901.1.patch, apache-yarn-2901.2.patch, 
> apache-yarn-2901.3.patch, apache-yarn-2901.4.patch, apache-yarn-2901.5.patch
>
>
> It would be really useful to have statistics on the number of errors and 
> warnings in the RM and NM web UI. I'm thinking about -
> 1. The number of errors and warnings in the past 5 min/1 hour/12 hours/day
> 2. The top 'n'(20?) most common exceptions in the past 5 min/1 hour/12 
> hours/day
> By errors and warnings I'm referring to the log level.
> I suspect we can probably achieve this by writing a custom appender?(I'm open 
> to suggestions on alternate mechanisms for implementing this).




[jira] [Commented] (YARN-3426) Add jdiff support to YARN


[ 
https://issues.apache.org/jira/browse/YARN-3426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484008#comment-14484008
 ] 

Li Lu commented on YARN-3426:
-

I could not reproduce the mvn eclipse:eclipse failure locally. The failure 
appears to be unrelated to this patch. 

> Add jdiff support to YARN
> -
>
> Key: YARN-3426
> URL: https://issues.apache.org/jira/browse/YARN-3426
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Li Lu
>Assignee: Li Lu
>Priority: Blocker
> Attachments: YARN-3426-040615-1.patch, YARN-3426-040615.patch, 
> YARN-3426-040715.patch
>
>
> Maybe we'd like to extend our current jdiff tool for hadoop-common and hdfs 
> to YARN as well. 




[jira] [Commented] (YARN-3458) CPU resource monitoring in Windows


[ 
https://issues.apache.org/jira/browse/YARN-3458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14483874#comment-14483874
 ] 

Inigo Goiri commented on YARN-3458:
---

For the tests, I checked the original TestWindowsBasedProcessTree and it didn't 
have anything related to actually testing the resource monitoring; I'm open to 
suggestions.

Regarding the two warnings, I'm not able to understand what they are 
complaining about; they say that I have fields that are not accessed, but the 
ones I added are referenced. I think it refers to Log, but I'm not able to 
parse the error.

> CPU resource monitoring in Windows
> --
>
> Key: YARN-3458
> URL: https://issues.apache.org/jira/browse/YARN-3458
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Affects Versions: 2.7.0
> Environment: Windows
>Reporter: Inigo Goiri
>Priority: Minor
>  Labels: containers, metrics, windows
> Attachments: YARN-3458-1.patch, YARN-3458-2.patch, YARN-3458-3.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> The current implementation of getCpuUsagePercent() for 
> WindowsBasedProcessTree is left as unavailable. Attached a proposal of how to 
> do it. I reused the CpuTimeTracker using 1 jiffy=1ms.
> This was left open by YARN-3122.
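
As a rough illustration of the "1 jiffy = 1 ms" approach, CPU usage can be derived from two samples of cumulative CPU time and wall-clock time. This is a minimal sketch under that assumption. It does not use the real CpuTimeTracker API, and the class name, method name, and the -1 "unavailable" sentinel are hypothetical.

```java
// Hypothetical sketch; not the CpuTimeTracker or WindowsBasedProcessTree API.
final class CpuUsageSketch {
    /** Assumed sentinel for "no usable sample yet". */
    static final float UNAVAILABLE = -1.0f;

    private CpuUsageSketch() { }

    /**
     * Computes CPU usage as a percentage of one core from two samples of
     * cumulative CPU time and wall-clock time, both in milliseconds.
     * Values above 100 mean that more than one core was busy.
     */
    static float cpuUsagePercent(long prevCpuMs, long currCpuMs,
                                 long prevWallMs, long currWallMs) {
        long cpuDelta = currCpuMs - prevCpuMs;
        long wallDelta = currWallMs - prevWallMs;
        if (wallDelta <= 0 || cpuDelta < 0) {
            return UNAVAILABLE; // first sample, clock went backwards, or counter reset
        }
        return cpuDelta * 100.0f / wallDelta;
    }
}
```

For example, 500 ms of CPU time consumed over a 1000 ms interval gives 50 percent, and 2000 ms over the same interval gives 200 percent.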




[jira] [Resolved] (YARN-3439) RM fails to renew token when Oozie launcher leaves before sub-job finishes

2015-04-07 Thread Jason Lowe (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-3439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe resolved YARN-3439.
--
Resolution: Duplicate

bq. IAC, this is a dup of YARN-3055.

Agreed, closing as a duplicate.

> RM fails to renew token when Oozie launcher leaves before sub-job finishes
> --
>
> Key: YARN-3439
> URL: https://issues.apache.org/jira/browse/YARN-3439
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.0
>Reporter: Jason Lowe
>Assignee: Daryn Sharp
>Priority: Blocker
> Attachments: YARN-3439.001.patch
>
>
> When the Oozie launcher runs a standard MapReduce job (not Pig) it doesn't 
> linger waiting for the sub-job to finish.  At that point the RM stops 
> renewing delegation tokens for the launcher job which wreaks havoc on the 
> sub-job if the sub-job runs long enough for the tokens to expire.




[jira] [Commented] (YARN-3452) Bogus token usernames cause many invalid group lookups

2015-04-07 Thread Jason Lowe (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14483861#comment-14483861
 ] 

Jason Lowe commented on YARN-3452:
--

The extra lookups started in 2.6 releases, and it appears to be caused by 
HADOOP-10650.  However YARN really should not be using bogus users on tokens 
anyway in case the RPC layer (or other non-YARN systems) try to do something 
with those users like HADOOP-10650 did.

> Bogus token usernames cause many invalid group lookups
> --
>
> Key: YARN-3452
> URL: https://issues.apache.org/jira/browse/YARN-3452
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: security
>Reporter: Jason Lowe
>
> YARN uses a number of bogus usernames for tokens, like application attempt 
> IDs for NM tokens or even the hardcoded "testing" for the container localizer 
> token.  These tokens cause the RPC layer to do group lookups on these bogus 
> usernames which will never succeed but can take a long time to perform.




[jira] [Updated] (YARN-3366) Outbound network bandwidth : classify/shape traffic originating from YARN containers


 [ 
https://issues.apache.org/jira/browse/YARN-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sidharta Seethana updated YARN-3366:

Attachment: YARN-3366.003.patch

Uploading patch incorporating code review feedback.

> Outbound network bandwidth : classify/shape traffic originating from YARN 
> containers
> 
>
> Key: YARN-3366
> URL: https://issues.apache.org/jira/browse/YARN-3366
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Sidharta Seethana
>Assignee: Sidharta Seethana
> Attachments: YARN-3366.001.patch, YARN-3366.002.patch, 
> YARN-3366.003.patch
>
>
> In order to be able to isolate based on/enforce outbound traffic bandwidth 
> limits, we need  a mechanism to classify/shape network traffic in the 
> nodemanager. For more information on the design, please see the attached 
> design document in the parent JIRA.




[jira] [Commented] (YARN-3426) Add jdiff support to YARN


[ 
https://issues.apache.org/jira/browse/YARN-3426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14483837#comment-14483837
 ] 

Hadoop QA commented on YARN-3426:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12723684/YARN-3426-040715.patch
  against trunk revision d27e924.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:red}-1 eclipse:eclipse{color}.  The patch failed to build with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7247//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7247//console

This message is automatically generated.

> Add jdiff support to YARN
> -
>
> Key: YARN-3426
> URL: https://issues.apache.org/jira/browse/YARN-3426
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Li Lu
>Assignee: Li Lu
>Priority: Blocker
> Attachments: YARN-3426-040615-1.patch, YARN-3426-040615.patch, 
> YARN-3426-040715.patch
>
>
> Maybe we'd like to extend our current jdiff tool for hadoop-common and hdfs 
> to YARN as well. 




[jira] [Commented] (YARN-3460) Test TestSecureRMRegistryOperations failed with IBM_JAVA JVM


[ 
https://issues.apache.org/jira/browse/YARN-3460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14483827#comment-14483827
 ] 

Hadoop QA commented on YARN-3460:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12723668/HADOOP-11810-1.patch
  against trunk revision d27e924.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 1148 javac 
compiler warnings (more than the trunk's current 209 warnings).

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 
43 warning messages.
See 
https://builds.apache.org/job/PreCommit-YARN-Build/7245//artifact/patchprocess/diffJavadocWarnings.txt
 for details.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 2 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7245//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/7245//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html
Javac warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/7245//artifact/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7245//console

This message is automatically generated.

> Test TestSecureRMRegistryOperations failed with IBM_JAVA JVM
> 
>
> Key: YARN-3460
> URL: https://issues.apache.org/jira/browse/YARN-3460
> Project: Hadoop YARN
>  Issue Type: Test
>Affects Versions: 3.0.0, 2.6.0
> Environment: $ mvn -version
> Apache Maven 3.2.1 (ea8b2b07643dbb1b84b6d16e1f08391b666bc1e9; 
> 2014-02-14T11:37:52-06:00)
> Maven home: /opt/apache-maven-3.2.1
> Java version: 1.7.0, vendor: IBM Corporation
> Java home: /usr/lib/jvm/ibm-java-ppc64le-71/jre
> Default locale: en_US, platform encoding: UTF-8
> OS name: "linux", version: "3.10.0-229.ael7b.ppc64le", arch: "ppc64le", 
> family: "unix"
>Reporter: pascal oliva
> Attachments: HADOOP-11810-1.patch
>
>
> TestSecureRMRegistryOperations failed with IBM JAVA JVM
> mvn test -X 
> -Dtest=org.apache.hadoop.registry.secure.TestSecureRMRegistryOperations
> Module                Total  Failure  Error  Skipped
> ----------------------------------------------------
> hadoop-yarn-registry     12        0     12        0
> ----------------------------------------------------
> Total                    12        0     12        0
> With 
> javax.security.auth.login.LoginException: Bad JAAS configuration: 
> unrecognized option: isInitiator
> and 
> Bad JAAS configuration: unrecognized option: storeKey
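
For illustration only, here is a minimal sketch of how JAAS options might be chosen per JVM vendor so that the IBM login module never sees isInitiator or storeKey. It is not taken from the attached patch. The class and method names are hypothetical, and the IBM-side option names (useKeytab, credsType) are an assumption about the IBM Kerberos login module that should be checked against the IBM documentation.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper; names are illustrative and not taken from the attached patch. */
public class Krb5OptionsSketch {

    /** True when the running JVM reports IBM as its vendor. */
    public static boolean isIbmJava() {
        return System.getProperty("java.vendor", "").contains("IBM");
    }

    /** Builds keytab-login options for the Kerberos login module of the given JVM flavor. */
    public static Map<String, String> keytabLoginOptions(
            boolean ibmJava, String keytabPath, String principal) {
        Map<String, String> options = new HashMap<>();
        options.put("principal", principal);
        if (ibmJava) {
            // Assumption: the IBM module takes a keytab URL and a credentials type
            // instead of the useKeyTab/storeKey/isInitiator flags.
            options.put("useKeytab", "file://" + keytabPath);
            options.put("credsType", "both");
        } else {
            // Options understood by the non-IBM Krb5LoginModule.
            options.put("useKeyTab", "true");
            options.put("keyTab", keytabPath);
            options.put("storeKey", "true");
            options.put("isInitiator", "true");
        }
        return options;
    }
}
```

A caller would pass isIbmJava() as the first argument and hand the resulting map to its JAAS configuration entry.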




[jira] [Commented] (YARN-3458) CPU resource monitoring in Windows


[ 
https://issues.apache.org/jira/browse/YARN-3458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14483822#comment-14483822
 ] 

Hadoop QA commented on YARN-3458:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12723681/YARN-3458-3.patch
  against trunk revision d27e924.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 2 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7246//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/7246//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7246//console

This message is automatically generated.

> CPU resource monitoring in Windows
> --
>
> Key: YARN-3458
> URL: https://issues.apache.org/jira/browse/YARN-3458
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Affects Versions: 2.7.0
> Environment: Windows
>Reporter: Inigo Goiri
>Priority: Minor
>  Labels: containers, metrics, windows
> Attachments: YARN-3458-1.patch, YARN-3458-2.patch, YARN-3458-3.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> The current implementation of getCpuUsagePercent() for 
> WindowsBasedProcessTree is left as unavailable. Attached a proposal of how to 
> do it. I reused the CpuTimeTracker using 1 jiffy=1ms.
> This was left open by YARN-3122.
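
As a rough illustration of the approach described above (not the attached patch, and not the actual CpuTimeTracker API), CPU usage can be derived from two samples of cumulative CPU time. The class below is a hypothetical sketch; it assumes the process tree reports cumulative CPU time in jiffies and that on Windows one jiffy is treated as 1 ms.

```java
/** Hypothetical sketch of a jiffy-based CPU usage tracker; not Hadoop's CpuTimeTracker. */
public class CpuUsageSketch {
    /** Returned until two valid samples are available. */
    public static final float UNAVAILABLE = -1.0f;

    private final long jiffyLengthMs;
    private long lastCpuTimeMs = -1;
    private long lastSampleTimeMs = -1;
    private float cpuUsagePercent = UNAVAILABLE;

    public CpuUsageSketch(long jiffyLengthMs) {
        this.jiffyLengthMs = jiffyLengthMs;
    }

    /** Records the cumulative CPU time (in jiffies) observed at wall-clock sampleTimeMs. */
    public void update(long cumulativeJiffies, long sampleTimeMs) {
        long cpuTimeMs = cumulativeJiffies * jiffyLengthMs;
        if (lastSampleTimeMs >= 0 && sampleTimeMs > lastSampleTimeMs
                && cpuTimeMs >= lastCpuTimeMs) {
            // CPU milliseconds consumed per wall-clock millisecond, as a percentage.
            // Values above 100 mean more than one core was busy.
            cpuUsagePercent =
                (cpuTimeMs - lastCpuTimeMs) * 100f / (sampleTimeMs - lastSampleTimeMs);
        } else {
            cpuUsagePercent = UNAVAILABLE;
        }
        lastCpuTimeMs = cpuTimeMs;
        lastSampleTimeMs = sampleTimeMs;
    }

    public float getCpuUsagePercent() {
        return cpuUsagePercent;
    }
}
```

With a jiffy length of 1 ms, a process that accumulates 500 ms of CPU time over a 1000 ms interval would report 50 percent.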




[jira] [Commented] (YARN-3391) Clearly define flow ID/ flow run / flow version in API and storage


[ 
https://issues.apache.org/jira/browse/YARN-3391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14483812#comment-14483812
 ] 

Zhijie Shen commented on YARN-3391:
---

bq. let's continue the discussion on a separated JIRA for figuring it out later.

Agreed. Let's unblock this JIRA, which will in turn unblock the writer 
implementation. I filed YARN-3461 to continue the defaults discussion there. 

bq. I just wanted to add my 2 cents that this is something we already see and 
experience with hRaven so it's not theoretical.

Sangjin, thanks for sharing the use case in hRaven. It's helpful to understand 
the proper defaults. To generalize it, we need to consider different use cases 
such as adhoc applications only. Shall we continue the discussion on YARN-3461?

bq. As I mentioned earlier, it should be useful for developers

I made use of Sangjin's previous comments to add some inline code comments 
about their definitions in TimelineCollectorContext.

> Clearly define flow ID/ flow run / flow version in API and storage
> --
>
> Key: YARN-3391
> URL: https://issues.apache.org/jira/browse/YARN-3391
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
> Attachments: YARN-3391.1.patch, YARN-3391.2.patch, YARN-3391.3.patch
>
>
> To continue the discussion in YARN-3040, let's figure out the best way to 
> describe the flow.
> Some key issues that we need to conclude on:
> - How do we include the flow version in the context so that it gets passed 
> into the collector and to the storage eventually?
> - Flow run id should be a number as opposed to a generic string?
> - Default behavior for the flow run id if it is missing (i.e. client did not 
> set it)
> - How do we handle flow attributes in case of nested levels of flows?




[jira] [Commented] (YARN-3361) CapacityScheduler side changes to support non-exclusive node labels


[ 
https://issues.apache.org/jira/browse/YARN-3361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14483808#comment-14483808
 ] 

Vinod Kumar Vavilapalli commented on YARN-3361:
---

Review of the tests
 - testNonExclusiveNodeLabelsAllocationIgnoreAppSubmitOrder
  --  -> testPreferenceOfNeedyAppsTowardsNodePartitions ?
  -- This doesn't really guarantee if app2 is getting preference or not. How 
about changing it to say app2 has enough requests to fill the entire node?
 - testNonExclusiveNodeLabelsAllocationIgnorePriority
  -- -> testPreferenceOfNeedyContainersTowardsNodePartitions ?
  -- Actually, now that I rename it that way, this may not be the right 
behavior. Not respecting priorities within an app can result in scheduling 
deadlocks.
 - testLabeledResourceRequestsGetPreferrenceInHierarchyOfQueue: This is really 
testQueuesWithAccessGetPreferrenceInPartitionedNodes?
 - testNonLabeledQueueUsesLabeledResource
  -- -> testQueuesWithoutAccessUsingPartitionedNodes
  -- Also validate that the wait for non-labeled requests not getting allocated 
on non-partitioned nodes is only for one cycle through all nodes in the cluster
 - Let's move all these node-label related tests into their own test-case.
 - More tests?
  -- AMs with labeled requirement not getting allocated on non-exclusive 
partitions
  -- To verify that we are not putting absolute max-capacities on the 
individual queues when not-respecting-partitions


> CapacityScheduler side changes to support non-exclusive node labels
> ---
>
> Key: YARN-3361
> URL: https://issues.apache.org/jira/browse/YARN-3361
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-3361.1.patch, YARN-3361.2.patch
>
>
> According to the design doc attached in YARN-3214, we need to implement the 
> following logic in CapacityScheduler:
> 1) When allocating a resource request with no node-label specified, it should 
> get preferentially allocated to a node without labels.
> 2) When there are available resources on a node with a label, they can be 
> used by applications in the following order:
> - Applications under queues which can access the label and ask for same 
> labeled resource. 
> - Applications under queues which can access the label and ask for 
> non-labeled resource.
> - Applications under queues that cannot access the label and ask for 
> non-labeled resource.
> 3) Expose necessary information that can be used by preemption policy to make 
> preemption decisions.




[jira] [Created] (YARN-3461) Consolidate flow name/version/run defaults

Zhijie Shen created YARN-3461:
-

 Summary: Consolidate flow name/version/run defaults
 Key: YARN-3461
 URL: https://issues.apache.org/jira/browse/YARN-3461
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen


In YARN-3391, it's not resolved what should be the defaults for flow 
name/version/run. Let's continue the discussion here and unblock YARN-3391 from 
moving forward.




[jira] [Commented] (YARN-3459) TestLog4jWarningErrorMetricsAppender breaks in trunk


[ 
https://issues.apache.org/jira/browse/YARN-3459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14483779#comment-14483779
 ] 

Li Lu commented on YARN-3459:
-

Reproduced this failure on my local machine as well as in the Jenkins run for 
YARN-3426. It seems the test failure was introduced by YARN-2901. 
[~wangda] [~vvasudev] can either of you take a look at it? Thanks! 

> TestLog4jWarningErrorMetricsAppender breaks in trunk
> 
>
> Key: YARN-3459
> URL: https://issues.apache.org/jira/browse/YARN-3459
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Li Lu
>Assignee: Li Lu
>Priority: Blocker
> Fix For: 2.7.0
>
>
> TestLog4jWarningErrorMetricsAppender fails with the following message:
> {code}
> Running org.apache.hadoop.yarn.util.TestLog4jWarningErrorMetricsAppender
> Tests run: 6, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 6.214 sec <<< 
> FAILURE! - in org.apache.hadoop.yarn.util.TestLog4jWarningErrorMetricsAppender
> testPurge(org.apache.hadoop.yarn.util.TestLog4jWarningErrorMetricsAppender)  
> Time elapsed: 2.01 sec  <<< FAILURE!
> java.lang.AssertionError: expected:<0> but was:<1>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at org.junit.Assert.assertEquals(Assert.java:542)
>   at 
> org.apache.hadoop.yarn.util.TestLog4jWarningErrorMetricsAppender.testPurge(TestLog4jWarningErrorMetricsAppender.java:89)
> {code}




[jira] [Updated] (YARN-3391) Clearly define flow ID/ flow run / flow version in API and storage


 [ 
https://issues.apache.org/jira/browse/YARN-3391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-3391:
--
Attachment: YARN-3391.3.patch

> Clearly define flow ID/ flow run / flow version in API and storage
> --
>
> Key: YARN-3391
> URL: https://issues.apache.org/jira/browse/YARN-3391
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
> Attachments: YARN-3391.1.patch, YARN-3391.2.patch, YARN-3391.3.patch
>
>
> To continue the discussion in YARN-3040, let's figure out the best way to 
> describe the flow.
> Some key issues that we need to conclude on:
> - How do we include the flow version in the context so that it gets passed 
> into the collector and to the storage eventually?
> - Flow run id should be a number as opposed to a generic string?
> - Default behavior for the flow run id if it is missing (i.e. client did not 
> set it)
> - How do we handle flow attributes in case of nested levels of flows?




[jira] [Updated] (YARN-3426) Add jdiff support to YARN


 [ 
https://issues.apache.org/jira/browse/YARN-3426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-3426:

Attachment: YARN-3426-040715.patch

Added license information to the four .xml API files. 

> Add jdiff support to YARN
> -
>
> Key: YARN-3426
> URL: https://issues.apache.org/jira/browse/YARN-3426
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Li Lu
>Assignee: Li Lu
>Priority: Blocker
> Attachments: YARN-3426-040615-1.patch, YARN-3426-040615.patch, 
> YARN-3426-040715.patch
>
>
> Maybe we'd like to extend our current jdiff tool for hadoop-common and hdfs 
> to YARN as well. 




[jira] [Commented] (YARN-3273) Improve web UI to facilitate scheduling analysis and debugging


[ 
https://issues.apache.org/jira/browse/YARN-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14483714#comment-14483714
 ] 

Hudson commented on YARN-3273:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2106 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2106/])
Move YARN-3273 from 2.8 to 2.7. (zjshen: rev 
3fb5abfc87953377f86e06578518801a181d7697)
* hadoop-yarn-project/CHANGES.txt


> Improve web UI to facilitate scheduling analysis and debugging
> --
>
> Key: YARN-3273
> URL: https://issues.apache.org/jira/browse/YARN-3273
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jian He
>Assignee: Rohith
> Fix For: 2.7.0
>
> Attachments: 0001-YARN-3273-v1.patch, 0001-YARN-3273-v2.patch, 
> 0002-YARN-3273.patch, 0003-YARN-3273.patch, 0003-YARN-3273.patch, 
> 0004-YARN-3273.patch, YARN-3273-am-resource-used-AND-User-limit-v2.PNG, 
> YARN-3273-am-resource-used-AND-User-limit.PNG, 
> YARN-3273-application-headroom-v2.PNG, YARN-3273-application-headroom.PNG
>
>
> Job may be stuck for reasons such as:
> - hitting queue capacity 
> - hitting user-limit, 
> - hitting AM-resource-percentage 
> The  first queueCapacity is already shown on the UI.
> We may surface things like:
> - what is user's current usage and user-limit; 
> - what is the AM resource usage and limit;
> - what is the application's current HeadRoom;
>  




[jira] [Commented] (YARN-2429) LCE should blacklist based upon group


[ 
https://issues.apache.org/jira/browse/YARN-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14483716#comment-14483716
 ] 

Hudson commented on YARN-2429:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2106 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2106/])
YARN-2429. TestAMRMTokens.testTokenExpiry fails Intermittently with error 
message:Invalid AMRMToken (zxu via rkanter) (rkanter: rev 
99b08a748e7b00a58b63330b353902a6da6aae27)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestAMRMTokens.java
* hadoop-yarn-project/CHANGES.txt


> LCE should blacklist based upon group
> -
>
> Key: YARN-2429
> URL: https://issues.apache.org/jira/browse/YARN-2429
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Allen Wittenauer
>
> It should be possible to list a group to ban, not just individual users.




[jira] [Updated] (YARN-3458) CPU resource monitoring in Windows


 [ 
https://issues.apache.org/jira/browse/YARN-3458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Inigo Goiri updated YARN-3458:
--
Attachment: YARN-3458-3.patch

Git and I are going through a rough patch; let's see if it works now...

> CPU resource monitoring in Windows
> --
>
> Key: YARN-3458
> URL: https://issues.apache.org/jira/browse/YARN-3458
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Affects Versions: 2.7.0
> Environment: Windows
>Reporter: Inigo Goiri
>Priority: Minor
>  Labels: containers, metrics, windows
> Attachments: YARN-3458-1.patch, YARN-3458-2.patch, YARN-3458-3.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> The current implementation of getCpuUsagePercent() for 
> WindowsBasedProcessTree is left as unavailable. Attached a proposal of how to 
> do it. I reused the CpuTimeTracker using 1 jiffy=1ms.
> This was left open by YARN-3122.




[jira] [Commented] (YARN-3439) RM fails to renew token when Oozie launcher leaves before sub-job finishes

2015-04-07 Thread Jason Lowe (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14483698#comment-14483698
 ] 

Jason Lowe commented on YARN-3439:
--

I believe it is setting that to false, as that behavior hasn't changed on the 
Oozie side.  However this isn't an issue of the token being cancelled but 
rather expiring.  The RM properly avoids cancelling the token when the launcher 
job exits, but it then forgets to keep renewing it as well.  Eventually the 
token expires and downstream jobs fail (if they run long enough).
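
To make the failure mode concrete, here is a toy model (hypothetical names, not the RM's renewer code). It assumes a token issued at time 0 whose validity is extended by a fixed lifetime on every periodic renewal, and it ignores the token's maximum lifetime. If renewals stop when the launcher exits, a sub-job that outlives the last renewal plus one lifetime loses the token even though nobody cancelled it.

```java
/** Toy model of token expiry; hypothetical names, not the RM's renewer code. */
public final class TokenExpirySketch {

    /**
     * Expiry time of a token issued at t=0 that is renewed every renewPeriodMs
     * up to and including renewUntilMs. Each renewal extends validity by lifetimeMs.
     */
    public static long expiryTime(long lifetimeMs, long renewPeriodMs, long renewUntilMs) {
        long lastRenewalMs = (renewUntilMs / renewPeriodMs) * renewPeriodMs;
        return lastRenewalMs + lifetimeMs;
    }

    /** True if a sub-job running at nowMs can still use the launcher's token. */
    public static boolean subJobCanUseToken(long nowMs, long launcherExitMs,
            boolean keepRenewingAfterExit, long lifetimeMs, long renewPeriodMs) {
        // Renewal stops at launcher exit unless the renewer keeps tracking the token.
        long renewUntilMs = keepRenewingAfterExit ? nowMs : Math.min(nowMs, launcherExitMs);
        return nowMs < expiryTime(lifetimeMs, renewPeriodMs, renewUntilMs);
    }
}
```

With a lifetime of 100, a renewal period of 50 and a launcher exit at 60, the token is still usable at time 120 but not at time 200 unless renewal continues after the launcher exits.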

> RM fails to renew token when Oozie launcher leaves before sub-job finishes
> --
>
> Key: YARN-3439
> URL: https://issues.apache.org/jira/browse/YARN-3439
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.0
>Reporter: Jason Lowe
>Assignee: Daryn Sharp
>Priority: Blocker
> Attachments: YARN-3439.001.patch
>
>
> When the Oozie launcher runs a standard MapReduce job (not Pig) it doesn't 
> linger waiting for the sub-job to finish.  At that point the RM stops 
> renewing delegation tokens for the launcher job which wreaks havoc on the 
> sub-job if the sub-job runs long enough for the tokens to expire.




[jira] [Commented] (YARN-3361) CapacityScheduler side changes to support non-exclusive node labels


[ 
https://issues.apache.org/jira/browse/YARN-3361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14483693#comment-14483693
 ] 

Jian He commented on YARN-3361:
---

Some comments on my side
- Should each limit be treated differently for different labeled requests?
{code}
// Otherwise, if any of the label of this node beyond queue limit, we
// cannot allocate on this node. Consider a small epsilon here.
{code}
- Merge queue#needResource and application#needResource
- needResource -> hasPendingResourceRequest; needResource can also be 
simplified by passing in partitionToAllocate 
- Some methods, like canAssignToThisQueue, take both nodeLabels and 
exclusiveType; they could be simplified by passing the current 
partitionToAllocate, which removes the internal if/else check.
- The following may be incorrect: the current request may not be the AM 
container request even though null == rmAppAttempt.getMasterContainer()
{code}
// AM container allocation doesn't support non-exclusive allocation to
// avoid painful of preempt an AM container if
{code}

- The if/else below can be avoided by passing nodePartition into 
queueCapacities.getAbsoluteCapacity(nodePartition):
{code}
if (!nodePartition.equals(RMNodeLabelsManager.NO_LABEL)) {
  queueCapacity =
      Resources.max(resourceCalculator, clusterResource, queueCapacity,
          Resources.multiplyAndNormalizeUp(
              resourceCalculator,
              labelManager.getResourceByLabel(nodePartition,
                  clusterResource),
              queueCapacities.getAbsoluteCapacity(nodePartition),
              minimumAllocation));
} else {
  // else there's no label on request, just to use absolute capacity as
  // capacity for nodes without label
  queueCapacity =
      Resources.multiplyAndNormalizeUp(resourceCalculator,
          labelManager.getResourceByLabel(CommonNodeLabelsManager.NO_LABEL,
              clusterResource),
          queueCapacities.getAbsoluteCapacity(),
          minimumAllocation);
}
{code}
- the second limit won’t be hit?
{code}
if (exclusiveType == ExclusiveType.EXCLUSIVE) {
  maxUserLimit =
  Resources.multiplyAndRoundDown(queueCapacity, userLimitFactor);
} else if (exclusiveType == ExclusiveType.NON_EXECLUSIVE) {
  maxUserLimit =
  labelManager.getResourceByLabel(nodePartition, clusterResource);
}
{code}
- nonExclusiveSchedulingOpportunities#setCount -> add(Priority)
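
To illustrate the getAbsoluteCapacity(nodePartition) suggestion above, here is a self-contained sketch with hypothetical names; it is not the CapacityScheduler code. It assumes the no-label partition is just another key (the empty string here), so a single lookup covers labeled and non-labeled requests. It models only the multiply-and-normalize-up step and leaves out the Resources.max(...) that the labeled branch of the quoted code applies.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-partition capacity lookup; not the CapacityScheduler classes. */
public class PartitionCapacitySketch {
    /** Assumption: the default (no-label) partition is keyed by the empty string. */
    public static final String NO_LABEL = "";

    private final Map<String, Long> partitionResourceMb = new HashMap<>();
    private final Map<String, Double> absoluteCapacity = new HashMap<>();

    public void setPartition(String partition, long resourceMb, double capacity) {
        partitionResourceMb.put(partition, resourceMb);
        absoluteCapacity.put(partition, capacity);
    }

    /**
     * Queue capacity for one partition: partition resource times the queue's absolute
     * capacity, rounded up to a multiple of the minimum allocation. The same code path
     * serves NO_LABEL and labeled partitions, so no if/else on the label is needed.
     */
    public long queueCapacityMb(String nodePartition, long minimumAllocationMb) {
        long resourceMb = partitionResourceMb.getOrDefault(nodePartition, 0L);
        double capacity = absoluteCapacity.getOrDefault(nodePartition, 0.0);
        long raw = (long) Math.ceil(resourceMb * capacity);
        return ((raw + minimumAllocationMb - 1) / minimumAllocationMb) * minimumAllocationMb;
    }
}
```

Whether the real method can be collapsed this way depends on how the max(...) in the labeled branch should behave, which this sketch does not try to settle.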


> CapacityScheduler side changes to support non-exclusive node labels
> ---
>
> Key: YARN-3361
> URL: https://issues.apache.org/jira/browse/YARN-3361
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-3361.1.patch, YARN-3361.2.patch
>
>
> According to the design doc attached in YARN-3214, we need to implement the 
> following logic in CapacityScheduler:
> 1) When allocating a resource request with no node-label specified, it should 
> get preferentially allocated to a node without labels.
> 2) When there are available resources on a node with a label, they can be 
> used by applications in the following order:
> - Applications under queues which can access the label and ask for same 
> labeled resource. 
> - Applications under queues which can access the label and ask for 
> non-labeled resource.
> - Applications under queues that cannot access the label and ask for 
> non-labeled resource.
> 3) Expose necessary information that can be used by preemption policy to make 
> preemption decisions.




[jira] [Commented] (YARN-3458) CPU resource monitoring in Windows


[ 
https://issues.apache.org/jira/browse/YARN-3458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14483692#comment-14483692
 ] 

Hadoop QA commented on YARN-3458:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12723677/YARN-3458-2.patch
  against trunk revision d27e924.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7244//console

This message is automatically generated.

> CPU resource monitoring in Windows
> --
>
> Key: YARN-3458
> URL: https://issues.apache.org/jira/browse/YARN-3458
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Affects Versions: 2.7.0
> Environment: Windows
>Reporter: Inigo Goiri
>Priority: Minor
>  Labels: containers, metrics, windows
> Attachments: YARN-3458-1.patch, YARN-3458-2.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> The current implementation of getCpuUsagePercent() for 
> WindowsBasedProcessTree is left as unavailable. Attached a proposal of how to 
> do it. I reused the CpuTimeTracker using 1 jiffy=1ms.
> This was left open by YARN-3122.




[jira] [Moved] (YARN-3460) Test TestSecureRMRegistryOperations failed with IBM_JAVA JVM

2015-04-07 Thread Steve Loughran (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-3460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran moved HADOOP-11810 to YARN-3460:
---

Fix Version/s: (was: 3.0.0)
 Target Version/s: 2.8.0  (was: 2.6.0)
Affects Version/s: (was: 2.6.0)
   (was: 3.0.0)
   3.0.0
   2.6.0
  Key: YARN-3460  (was: HADOOP-11810)
  Project: Hadoop YARN  (was: Hadoop Common)

> Test TestSecureRMRegistryOperations failed with IBM_JAVA JVM
> 
>
> Key: YARN-3460
> URL: https://issues.apache.org/jira/browse/YARN-3460
> Project: Hadoop YARN
>  Issue Type: Test
>Affects Versions: 2.6.0, 3.0.0
> Environment: $ mvn -version
> Apache Maven 3.2.1 (ea8b2b07643dbb1b84b6d16e1f08391b666bc1e9; 
> 2014-02-14T11:37:52-06:00)
> Maven home: /opt/apache-maven-3.2.1
> Java version: 1.7.0, vendor: IBM Corporation
> Java home: /usr/lib/jvm/ibm-java-ppc64le-71/jre
> Default locale: en_US, platform encoding: UTF-8
> OS name: "linux", version: "3.10.0-229.ael7b.ppc64le", arch: "ppc64le", 
> family: "unix"
>Reporter: pascal oliva
> Attachments: HADOOP-11810-1.patch
>
>
> TestSecureRMRegistryOperations failed with IBM JAVA JVM
> mvn test -X 
> -Dtest=org.apache.hadoop.registry.secure.TestSecureRMRegistryOperations
> Module                Total  Failure  Error  Skipped
> ----------------------------------------------------
> hadoop-yarn-registry     12        0     12        0
> ----------------------------------------------------
> Total                    12        0     12        0
> With 
> javax.security.auth.login.LoginException: Bad JAAS configuration: 
> unrecognized option: isInitiator
> and 
> Bad JAAS configuration: unrecognized option: storeKey




[jira] [Commented] (YARN-3439) RM fails to renew token when Oozie launcher leaves before sub-job finishes


[ 
https://issues.apache.org/jira/browse/YARN-3439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14483685#comment-14483685
 ] 

Jian He commented on YARN-3439:
---

IIUC, isn't this a long-standing issue that Oozie doesn't set 
"mapreduce.job.complete.cancel.delegation.tokens" to false for standard MR 
jobs, according to [here | 
https://issues.apache.org/jira/browse/YARN-2964?focusedCommentId=14250926&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14250926]?
Should we set it to false on the Oozie side?

> RM fails to renew token when Oozie launcher leaves before sub-job finishes
> --
>
> Key: YARN-3439
> URL: https://issues.apache.org/jira/browse/YARN-3439
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.0
>Reporter: Jason Lowe
>Assignee: Daryn Sharp
>Priority: Blocker
> Attachments: YARN-3439.001.patch
>
>
> When the Oozie launcher runs a standard MapReduce job (not Pig) it doesn't 
> linger waiting for the sub-job to finish.  At that point the RM stops 
> renewing delegation tokens for the launcher job which wreaks havoc on the 
> sub-job if the sub-job runs long enough for the tokens to expire.




[jira] [Created] (YARN-3459) TestLog4jWarningErrorMetricsAppender breaks in trunk

Li Lu created YARN-3459:
---

 Summary: TestLog4jWarningErrorMetricsAppender breaks in trunk
 Key: YARN-3459
 URL: https://issues.apache.org/jira/browse/YARN-3459
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Li Lu
Assignee: Li Lu
Priority: Blocker
 Fix For: 2.7.0


TestLog4jWarningErrorMetricsAppender fails with the following message:
{code}
Running org.apache.hadoop.yarn.util.TestLog4jWarningErrorMetricsAppender
Tests run: 6, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 6.214 sec <<< FAILURE! - in org.apache.hadoop.yarn.util.TestLog4jWarningErrorMetricsAppender
testPurge(org.apache.hadoop.yarn.util.TestLog4jWarningErrorMetricsAppender)  Time elapsed: 2.01 sec  <<< FAILURE!
java.lang.AssertionError: expected:<0> but was:<1>
    at org.junit.Assert.fail(Assert.java:88)
    at org.junit.Assert.failNotEquals(Assert.java:743)
    at org.junit.Assert.assertEquals(Assert.java:118)
    at org.junit.Assert.assertEquals(Assert.java:555)
    at org.junit.Assert.assertEquals(Assert.java:542)
    at org.apache.hadoop.yarn.util.TestLog4jWarningErrorMetricsAppender.testPurge(TestLog4jWarningErrorMetricsAppender.java:89)
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3426) Add jdiff support to YARN


[ 
https://issues.apache.org/jira/browse/YARN-3426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14483681#comment-14483681
 ] 

Li Lu commented on YARN-3426:
-

The failed unit test also breaks in trunk. Will file a blocker on this. 

> Add jdiff support to YARN
> -
>
> Key: YARN-3426
> URL: https://issues.apache.org/jira/browse/YARN-3426
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Li Lu
>Assignee: Li Lu
>Priority: Blocker
> Attachments: YARN-3426-040615-1.patch, YARN-3426-040615.patch
>
>
> Maybe we'd like to extend our current jdiff tool for hadoop-common and hdfs 
> to YARN as well. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-1376) NM need to notify the log aggregation status to RM through Node heartbeat


[ 
https://issues.apache.org/jira/browse/YARN-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14483660#comment-14483660
 ] 

Hadoop QA commented on YARN-1376:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12723661/YARN-1376.2015-04-07.patch
  against trunk revision 0b5d7d2.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7241//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7241//console

This message is automatically generated.

> NM need to notify the log aggregation status to RM through Node heartbeat
> -
>
> Key: YARN-1376
> URL: https://issues.apache.org/jira/browse/YARN-1376
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: Screen Shot 2015-04-07 at 9.30.42 AM.png, 
> YARN-1376.1.patch, YARN-1376.2.patch, YARN-1376.2.patch, 
> YARN-1376.2015-04-04.patch, YARN-1376.2015-04-06.patch, 
> YARN-1376.2015-04-07.patch, YARN-1376.3.patch, YARN-1376.4.patch
>
>
> Expose a client API to allow clients to figure if log aggregation is 
> complete. The ticket is used to track the changes on NM side



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (YARN-3458) CPU resource monitoring in Windows


 [ 
https://issues.apache.org/jira/browse/YARN-3458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Inigo Goiri updated YARN-3458:
--
Attachment: YARN-3458-2.patch

Patch based on trunk. Let's see if Jenkins likes it.

> CPU resource monitoring in Windows
> --
>
> Key: YARN-3458
> URL: https://issues.apache.org/jira/browse/YARN-3458
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Affects Versions: 2.7.0
> Environment: Windows
>Reporter: Inigo Goiri
>Priority: Minor
>  Labels: containers, metrics, windows
> Attachments: YARN-3458-1.patch, YARN-3458-2.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> The current implementation of getCpuUsagePercent() for 
> WindowsBasedProcessTree is left as unavailable. Attached a proposal of how to 
> do it. I reused the CpuTimeTracker using 1 jiffy=1ms.
> This was left open by YARN-3122.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-2429) LCE should blacklist based upon group


[ 
https://issues.apache.org/jira/browse/YARN-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14483666#comment-14483666
 ] 

Hudson commented on YARN-2429:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #157 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/157/])
YARN-2429. TestAMRMTokens.testTokenExpiry fails Intermittently with error 
message:Invalid AMRMToken (zxu via rkanter) (rkanter: rev 
99b08a748e7b00a58b63330b353902a6da6aae27)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestAMRMTokens.java


> LCE should blacklist based upon group
> -
>
> Key: YARN-2429
> URL: https://issues.apache.org/jira/browse/YARN-2429
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Allen Wittenauer
>
> It should be possible to list a group to ban, not just individual users.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3293) Track and display capacity scheduler health metrics in web UI


[ 
https://issues.apache.org/jira/browse/YARN-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14483668#comment-14483668
 ] 

Hadoop QA commented on YARN-3293:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12723665/apache-yarn-3293.5.patch
  against trunk revision 0b5d7d2.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 6 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7242//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/7242//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7242//console

This message is automatically generated.

> Track and display capacity scheduler health metrics in web UI
> -
>
> Key: YARN-3293
> URL: https://issues.apache.org/jira/browse/YARN-3293
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: Screen Shot 2015-03-30 at 4.30.14 PM.png, 
> apache-yarn-3293.0.patch, apache-yarn-3293.1.patch, apache-yarn-3293.2.patch, 
> apache-yarn-3293.4.patch, apache-yarn-3293.5.patch
>
>
> It would be good to display metrics that let users know about the health of 
> the capacity scheduler in the web UI. Today it is hard to get an idea if the 
> capacity scheduler is functioning correctly. Metrics such as the time for the 
> last allocation, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3273) Improve web UI to facilitate scheduling analysis and debugging


[ 
https://issues.apache.org/jira/browse/YARN-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14483664#comment-14483664
 ] 

Hudson commented on YARN-3273:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #157 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/157/])
Move YARN-3273 from 2.8 to 2.7. (zjshen: rev 
3fb5abfc87953377f86e06578518801a181d7697)
* hadoop-yarn-project/CHANGES.txt


> Improve web UI to facilitate scheduling analysis and debugging
> --
>
> Key: YARN-3273
> URL: https://issues.apache.org/jira/browse/YARN-3273
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jian He
>Assignee: Rohith
> Fix For: 2.7.0
>
> Attachments: 0001-YARN-3273-v1.patch, 0001-YARN-3273-v2.patch, 
> 0002-YARN-3273.patch, 0003-YARN-3273.patch, 0003-YARN-3273.patch, 
> 0004-YARN-3273.patch, YARN-3273-am-resource-used-AND-User-limit-v2.PNG, 
> YARN-3273-am-resource-used-AND-User-limit.PNG, 
> YARN-3273-application-headroom-v2.PNG, YARN-3273-application-headroom.PNG
>
>
> Job may be stuck for reasons such as:
> - hitting queue capacity 
> - hitting user-limit, 
> - hitting AM-resource-percentage 
> The  first queueCapacity is already shown on the UI.
> We may surface things like:
> - what is user's current usage and user-limit; 
> - what is the AM resource usage and limit;
> - what is the application's current HeadRoom;
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (YARN-3458) CPU resource monitoring in Windows


 [ 
https://issues.apache.org/jira/browse/YARN-3458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Inigo Goiri updated YARN-3458:
--
Labels: containers metrics windows  (was: )

> CPU resource monitoring in Windows
> --
>
> Key: YARN-3458
> URL: https://issues.apache.org/jira/browse/YARN-3458
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Affects Versions: 2.7.0
> Environment: Windows
>Reporter: Inigo Goiri
>Priority: Minor
>  Labels: containers, metrics, windows
> Attachments: YARN-3458-1.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> The current implementation of getCpuUsagePercent() for 
> WindowsBasedProcessTree is left as unavailable. Attached a proposal of how to 
> do it. I reused the CpuTimeTracker using 1 jiffy=1ms.
> This was left open by YARN-3122.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3458) CPU resource monitoring in Windows


[ 
https://issues.apache.org/jira/browse/YARN-3458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14483631#comment-14483631
 ] 

Hadoop QA commented on YARN-3458:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12723671/YARN-3458-1.patch
  against trunk revision d27e924.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7243//console

This message is automatically generated.

> CPU resource monitoring in Windows
> --
>
> Key: YARN-3458
> URL: https://issues.apache.org/jira/browse/YARN-3458
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Affects Versions: 2.7.0
> Environment: Windows
>Reporter: Inigo Goiri
>Priority: Minor
>  Labels: containers, metrics, windows
> Attachments: YARN-3458-1.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> The current implementation of getCpuUsagePercent() for 
> WindowsBasedProcessTree is left as unavailable. Attached a proposal of how to 
> do it. I reused the CpuTimeTracker using 1 jiffy=1ms.
> This was left open by YARN-3122.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3458) CPU resource monitoring in Windows


[ 
https://issues.apache.org/jira/browse/YARN-3458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14483627#comment-14483627
 ] 

Inigo Goiri commented on YARN-3458:
---

Not sure if the patch has been created properly, as I'm in between a couple of 
versions.
I will create one based on trunk if this doesn't work.

> CPU resource monitoring in Windows
> --
>
> Key: YARN-3458
> URL: https://issues.apache.org/jira/browse/YARN-3458
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Affects Versions: 2.7.0
> Environment: Windows
>Reporter: Inigo Goiri
>Priority: Minor
> Attachments: YARN-3458-1.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> The current implementation of getCpuUsagePercent() for 
> WindowsBasedProcessTree is left as unavailable. Attached a proposal of how to 
> do it. I reused the CpuTimeTracker using 1 jiffy=1ms.
> This was left open by YARN-3122.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (YARN-3458) CPU resource monitoring in Windows


 [ 
https://issues.apache.org/jira/browse/YARN-3458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Inigo Goiri updated YARN-3458:
--
Attachment: YARN-3458-1.patch

> CPU resource monitoring in Windows
> --
>
> Key: YARN-3458
> URL: https://issues.apache.org/jira/browse/YARN-3458
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Affects Versions: 2.7.0
> Environment: Windows
>Reporter: Inigo Goiri
>Priority: Minor
> Attachments: YARN-3458-1.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> The current implementation of getCpuUsagePercent() for 
> WindowsBasedProcessTree is left as unavailable. Attached a proposal of how to 
> do it. I reused the CpuTimeTracker using 1 jiffy=1ms.
> This was left open by YARN-3122.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Created] (YARN-3458) CPU resource monitoring in Windows

Inigo Goiri created YARN-3458:
-

 Summary: CPU resource monitoring in Windows
 Key: YARN-3458
 URL: https://issues.apache.org/jira/browse/YARN-3458
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: nodemanager
Affects Versions: 2.7.0
 Environment: Windows
Reporter: Inigo Goiri
Priority: Minor


The current implementation of getCpuUsagePercent() for WindowsBasedProcessTree 
is left as unavailable. Attached a proposal of how to do it. I reused the 
CpuTimeTracker using 1 jiffy=1ms.

This was left open by YARN-3122.
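
A rough sketch of the idea described above, assuming the process tree can report its cumulative CPU time in milliseconds (so 1 jiffy = 1 ms). This is not the attached patch and not the actual CpuTimeTracker code; the class name, method name, and the -1 "unavailable" marker are assumptions made for illustration.

```java
/**
 * Hypothetical sketch: derive a CPU usage percentage from two successive
 * samples of cumulative CPU time, with CPU time expressed in milliseconds.
 */
final class CpuUsageSketch {
    // Assumed marker for "no value yet"; chosen for this sketch only.
    static final float UNAVAILABLE = -1.0f;

    private long lastCpuMs = -1;
    private long lastWallMs = -1;

    /**
     * @param cumulativeCpuMs total CPU time consumed so far, in ms
     * @param wallClockMs     wall-clock time of this sample, in ms
     * @return CPU usage since the previous sample as a percentage of one
     *         core, or UNAVAILABLE if there is no valid previous sample
     */
    float update(long cumulativeCpuMs, long wallClockMs) {
        float result = UNAVAILABLE;
        if (lastWallMs >= 0 && wallClockMs > lastWallMs
                && cumulativeCpuMs >= lastCpuMs) {
            // CPU time used in the interval divided by the interval length.
            result = 100.0f * (cumulativeCpuMs - lastCpuMs)
                    / (wallClockMs - lastWallMs);
        }
        lastCpuMs = cumulativeCpuMs;
        lastWallMs = wallClockMs;
        return result;
    }
}
```

In this sketch the first call always returns the unavailable marker, and a value above 100 means more than one core was busy during the interval. Whether the real implementation normalizes by the number of cores is not stated here.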



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3366) Outbound network bandwidth : classify/shape traffic originating from YARN containers


[ 
https://issues.apache.org/jira/browse/YARN-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14483605#comment-14483605
 ] 

Sidharta Seethana commented on YARN-3366:
-

Thanks for the review, [~vvasudev] . Responses inline :

1. I'll fix this. This is an artifact of differences between trunk/branch-2
(repeated) 1. I think these are useful log lines that specify change in 
behavior due to settings/system state etc. I'll clarify/improve the log 
messages.
2. good catch, I'll fix it. Tests ran fine because WARN logging was enabled.
3. I'll fix the comments' location. The exception used to exist before but was 
causing bootstrapping issues. I left it in there along with an explanation for 
why it shouldn't be thrown. I'll remove it and modify comments.
4. IntelliJ warns me about this too, but I left it in there for 
clarity/consistency with the earlier code block. I believe it makes the code a 
bit more readable, so I would prefer to leave it in place.
5. I'll fix this
6. I'll fix this
7. why? compiler optimization? 
8. I'll fix this. 
9. I'll fix this.
10. I'll fix this.
11. I'll fix this - though I don't believe the merging always helps for 
error/warn metrics
12.  I'll fix this.
13. Not trivially; it would require refactoring launchContainer. 

> Outbound network bandwidth : classify/shape traffic originating from YARN 
> containers
> 
>
> Key: YARN-3366
> URL: https://issues.apache.org/jira/browse/YARN-3366
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Sidharta Seethana
>Assignee: Sidharta Seethana
> Attachments: YARN-3366.001.patch, YARN-3366.002.patch
>
>
> In order to be able to isolate based on/enforce outbound traffic bandwidth 
> limits, we need  a mechanism to classify/shape network traffic in the 
> nodemanager. For more information on the design, please see the attached 
> design document in the parent JIRA.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3046) [Event producers] Implement MapReduce AM writing some MR metrics to ATS


[ 
https://issues.apache.org/jira/browse/YARN-3046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14483588#comment-14483588
 ] 

Junping Du commented on YARN-3046:
--

Linked with MAPREDUCE-6189 - the test failure on trunk is reproducible, not 
just on my local test bed.

> [Event producers] Implement MapReduce AM writing some MR metrics to ATS
> ---
>
> Key: YARN-3046
> URL: https://issues.apache.org/jira/browse/YARN-3046
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Junping Du
> Attachments: YARN-3046-no-test-v2.patch, YARN-3046-no-test.patch
>
>
> Per design in YARN-2928, select a handful of MR metrics (e.g. HDFS bytes 
> written) and have the MR AM write the framework-specific metrics to ATS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3294) Allow dumping of Capacity Scheduler debug logs via web UI for a fixed time period


[ 
https://issues.apache.org/jira/browse/YARN-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14483518#comment-14483518
 ] 

Hudson commented on YARN-3294:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7521 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7521/])
YARN-3294. Allow dumping of Capacity Scheduler debug logs via web UI for 
(xgong: rev d27e9241e8676a0edb2d35453cac5f9495fcd605)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/CapacitySchedulerPage.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestAdHocLogDumper.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/AdHocLogDumper.java


> Allow dumping of Capacity Scheduler debug logs via web UI for a fixed time 
> period
> -
>
> Key: YARN-3294
> URL: https://issues.apache.org/jira/browse/YARN-3294
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Fix For: 2.8.0
>
> Attachments: Screen Shot 2015-03-12 at 8.51.25 PM.png, 
> apache-yarn-3294.0.patch, apache-yarn-3294.1.patch, apache-yarn-3294.2.patch, 
> apache-yarn-3294.3.patch, apache-yarn-3294.4.patch
>
>
> It would be nice to have a button on the web UI that would allow dumping of 
> debug logs for just the capacity scheduler for a fixed period of time(1 min, 
> 5 min or so) in a separate log file. It would be useful when debugging 
> scheduler behavior without affecting the rest of the resourcemanager.
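
As a rough illustration of the request above (not the committed AdHocLogDumper), the sketch below raises one logger to a debug-like level for a fixed period and then restores its previous level. It uses java.util.logging so that it is self-contained. The class name and method name are hypothetical, and routing the extra output to a separate file is left out.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical sketch: debug logging for one logger, for a fixed period. */
final class TimedDebugSketch {
    // Single daemon thread that restores log levels once the period ends.
    private static final ScheduledExecutorService SCHEDULER =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "timed-debug-restore");
            t.setDaemon(true);
            return t;
        });

    /**
     * Sets the given logger to FINE immediately and schedules a task that
     * puts the previous level back after periodMs milliseconds.
     */
    static void enableDebugFor(Logger logger, long periodMs) {
        // May be null, meaning "inherit from parent"; setLevel(null) restores that.
        final Level previous = logger.getLevel();
        logger.setLevel(Level.FINE);
        SCHEDULER.schedule(() -> logger.setLevel(previous),
            periodMs, TimeUnit.MILLISECONDS);
    }
}
```

Only the named logger is touched, which matches the goal of not affecting the rest of the resourcemanager. The sketch does not guard against overlapping requests for the same logger; a second call made while the first period is still running would record FINE as the "previous" level.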



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3391) Clearly define flow ID/ flow run / flow version in API and storage


[ 
https://issues.apache.org/jira/browse/YARN-3391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14483498#comment-14483498
 ] 

Junping Du commented on YARN-3391:
--

Sorry for coming a little late. Thanks, everyone, for the good discussion here, 
and [~zjshen] for updating the patch!
bq. I just wanted to add my 2 cents that this is something we already see and 
experience with hRaven so it's not theoretical.
+1, [~sjlee0]! I think that's very important feedback for improving the user 
experience of the new feature. Let's try to strike a good balance between 
addressing these concrete scenarios and providing flexibility for possible 
new ones, e.g. we can provide different flow grouping policies that users can 
use to group applications into a flow by name or to keep them as isolated 
flows, etc. Anyway, as everyone has agreed so far, let's continue that 
discussion in a separate JIRA and figure it out later. 

The patch looks good overall. However, I still don't see definitions of 
"flow", "flow run" and "flow version" anywhere in the Javadoc. 
As I mentioned earlier, they would be useful for developers. The official Apache 
feature doc is more user-oriented, and we can address it later when the feature 
is complete. 



> Clearly define flow ID/ flow run / flow version in API and storage
> --
>
> Key: YARN-3391
> URL: https://issues.apache.org/jira/browse/YARN-3391
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
> Attachments: YARN-3391.1.patch, YARN-3391.2.patch
>
>
> To continue the discussion in YARN-3040, let's figure out the best way to 
> describe the flow.
> Some key issues that we need to conclude on:
> - How do we include the flow version in the context so that it gets passed 
> into the collector and to the storage eventually?
> - Flow run id should be a number as opposed to a generic string?
> - Default behavior for the flow run id if it is missing (i.e. client did not 
> set it)
> - How do we handle flow attributes in case of nested levels of flows?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3443) Create a 'ResourceHandler' subsystem to ease addition of support for new resource types on the NM

2015-04-07 Thread Varun Vasudev (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14483489#comment-14483489
 ] 

Varun Vasudev commented on YARN-3443:
-

+1, lgtm for the latest patch.

> Create a 'ResourceHandler' subsystem to ease addition of support for new 
> resource types on the NM
> -
>
> Key: YARN-3443
> URL: https://issues.apache.org/jira/browse/YARN-3443
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Sidharta Seethana
>Assignee: Sidharta Seethana
> Attachments: YARN-3443.001.patch, YARN-3443.002.patch, 
> YARN-3443.003.patch, YARN-3443.004.patch
>
>
> The current cgroups implementation is closely tied to supporting CPU as a 
> resource. We need to separate out CGroups support as well as provide a simple 
> ResourceHandler subsystem that will enable us to add support for new resource 
> types on the NM, e.g. Network, Disk, etc. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3294) Allow dumping of Capacity Scheduler debug logs via web UI for a fixed time period


[ 
https://issues.apache.org/jira/browse/YARN-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14483490#comment-14483490
 ] 

Xuan Gong commented on YARN-3294:
-

Committed into trunk/branch-2. Thanks, varun.

> Allow dumping of Capacity Scheduler debug logs via web UI for a fixed time 
> period
> -
>
> Key: YARN-3294
> URL: https://issues.apache.org/jira/browse/YARN-3294
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: Screen Shot 2015-03-12 at 8.51.25 PM.png, 
> apache-yarn-3294.0.patch, apache-yarn-3294.1.patch, apache-yarn-3294.2.patch, 
> apache-yarn-3294.3.patch, apache-yarn-3294.4.patch
>
>
> It would be nice to have a button on the web UI that would allow dumping of 
> debug logs for just the capacity scheduler for a fixed period of time(1 min, 
> 5 min or so) in a separate log file. It would be useful when debugging 
> scheduler behavior without affecting the rest of the resourcemanager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3294) Allow dumping of Capacity Scheduler debug logs via web UI for a fixed time period


[ 
https://issues.apache.org/jira/browse/YARN-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14483486#comment-14483486
 ] 

Xuan Gong commented on YARN-3294:
-

+1 lgtm. Will commit

> Allow dumping of Capacity Scheduler debug logs via web UI for a fixed time 
> period
> -
>
> Key: YARN-3294
> URL: https://issues.apache.org/jira/browse/YARN-3294
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: Screen Shot 2015-03-12 at 8.51.25 PM.png, 
> apache-yarn-3294.0.patch, apache-yarn-3294.1.patch, apache-yarn-3294.2.patch, 
> apache-yarn-3294.3.patch, apache-yarn-3294.4.patch
>
>
> It would be nice to have a button on the web UI that would allow dumping of 
> debug logs for just the capacity scheduler for a fixed period of time(1 min, 
> 5 min or so) in a separate log file. It would be useful when debugging 
> scheduler behavior without affecting the rest of the resourcemanager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (YARN-3293) Track and display capacity scheduler health metrics in web UI

2015-04-07 Thread Varun Vasudev (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-3293:

Attachment: apache-yarn-3293.5.patch

The Findbugs warnings are incorrect: the fields are used by JAXB. Updated the 
patch to exclude them. The failing test is unrelated.

> Track and display capacity scheduler health metrics in web UI
> -
>
> Key: YARN-3293
> URL: https://issues.apache.org/jira/browse/YARN-3293
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: Screen Shot 2015-03-30 at 4.30.14 PM.png, 
> apache-yarn-3293.0.patch, apache-yarn-3293.1.patch, apache-yarn-3293.2.patch, 
> apache-yarn-3293.4.patch, apache-yarn-3293.5.patch
>
>
> It would be good to display metrics that let users know about the health of 
> the capacity scheduler in the web UI. Today it is hard to get an idea if the 
> capacity scheduler is functioning correctly. Metrics such as the time for the 
> last allocation, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (YARN-1376) NM need to notify the log aggregation status to RM through Node heartbeat


 [ 
https://issues.apache.org/jira/browse/YARN-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-1376:

Attachment: YARN-1376.2015-04-07.patch

Addressed all the latest comments.

> NM need to notify the log aggregation status to RM through Node heartbeat
> -
>
> Key: YARN-1376
> URL: https://issues.apache.org/jira/browse/YARN-1376
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: Screen Shot 2015-04-07 at 9.30.42 AM.png, 
> YARN-1376.1.patch, YARN-1376.2.patch, YARN-1376.2.patch, 
> YARN-1376.2015-04-04.patch, YARN-1376.2015-04-06.patch, 
> YARN-1376.2015-04-07.patch, YARN-1376.3.patch, YARN-1376.4.patch
>
>
> Expose a client API to allow clients to figure if log aggregation is 
> complete. The ticket is used to track the changes on NM side



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (YARN-1376) NM need to notify the log aggregation status to RM through Node heartbeat


 [ 
https://issues.apache.org/jira/browse/YARN-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-1376:

Attachment: Screen Shot 2015-04-07 at 9.30.42 AM.png

> NM need to notify the log aggregation status to RM through Node heartbeat
> -
>
> Key: YARN-1376
> URL: https://issues.apache.org/jira/browse/YARN-1376
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: Screen Shot 2015-04-07 at 9.30.42 AM.png, 
> YARN-1376.1.patch, YARN-1376.2.patch, YARN-1376.2.patch, 
> YARN-1376.2015-04-04.patch, YARN-1376.2015-04-06.patch, 
> YARN-1376.2015-04-07.patch, YARN-1376.3.patch, YARN-1376.4.patch
>
>
> Expose a client API to allow clients to figure if log aggregation is 
> complete. The ticket is used to track the changes on NM side



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3293) Track and display capacity scheduler health metrics in web UI


[ 
https://issues.apache.org/jira/browse/YARN-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14483445#comment-14483445
 ] 

Hadoop QA commented on YARN-3293:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12723649/apache-yarn-3293.4.patch
  against trunk revision 75c5454.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 6 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.security.TestRMDelegationTokens

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7239//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/7239//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7239//console

This message is automatically generated.

> Track and display capacity scheduler health metrics in web UI
> -
>
> Key: YARN-3293
> URL: https://issues.apache.org/jira/browse/YARN-3293
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: Screen Shot 2015-03-30 at 4.30.14 PM.png, 
> apache-yarn-3293.0.patch, apache-yarn-3293.1.patch, apache-yarn-3293.2.patch, 
> apache-yarn-3293.4.patch
>
>
> It would be good to display metrics that let users know about the health of 
> the capacity scheduler in the web UI. Today it is hard to get an idea if the 
> capacity scheduler is functioning correctly. Metrics such as the time for the 
> last allocation, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3431) Sub resources of timeline entity needs to be passed to a separate endpoint.