[jira] [Commented] (YARN-2583) Modify the LogDeletionService to support Log aggregation for LRS

2014-10-07 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14161547#comment-14161547
 ] 

Zhijie Shen commented on YARN-2583:
---

1. Be more specific: we find a more scalable method to only write a single log 
file per LRS?
{code}
  // we find a more scalable method.
{code}

2. Make 30 and 3600 constants of AppLogAggregatorImpl?
{code}
int configuredRentionSize =
conf.getInt(NM_LOG_AGGREGATION_RETAIN_RETENTION_SIZE_PER_APP, 30);
{code}
{code}
if (configuredInterval > 0 && configuredInterval < 3600) {
{code}

3. Should this be ">" ?
{code}
  if (status.size() >= this.retentionSize) {
{code}
And should this be "<" ? (A small boundary sketch follows these comments.)
{code}
for (int i = 0 ; i <= statusList.size() - this.retentionSize; i++) {
{code}

4. Why not use YarnClient? Is it because of the packaging issue?
{code}
  private ApplicationClientProtocol rmClient;
{code}
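
Regarding items 2 and 3, here is a minimal boundary sketch of what I have in mind (the constant and helper below are hypothetical, not code from the patch): keep only the newest retentionSize uploaded log files and delete the older ones.
{code}
// Hypothetical sketch only, not the attached patch.
// DEFAULT_RETENTION_SIZE stands in for the hard-coded 30 mentioned in item 2.
import java.util.ArrayList;
import java.util.List;

final class RetentionSketch {
  static final int DEFAULT_RETENTION_SIZE = 30;

  /** Returns the entries to delete so that at most retentionSize newest ones remain. */
  static <T> List<T> olderThanRetention(List<T> statusList, int retentionSize) {
    List<T> toDelete = new ArrayList<T>();
    if (statusList.size() > retentionSize) {                         // strict ">"
      for (int i = 0; i < statusList.size() - retentionSize; i++) {  // strict "<"
        toDelete.add(statusList.get(i));      // statusList assumed oldest-first
      }
    }
    return toDelete;
  }
}
{code}
With the strict comparisons the application keeps exactly retentionSize files; with {{>=}}/{{<=}} it would keep one fewer.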

 Modify the LogDeletionService to support Log aggregation for LRS
 

 Key: YARN-2583
 URL: https://issues.apache.org/jira/browse/YARN-2583
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-2583.1.patch, YARN-2583.2.patch, 
 YARN-2583.3.1.patch, YARN-2583.3.patch


 Currently, AggregatedLogDeletionService will delete old logs from HDFS. It 
 will check the cut-off-time, and if all logs for an application are older than 
 this cut-off-time, the app-log-dir for that application will be deleted from 
 HDFS. This will not work for LRS, since we expect an LRS application to keep 
 running for a long time. Two different scenarios: 
 1) If we configured the rollingIntervalSeconds, new log files will keep being 
 uploaded to HDFS. The number of log files for this application will grow 
 larger and larger, and none of them will ever be deleted.
 2) If we did not configure the rollingIntervalSeconds, the log file can only 
 be uploaded to HDFS after the application is finished. It is very possible 
 that the logs are uploaded after the cut-off-time, which will cause problems 
 because by that time the app-log-dir for this application in HDFS has already 
 been deleted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2496) [YARN-796] Changes for capacity scheduler to support allocate resource respect labels

2014-10-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14161550#comment-14161550
 ] 

Hadoop QA commented on YARN-2496:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673291/YARN-2496.patch
  against trunk revision 0fb2735.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA
  
org.apache.hadoop.yarn.server.resourcemanager.security.TestAMRMTokens
  
org.apache.hadoop.yarn.server.resourcemanager.TestResourceManager
  org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5300//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5300//console

This message is automatically generated.

 [YARN-796] Changes for capacity scheduler to support allocate resource 
 respect labels
 -

 Key: YARN-2496
 URL: https://issues.apache.org/jira/browse/YARN-2496
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-2496.patch, YARN-2496.patch, YARN-2496.patch, 
 YARN-2496.patch, YARN-2496.patch, YARN-2496.patch, YARN-2496.patch, 
 YARN-2496.patch


 This JIRA includes:
 - Add/parse a labels option in {{capacity-scheduler.xml}}, similar to other 
 queue options like capacity/maximum-capacity, etc.
 - Include a default-label-expression option in the queue config; if an app 
 doesn't specify a label-expression, the queue's default-label-expression will 
 be used.
 - Check whether labels can be accessed by the queue when an app is submitted 
 to the queue with a label-expression, or when a ResourceRequest is updated 
 with a label-expression
 - Check labels on the NM when trying to allocate a ResourceRequest on an NM 
 with a label-expression
 - Respect labels when calculating headroom/user-limit



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol for RM fail over

2014-10-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14161551#comment-14161551
 ] 

Hadoop QA commented on YARN-1879:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673293/YARN-1879.23.patch
  against trunk revision 0fb2735.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.security.TestClientToAMTokens

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5301//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5301//console

This message is automatically generated.

 Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol for RM 
 fail over
 

 Key: YARN-1879
 URL: https://issues.apache.org/jira/browse/YARN-1879
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Jian He
Assignee: Tsuyoshi OZAWA
Priority: Critical
 Attachments: YARN-1879.1.patch, YARN-1879.1.patch, 
 YARN-1879.11.patch, YARN-1879.12.patch, YARN-1879.13.patch, 
 YARN-1879.14.patch, YARN-1879.15.patch, YARN-1879.16.patch, 
 YARN-1879.17.patch, YARN-1879.18.patch, YARN-1879.19.patch, 
 YARN-1879.2-wip.patch, YARN-1879.2.patch, YARN-1879.20.patch, 
 YARN-1879.21.patch, YARN-1879.22.patch, YARN-1879.23.patch, 
 YARN-1879.3.patch, YARN-1879.4.patch, YARN-1879.5.patch, YARN-1879.6.patch, 
 YARN-1879.7.patch, YARN-1879.8.patch, YARN-1879.9.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (YARN-2583) Modify the LogDeletionService to support Log aggregation for LRS

2014-10-07 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14161547#comment-14161547
 ] 

Zhijie Shen edited comment on YARN-2583 at 10/7/14 7:26 AM:


[~xgong], thanks for the patch. I'm fine with the approach. Here are some comments:

1. Be more specific: we find a more scalable method to only write a single log 
file per LRS?
{code}
  // we find a more scalable method.
{code}

2. Make 30 and 3600 constants of AppLogAggregatorImpl?
{code}
int configuredRentionSize =
conf.getInt(NM_LOG_AGGREGATION_RETAIN_RETENTION_SIZE_PER_APP, 30);
{code}
{code}
if (configuredInterval > 0 && configuredInterval < 3600) {
{code}

3. Should this be ">" ?
{code}
  if (status.size() >= this.retentionSize) {
{code}
And should this be "<" ?
{code}
for (int i = 0 ; i <= statusList.size() - this.retentionSize; i++) {
{code}

4. Why not use YarnClient? Is it because of the packaging issue?
{code}
  private ApplicationClientProtocol rmClient;
{code}

5. The existing annotation is not correct. At least, the module has been reused 
by MR.
{code}
@Private
public class AggregatedLogDeletionService extends AbstractService {
{code}

6. Instead of making the log deletion task non-static, why not pass rmClient into 
the constructor of this class? (A rough sketch follows below.)

7. Shall we spin off the LogRollingInterval-related change into another JIRA?
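
For item 6, a rough sketch of the constructor injection I mean (the class shape below is illustrative, not the patch):
{code}
// Hypothetical illustration only: pass the RM client into the deletion task
// instead of keeping it in a static field; this also makes it easy to hand the
// task a mock client in tests.
import java.util.TimerTask;

import org.apache.hadoop.yarn.api.ApplicationClientProtocol;

class LogDeletionTask extends TimerTask {
  private final ApplicationClientProtocol rmClient;

  LogDeletionTask(ApplicationClientProtocol rmClient) {
    this.rmClient = rmClient;
  }

  @Override
  public void run() {
    // use rmClient here, e.g. to check whether an application is still running
  }
}
{code}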


was (Author: zjshen):
1. Be more specific: we find a more scalable method to only write a single log 
file per LRS?
{code}
  // we find a more scalable method.
{code}

2. Make 30 and 3600 constants of AppLogAggregatorImpl?
{code}
int configuredRentionSize =
conf.getInt(NM_LOG_AGGREGATION_RETAIN_RETENTION_SIZE_PER_APP, 30);
{code}
{code}
if (configuredInterval > 0 && configuredInterval < 3600) {
{code}

3. Should this be ">" ?
{code}
  if (status.size() >= this.retentionSize) {
{code}
And should this be "<" ?
{code}
for (int i = 0 ; i <= statusList.size() - this.retentionSize; i++) {
{code}

4. Why not use YarnClient? Is it because of the packaging issue?
{code}
  private ApplicationClientProtocol rmClient;
{code}

 Modify the LogDeletionService to support Log aggregation for LRS
 

 Key: YARN-2583
 URL: https://issues.apache.org/jira/browse/YARN-2583
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-2583.1.patch, YARN-2583.2.patch, 
 YARN-2583.3.1.patch, YARN-2583.3.patch


 Currently, AggregatedLogDeletionService will delete old logs from HDFS. It 
 will check the cut-off-time, and if all logs for an application are older than 
 this cut-off-time, the app-log-dir for that application will be deleted from 
 HDFS. This will not work for LRS, since we expect an LRS application to keep 
 running for a long time. Two different scenarios: 
 1) If we configured the rollingIntervalSeconds, new log files will keep being 
 uploaded to HDFS. The number of log files for this application will grow 
 larger and larger, and none of them will ever be deleted.
 2) If we did not configure the rollingIntervalSeconds, the log file can only 
 be uploaded to HDFS after the application is finished. It is very possible 
 that the logs are uploaded after the cut-off-time, which will cause problems 
 because by that time the app-log-dir for this application in HDFS has already 
 been deleted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2641) improve node decommission latency in RM.

2014-10-07 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-2641:

Attachment: (was: YARN-2641.000.patch)

 improve node decommission latency in RM.
 

 Key: YARN-2641
 URL: https://issues.apache.org/jira/browse/YARN-2641
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.5.0
Reporter: zhihai xu
Assignee: zhihai xu

 improve node decommission latency in RM. 
 Currently, node decommission only happens after the RM receives a nodeHeartbeat 
 from the Node Manager. The node heartbeat interval is configurable, with a 
 default value of 1 second.
 It would be better to do the decommission during RM refresh (NodesListManager) 
 instead of in nodeHeartbeat (ResourceTrackerService).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2641) improve node decommission latency in RM.

2014-10-07 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-2641:

Attachment: YARN-2641.000.patch

 improve node decommission latency in RM.
 

 Key: YARN-2641
 URL: https://issues.apache.org/jira/browse/YARN-2641
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.5.0
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: YARN-2641.000.patch


 improve node decommission latency in RM. 
 Currently, node decommission only happens after the RM receives a nodeHeartbeat 
 from the Node Manager. The node heartbeat interval is configurable, with a 
 default value of 1 second.
 It would be better to do the decommission during RM refresh (NodesListManager) 
 instead of in nodeHeartbeat (ResourceTrackerService).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2641) improve node decommission latency in RM.

2014-10-07 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-2641:

Attachment: YARN-2641.001.patch

 improve node decommission latency in RM.
 

 Key: YARN-2641
 URL: https://issues.apache.org/jira/browse/YARN-2641
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.5.0
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: YARN-2641.000.patch


 improve node decommission latency in RM. 
 Currently, node decommission only happens after the RM receives a nodeHeartbeat 
 from the Node Manager. The node heartbeat interval is configurable, with a 
 default value of 1 second.
 It would be better to do the decommission during RM refresh (NodesListManager) 
 instead of in nodeHeartbeat (ResourceTrackerService).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2641) improve node decommission latency in RM.

2014-10-07 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-2641:

Attachment: (was: YARN-2641.001.patch)

 improve node decommission latency in RM.
 

 Key: YARN-2641
 URL: https://issues.apache.org/jira/browse/YARN-2641
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.5.0
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: YARN-2641.000.patch


 improve node decommission latency in RM. 
 Currently, node decommission only happens after the RM receives a nodeHeartbeat 
 from the Node Manager. The node heartbeat interval is configurable, with a 
 default value of 1 second.
 It would be better to do the decommission during RM refresh (NodesListManager) 
 instead of in nodeHeartbeat (ResourceTrackerService).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2641) improve node decommission latency in RM.

2014-10-07 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-2641:

Attachment: YARN-2641.001.patch

 improve node decommission latency in RM.
 

 Key: YARN-2641
 URL: https://issues.apache.org/jira/browse/YARN-2641
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.5.0
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: YARN-2641.000.patch


 improve node decommission latency in RM. 
 Currently, node decommission only happens after the RM receives a nodeHeartbeat 
 from the Node Manager. The node heartbeat interval is configurable, with a 
 default value of 1 second.
 It would be better to do the decommission during RM refresh (NodesListManager) 
 instead of in nodeHeartbeat (ResourceTrackerService).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2641) improve node decommission latency in RM.

2014-10-07 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-2641:

Attachment: (was: YARN-2641.001.patch)

 improve node decommission latency in RM.
 

 Key: YARN-2641
 URL: https://issues.apache.org/jira/browse/YARN-2641
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.5.0
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: YARN-2641.000.patch


 improve node decommission latency in RM. 
 Currently, node decommission only happens after the RM receives a nodeHeartbeat 
 from the Node Manager. The node heartbeat interval is configurable, with a 
 default value of 1 second.
 It would be better to do the decommission during RM refresh (NodesListManager) 
 instead of in nodeHeartbeat (ResourceTrackerService).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2641) improve node decommission latency in RM.

2014-10-07 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-2641:

Attachment: YARN-2641.001.patch

 improve node decommission latency in RM.
 

 Key: YARN-2641
 URL: https://issues.apache.org/jira/browse/YARN-2641
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.5.0
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: YARN-2641.000.patch, YARN-2641.001.patch


 improve node decommission latency in RM. 
 Currently, node decommission only happens after the RM receives a nodeHeartbeat 
 from the Node Manager. The node heartbeat interval is configurable, with a 
 default value of 1 second.
 It would be better to do the decommission during RM refresh (NodesListManager) 
 instead of in nodeHeartbeat (ResourceTrackerService).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2641) improve node decommission latency in RM.

2014-10-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14161658#comment-14161658
 ] 

Hadoop QA commented on YARN-2641:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673309/YARN-2641.000.patch
  against trunk revision 0fb2735.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5302//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5302//console

This message is automatically generated.

 improve node decommission latency in RM.
 

 Key: YARN-2641
 URL: https://issues.apache.org/jira/browse/YARN-2641
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.5.0
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: YARN-2641.000.patch, YARN-2641.001.patch


 improve node decommission latency in RM. 
 Currently, node decommission only happens after the RM receives a nodeHeartbeat 
 from the Node Manager. The node heartbeat interval is configurable, with a 
 default value of 1 second.
 It would be better to do the decommission during RM refresh (NodesListManager) 
 instead of in nodeHeartbeat (ResourceTrackerService).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2641) improve node decommission latency in RM.

2014-10-07 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14161660#comment-14161660
 ] 

Junping Du commented on YARN-2641:
--

The idea here sounds interesting. However, the decommission still happens after 
the heartbeat back to the NM. Am I missing something here?
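
For reference, my reading of the proposal ("do the decommission during refreshNodes instead of waiting for the next heartbeat") is roughly the sketch below. The class and event names are RM internals as I understand them, and updateExcludeList is a hypothetical helper; this is an illustration, not the attached patch.
{code}
// Hypothetical sketch, assumed to live inside NodesListManager (which already has
// rmContext and isValidNode()): when the exclude list is refreshed, push a
// DECOMMISSION event for each newly excluded node right away instead of waiting
// for that node's next heartbeat to reach ResourceTrackerService.
public void refreshNodes(Configuration yarnConf) throws IOException {
  updateExcludeList(yarnConf);  // hypothetical helper that re-reads the exclude file
  for (RMNode node : rmContext.getRMNodes().values()) {
    if (!isValidNode(node.getHostName())) {
      rmContext.getDispatcher().getEventHandler()
          .handle(new RMNodeEvent(node.getNodeID(), RMNodeEventType.DECOMMISSION));
    }
  }
}
{code}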

 improve node decommission latency in RM.
 

 Key: YARN-2641
 URL: https://issues.apache.org/jira/browse/YARN-2641
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.5.0
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: YARN-2641.000.patch, YARN-2641.001.patch


 improve node decommission latency in RM. 
 Currently, node decommission only happens after the RM receives a nodeHeartbeat 
 from the Node Manager. The node heartbeat interval is configurable, with a 
 default value of 1 second.
 It would be better to do the decommission during RM refresh (NodesListManager) 
 instead of in nodeHeartbeat (ResourceTrackerService).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2641) improve node decommission latency in RM.

2014-10-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14161664#comment-14161664
 ] 

Hadoop QA commented on YARN-2641:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673311/YARN-2641.001.patch
  against trunk revision 0fb2735.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5303//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5303//console

This message is automatically generated.

 improve node decommission latency in RM.
 

 Key: YARN-2641
 URL: https://issues.apache.org/jira/browse/YARN-2641
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.5.0
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: YARN-2641.000.patch, YARN-2641.001.patch


 improve node decommission latency in RM. 
 Currently, node decommission only happens after the RM receives a nodeHeartbeat 
 from the Node Manager. The node heartbeat interval is configurable, with a 
 default value of 1 second.
 It would be better to do the decommission during RM refresh (NodesListManager) 
 instead of in nodeHeartbeat (ResourceTrackerService).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1458) FairScheduler: Zero weight can lead to livelock

2014-10-07 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14161724#comment-14161724
 ] 

Tsuyoshi OZAWA commented on YARN-1458:
--

Sure, thanks for the reply :-)

 FairScheduler: Zero weight can lead to livelock
 ---

 Key: YARN-1458
 URL: https://issues.apache.org/jira/browse/YARN-1458
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.2.0
 Environment: Centos 2.6.18-238.19.1.el5 X86_64
 hadoop2.2.0
Reporter: qingwu.fu
Assignee: zhihai xu
  Labels: patch
 Attachments: YARN-1458.001.patch, YARN-1458.002.patch, 
 YARN-1458.003.patch, YARN-1458.004.patch, YARN-1458.006.patch, 
 YARN-1458.alternative0.patch, YARN-1458.alternative1.patch, 
 YARN-1458.alternative2.patch, YARN-1458.patch, yarn-1458-5.patch, 
 yarn-1458-7.patch, yarn-1458-8.patch

   Original Estimate: 408h
  Remaining Estimate: 408h

 The ResourceManager$SchedulerEventDispatcher$EventProcessor blocked when 
 clients submitted lots of jobs. It is not easy to reproduce; we ran the test 
 cluster for days to reproduce it. The output of the jstack command on the 
 resourcemanager pid:
 {code}
  ResourceManager Event Processor prio=10 tid=0x2aaab0c5f000 nid=0x5dd3 
 waiting for monitor entry [0x43aa9000]
java.lang.Thread.State: BLOCKED (on object monitor)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplication(FairScheduler.java:671)
 - waiting to lock 0x00070026b6e0 (a 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1023)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
 at java.lang.Thread.run(Thread.java:744)
 ……
 FairSchedulerUpdateThread daemon prio=10 tid=0x2aaab0a2c800 nid=0x5dc8 
 runnable [0x433a2000]
java.lang.Thread.State: RUNNABLE
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getAppWeight(FairScheduler.java:545)
 - locked 0x00070026b6e0 (a 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.getWeights(AppSchedulable.java:129)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShare(ComputeFairShares.java:143)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.resourceUsedWithWeightToResourceRatio(ComputeFairShares.java:131)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShares(ComputeFairShares.java:102)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.FairSharePolicy.computeShares(FairSharePolicy.java:119)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.recomputeShares(FSLeafQueue.java:100)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.recomputeShares(FSParentQueue.java:62)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.update(FairScheduler.java:282)
 - locked 0x00070026b6e0 (a 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$UpdateThread.run(FairScheduler.java:255)
 at java.lang.Thread.run(Thread.java:744)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2312) Marking ContainerId#getId as deprecated

2014-10-07 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-2312:
-
Attachment: YARN-2312.6.patch

Thanks Jason and Jian for the review. Updated:

* Removed an unnecessary change in TestTaskAttemptListenerImpl.java - sorry, 
that change was included by mistake.
* Defined 0xffL as CONTAINER_ID_BITMASK and exposed it. A small usage sketch 
follows.
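
Here is that usage sketch (illustration only, not code from the patch): callers move from the deprecated int-valued getId() to the 64-bit getContainerId(), and can mask out the sequence-number bits with the exposed constant if they still need the old-style value.
{code}
import org.apache.hadoop.yarn.api.records.ContainerId;

final class ContainerIdUsage {
  static long fullId(ContainerId id) {
    return id.getContainerId();                     // epoch + sequence number
  }

  static long sequenceNumber(ContainerId id) {
    // Sequence-number bits only, roughly what the old getId() used to return.
    return id.getContainerId() & ContainerId.CONTAINER_ID_BITMASK;
  }
}
{code}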

 Marking ContainerId#getId as deprecated
 ---

 Key: YARN-2312
 URL: https://issues.apache.org/jira/browse/YARN-2312
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-2312-wip.patch, YARN-2312.1.patch, 
 YARN-2312.2-2.patch, YARN-2312.2-3.patch, YARN-2312.2.patch, 
 YARN-2312.4.patch, YARN-2312.5.patch, YARN-2312.6.patch


 After YARN-2229, {{ContainerId#getId}} will only return a partial value of the 
 container id: the sequence number without the epoch. We should mark 
 {{ContainerId#getId}} as deprecated and use {{ContainerId#getContainerId}} 
 instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol for RM fail over

2014-10-07 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-1879:
-
Attachment: YARN-1879.23.patch

The failure of TestClientToAMTokens looks unrelated - it passed locally. 
Let me submit the same patch again.

 Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol for RM 
 fail over
 

 Key: YARN-1879
 URL: https://issues.apache.org/jira/browse/YARN-1879
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Jian He
Assignee: Tsuyoshi OZAWA
Priority: Critical
 Attachments: YARN-1879.1.patch, YARN-1879.1.patch, 
 YARN-1879.11.patch, YARN-1879.12.patch, YARN-1879.13.patch, 
 YARN-1879.14.patch, YARN-1879.15.patch, YARN-1879.16.patch, 
 YARN-1879.17.patch, YARN-1879.18.patch, YARN-1879.19.patch, 
 YARN-1879.2-wip.patch, YARN-1879.2.patch, YARN-1879.20.patch, 
 YARN-1879.21.patch, YARN-1879.22.patch, YARN-1879.23.patch, 
 YARN-1879.23.patch, YARN-1879.3.patch, YARN-1879.4.patch, YARN-1879.5.patch, 
 YARN-1879.6.patch, YARN-1879.7.patch, YARN-1879.8.patch, YARN-1879.9.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2579) Both RM's state is Active , but 1 RM is not really active.

2014-10-07 Thread Rohith (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith updated YARN-2579:
-
Attachment: YARN-2579.patch

 Both RM's state is Active , but 1 RM is not really active.
 --

 Key: YARN-2579
 URL: https://issues.apache.org/jira/browse/YARN-2579
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.5.1
Reporter: Rohith
Assignee: Rohith
 Attachments: YARN-2579.patch


 I encountered a situation where both RMs' web pages were accessible and their 
 state was displayed as Active, but one RM's ActiveServices were stopped.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2579) Both RM's state is Active , but 1 RM is not really active.

2014-10-07 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14161774#comment-14161774
 ] 

Rohith commented on YARN-2579:
--

Considering the 1st approach as feasible, I attached a patch. Thinking about how 
to write tests!




 Both RM's state is Active , but 1 RM is not really active.
 --

 Key: YARN-2579
 URL: https://issues.apache.org/jira/browse/YARN-2579
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.5.1
Reporter: Rohith
Assignee: Rohith
 Attachments: YARN-2579.patch


 I encountered a situation where both RMs' web pages were accessible and their 
 state was displayed as Active, but one RM's ActiveServices were stopped.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2615) ClientToAMTokenIdentifier and DelegationTokenIdentifier should allow extended fields

2014-10-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14161792#comment-14161792
 ] 

Hudson commented on YARN-2615:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #704 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/704/])
YARN-2615. Changed 
ClientToAMTokenIdentifier/RM(Timeline)DelegationTokenIdentifier to use protobuf 
as payload. Contributed by Junping Du (jianhe: rev 
ea26cc0b4ac02b8af686dfda80f540e5aa70c358)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/proto/server/yarn_security_token.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/NMTokenIdentifier.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/RMDelegationTokenIdentifierForTest.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/client/RMDelegationTokenIdentifier.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/ContainerTokenIdentifier.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/proto/test_client_tokens.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/client/YARNDelegationTokenIdentifier.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/AMRMTokenIdentifier.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/ClientToAMTokenIdentifierForTest.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/client/ClientToAMTokenIdentifier.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/pom.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestClientToAMTokens.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/client/TimelineDelegationTokenIdentifier.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/security/TestYARNTokenIdentifier.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMTokens.java


 ClientToAMTokenIdentifier and DelegationTokenIdentifier should allow extended 
 fields
 

 Key: YARN-2615
 URL: https://issues.apache.org/jira/browse/YARN-2615
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Junping Du
Assignee: Junping Du
Priority: Blocker
 Fix For: 2.6.0

 Attachments: YARN-2615-v2.patch, YARN-2615-v3.patch, 
 YARN-2615-v4.patch, YARN-2615-v5.patch, YARN-2615.patch


 As three TokenIdentifiers were updated in YARN-668, ClientToAMTokenIdentifier 
 and DelegationTokenIdentifier should also be updated in the same way to allow 
 fields to be extended in the future.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1051) YARN Admission Control/Planner: enhancing the resource allocation model with time.

2014-10-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14161794#comment-14161794
 ] 

Hudson commented on YARN-1051:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #704 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/704/])
Move YARN-1051 to 2.6 (cdouglas: rev 8380ca37237a21638e1bcad0dd0e4c7e9f1a1786)
* hadoop-yarn-project/CHANGES.txt


 YARN Admission Control/Planner: enhancing the resource allocation model with 
 time.
 --

 Key: YARN-1051
 URL: https://issues.apache.org/jira/browse/YARN-1051
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler, resourcemanager, scheduler
Reporter: Carlo Curino
Assignee: Carlo Curino
 Fix For: 2.6.0

 Attachments: YARN-1051-design.pdf, YARN-1051.1.patch, 
 YARN-1051.patch, curino_MSR-TR-2013-108.pdf, socc14-paper15.pdf, 
 techreport.pdf


 In this umbrella JIRA we propose to extend the YARN RM to handle time 
 explicitly, allowing users to reserve capacity over time. This is an 
 important step towards SLAs, long-running services, and workflows, and it 
 helps with gang scheduling.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2644) Recalculate headroom more frequently to keep it accurate

2014-10-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14161793#comment-14161793
 ] 

Hudson commented on YARN-2644:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #704 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/704/])
YARN-2644. Fixed CapacityScheduler to return up-to-date headroom when AM 
allocates. Contributed by Craig Welch (jianhe: rev 
519e5a7dd2bd540105434ec3c8939b68f6c024f8)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityHeadroomProvider.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java


 Recalculate headroom more frequently to keep it accurate
 

 Key: YARN-2644
 URL: https://issues.apache.org/jira/browse/YARN-2644
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Craig Welch
Assignee: Craig Welch
 Fix For: 2.6.0

 Attachments: YARN-2644.11.patch, YARN-2644.14.patch, 
 YARN-2644.15.patch, YARN-2644.15.patch


 See parent (1198) for more detail - this specifically covers calculating the 
 headroom more frequently, to cover the cases where changes have occurred 
 which impact headroom but which are not reflected due to an application not 
 being updated.
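
 The commit above adds a CapacityHeadroomProvider; the general idea appears to be 
 a "provider" pattern: instead of caching a headroom snapshot in the application, 
 the scheduler hands the application an object that recomputes headroom from the 
 current queue/user state on demand. A minimal sketch of that pattern (names here 
 are illustrative, not the committed classes):
 {code}
 import org.apache.hadoop.yarn.api.records.Resource;

 interface HeadroomProvider {
   Resource getHeadroom();   // recomputed from current limits on every call
 }

 class SchedulerAppSketch {
   private volatile HeadroomProvider headroomProvider;

   void setHeadroomProvider(HeadroomProvider provider) {
     this.headroomProvider = provider;  // installed by the scheduler at allocate time
   }

   Resource getHeadroom() {
     // Always up to date, instead of a value cached when the app last heartbeated.
     return headroomProvider.getHeadroom();
   }
 }
 {code}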



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol for RM fail over

2014-10-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14161826#comment-14161826
 ] 

Hadoop QA commented on YARN-1879:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673336/YARN-1879.23.patch
  against trunk revision 0fb2735.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5305//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5305//console

This message is automatically generated.

 Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol for RM 
 fail over
 

 Key: YARN-1879
 URL: https://issues.apache.org/jira/browse/YARN-1879
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Jian He
Assignee: Tsuyoshi OZAWA
Priority: Critical
 Attachments: YARN-1879.1.patch, YARN-1879.1.patch, 
 YARN-1879.11.patch, YARN-1879.12.patch, YARN-1879.13.patch, 
 YARN-1879.14.patch, YARN-1879.15.patch, YARN-1879.16.patch, 
 YARN-1879.17.patch, YARN-1879.18.patch, YARN-1879.19.patch, 
 YARN-1879.2-wip.patch, YARN-1879.2.patch, YARN-1879.20.patch, 
 YARN-1879.21.patch, YARN-1879.22.patch, YARN-1879.23.patch, 
 YARN-1879.23.patch, YARN-1879.3.patch, YARN-1879.4.patch, YARN-1879.5.patch, 
 YARN-1879.6.patch, YARN-1879.7.patch, YARN-1879.8.patch, YARN-1879.9.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2312) Marking ContainerId#getId as deprecated

2014-10-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14161902#comment-14161902
 ] 

Hadoop QA commented on YARN-2312:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673331/YARN-2312.6.patch
  against trunk revision 0fb2735.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 15 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.mapred.pipes.TestPipeApplication
  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5304//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5304//console

This message is automatically generated.

 Marking ContainerId#getId as deprecated
 ---

 Key: YARN-2312
 URL: https://issues.apache.org/jira/browse/YARN-2312
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-2312-wip.patch, YARN-2312.1.patch, 
 YARN-2312.2-2.patch, YARN-2312.2-3.patch, YARN-2312.2.patch, 
 YARN-2312.4.patch, YARN-2312.5.patch, YARN-2312.6.patch


 After YARN-2229, {{ContainerId#getId}} will only return a partial value of the 
 container id: the sequence number without the epoch. We should mark 
 {{ContainerId#getId}} as deprecated and use {{ContainerId#getContainerId}} 
 instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2644) Recalculate headroom more frequently to keep it accurate

2014-10-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14161913#comment-14161913
 ] 

Hudson commented on YARN-2644:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1894 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1894/])
YARN-2644. Fixed CapacityScheduler to return up-to-date headroom when AM 
allocates. Contributed by Craig Welch (jianhe: rev 
519e5a7dd2bd540105434ec3c8939b68f6c024f8)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityHeadroomProvider.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java


 Recalculate headroom more frequently to keep it accurate
 

 Key: YARN-2644
 URL: https://issues.apache.org/jira/browse/YARN-2644
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Craig Welch
Assignee: Craig Welch
 Fix For: 2.6.0

 Attachments: YARN-2644.11.patch, YARN-2644.14.patch, 
 YARN-2644.15.patch, YARN-2644.15.patch


 See parent (1198) for more detail - this specifically covers calculating the 
 headroom more frequently, to cover the cases where changes have occurred 
 which impact headroom but which are not reflected due to an application not 
 being updated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1051) YARN Admission Control/Planner: enhancing the resource allocation model with time.

2014-10-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14161914#comment-14161914
 ] 

Hudson commented on YARN-1051:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1894 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1894/])
Move YARN-1051 to 2.6 (cdouglas: rev 8380ca37237a21638e1bcad0dd0e4c7e9f1a1786)
* hadoop-yarn-project/CHANGES.txt


 YARN Admission Control/Planner: enhancing the resource allocation model with 
 time.
 --

 Key: YARN-1051
 URL: https://issues.apache.org/jira/browse/YARN-1051
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler, resourcemanager, scheduler
Reporter: Carlo Curino
Assignee: Carlo Curino
 Fix For: 2.6.0

 Attachments: YARN-1051-design.pdf, YARN-1051.1.patch, 
 YARN-1051.patch, curino_MSR-TR-2013-108.pdf, socc14-paper15.pdf, 
 techreport.pdf


 In this umbrella JIRA we propose to extend the YARN RM to handle time 
 explicitly, allowing users to reserve capacity over time. This is an 
 important step towards SLAs, long-running services, and workflows, and it 
 helps with gang scheduling.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2615) ClientToAMTokenIdentifier and DelegationTokenIdentifier should allow extended fields

2014-10-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14161912#comment-14161912
 ] 

Hudson commented on YARN-2615:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1894 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1894/])
YARN-2615. Changed 
ClientToAMTokenIdentifier/RM(Timeline)DelegationTokenIdentifier to use protobuf 
as payload. Contributed by Junping Du (jianhe: rev 
ea26cc0b4ac02b8af686dfda80f540e5aa70c358)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/pom.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestClientToAMTokens.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/NMTokenIdentifier.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/ClientToAMTokenIdentifierForTest.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/client/TimelineDelegationTokenIdentifier.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/AMRMTokenIdentifier.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/client/RMDelegationTokenIdentifier.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/client/ClientToAMTokenIdentifier.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/RMDelegationTokenIdentifierForTest.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/ContainerTokenIdentifier.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/security/TestYARNTokenIdentifier.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMTokens.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/proto/server/yarn_security_token.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/client/YARNDelegationTokenIdentifier.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/proto/test_client_tokens.proto


 ClientToAMTokenIdentifier and DelegationTokenIdentifier should allow extended 
 fields
 

 Key: YARN-2615
 URL: https://issues.apache.org/jira/browse/YARN-2615
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Junping Du
Assignee: Junping Du
Priority: Blocker
 Fix For: 2.6.0

 Attachments: YARN-2615-v2.patch, YARN-2615-v3.patch, 
 YARN-2615-v4.patch, YARN-2615-v5.patch, YARN-2615.patch


 As three TokenIdentifiers were updated in YARN-668, ClientToAMTokenIdentifier 
 and DelegationTokenIdentifier should also be updated in the same way to allow 
 fields to be extended in the future.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2644) Recalculate headroom more frequently to keep it accurate

2014-10-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14161970#comment-14161970
 ] 

Hudson commented on YARN-2644:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1919 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1919/])
YARN-2644. Fixed CapacityScheduler to return up-to-date headroom when AM 
allocates. Contributed by Craig Welch (jianhe: rev 
519e5a7dd2bd540105434ec3c8939b68f6c024f8)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityHeadroomProvider.java
* hadoop-yarn-project/CHANGES.txt


 Recalculate headroom more frequently to keep it accurate
 

 Key: YARN-2644
 URL: https://issues.apache.org/jira/browse/YARN-2644
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Craig Welch
Assignee: Craig Welch
 Fix For: 2.6.0

 Attachments: YARN-2644.11.patch, YARN-2644.14.patch, 
 YARN-2644.15.patch, YARN-2644.15.patch


 See parent (1198) for more detail - this specifically covers calculating the 
 headroom more frequently, to cover the cases where changes have occurred 
 which impact headroom but which are not reflected due to an application not 
 being updated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2615) ClientToAMTokenIdentifier and DelegationTokenIdentifier should allow extended fields

2014-10-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14161969#comment-14161969
 ] 

Hudson commented on YARN-2615:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1919 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1919/])
YARN-2615. Changed 
ClientToAMTokenIdentifier/RM(Timeline)DelegationTokenIdentifier to use protobuf 
as payload. Contributed by Junping Du (jianhe: rev 
ea26cc0b4ac02b8af686dfda80f540e5aa70c358)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/proto/test_client_tokens.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/ClientToAMTokenIdentifierForTest.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/client/YARNDelegationTokenIdentifier.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/proto/server/yarn_security_token.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/AMRMTokenIdentifier.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMTokens.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/client/RMDelegationTokenIdentifier.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/ContainerTokenIdentifier.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestClientToAMTokens.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/security/TestYARNTokenIdentifier.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/pom.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/client/TimelineDelegationTokenIdentifier.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/NMTokenIdentifier.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/client/ClientToAMTokenIdentifier.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/RMDelegationTokenIdentifierForTest.java
* hadoop-yarn-project/CHANGES.txt


 ClientToAMTokenIdentifier and DelegationTokenIdentifier should allow extended 
 fields
 

 Key: YARN-2615
 URL: https://issues.apache.org/jira/browse/YARN-2615
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Junping Du
Assignee: Junping Du
Priority: Blocker
 Fix For: 2.6.0

 Attachments: YARN-2615-v2.patch, YARN-2615-v3.patch, 
 YARN-2615-v4.patch, YARN-2615-v5.patch, YARN-2615.patch


 As three TokenIdentifiers were updated in YARN-668, ClientToAMTokenIdentifier 
 and DelegationTokenIdentifier should also be updated in the same way to allow 
 fields to be extended in the future.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1915) ClientToAMTokenMasterKey should be provided to AM at launch time

2014-10-07 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-1915:
-
Attachment: YARN-1915v3.patch

Refreshed patch to latest trunk.

[~vinodkv] could you comment?  I fully agree with Hitesh that the current patch 
is a stop-gap at best.  However, there's some confusion as to how the client 
token master key should be sent to the AM (e.g. via container credentials, via 
the current method, etc.).  The original env variable approach apparently is 
problematic on Windows per YARN-610.

If we don't have time to develop the best fix for 2.6, then I'd like to see 
something like this patch put in to improve things in the interim.

 ClientToAMTokenMasterKey should be provided to AM at launch time
 

 Key: YARN-1915
 URL: https://issues.apache.org/jira/browse/YARN-1915
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.2.0
Reporter: Hitesh Shah
Assignee: Jason Lowe
Priority: Critical
 Attachments: YARN-1915.patch, YARN-1915v2.patch, YARN-1915v3.patch


 Currently, the AM receives the key as part of registration. This introduces a 
 race where a client can connect to the AM when the AM has not received the 
 key. 
 Current Flow:
 1) AM needs to start the client listening service in order to get host:port 
 and send it to the RM as part of registration
 2) RM gets the port info in register() and transitions the app to RUNNING. 
 Responds back with client secret to AM.
 3) User asks RM for client token. Gets it and pings the AM. AM hasn't 
 received client secret from RM and so RPC itself rejects the request.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-796) Allow for (admin) labels on nodes and resource-requests

2014-10-07 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-796:

Attachment: YARN-796.node-label.consolidate.14.patch

 Allow for (admin) labels on nodes and resource-requests
 ---

 Key: YARN-796
 URL: https://issues.apache.org/jira/browse/YARN-796
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.4.1
Reporter: Arun C Murthy
Assignee: Wangda Tan
 Attachments: LabelBasedScheduling.pdf, 
 Node-labels-Requirements-Design-doc-V1.pdf, 
 Node-labels-Requirements-Design-doc-V2.pdf, YARN-796-Diagram.pdf, 
 YARN-796.node-label.consolidate.1.patch, 
 YARN-796.node-label.consolidate.10.patch, 
 YARN-796.node-label.consolidate.11.patch, 
 YARN-796.node-label.consolidate.12.patch, 
 YARN-796.node-label.consolidate.13.patch, 
 YARN-796.node-label.consolidate.14.patch, 
 YARN-796.node-label.consolidate.2.patch, 
 YARN-796.node-label.consolidate.3.patch, 
 YARN-796.node-label.consolidate.4.patch, 
 YARN-796.node-label.consolidate.5.patch, 
 YARN-796.node-label.consolidate.6.patch, 
 YARN-796.node-label.consolidate.7.patch, 
 YARN-796.node-label.consolidate.8.patch, YARN-796.node-label.demo.patch.1, 
 YARN-796.patch, YARN-796.patch4


 It will be useful for admins to specify labels for nodes. Examples of labels 
 are OS, processor architecture etc.
 We should expose these labels and allow applications to specify labels on 
 resource-requests.
 Obviously we need to support admin operations on adding/removing node labels.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2331) Distinguish shutdown during supervision vs. shutdown for rolling upgrade

2014-10-07 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-2331:
-
Attachment: YARN-2331.patch

In the interest of getting something done for this in time for 2.6, here's a 
patch that adds a conf to tell the NM whether or not it's supervised.  If 
supervised then it is expected to be quickly restarted on shutdown and will not 
kill containers.  If unsupervised then it will kill containers on shutdown 
since it does not expect to be restarted in a timely manner.
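
For illustration only, here is a minimal sketch of the kind of check such a conf 
could drive on shutdown; the property name and class below are assumptions made 
up for this example, not the contents of the attached patch:
{code}
// Hypothetical sketch, not the attached patch: decide on NM shutdown whether
// running containers should be killed, based on a "supervised" config flag.
// The property name below is an assumption for illustration only.
import org.apache.hadoop.conf.Configuration;

public class SupervisionCheck {
  static final String NM_SUPERVISED = "yarn.nodemanager.supervised";

  /** Returns true if containers should be killed on NM shutdown. */
  public static boolean shouldKillContainersOnShutdown(Configuration conf) {
    boolean supervised = conf.getBoolean(NM_SUPERVISED, false);
    // Supervised: a quick restart is expected, so preserve containers for recovery.
    // Unsupervised: no timely restart is expected, so kill containers on the way down.
    return !supervised;
  }
}
{code}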

 Distinguish shutdown during supervision vs. shutdown for rolling upgrade
 

 Key: YARN-2331
 URL: https://issues.apache.org/jira/browse/YARN-2331
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.6.0
Reporter: Jason Lowe
 Attachments: YARN-2331.patch


 When the NM is shutting down with restart support enabled there are scenarios 
 we'd like to distinguish and behave accordingly:
 # The NM is running under supervision.  In that case containers should be 
 preserved so the automatic restart can recover them.
 # The NM is not running under supervision and a rolling upgrade is not being 
 performed.  In that case the shutdown should kill all containers since it is 
 unlikely the NM will be restarted in a timely manner to recover them.
 # The NM is not running under supervision and a rolling upgrade is being 
 performed.  In that case the shutdown should not kill all containers since a 
 restart is imminent due to the rolling upgrade and the containers will be 
 recovered.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1915) ClientToAMTokenMasterKey should be provided to AM at launch time

2014-10-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14162076#comment-14162076
 ] 

Hadoop QA commented on YARN-1915:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673366/YARN-1915v3.patch
  against trunk revision 2e789eb.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5306//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5306//console

This message is automatically generated.

 ClientToAMTokenMasterKey should be provided to AM at launch time
 

 Key: YARN-1915
 URL: https://issues.apache.org/jira/browse/YARN-1915
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.2.0
Reporter: Hitesh Shah
Assignee: Jason Lowe
Priority: Critical
 Attachments: YARN-1915.patch, YARN-1915v2.patch, YARN-1915v3.patch


 Currently, the AM receives the key as part of registration. This introduces a 
 race where a client can connect to the AM when the AM has not received the 
 key. 
 Current Flow:
 1) AM needs to start the client listening service in order to get host:port 
 and send it to the RM as part of registration
 2) RM gets the port info in register() and transitions the app to RUNNING. 
 Responds back with client secret to AM.
 3) User asks RM for client token. Gets it and pings the AM. AM hasn't 
 received client secret from RM and so RPC itself rejects the request.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2650) TestRMRestart#testRMRestartGetApplicationList sometimes fails in trunk

2014-10-07 Thread Ted Yu (JIRA)
Ted Yu created YARN-2650:


 Summary: TestRMRestart#testRMRestartGetApplicationList sometimes 
fails in trunk
 Key: YARN-2650
 URL: https://issues.apache.org/jira/browse/YARN-2650
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Ted Yu
Priority: Minor


I got the following failure running on Linux:
{code}
  TestRMRestart.testRMRestartGetApplicationList:952
rMAppManager.logApplicationSummary(
isA(org.apache.hadoop.yarn.api.records.ApplicationId)
);
Wanted 3 times:
- at 
org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testRMRestartGetApplicationList(TestRMRestart.java:952)
But was 2 times:
- at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.handle(RMAppManager.java:64)
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1915) ClientToAMTokenMasterKey should be provided to AM at launch time

2014-10-07 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14162091#comment-14162091
 ] 

Jason Lowe commented on YARN-1915:
--

Test failure is unrelated, see YARN-2483.

 ClientToAMTokenMasterKey should be provided to AM at launch time
 

 Key: YARN-1915
 URL: https://issues.apache.org/jira/browse/YARN-1915
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.2.0
Reporter: Hitesh Shah
Assignee: Jason Lowe
Priority: Critical
 Attachments: YARN-1915.patch, YARN-1915v2.patch, YARN-1915v3.patch


 Currently, the AM receives the key as part of registration. This introduces a 
 race where a client can connect to the AM when the AM has not received the 
 key. 
 Current Flow:
 1) AM needs to start the client listening service in order to get host:port 
 and send it to the RM as part of registration
 2) RM gets the port info in register() and transitions the app to RUNNING. 
 Responds back with client secret to AM.
 3) User asks RM for client token. Gets it and pings the AM. AM hasn't 
 received client secret from RM and so RPC itself rejects the request.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2650) TestRMRestart#testRMRestartGetApplicationList sometimes fails in trunk

2014-10-07 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated YARN-2650:
-
Attachment: TestRMRestart.tar.gz

Test output

 TestRMRestart#testRMRestartGetApplicationList sometimes fails in trunk
 --

 Key: YARN-2650
 URL: https://issues.apache.org/jira/browse/YARN-2650
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Ted Yu
Priority: Minor
 Attachments: TestRMRestart.tar.gz


 I got the following failure running on Linux:
 {code}
   TestRMRestart.testRMRestartGetApplicationList:952
 rMAppManager.logApplicationSummary(
 isA(org.apache.hadoop.yarn.api.records.ApplicationId)
 );
 Wanted 3 times:
 - at 
 org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testRMRestartGetApplicationList(TestRMRestart.java:952)
 But was 2 times:
 - at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.handle(RMAppManager.java:64)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2331) Distinguish shutdown during supervision vs. shutdown for rolling upgrade

2014-10-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14162117#comment-14162117
 ] 

Hadoop QA commented on YARN-2331:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673377/YARN-2331.patch
  against trunk revision 2e789eb.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:

  
org.apache.hadoop.yarn.server.nodemanager.containermanager.TestContainerManagerRecovery
  
org.apache.hadoop.yarn.server.nodemanager.TestContainerManagerWithLCE

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5308//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5308//console

This message is automatically generated.

 Distinguish shutdown during supervision vs. shutdown for rolling upgrade
 

 Key: YARN-2331
 URL: https://issues.apache.org/jira/browse/YARN-2331
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.6.0
Reporter: Jason Lowe
Assignee: Jason Lowe
 Attachments: YARN-2331.patch


 When the NM is shutting down with restart support enabled there are scenarios 
 we'd like to distinguish and behave accordingly:
 # The NM is running under supervision.  In that case containers should be 
 preserved so the automatic restart can recover them.
 # The NM is not running under supervision and a rolling upgrade is not being 
 performed.  In that case the shutdown should kill all containers since it is 
 unlikely the NM will be restarted in a timely manner to recover them.
 # The NM is not running under supervision and a rolling upgrade is being 
 performed.  In that case the shutdown should not kill all containers since a 
 restart is imminent due to the rolling upgrade and the containers will be 
 recovered.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2583) Modify the LogDeletionService to support Log aggregation for LRS

2014-10-07 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14162143#comment-14162143
 ] 

Vinod Kumar Vavilapalli commented on YARN-2583:
---

[~xgong], can you please post a short summary of the final proposal here?

 Modify the LogDeletionService to support Log aggregation for LRS
 

 Key: YARN-2583
 URL: https://issues.apache.org/jira/browse/YARN-2583
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-2583.1.patch, YARN-2583.2.patch, 
 YARN-2583.3.1.patch, YARN-2583.3.patch


 Currently, AggregatedLogDeletionService will delete old logs from HDFS. It 
 checks the cut-off time, and if all logs for an application are older than 
 the cut-off time, the app-log-dir in HDFS will be deleted. This will not 
 work for LRS, since we expect an LRS application to keep running for a long time. 
 Two different scenarios: 
 1) If we configure the rollingIntervalSeconds, new log files will keep being 
 uploaded to HDFS. The number of log files for this application will 
 grow larger and larger, and no log files will ever be deleted.
 2) If we do not configure the rollingIntervalSeconds, the log files can only 
 be uploaded to HDFS after the application is finished. It is very possible 
 that the logs are uploaded after the cut-off time. This will cause problems 
 because at that time the app-log-dir for this application in HDFS has already 
 been deleted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2651) Spin off the LogRollingInterval from LogAggregationContext

2014-10-07 Thread Xuan Gong (JIRA)
Xuan Gong created YARN-2651:
---

 Summary: Spin off the LogRollingInterval from LogAggregationContext
 Key: YARN-2651
 URL: https://issues.apache.org/jira/browse/YARN-2651
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2583) Modify the LogDeletionService to support Log aggregation for LRS

2014-10-07 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14162157#comment-14162157
 ] 

Xuan Gong commented on YARN-2583:
-

Here is the proposal:
* Add a private configuration for the number of log files we keep per application 
on the NM side. We will delete the oldest logs once the number of uploaded log 
files exceeds this configured value (a minimal pruning sketch follows this list). 
This is a temporary solution; the configuration will be removed once we find a 
more scalable method (tracked by YARN-2548) to write only a single log file 
per LRS. 
* The JHS contacts the RM to check whether the app is still running. If the app is 
still running, we need to keep the app dir, but remove the old logs.
* Remove the per-app LogRollingInterval completely and instead have the NM wake up 
periodically and upload log files. In this ticket, we can spin off 
LogRollingInterval from AppLogAggregatorImpl; YARN-2651 will be used to track 
the changes in other places.
* Enforce a minimal log rolling interval (3600 seconds will be used as the 
minimal value).
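
As a rough illustration of the first bullet, here is a minimal pruning sketch; 
the class, method and the way the retention size is obtained are assumptions for 
this example, not the actual patch:
{code}
// Hypothetical sketch: once the number of uploaded log files for an app
// exceeds the configured retention size, delete the oldest ones.
// Class and method names here are illustrative only.
import java.io.IOException;
import java.util.Arrays;
import java.util.Comparator;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LogRetentionSketch {
  /** Keep only the newest retentionSize files under appLogDir. */
  public static void pruneOldLogs(FileSystem fs, Path appLogDir,
      int retentionSize) throws IOException {
    FileStatus[] status = fs.listStatus(appLogDir);
    if (status.length <= retentionSize) {
      return; // nothing to prune yet
    }
    // Sort oldest first by modification time, then delete the surplus.
    Arrays.sort(status,
        Comparator.comparingLong(FileStatus::getModificationTime));
    int toDelete = status.length - retentionSize;
    for (int i = 0; i < toDelete; i++) {
      fs.delete(status[i].getPath(), false); // a single old aggregated log file
    }
  }
}
{code}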

 Modify the LogDeletionService to support Log aggregation for LRS
 

 Key: YARN-2583
 URL: https://issues.apache.org/jira/browse/YARN-2583
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-2583.1.patch, YARN-2583.2.patch, 
 YARN-2583.3.1.patch, YARN-2583.3.patch


 Currently, AggregatedLogDeletionService will delete old logs from HDFS. It 
 checks the cut-off time, and if all logs for an application are older than 
 the cut-off time, the app-log-dir in HDFS will be deleted. This will not 
 work for LRS, since we expect an LRS application to keep running for a long time. 
 Two different scenarios: 
 1) If we configure the rollingIntervalSeconds, new log files will keep being 
 uploaded to HDFS. The number of log files for this application will 
 grow larger and larger, and no log files will ever be deleted.
 2) If we do not configure the rollingIntervalSeconds, the log files can only 
 be uploaded to HDFS after the application is finished. It is very possible 
 that the logs are uploaded after the cut-off time. This will cause problems 
 because at that time the app-log-dir for this application in HDFS has already 
 been deleted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running

2014-10-07 Thread Craig Welch (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14162170#comment-14162170
 ] 

Craig Welch commented on YARN-1857:
---

This is an interesting question - that logic predates this change, and I 
wondered if there were cases when userLimit could somehow be > queueMaxCap, and 
as I look at the code, surprisingly, I believe so.  Userlimit is calculated 
based on absolute queue values whereas, at least since [YARN-2008], queueMaxCap 
takes into account actual usage in other queues.  So, it is entirely possible 
for userLimit to be > queueMaxCap due to how they are calculated, at least post 
[YARN-2008].  I'm not sure whether that was possible pre-YARN-2008 as well; it 
may have been, as there is a bit to how that was calculated even before that 
change - in any event, it is the case now.  So, as it happens, I don't believe 
we can do the simplification.
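
To make the point concrete, here is a toy sketch (plain numbers, not the 
CapacityScheduler code) of why the min() cannot be dropped when userLimit can 
exceed queueMaxCap:
{code}
// Toy illustration only, not the CapacityScheduler implementation.
public class HeadroomSketch {
  /** headroom = min(userLimit, queueMaxCap) - userConsumed, floored at 0. */
  public static long headroom(long userLimit, long queueMaxCap,
      long userConsumed) {
    long limit = Math.min(userLimit, queueMaxCap);
    return Math.max(0, limit - userConsumed);
  }

  public static void main(String[] args) {
    // userLimit > queueMaxCap: dropping the min() would report 30 instead of 10.
    System.out.println(headroom(100, 80, 70)); // prints 10
  }
}
{code}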

 CapacityScheduler headroom doesn't account for other AM's running
 -

 Key: YARN-1857
 URL: https://issues.apache.org/jira/browse/YARN-1857
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Affects Versions: 2.3.0
Reporter: Thomas Graves
Assignee: Chen He
Priority: Critical
 Attachments: YARN-1857.1.patch, YARN-1857.2.patch, YARN-1857.3.patch, 
 YARN-1857.4.patch, YARN-1857.5.patch, YARN-1857.6.patch, YARN-1857.patch, 
 YARN-1857.patch, YARN-1857.patch


 It's possible to get an application to hang forever (or for a long time) in a 
 cluster with multiple users.  The reason why is that the headroom sent to the 
 application is based on the user limit, but it doesn't account for other 
 Application Masters using space in that queue.  So the headroom (user limit - 
 user consumed) can be > 0 even though the cluster is 100% full, because the 
 other space is being used by Application Masters from other users.  
 For instance, if you have a cluster with 1 queue, the user limit is 100%, and 
 you have multiple users submitting applications.  One very large application by 
 user 1 starts up, runs most of its maps and starts running reducers.  Other 
 users try to start applications and get their Application Masters started but 
 no tasks.  The very large application then gets to the point where it has 
 consumed the rest of the cluster resources with all reduces.  But at this 
 point it still needs to finish a few maps.  The headroom being sent to this 
 application is only based on the user limit (which is 100% of the cluster 
 capacity); it's using, let's say, 95% of the cluster for reduces and the other 
 5% is being used by other users running Application Masters.  The MRAppMaster 
 thinks it still has 5%, so it doesn't know that it should kill a reduce in 
 order to run a map.  
 This can happen in other scenarios also.  Generally, in a large cluster with 
 multiple queues this shouldn't cause a hang forever, but it could cause the 
 application to take much longer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2633) TestContainerLauncherImpl sometimes fails

2014-10-07 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated YARN-2633:

Attachment: YARN-2633.patch

 TestContainerLauncherImpl sometimes fails
 -

 Key: YARN-2633
 URL: https://issues.apache.org/jira/browse/YARN-2633
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Mit Desai
Assignee: Mit Desai
 Attachments: YARN-2633.patch, YARN-2633.patch


 {noformat}
 org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
 java.lang.NoSuchMethodException: 
 org.apache.hadoop.yarn.api.ContainerManagementProtocol$$EnhancerByMockitoWithCGLIB$$25708415.close()
   at java.lang.Class.getMethod(Class.java:1665)
   at 
 org.apache.hadoop.yarn.factories.impl.pb.RpcClientFactoryPBImpl.stopClient(RpcClientFactoryPBImpl.java:90)
   at 
 org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC.stopProxy(HadoopYarnProtoRPC.java:54)
   at 
 org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy.mayBeCloseProxy(ContainerManagementProtocolProxy.java:79)
   at 
 org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.kill(ContainerLauncherImpl.java:225)
   at 
 org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.shutdownAllContainers(ContainerLauncherImpl.java:320)
   at 
 org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.serviceStop(ContainerLauncherImpl.java:331)
   at 
 org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
   at 
 org.apache.hadoop.mapreduce.v2.app.launcher.TestContainerLauncherImpl.testMyShutdown(TestContainerLauncherImpl.java:315)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2566) IOException happen in startLocalizer of DefaultContainerExecutor due to not enough disk space for the first localDir.

2014-10-07 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-2566:
--
Issue Type: Sub-task  (was: Bug)
Parent: YARN-91

 IOException happen in startLocalizer of DefaultContainerExecutor due to not 
 enough disk space for the first localDir.
 -

 Key: YARN-2566
 URL: https://issues.apache.org/jira/browse/YARN-2566
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.5.0
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: YARN-2566.000.patch, YARN-2566.001.patch, 
 YARN-2566.002.patch, YARN-2566.003.patch


 startLocalizer in DefaultContainerExecutor will only use the first localDir 
 to copy the token file. If the copy fails for the first localDir due to not 
 enough disk space in that localDir, localization will fail even 
 though there is plenty of disk space in other localDirs. We see the following 
 error for this case:
 {code}
 2014-09-13 23:33:25,171 WARN 
 org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Unable to 
 create app directory 
 /hadoop/d1/usercache/cloudera/appcache/application_1410663092546_0004
 java.io.IOException: mkdir of 
 /hadoop/d1/usercache/cloudera/appcache/application_1410663092546_0004 failed
   at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:1062)
   at 
 org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:157)
   at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189)
   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:721)
   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:717)
   at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
   at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:717)
   at 
 org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.createDir(DefaultContainerExecutor.java:426)
   at 
 org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.createAppDirs(DefaultContainerExecutor.java:522)
   at 
 org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:94)
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:987)
 2014-09-13 23:33:25,185 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
  Localizer failed
 java.io.FileNotFoundException: File 
 file:/hadoop/d1/usercache/cloudera/appcache/application_1410663092546_0004 
 does not exist
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:511)
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:724)
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:501)
   at 
 org.apache.hadoop.fs.DelegateToFileSystem.getFileStatus(DelegateToFileSystem.java:111)
   at 
 org.apache.hadoop.fs.DelegateToFileSystem.createInternal(DelegateToFileSystem.java:76)
   at 
 org.apache.hadoop.fs.ChecksumFs$ChecksumFSOutputSummer.<init>(ChecksumFs.java:344)
   at org.apache.hadoop.fs.ChecksumFs.createInternal(ChecksumFs.java:390)
   at 
 org.apache.hadoop.fs.AbstractFileSystem.create(AbstractFileSystem.java:577)
   at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:677)
   at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:673)
   at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
   at org.apache.hadoop.fs.FileContext.create(FileContext.java:673)
   at org.apache.hadoop.fs.FileContext$Util.copy(FileContext.java:2021)
   at org.apache.hadoop.fs.FileContext$Util.copy(FileContext.java:1963)
   at 
 org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:102)
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:987)
 2014-09-13 23:33:25,186 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
  Container container_1410663092546_0004_01_01 transitioned from 
 LOCALIZING to LOCALIZATION_FAILED
 2014-09-13 23:33:25,187 WARN 
 org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=cloudera   
 OPERATION=Container Finished - Failed   TARGET=ContainerImpl
 RESULT=FAILURE  DESCRIPTION=Container failed with state: LOCALIZATION_FAILED  
   APPID=application_1410663092546_0004
 CONTAINERID=container_1410663092546_0004_01_01

[jira] [Commented] (YARN-2566) IOException happen in startLocalizer of DefaultContainerExecutor due to not enough disk space for the first localDir.

2014-10-07 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14162216#comment-14162216
 ] 

Vinod Kumar Vavilapalli commented on YARN-2566:
---

I am surprised YARN-1781 didn't take care of this /cc [~vvasudev].

IAC, instead of creating yet another algorithm, this should plug into the 
good-local-dirs added via YARN-1781. It can simply take one (the first, or a 
randomized one) of the good-local-dirs instead of the first in the overall list. 
That should address this issue.
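
A minimal sketch of that idea, under the assumption that the YARN-1781 
dirs-handler exposes the list of good local dirs (the helper below is 
illustrative, not the actual NodeManager API):
{code}
// Hypothetical sketch: pick one of the known-good local dirs (randomized)
// rather than always using the first entry of the full localDirs list.
// The caller is assumed to obtain goodLocalDirs from the YARN-1781
// dirs-handler; this is not the actual NodeManager API.
import java.util.List;
import java.util.Random;

public class LocalDirPicker {
  private static final Random RANDOM = new Random();

  /** Returns a randomly chosen good local dir, or null if none are available. */
  public static String pickDirForTokenCopy(List<String> goodLocalDirs) {
    if (goodLocalDirs == null || goodLocalDirs.isEmpty()) {
      return null; // caller should fail localization with a clear error
    }
    return goodLocalDirs.get(RANDOM.nextInt(goodLocalDirs.size()));
  }
}
{code}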

 IOException happen in startLocalizer of DefaultContainerExecutor due to not 
 enough disk space for the first localDir.
 -

 Key: YARN-2566
 URL: https://issues.apache.org/jira/browse/YARN-2566
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.5.0
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: YARN-2566.000.patch, YARN-2566.001.patch, 
 YARN-2566.002.patch, YARN-2566.003.patch


 startLocalizer in DefaultContainerExecutor will only use the first localDir 
 to copy the token file. If the copy fails for the first localDir due to not 
 enough disk space in that localDir, localization will fail even 
 though there is plenty of disk space in other localDirs. We see the following 
 error for this case:
 {code}
 2014-09-13 23:33:25,171 WARN 
 org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Unable to 
 create app directory 
 /hadoop/d1/usercache/cloudera/appcache/application_1410663092546_0004
 java.io.IOException: mkdir of 
 /hadoop/d1/usercache/cloudera/appcache/application_1410663092546_0004 failed
   at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:1062)
   at 
 org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:157)
   at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189)
   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:721)
   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:717)
   at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
   at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:717)
   at 
 org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.createDir(DefaultContainerExecutor.java:426)
   at 
 org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.createAppDirs(DefaultContainerExecutor.java:522)
   at 
 org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:94)
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:987)
 2014-09-13 23:33:25,185 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
  Localizer failed
 java.io.FileNotFoundException: File 
 file:/hadoop/d1/usercache/cloudera/appcache/application_1410663092546_0004 
 does not exist
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:511)
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:724)
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:501)
   at 
 org.apache.hadoop.fs.DelegateToFileSystem.getFileStatus(DelegateToFileSystem.java:111)
   at 
 org.apache.hadoop.fs.DelegateToFileSystem.createInternal(DelegateToFileSystem.java:76)
   at 
 org.apache.hadoop.fs.ChecksumFs$ChecksumFSOutputSummer.<init>(ChecksumFs.java:344)
   at org.apache.hadoop.fs.ChecksumFs.createInternal(ChecksumFs.java:390)
   at 
 org.apache.hadoop.fs.AbstractFileSystem.create(AbstractFileSystem.java:577)
   at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:677)
   at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:673)
   at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
   at org.apache.hadoop.fs.FileContext.create(FileContext.java:673)
   at org.apache.hadoop.fs.FileContext$Util.copy(FileContext.java:2021)
   at org.apache.hadoop.fs.FileContext$Util.copy(FileContext.java:1963)
   at 
 org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:102)
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:987)
 2014-09-13 23:33:25,186 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
  Container container_1410663092546_0004_01_01 transitioned from 
 LOCALIZING to LOCALIZATION_FAILED
 2014-09-13 

[jira] [Updated] (YARN-314) Schedulers should allow resource requests of different sizes at the same priority and location

2014-10-07 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-314:
--
Attachment: yarn-314-prelim.patch

Attaching a preliminary patch for any early feedback. 

 Schedulers should allow resource requests of different sizes at the same 
 priority and location
 --

 Key: YARN-314
 URL: https://issues.apache.org/jira/browse/YARN-314
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: scheduler
Affects Versions: 2.0.2-alpha
Reporter: Sandy Ryza
Assignee: Karthik Kambatla
 Fix For: 2.6.0

 Attachments: yarn-314-prelim.patch


 Currently, resource requests for the same container and locality are expected 
 to all be the same size.
 While it doesn't look like it's needed for apps currently, and can be 
 circumvented by specifying different priorities if absolutely necessary, it 
 seems to me that the ability to request containers with different resource 
 requirements at the same priority level should be there for the future and 
 for completeness' sake.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2633) TestContainerLauncherImpl sometimes fails

2014-10-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14162266#comment-14162266
 ] 

Hadoop QA commented on YARN-2633:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673394/YARN-2633.patch
  against trunk revision 2e789eb.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5309//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5309//console

This message is automatically generated.

 TestContainerLauncherImpl sometimes fails
 -

 Key: YARN-2633
 URL: https://issues.apache.org/jira/browse/YARN-2633
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Mit Desai
Assignee: Mit Desai
 Attachments: YARN-2633.patch, YARN-2633.patch


 {noformat}
 org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
 java.lang.NoSuchMethodException: 
 org.apache.hadoop.yarn.api.ContainerManagementProtocol$$EnhancerByMockitoWithCGLIB$$25708415.close()
   at java.lang.Class.getMethod(Class.java:1665)
   at 
 org.apache.hadoop.yarn.factories.impl.pb.RpcClientFactoryPBImpl.stopClient(RpcClientFactoryPBImpl.java:90)
   at 
 org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC.stopProxy(HadoopYarnProtoRPC.java:54)
   at 
 org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy.mayBeCloseProxy(ContainerManagementProtocolProxy.java:79)
   at 
 org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.kill(ContainerLauncherImpl.java:225)
   at 
 org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.shutdownAllContainers(ContainerLauncherImpl.java:320)
   at 
 org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.serviceStop(ContainerLauncherImpl.java:331)
   at 
 org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
   at 
 org.apache.hadoop.mapreduce.v2.app.launcher.TestContainerLauncherImpl.testMyShutdown(TestContainerLauncherImpl.java:315)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2414) RM web UI: app page will crash if app is failed before any attempt has been created

2014-10-07 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-2414:
-
Attachment: YARN-2414.patch

Attached a simple fix for this issue.
Please kindly review. 

Thanks,

 RM web UI: app page will crash if app is failed before any attempt has been 
 created
 ---

 Key: YARN-2414
 URL: https://issues.apache.org/jira/browse/YARN-2414
 Project: Hadoop YARN
  Issue Type: Bug
  Components: webapp
Reporter: Zhijie Shen
Assignee: Wangda Tan
 Attachments: YARN-2414.patch


 {code}
 2014-08-12 16:45:13,573 ERROR org.apache.hadoop.yarn.webapp.Dispatcher: error 
 handling URI: /cluster/app/application_1407887030038_0001
 java.lang.reflect.InvocationTargetException
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:153)
   at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
   at 
 com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263)
   at 
 com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178)
   at 
 com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
   at 
 com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebAppFilter.doFilter(RMWebAppFilter.java:84)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
   at 
 com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
   at 
 com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
   at 
 com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
   at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at 
 org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at 
 org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:460)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at 
 org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1191)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at 
 org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
   at 
 org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
   at 
 org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
   at 
 org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
   at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
   at 
 org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
   at 
 org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
   at org.mortbay.jetty.Server.handle(Server.java:326)
   at 
 org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
   at 
 org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
   at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
   at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
   at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
   at 
 org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
   at 
 org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
 Caused by: java.lang.NullPointerException
   at 

[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests

2014-10-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14162316#comment-14162316
 ] 

Hadoop QA commented on YARN-796:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12673374/YARN-796.node-label.consolidate.14.patch
  against trunk revision 2e789eb.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 42 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 3 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient
 hadoop-tools/hadoop-sls hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.mapred.pipes.TestPipeApplication
  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestLeafQueue
  
org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5307//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5307//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5307//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5307//console

This message is automatically generated.

 Allow for (admin) labels on nodes and resource-requests
 ---

 Key: YARN-796
 URL: https://issues.apache.org/jira/browse/YARN-796
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.4.1
Reporter: Arun C Murthy
Assignee: Wangda Tan
 Attachments: LabelBasedScheduling.pdf, 
 Node-labels-Requirements-Design-doc-V1.pdf, 
 Node-labels-Requirements-Design-doc-V2.pdf, YARN-796-Diagram.pdf, 
 YARN-796.node-label.consolidate.1.patch, 
 YARN-796.node-label.consolidate.10.patch, 
 YARN-796.node-label.consolidate.11.patch, 
 YARN-796.node-label.consolidate.12.patch, 
 YARN-796.node-label.consolidate.13.patch, 
 YARN-796.node-label.consolidate.14.patch, 
 YARN-796.node-label.consolidate.2.patch, 
 YARN-796.node-label.consolidate.3.patch, 
 YARN-796.node-label.consolidate.4.patch, 
 YARN-796.node-label.consolidate.5.patch, 
 YARN-796.node-label.consolidate.6.patch, 
 YARN-796.node-label.consolidate.7.patch, 
 YARN-796.node-label.consolidate.8.patch, YARN-796.node-label.demo.patch.1, 
 YARN-796.patch, YARN-796.patch4


 It will be useful for admins to specify labels for nodes. Examples of labels 
 are OS, processor architecture etc.
 We should expose these labels and allow applications to specify labels on 
 resource-requests.
 Obviously we need to support admin operations on adding/removing node labels.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2331) Distinguish shutdown during supervision vs. shutdown for rolling upgrade

2014-10-07 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-2331:
-
Attachment: YARN-2331v2.patch

Updated patch to fix the unit tests.

 Distinguish shutdown during supervision vs. shutdown for rolling upgrade
 

 Key: YARN-2331
 URL: https://issues.apache.org/jira/browse/YARN-2331
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.6.0
Reporter: Jason Lowe
Assignee: Jason Lowe
 Attachments: YARN-2331.patch, YARN-2331v2.patch


 When the NM is shutting down with restart support enabled there are scenarios 
 we'd like to distinguish and behave accordingly:
 # The NM is running under supervision.  In that case containers should be 
 preserved so the automatic restart can recover them.
 # The NM is not running under supervision and a rolling upgrade is not being 
 performed.  In that case the shutdown should kill all containers since it is 
 unlikely the NM will be restarted in a timely manner to recover them.
 # The NM is not running under supervision and a rolling upgrade is being 
 performed.  In that case the shutdown should not kill all containers since a 
 restart is imminent due to the rolling upgrade and the containers will be 
 recovered.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2583) Modify the LogDeletionService to support Log aggregation for LRS

2014-10-07 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14162324#comment-14162324
 ] 

Xuan Gong commented on YARN-2583:
-

Uploaded a new patch to address all the comments.

 Modify the LogDeletionService to support Log aggregation for LRS
 

 Key: YARN-2583
 URL: https://issues.apache.org/jira/browse/YARN-2583
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-2583.1.patch, YARN-2583.2.patch, 
 YARN-2583.3.1.patch, YARN-2583.3.patch, YARN-2583.4.patch


 Currently, AggregatedLogDeletionService will delete old logs from HDFS. It 
 checks the cut-off time, and if all logs for an application are older than 
 the cut-off time, the app-log-dir in HDFS will be deleted. This will not 
 work for LRS, since we expect an LRS application to keep running for a long time. 
 Two different scenarios: 
 1) If we configure the rollingIntervalSeconds, new log files will keep being 
 uploaded to HDFS. The number of log files for this application will 
 grow larger and larger, and no log files will ever be deleted.
 2) If we do not configure the rollingIntervalSeconds, the log files can only 
 be uploaded to HDFS after the application is finished. It is very possible 
 that the logs are uploaded after the cut-off time. This will cause problems 
 because at that time the app-log-dir for this application in HDFS has already 
 been deleted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2583) Modify the LogDeletionService to support Log aggregation for LRS

2014-10-07 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-2583:

Attachment: YARN-2583.4.patch

 Modify the LogDeletionService to support Log aggregation for LRS
 

 Key: YARN-2583
 URL: https://issues.apache.org/jira/browse/YARN-2583
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-2583.1.patch, YARN-2583.2.patch, 
 YARN-2583.3.1.patch, YARN-2583.3.patch, YARN-2583.4.patch


 Currently, AggregatedLogDeletionService will delete old logs from HDFS. It 
 checks the cut-off time, and if all logs for an application are older than 
 the cut-off time, the app-log-dir in HDFS will be deleted. This will not 
 work for LRS, since we expect an LRS application to keep running for a long time. 
 Two different scenarios: 
 1) If we configure the rollingIntervalSeconds, new log files will keep being 
 uploaded to HDFS. The number of log files for this application will 
 grow larger and larger, and no log files will ever be deleted.
 2) If we do not configure the rollingIntervalSeconds, the log files can only 
 be uploaded to HDFS after the application is finished. It is very possible 
 that the logs are uploaded after the cut-off time. This will cause problems 
 because at that time the app-log-dir for this application in HDFS has already 
 been deleted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running

2014-10-07 Thread Craig Welch (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch updated YARN-1857:
--
Attachment: YARN-1857.7.patch


I see it now - uploading a newer patch with a simplified final headroom calculation.

 CapacityScheduler headroom doesn't account for other AM's running
 -

 Key: YARN-1857
 URL: https://issues.apache.org/jira/browse/YARN-1857
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Affects Versions: 2.3.0
Reporter: Thomas Graves
Assignee: Chen He
Priority: Critical
 Attachments: YARN-1857.1.patch, YARN-1857.2.patch, YARN-1857.3.patch, 
 YARN-1857.4.patch, YARN-1857.5.patch, YARN-1857.6.patch, YARN-1857.7.patch, 
 YARN-1857.patch, YARN-1857.patch, YARN-1857.patch


 It's possible to get an application to hang forever (or for a long time) in a 
 cluster with multiple users.  The reason why is that the headroom sent to the 
 application is based on the user limit, but it doesn't account for other 
 Application Masters using space in that queue.  So the headroom (user limit - 
 user consumed) can be > 0 even though the cluster is 100% full, because the 
 other space is being used by Application Masters from other users.  
 For instance, if you have a cluster with 1 queue, the user limit is 100%, and 
 you have multiple users submitting applications.  One very large application by 
 user 1 starts up, runs most of its maps and starts running reducers.  Other 
 users try to start applications and get their Application Masters started but 
 no tasks.  The very large application then gets to the point where it has 
 consumed the rest of the cluster resources with all reduces.  But at this 
 point it still needs to finish a few maps.  The headroom being sent to this 
 application is only based on the user limit (which is 100% of the cluster 
 capacity); it's using, let's say, 95% of the cluster for reduces and the other 
 5% is being used by other users running Application Masters.  The MRAppMaster 
 thinks it still has 5%, so it doesn't know that it should kill a reduce in 
 order to run a map.  
 This can happen in other scenarios also.  Generally, in a large cluster with 
 multiple queues this shouldn't cause a hang forever, but it could cause the 
 application to take much longer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running

2014-10-07 Thread Chen He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14162357#comment-14162357
 ] 

Chen He commented on YARN-1857:
---

Hi [~cwelch], 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/CapacityScheduler.apt.vm
 has detailed information about these parameters. 

 CapacityScheduler headroom doesn't account for other AM's running
 -

 Key: YARN-1857
 URL: https://issues.apache.org/jira/browse/YARN-1857
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Affects Versions: 2.3.0
Reporter: Thomas Graves
Assignee: Chen He
Priority: Critical
 Attachments: YARN-1857.1.patch, YARN-1857.2.patch, YARN-1857.3.patch, 
 YARN-1857.4.patch, YARN-1857.5.patch, YARN-1857.6.patch, YARN-1857.7.patch, 
 YARN-1857.patch, YARN-1857.patch, YARN-1857.patch


 It's possible to get an application to hang forever (or for a long time) in a 
 cluster with multiple users. The reason is that the headroom sent to the 
 application is based on the user limit, but it doesn't account for other 
 Application Masters using space in that queue. So the headroom (user limit - 
 user consumed) can be > 0 even though the cluster is 100% full, because the 
 remaining space is being used by Application Masters from other users. 
 For instance, if you have a cluster with 1 queue, the user limit is 100%, and 
 multiple users are submitting applications. One very large application by user 1 
 starts up, runs most of its maps and starts running reducers. Other users try 
 to start applications and get their Application Masters started but no 
 tasks. The very large application then gets to the point where it has 
 consumed the rest of the cluster resources with all reduces. But at this 
 point it still needs to finish a few maps. The headroom being sent to this 
 application is only based on the user limit (which is 100% of the cluster 
 capacity); it is using, let's say, 95% of the cluster for reduces and the other 5% 
 is being used by other users running Application Masters. The MRAppMaster 
 thinks it still has 5%, so it doesn't know that it should kill a reduce in 
 order to run a map. 
 This can happen in other scenarios as well. Generally, in a large cluster with 
 multiple queues this shouldn't cause a hang forever, but it could cause the 
 application to take much longer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2414) RM web UI: app page will crash if app is failed before any attempt has been created

2014-10-07 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14162366#comment-14162366
 ] 

Jason Lowe commented on YARN-2414:
--

Thanks, Wangda!  Looks good overall.  Nit: appMerics should be appMetrics.

Have you done any manual testing with this patch?  Seems like it would be 
straightforward to mock up an injector with a mock context containing an app 
with no attempts.  With that, the unit test can verify that render doesn't throw. 
I'm thinking of something along the lines of WebAppTests.testPage/testBlock.
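
A rough sketch of such a test, assuming Mockito and the existing WebAppTests helper; the block class under test and some of the context wiring are placeholders that would need to match the actual RM webapp classes:

{code}
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.server.resourcemanager.RMContext;
import org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMApp;
import org.apache.hadoop.yarn.webapp.test.WebAppTests;
import org.junit.Test;

public class TestAppPageWithoutAttempt {
  @Test
  public void testRenderAppWithNoAttempt() throws Exception {
    ApplicationId appId = ApplicationId.newInstance(1407887030038L, 1);
    RMApp app = mock(RMApp.class);
    when(app.getApplicationId()).thenReturn(appId);
    when(app.getCurrentAppAttempt()).thenReturn(null);   // no attempt created

    ConcurrentMap<ApplicationId, RMApp> apps = new ConcurrentHashMap<>();
    apps.put(appId, app);
    RMContext context = mock(RMContext.class);
    when(context.getRMApps()).thenReturn(apps);

    // Replace AppBlock.class with whichever block renders /cluster/app/<appid>
    // in this tree (placeholder name, not verified here). The test passes as
    // long as render() does not throw.
    WebAppTests.testBlock(AppBlock.class, RMContext.class, context);
  }
}
{code}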

 RM web UI: app page will crash if app is failed before any attempt has been 
 created
 ---

 Key: YARN-2414
 URL: https://issues.apache.org/jira/browse/YARN-2414
 Project: Hadoop YARN
  Issue Type: Bug
  Components: webapp
Reporter: Zhijie Shen
Assignee: Wangda Tan
 Attachments: YARN-2414.patch


 {code}
 2014-08-12 16:45:13,573 ERROR org.apache.hadoop.yarn.webapp.Dispatcher: error 
 handling URI: /cluster/app/application_1407887030038_0001
 java.lang.reflect.InvocationTargetException
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:153)
   at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
   at 
 com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263)
   at 
 com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178)
   at 
 com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
   at 
 com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebAppFilter.doFilter(RMWebAppFilter.java:84)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
   at 
 com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
   at 
 com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
   at 
 com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
   at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at 
 org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at 
 org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:460)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at 
 org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1191)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at 
 org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
   at 
 org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
   at 
 org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
   at 
 org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
   at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
   at 
 org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
   at 
 org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
   at org.mortbay.jetty.Server.handle(Server.java:326)
   at 
 org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
   at 
 org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
   at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
   at 

[jira] [Commented] (YARN-2583) Modify the LogDeletionService to support Log aggregation for LRS

2014-10-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14162416#comment-14162416
 ] 

Hadoop QA commented on YARN-2583:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673409/YARN-2583.4.patch
  against trunk revision 9196db9.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5313//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5313//console

This message is automatically generated.

 Modify the LogDeletionService to support Log aggregation for LRS
 

 Key: YARN-2583
 URL: https://issues.apache.org/jira/browse/YARN-2583
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-2583.1.patch, YARN-2583.2.patch, 
 YARN-2583.3.1.patch, YARN-2583.3.patch, YARN-2583.4.patch


 Currently, AggregatedLogDeletionService will delete old logs from HDFS. It 
 will check the cut-off-time, and if all logs for this application are older than 
 this cut-off-time, the app-log-dir will be deleted from HDFS. This will not 
 work for LRS: we expect an LRS application to keep running for a long time. 
 Two different scenarios: 
 1) If we configured the rollingIntervalSeconds, new log files will 
 always be uploaded to HDFS. The number of log files for this application will 
 become larger and larger, and no log files will ever be deleted.
 2) If we did not configure the rollingIntervalSeconds, the log file can only 
 be uploaded to HDFS after the application is finished. It is very possible 
 that the logs are uploaded after the cut-off-time. This causes a problem 
 because at that time the app-log-dir for this application in HDFS has already 
 been deleted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2331) Distinguish shutdown during supervision vs. shutdown for rolling upgrade

2014-10-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14162414#comment-14162414
 ] 

Hadoop QA commented on YARN-2331:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673407/YARN-2331v2.patch
  against trunk revision 9196db9.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5312//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5312//console

This message is automatically generated.

 Distinguish shutdown during supervision vs. shutdown for rolling upgrade
 

 Key: YARN-2331
 URL: https://issues.apache.org/jira/browse/YARN-2331
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.6.0
Reporter: Jason Lowe
Assignee: Jason Lowe
 Attachments: YARN-2331.patch, YARN-2331v2.patch


 When the NM is shutting down with restart support enabled there are scenarios 
 we'd like to distinguish and behave accordingly:
 # The NM is running under supervision.  In that case containers should be 
 preserved so the automatic restart can recover them.
 # The NM is not running under supervision and a rolling upgrade is not being 
 performed.  In that case the shutdown should kill all containers since it is 
 unlikely the NM will be restarted in a timely manner to recover them.
 # The NM is not running under supervision and a rolling upgrade is being 
 performed.  In that case the shutdown should not kill all containers since a 
 restart is imminent due to the rolling upgrade and the containers will be 
 recovered.
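
The three scenarios above boil down to one decision at shutdown time: should the NM kill its containers? A minimal sketch of that decision follows; only yarn.nodemanager.recovery.enabled is an existing configuration key here, while the supervision and rolling-upgrade flags are hypothetical stand-ins for whatever the patch actually introduces.

{code}
// Not the patch -- just a sketch of the decision the three scenarios imply.
import org.apache.hadoop.conf.Configuration;

public class ShutdownPolicySketch {
  static boolean shouldKillContainersOnShutdown(Configuration conf) {
    boolean recoveryEnabled =
        conf.getBoolean("yarn.nodemanager.recovery.enabled", false);
    // Hypothetical keys for illustration only:
    boolean supervised =
        conf.getBoolean("yarn.nodemanager.recovery.supervised", false);
    boolean rollingUpgrade =
        conf.getBoolean("yarn.nodemanager.rolling-upgrade.in-progress", false);

    if (!recoveryEnabled) {
      return true;                 // nothing can recover the containers anyway
    }
    // Scenarios 1 and 3: a restart is expected, so preserve containers.
    // Scenario 2: no supervisor and no rolling upgrade, so kill them.
    return !supervised && !rollingUpgrade;
  }
}
{code}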



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2414) RM web UI: app page will crash if app is failed before any attempt has been created

2014-10-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14162419#comment-14162419
 ] 

Hadoop QA commented on YARN-2414:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673402/YARN-2414.patch
  against trunk revision 2e789eb.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5310//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5310//console

This message is automatically generated.

 RM web UI: app page will crash if app is failed before any attempt has been 
 created
 ---

 Key: YARN-2414
 URL: https://issues.apache.org/jira/browse/YARN-2414
 Project: Hadoop YARN
  Issue Type: Bug
  Components: webapp
Reporter: Zhijie Shen
Assignee: Wangda Tan
 Attachments: YARN-2414.patch


 {code}
 2014-08-12 16:45:13,573 ERROR org.apache.hadoop.yarn.webapp.Dispatcher: error 
 handling URI: /cluster/app/application_1407887030038_0001
 java.lang.reflect.InvocationTargetException
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:153)
   at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
   at 
 com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263)
   at 
 com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178)
   at 
 com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
   at 
 com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebAppFilter.doFilter(RMWebAppFilter.java:84)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
   at 
 com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
   at 
 com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
   at 
 com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
   at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at 
 org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at 
 org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:460)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at 
 org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1191)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
   at 
 

[jira] [Commented] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running

2014-10-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14162461#comment-14162461
 ] 

Hadoop QA commented on YARN-1857:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673410/YARN-1857.7.patch
  against trunk revision 9196db9.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5311//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5311//console

This message is automatically generated.

 CapacityScheduler headroom doesn't account for other AM's running
 -

 Key: YARN-1857
 URL: https://issues.apache.org/jira/browse/YARN-1857
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Affects Versions: 2.3.0
Reporter: Thomas Graves
Assignee: Chen He
Priority: Critical
 Attachments: YARN-1857.1.patch, YARN-1857.2.patch, YARN-1857.3.patch, 
 YARN-1857.4.patch, YARN-1857.5.patch, YARN-1857.6.patch, YARN-1857.7.patch, 
 YARN-1857.patch, YARN-1857.patch, YARN-1857.patch


 It's possible to get an application to hang forever (or for a long time) in a 
 cluster with multiple users. The reason is that the headroom sent to the 
 application is based on the user limit, but it doesn't account for other 
 Application Masters using space in that queue. So the headroom (user limit - 
 user consumed) can be > 0 even though the cluster is 100% full, because the 
 remaining space is being used by Application Masters from other users. 
 For instance, if you have a cluster with 1 queue, the user limit is 100%, and 
 multiple users are submitting applications. One very large application by user 1 
 starts up, runs most of its maps and starts running reducers. Other users try 
 to start applications and get their Application Masters started but no 
 tasks. The very large application then gets to the point where it has 
 consumed the rest of the cluster resources with all reduces. But at this 
 point it still needs to finish a few maps. The headroom being sent to this 
 application is only based on the user limit (which is 100% of the cluster 
 capacity); it is using, let's say, 95% of the cluster for reduces and the other 5% 
 is being used by other users running Application Masters. The MRAppMaster 
 thinks it still has 5%, so it doesn't know that it should kill a reduce in 
 order to run a map. 
 This can happen in other scenarios as well. Generally, in a large cluster with 
 multiple queues this shouldn't cause a hang forever, but it could cause the 
 application to take much longer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2652) add hadoop-yarn-registry package under hadoop-yarn

2014-10-07 Thread Steve Loughran (JIRA)
Steve Loughran created YARN-2652:


 Summary: add hadoop-yarn-registry package under hadoop-yarn
 Key: YARN-2652
 URL: https://issues.apache.org/jira/browse/YARN-2652
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api
Reporter: Steve Loughran
Assignee: Steve Loughran


Commit the core {{hadoop-yarn-registry}} module, including
* changes to POMs to include in the build
* additions to the yarn site

This patch excludes
* RM integration (YARN-2571)
* Distributed Shell integration and tests (YARN-2646)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running

2014-10-07 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14162487#comment-14162487
 ] 

Jian He commented on YARN-1857:
---

Craig, thanks for updating. 
looks good, +1

 CapacityScheduler headroom doesn't account for other AM's running
 -

 Key: YARN-1857
 URL: https://issues.apache.org/jira/browse/YARN-1857
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Affects Versions: 2.3.0
Reporter: Thomas Graves
Assignee: Chen He
Priority: Critical
 Attachments: YARN-1857.1.patch, YARN-1857.2.patch, YARN-1857.3.patch, 
 YARN-1857.4.patch, YARN-1857.5.patch, YARN-1857.6.patch, YARN-1857.7.patch, 
 YARN-1857.patch, YARN-1857.patch, YARN-1857.patch


 It's possible to get an application to hang forever (or for a long time) in a 
 cluster with multiple users. The reason is that the headroom sent to the 
 application is based on the user limit, but it doesn't account for other 
 Application Masters using space in that queue. So the headroom (user limit - 
 user consumed) can be > 0 even though the cluster is 100% full, because the 
 remaining space is being used by Application Masters from other users. 
 For instance, if you have a cluster with 1 queue, the user limit is 100%, and 
 multiple users are submitting applications. One very large application by user 1 
 starts up, runs most of its maps and starts running reducers. Other users try 
 to start applications and get their Application Masters started but no 
 tasks. The very large application then gets to the point where it has 
 consumed the rest of the cluster resources with all reduces. But at this 
 point it still needs to finish a few maps. The headroom being sent to this 
 application is only based on the user limit (which is 100% of the cluster 
 capacity); it is using, let's say, 95% of the cluster for reduces and the other 5% 
 is being used by other users running Application Masters. The MRAppMaster 
 thinks it still has 5%, so it doesn't know that it should kill a reduce in 
 order to run a map. 
 This can happen in other scenarios as well. Generally, in a large cluster with 
 multiple queues this shouldn't cause a hang forever, but it could cause the 
 application to take much longer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-913) Add a way to register long-lived services in a YARN cluster

2014-10-07 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-913:

Attachment: yarnregistry.pdf

TLA+ specification in sync with the (forthcoming) -019 patch

# uses consistent naming
# service record specified as allowing arbitrary {{String |-> String}} 
attributes; {{yarn:id}} and {{yarn:persistence}} are merely two of these
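
As a toy illustration of that model (this is not the ServiceRecord API in the patch, just a plain map standing in for it), yarn:id and yarn:persistence are ordinary entries in a String |-> String attribute map rather than special-cased fields:

{code}
import java.util.HashMap;
import java.util.Map;

public class ServiceRecordSketch {
  private final Map<String, String> attributes = new HashMap<>();

  public void set(String key, String value) {
    attributes.put(key, value);
  }

  public String get(String key, String defVal) {
    String v = attributes.get(key);
    return v == null ? defVal : v;
  }

  public static void main(String[] args) {
    ServiceRecordSketch record = new ServiceRecordSketch();
    record.set("yarn:id", "application_1407887030038_0001");
    record.set("yarn:persistence", "application");
    record.set("owner", "hbase");   // any other attribute is equally valid
    System.out.println(record.get("yarn:persistence", "permanent"));
  }
}
{code}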

 Add a way to register long-lived services in a YARN cluster
 ---

 Key: YARN-913
 URL: https://issues.apache.org/jira/browse/YARN-913
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: api, resourcemanager
Affects Versions: 2.5.0, 2.4.1
Reporter: Steve Loughran
Assignee: Steve Loughran
 Attachments: 2014-09-03_Proposed_YARN_Service_Registry.pdf, 
 2014-09-08_YARN_Service_Registry.pdf, RegistrationServiceDetails.txt, 
 YARN-913-001.patch, YARN-913-002.patch, YARN-913-003.patch, 
 YARN-913-003.patch, YARN-913-004.patch, YARN-913-006.patch, 
 YARN-913-007.patch, YARN-913-008.patch, YARN-913-009.patch, 
 YARN-913-010.patch, YARN-913-011.patch, YARN-913-012.patch, 
 YARN-913-013.patch, YARN-913-014.patch, YARN-913-015.patch, 
 YARN-913-016.patch, YARN-913-017.patch, YARN-913-018.patch, yarnregistry.pdf, 
 yarnregistry.pdf, yarnregistry.pdf, yarnregistry.tla


 In a YARN cluster you can't predict where services will come up -or on what 
 ports. The services need to work those things out as they come up and then 
 publish them somewhere.
 Applications need to be able to find the service instance they are to bond to 
 -and not any others in the cluster.
 Some kind of service registry -in the RM, in ZK, could do this. If the RM 
 held the write access to the ZK nodes, it would be more secure than having 
 apps register with ZK themselves.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running

2014-10-07 Thread Chen He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14162496#comment-14162496
 ] 

Chen He commented on YARN-1857:
---

Unit test failure is because of YARN-2400

 CapacityScheduler headroom doesn't account for other AM's running
 -

 Key: YARN-1857
 URL: https://issues.apache.org/jira/browse/YARN-1857
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Affects Versions: 2.3.0
Reporter: Thomas Graves
Assignee: Chen He
Priority: Critical
 Attachments: YARN-1857.1.patch, YARN-1857.2.patch, YARN-1857.3.patch, 
 YARN-1857.4.patch, YARN-1857.5.patch, YARN-1857.6.patch, YARN-1857.7.patch, 
 YARN-1857.patch, YARN-1857.patch, YARN-1857.patch


 It's possible to get an application to hang forever (or for a long time) in a 
 cluster with multiple users. The reason is that the headroom sent to the 
 application is based on the user limit, but it doesn't account for other 
 Application Masters using space in that queue. So the headroom (user limit - 
 user consumed) can be > 0 even though the cluster is 100% full, because the 
 remaining space is being used by Application Masters from other users. 
 For instance, if you have a cluster with 1 queue, the user limit is 100%, and 
 multiple users are submitting applications. One very large application by user 1 
 starts up, runs most of its maps and starts running reducers. Other users try 
 to start applications and get their Application Masters started but no 
 tasks. The very large application then gets to the point where it has 
 consumed the rest of the cluster resources with all reduces. But at this 
 point it still needs to finish a few maps. The headroom being sent to this 
 application is only based on the user limit (which is 100% of the cluster 
 capacity); it is using, let's say, 95% of the cluster for reduces and the other 5% 
 is being used by other users running Application Masters. The MRAppMaster 
 thinks it still has 5%, so it doesn't know that it should kill a reduce in 
 order to run a map. 
 This can happen in other scenarios as well. Generally, in a large cluster with 
 multiple queues this shouldn't cause a hang forever, but it could cause the 
 application to take much longer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running

2014-10-07 Thread Chen He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14162497#comment-14162497
 ] 

Chen He commented on YARN-1857:
---

Sorry, my bad, looks like YARN-2400 is checked in. Anyway, it is not related to 
this patch.

 CapacityScheduler headroom doesn't account for other AM's running
 -

 Key: YARN-1857
 URL: https://issues.apache.org/jira/browse/YARN-1857
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Affects Versions: 2.3.0
Reporter: Thomas Graves
Assignee: Chen He
Priority: Critical
 Attachments: YARN-1857.1.patch, YARN-1857.2.patch, YARN-1857.3.patch, 
 YARN-1857.4.patch, YARN-1857.5.patch, YARN-1857.6.patch, YARN-1857.7.patch, 
 YARN-1857.patch, YARN-1857.patch, YARN-1857.patch


 It's possible to get an application to hang forever (or for a long time) in a 
 cluster with multiple users. The reason is that the headroom sent to the 
 application is based on the user limit, but it doesn't account for other 
 Application Masters using space in that queue. So the headroom (user limit - 
 user consumed) can be > 0 even though the cluster is 100% full, because the 
 remaining space is being used by Application Masters from other users. 
 For instance, if you have a cluster with 1 queue, the user limit is 100%, and 
 multiple users are submitting applications. One very large application by user 1 
 starts up, runs most of its maps and starts running reducers. Other users try 
 to start applications and get their Application Masters started but no 
 tasks. The very large application then gets to the point where it has 
 consumed the rest of the cluster resources with all reduces. But at this 
 point it still needs to finish a few maps. The headroom being sent to this 
 application is only based on the user limit (which is 100% of the cluster 
 capacity); it is using, let's say, 95% of the cluster for reduces and the other 5% 
 is being used by other users running Application Masters. The MRAppMaster 
 thinks it still has 5%, so it doesn't know that it should kill a reduce in 
 order to run a map. 
 This can happen in other scenarios as well. Generally, in a large cluster with 
 multiple queues this shouldn't cause a hang forever, but it could cause the 
 application to take much longer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2320) Removing old application history store after we store the history data to timeline store

2014-10-07 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2320:
--
Attachment: YARN-2320.3.patch

Updated against the latest trunk.

 Removing old application history store after we store the history data to 
 timeline store
 

 Key: YARN-2320
 URL: https://issues.apache.org/jira/browse/YARN-2320
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-2320.1.patch, YARN-2320.2.patch, YARN-2320.3.patch


 After YARN-2033, we should deprecate the application history store set. There's 
 no need to maintain two sets of store interfaces. In addition, we should 
 conclude the outstanding JIRAs under YARN-321 about the application history 
 store.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running

2014-10-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14162510#comment-14162510
 ] 

Hudson commented on YARN-1857:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6206 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6206/])
YARN-1857. CapacityScheduler headroom doesn't account for other AM's running. 
Contributed by Chen He and Craig Welch (jianhe: rev 
30d56fdbb40d06c4e267d6c314c8c767a7adc6a3)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java


 CapacityScheduler headroom doesn't account for other AM's running
 -

 Key: YARN-1857
 URL: https://issues.apache.org/jira/browse/YARN-1857
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Affects Versions: 2.3.0
Reporter: Thomas Graves
Assignee: Chen He
Priority: Critical
 Fix For: 2.6.0

 Attachments: YARN-1857.1.patch, YARN-1857.2.patch, YARN-1857.3.patch, 
 YARN-1857.4.patch, YARN-1857.5.patch, YARN-1857.6.patch, YARN-1857.7.patch, 
 YARN-1857.patch, YARN-1857.patch, YARN-1857.patch


 It's possible to get an application to hang forever (or for a long time) in a 
 cluster with multiple users. The reason is that the headroom sent to the 
 application is based on the user limit, but it doesn't account for other 
 Application Masters using space in that queue. So the headroom (user limit - 
 user consumed) can be > 0 even though the cluster is 100% full, because the 
 remaining space is being used by Application Masters from other users. 
 For instance, if you have a cluster with 1 queue, the user limit is 100%, and 
 multiple users are submitting applications. One very large application by user 1 
 starts up, runs most of its maps and starts running reducers. Other users try 
 to start applications and get their Application Masters started but no 
 tasks. The very large application then gets to the point where it has 
 consumed the rest of the cluster resources with all reduces. But at this 
 point it still needs to finish a few maps. The headroom being sent to this 
 application is only based on the user limit (which is 100% of the cluster 
 capacity); it is using, let's say, 95% of the cluster for reduces and the other 5% 
 is being used by other users running Application Masters. The MRAppMaster 
 thinks it still has 5%, so it doesn't know that it should kill a reduce in 
 order to run a map. 
 This can happen in other scenarios as well. Generally, in a large cluster with 
 multiple queues this shouldn't cause a hang forever, but it could cause the 
 application to take much longer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-913) Add a way to register long-lived services in a YARN cluster

2014-10-07 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-913:

Attachment: YARN-913-019.patch

This is a patch which addresses Sanjay's review comments, with the key changes 
being:

* {{ServiceRecord}} supports arbitrary string attributes; the yarn ID and 
persistence are simply two of these.
* moved zk-related classes under {{/impl/zk}} package
* TLA+ spec in sync with new design

I also 
* moved the packaging from {{org.apache.hadoop.yarn.registry}} to  
{{org.apache.hadoop.registry}}. That would reduce the impact of any promotion 
of the registry into hadoop-common if ever desired.
* moved the registry/site documentation under yarn-site and hooked up with 
existing site docs.
* added the TLA toolbox generated artifacts (pdf and a toolbox dir) to 
{{/.gitignore}}


 Add a way to register long-lived services in a YARN cluster
 ---

 Key: YARN-913
 URL: https://issues.apache.org/jira/browse/YARN-913
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: api, resourcemanager
Affects Versions: 2.5.0, 2.4.1
Reporter: Steve Loughran
Assignee: Steve Loughran
 Attachments: 2014-09-03_Proposed_YARN_Service_Registry.pdf, 
 2014-09-08_YARN_Service_Registry.pdf, RegistrationServiceDetails.txt, 
 YARN-913-001.patch, YARN-913-002.patch, YARN-913-003.patch, 
 YARN-913-003.patch, YARN-913-004.patch, YARN-913-006.patch, 
 YARN-913-007.patch, YARN-913-008.patch, YARN-913-009.patch, 
 YARN-913-010.patch, YARN-913-011.patch, YARN-913-012.patch, 
 YARN-913-013.patch, YARN-913-014.patch, YARN-913-015.patch, 
 YARN-913-016.patch, YARN-913-017.patch, YARN-913-018.patch, 
 YARN-913-019.patch, yarnregistry.pdf, yarnregistry.pdf, yarnregistry.pdf, 
 yarnregistry.tla


 In a YARN cluster you can't predict where services will come up -or on what 
 ports. The services need to work those things out as they come up and then 
 publish them somewhere.
 Applications need to be able to find the service instance they are to bond to 
 -and not any others in the cluster.
 Some kind of service registry -in the RM, in ZK, could do this. If the RM 
 held the write access to the ZK nodes, it would be more secure than having 
 apps register with ZK themselves.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2483) TestAMRestart#testShouldNotCountFailureToMaxAttemptRetry fails due to incorrect AppAttempt state

2014-10-07 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14162530#comment-14162530
 ] 

Jian He commented on YARN-2483:
---

YARN-2649 may solve this

 TestAMRestart#testShouldNotCountFailureToMaxAttemptRetry fails due to 
 incorrect AppAttempt state
 

 Key: YARN-2483
 URL: https://issues.apache.org/jira/browse/YARN-2483
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Ted Yu

 From https://builds.apache.org/job/Hadoop-Yarn-trunk/665/console :
 {code}
 testShouldNotCountFailureToMaxAttemptRetry(org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart)
   Time elapsed: 49.686 sec   FAILURE!
 java.lang.AssertionError: AppAttempt state is not correct (timedout) 
 expected:ALLOCATED but was:SCHEDULED
   at org.junit.Assert.fail(Assert.java:88)
   at org.junit.Assert.failNotEquals(Assert.java:743)
   at org.junit.Assert.assertEquals(Assert.java:118)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.MockAM.waitForState(MockAM.java:84)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.MockRM.sendAMLaunched(MockRM.java:417)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.MockRM.launchAM(MockRM.java:582)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.MockRM.launchAndRegisterAM(MockRM.java:589)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.MockRM.waitForNewAMToLaunchAndRegister(MockRM.java:182)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart.testShouldNotCountFailureToMaxAttemptRetry(TestAMRestart.java:402)
 {code}
 TestApplicationMasterLauncher#testallocateBeforeAMRegistration fails with 
 similar cause.
 These tests failed in build #664 as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2320) Removing old application history store after we store the history data to timeline store

2014-10-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14162640#comment-14162640
 ] 

Hadoop QA commented on YARN-2320:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673422/YARN-2320.3.patch
  against trunk revision 30d56fd.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 24 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5314//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5314//console

This message is automatically generated.

 Removing old application history store after we store the history data to 
 timeline store
 

 Key: YARN-2320
 URL: https://issues.apache.org/jira/browse/YARN-2320
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-2320.1.patch, YARN-2320.2.patch, YARN-2320.3.patch


 After YARN-2033, we should deprecate the application history store set. There's 
 no need to maintain two sets of store interfaces. In addition, we should 
 conclude the outstanding JIRAs under YARN-321 about the application history 
 store.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2641) improve node decommission latency in RM.

2014-10-07 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-2641:

Attachment: YARN-2641.002.patch

 improve node decommission latency in RM.
 

 Key: YARN-2641
 URL: https://issues.apache.org/jira/browse/YARN-2641
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.5.0
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: YARN-2641.000.patch, YARN-2641.001.patch, 
 YARN-2641.002.patch


 improve node decommission latency in RM. 
 Currently the node decommission only happens after the RM receives a nodeHeartbeat 
 from the Node Manager. The node heartbeat interval is configurable; the 
 default value is 1 second.
 It would be better to do the decommission during the RM refresh (NodesListManager) 
 instead of in nodeHeartbeat (ResourceTrackerService).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2629) Make distributed shell use the domain-based timeline ACLs

2014-10-07 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14162665#comment-14162665
 ] 

Li Lu commented on YARN-2629:
-

Hi [~zjshen], I reviewed this patch (YARN-2629.3.patch), and it generally LGTM. 
However, I do have a few quick comments:

1. I noticed that you've changed a few method signatures, and removed the 
throws IOException, YarnException clause from the signatures. Meanwhile, in 
the method I noticed that you're catching all types of exceptions in one catch 
statement. I agree that this change saves some repeated lines of try-catch 
statements in various call sites, but is it OK to catch all exceptions internally? 
(I think so, but I would like to verify that.)

2. Maybe we want to add one more test with no arguments on 
domain/view_acl/modify_acl? Timeline domain is a newly added feature, and maybe 
we'd like to make sure it would not break any existing distributed shell usages 
(esp. in some scripts)? 

3. In Client.java:
{code}
private boolean toCreate = false;
{code}
This variable name looks unclear. Maybe we want to rename it so that people can 
easily connect it with timeline domains?
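
On point 1, a small sketch of the trade-off being asked about; the method names are made up and this is not the distributed shell code. The old shape propagates failures to call sites, while the new shape logs and swallows them, so callers cannot react to a failed domain creation.

{code}
import java.io.IOException;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.yarn.exceptions.YarnException;

public class DomainSetupSketch {
  private static final Log LOG = LogFactory.getLog(DomainSetupSketch.class);

  // Before: callers must handle (or propagate) the exceptions themselves.
  void prepareTimelineDomainThrowing() throws IOException, YarnException {
    // ... create the domain via the timeline client ...
  }

  // After: failures are caught internally, so call sites stay simple, but a
  // domain-creation failure is reduced to a warning the caller cannot see.
  void prepareTimelineDomainCatching() {
    try {
      prepareTimelineDomainThrowing();
    } catch (IOException | YarnException e) {
      LOG.warn("Failed to create the timeline domain", e);
    }
  }
}
{code}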

 Make distributed shell use the domain-based timeline ACLs
 -

 Key: YARN-2629
 URL: https://issues.apache.org/jira/browse/YARN-2629
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-2629.1.patch, YARN-2629.2.patch, YARN-2629.3.patch


 For demonstration the usage of this feature (YARN-2102), it's good to make 
 the distributed shell create the domain, and post its timeline entities into 
 this private space.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-913) Add a way to register long-lived services in a YARN cluster

2014-10-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14162683#comment-14162683
 ] 

Hadoop QA commented on YARN-913:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673424/YARN-913-019.patch
  against trunk revision 30d56fd.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 37 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 1268 javac 
compiler warnings (more than the trunk's current 1267 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-registry 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5315//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5315//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-registry.html
Javac warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5315//artifact/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5315//console

This message is automatically generated.

 Add a way to register long-lived services in a YARN cluster
 ---

 Key: YARN-913
 URL: https://issues.apache.org/jira/browse/YARN-913
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: api, resourcemanager
Affects Versions: 2.5.0, 2.4.1
Reporter: Steve Loughran
Assignee: Steve Loughran
 Attachments: 2014-09-03_Proposed_YARN_Service_Registry.pdf, 
 2014-09-08_YARN_Service_Registry.pdf, RegistrationServiceDetails.txt, 
 YARN-913-001.patch, YARN-913-002.patch, YARN-913-003.patch, 
 YARN-913-003.patch, YARN-913-004.patch, YARN-913-006.patch, 
 YARN-913-007.patch, YARN-913-008.patch, YARN-913-009.patch, 
 YARN-913-010.patch, YARN-913-011.patch, YARN-913-012.patch, 
 YARN-913-013.patch, YARN-913-014.patch, YARN-913-015.patch, 
 YARN-913-016.patch, YARN-913-017.patch, YARN-913-018.patch, 
 YARN-913-019.patch, yarnregistry.pdf, yarnregistry.pdf, yarnregistry.pdf, 
 yarnregistry.tla


 In a YARN cluster you can't predict where services will come up -or on what 
 ports. The services need to work those things out as they come up and then 
 publish them somewhere.
 Applications need to be able to find the service instance they are to bond to 
 -and not any others in the cluster.
 Some kind of service registry -in the RM, in ZK, could do this. If the RM 
 held the write access to the ZK nodes, it would be more secure than having 
 apps register with ZK themselves.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2641) improve node decommission latency in RM.

2014-10-07 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-2641:

Attachment: (was: YARN-2641.002.patch)

 improve node decommission latency in RM.
 

 Key: YARN-2641
 URL: https://issues.apache.org/jira/browse/YARN-2641
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.5.0
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: YARN-2641.000.patch, YARN-2641.001.patch


 improve node decommission latency in RM. 
 Currently the node decommission only happens after the RM receives a nodeHeartbeat 
 from the Node Manager. The node heartbeat interval is configurable; the 
 default value is 1 second.
 It would be better to do the decommission during the RM refresh (NodesListManager) 
 instead of in nodeHeartbeat (ResourceTrackerService).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-913) Add a way to register long-lived services in a YARN cluster

2014-10-07 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-913:

Attachment: YARN-913-020.patch

patch -020; deleted unused operation/assignment in 
{{RegistryUtils.statChildren()}}

 Add a way to register long-lived services in a YARN cluster
 ---

 Key: YARN-913
 URL: https://issues.apache.org/jira/browse/YARN-913
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: api, resourcemanager
Affects Versions: 2.5.0, 2.4.1
Reporter: Steve Loughran
Assignee: Steve Loughran
 Attachments: 2014-09-03_Proposed_YARN_Service_Registry.pdf, 
 2014-09-08_YARN_Service_Registry.pdf, RegistrationServiceDetails.txt, 
 YARN-913-001.patch, YARN-913-002.patch, YARN-913-003.patch, 
 YARN-913-003.patch, YARN-913-004.patch, YARN-913-006.patch, 
 YARN-913-007.patch, YARN-913-008.patch, YARN-913-009.patch, 
 YARN-913-010.patch, YARN-913-011.patch, YARN-913-012.patch, 
 YARN-913-013.patch, YARN-913-014.patch, YARN-913-015.patch, 
 YARN-913-016.patch, YARN-913-017.patch, YARN-913-018.patch, 
 YARN-913-019.patch, YARN-913-020.patch, yarnregistry.pdf, yarnregistry.pdf, 
 yarnregistry.pdf, yarnregistry.tla


 In a YARN cluster you can't predict where services will come up -or on what 
 ports. The services need to work those things out as they come up and then 
 publish them somewhere.
 Applications need to be able to find the service instance they are to bond to 
 -and not any others in the cluster.
 Some kind of service registry -in the RM, in ZK, could do this. If the RM 
 held the write access to the ZK nodes, it would be more secure than having 
 apps register with ZK themselves.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2641) improve node decommission latency in RM.

2014-10-07 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14162740#comment-14162740
 ] 

zhihai xu commented on YARN-2641:
-

Hi [~djp], thanks for reviewing the patch. I removed the following RMNode 
decommission in nodeHeartbeat(ResourceTrackerService.java).

{code}
   this.rmContext.getDispatcher().getEventHandler().handle(
  new RMNodeEvent(nodeId, RMNodeEventType.DECOMMISSION));
{code}

I added RMNode decommission in refreshNodes(NodesListManager.java).

Do you still see the decommission happening after the heartbeat goes back to the 
NM with this patch?

I didn't have a unit test in my first patch (YARN-2641.000.patch).

In my second patch (YARN-2641.001.patch), I changed the unit test in 
TestResourceTrackerService to verify that the RMNodeEventType.DECOMMISSION is sent 
in {code}rm.getNodesListManager().refreshNodes(conf);{code} instead of
{code}nodeHeartbeat = nm1.nodeHeartbeat(true); {code}
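
For readers skimming the thread, a rough sketch of the approach described above (not the actual patch; the method name and the way the exclude list is obtained are illustrative): on refreshNodes, dispatch the same DECOMMISSION event the heartbeat path used to send, for every running node that is now excluded.

{code}
import java.util.Set;

import org.apache.hadoop.yarn.api.records.NodeId;
import org.apache.hadoop.yarn.server.resourcemanager.RMContext;
import org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNode;
import org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeEvent;
import org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeEventType;

public class RefreshDecommissionSketch {
  static void decommissionExcludedNodes(RMContext rmContext,
      Set<String> excludedHosts) {
    for (RMNode node : rmContext.getRMNodes().values()) {
      if (excludedHosts.contains(node.getHostName())) {
        NodeId nodeId = node.getNodeID();
        // Same event the old heartbeat path used, just sent at refresh time.
        rmContext.getDispatcher().getEventHandler()
            .handle(new RMNodeEvent(nodeId, RMNodeEventType.DECOMMISSION));
      }
    }
  }
}
{code}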


 improve node decommission latency in RM.
 

 Key: YARN-2641
 URL: https://issues.apache.org/jira/browse/YARN-2641
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.5.0
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: YARN-2641.000.patch, YARN-2641.001.patch


 improve node decommission latency in RM. 
 Currently the node decommission only happens after the RM receives a nodeHeartbeat 
 from the Node Manager. The node heartbeat interval is configurable; the 
 default value is 1 second.
 It would be better to do the decommission during the RM refresh (NodesListManager) 
 instead of in nodeHeartbeat (ResourceTrackerService).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2641) improve node decommission latency in RM.

2014-10-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14162764#comment-14162764
 ] 

Hadoop QA commented on YARN-2641:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673446/YARN-2641.002.patch
  against trunk revision 9b8a35a.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5316//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5316//console

This message is automatically generated.

 improve node decommission latency in RM.
 

 Key: YARN-2641
 URL: https://issues.apache.org/jira/browse/YARN-2641
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.5.0
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: YARN-2641.000.patch, YARN-2641.001.patch


 improve node decommission latency in RM. 
 Currently the node decommission only happened after RM received nodeHeartbeat 
 from the Node Manager. The node heartbeat interval is configurable. The 
 default value is 1 second.
 It will be better to do the decommission during RM Refresh(NodesListManager) 
 instead of nodeHeartbeat(ResourceTrackerService).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2641) improve node decommission latency in RM.

2014-10-07 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-2641:

Description: 
improve node decommission latency in RM. 
Currently the node decommission only happened after RM received nodeHeartbeat 
from the Node Manager. The node heartbeat interval is configurable. The default 
value is 1 second.
It will be better to do the decommission during RM Refresh(NodesListManager) 
instead of nodeHeartbeat(ResourceTrackerService).
This can become a much more serious issue:
After the RM is refreshed (refreshNodes), if the NM to be decommissioned is 
killed before it sends a heartbeat to the RM, the RMNode will never be 
decommissioned in the RM. The RMNode will only expire in the RM after 
yarn.nm.liveness-monitor.expiry-interval-ms (default value 10 minutes).

  was:
improve node decommission latency in RM. 
Currently the node decommission only happened after RM received nodeHeartbeat 
from the Node Manager. The node heartbeat interval is configurable. The default 
value is 1 second.
It will be better to do the decommission during RM Refresh(NodesListManager) 
instead of nodeHeartbeat(ResourceTrackerService).


 improve node decommission latency in RM.
 

 Key: YARN-2641
 URL: https://issues.apache.org/jira/browse/YARN-2641
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.5.0
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: YARN-2641.000.patch, YARN-2641.001.patch


 improve node decommission latency in RM. 
 Currently the node decommission only happened after RM received nodeHeartbeat 
 from the Node Manager. The node heartbeat interval is configurable. The 
 default value is 1 second.
 It will be better to do the decommission during RM Refresh(NodesListManager) 
 instead of nodeHeartbeat(ResourceTrackerService).
 This can become a much more serious issue:
 After the RM is refreshed (refreshNodes), if the NM to be decommissioned is 
 killed before it sends a heartbeat to the RM, the RMNode will never be 
 decommissioned in the RM. The RMNode will only expire in the RM after 
 yarn.nm.liveness-monitor.expiry-interval-ms (default value 10 minutes).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2629) Make distributed shell use the domain-based timeline ACLs

2014-10-07 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14162804#comment-14162804
 ] 

Zhijie Shen commented on YARN-2629:
---

bq. 1. I noticed that you've changed a few method signatures, and removed the 
throws IOException, YarnException clause from the signatures. Meanwhile, in 
the method I noticed that you're catching all types of exceptions in one catch 
statement. I agree that this change saves some repeated lines of try-catch 
statements in various call sites, but is it OK to catch all exceptions 
internally?

Previously, the exception was handled at the caller of publish(), but I moved 
the handling into publish() to simplify the change, so that there's no need to 
throw the exception any more. Usually it's good practice to handle each 
potential exception explicitly, but since I intentionally don't want such 
fine-grained handling here, I chose to log whatever the exception is.
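
As a loose illustration of that pattern (catching and logging inside publish() 
rather than re-throwing), with placeholder names that are not taken from the 
actual patch:
{code}
private void publishEvent(TimelineEntity entity) {
  try {
    timelineClient.putEntities(entity);
  } catch (Exception e) {
    // Intentionally coarse-grained: log whatever went wrong instead of
    // propagating, so callers no longer need their own try-catch blocks.
    LOG.error("Error when publishing timeline entity "
        + entity.getEntityId(), e);
  }
}
{code}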

bq. 2. Maybe we want to add one more test with no arguments on 
domain/view_acl/modify_acl? Timeline domain is a newly added feature, and maybe 
we'd like to make sure it would not break any existing distributed shell usages 
(esp. in some scripts)?

Added one

bq. This variable name looks unclear. Maybe we want to rename it so that people 
can easily connect it with timeline domains?

It's explained in the usage of the option and has been commented:
{code}
  // Flag to indicate whether to create the domain of the given ID
  private boolean toCreate = false;
{code}

but let me elaborate on this variable in the code.

 Make distributed shell use the domain-based timeline ACLs
 -

 Key: YARN-2629
 URL: https://issues.apache.org/jira/browse/YARN-2629
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-2629.1.patch, YARN-2629.2.patch, YARN-2629.3.patch


 For demonstration the usage of this feature (YARN-2102), it's good to make 
 the distributed shell create the domain, and post its timeline entities into 
 this private space.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2629) Make distributed shell use the domain-based timeline ACLs

2014-10-07 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2629:
--
Attachment: YARN-2629.4.patch

 Make distributed shell use the domain-based timeline ACLs
 -

 Key: YARN-2629
 URL: https://issues.apache.org/jira/browse/YARN-2629
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-2629.1.patch, YARN-2629.2.patch, YARN-2629.3.patch, 
 YARN-2629.4.patch


 For demonstration the usage of this feature (YARN-2102), it's good to make 
 the distributed shell create the domain, and post its timeline entities into 
 this private space.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2629) Make distributed shell use the domain-based timeline ACLs

2014-10-07 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14162812#comment-14162812
 ] 

Li Lu commented on YARN-2629:
-

OK, new patch LGTM. Maybe some committers would like to check it again and 
commit it? 

 Make distributed shell use the domain-based timeline ACLs
 -

 Key: YARN-2629
 URL: https://issues.apache.org/jira/browse/YARN-2629
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-2629.1.patch, YARN-2629.2.patch, YARN-2629.3.patch, 
 YARN-2629.4.patch


 For demonstration the usage of this feature (YARN-2102), it's good to make 
 the distributed shell create the domain, and post its timeline entities into 
 this private space.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2423) TimelineClient should wrap all GET APIs to facilitate Java users

2014-10-07 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14162816#comment-14162816
 ] 

Zhijie Shen commented on YARN-2423:
---

Updated the title and description, as it's good to do the work for the domain 
APIs too.

 TimelineClient should wrap all GET APIs to facilitate Java users
 

 Key: YARN-2423
 URL: https://issues.apache.org/jira/browse/YARN-2423
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen

 TimelineClient provides the Java method to put timeline entities. It's also 
 good to wrap over all GET APIs (both entity and domain), and deserialize the 
 json response into Java POJO objects.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2423) TimelineClient should wrap all GET APIs to facilitate Java users

2014-10-07 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2423:
--
Description: TimelineClient provides the Java method to put timeline 
entities. It's also good to wrap over all GET APIs (both entity and domain), 
and deserialize the json response into Java POJO objects.  (was: TimelineClient 
provides the Java method to put timeline entities. It's also good to wrap over 
three GET APIs, and deserialize the json response into Java POJO objects.)
Summary: TimelineClient should wrap all GET APIs to facilitate Java 
users  (was: TimelineClient should wrap the three GET APIs to facilitate Java 
users)

 TimelineClient should wrap all GET APIs to facilitate Java users
 

 Key: YARN-2423
 URL: https://issues.apache.org/jira/browse/YARN-2423
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen

 TimelineClient provides the Java method to put timeline entities. It's also 
 good to wrap over all GET APIs (both entity and domain), and deserialize the 
 json response into Java POJO objects.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2517) Implement TimelineClientAsync

2014-10-07 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14162820#comment-14162820
 ] 

Zhijie Shen commented on YARN-2517:
---

bq. Clearly, async calls need call-back handlers just for errors. As of today, 
there are no APIs that really need to send back results (not error) 
asynchronously. 

I realize that another requirement of TimelineClient is to wrap the read APIs 
to facilitate Java app developers. These APIs are going to return results.
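
As a rough illustration of why result-carrying callbacks would be needed once 
the read APIs are wrapped, a hypothetical callback shape (not an agreed API):
{code}
// Hypothetical: an async read call would hand results back through a
// callback rather than only reporting errors.
public interface TimelineReaderCallback {
  void onEntitiesFetched(TimelineEntities entities);
  void onError(Throwable cause);
}
{code}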

 Implement TimelineClientAsync
 -

 Key: YARN-2517
 URL: https://issues.apache.org/jira/browse/YARN-2517
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-2517.1.patch


 In some scenarios, we'd like to put timeline entities in another thread so as 
 not to block the current one.
 It's good to have a TimelineClientAsync like AMRMClientAsync and 
 NMClientAsync. It can buffer entities, put them in a separate thread, and 
 have callback to handle the responses.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-913) Add a way to register long-lived services in a YARN cluster

2014-10-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14162862#comment-14162862
 ] 

Hadoop QA commented on YARN-913:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673460/YARN-913-020.patch
  against trunk revision 9b8a35a.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 37 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 1268 javac 
compiler warnings (more than the trunk's current 1267 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The test build failed in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests 

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5317//testReport/
Javac warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5317//artifact/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5317//console

This message is automatically generated.

 Add a way to register long-lived services in a YARN cluster
 ---

 Key: YARN-913
 URL: https://issues.apache.org/jira/browse/YARN-913
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: api, resourcemanager
Affects Versions: 2.5.0, 2.4.1
Reporter: Steve Loughran
Assignee: Steve Loughran
 Attachments: 2014-09-03_Proposed_YARN_Service_Registry.pdf, 
 2014-09-08_YARN_Service_Registry.pdf, RegistrationServiceDetails.txt, 
 YARN-913-001.patch, YARN-913-002.patch, YARN-913-003.patch, 
 YARN-913-003.patch, YARN-913-004.patch, YARN-913-006.patch, 
 YARN-913-007.patch, YARN-913-008.patch, YARN-913-009.patch, 
 YARN-913-010.patch, YARN-913-011.patch, YARN-913-012.patch, 
 YARN-913-013.patch, YARN-913-014.patch, YARN-913-015.patch, 
 YARN-913-016.patch, YARN-913-017.patch, YARN-913-018.patch, 
 YARN-913-019.patch, YARN-913-020.patch, YARN-913-021.patch, yarnregistry.pdf, 
 yarnregistry.pdf, yarnregistry.pdf, yarnregistry.tla


 In a YARN cluster you can't predict where services will come up -or on what 
 ports. The services need to work those things out as they come up and then 
 publish them somewhere.
 Applications need to be able to find the service instance they are to bond to 
 -and not any others in the cluster.
 Some kind of service registry -in the RM, in ZK, could do this. If the RM 
 held the write access to the ZK nodes, it would be more secure than having 
 apps register with ZK themselves.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2629) Make distributed shell use the domain-based timeline ACLs

2014-10-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14162865#comment-14162865
 ] 

Hadoop QA commented on YARN-2629:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673479/YARN-2629.4.patch
  against trunk revision 9b8a35a.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5318//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5318//console

This message is automatically generated.

 Make distributed shell use the domain-based timeline ACLs
 -

 Key: YARN-2629
 URL: https://issues.apache.org/jira/browse/YARN-2629
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-2629.1.patch, YARN-2629.2.patch, YARN-2629.3.patch, 
 YARN-2629.4.patch


 For demonstration the usage of this feature (YARN-2102), it's good to make 
 the distributed shell create the domain, and post its timeline entities into 
 this private space.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol for RM fail over

2014-10-07 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14162875#comment-14162875
 ] 

Karthik Kambatla commented on YARN-1879:


Thanks for the updates, Tsuyoshi. Sorry to vacillate on this JIRA. 

Chatted with Jian offline about marking the methods in question Idempotent vs 
AtMostOnce. I believe we agreed on the following: methods that identify 
duplicate requests and ignore them as duplicates should be AtMostOnce, and 
those that can repeat without any adverse side-effects should be Idempotent. 
Retries on RM restart/failover are the same for Idempotent and AtMostOnce 
methods.

Following that, registerApplicationMaster should be Idempotent, while allocate 
and finishApplicationMaster should be AtMostOnce. Given that all the methods 
handle duplicate requests, the retry cache is not necessary, but it could be an 
optimization we can pursue/investigate in another JIRA.
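
To make the agreed split concrete, a hedged sketch of how the retry annotations 
from org.apache.hadoop.io.retry could be applied to ApplicationMasterProtocol 
(the actual patch may differ):
{code}
@Idempotent
RegisterApplicationMasterResponse registerApplicationMaster(
    RegisterApplicationMasterRequest request) throws YarnException, IOException;

@AtMostOnce
AllocateResponse allocate(AllocateRequest request)
    throws YarnException, IOException;

@AtMostOnce
FinishApplicationMasterResponse finishApplicationMaster(
    FinishApplicationMasterRequest request) throws YarnException, IOException;
{code}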

Review comments on the patch:
# Nit: not added in this patch, but can we rename 
TestApplicationMasterServiceProtocolOnHA#initiate to initialize()?
# IIUC, the tests (ProtocolHATestBase) induce a failover while processing one 
of these requests. We should also probably add a test that makes duplicate 
requests to the same/different RM and verify the behavior is as expected. 
Correct me if existing tests already do this.

 Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol for RM 
 fail over
 

 Key: YARN-1879
 URL: https://issues.apache.org/jira/browse/YARN-1879
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Jian He
Assignee: Tsuyoshi OZAWA
Priority: Critical
 Attachments: YARN-1879.1.patch, YARN-1879.1.patch, 
 YARN-1879.11.patch, YARN-1879.12.patch, YARN-1879.13.patch, 
 YARN-1879.14.patch, YARN-1879.15.patch, YARN-1879.16.patch, 
 YARN-1879.17.patch, YARN-1879.18.patch, YARN-1879.19.patch, 
 YARN-1879.2-wip.patch, YARN-1879.2.patch, YARN-1879.20.patch, 
 YARN-1879.21.patch, YARN-1879.22.patch, YARN-1879.23.patch, 
 YARN-1879.23.patch, YARN-1879.3.patch, YARN-1879.4.patch, YARN-1879.5.patch, 
 YARN-1879.6.patch, YARN-1879.7.patch, YARN-1879.8.patch, YARN-1879.9.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-913) Add a way to register long-lived services in a YARN cluster

2014-10-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14162908#comment-14162908
 ] 

Hadoop QA commented on YARN-913:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673478/YARN-913-021.patch
  against trunk revision 9b8a35a.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 37 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 1268 javac 
compiler warnings (more than the trunk's current 1267 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-registry 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5319//testReport/
Javac warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5319//artifact/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5319//console

This message is automatically generated.

 Add a way to register long-lived services in a YARN cluster
 ---

 Key: YARN-913
 URL: https://issues.apache.org/jira/browse/YARN-913
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: api, resourcemanager
Affects Versions: 2.5.0, 2.4.1
Reporter: Steve Loughran
Assignee: Steve Loughran
 Attachments: 2014-09-03_Proposed_YARN_Service_Registry.pdf, 
 2014-09-08_YARN_Service_Registry.pdf, RegistrationServiceDetails.txt, 
 YARN-913-001.patch, YARN-913-002.patch, YARN-913-003.patch, 
 YARN-913-003.patch, YARN-913-004.patch, YARN-913-006.patch, 
 YARN-913-007.patch, YARN-913-008.patch, YARN-913-009.patch, 
 YARN-913-010.patch, YARN-913-011.patch, YARN-913-012.patch, 
 YARN-913-013.patch, YARN-913-014.patch, YARN-913-015.patch, 
 YARN-913-016.patch, YARN-913-017.patch, YARN-913-018.patch, 
 YARN-913-019.patch, YARN-913-020.patch, YARN-913-021.patch, yarnregistry.pdf, 
 yarnregistry.pdf, yarnregistry.pdf, yarnregistry.tla


 In a YARN cluster you can't predict where services will come up -or on what 
 ports. The services need to work those things out as they come up and then 
 publish them somewhere.
 Applications need to be able to find the service instance they are to bond to 
 -and not any others in the cluster.
 Some kind of service registry -in the RM, in ZK, could do this. If the RM 
 held the write access to the ZK nodes, it would be more secure than having 
 apps register with ZK themselves.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2331) Distinguish shutdown during supervision vs. shutdown for rolling upgrade

2014-10-07 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14162917#comment-14162917
 ] 

Junping Du commented on YARN-2331:
--

Thanks, [~jlowe], for the patch. One thing I want to confirm here: after this 
patch, if we set yarn.nodemanager.recovery.enabled to true but set 
yarn.nodemanager.recovery.supervised to false, containers will keep running if 
we kill the NM daemon with kill -9, but stopping it via yarn-daemon.sh stop 
nodemanager will kill the running containers. Is that right?
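
For clarity, the configuration combination being asked about, sketched in code; 
note that yarn.nodemanager.recovery.supervised is introduced by this patch, so 
the key name here is only an assumption:
{code}
Configuration conf = new YarnConfiguration();
// NM work-preserving recovery is on ...
conf.setBoolean(YarnConfiguration.NM_RECOVERY_ENABLED, true);
// ... but the NM is not declared to be running under supervision
// (assumed key name from the patch under review).
conf.setBoolean("yarn.nodemanager.recovery.supervised", false);
{code}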

 Distinguish shutdown during supervision vs. shutdown for rolling upgrade
 

 Key: YARN-2331
 URL: https://issues.apache.org/jira/browse/YARN-2331
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.6.0
Reporter: Jason Lowe
Assignee: Jason Lowe
 Attachments: YARN-2331.patch, YARN-2331v2.patch


 When the NM is shutting down with restart support enabled there are scenarios 
 we'd like to distinguish and behave accordingly:
 # The NM is running under supervision.  In that case containers should be 
 preserved so the automatic restart can recover them.
 # The NM is not running under supervision and a rolling upgrade is not being 
 performed.  In that case the shutdown should kill all containers since it is 
 unlikely the NM will be restarted in a timely manner to recover them.
 # The NM is not running under supervision and a rolling upgrade is being 
 performed.  In that case the shutdown should not kill all containers since a 
 restart is imminent due to the rolling upgrade and the containers will be 
 recovered.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2654) Revisit all shared cache config parameters to ensure quality names

2014-10-07 Thread Chris Trezzo (JIRA)
Chris Trezzo created YARN-2654:
--

 Summary: Revisit all shared cache config parameters to ensure 
quality names
 Key: YARN-2654
 URL: https://issues.apache.org/jira/browse/YARN-2654
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Chris Trezzo
Priority: Blocker


Revisit all the shared cache config parameters in YarnConfiguration and 
yarn-default.xml to ensure quality names.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2180) In-memory backing store for cache manager

2014-10-07 Thread Chris Trezzo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo updated YARN-2180:
---
Attachment: YARN-2180-trunk-v7.patch

[~kasha]

Attached v7. Changes:
1. Moved AppChecker parameter to SCM instead of store and changed variable 
names.
2. Moved createAppCheckerService method to a static method in 
SharedCacheManager. Originally I moved it to the shared cache util class, but 
that is in yarn-server-common which required moving AppChecker as well. I only 
see AppChecker being used in the context of the SCM so I kept it where it was. 
Let me know if you would still like me to move it to common and make the create 
method part of the util class.
3. Renamed SharedCacheStructureUtil to SharedCacheUtil.
4. Filed YARN-2654 subtask to revisit config names.

 In-memory backing store for cache manager
 -

 Key: YARN-2180
 URL: https://issues.apache.org/jira/browse/YARN-2180
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Chris Trezzo
Assignee: Chris Trezzo
 Attachments: YARN-2180-trunk-v1.patch, YARN-2180-trunk-v2.patch, 
 YARN-2180-trunk-v3.patch, YARN-2180-trunk-v4.patch, YARN-2180-trunk-v5.patch, 
 YARN-2180-trunk-v6.patch, YARN-2180-trunk-v7.patch


 Implement an in-memory backing store for the cache manager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2579) Both RM's state is Active , but 1 RM is not really active.

2014-10-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14163057#comment-14163057
 ] 

Hadoop QA commented on YARN-2579:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673338/YARN-2579.patch
  against trunk revision 1efd9c9.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5320//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5320//console

This message is automatically generated.

 Both RM's state is Active , but 1 RM is not really active.
 --

 Key: YARN-2579
 URL: https://issues.apache.org/jira/browse/YARN-2579
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.5.1
Reporter: Rohith
Assignee: Rohith
 Attachments: YARN-2579.patch


 I encountered a situation where both RMs' web pages were accessible and 
 their state was displayed as Active, but one RM's ActiveServices were 
 stopped.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2655) AllocatedGB/AvailableGB in nodemanager JMX showing only integer values

2014-10-07 Thread Nishan Shetty (JIRA)
Nishan Shetty created YARN-2655:
---

 Summary: AllocatedGB/AvailableGB in nodemanager JMX showing only 
integer values
 Key: YARN-2655
 URL: https://issues.apache.org/jira/browse/YARN-2655
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.4.1
Reporter: Nishan Shetty
Priority: Minor


AllocatedGB/AvailableGB in nodemanager JMX showing only integer values



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2655) AllocatedGB/AvailableGB in nodemanager JMX showing only integer values

2014-10-07 Thread Nishan Shetty (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nishan Shetty updated YARN-2655:

Attachment: screenshot-1.png

 AllocatedGB/AvailableGB in nodemanager JMX showing only integer values
 --

 Key: YARN-2655
 URL: https://issues.apache.org/jira/browse/YARN-2655
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.4.1
Reporter: Nishan Shetty
Priority: Minor
 Attachments: screenshot-1.png


 AllocatedGB/AvailableGB in nodemanager JMX showing only integer values



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2655) AllocatedGB/AvailableGB in nodemanager JMX showing only integer values

2014-10-07 Thread Nishan Shetty (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nishan Shetty updated YARN-2655:

Attachment: screenshot-2.png

 AllocatedGB/AvailableGB in nodemanager JMX showing only integer values
 --

 Key: YARN-2655
 URL: https://issues.apache.org/jira/browse/YARN-2655
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.4.1
Reporter: Nishan Shetty
Priority: Minor
 Attachments: screenshot-1.png, screenshot-2.png


 AllocatedGB/AvailableGB in nodemanager JMX showing only integer values



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2655) AllocatedGB/AvailableGB in nodemanager JMX showing only integer values

2014-10-07 Thread Nishan Shetty (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nishan Shetty updated YARN-2655:

Description: 
AllocatedGB/AvailableGB in nodemanager JMX showing only integer values

Screenshot attached

  was:AllocatedGB/AvailableGB in nodemanager JMX showing only integer values


 AllocatedGB/AvailableGB in nodemanager JMX showing only integer values
 --

 Key: YARN-2655
 URL: https://issues.apache.org/jira/browse/YARN-2655
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.4.1
Reporter: Nishan Shetty
Priority: Minor
 Attachments: screenshot-1.png, screenshot-2.png


 AllocatedGB/AvailableGB in nodemanager JMX showing only integer values
 Screenshot attached



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2180) In-memory backing store for cache manager

2014-10-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14163088#comment-14163088
 ] 

Hadoop QA commented on YARN-2180:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12673521/YARN-2180-trunk-v7.patch
  against trunk revision 1efd9c9.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5321//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5321//console

This message is automatically generated.

 In-memory backing store for cache manager
 -

 Key: YARN-2180
 URL: https://issues.apache.org/jira/browse/YARN-2180
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Chris Trezzo
Assignee: Chris Trezzo
 Attachments: YARN-2180-trunk-v1.patch, YARN-2180-trunk-v2.patch, 
 YARN-2180-trunk-v3.patch, YARN-2180-trunk-v4.patch, YARN-2180-trunk-v5.patch, 
 YARN-2180-trunk-v6.patch, YARN-2180-trunk-v7.patch


 Implement an in-memory backing store for the cache manager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2582) Log related CLI and Web UI changes for LRS

2014-10-07 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14163116#comment-14163116
 ] 

Zhijie Shen commented on YARN-2582:
---

Thanks for the patch, Xuan! Here're some comments and thoughts.

1. This may not be sufficient. For example, tmp_\[timestamp\].log will be a 
false positive here. We need to check whether the file name ends with 
TMP_FILE_SUFFIX or not.
{code}
    && !fileName.contains(LogAggregationUtils.TMP_FILE_SUFFIX)) {
{code}
{code}
  if (!thisNodeFile.getPath().getName()
.contains(LogAggregationUtils.TMP_FILE_SUFFIX)) {
{code}
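
In other words, something along these lines (a sketch of the suggested check, 
not the patch itself):
{code}
// A file is "in progress" only if its name actually ends with the
// temporary suffix, so a name like "tmp_[timestamp].log" is no longer
// a false positive.
private static boolean isTmpFile(String fileName) {
  return fileName.endsWith(LogAggregationUtils.TMP_FILE_SUFFIX);
}
{code}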

2. I'm thinking about what a better way would be to handle the case where 
fetching one log file fails but the others succeed. Thoughts?
{code}
AggregatedLogFormat.LogReader reader = null;
try {
  reader =
      new AggregatedLogFormat.LogReader(getConf(),
        thisNodeFile.getPath());
  if (dumpAContainerLogs(containerId, reader, System.out) > -1) {
    foundContainerLogs = true;
  }
} finally {
  if (reader != null) {
    reader.close();
  }
}
{code}
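
One option, sketched loosely here, is to catch the failure per log file, warn, 
and keep going with the remaining files (the surrounding loop and variable 
names are only illustrative, not taken from the patch):
{code}
for (FileStatus thisNodeFile : nodeFiles) {  // nodeFiles: assumed Iterable<FileStatus>
  AggregatedLogFormat.LogReader reader = null;
  try {
    reader =
        new AggregatedLogFormat.LogReader(getConf(), thisNodeFile.getPath());
    if (dumpAContainerLogs(containerId, reader, System.out) > -1) {
      foundContainerLogs = true;
    }
  } catch (IOException e) {
    // A single unreadable aggregated log file should not prevent dumping
    // the logs that are readable.
    System.err.println("Cannot open " + thisNodeFile.getPath() + ": "
        + e.getMessage());
  } finally {
    if (reader != null) {
      reader.close();
    }
  }
}
{code}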

3. In fact, for either a single container or all containers, when the container 
log is not found, it is possible that 1) the container log hasn't been 
aggregated yet or 2) log aggregation is not enabled. In the former case there 
is also a third possibility: the user is providing an invalid nodeId. It would 
be good to unify the warning messages.
{code}
if (!foundContainerLogs) {
  System.out.println("Logs for container " + containerId
      + " are not present in this log-file.");
  return -1;
}
{code}
{code}
if (! foundAnyLogs) {
  System.out.println("Logs not available at " + remoteAppLogDir.toString());
  System.out
      .println("Log aggregation has not completed or is not enabled.");
  return -1;
}
{code}
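
For example, both paths could funnel through one helper with a single message 
covering all three possibilities (the wording below is purely illustrative):
{code}
// Illustrative only: one message for every "no logs found" path.
private static void printNoLogsFound(String target) {
  System.out.println("Logs for " + target + " are not available. "
      + "Log aggregation may not have completed yet, may be disabled, "
      + "or the supplied nodeId may be invalid.");
}
{code}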

4. It seems that we don't have unit tests to cover the successful path of 
dumping the container log(s) currently. I'm not sure if it is going to be a big 
addition. If it is, let's handle it separately.

There're two general issues:

1. Currently we don't have a page in the RM web app to show the aggregated 
logs. On the other hand, an LRS is supposed to be always in the running state, 
so the timeline server will not present it. Therefore, we still don't have a 
web entry point to view the aggregated logs.

2. For an LRS, the latest logs are not yet aggregated to HDFS but still sit in 
the local dir of the NM. When a user wants to check the logs of an LRS, it 
doesn't make sense that he can only view yesterday's logs because today's are 
still not uploaded to HDFS. It would be good to have a combined view of both 
the latest logs in the NM's local dir and the aggregated logs on HDFS. Thoughts?

 Log related CLI and Web UI changes for LRS
 --

 Key: YARN-2582
 URL: https://issues.apache.org/jira/browse/YARN-2582
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-2582.1.patch


 After YARN-2468, we have changed the log layout to support log aggregation for 
 Long Running Services. The log CLI and related Web UI should be modified 
 accordingly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2252) Intermittent failure of TestFairScheduler.testContinuousScheduling

2014-10-07 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14163118#comment-14163118
 ] 

Tsuyoshi OZAWA commented on YARN-2252:
--

Hi, this test failure is found on trunk - you can find it 
[here|https://issues.apache.org/jira/browse/YARN-2312?focusedCommentId=14161902page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14161902].
 Log is as follows:

{code}
Tests run: 79, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 8.698 sec <<< 
FAILURE! - in 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler
testContinuousScheduling(org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler)
  Time elapsed: 0.582 sec  <<< FAILURE!
java.lang.AssertionError: expected:<2> but was:<1>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:555)
at org.junit.Assert.assertEquals(Assert.java:542)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler.testContinuousScheduling(TestFairScheduler.java:3372)
{code}

Can I reopen this problem?

 Intermittent failure of TestFairScheduler.testContinuousScheduling
 --

 Key: YARN-2252
 URL: https://issues.apache.org/jira/browse/YARN-2252
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: trunk-win
Reporter: Ratandeep Ratti
  Labels: hadoop2, scheduler, yarn
 Fix For: 2.6.0

 Attachments: YARN-2252-1.patch, yarn-2252-2.patch


 This test case is failing sporadically on my machine. I think I have a 
 plausible explanation for this.
 It seems that when the Scheduler is being asked for resources, the resource 
 requests that are being constructed have no preference for the hosts (nodes).
 The two mock hosts constructed both have a memory of 8192 MB.
 The containers (resources) being requested each require 1024 MB of memory, 
 hence a single node can satisfy both resource requests for the application.
 At the end of the test case it is asserted that the containers (resource 
 requests) are executed on different nodes, but since we haven't specified any 
 node preferences when requesting the resources, the scheduler (at times) 
 executes both containers (requests) on the same node.
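
If the intent of the test is to pin the two containers to distinct hosts, one 
possible direction (a sketch only, not necessarily the fix that was adopted) is 
to make the resource requests node-local; the host name below is hypothetical:
{code}
// Hypothetical: request one 1024 MB container explicitly on a given host so
// the assertion about containers landing on different nodes is deterministic.
ResourceRequest nodeLocalRequest = ResourceRequest.newInstance(
    Priority.newInstance(1), "host1.example.com",
    Resource.newInstance(1024, 1), 1);
{code}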



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2312) Marking ContainerId#getId as deprecated

2014-10-07 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14163120#comment-14163120
 ] 

Tsuyoshi OZAWA commented on YARN-2312:
--

The test failures are not related - the TestFairScheduler failure is reported 
in YARN-2252 and the TestPipeApplication failure is reported in YARN-6115. 

[~jianhe], [~jlowe], could you review the latest patch?

 Marking ContainerId#getId as deprecated
 ---

 Key: YARN-2312
 URL: https://issues.apache.org/jira/browse/YARN-2312
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-2312-wip.patch, YARN-2312.1.patch, 
 YARN-2312.2-2.patch, YARN-2312.2-3.patch, YARN-2312.2.patch, 
 YARN-2312.4.patch, YARN-2312.5.patch, YARN-2312.6.patch


 After YARN-2229, {{ContainerId#getId}} will only return a partial value of the 
 container ID: the sequence number without the epoch. We should mark 
 {{ContainerId#getId}} as deprecated and use {{ContainerId#getContainerId}} 
 instead.
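
A short illustration of the difference, assuming the accessors introduced by 
YARN-2229:
{code}
// Given an existing ContainerId instance "containerId":
long fullId = containerId.getContainerId();  // full id, including the epoch bits
int sequenceOnly = containerId.getId();      // sequence number only (to be deprecated)
{code}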



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)