[jira] [Commented] (YARN-2010) Handle app-recovery failures gracefully

2014-11-03 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195847#comment-14195847
 ] 

Karthik Kambatla commented on YARN-2010:


Jian, thanks for taking the time to look at this closely. The patch looks 
mostly good. However, it only catches delegation-token (DT) renewal issues 
during app recovery; any other errors encountered in the remaining code paths 
are still not handled gracefully. For instance, an error thrown from 
app.recoverAppAttempts can affect the health of the RM.
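
For illustration, here is a minimal, self-contained sketch of the kind of 
per-application guard being suggested; the class and method names are 
hypothetical, not the actual patch. The idea is that a failure while recovering 
one application is caught, logged, and applied to that application alone, so 
the RM can still transition to Active.

{code}
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: recover each stored application independently so that a
 *  failure for one app (e.g. DT renewal) does not abort the whole RM recovery. */
public class GracefulRecoverySketch {

  /** Stand-in for the persisted state of one application. */
  static class StoredApp {
    final String appId;
    StoredApp(String appId) { this.appId = appId; }
  }

  /** Stand-in for the real per-app recovery (token renewal, recoverAppAttempts, ...). */
  void recoverApplication(StoredApp app) throws Exception {
    // the real logic would renew tokens, recover attempts, etc.
  }

  /** Mark a single application as failed instead of failing the RM transition. */
  void failApplication(StoredApp app, Exception cause) {
    System.err.println("Recovery of " + app.appId + " failed: " + cause);
  }

  void recoverAll(List<StoredApp> apps) {
    for (StoredApp app : apps) {
      try {
        recoverApplication(app);
      } catch (Exception e) {
        // App-specific failure: log it, fail this app, and keep recovering the rest.
        failApplication(app, e);
      }
    }
  }

  public static void main(String[] args) {
    new GracefulRecoverySketch().recoverAll(new ArrayList<StoredApp>());
  }
}
{code}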



> Handle app-recovery failures gracefully
> ---
>
> Key: YARN-2010
> URL: https://issues.apache.org/jira/browse/YARN-2010
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.3.0
>Reporter: bc Wong
>Assignee: Karthik Kambatla
>Priority: Blocker
> Attachments: YARN-2010.1.patch, YARN-2010.patch, 
> issue-stacktrace.rtf, yarn-2010-10.patch, yarn-2010-11.patch, 
> yarn-2010-2.patch, yarn-2010-3.patch, yarn-2010-3.patch, yarn-2010-4.patch, 
> yarn-2010-5.patch, yarn-2010-6.patch, yarn-2010-7.patch, yarn-2010-8.patch, 
> yarn-2010-9.patch
>
>
> Sometimes, the RM fails to recover an application. It could be because of 
> turning security on, token expiry, or issues connecting to HDFS etc. The 
> causes could be classified into (1) transient, (2) specific to one 
> application, and (3) permanent and apply to multiple (all) applications. 
> Today, the RM fails to transition to Active and ends up in STOPPED state and 
> can never be transitioned to Active again.
> The initial stacktrace reported is at 
> https://issues.apache.org/jira/secure/attachment/12676476/issue-stacktrace.rtf



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2802) add AM container launch and register delay metrics in QueueMetrics to help diagnose performance issue.

2014-11-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195819#comment-14195819
 ] 

Hadoop QA commented on YARN-2802:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12679156/YARN-2802.001.patch
  against trunk revision 2bb327e.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-tools/hadoop-sls 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5719//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5719//console

This message is automatically generated.

> add AM container launch and register delay metrics in QueueMetrics to help 
> diagnose performance issue.
> --
>
> Key: YARN-2802
> URL: https://issues.apache.org/jira/browse/YARN-2802
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.5.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-2802.000.patch, YARN-2802.001.patch
>
>
> add AM container launch and register delay metrics in QueueMetrics to help 
> diagnose performance issue.
> Added two metrics in QueueMetrics:
> aMLaunchDelay: the time spent from sending event AMLauncherEventType.LAUNCH 
> to receiving event RMAppAttemptEventType.LAUNCHED in RMAppAttemptImpl.
> aMRegisterDelay: the time waiting from receiving event 
> RMAppAttemptEventType.LAUNCHED to receiving event 
> RMAppAttemptEventType.REGISTERED(ApplicationMasterService#registerApplicationMaster)
>  in RMAppAttemptImpl.
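
The description above only names the two delays; as a rough illustration, here 
is a small standalone sketch (my own simplification, not the QueueMetrics code 
in the patch) of how they could be measured by timestamping the three events.

{code}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical, simplified sketch of measuring the two delays described above by
 *  timestamping the LAUNCH, LAUNCHED and REGISTERED events per app attempt. */
public class AmDelaySketch {
  private final ConcurrentMap<String, Long> launchSent = new ConcurrentHashMap<String, Long>();
  private final ConcurrentMap<String, Long> launched = new ConcurrentHashMap<String, Long>();

  /** AMLauncherEventType.LAUNCH sent. */
  public void onLaunchSent(String attemptId) {
    launchSent.put(attemptId, System.currentTimeMillis());
  }

  /** RMAppAttemptEventType.LAUNCHED received; returns aMLaunchDelay in ms. */
  public long onLaunched(String attemptId) {
    long now = System.currentTimeMillis();
    launched.put(attemptId, now);
    Long sent = launchSent.remove(attemptId);
    return sent == null ? -1 : now - sent;
  }

  /** RMAppAttemptEventType.REGISTERED received; returns aMRegisterDelay in ms. */
  public long onRegistered(String attemptId) {
    Long launchedAt = launched.remove(attemptId);
    return launchedAt == null ? -1 : System.currentTimeMillis() - launchedAt;
  }
}
{code}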



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2802) add AM container launch and register delay metrics in QueueMetrics to help diagnose performance issue.

2014-11-03 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-2802:

Issue Type: Improvement  (was: Bug)

> add AM container launch and register delay metrics in QueueMetrics to help 
> diagnose performance issue.
> --
>
> Key: YARN-2802
> URL: https://issues.apache.org/jira/browse/YARN-2802
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.5.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-2802.000.patch, YARN-2802.001.patch
>
>
> add AM container launch and register delay metrics in QueueMetrics to help 
> diagnose performance issue.
> Added two metrics in QueueMetrics:
> aMLaunchDelay: the time spent from sending event AMLauncherEventType.LAUNCH 
> to receiving event RMAppAttemptEventType.LAUNCHED in RMAppAttemptImpl.
> aMRegisterDelay: the time waiting from receiving event 
> RMAppAttemptEventType.LAUNCHED to receiving event 
> RMAppAttemptEventType.REGISTERED(ApplicationMasterService#registerApplicationMaster)
>  in RMAppAttemptImpl.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2802) add AM container launch and register delay metrics in QueueMetrics to help diagnose performance issue.

2014-11-03 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195746#comment-14195746
 ] 

zhihai xu commented on YARN-2802:
-

Attached the patch YARN-2802.001.patch to fix the test errors.

> add AM container launch and register delay metrics in QueueMetrics to help 
> diagnose performance issue.
> --
>
> Key: YARN-2802
> URL: https://issues.apache.org/jira/browse/YARN-2802
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.5.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-2802.000.patch, YARN-2802.001.patch
>
>
> add AM container launch and register delay metrics in QueueMetrics to help 
> diagnose performance issue.
> Added two metrics in QueueMetrics:
> aMLaunchDelay: the time spent from sending event AMLauncherEventType.LAUNCH 
> to receiving event RMAppAttemptEventType.LAUNCHED in RMAppAttemptImpl.
> aMRegisterDelay: the time waiting from receiving event 
> RMAppAttemptEventType.LAUNCHED to receiving event 
> RMAppAttemptEventType.REGISTERED(ApplicationMasterService#registerApplicationMaster)
>  in RMAppAttemptImpl.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2802) add AM container launch and register delay metrics in QueueMetrics to help diagnose performance issue.

2014-11-03 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-2802:

Attachment: YARN-2802.001.patch

> add AM container launch and register delay metrics in QueueMetrics to help 
> diagnose performance issue.
> --
>
> Key: YARN-2802
> URL: https://issues.apache.org/jira/browse/YARN-2802
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.5.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-2802.000.patch, YARN-2802.001.patch
>
>
> add AM container launch and register delay metrics in QueueMetrics to help 
> diagnose performance issue.
> Added two metrics in QueueMetrics:
> aMLaunchDelay: the time spent from sending event AMLauncherEventType.LAUNCH 
> to receiving event RMAppAttemptEventType.LAUNCHED in RMAppAttemptImpl.
> aMRegisterDelay: the time waiting from receiving event 
> RMAppAttemptEventType.LAUNCHED to receiving event 
> RMAppAttemptEventType.REGISTERED(ApplicationMasterService#registerApplicationMaster)
>  in RMAppAttemptImpl.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2802) add AM container launch and register delay metrics in QueueMetrics to help diagnose performance issue.

2014-11-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195658#comment-14195658
 ] 

Hadoop QA commented on YARN-2802:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12679114/YARN-2802.000.patch
  against trunk revision c5a46d4.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-tools/hadoop-sls 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.TestRMAppAttemptTransitions
  org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart
  
org.apache.hadoop.yarn.server.resourcemanager.TestApplicationCleanup
  
org.apache.hadoop.yarn.server.resourcemanager.TestContainerResourceUsage
  
org.apache.hadoop.yarn.server.resourcemanager.TestKillApplicationWithRMHA
  
org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart
  
org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5718//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5718//console

This message is automatically generated.

> add AM container launch and register delay metrics in QueueMetrics to help 
> diagnose performance issue.
> --
>
> Key: YARN-2802
> URL: https://issues.apache.org/jira/browse/YARN-2802
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.5.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-2802.000.patch
>
>
> add AM container launch and register delay metrics in QueueMetrics to help 
> diagnose performance issue.
> Added two metrics in QueueMetrics:
> aMLaunchDelay: the time spent from sending event AMLauncherEventType.LAUNCH 
> to receiving event RMAppAttemptEventType.LAUNCHED in RMAppAttemptImpl.
> aMRegisterDelay: the time waiting from receiving event 
> RMAppAttemptEventType.LAUNCHED to receiving event 
> RMAppAttemptEventType.REGISTERED(ApplicationMasterService#registerApplicationMaster)
>  in RMAppAttemptImpl.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2010) Handle app-recovery failures gracefully

2014-11-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195647#comment-14195647
 ] 

Hadoop QA commented on YARN-2010:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12679113/yarn-2010-11.patch
  against trunk revision c5a46d4.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5717//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5717//console

This message is automatically generated.

> Handle app-recovery failures gracefully
> ---
>
> Key: YARN-2010
> URL: https://issues.apache.org/jira/browse/YARN-2010
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.3.0
>Reporter: bc Wong
>Assignee: Karthik Kambatla
>Priority: Blocker
> Attachments: YARN-2010.1.patch, YARN-2010.patch, 
> issue-stacktrace.rtf, yarn-2010-10.patch, yarn-2010-11.patch, 
> yarn-2010-2.patch, yarn-2010-3.patch, yarn-2010-3.patch, yarn-2010-4.patch, 
> yarn-2010-5.patch, yarn-2010-6.patch, yarn-2010-7.patch, yarn-2010-8.patch, 
> yarn-2010-9.patch
>
>
> Sometimes, the RM fails to recover an application. It could be because of 
> turning security on, token expiry, or issues connecting to HDFS etc. The 
> causes could be classified into (1) transient, (2) specific to one 
> application, and (3) permanent and apply to multiple (all) applications. 
> Today, the RM fails to transition to Active and ends up in STOPPED state and 
> can never be transitioned to Active again.
> The initial stacktrace reported is at 
> https://issues.apache.org/jira/secure/attachment/12676476/issue-stacktrace.rtf



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2505) Support get/add/remove/change labels in RM REST API

2014-11-03 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195634#comment-14195634
 ] 

Xuan Gong commented on YARN-2505:
-

+1 for the latest patch. I'll leave it until tomorrow in case Vinod has further 
comments on it.

> Support get/add/remove/change labels in RM REST API
> ---
>
> Key: YARN-2505
> URL: https://issues.apache.org/jira/browse/YARN-2505
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Craig Welch
> Attachments: YARN-2505.1.patch, YARN-2505.11.patch, 
> YARN-2505.12.patch, YARN-2505.13.patch, YARN-2505.14.patch, 
> YARN-2505.15.patch, YARN-2505.3.patch, YARN-2505.4.patch, YARN-2505.5.patch, 
> YARN-2505.6.patch, YARN-2505.7.patch, YARN-2505.8.patch, YARN-2505.9.patch, 
> YARN-2505.9.patch, YARN-2505.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2604) Scheduler should consider max-allocation-* in conjunction with the largest node

2014-11-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195630#comment-14195630
 ] 

Hadoop QA commented on YARN-2604:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12679108/YARN-2604.patch
  against trunk revision c5a46d4.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5716//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5716//console

This message is automatically generated.

> Scheduler should consider max-allocation-* in conjunction with the largest 
> node
> ---
>
> Key: YARN-2604
> URL: https://issues.apache.org/jira/browse/YARN-2604
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Affects Versions: 2.5.1
>Reporter: Karthik Kambatla
>Assignee: Robert Kanter
> Attachments: YARN-2604.patch, YARN-2604.patch, YARN-2604.patch
>
>
> If the scheduler max-allocation-* values are larger than the resources 
> available on the largest node in the cluster, an application requesting 
> resources between the two values will be accepted by the scheduler but the 
> requests will never be satisfied. The app essentially hangs forever. 
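
As a rough illustration of the check the summary implies (hypothetical names, 
not the actual patch): the effective ceiling for a request should be the 
minimum of the configured max-allocation and the largest registered node, 
otherwise the request is accepted but can never be placed.

{code}
/** Hypothetical sketch: reject (or warn about) a request that fits under
 *  max-allocation-* but exceeds what the largest registered node could ever
 *  provide, so the app does not hang forever. */
public class MaxAllocationCheckSketch {

  /** Stand-in for org.apache.hadoop.yarn.api.records.Resource. */
  static class Res {
    final int memoryMb, vcores;
    Res(int memoryMb, int vcores) { this.memoryMb = memoryMb; this.vcores = vcores; }
  }

  static boolean isSatisfiable(Res request, Res maxAllocation, Res largestNode) {
    // effective ceiling = min(configured max-allocation, biggest node in the cluster)
    int memCeiling = Math.min(maxAllocation.memoryMb, largestNode.memoryMb);
    int cpuCeiling = Math.min(maxAllocation.vcores, largestNode.vcores);
    return request.memoryMb <= memCeiling && request.vcores <= cpuCeiling;
  }

  public static void main(String[] args) {
    Res maxAlloc = new Res(16384, 16);   // configured yarn.scheduler.maximum-allocation-*
    Res biggestNode = new Res(8192, 8);  // largest NodeManager actually registered
    // 12 GB fits under max-allocation but no node can ever satisfy it -> reject
    System.out.println(isSatisfiable(new Res(12288, 4), maxAlloc, biggestNode));
  }
}
{code}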



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2505) Support get/add/remove/change labels in RM REST API

2014-11-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195587#comment-14195587
 ] 

Hadoop QA commented on YARN-2505:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12679097/YARN-2505.15.patch
  against trunk revision c5a46d4.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5715//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5715//console

This message is automatically generated.

> Support get/add/remove/change labels in RM REST API
> ---
>
> Key: YARN-2505
> URL: https://issues.apache.org/jira/browse/YARN-2505
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Craig Welch
> Attachments: YARN-2505.1.patch, YARN-2505.11.patch, 
> YARN-2505.12.patch, YARN-2505.13.patch, YARN-2505.14.patch, 
> YARN-2505.15.patch, YARN-2505.3.patch, YARN-2505.4.patch, YARN-2505.5.patch, 
> YARN-2505.6.patch, YARN-2505.7.patch, YARN-2505.8.patch, YARN-2505.9.patch, 
> YARN-2505.9.patch, YARN-2505.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2010) Handle app-recovery failures gracefully

2014-11-03 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195581#comment-14195581
 ] 

Jian He commented on YARN-2010:
---

[~kasha], I reviewed your patch and just made some edits on top of it. Could 
you please take a look? Thanks.

> Handle app-recovery failures gracefully
> ---
>
> Key: YARN-2010
> URL: https://issues.apache.org/jira/browse/YARN-2010
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.3.0
>Reporter: bc Wong
>Assignee: Karthik Kambatla
>Priority: Blocker
> Attachments: YARN-2010.1.patch, YARN-2010.patch, 
> issue-stacktrace.rtf, yarn-2010-10.patch, yarn-2010-11.patch, 
> yarn-2010-2.patch, yarn-2010-3.patch, yarn-2010-3.patch, yarn-2010-4.patch, 
> yarn-2010-5.patch, yarn-2010-6.patch, yarn-2010-7.patch, yarn-2010-8.patch, 
> yarn-2010-9.patch
>
>
> Sometimes, the RM fails to recover an application. It could be because of 
> turning security on, token expiry, or issues connecting to HDFS etc. The 
> causes could be classified into (1) transient, (2) specific to one 
> application, and (3) permanent and apply to multiple (all) applications. 
> Today, the RM fails to transition to Active and ends up in STOPPED state and 
> can never be transitioned to Active again.
> The initial stacktrace reported is at 
> https://issues.apache.org/jira/secure/attachment/12676476/issue-stacktrace.rtf



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2802) add AM container launch and register delay metrics in QueueMetrics to help diagnose performance issue.

2014-11-03 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-2802:

Attachment: YARN-2802.000.patch

> add AM container launch and register delay metrics in QueueMetrics to help 
> diagnose performance issue.
> --
>
> Key: YARN-2802
> URL: https://issues.apache.org/jira/browse/YARN-2802
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.5.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-2802.000.patch
>
>
> add AM container launch and register delay metrics in QueueMetrics to help 
> diagnose performance issue.
> Added two metrics in QueueMetrics:
> aMLaunchDelay: the time spent from sending event AMLauncherEventType.LAUNCH 
> to receiving event RMAppAttemptEventType.LAUNCHED in RMAppAttemptImpl.
> aMRegisterDelay: the time waiting from receiving event 
> RMAppAttemptEventType.LAUNCHED to receiving event 
> RMAppAttemptEventType.REGISTERED(ApplicationMasterService#registerApplicationMaster)
>  in RMAppAttemptImpl.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2802) add AM container launch and register delay metrics in QueueMetrics to help diagnose performance issue.

2014-11-03 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-2802:

Attachment: (was: YARN-2802.000.patch)

> add AM container launch and register delay metrics in QueueMetrics to help 
> diagnose performance issue.
> --
>
> Key: YARN-2802
> URL: https://issues.apache.org/jira/browse/YARN-2802
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.5.0
>Reporter: zhihai xu
>Assignee: zhihai xu
>
> add AM container launch and register delay metrics in QueueMetrics to help 
> diagnose performance issue.
> Added two metrics in QueueMetrics:
> aMLaunchDelay: the time spent from sending event AMLauncherEventType.LAUNCH 
> to receiving event RMAppAttemptEventType.LAUNCHED in RMAppAttemptImpl.
> aMRegisterDelay: the time waiting from receiving event 
> RMAppAttemptEventType.LAUNCHED to receiving event 
> RMAppAttemptEventType.REGISTERED(ApplicationMasterService#registerApplicationMaster)
>  in RMAppAttemptImpl.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2010) Handle app-recovery failures gracefully

2014-11-03 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-2010:
--
Attachment: yarn-2010-11.patch

> Handle app-recovery failures gracefully
> ---
>
> Key: YARN-2010
> URL: https://issues.apache.org/jira/browse/YARN-2010
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.3.0
>Reporter: bc Wong
>Assignee: Karthik Kambatla
>Priority: Blocker
> Attachments: YARN-2010.1.patch, YARN-2010.patch, 
> issue-stacktrace.rtf, yarn-2010-10.patch, yarn-2010-11.patch, 
> yarn-2010-2.patch, yarn-2010-3.patch, yarn-2010-3.patch, yarn-2010-4.patch, 
> yarn-2010-5.patch, yarn-2010-6.patch, yarn-2010-7.patch, yarn-2010-8.patch, 
> yarn-2010-9.patch
>
>
> Sometimes, the RM fails to recover an application. It could be because of 
> turning security on, token expiry, or issues connecting to HDFS etc. The 
> causes could be classified into (1) transient, (2) specific to one 
> application, and (3) permanent and apply to multiple (all) applications. 
> Today, the RM fails to transition to Active and ends up in STOPPED state and 
> can never be transitioned to Active again.
> The initial stacktrace reported is at 
> https://issues.apache.org/jira/secure/attachment/12676476/issue-stacktrace.rtf



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2604) Scheduler should consider max-allocation-* in conjunction with the largest node

2014-11-03 Thread Robert Kanter (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kanter updated YARN-2604:

Attachment: YARN-2604.patch

The findbugs warnings had to do with the lock I added.  During initialization, 
the code doesn't take the lock, which should be fine.  I've added a findbugs 
exclusion for it.  (Also, the findbugs warning seemed backwards about which 
places it labeled synchronized and unsynchronized.)
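
For context, a small illustrative sketch (not the actual patch) of the pattern 
being described: a field written without the lock during single-threaded 
initialization and accessed under the lock afterwards, which FindBugs typically 
reports as inconsistent synchronization even though the init-time access is safe.

{code}
import java.util.concurrent.locks.ReentrantLock;

/** Illustrative sketch only: the field is set lock-free during single-threaded
 *  initialization, and read/updated under the lock once the scheduler is serving
 *  requests. FindBugs sees the mixed access and reports inconsistent
 *  synchronization, even though the init-time access is safe. */
public class InitLockSketch {
  private final ReentrantLock maxAllocLock = new ReentrantLock();
  private int maxAllocationMb;

  /** Called once, before any other thread can touch this object. */
  public void init(int configuredMaxMb) {
    maxAllocationMb = configuredMaxMb;  // no lock needed yet; nothing else is running
  }

  /** Called concurrently while the scheduler is running. */
  public void updateMaxAllocation(int newMaxMb) {
    maxAllocLock.lock();
    try {
      maxAllocationMb = newMaxMb;
    } finally {
      maxAllocLock.unlock();
    }
  }

  public int getMaxAllocationMb() {
    maxAllocLock.lock();
    try {
      return maxAllocationMb;
    } finally {
      maxAllocLock.unlock();
    }
  }
}
{code}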

> Scheduler should consider max-allocation-* in conjunction with the largest 
> node
> ---
>
> Key: YARN-2604
> URL: https://issues.apache.org/jira/browse/YARN-2604
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Affects Versions: 2.5.1
>Reporter: Karthik Kambatla
>Assignee: Robert Kanter
> Attachments: YARN-2604.patch, YARN-2604.patch, YARN-2604.patch
>
>
> If the scheduler max-allocation-* values are larger than the resources 
> available on the largest node in the cluster, an application requesting 
> resources between the two values will be accepted by the scheduler but the 
> requests will never be satisfied. The app essentially hangs forever. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2604) Scheduler should consider max-allocation-* in conjunction with the largest node

2014-11-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195536#comment-14195536
 ] 

Hadoop QA commented on YARN-2604:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12679085/YARN-2604.patch
  against trunk revision 35d353e.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 2 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5713//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5713//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5713//console

This message is automatically generated.

> Scheduler should consider max-allocation-* in conjunction with the largest 
> node
> ---
>
> Key: YARN-2604
> URL: https://issues.apache.org/jira/browse/YARN-2604
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Affects Versions: 2.5.1
>Reporter: Karthik Kambatla
>Assignee: Robert Kanter
> Attachments: YARN-2604.patch, YARN-2604.patch
>
>
> If the scheduler max-allocation-* values are larger than the resources 
> available on the largest node in the cluster, an application requesting 
> resources between the two values will be accepted by the scheduler but the 
> requests will never be satisfied. The app essentially hangs forever. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2804) Timeline server .out log have JAXB binding exceptions and warnings.

2014-11-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195535#comment-14195535
 ] 

Hadoop QA commented on YARN-2804:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12679093/YARN-2804.1.patch
  against trunk revision c5a46d4.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5714//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5714//console

This message is automatically generated.

> Timeline server .out log have JAXB binding exceptions and warnings.
> ---
>
> Key: YARN-2804
> URL: https://issues.apache.org/jira/browse/YARN-2804
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
>Priority: Critical
> Attachments: YARN-2804.1.patch
>
>
> Unlike other daemon, timeline server binds JacksonJaxbJsonProvider to resolve 
> the resources. However, there are noises in .out log:
> {code}
> SEVERE: Failed to generate the schema for the JAX-B elements
> com.sun.xml.bind.v2.runtime.IllegalAnnotationsException: 2 counts of 
> IllegalAnnotationExceptions
> java.util.Map is an interface, and JAXB can't handle interfaces.
>   this problem is related to the following location:
>   at java.util.Map
>   at public java.util.Map 
> org.apache.hadoop.yarn.api.records.timeline.TimelineEvent.getEventInfo()
>   at org.apache.hadoop.yarn.api.records.timeline.TimelineEvent
>   at public java.util.List 
> org.apache.hadoop.yarn.api.records.timeline.TimelineEntity.getEvents()
>   at org.apache.hadoop.yarn.api.records.timeline.TimelineEntity
>   at public java.util.List 
> org.apache.hadoop.yarn.api.records.timeline.TimelineEntities.getEntities()
>   at org.apache.hadoop.yarn.api.records.timeline.TimelineEntities
> java.util.Map does not have a no-arg default constructor.
>   this problem is related to the following location:
>   at java.util.Map
>   at public java.util.Map 
> org.apache.hadoop.yarn.api.records.timeline.TimelineEvent.getEventInfo()
>   at org.apache.hadoop.yarn.api.records.timeline.TimelineEvent
>   at public java.util.List 
> org.apache.hadoop.yarn.api.records.timeline.TimelineEntity.getEvents()
>   at org.apache.hadoop.yarn.api.records.timeline.TimelineEntity
>   at public java.util.List 
> org.apache.hadoop.yarn.api.records.timeline.TimelineEntities.getEntities()
>   at org.apache.hadoop.yarn.api.records.timeline.TimelineEntities
>   at 
> com.sun.xml.bind.v2.runtime.IllegalAnnotationsException$Builder.check(IllegalAnnotationsException.java:106)
>   at 
> com.sun.xml.bind.v2.runtime.JAXBContextImpl.getTypeInfoSet(JAXBContextImpl.java:489)
>   at 
> com.sun.xml.bind.v2.runtime.JAXBContextImpl.(JAXBContextImpl.java:319)
>   at 
> com.sun.xml.bind.v2.runtime.JAXBContextImpl$JAXBContextBuilder.build(JAXBContextImpl.java:1170)
>   at 
> com.sun.xml.bind.v2.ContextFactory.createContext(ContextFactory.java:145)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at javax.xml.bind.ContextFinder.newInstance(ContextFinder.java:248)
>

[jira] [Commented] (YARN-2803) MR distributed cache not working correctly on Windows after NodeManager privileged account changes.

2014-11-03 Thread Craig Welch (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195532#comment-14195532
 ] 

Craig Welch commented on YARN-2803:
---

I'll have a look. [~rusanu], did this happen when you ran the unit tests?  Can 
you have a look as well?

> MR distributed cache not working correctly on Windows after NodeManager 
> privileged account changes.
> ---
>
> Key: YARN-2803
> URL: https://issues.apache.org/jira/browse/YARN-2803
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Chris Nauroth
>Priority: Critical
>
> This problem is visible by running {{TestMRJobs#testDistributedCache}} or 
> {{TestUberAM#testDistributedCache}} on Windows.  Both tests fail.  Running 
> git bisect, I traced it to the YARN-2198 patch to remove the need to run 
> NodeManager as a privileged account.  The tests started failing when that 
> patch was committed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2738) Add FairReservationSystem for FairScheduler

2014-11-03 Thread Subru Krishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195526#comment-14195526
 ] 

Subru Krishnan commented on YARN-2738:
--

[~adhoot], I personally prefer per-queue configuration, and not just because it 
enables configuring org-specific agents/policies. I do not believe it adds 
significant overhead, while it lets the reservation system run side by side with 
the existing queue mechanism. It provides greater flexibility for trying out 
reservations on only part of the cluster, as partitioned by a leaf queue, and 
for phased migration if required.

What is the additional complexity of per-queue versus system-wide settings, 
given that we have global defaults which should work for the majority of scenarios?

> Add FairReservationSystem for FairScheduler
> ---
>
> Key: YARN-2738
> URL: https://issues.apache.org/jira/browse/YARN-2738
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-2738.001.patch, YARN-2738.002.patch
>
>
> Need to create a FairReservationSystem that will implement ReservationSystem 
> for FairScheduler



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2802) add AM container launch and register delay metrics in QueueMetrics to help diagnose performance issue.

2014-11-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195524#comment-14195524
 ] 

Hadoop QA commented on YARN-2802:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12679084/YARN-2802.000.patch
  against trunk revision 35d353e.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-tools/hadoop-sls 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.TestApplicationCleanup
  
org.apache.hadoop.yarn.server.resourcemanager.TestKillApplicationWithRMHA
  
org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart
  
org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart
  
org.apache.hadoop.yarn.server.resourcemanager.TestContainerResourceUsage
  org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart
  
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.TestRMAppAttemptTransitions

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5712//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5712//console

This message is automatically generated.

> add AM container launch and register delay metrics in QueueMetrics to help 
> diagnose performance issue.
> --
>
> Key: YARN-2802
> URL: https://issues.apache.org/jira/browse/YARN-2802
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.5.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-2802.000.patch
>
>
> add AM container launch and register delay metrics in QueueMetrics to help 
> diagnose performance issue.
> Added two metrics in QueueMetrics:
> aMLaunchDelay: the time spent from sending event AMLauncherEventType.LAUNCH 
> to receiving event RMAppAttemptEventType.LAUNCHED in RMAppAttemptImpl.
> aMRegisterDelay: the time waiting from receiving event 
> RMAppAttemptEventType.LAUNCHED to receiving event 
> RMAppAttemptEventType.REGISTERED(ApplicationMasterService#registerApplicationMaster)
>  in RMAppAttemptImpl.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2505) Support get/add/remove/change labels in RM REST API

2014-11-03 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195504#comment-14195504
 ] 

Wangda Tan commented on YARN-2505:
--

host or host:0 should be host *and* host:0

> Support get/add/remove/change labels in RM REST API
> ---
>
> Key: YARN-2505
> URL: https://issues.apache.org/jira/browse/YARN-2505
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Craig Welch
> Attachments: YARN-2505.1.patch, YARN-2505.11.patch, 
> YARN-2505.12.patch, YARN-2505.13.patch, YARN-2505.14.patch, 
> YARN-2505.15.patch, YARN-2505.3.patch, YARN-2505.4.patch, YARN-2505.5.patch, 
> YARN-2505.6.patch, YARN-2505.7.patch, YARN-2505.8.patch, YARN-2505.9.patch, 
> YARN-2505.9.patch, YARN-2505.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2505) Support get/add/remove/change labels in RM REST API

2014-11-03 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195499#comment-14195499
 ] 

Wangda Tan commented on YARN-2505:
--

bq. As I understand it it is possible to run more than one nodemanager on a 
host and in that case they are distinguished by the port they listen on, so 
there is also a practical/functional reason why the port needs to be retained 
in the id. Additionally, the node id is well established as the host:port combo 
throughout, it's good to keep that consistent.
I see. I don't have a strong opinion on whether we should return host or host:0 
to the user when port=0, but I still prefer to support both host and host:0 
when the user inputs hosts. :)
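
A tiny illustrative sketch of accepting both input forms, assuming port 0 as 
the default when none is given (hypothetical helper, not the REST patch itself):

{code}
/** Illustrative helper: accept either "host" or "host:port" and normalize to a
 *  (host, port) pair, defaulting the port to 0 when it is omitted. */
public class NodeIdParseSketch {

  static String[] parse(String input) {
    int idx = input.lastIndexOf(':');
    if (idx < 0) {
      return new String[] { input, "0" };            // "host"   -> host, port 0
    }
    return new String[] { input.substring(0, idx),   // "host:0" -> host, port 0
                          input.substring(idx + 1) };
  }

  public static void main(String[] args) {
    String[] a = parse("nm-host");
    String[] b = parse("nm-host:0");
    System.out.println(a[0] + ":" + a[1]);  // nm-host:0
    System.out.println(b[0] + ":" + b[1]);  // nm-host:0
  }
}
{code}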

> Support get/add/remove/change labels in RM REST API
> ---
>
> Key: YARN-2505
> URL: https://issues.apache.org/jira/browse/YARN-2505
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Craig Welch
> Attachments: YARN-2505.1.patch, YARN-2505.11.patch, 
> YARN-2505.12.patch, YARN-2505.13.patch, YARN-2505.14.patch, 
> YARN-2505.15.patch, YARN-2505.3.patch, YARN-2505.4.patch, YARN-2505.5.patch, 
> YARN-2505.6.patch, YARN-2505.7.patch, YARN-2505.8.patch, YARN-2505.9.patch, 
> YARN-2505.9.patch, YARN-2505.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2505) Support get/add/remove/change labels in RM REST API

2014-11-03 Thread Craig Welch (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch updated YARN-2505:
--
Attachment: YARN-2505.15.patch

This patch drops the single-label-at-a-time operations for both cluster- and 
node-level labels to avoid duplication with the group operations.  In addition, 
the cluster node-label and individual node-label aggregate operations have been 
harmonized to use a suffix on the POST URL (.../add, .../remove, .../replace), 
keeping them consistent with one another (and enabling group changes 
everywhere).  The only overlap now is between node-to-labels and node-level node 
label operations, but both are likely to be useful in different scenarios, so it 
makes sense to keep them both.

> Support get/add/remove/change labels in RM REST API
> ---
>
> Key: YARN-2505
> URL: https://issues.apache.org/jira/browse/YARN-2505
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Craig Welch
> Attachments: YARN-2505.1.patch, YARN-2505.11.patch, 
> YARN-2505.12.patch, YARN-2505.13.patch, YARN-2505.14.patch, 
> YARN-2505.15.patch, YARN-2505.3.patch, YARN-2505.4.patch, YARN-2505.5.patch, 
> YARN-2505.6.patch, YARN-2505.7.patch, YARN-2505.8.patch, YARN-2505.9.patch, 
> YARN-2505.9.patch, YARN-2505.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2505) Support get/add/remove/change labels in RM REST API

2014-11-03 Thread Craig Welch (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195483#comment-14195483
 ] 

Craig Welch commented on YARN-2505:
---

As I understand it, it is possible to run more than one NodeManager on a host, 
and in that case they are distinguished by the port they listen on, so there is 
also a practical/functional reason why the port needs to be retained in the id. 
Additionally, the node id is well established as the host:port combo 
throughout, so it's good to keep that consistent.

> Support get/add/remove/change labels in RM REST API
> ---
>
> Key: YARN-2505
> URL: https://issues.apache.org/jira/browse/YARN-2505
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Craig Welch
> Attachments: YARN-2505.1.patch, YARN-2505.11.patch, 
> YARN-2505.12.patch, YARN-2505.13.patch, YARN-2505.14.patch, 
> YARN-2505.3.patch, YARN-2505.4.patch, YARN-2505.5.patch, YARN-2505.6.patch, 
> YARN-2505.7.patch, YARN-2505.8.patch, YARN-2505.9.patch, YARN-2505.9.patch, 
> YARN-2505.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2804) Timeline server .out log have JAXB binding exceptions and warnings.

2014-11-03 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2804:
--
Attachment: YARN-2804.1.patch

In the patch, I made a compromise when changing TimelineEntity and 
TimelineEvent, to keep the Java API compatible as well as to satisfy JAXB. For 
the put-domain response, I changed it to return an empty TimelinePutResponse 
instead of using the Jersey Response.

After these changes, the exceptions and warnings are gone from the .out log.
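
As an illustration of the kind of compromise JAXB forces here (a sketch under 
my own assumptions, not the actual patch): JAXB cannot bind a getter declared 
to return the java.util.Map interface, so one common workaround is to expose a 
concrete type such as HashMap on the JAXB-visible getter.

{code}
import java.util.HashMap;
import javax.xml.bind.annotation.XmlAccessType;
import javax.xml.bind.annotation.XmlAccessorType;
import javax.xml.bind.annotation.XmlElement;
import javax.xml.bind.annotation.XmlRootElement;

/** Illustrative sketch: exposing a concrete HashMap to JAXB instead of the Map
 *  interface, so schema generation no longer fails with
 *  "java.util.Map is an interface, and JAXB can't handle interfaces." */
@XmlRootElement(name = "event")
@XmlAccessorType(XmlAccessType.NONE)
public class EventInfoSketch {
  private HashMap<String, Object> eventInfo = new HashMap<String, Object>();

  @XmlElement(name = "eventinfo")
  public HashMap<String, Object> getEventInfo() {  // concrete type keeps JAXB happy
    return eventInfo;
  }

  public void setEventInfo(HashMap<String, Object> eventInfo) {
    this.eventInfo = eventInfo;
  }
}
{code}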

> Timeline server .out log have JAXB binding exceptions and warnings.
> ---
>
> Key: YARN-2804
> URL: https://issues.apache.org/jira/browse/YARN-2804
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
>Priority: Critical
> Attachments: YARN-2804.1.patch
>
>
> Unlike other daemon, timeline server binds JacksonJaxbJsonProvider to resolve 
> the resources. However, there are noises in .out log:
> {code}
> SEVERE: Failed to generate the schema for the JAX-B elements
> com.sun.xml.bind.v2.runtime.IllegalAnnotationsException: 2 counts of 
> IllegalAnnotationExceptions
> java.util.Map is an interface, and JAXB can't handle interfaces.
>   this problem is related to the following location:
>   at java.util.Map
>   at public java.util.Map 
> org.apache.hadoop.yarn.api.records.timeline.TimelineEvent.getEventInfo()
>   at org.apache.hadoop.yarn.api.records.timeline.TimelineEvent
>   at public java.util.List 
> org.apache.hadoop.yarn.api.records.timeline.TimelineEntity.getEvents()
>   at org.apache.hadoop.yarn.api.records.timeline.TimelineEntity
>   at public java.util.List 
> org.apache.hadoop.yarn.api.records.timeline.TimelineEntities.getEntities()
>   at org.apache.hadoop.yarn.api.records.timeline.TimelineEntities
> java.util.Map does not have a no-arg default constructor.
>   this problem is related to the following location:
>   at java.util.Map
>   at public java.util.Map 
> org.apache.hadoop.yarn.api.records.timeline.TimelineEvent.getEventInfo()
>   at org.apache.hadoop.yarn.api.records.timeline.TimelineEvent
>   at public java.util.List 
> org.apache.hadoop.yarn.api.records.timeline.TimelineEntity.getEvents()
>   at org.apache.hadoop.yarn.api.records.timeline.TimelineEntity
>   at public java.util.List 
> org.apache.hadoop.yarn.api.records.timeline.TimelineEntities.getEntities()
>   at org.apache.hadoop.yarn.api.records.timeline.TimelineEntities
>   at 
> com.sun.xml.bind.v2.runtime.IllegalAnnotationsException$Builder.check(IllegalAnnotationsException.java:106)
>   at 
> com.sun.xml.bind.v2.runtime.JAXBContextImpl.getTypeInfoSet(JAXBContextImpl.java:489)
>   at 
> com.sun.xml.bind.v2.runtime.JAXBContextImpl.(JAXBContextImpl.java:319)
>   at 
> com.sun.xml.bind.v2.runtime.JAXBContextImpl$JAXBContextBuilder.build(JAXBContextImpl.java:1170)
>   at 
> com.sun.xml.bind.v2.ContextFactory.createContext(ContextFactory.java:145)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at javax.xml.bind.ContextFinder.newInstance(ContextFinder.java:248)
>   at javax.xml.bind.ContextFinder.newInstance(ContextFinder.java:235)
>   at javax.xml.bind.ContextFinder.find(ContextFinder.java:432)
>   at javax.xml.bind.JAXBContext.newInstance(JAXBContext.java:637)
>   at javax.xml.bind.JAXBContext.newInstance(JAXBContext.java:584)
>   at 
> com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator.buildModelAndSchemas(WadlGeneratorJAXBGrammarGenerator.java:412)
>   at 
> com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator.createExternalGrammar(WadlGeneratorJAXBGrammarGenerator.java:352)
>   at com.sun.jersey.server.wadl.WadlBuilder.generate(WadlBuilder.java:115)
>   at 
> com.sun.jersey.server.impl.wadl.WadlApplicationContextImpl.getApplication(WadlApplicationContextImpl.java:104)
>   at 
> com.sun.jersey.server.impl.wadl.WadlApplicationContextImpl.getApplication(WadlApplicationContextImpl.java:120)
>   at 
> com.sun.jersey.server.impl.wadl.WadlMethodFactory$WadlOptionsMethodDispatcher.dispatch(WadlMethodFactory.java:98)
>   at 
> com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288)
>   at 
> com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
>   at 
> com.sun.jersey.server.impl.uri.rules

[jira] [Commented] (YARN-1922) Process group remains alive after container process is killed externally

2014-11-03 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195478#comment-14195478
 ] 

Vinod Kumar Vavilapalli commented on YARN-1922:
---

Thanks for the reviews, [~vvasudev]!

> Process group remains alive after container process is killed externally
> 
>
> Key: YARN-1922
> URL: https://issues.apache.org/jira/browse/YARN-1922
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.4.0
> Environment: CentOS 6.4
>Reporter: Billie Rinaldi
>Assignee: Billie Rinaldi
> Fix For: 2.6.0
>
> Attachments: YARN-1922.1.patch, YARN-1922.2.patch, YARN-1922.3.patch, 
> YARN-1922.4.patch, YARN-1922.5.patch, YARN-1922.6.patch
>
>
> If the main container process is killed externally, ContainerLaunch does not 
> kill the rest of the process group.  Before sending the event that results in 
> the ContainerLaunch.containerCleanup method being called, ContainerLaunch 
> sets the "completed" flag to true.  Then when cleaning up, it doesn't try to 
> read the pid file if the completed flag is true.  If it read the pid file, it 
> would proceed to send the container a kill signal.  In the case of the 
> DefaultContainerExecutor, this would kill the process group.
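
For readers following along, a self-contained sketch of the control flow 
described above (illustrative names, not the actual ContainerLaunch code): 
because the completed flag is set before cleanup runs, the cleanup path skips 
the pid file and never signals the process group.

{code}
import java.util.concurrent.atomic.AtomicBoolean;

/** Illustrative sketch of the flow described in the summary: cleanup skips
 *  reading the pid file once "completed" is set, so no kill signal (and hence
 *  no process-group kill under DefaultContainerExecutor) is ever sent. */
public class ContainerCleanupSketch {
  private final AtomicBoolean completed = new AtomicBoolean(false);

  /** Main container process exited (possibly killed externally). */
  void onProcessExit() {
    completed.set(true);  // set before the cleanup event is handled
    containerCleanup();
  }

  void containerCleanup() {
    if (completed.get()) {
      // Problematic guard: the pid file is never read, so no signal is sent and
      // leftover children in the process group stay alive.
      return;
    }
    String pid = readPidFile();
    if (pid != null) {
      signalProcessGroup(pid);  // DefaultContainerExecutor kills the whole group here
    }
  }

  String readPidFile() { return null; }     // stand-in
  void signalProcessGroup(String pid) { }   // stand-in
}
{code}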



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2804) Timeline server .out log have JAXB binding exceptions and warnings.

2014-11-03 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195475#comment-14195475
 ] 

Zhijie Shen commented on YARN-2804:
---

If the map interface issue is resolved, another issue which didn't occur before 
will show up too:
{code}
java.lang.IllegalAccessException: Class
com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator$8
can not access a member of class javax.ws.rs.core.Response with
modifiers "protected"
 at sun.reflect.Reflection.ensureMemberAccess(Reflection.java:65)
 at java.lang.Class.newInstance0(Class.java:349)
 at java.lang.Class.newInstance(Class.java:308)
 at 
com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator$8.resolve(WadlGeneratorJAXBGrammarGenerator.java:467)
 at 
com.sun.jersey.server.wadl.WadlGenerator$ExternalGrammarDefinition.resolve(WadlGenerator.java:181)
 at 
com.sun.jersey.server.wadl.ApplicationDescription.resolve(ApplicationDescription.java:81)
 at 
com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator.attachTypes(WadlGeneratorJAXBGrammarGenerator.java:518)
 at 
com.sun.jersey.server.wadl.WadlBuilder.generate(WadlBuilder.java:124)
 at 
com.sun.jersey.server.impl.wadl.WadlApplicationContextImpl.getApplication(WadlApplicationContextImpl.java:104)
 at 
com.sun.jersey.server.impl.wadl.WadlResource.getWadl(WadlResource.java:89)
{code}

This needs to be fixed as well to completely avoid the excessive logging, though 
it would not be necessary if we upgraded Jersey (see 
[here|https://java.net/projects/jersey/lists/users/archive/2011-10/message/117]).

> Timeline server .out log have JAXB binding exceptions and warnings.
> ---
>
> Key: YARN-2804
> URL: https://issues.apache.org/jira/browse/YARN-2804
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
>Priority: Critical
>
> Unlike other daemon, timeline server binds JacksonJaxbJsonProvider to resolve 
> the resources. However, there are noises in .out log:
> {code}
> SEVERE: Failed to generate the schema for the JAX-B elements
> com.sun.xml.bind.v2.runtime.IllegalAnnotationsException: 2 counts of 
> IllegalAnnotationExceptions
> java.util.Map is an interface, and JAXB can't handle interfaces.
>   this problem is related to the following location:
>   at java.util.Map
>   at public java.util.Map 
> org.apache.hadoop.yarn.api.records.timeline.TimelineEvent.getEventInfo()
>   at org.apache.hadoop.yarn.api.records.timeline.TimelineEvent
>   at public java.util.List 
> org.apache.hadoop.yarn.api.records.timeline.TimelineEntity.getEvents()
>   at org.apache.hadoop.yarn.api.records.timeline.TimelineEntity
>   at public java.util.List 
> org.apache.hadoop.yarn.api.records.timeline.TimelineEntities.getEntities()
>   at org.apache.hadoop.yarn.api.records.timeline.TimelineEntities
> java.util.Map does not have a no-arg default constructor.
>   this problem is related to the following location:
>   at java.util.Map
>   at public java.util.Map 
> org.apache.hadoop.yarn.api.records.timeline.TimelineEvent.getEventInfo()
>   at org.apache.hadoop.yarn.api.records.timeline.TimelineEvent
>   at public java.util.List 
> org.apache.hadoop.yarn.api.records.timeline.TimelineEntity.getEvents()
>   at org.apache.hadoop.yarn.api.records.timeline.TimelineEntity
>   at public java.util.List 
> org.apache.hadoop.yarn.api.records.timeline.TimelineEntities.getEntities()
>   at org.apache.hadoop.yarn.api.records.timeline.TimelineEntities
>   at 
> com.sun.xml.bind.v2.runtime.IllegalAnnotationsException$Builder.check(IllegalAnnotationsException.java:106)
>   at 
> com.sun.xml.bind.v2.runtime.JAXBContextImpl.getTypeInfoSet(JAXBContextImpl.java:489)
>   at 
> com.sun.xml.bind.v2.runtime.JAXBContextImpl.(JAXBContextImpl.java:319)
>   at 
> com.sun.xml.bind.v2.runtime.JAXBContextImpl$JAXBContextBuilder.build(JAXBContextImpl.java:1170)
>   at 
> com.sun.xml.bind.v2.ContextFactory.createContext(ContextFactory.java:145)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at javax.xml.bind.ContextFinder.newInstance(ContextFinder.java:248)
>   at javax.xml.bind.ContextFinder.newInstance(ContextFinder.java:235)
>   at javax.xml.bind.ContextFinder.find(ContextFinder.jav

[jira] [Created] (YARN-2804) Timeline server .out log have JAXB binding exceptions and warnings.

2014-11-03 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-2804:
-

 Summary: Timeline server .out log have JAXB binding exceptions and 
warnings.
 Key: YARN-2804
 URL: https://issues.apache.org/jira/browse/YARN-2804
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Zhijie Shen
Priority: Critical


Unlike other daemons, the timeline server binds JacksonJaxbJsonProvider to resolve 
the resources. However, there is noise in the .out log:

{code}
SEVERE: Failed to generate the schema for the JAX-B elements
com.sun.xml.bind.v2.runtime.IllegalAnnotationsException: 2 counts of 
IllegalAnnotationExceptions
java.util.Map is an interface, and JAXB can't handle interfaces.
this problem is related to the following location:
at java.util.Map
at public java.util.Map 
org.apache.hadoop.yarn.api.records.timeline.TimelineEvent.getEventInfo()
at org.apache.hadoop.yarn.api.records.timeline.TimelineEvent
at public java.util.List 
org.apache.hadoop.yarn.api.records.timeline.TimelineEntity.getEvents()
at org.apache.hadoop.yarn.api.records.timeline.TimelineEntity
at public java.util.List 
org.apache.hadoop.yarn.api.records.timeline.TimelineEntities.getEntities()
at org.apache.hadoop.yarn.api.records.timeline.TimelineEntities
java.util.Map does not have a no-arg default constructor.
this problem is related to the following location:
at java.util.Map
at public java.util.Map 
org.apache.hadoop.yarn.api.records.timeline.TimelineEvent.getEventInfo()
at org.apache.hadoop.yarn.api.records.timeline.TimelineEvent
at public java.util.List 
org.apache.hadoop.yarn.api.records.timeline.TimelineEntity.getEvents()
at org.apache.hadoop.yarn.api.records.timeline.TimelineEntity
at public java.util.List 
org.apache.hadoop.yarn.api.records.timeline.TimelineEntities.getEntities()
at org.apache.hadoop.yarn.api.records.timeline.TimelineEntities

at 
com.sun.xml.bind.v2.runtime.IllegalAnnotationsException$Builder.check(IllegalAnnotationsException.java:106)
at 
com.sun.xml.bind.v2.runtime.JAXBContextImpl.getTypeInfoSet(JAXBContextImpl.java:489)
at 
com.sun.xml.bind.v2.runtime.JAXBContextImpl.<init>(JAXBContextImpl.java:319)
at 
com.sun.xml.bind.v2.runtime.JAXBContextImpl$JAXBContextBuilder.build(JAXBContextImpl.java:1170)
at 
com.sun.xml.bind.v2.ContextFactory.createContext(ContextFactory.java:145)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at javax.xml.bind.ContextFinder.newInstance(ContextFinder.java:248)
at javax.xml.bind.ContextFinder.newInstance(ContextFinder.java:235)
at javax.xml.bind.ContextFinder.find(ContextFinder.java:432)
at javax.xml.bind.JAXBContext.newInstance(JAXBContext.java:637)
at javax.xml.bind.JAXBContext.newInstance(JAXBContext.java:584)
at 
com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator.buildModelAndSchemas(WadlGeneratorJAXBGrammarGenerator.java:412)
at 
com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator.createExternalGrammar(WadlGeneratorJAXBGrammarGenerator.java:352)
at com.sun.jersey.server.wadl.WadlBuilder.generate(WadlBuilder.java:115)
at 
com.sun.jersey.server.impl.wadl.WadlApplicationContextImpl.getApplication(WadlApplicationContextImpl.java:104)
at 
com.sun.jersey.server.impl.wadl.WadlApplicationContextImpl.getApplication(WadlApplicationContextImpl.java:120)
at 
com.sun.jersey.server.impl.wadl.WadlMethodFactory$WadlOptionsMethodDispatcher.dispatch(WadlMethodFactory.java:98)
at 
com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288)
at 
com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
at 
com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at 
com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
at 
com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1469)
at 
com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1400)
at 
com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349)
at 
com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339)
at 

[jira] [Commented] (YARN-1922) Process group remains alive after container process is killed externally

2014-11-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195468#comment-14195468
 ] 

Hudson commented on YARN-1922:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6432 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6432/])
YARN-1922. Fixed NodeManager to kill process-trees correctly in the presence of 
races between the launch and the stop-container call and when root processes 
crash. Contributed by Billie Rinaldi. (vinodkv: rev 
c5a46d4c8ca236ff641a309f256bbbdf4dd56db5)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/TestContainerLaunch.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java


> Process group remains alive after container process is killed externally
> 
>
> Key: YARN-1922
> URL: https://issues.apache.org/jira/browse/YARN-1922
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.4.0
> Environment: CentOS 6.4
>Reporter: Billie Rinaldi
>Assignee: Billie Rinaldi
> Attachments: YARN-1922.1.patch, YARN-1922.2.patch, YARN-1922.3.patch, 
> YARN-1922.4.patch, YARN-1922.5.patch, YARN-1922.6.patch
>
>
> If the main container process is killed externally, ContainerLaunch does not 
> kill the rest of the process group.  Before sending the event that results in 
> the ContainerLaunch.containerCleanup method being called, ContainerLaunch 
> sets the "completed" flag to true.  Then when cleaning up, it doesn't try to 
> read the pid file if the completed flag is true.  If it read the pid file, it 
> would proceed to send the container a kill signal.  In the case of the 
> DefaultContainerExecutor, this would kill the process group.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2505) Support get/add/remove/change labels in RM REST API

2014-11-03 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195460#comment-14195460
 ] 

Wangda Tan commented on YARN-2505:
--

bq. -re 1 and 2, there are two kinds of consistency in play here - with the 
other node label apis and also with current apis in the web service. There are 
quite a few artifacts in the dao which are working with ids, including node, 
and they don't use "id" to specify it - I think it's assumed as there's no 
other way to refer to them in a web service context except via id. So, to stay 
consistent with the other web service apis, I don't think we should add "id" to 
the dao names.
I think I was wrong above: what you have treats the passed-in string as a 
nodeId and tries to create a NodeId from it, which is what I expected. And yes, 
it is consistent with the other methods of the web service.

Another thing I can see is that you used ConverterUtils.toNodeId(..), which can 
only accept patterns like "host:port". I think we should also support 
specifying the host only, without a port. Even though we assume port=0 means 
the whole host, if the user doesn't specify the port we should treat it as 
port=0 instead of failing the request.

Also, I would prefer to return only the host when we return a NodeId to the 
user and the port is 0. The port=0 magic is more of an implementation detail, 
and we should avoid exposing it to the user where we can.
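To make that concrete, a small sketch of the parsing/formatting rule described 
above (a hypothetical helper, not code from the attached patch):

{code}
import org.apache.hadoop.yarn.api.records.NodeId;

public final class NodeIdParsing {
  // Accept "host" as well as "host:port"; a missing port is treated as port=0.
  public static NodeId parse(String str) {
    int idx = str.indexOf(':');
    if (idx < 0) {
      return NodeId.newInstance(str, 0);
    }
    return NodeId.newInstance(str.substring(0, idx),
        Integer.parseInt(str.substring(idx + 1)));
  }

  // Hide the port=0 implementation detail when showing a NodeId to the user.
  public static String format(NodeId nodeId) {
    return nodeId.getPort() == 0 ? nodeId.getHost() : nodeId.toString();
  }
}
{code}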

Does this make sense to you?

Thanks,
Wangda

> Support get/add/remove/change labels in RM REST API
> ---
>
> Key: YARN-2505
> URL: https://issues.apache.org/jira/browse/YARN-2505
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Craig Welch
> Attachments: YARN-2505.1.patch, YARN-2505.11.patch, 
> YARN-2505.12.patch, YARN-2505.13.patch, YARN-2505.14.patch, 
> YARN-2505.3.patch, YARN-2505.4.patch, YARN-2505.5.patch, YARN-2505.6.patch, 
> YARN-2505.7.patch, YARN-2505.8.patch, YARN-2505.9.patch, YARN-2505.9.patch, 
> YARN-2505.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1922) Process group remains alive after container process is killed externally

2014-11-03 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195452#comment-14195452
 ] 

Vinod Kumar Vavilapalli commented on YARN-1922:
---

Sorry, didn't look at your previous comment given the progress on other patches.

So, I think overall we need to do the following:
{code}
while (pidFile is not Present && the process has not crashed) {
  // loop
}
{code}
This is the same as your do {} while {} loop.
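In Java terms, a minimal sketch of that loop, with hypothetical helpers standing 
in for the real pid-file and liveness checks (not the actual ContainerLaunch code):

{code}
// isPidFilePresent() and processHasCrashed() are hypothetical placeholders for
// the real checks; this only illustrates the shape of the wait loop.
private void waitForPidFile(String pidFilePath) throws InterruptedException {
  while (!isPidFilePresent(pidFilePath) && !processHasCrashed()) {
    Thread.sleep(100); // poll interval, illustrative only
  }
}
{code}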

+1 for your YARN-1922.5.patch. Checking this in.

> Process group remains alive after container process is killed externally
> 
>
> Key: YARN-1922
> URL: https://issues.apache.org/jira/browse/YARN-1922
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.4.0
> Environment: CentOS 6.4
>Reporter: Billie Rinaldi
>Assignee: Billie Rinaldi
> Attachments: YARN-1922.1.patch, YARN-1922.2.patch, YARN-1922.3.patch, 
> YARN-1922.4.patch, YARN-1922.5.patch, YARN-1922.6.patch
>
>
> If the main container process is killed externally, ContainerLaunch does not 
> kill the rest of the process group.  Before sending the event that results in 
> the ContainerLaunch.containerCleanup method being called, ContainerLaunch 
> sets the "completed" flag to true.  Then when cleaning up, it doesn't try to 
> read the pid file if the completed flag is true.  If it read the pid file, it 
> would proceed to send the container a kill signal.  In the case of the 
> DefaultContainerExecutor, this would kill the process group.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2198) Remove the need to run NodeManager as privileged account for Windows Secure Container Executor

2014-11-03 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195453#comment-14195453
 ] 

Chris Nauroth commented on YARN-2198:
-

It appears that this patch has broken some MR distributed cache functionality 
on Windows, or at least caused a failure in 
{{TestMRJobs#testDistributedCache}}.  Please see YARN-2803 for more details.

> Remove the need to run NodeManager as privileged account for Windows Secure 
> Container Executor
> --
>
> Key: YARN-2198
> URL: https://issues.apache.org/jira/browse/YARN-2198
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
>  Labels: security, windows
> Fix For: 2.6.0
>
> Attachments: .YARN-2198.delta.10.patch, YARN-2198.1.patch, 
> YARN-2198.11.patch, YARN-2198.12.patch, YARN-2198.13.patch, 
> YARN-2198.14.patch, YARN-2198.15.patch, YARN-2198.16.patch, 
> YARN-2198.2.patch, YARN-2198.3.patch, YARN-2198.delta.4.patch, 
> YARN-2198.delta.5.patch, YARN-2198.delta.6.patch, YARN-2198.delta.7.patch, 
> YARN-2198.separation.patch, YARN-2198.trunk.10.patch, 
> YARN-2198.trunk.4.patch, YARN-2198.trunk.5.patch, YARN-2198.trunk.6.patch, 
> YARN-2198.trunk.8.patch, YARN-2198.trunk.9.patch
>
>
> YARN-1972 introduces a Secure Windows Container Executor. However this 
> executor requires the process launching the container to be LocalSystem or a 
> member of the a local Administrators group. Since the process in question is 
> the NodeManager, the requirement translates to the entire NM to run as a 
> privileged account, a very large surface area to review and protect.
> This proposal is to move the privileged operations into a dedicated NT 
> service. The NM can run as a low privilege account and communicate with the 
> privileged NT service when it needs to launch a container. This would reduce 
> the surface exposed to the high privileges. 
> There has to exist a secure, authenticated and authorized channel of 
> communication between the NM and the privileged NT service. Possible 
> alternatives are a new TCP endpoint, Java RPC etc. My proposal though would 
> be to use Windows LPC (Local Procedure Calls), which is a Windows platform 
> specific inter-process communication channel that satisfies all requirements 
> and is easy to deploy. The privileged NT service would register and listen on 
> an LPC port (NtCreatePort, NtListenPort). The NM would use JNI to interop 
> with libwinutils which would host the LPC client code. The client would 
> connect to the LPC port (NtConnectPort) and send a message requesting a 
> container launch (NtRequestWaitReplyPort). LPC provides authentication and 
> the privileged NT service can use authorization API (AuthZ) to validate the 
> caller.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2803) MR distributed cache not working correctly on Windows after NodeManager privileged account changes.

2014-11-03 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195451#comment-14195451
 ] 

Chris Nauroth commented on YARN-2803:
-

Here is the stack trace from a failure.

{code}
testDistributedCache(org.apache.hadoop.mapreduce.v2.TestMRJobs)  Time elapsed: 16.844 sec  <<< FAILURE!
java.lang.AssertionError: null
at org.junit.Assert.fail(Assert.java:86)
at org.junit.Assert.assertTrue(Assert.java:41)
at org.junit.Assert.assertTrue(Assert.java:52)
at 
org.apache.hadoop.mapreduce.v2.TestMRJobs._testDistributedCache(TestMRJobs.java:881)
at 
org.apache.hadoop.mapreduce.v2.TestMRJobs.testDistributedCache(TestMRJobs.java:891)
{code}

The task log shows the assertion failing when it tries to find 
job.jar/lib/lib2.jar.

{code}
2014-11-03 15:36:33,652 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error 
running child : java.lang.AssertionError
at org.junit.Assert.fail(Assert.java:86)
at org.junit.Assert.assertTrue(Assert.java:41)
at org.junit.Assert.assertNotNull(Assert.java:621)
at org.junit.Assert.assertNotNull(Assert.java:631)
at 
org.apache.hadoop.mapreduce.v2.TestMRJobs$DistributedCacheChecker.setup(TestMRJobs.java:764)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:169)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1640)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:164)
{code}


> MR distributed cache not working correctly on Windows after NodeManager 
> privileged account changes.
> ---
>
> Key: YARN-2803
> URL: https://issues.apache.org/jira/browse/YARN-2803
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Chris Nauroth
>Priority: Critical
>
> This problem is visible by running {{TestMRJobs#testDistributedCache}} or 
> {{TestUberAM#testDistributedCache}} on Windows.  Both tests fail.  Running 
> git bisect, I traced it to the YARN-2198 patch to remove the need to run 
> NodeManager as a privileged account.  The tests started failing when that 
> patch was committed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2803) MR distributed cache not working correctly on Windows after NodeManager privileged account changes.

2014-11-03 Thread Chris Nauroth (JIRA)
Chris Nauroth created YARN-2803:
---

 Summary: MR distributed cache not working correctly on Windows 
after NodeManager privileged account changes.
 Key: YARN-2803
 URL: https://issues.apache.org/jira/browse/YARN-2803
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Chris Nauroth
Priority: Critical


This problem is visible by running {{TestMRJobs#testDistributedCache}} or 
{{TestUberAM#testDistributedCache}} on Windows.  Both tests fail.  Running git 
bisect, I traced it to the YARN-2198 patch to remove the need to run 
NodeManager as a privileged account.  The tests started failing when that patch 
was committed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2079) Recover NonAggregatingLogHandler state upon nodemanager restart

2014-11-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195442#comment-14195442
 ] 

Hadoop QA commented on YARN-2079:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12679073/YARN-2079.patch
  against trunk revision 35d353e.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5711//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5711//console

This message is automatically generated.

> Recover NonAggregatingLogHandler state upon nodemanager restart
> ---
>
> Key: YARN-2079
> URL: https://issues.apache.org/jira/browse/YARN-2079
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.4.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
> Attachments: YARN-2079.patch
>
>
> The state of NonAggregatingLogHandler needs to be persisted so logs are 
> properly deleted across a nodemanager restart.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2604) Scheduler should consider max-allocation-* in conjunction with the largest node

2014-11-03 Thread Robert Kanter (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kanter updated YARN-2604:

Attachment: YARN-2604.patch

The new patch fixes the test failures:
- TestContainerAllocation: Minor adjustment to memory allocation amount
- TestFairScheduler: This failing test becomes obsolete with the patch, so I 
removed it
- TestCapacityScheduler: I had to use more fine-grained locking on 
{{maximumAllocation}} to fix this, so I gave it its own 
{{ReentrantReadWriteLock}} instead of just using {{synchronized}} (see the 
sketch below)
- (TestAMRestart was unrelated)
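A rough sketch of that locking pattern (names simplified, not the actual 
scheduler code):

{code}
import java.util.concurrent.locks.ReentrantReadWriteLock;
import org.apache.hadoop.yarn.api.records.Resource;

public class MaxAllocationHolder {
  private final ReentrantReadWriteLock maxAllocLock = new ReentrantReadWriteLock();
  private Resource maximumAllocation;

  // Readers (e.g. request normalization) only take the cheap read lock.
  public Resource getMaximumAllocation() {
    maxAllocLock.readLock().lock();
    try {
      return maximumAllocation;
    } finally {
      maxAllocLock.readLock().unlock();
    }
  }

  // Updates (e.g. when the largest node changes) take the write lock, without
  // holding the scheduler-wide synchronized lock.
  public void setMaximumAllocation(Resource newMax) {
    maxAllocLock.writeLock().lock();
    try {
      maximumAllocation = newMax;
    } finally {
      maxAllocLock.writeLock().unlock();
    }
  }
}
{code}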

> Scheduler should consider max-allocation-* in conjunction with the largest 
> node
> ---
>
> Key: YARN-2604
> URL: https://issues.apache.org/jira/browse/YARN-2604
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Affects Versions: 2.5.1
>Reporter: Karthik Kambatla
>Assignee: Robert Kanter
> Attachments: YARN-2604.patch, YARN-2604.patch
>
>
> If the scheduler max-allocation-* values are larger than the resources 
> available on the largest node in the cluster, an application requesting 
> resources between the two values will be accepted by the scheduler but the 
> requests will never be satisfied. The app essentially hangs forever. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2505) Support get/add/remove/change labels in RM REST API

2014-11-03 Thread Craig Welch (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195431#comment-14195431
 ] 

Craig Welch commented on YARN-2505:
---

[~leftnoteasy]

-re 1 and 2, there are two kinds of consistency in play here - with the other 
node label apis and also with current apis in the web service.  There are quite 
a few artifacts in the dao which are working with ids, including node, and they 
don't use "id" to specify it - I think it's assumed as there's no other way to 
refer to them in a web service context except via id.  So, to stay consistent 
with the other web service apis, I don't think we should add "id" to the dao 
names.

As far as the duplication of the put and delete operations on the cluster node 
labels I tend to agree, it seemed like there were too many ways to do that once 
the new api's were added, so I'll remove those.  I do think that the 
/nodes/nodeid/labels apis should stay (I believe you are saying the same thing 
there...) as those are useful for more easily/conveniently working with 
individual nodes.

Will post the updated patch in a few.

> Support get/add/remove/change labels in RM REST API
> ---
>
> Key: YARN-2505
> URL: https://issues.apache.org/jira/browse/YARN-2505
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Craig Welch
> Attachments: YARN-2505.1.patch, YARN-2505.11.patch, 
> YARN-2505.12.patch, YARN-2505.13.patch, YARN-2505.14.patch, 
> YARN-2505.3.patch, YARN-2505.4.patch, YARN-2505.5.patch, YARN-2505.6.patch, 
> YARN-2505.7.patch, YARN-2505.8.patch, YARN-2505.9.patch, YARN-2505.9.patch, 
> YARN-2505.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2690) Make ReservationSystem and its dependent classes independent of Scheduler type

2014-11-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195430#comment-14195430
 ] 

Hadoop QA commented on YARN-2690:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12679061/YARN-2690.004.patch
  against trunk revision 734eeb4.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5708//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5708//console

This message is automatically generated.

> Make ReservationSystem and its dependent classes independent of Scheduler 
> type  
> 
>
> Key: YARN-2690
> URL: https://issues.apache.org/jira/browse/YARN-2690
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-2690.001.patch, YARN-2690.002.patch, 
> YARN-2690.002.patch, YARN-2690.003.patch, YARN-2690.004.patch, 
> YARN-2690.004.patch
>
>
> A lot of common reservation classes depend on CapacityScheduler and 
> specifically its configuration. This jira is to make them ready for other 
> Schedulers by abstracting out the configuration.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2800) Should print WARN log in both RM/RMAdminCLI side when MemoryRMNodeLabelsManager is enabled

2014-11-03 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195424#comment-14195424
 ] 

Wangda Tan commented on YARN-2800:
--

[~ozawa],
Storing to LevelDB or RocksDB may be an option; I think it's worth investigating 
the pros/cons if we move the node labels store to it.
But I would be against putting a NodeLabelsServer in an independent process. 
There's one major difference between the TimelineServer and a NodeLabelsServer:

The TimelineServer is a store of historical data, mainly for retrieval. But the 
NodeLabelsManager is a central piece of scheduling: the RM shouldn't schedule if 
the "NodeLabelsServer" is gone, since the scheduled resources would not be what 
is expected.

In the near future, the scale of the NodeLabelsManager will not be large enough 
to be worth running it in an independent process; lots of synchronization between 
processes would need to be handled, and we should avoid such complexity until we 
can see the value of doing that :). 

Does this make sense to you?

Thanks,
Wangda

> Should print WARN log in both RM/RMAdminCLI side when 
> MemoryRMNodeLabelsManager is enabled
> --
>
> Key: YARN-2800
> URL: https://issues.apache.org/jira/browse/YARN-2800
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: client, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-2800-20141102-1.patch, YARN-2800-20141102-2.patch
>
>
> Even though we have documented this, but it will be better to explicitly 
> print a message in both RM/RMAdminCLI side to explicitly say that the node 
> label being added will be lost across RM restart.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2802) add AM container launch and register delay metrics in QueueMetrics to help diagnose performance issue.

2014-11-03 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195420#comment-14195420
 ] 

zhihai xu commented on YARN-2802:
-

TestRMProxyUsersConf passes in my local build:
---
 T E S T S
---
Running org.apache.hadoop.yarn.server.resourcemanager.TestRMProxyUsersConf
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 31.919 sec - in 
org.apache.hadoop.yarn.server.resourcemanager.TestRMProxyUsersConf

Results :

Tests run: 3, Failures: 0, Errors: 0, Skipped: 0

Also, the Findbugs warning is not related to my changes: I didn't touch 
RMAppImpl.java in my patch.
Bug type REC_CATCH_EXCEPTION (click for details) 
In class 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition
In method 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl,
 RMAppEvent)
At RMAppImpl.java:[line 842]

Restart the Hadoop QA test.

> add AM container launch and register delay metrics in QueueMetrics to help 
> diagnose performance issue.
> --
>
> Key: YARN-2802
> URL: https://issues.apache.org/jira/browse/YARN-2802
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.5.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-2802.000.patch
>
>
> add AM container launch and register delay metrics in QueueMetrics to help 
> diagnose performance issue.
> Added two metrics in QueueMetrics:
> aMLaunchDelay: the time spent from sending event AMLauncherEventType.LAUNCH 
> to receiving event RMAppAttemptEventType.LAUNCHED in RMAppAttemptImpl.
> aMRegisterDelay: the time waiting from receiving event 
> RMAppAttemptEventType.LAUNCHED to receiving event 
> RMAppAttemptEventType.REGISTERED(ApplicationMasterService#registerApplicationMaster)
>  in RMAppAttemptImpl.
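For illustration only, the general metrics2 pattern for two such delay metrics 
might look like the sketch below (this is not the attached patch; metric 
registration and the actual call sites are omitted):

{code}
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.MutableRate;

@Metrics(context = "yarn")
public class AmDelayMetricsSketch {
  @Metric("Time from AM launch request to AM launched")
  MutableRate aMLaunchDelay;
  @Metric("Time from AM launched to AM registered")
  MutableRate aMRegisterDelay;

  // Would be called when RMAppAttemptEventType.LAUNCHED is received.
  public void addAMLaunchDelay(long delayMs) {
    aMLaunchDelay.add(delayMs);
  }

  // Would be called when the AM registers with the ApplicationMasterService.
  public void addAMRegisterDelay(long delayMs) {
    aMRegisterDelay.add(delayMs);
  }
}
{code}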



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2010) Handle app-recovery failures gracefully

2014-11-03 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195421#comment-14195421
 ] 

Karthik Kambatla commented on YARN-2010:


The tests pass locally; the findbugs warning has to do with catching Exception 
instead of just IOException and InterruptedException in RMAppRecoveredTransition.

> Handle app-recovery failures gracefully
> ---
>
> Key: YARN-2010
> URL: https://issues.apache.org/jira/browse/YARN-2010
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.3.0
>Reporter: bc Wong
>Assignee: Karthik Kambatla
>Priority: Blocker
> Attachments: YARN-2010.1.patch, YARN-2010.patch, 
> issue-stacktrace.rtf, yarn-2010-10.patch, yarn-2010-2.patch, 
> yarn-2010-3.patch, yarn-2010-3.patch, yarn-2010-4.patch, yarn-2010-5.patch, 
> yarn-2010-6.patch, yarn-2010-7.patch, yarn-2010-8.patch, yarn-2010-9.patch
>
>
> Sometimes, the RM fails to recover an application. It could be because of 
> turning security on, token expiry, or issues connecting to HDFS etc. The 
> causes could be classified into (1) transient, (2) specific to one 
> application, and (3) permanent and apply to multiple (all) applications. 
> Today, the RM fails to transition to Active and ends up in STOPPED state and 
> can never be transitioned to Active again.
> The initial stacktrace reported is at 
> https://issues.apache.org/jira/secure/attachment/12676476/issue-stacktrace.rtf



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2802) add AM container launch and register delay metrics in QueueMetrics to help diagnose performance issue.

2014-11-03 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-2802:

Attachment: YARN-2802.000.patch

> add AM container launch and register delay metrics in QueueMetrics to help 
> diagnose performance issue.
> --
>
> Key: YARN-2802
> URL: https://issues.apache.org/jira/browse/YARN-2802
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.5.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-2802.000.patch
>
>
> add AM container launch and register delay metrics in QueueMetrics to help 
> diagnose performance issue.
> Added two metrics in QueueMetrics:
> aMLaunchDelay: the time spent from sending event AMLauncherEventType.LAUNCH 
> to receiving event RMAppAttemptEventType.LAUNCHED in RMAppAttemptImpl.
> aMRegisterDelay: the time waiting from receiving event 
> RMAppAttemptEventType.LAUNCHED to receiving event 
> RMAppAttemptEventType.REGISTERED(ApplicationMasterService#registerApplicationMaster)
>  in RMAppAttemptImpl.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2802) add AM container launch and register delay metrics in QueueMetrics to help diagnose performance issue.

2014-11-03 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-2802:

Attachment: (was: YARN-2802.000.patch)

> add AM container launch and register delay metrics in QueueMetrics to help 
> diagnose performance issue.
> --
>
> Key: YARN-2802
> URL: https://issues.apache.org/jira/browse/YARN-2802
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.5.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-2802.000.patch
>
>
> add AM container launch and register delay metrics in QueueMetrics to help 
> diagnose performance issue.
> Added two metrics in QueueMetrics:
> aMLaunchDelay: the time spent from sending event AMLauncherEventType.LAUNCH 
> to receiving event RMAppAttemptEventType.LAUNCHED in RMAppAttemptImpl.
> aMRegisterDelay: the time waiting from receiving event 
> RMAppAttemptEventType.LAUNCHED to receiving event 
> RMAppAttemptEventType.REGISTERED(ApplicationMasterService#registerApplicationMaster)
>  in RMAppAttemptImpl.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2800) Should print WARN log in both RM/RMAdminCLI side when MemoryRMNodeLabelsManager is enabled

2014-11-03 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195407#comment-14195407
 ] 

Tsuyoshi OZAWA commented on YARN-2800:
--

[~leftnoteasy], if we treat the labels as configuration that can be updated 
frequently, ZK is not a good option, as you mentioned. In this case, I think the 
NodeLabelsManager, whose backend could be LevelDB or RocksDB, should be loosely 
coupled with the RM, like the TimelineServer, for the sake of RM stability. One 
option is making the NodeLabelsManager a NodeLabelsServer. That means the RM 
should work correctly even if the NodeLabelsManager is temporarily unavailable, 
and an update operation should only affect the NodeLabelsManager (it doesn't 
affect the RM). For example, the RM pulls the label information from the 
NodeLabelsServer periodically, treats it as a hint, and schedules based on it. 
Even without the information, the RM should still schedule apps. I think this 
weak-consistency approach is suitable for large-scale updates.

> Should print WARN log in both RM/RMAdminCLI side when 
> MemoryRMNodeLabelsManager is enabled
> --
>
> Key: YARN-2800
> URL: https://issues.apache.org/jira/browse/YARN-2800
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: client, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-2800-20141102-1.patch, YARN-2800-20141102-2.patch
>
>
> Even though we have documented this, but it will be better to explicitly 
> print a message in both RM/RMAdminCLI side to explicitly say that the node 
> label being added will be lost across RM restart.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2786) Create yarn cluster CLI to enable list node labels collection

2014-11-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195406#comment-14195406
 ] 

Hadoop QA commented on YARN-2786:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12679065/YARN-2786-20141103-1-without-yarn.cmd.patch
  against trunk revision 35d353e.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The following test timeouts occurred in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client:

org.apache.hadoop.yarn.client.TestResourceTrackerOnHA

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5710//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5710//console

This message is automatically generated.

> Create yarn cluster CLI to enable list node labels collection
> -
>
> Key: YARN-2786
> URL: https://issues.apache.org/jira/browse/YARN-2786
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-2786-20141031-1.patch, YARN-2786-20141031-2.patch, 
> YARN-2786-20141102-2.patch, YARN-2786-20141102-3.patch, 
> YARN-2786-20141103-1-full.patch, YARN-2786-20141103-1-without-yarn.cmd.patch
>
>
> With YARN-2778, we can list node labels on existing RM nodes. But it is not 
> enough, we should be able to: 
> 1) list node labels collection
> The command should start with "yarn cluster ...", in the future, we can add 
> more functionality to the "yarnClusterCLI"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2802) add AM container launch and register delay metrics in QueueMetrics to help diagnose performance issue.

2014-11-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195404#comment-14195404
 ] 

Hadoop QA commented on YARN-2802:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12679054/YARN-2802.000.patch
  against trunk revision 734eeb4.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The following test timeouts occurred in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

org.apache.hadoop.yarn.server.resourcemanager.TestRMProxyUsersConf

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5705//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5705//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5705//console

This message is automatically generated.

> add AM container launch and register delay metrics in QueueMetrics to help 
> diagnose performance issue.
> --
>
> Key: YARN-2802
> URL: https://issues.apache.org/jira/browse/YARN-2802
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.5.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-2802.000.patch
>
>
> add AM container launch and register delay metrics in QueueMetrics to help 
> diagnose performance issue.
> Added two metrics in QueueMetrics:
> aMLaunchDelay: the time spent from sending event AMLauncherEventType.LAUNCH 
> to receiving event RMAppAttemptEventType.LAUNCHED in RMAppAttemptImpl.
> aMRegisterDelay: the time waiting from receiving event 
> RMAppAttemptEventType.LAUNCHED to receiving event 
> RMAppAttemptEventType.REGISTERED(ApplicationMasterService#registerApplicationMaster)
>  in RMAppAttemptImpl.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2010) Handle app-recovery failures gracefully

2014-11-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195405#comment-14195405
 ] 

Hadoop QA commented on YARN-2010:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12679057/yarn-2010-10.patch
  against trunk revision 734eeb4.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService

  The following test timeouts occurred in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

org.apache.hadoop.yarn.server.resourcemanager.TestRMProxyUsersConf

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5707//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5707//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5707//console

This message is automatically generated.

> Handle app-recovery failures gracefully
> ---
>
> Key: YARN-2010
> URL: https://issues.apache.org/jira/browse/YARN-2010
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.3.0
>Reporter: bc Wong
>Assignee: Karthik Kambatla
>Priority: Blocker
> Attachments: YARN-2010.1.patch, YARN-2010.patch, 
> issue-stacktrace.rtf, yarn-2010-10.patch, yarn-2010-2.patch, 
> yarn-2010-3.patch, yarn-2010-3.patch, yarn-2010-4.patch, yarn-2010-5.patch, 
> yarn-2010-6.patch, yarn-2010-7.patch, yarn-2010-8.patch, yarn-2010-9.patch
>
>
> Sometimes, the RM fails to recover an application. It could be because of 
> turning security on, token expiry, or issues connecting to HDFS etc. The 
> causes could be classified into (1) transient, (2) specific to one 
> application, and (3) permanent and apply to multiple (all) applications. 
> Today, the RM fails to transition to Active and ends up in STOPPED state and 
> can never be transitioned to Active again.
> The initial stacktrace reported is at 
> https://issues.apache.org/jira/secure/attachment/12676476/issue-stacktrace.rtf



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2010) Handle app-recovery failures gracefully

2014-11-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195394#comment-14195394
 ] 

Hadoop QA commented on YARN-2010:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12679057/yarn-2010-10.patch
  against trunk revision 734eeb4.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates
  
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesFairScheduler
  
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.TestRMContainerImpl
  
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
  
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps
  
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices
  
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestNodesPage
  
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched
  
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokens
  
org.apache.hadoop.yarn.server.resourcemanager.TestResourceManager
  
org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart
  
org.apache.hadoop.yarn.server.resourcemanager.TestFifoScheduler
  
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodes
  
org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCResponseId
  
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebApp
  
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebappAuthentication
  
org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart
  
org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService
  
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokenAuthentication
  org.apache.hadoop.yarn.server.resourcemanager.TestAppManager

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5706//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5706//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5706//console

This message is automatically generated.

> Handle app-recovery failures gracefully
> ---
>
> Key: YARN-2010
> URL: https://issues.apache.org/jira/browse/YARN-2010
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.3.0
>Reporter: bc Wong
>Assignee: Karthik Kambatla
>Priority: Blocker
> Attachments: YARN-2010.1.patch, YARN-2010.patch, 
> issue-stacktrace.rtf, yarn-2010-10.patch, yarn-2010-2.patch, 
> yarn-2010-3.patch, yarn-2010-3.patch, yarn-2010-4.patch, yarn-2010-5.patch, 
> yarn-2010-6.patch, yarn-2010-7.patch, yarn-2010-8.patch, yarn-2010-9.patch
>
>
> Sometimes, the RM fails to recover an application. It could be because of 
> turning security on, token expiry, or issues connecting to HDFS etc. The 
> causes could be classified into (1) transient, (2) specific to one 
> application, and (3) permanent and apply to multiple (all) applications. 
> Today, the RM fails to transition to Active and ends up in STOPPED state and 
> can never be transitioned to Active again.
> The initial stacktrace reported is at 
> https://issues.apache.org/jira/secure/attachment/12676476/issue-stacktrace.rtf



--
Th

[jira] [Commented] (YARN-2505) Support get/add/remove/change labels in RM REST API

2014-11-03 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195371#comment-14195371
 ] 

Wangda Tan commented on YARN-2505:
--

1) NodeToLabelsInfo should be NodeIdToLabelsInfo, since we should be able to 
specify a nodeId in the REST API, to be consistent with the YarnClient APIs and 
the RM admin CLI.
2) Also, we need to change the name of NodeToLabelsInfo#getNodeToLabels to 
getNodeIdToLabels if you agree with #1.
3) I would prefer to drop the REST APIs that modify a single nodeId or 
nodeLabel, like 
{code}
+  @DELETE
+  @Path("/node-labels/{nodeLabel}")
{code}
Also like: addLabelsToNode/removeLabelsFromNode, etc.
Since we have 
{code}
+  @POST
+  @Path("/node-labels/remove")
{code}
Already.
The reason is: single and batch operations seem a little duplicated to me, and 
setting a map of nodeId -> labels is not a big burden on the end user, regarding 
both API complexity and performance.

However, we can keep the GET API for labels on a node:
{code}
+  @GET
+  @Path("/nodes/{nodeId}/labels")
{code}
Since it may not always be needed to return all node-to-labels mappings, for 
performance reasons. 
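To make the shape concrete, a hypothetical resource method for that GET endpoint 
(the DAO type, parsing helper, and labels lookup below are placeholders, not the 
actual patch):

{code}
@GET
@Path("/nodes/{nodeId}/labels")
@Produces(MediaType.APPLICATION_JSON)
public NodeLabelsInfo getLabelsOnNode(@PathParam("nodeId") String nodeIdStr) {
  // parseNodeId and lookupLabels are placeholders; parsing should accept
  // "host" as well as "host:port", as discussed above.
  NodeId nodeId = parseNodeId(nodeIdStr);
  return new NodeLabelsInfo(lookupLabels(nodeId));
}
{code}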

> Support get/add/remove/change labels in RM REST API
> ---
>
> Key: YARN-2505
> URL: https://issues.apache.org/jira/browse/YARN-2505
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Craig Welch
> Attachments: YARN-2505.1.patch, YARN-2505.11.patch, 
> YARN-2505.12.patch, YARN-2505.13.patch, YARN-2505.14.patch, 
> YARN-2505.3.patch, YARN-2505.4.patch, YARN-2505.5.patch, YARN-2505.6.patch, 
> YARN-2505.7.patch, YARN-2505.8.patch, YARN-2505.9.patch, YARN-2505.9.patch, 
> YARN-2505.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2079) Recover NonAggregatingLogHandler state upon nodemanager restart

2014-11-03 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-2079:
-
Attachment: YARN-2079.patch

Patch that saves the state of scheduled LogDeleterRunnable objects to the state 
store and reschedules them upon recovery.  Added unit tests for both the 
leveldb state store changes and log handler recovery.
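Roughly, the recover-and-reschedule step follows the pattern sketched below (the 
recovered-state type and the store/constructor signatures are placeholders, not 
the actual patch):

{code}
// On NM restart: reload each persisted log deleter and reschedule it with the
// remaining delay, so logs still get deleted at (roughly) the original time.
for (RecoveredLogDeleterState state : stateStore.loadLogDeleterState()) {
  long delayMs = Math.max(0, state.getDeletionTime() - System.currentTimeMillis());
  sched.schedule(new LogDeleterRunnable(state.getUser(), state.getApplicationId()),
      delayMs, TimeUnit.MILLISECONDS);
}
{code}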

> Recover NonAggregatingLogHandler state upon nodemanager restart
> ---
>
> Key: YARN-2079
> URL: https://issues.apache.org/jira/browse/YARN-2079
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.4.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
> Attachments: YARN-2079.patch
>
>
> The state of NonAggregatingLogHandler needs to be persisted so logs are 
> properly deleted across a nodemanager restart.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-2079) Recover NonAggregatingLogHandler state upon nodemanager restart

2014-11-03 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe reassigned YARN-2079:


Assignee: Jason Lowe

> Recover NonAggregatingLogHandler state upon nodemanager restart
> ---
>
> Key: YARN-2079
> URL: https://issues.apache.org/jira/browse/YARN-2079
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.4.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>
> The state of NonAggregatingLogHandler needs to be persisted so logs are 
> properly deleted across a nodemanager restart.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2786) Create yarn cluster CLI to enable list node labels collection

2014-11-03 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-2786:
-
Attachment: YARN-2786-20141103-1-without-yarn.cmd.patch

Uploaded a patch without yarn.cmd to kick Jenkins

> Create yarn cluster CLI to enable list node labels collection
> -
>
> Key: YARN-2786
> URL: https://issues.apache.org/jira/browse/YARN-2786
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-2786-20141031-1.patch, YARN-2786-20141031-2.patch, 
> YARN-2786-20141102-2.patch, YARN-2786-20141102-3.patch, 
> YARN-2786-20141103-1-full.patch, YARN-2786-20141103-1-without-yarn.cmd.patch
>
>
> With YARN-2778, we can list node labels on existing RM nodes. But it is not 
> enough, we should be able to: 
> 1) list node labels collection
> The command should start with "yarn cluster ...", in the future, we can add 
> more functionality to the "yarnClusterCLI"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2786) Create yarn cluster CLI to enable list node labels collection

2014-11-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195326#comment-14195326
 ] 

Hadoop QA commented on YARN-2786:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12679064/YARN-2786-20141103-1-full.patch
  against trunk revision 35d353e.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5709//console

This message is automatically generated.

> Create yarn cluster CLI to enable list node labels collection
> -
>
> Key: YARN-2786
> URL: https://issues.apache.org/jira/browse/YARN-2786
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-2786-20141031-1.patch, YARN-2786-20141031-2.patch, 
> YARN-2786-20141102-2.patch, YARN-2786-20141102-3.patch, 
> YARN-2786-20141103-1-full.patch
>
>
> With YARN-2778, we can list node labels on existing RM nodes. But it is not 
> enough, we should be able to: 
> 1) list node labels collection
> The command should start with "yarn cluster ...", in the future, we can add 
> more functionality to the "yarnClusterCLI"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2786) Create yarn cluster CLI to enable list node labels collection

2014-11-03 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-2786:
-
Attachment: YARN-2786-20141103-1-full.patch

> Create yarn cluster CLI to enable list node labels collection
> -
>
> Key: YARN-2786
> URL: https://issues.apache.org/jira/browse/YARN-2786
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-2786-20141031-1.patch, YARN-2786-20141031-2.patch, 
> YARN-2786-20141102-2.patch, YARN-2786-20141102-3.patch, 
> YARN-2786-20141103-1-full.patch
>
>
> With YARN-2778, we can list node labels on existing RM nodes. But it is not 
> enough, we should be able to: 
> 1) list node labels collection
> The command should start with "yarn cluster ...", in the future, we can add 
> more functionality to the "yarnClusterCLI"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2800) Should print WARN log in both RM/RMAdminCLI side when MemoryRMNodeLabelsManager is enabled

2014-11-03 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195317#comment-14195317
 ] 

Wangda Tan commented on YARN-2800:
--

[~ozawa],
Moving it to the RMStateStore is not a bad idea, since the node label store can 
itself be treated as part of the RM's state. However, the RMStateStore is 
hard-coded to use only one storage backend, which I'm a little concerned about. 
I don't quite agree that ZK can handle it well, since we shouldn't assume this 
feature won't be used in a large cluster or with high-frequency label updates.

Node label updates are different from RMStateStore updates: the client side can 
change the labels of all nodes (say 10k nodes) in one command, but there cannot 
be 10k applications completing in a short period (say a few seconds), at least 
for now. A WAL-based solution may outperform in such a scenario, and I think ZK 
is not a good backend for WAL storage.

> Should print WARN log in both RM/RMAdminCLI side when 
> MemoryRMNodeLabelsManager is enabled
> --
>
> Key: YARN-2800
> URL: https://issues.apache.org/jira/browse/YARN-2800
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: client, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-2800-20141102-1.patch, YARN-2800-20141102-2.patch
>
>
> Even though we have documented this, it would be better to explicitly 
> print a message on both the RM and RMAdminCLI side saying that node labels 
> being added will be lost across an RM restart.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.

2014-11-03 Thread Chen He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen He updated YARN-1680:
--
Assignee: Craig Welch  (was: Chen He)

> availableResources sent to applicationMaster in heartbeat should exclude 
> blacklistedNodes free memory.
> --
>
> Key: YARN-1680
> URL: https://issues.apache.org/jira/browse/YARN-1680
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.2.0, 2.3.0
> Environment: SuSE 11 SP2 + Hadoop-2.3 
>Reporter: Rohith
>Assignee: Craig Welch
> Attachments: YARN-1680-WIP.patch, YARN-1680-v2.patch, 
> YARN-1680-v2.patch, YARN-1680.patch
>
>
> There are 4 NodeManagers with 8GB each. Total cluster capacity is 32GB. 
> Cluster slow start is set to 1.
> A job is running; reducer tasks occupy 29GB of the cluster. One NodeManager 
> (NM-4) became unstable (3 map tasks got killed), so MRAppMaster blacklisted 
> the unstable NodeManager (NM-4). All reducer tasks are now running in the 
> cluster.
> MRAppMaster does not preempt the reducers because, for the reducer preemption 
> calculation, headRoom includes the blacklisted node's memory. This makes the 
> job hang forever (the ResourceManager does not assign any new containers on 
> blacklisted nodes but returns an availableResource that counts the cluster's 
> free memory).
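For illustration only (not code from any attached patch): with the numbers 
above, and assuming the remaining 3GB of free space happens to sit on the 
blacklisted NM-4, the headroom the AM should see is the cluster's free memory 
minus whatever is free on blacklisted nodes.

{code}
public class HeadroomSketch {
  // Headroom excluding free memory on blacklisted nodes (all values in GB).
  static long adjustedHeadroom(long clusterTotal, long clusterUsed,
      long blacklistedFree) {
    return Math.max(0L, (clusterTotal - clusterUsed) - blacklistedFree);
  }

  public static void main(String[] args) {
    // Scenario from this report: 4 NMs x 8GB = 32GB, 29GB used,
    // NM-4 blacklisted and (assumed) holding the remaining 3GB free.
    System.out.println(adjustedHeadroom(32, 29, 3) + " GB"); // prints "0 GB"
  }
}
{code}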



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.

2014-11-03 Thread Chen He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195313#comment-14195313
 ] 

Chen He commented on YARN-1680:
---

Hi [~cwelch], I just assigned it to you. I am busy dealing with a move and may 
not have time to work on this for a while.

> availableResources sent to applicationMaster in heartbeat should exclude 
> blacklistedNodes free memory.
> --
>
> Key: YARN-1680
> URL: https://issues.apache.org/jira/browse/YARN-1680
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.2.0, 2.3.0
> Environment: SuSE 11 SP2 + Hadoop-2.3 
>Reporter: Rohith
>Assignee: Craig Welch
> Attachments: YARN-1680-WIP.patch, YARN-1680-v2.patch, 
> YARN-1680-v2.patch, YARN-1680.patch
>
>
> There are 4 NodeManagers with 8GB each. Total cluster capacity is 32GB. 
> Cluster slow start is set to 1.
> A job is running; reducer tasks occupy 29GB of the cluster. One NodeManager 
> (NM-4) became unstable (3 map tasks got killed), so MRAppMaster blacklisted 
> the unstable NodeManager (NM-4). All reducer tasks are now running in the 
> cluster.
> MRAppMaster does not preempt the reducers because, for the reducer preemption 
> calculation, headRoom includes the blacklisted node's memory. This makes the 
> job hang forever (the ResourceManager does not assign any new containers on 
> blacklisted nodes but returns an availableResource that counts the cluster's 
> free memory).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2690) Make ReservationSystem and its dependent classes independent of Scheduler type

2014-11-03 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-2690:

Attachment: YARN-2690.004.patch

Fixed javac warning. That was some preexisting code unchanged by the patch. 

> Make ReservationSystem and its dependent classes independent of Scheduler 
> type  
> 
>
> Key: YARN-2690
> URL: https://issues.apache.org/jira/browse/YARN-2690
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-2690.001.patch, YARN-2690.002.patch, 
> YARN-2690.002.patch, YARN-2690.003.patch, YARN-2690.004.patch, 
> YARN-2690.004.patch
>
>
> A lot of common reservation classes depend on CapacityScheduler and 
> specifically its configuration. This jira is to make them ready for other 
> Schedulers by abstracting out the configuration.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.

2014-11-03 Thread Craig Welch (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195288#comment-14195288
 ] 

Craig Welch commented on YARN-1680:
---

Hey [~airbots] any luck on this?  If you're too busy to get to it, mind if I 
take it on?

> availableResources sent to applicationMaster in heartbeat should exclude 
> blacklistedNodes free memory.
> --
>
> Key: YARN-1680
> URL: https://issues.apache.org/jira/browse/YARN-1680
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.2.0, 2.3.0
> Environment: SuSE 11 SP2 + Hadoop-2.3 
>Reporter: Rohith
>Assignee: Chen He
> Attachments: YARN-1680-WIP.patch, YARN-1680-v2.patch, 
> YARN-1680-v2.patch, YARN-1680.patch
>
>
> There are 4 NodeManagers with 8GB each. Total cluster capacity is 32GB. 
> Cluster slow start is set to 1.
> A job is running; reducer tasks occupy 29GB of the cluster. One NodeManager 
> (NM-4) became unstable (3 map tasks got killed), so MRAppMaster blacklisted 
> the unstable NodeManager (NM-4). All reducer tasks are now running in the 
> cluster.
> MRAppMaster does not preempt the reducers because, for the reducer preemption 
> calculation, headRoom includes the blacklisted node's memory. This makes the 
> job hang forever (the ResourceManager does not assign any new containers on 
> blacklisted nodes but returns an availableResource that counts the cluster's 
> free memory).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2505) Support get/add/remove/change labels in RM REST API

2014-11-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195291#comment-14195291
 ] 

Hadoop QA commented on YARN-2505:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12679036/YARN-2505.14.patch
  against trunk revision 237890f.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5704//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5704//console

This message is automatically generated.

> Support get/add/remove/change labels in RM REST API
> ---
>
> Key: YARN-2505
> URL: https://issues.apache.org/jira/browse/YARN-2505
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Craig Welch
> Attachments: YARN-2505.1.patch, YARN-2505.11.patch, 
> YARN-2505.12.patch, YARN-2505.13.patch, YARN-2505.14.patch, 
> YARN-2505.3.patch, YARN-2505.4.patch, YARN-2505.5.patch, YARN-2505.6.patch, 
> YARN-2505.7.patch, YARN-2505.8.patch, YARN-2505.9.patch, YARN-2505.9.patch, 
> YARN-2505.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2794) Fix log msgs about distributing system-credentials

2014-11-03 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195277#comment-14195277
 ] 

Tsuyoshi OZAWA commented on YARN-2794:
--

Or, just put a new HashMap via setSystemCrendentialsForApps.

> Fix log msgs about distributing system-credentials 
> ---
>
> Key: YARN-2794
> URL: https://issues.apache.org/jira/browse/YARN-2794
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-2794.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2794) Fix log msgs about distributing system-credentials

2014-11-03 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195273#comment-14195273
 ] 

Tsuyoshi OZAWA commented on YARN-2794:
--

[~jianhe], oops, I got it. The update to systemCredentials is done only 
via setSystemCredentials, so your solution is enough. 
One minor nit: 
TestLogAggregationService#testAddNewTokenSentFromRMForLogAggregation calls   
{{this.context.getSystemCredentialsForApps().put(application1, credentials);}}. 
We should use a ConcurrentHashMap for the test case.

> Fix log msgs about distributing system-credentials 
> ---
>
> Key: YARN-2794
> URL: https://issues.apache.org/jira/browse/YARN-2794
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-2794.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2010) Handle app-recovery failures gracefully

2014-11-03 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2010:
---
Attachment: yarn-2010-10.patch

Updated patch to address review comments. 

> Handle app-recovery failures gracefully
> ---
>
> Key: YARN-2010
> URL: https://issues.apache.org/jira/browse/YARN-2010
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.3.0
>Reporter: bc Wong
>Assignee: Karthik Kambatla
>Priority: Blocker
> Attachments: YARN-2010.1.patch, YARN-2010.patch, 
> issue-stacktrace.rtf, yarn-2010-10.patch, yarn-2010-2.patch, 
> yarn-2010-3.patch, yarn-2010-3.patch, yarn-2010-4.patch, yarn-2010-5.patch, 
> yarn-2010-6.patch, yarn-2010-7.patch, yarn-2010-8.patch, yarn-2010-9.patch
>
>
> Sometimes, the RM fails to recover an application. It could be because of 
> turning security on, token expiry, or issues connecting to HDFS etc. The 
> causes could be classified into (1) transient, (2) specific to one 
> application, and (3) permanent and apply to multiple (all) applications. 
> Today, the RM fails to transition to Active and ends up in STOPPED state and 
> can never be transitioned to Active again.
> The initial stacktrace reported is at 
> https://issues.apache.org/jira/secure/attachment/12676476/issue-stacktrace.rtf



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2794) Fix log msgs about distributing system-credentials

2014-11-03 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195258#comment-14195258
 ] 

Tsuyoshi OZAWA commented on YARN-2794:
--

[~jianhe], thanks for taking this JIRA. Shouldn't we use a ConcurrentHashMap? 
IIUC, making the variable volatile is not enough to synchronize access in this 
case. Please correct me if I'm wrong.
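For illustration only (this is not code from the patch; class and field names 
below are made up): a volatile field makes the latest map reference visible to 
readers, but it does not make in-place mutation of that map thread-safe. Either 
mutate a ConcurrentHashMap in place, or keep the field volatile and only ever 
swap in a freshly built map.

{code}
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class CredentialsHolderSketch {
  // Option 1: thread-safe map; safe to mutate in place from multiple threads.
  private final Map<String, String> safeCredentials = new ConcurrentHashMap<>();

  // Option 2: volatile reference; never mutate in place, only replace wholesale.
  private volatile Map<String, String> snapshotCredentials = new HashMap<>();

  public void putSafe(String appId, String token) {
    safeCredentials.put(appId, token);
  }

  public void replaceSnapshot(Map<String, String> latest) {
    snapshotCredentials = new HashMap<>(latest); // atomic reference swap
  }
}
{code}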

> Fix log msgs about distributing system-credentials 
> ---
>
> Key: YARN-2794
> URL: https://issues.apache.org/jira/browse/YARN-2794
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-2794.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2802) add AM container launch and register delay metrics in QueueMetrics to help diagnose performance issue.

2014-11-03 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-2802:

Attachment: YARN-2802.000.patch

> add AM container launch and register delay metrics in QueueMetrics to help 
> diagnose performance issue.
> --
>
> Key: YARN-2802
> URL: https://issues.apache.org/jira/browse/YARN-2802
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.5.0
>Reporter: zhihai xu
>Assignee: zhihai xu
> Attachments: YARN-2802.000.patch
>
>
> add AM container launch and register delay metrics in QueueMetrics to help 
> diagnose performance issue.
> Added two metrics in QueueMetrics:
> aMLaunchDelay: the time spent from sending event AMLauncherEventType.LAUNCH 
> to receiving event RMAppAttemptEventType.LAUNCHED in RMAppAttemptImpl.
> aMRegisterDelay: the time waiting from receiving event 
> RMAppAttemptEventType.LAUNCHED to receiving event 
> RMAppAttemptEventType.REGISTERED(ApplicationMasterService#registerApplicationMaster)
>  in RMAppAttemptImpl.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2802) add AM container launch and register delay metrics in QueueMetrics to help diagnose performance issue.

2014-11-03 Thread zhihai xu (JIRA)
zhihai xu created YARN-2802:
---

 Summary: add AM container launch and register delay metrics in 
QueueMetrics to help diagnose performance issue.
 Key: YARN-2802
 URL: https://issues.apache.org/jira/browse/YARN-2802
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.5.0
Reporter: zhihai xu
Assignee: zhihai xu


add AM container launch and register delay metrics in QueueMetrics to help 
diagnose performance issue.
Added two metrics in QueueMetrics:
aMLaunchDelay: the time spent from sending event AMLauncherEventType.LAUNCH to 
receiving event RMAppAttemptEventType.LAUNCHED in RMAppAttemptImpl.

aMRegisterDelay: the time waiting from receiving event 
RMAppAttemptEventType.LAUNCHED to receiving event 
RMAppAttemptEventType.REGISTERED(ApplicationMasterService#registerApplicationMaster)
 in RMAppAttemptImpl.
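For illustration only (not the attached patch; the class, field, and method 
names below are hypothetical): such delay metrics are typically backed by 
{{MutableRate}} from Hadoop's metrics2 library, with the elapsed time recorded 
at the corresponding RMAppAttemptImpl transitions.

{code}
import org.apache.hadoop.metrics2.lib.MetricsRegistry;
import org.apache.hadoop.metrics2.lib.MutableRate;

public class AmDelayMetricsSketch {
  private final MetricsRegistry registry =
      new MetricsRegistry("AmDelayMetricsSketch");

  // LAUNCH -> LAUNCHED delay, in milliseconds.
  private final MutableRate aMLaunchDelay =
      registry.newRate("aMLaunchDelay", "AM container launch delay (ms)");

  // LAUNCHED -> REGISTERED delay, in milliseconds.
  private final MutableRate aMRegisterDelay =
      registry.newRate("aMRegisterDelay", "AM register delay (ms)");

  // Called from the state transitions with the measured elapsed time.
  public void addAMLaunchDelay(long millis) {
    aMLaunchDelay.add(millis);
  }

  public void addAMRegisterDelay(long millis) {
    aMRegisterDelay.add(millis);
  }
}
{code}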




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2800) Should print WARN log in both RM/RMAdminCLI side when MemoryRMNodeLabelsManager is enabled

2014-11-03 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195220#comment-14195220
 ] 

Tsuyoshi OZAWA commented on YARN-2800:
--

[~leftnoteasy], thanks for your clarification. Essentially, the label 
configuration is a part of the RM's state. IMHO, we should ideally move this 
essential configuration onto the RMStateStore to prevent the mismatch. I think 
ZK can handle it, since the frequency of label updates is not so high and the 
number of labels is not so large. cc: [~jianhe], [~kkambatl], what do you think?

> Should print WARN log in both RM/RMAdminCLI side when 
> MemoryRMNodeLabelsManager is enabled
> --
>
> Key: YARN-2800
> URL: https://issues.apache.org/jira/browse/YARN-2800
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: client, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-2800-20141102-1.patch, YARN-2800-20141102-2.patch
>
>
> Even though we have documented this, it would be better to explicitly 
> print a message on both the RM and RMAdminCLI side saying that node labels 
> being added will be lost across an RM restart.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2786) Create yarn cluster CLI to enable list node labels collection

2014-11-03 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195204#comment-14195204
 ] 

Wangda Tan commented on YARN-2786:
--

[~cwelch],
Thanks for comments,
bq. bin/yarn drop s
Addressed

bq. listLables should be listNodeLabels ...
Addressed

bq. Can't we use the visible for test annotation?
Addressed

bq. test is still using the node-labels command instead of cluster ...
Oh, my bad, I forgot to change that. The field is not ignored on the Java 
side, so both the test case and actual use of "yarn cluster ..." will 
succeed.

Will upload a patch soon.

Thanks,
Wangda


> Create yarn cluster CLI to enable list node labels collection
> -
>
> Key: YARN-2786
> URL: https://issues.apache.org/jira/browse/YARN-2786
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-2786-20141031-1.patch, YARN-2786-20141031-2.patch, 
> YARN-2786-20141102-2.patch, YARN-2786-20141102-3.patch
>
>
> With YARN-2778, we can list node labels on existing RM nodes. But that is not 
> enough; we should also be able to: 
> 1) list the node labels collection
> The command should start with "yarn cluster ..."; in the future, we can add 
> more functionality to the "yarnClusterCLI".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2795) Resource Manager fails startup with HDFS label storage and secure cluster

2014-11-03 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195174#comment-14195174
 ] 

Wangda Tan commented on YARN-2795:
--

Thanks for the review and commit, Vinod!

> Resource Manager fails startup with HDFS label storage and secure cluster
> -
>
> Key: YARN-2795
> URL: https://issues.apache.org/jira/browse/YARN-2795
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Phil D'Amore
>Assignee: Wangda Tan
> Fix For: 2.6.0
>
> Attachments: YARN-2795-20141101-1.patch, YARN-2795-20141102-1.patch, 
> YARN-2795-20141102-2.patch
>
>
> When node labels are in use, and yarn.node-labels.fs-store.root-dir is set to 
> a hdfs:// path, and the cluster is using kerberos, the RM fails to start 
> while trying to unmarshal the label store.  The following error/stack trace 
> is observed:
> {code}
> 2014-10-31 11:55:53,807 INFO  service.AbstractService 
> (AbstractService.java:noteFailure(272)) - Service o
> rg.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager failed in state 
> INITED; cause: java.io.IOExcepti
> on: Failed on local exception: java.io.IOException: 
> javax.security.sasl.SaslException: GSS initiate faile
> d [Caused by GSSException: No valid credentials provided (Mechanism level: 
> Failed to find any Kerberos tg
> t)]; Host Details : local host is: "host.running.rm/10.0.0.34"; destination 
> hos
> t is: "host.running.nn":8020;
> java.io.IOException: Failed on local exception: java.io.IOException: 
> javax.security.sasl.SaslException: G
> SS initiate failed [Caused by GSSException: No valid credentials provided 
> (Mechanism level: Failed to fin
> d any Kerberos tgt)]; Host Details : local host is: 
> "host.running.rm/10.0.0.34"
> ; destination host is: "host.running.nn":8020;
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
> at org.apache.hadoop.ipc.Client.call(Client.java:1472)
> at org.apache.hadoop.ipc.Client.call(Client.java:1399)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
> at com.sun.proxy.$Proxy14.mkdirs(Unknown Source)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.mkdirs(ClientNamenodeProt
> ocolTranslatorPB.java:539)
> at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187
> )
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> at com.sun.proxy.$Proxy15.mkdirs(Unknown Source)
> at 
> org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:2731)
> at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:2702)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:870)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:866)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.mkdirsInternal(DistributedFileSystem.java:866)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:859)
> at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1817)
> at 
> org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.init(FileSystemNodeLabelsStore.java:87)
> at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.initNodeLabelStore(CommonNodeLabelsManager.java:206)
> at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceInit(CommonNodeLabelsManager.java:199)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsManager.serviceInit(RMNodeLabelsManager.java:62)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:547)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:986)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:245)
> at 
> org.apache.hadoop.service

[jira] [Commented] (YARN-2795) Resource Manager fails startup with HDFS label storage and secure cluster

2014-11-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195170#comment-14195170
 ] 

Hudson commented on YARN-2795:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6429 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6429/])
YARN-2795. Fixed ResourceManager to not crash loading node-label data from HDFS 
in secure mode. Contributed by Wangda Tan. (vinodkv: rev 
ec6cbece8e7772868ce8ad996135d3136bd32245)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestDelegationTokenRenewer.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMHA.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java


> Resource Manager fails startup with HDFS label storage and secure cluster
> -
>
> Key: YARN-2795
> URL: https://issues.apache.org/jira/browse/YARN-2795
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Phil D'Amore
>Assignee: Wangda Tan
> Fix For: 2.6.0
>
> Attachments: YARN-2795-20141101-1.patch, YARN-2795-20141102-1.patch, 
> YARN-2795-20141102-2.patch
>
>
> When node labels are in use, and yarn.node-labels.fs-store.root-dir is set to 
> a hdfs:// path, and the cluster is using kerberos, the RM fails to start 
> while trying to unmarshal the label store.  The following error/stack trace 
> is observed:
> {code}
> 2014-10-31 11:55:53,807 INFO  service.AbstractService 
> (AbstractService.java:noteFailure(272)) - Service o
> rg.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager failed in state 
> INITED; cause: java.io.IOExcepti
> on: Failed on local exception: java.io.IOException: 
> javax.security.sasl.SaslException: GSS initiate faile
> d [Caused by GSSException: No valid credentials provided (Mechanism level: 
> Failed to find any Kerberos tg
> t)]; Host Details : local host is: "host.running.rm/10.0.0.34"; destination 
> hos
> t is: "host.running.nn":8020;
> java.io.IOException: Failed on local exception: java.io.IOException: 
> javax.security.sasl.SaslException: G
> SS initiate failed [Caused by GSSException: No valid credentials provided 
> (Mechanism level: Failed to fin
> d any Kerberos tgt)]; Host Details : local host is: 
> "host.running.rm/10.0.0.34"
> ; destination host is: "host.running.nn":8020;
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
> at org.apache.hadoop.ipc.Client.call(Client.java:1472)
> at org.apache.hadoop.ipc.Client.call(Client.java:1399)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
> at com.sun.proxy.$Proxy14.mkdirs(Unknown Source)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.mkdirs(ClientNamenodeProt
> ocolTranslatorPB.java:539)
> at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187
> )
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> at com.sun.proxy.$Proxy15.mkdirs(Unknown Source)
> at 
> org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:2731)
> at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:2702)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:870)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:866)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.mkdirsInternal(DistributedFileSystem.java:866)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:859)
> at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1817)
> at 
> org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.init(FileSystemNodeLabelsStore.java:87)
> at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.initNodeLabelStore(C

[jira] [Commented] (YARN-2800) Should print WARN log in both RM/RMAdminCLI side when MemoryRMNodeLabelsManager is enabled

2014-11-03 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195154#comment-14195154
 ] 

Wangda Tan commented on YARN-2800:
--

Hi [~ozawa],
bq. Let me clarify this case - do you mean RM will fail to allocate containers 
on labeled nodes after RM restart since RM uses MemoryRMNodeLabelsManager and 
forgets the mapping of node-to-labels?
Not exactly; actually the RM will fail to start. Because we have 
accessible-node-labels configured on queues, during CS initialization we check 
whether such labels exist in the node labels manager. With the 
MemoryRMNodeLabelsManager, after an RM restart the CS cannot find those labels 
in the node labels manager, so the RM fails to start entirely.
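A simplified sketch of that startup-time check (for illustration only; this is 
not the actual CapacityScheduler code, and the names are made up):

{code}
import java.io.IOException;
import java.util.Set;

final class QueueLabelValidationSketch {
  private QueueLabelValidationSketch() {
  }

  // Fails RM startup if a queue references a label unknown to the labels manager.
  static void validate(Set<String> queueAccessibleLabels,
      Set<String> clusterNodeLabels) throws IOException {
    for (String label : queueAccessibleLabels) {
      if (!clusterNodeLabels.contains(label)) {
        throw new IOException("Queue references node label '" + label
            + "' that the node labels manager does not know about");
      }
    }
  }
}
{code}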

I agree with what you mentioned: it may confuse people, since the admin may 
have configured it properly on the RM side, and it will be annoying to see the 
message every time such a command is run on the client side. But I think it is 
still important to let the client know about this. Of course we can add it to 
the RM web UI, but users may still not check it -- not all users will check the 
cluster metrics UI :). So I think we can drop the logging in the RM admin CLI 
part and, in a separate task, change the RMAdmin PB responses to return the 
actual RMNodeLabelsManager being used on the RM side. Then we can log the WARN 
properly.

Do you have any other ideas?

Thanks,
Wangda

> Should print WARN log in both RM/RMAdminCLI side when 
> MemoryRMNodeLabelsManager is enabled
> --
>
> Key: YARN-2800
> URL: https://issues.apache.org/jira/browse/YARN-2800
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: client, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-2800-20141102-1.patch, YARN-2800-20141102-2.patch
>
>
> Even though we have documented this, it would be better to explicitly 
> print a message on both the RM and RMAdminCLI side saying that node labels 
> being added will be lost across an RM restart.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2795) Resource Manager fails startup with HDFS label storage and secure cluster

2014-11-03 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195153#comment-14195153
 ] 

Vinod Kumar Vavilapalli commented on YARN-2795:
---

Tx for the update, Wangda.

Checking this in.

> Resource Manager fails startup with HDFS label storage and secure cluster
> -
>
> Key: YARN-2795
> URL: https://issues.apache.org/jira/browse/YARN-2795
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Phil D'Amore
>Assignee: Wangda Tan
> Attachments: YARN-2795-20141101-1.patch, YARN-2795-20141102-1.patch, 
> YARN-2795-20141102-2.patch
>
>
> When node labels are in use, and yarn.node-labels.fs-store.root-dir is set to 
> a hdfs:// path, and the cluster is using kerberos, the RM fails to start 
> while trying to unmarshal the label store.  The following error/stack trace 
> is observed:
> {code}
> 2014-10-31 11:55:53,807 INFO  service.AbstractService 
> (AbstractService.java:noteFailure(272)) - Service o
> rg.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager failed in state 
> INITED; cause: java.io.IOExcepti
> on: Failed on local exception: java.io.IOException: 
> javax.security.sasl.SaslException: GSS initiate faile
> d [Caused by GSSException: No valid credentials provided (Mechanism level: 
> Failed to find any Kerberos tg
> t)]; Host Details : local host is: "host.running.rm/10.0.0.34"; destination 
> hos
> t is: "host.running.nn":8020;
> java.io.IOException: Failed on local exception: java.io.IOException: 
> javax.security.sasl.SaslException: G
> SS initiate failed [Caused by GSSException: No valid credentials provided 
> (Mechanism level: Failed to fin
> d any Kerberos tgt)]; Host Details : local host is: 
> "host.running.rm/10.0.0.34"
> ; destination host is: "host.running.nn":8020;
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
> at org.apache.hadoop.ipc.Client.call(Client.java:1472)
> at org.apache.hadoop.ipc.Client.call(Client.java:1399)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
> at com.sun.proxy.$Proxy14.mkdirs(Unknown Source)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.mkdirs(ClientNamenodeProt
> ocolTranslatorPB.java:539)
> at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187
> )
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> at com.sun.proxy.$Proxy15.mkdirs(Unknown Source)
> at 
> org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:2731)
> at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:2702)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:870)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:866)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.mkdirsInternal(DistributedFileSystem.java:866)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:859)
> at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1817)
> at 
> org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.init(FileSystemNodeLabelsStore.java:87)
> at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.initNodeLabelStore(CommonNodeLabelsManager.java:206)
> at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceInit(CommonNodeLabelsManager.java:199)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsManager.serviceInit(RMNodeLabelsManager.java:62)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:547)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:986)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:245)
> at 
> org.apache.hadoop.

[jira] [Updated] (YARN-2505) Support get/add/remove/change labels in RM REST API

2014-11-03 Thread Craig Welch (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch updated YARN-2505:
--
Attachment: YARN-2505.14.patch

TestFairScheduler passes on my box - and the change should not have any impact 
on it anyway - reuploading patch to trigger another go

> Support get/add/remove/change labels in RM REST API
> ---
>
> Key: YARN-2505
> URL: https://issues.apache.org/jira/browse/YARN-2505
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Craig Welch
> Attachments: YARN-2505.1.patch, YARN-2505.11.patch, 
> YARN-2505.12.patch, YARN-2505.13.patch, YARN-2505.14.patch, 
> YARN-2505.3.patch, YARN-2505.4.patch, YARN-2505.5.patch, YARN-2505.6.patch, 
> YARN-2505.7.patch, YARN-2505.8.patch, YARN-2505.9.patch, YARN-2505.9.patch, 
> YARN-2505.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2800) Should print WARN log in both RM/RMAdminCLI side when MemoryRMNodeLabelsManager is enabled

2014-11-03 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195130#comment-14195130
 ] 

Tsuyoshi OZAWA commented on YARN-2800:
--

[~leftnoteasy], thanks for your comments.

{quote}
+  + "this message is based on the yarn-site.xml settings "
+  + "in the machine you run \"yarn rmadmin ...\", if you "
+  + "already edited the field in yarn-site.xml of the node "
+  + "running RM, please ignore this message.";
{quote}

I think printing the message based on the client-side configuration can 
confuse the user - it can differ from the RM side. Not every user has a copy of 
the RM-side configuration, and some users don't know its contents.

{quote}
But if user configured mem-based node labels manager, user may add labels to 
queue configurations, when RM will be failed to launch (specifically, CS cannot 
initialize) if a queue use a label but not existed in node labels manager
{quote}

Let me clarify this case - do you mean RM will fail to allocate containers 
on labeled nodes after RM restart since RM uses MemoryRMNodeLabelsManager and 
forgets the mapping of node-to-labels? In this case, I think we should raise a 
warning to the submitter of YARN apps after restart, like "application cannot 
be submitted for now since no node has the required label". That is more 
straightforward, because users can notice the mistake in the label 
configuration.

So I think the better way is to log the warning once at startup and add the 
information to the Web UI, for consistency of the information. What do you 
think?

> Should print WARN log in both RM/RMAdminCLI side when 
> MemoryRMNodeLabelsManager is enabled
> --
>
> Key: YARN-2800
> URL: https://issues.apache.org/jira/browse/YARN-2800
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: client, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-2800-20141102-1.patch, YARN-2800-20141102-2.patch
>
>
> Even though we have documented this, it would be better to explicitly 
> print a message on both the RM and RMAdminCLI side saying that node labels 
> being added will be lost across an RM restart.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2786) Create yarn cluster CLI to enable list node labels collection

2014-11-03 Thread Craig Welch (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195131#comment-14195131
 ] 

Craig Welch commented on YARN-2786:
---

The "node" command isn't a good fit for this aspect of node-labels, as it is 
not an operation or query on nodes as such, but on the set of node labels 
recognized by the cluster.  If we don't want to tie it to the resource manager 
(not sure we can't, but it sounds as though we want to keep it distinct) then 
we need something new.  I actually preferred the original "node-labels" 
command, but "cluster" is ok if we believe that other things will come along in 
the future which fit this definition (and I could see that happen).

Code items:

bin/yarn
prints cluster informations - information is singular and plural, you can drop 
the s

ClusterCLI.java
listLables should be listNodeLabels (we've gone to that everywhere b/c there 
will likely be other kinds of labels, we should stay consistent, especially as 
"cluster" cmd name has lost any notion of "nodelabelness")

//Make it protected to make unit test can change it
Can't we use the visible for test annotation?

It looks like the test is still using the node-labels command instead of 
cluster, did something go wrong with the patch (maybe forgot to restage)?  Can 
you make sure the unit test + patch code are consistent and the tests pass?




> Create yarn cluster CLI to enable list node labels collection
> -
>
> Key: YARN-2786
> URL: https://issues.apache.org/jira/browse/YARN-2786
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-2786-20141031-1.patch, YARN-2786-20141031-2.patch, 
> YARN-2786-20141102-2.patch, YARN-2786-20141102-3.patch
>
>
> With YARN-2778, we can list node labels on existing RM nodes. But that is not 
> enough; we should also be able to: 
> 1) list the node labels collection
> The command should start with "yarn cluster ..."; in the future, we can add 
> more functionality to the "yarnClusterCLI".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2794) Fix log msgs about distributing system-credentials

2014-11-03 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-2794:
--
Attachment: YARN-2794.patch

> Fix log msgs about distributing system-credentials 
> ---
>
> Key: YARN-2794
> URL: https://issues.apache.org/jira/browse/YARN-2794
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-2794.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2794) Fix log msgs about distributing system-credentials

2014-11-03 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195128#comment-14195128
 ] 

Jian He commented on YARN-2794:
---

Straightforward patch to change the logs to debug level.
bq. NMContext.systemCredentials will have concurrency issues
done

> Fix log msgs about distributing system-credentials 
> ---
>
> Key: YARN-2794
> URL: https://issues.apache.org/jira/browse/YARN-2794
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-2794.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2788) yarn logs -applicationId on 2.6.0 should support logs written by 2.4.0

2014-11-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195115#comment-14195115
 ] 

Hudson commented on YARN-2788:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6427 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6427/])
YARN-2788. Fixed backwards compatiblity issues with log-aggregation feature 
that were caused when adding log-upload-time via YARN-2703. Contributed by Xuan 
Gong. (vinodkv: rev 58e9f24e0f06efede21085b7ffe36af042fa7b38)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestLogsCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/logaggregation/TestAggregatedLogFormat.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/LogCLIHelpers.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/AggregatedLogFormat.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/log/AggregatedLogsBlock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java


> yarn logs -applicationId on 2.6.0 should support logs written by 2.4.0
> --
>
> Key: YARN-2788
> URL: https://issues.apache.org/jira/browse/YARN-2788
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation
>Affects Versions: 2.6.0
>Reporter: Gopal V
>Assignee: Xuan Gong
>Priority: Blocker
> Fix For: 2.6.0
>
> Attachments: YARN-2788.1.1.patch, YARN-2788.1.patch, 
> YARN-2788.2.patch, YARN-2788.3.patch, YARN-2788.4.patch, YARN-2788.5.patch
>
>
> Log format version needs to be upped between 2.4.0 and 2.6.0
> {code}
> at 
> java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
> at java.lang.Long.parseLong(Long.java:589)
> at java.lang.Long.parseLong(Long.java:631)
> at 
> org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$ContainerLogsReader.nextLog(AggregatedLogFormat.java:765)
> at 
> org.apache.hadoop.yarn.webapp.log.AggregatedLogsBlock.readContainerLogs(AggregatedLogsBlock.java:197)
> at 
> org.apache.hadoop.yarn.webapp.log.AggregatedLogsBlock.render(AggregatedLogsBlock.java:166)
> at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:67)
> at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:77)
> at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
> at 
> org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49)
> at 
> org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117)
> at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$TD._(Hamlet.java:845)
> at 
> org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:71)
> at 
> org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82)
> at 
> org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:212)
> at 
> org.apache.hadoop.mapreduce.v2.hs.webapp.HsController.logs(HsController.java:178)
> ... 40 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2703) Add logUploadedTime into LogValue for better display

2014-11-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195114#comment-14195114
 ] 

Hudson commented on YARN-2703:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6427 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6427/])
YARN-2788. Fixed backwards compatiblity issues with log-aggregation feature 
that were caused when adding log-upload-time via YARN-2703. Contributed by Xuan 
Gong. (vinodkv: rev 58e9f24e0f06efede21085b7ffe36af042fa7b38)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestLogsCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/logaggregation/TestAggregatedLogFormat.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/LogCLIHelpers.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/AggregatedLogFormat.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/log/AggregatedLogsBlock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java


> Add logUploadedTime into LogValue for better display
> 
>
> Key: YARN-2703
> URL: https://issues.apache.org/jira/browse/YARN-2703
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Fix For: 2.6.0
>
> Attachments: YARN-2703.1.patch, YARN-2703.2.patch, YARN-2703.3.patch, 
> YARN-2703.4.patch
>
>
> Right now, the container can upload its logs multiple times. Sometimes, 
> containers write different logs into the same log file.  After the log 
> aggregation, when we query those logs, it will show:
> LogType: stderr
> LogContext:
> LogType: stdout
> LogContext:
> LogType: stderr
> LogContext:
> LogType: stdout
> LogContext:
> The same files could be displayed multiple times. But we cannot figure out 
> which logs come first. We could add an extra logUploadedTime to give users a 
> better understanding of the logs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2505) Support get/add/remove/change labels in RM REST API

2014-11-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195101#comment-14195101
 ] 

Hadoop QA commented on YARN-2505:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12679010/YARN-2505.13.patch
  against trunk revision 67f13b5.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5703//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5703//console

This message is automatically generated.

> Support get/add/remove/change labels in RM REST API
> ---
>
> Key: YARN-2505
> URL: https://issues.apache.org/jira/browse/YARN-2505
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Craig Welch
> Attachments: YARN-2505.1.patch, YARN-2505.11.patch, 
> YARN-2505.12.patch, YARN-2505.13.patch, YARN-2505.3.patch, YARN-2505.4.patch, 
> YARN-2505.5.patch, YARN-2505.6.patch, YARN-2505.7.patch, YARN-2505.8.patch, 
> YARN-2505.9.patch, YARN-2505.9.patch, YARN-2505.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2788) yarn logs -applicationId on 2.6.0 should support logs written by 2.4.0

2014-11-03 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195098#comment-14195098
 ] 

Vinod Kumar Vavilapalli commented on YARN-2788:
---

Looks good, +1. Checking this in.

> yarn logs -applicationId on 2.6.0 should support logs written by 2.4.0
> --
>
> Key: YARN-2788
> URL: https://issues.apache.org/jira/browse/YARN-2788
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation
>Affects Versions: 2.6.0
>Reporter: Gopal V
>Assignee: Xuan Gong
>Priority: Blocker
> Attachments: YARN-2788.1.1.patch, YARN-2788.1.patch, 
> YARN-2788.2.patch, YARN-2788.3.patch, YARN-2788.4.patch, YARN-2788.5.patch
>
>
> Log format version needs to be upped between 2.4.0 and 2.6.0
> {code}
> at 
> java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
> at java.lang.Long.parseLong(Long.java:589)
> at java.lang.Long.parseLong(Long.java:631)
> at 
> org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$ContainerLogsReader.nextLog(AggregatedLogFormat.java:765)
> at 
> org.apache.hadoop.yarn.webapp.log.AggregatedLogsBlock.readContainerLogs(AggregatedLogsBlock.java:197)
> at 
> org.apache.hadoop.yarn.webapp.log.AggregatedLogsBlock.render(AggregatedLogsBlock.java:166)
> at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:67)
> at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:77)
> at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
> at 
> org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49)
> at 
> org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117)
> at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$TD._(Hamlet.java:845)
> at 
> org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:71)
> at 
> org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82)
> at 
> org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:212)
> at 
> org.apache.hadoop.mapreduce.v2.hs.webapp.HsController.logs(HsController.java:178)
> ... 40 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2795) Resource Manager fails startup with HDFS label storage and secure cluster

2014-11-03 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195099#comment-14195099
 ] 

Wangda Tan commented on YARN-2795:
--

Just tried to test in a security-enabled cluster: without this patch, the RM 
fails to start because we don't log in before accessing HDFS.
With this patch, the RM starts successfully with labels stored on HDFS. I also 
tried submitting an MR job after startup, and it completed successfully as 
well.
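For context only (this is not the attached patch, just a minimal sketch of the 
standard pattern): in a Kerberos-enabled cluster the RM needs to log in from 
its keytab before touching the HDFS-backed label store, roughly like this:

{code}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.SecurityUtil;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public final class RmLoginSketch {
  private RmLoginSketch() {
  }

  // Log the RM in from its keytab before any HDFS access, if security is on.
  public static void loginIfSecure(Configuration conf) throws IOException {
    if (UserGroupInformation.isSecurityEnabled()) {
      SecurityUtil.login(conf, YarnConfiguration.RM_KEYTAB,
          YarnConfiguration.RM_PRINCIPAL);
    }
  }
}
{code}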

> Resource Manager fails startup with HDFS label storage and secure cluster
> -
>
> Key: YARN-2795
> URL: https://issues.apache.org/jira/browse/YARN-2795
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Phil D'Amore
>Assignee: Wangda Tan
> Attachments: YARN-2795-20141101-1.patch, YARN-2795-20141102-1.patch, 
> YARN-2795-20141102-2.patch
>
>
> When node labels are in use, and yarn.node-labels.fs-store.root-dir is set to 
> a hdfs:// path, and the cluster is using kerberos, the RM fails to start 
> while trying to unmarshal the label store.  The following error/stack trace 
> is observed:
> {code}
> 2014-10-31 11:55:53,807 INFO  service.AbstractService 
> (AbstractService.java:noteFailure(272)) - Service o
> rg.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager failed in state 
> INITED; cause: java.io.IOExcepti
> on: Failed on local exception: java.io.IOException: 
> javax.security.sasl.SaslException: GSS initiate faile
> d [Caused by GSSException: No valid credentials provided (Mechanism level: 
> Failed to find any Kerberos tg
> t)]; Host Details : local host is: "host.running.rm/10.0.0.34"; destination 
> hos
> t is: "host.running.nn":8020;
> java.io.IOException: Failed on local exception: java.io.IOException: 
> javax.security.sasl.SaslException: G
> SS initiate failed [Caused by GSSException: No valid credentials provided 
> (Mechanism level: Failed to fin
> d any Kerberos tgt)]; Host Details : local host is: 
> "host.running.rm/10.0.0.34"
> ; destination host is: "host.running.nn":8020;
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
> at org.apache.hadoop.ipc.Client.call(Client.java:1472)
> at org.apache.hadoop.ipc.Client.call(Client.java:1399)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
> at com.sun.proxy.$Proxy14.mkdirs(Unknown Source)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.mkdirs(ClientNamenodeProt
> ocolTranslatorPB.java:539)
> at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187
> )
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> at com.sun.proxy.$Proxy15.mkdirs(Unknown Source)
> at 
> org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:2731)
> at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:2702)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:870)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:866)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.mkdirsInternal(DistributedFileSystem.java:866)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:859)
> at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1817)
> at 
> org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.init(FileSystemNodeLabelsStore.java:87)
> at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.initNodeLabelStore(CommonNodeLabelsManager.java:206)
> at 
> org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceInit(CommonNodeLabelsManager.java:199)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsManager.serviceInit(RMNodeLabelsManager.java:62)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:547)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.reso

[jira] [Commented] (YARN-2690) Make ReservationSystem and its dependent classes independent of Scheduler type

2014-11-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195069#comment-14195069
 ] 

Hadoop QA commented on YARN-2690:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12679005/YARN-2690.004.patch
  against trunk revision 67f13b5.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 1267 javac 
compiler warnings (more than the trunk's current 1266 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5702//testReport/
Javac warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5702//artifact/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5702//console

This message is automatically generated.

> Make ReservationSystem and its dependent classes independent of Scheduler 
> type  
> 
>
> Key: YARN-2690
> URL: https://issues.apache.org/jira/browse/YARN-2690
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-2690.001.patch, YARN-2690.002.patch, 
> YARN-2690.002.patch, YARN-2690.003.patch, YARN-2690.004.patch
>
>
> A lot of common reservation classes depend on CapacityScheduler and 
> specifically its configuration. This jira is to make them ready for other 
> Schedulers by abstracting out the configuration.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2798) YarnClient doesn't need to translate Kerberos name of timeline DT renewer

2014-11-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195062#comment-14195062
 ] 

Hudson commented on YARN-2798:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6426 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6426/])
YARN-2798. Fixed YarnClient to populate the renewer correctly for Timeline 
delegation tokens. Contributed by Zhijie Shen. (vinodkv: rev 
71fbb474f531f60c5d908cf724f18f90dfd5fa9f)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/YarnClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/security/TestYARNTokenIdentifier.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java


> YarnClient doesn't need to translate Kerberos name of timeline DT renewer
> -
>
> Key: YARN-2798
> URL: https://issues.apache.org/jira/browse/YARN-2798
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Reporter: Arpit Gupta
>Assignee: Zhijie Shen
>Priority: Blocker
> Attachments: YARN-2798.1.patch, YARN-2798.2.patch
>
>
> Now YarnClient will automatically get a timeline DT when submitting an app in 
> secure mode. It will try to parse yarn-site.xml/core-site.xml to get the RM 
> daemon's operating system user. However, the RM principal and auth_to_local 
> may not be properly presented to the client, so the client cannot translate 
> the principal to the daemon user correctly. On the other hand, 
> AbstractDelegationTokenIdentifier will do this translation when creating the 
> token. However, since the client has already translated the full principal 
> into a short user name (which may not be correct), the server, where the RM 
> principal and auth_to_local are always correct, can no longer apply the 
> translation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2730) DefaultContainerExecutor runs only one localizer at a time

2014-11-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195052#comment-14195052
 ] 

Hudson commented on YARN-2730:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6425 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6425/])
YARN-2730. DefaultContainerExecutor runs only one localizer at a time. 
Contributed by Siqi Li (jlowe: rev 6157ace5475fff8d2513fd3cd99134b532b0b406)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DefaultContainerExecutor.java
* hadoop-yarn-project/CHANGES.txt


> DefaultContainerExecutor runs only one localizer at a time
> --
>
> Key: YARN-2730
> URL: https://issues.apache.org/jira/browse/YARN-2730
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Siqi Li
>Assignee: Siqi Li
>Priority: Critical
> Attachments: YARN-2730.v1.patch, YARN-2730.v2.patch, 
> YARN-2730.v3.patch
>
>
> We are seeing that when one of the LocalizerRunners is stuck, the rest of the 
> LocalizerRunners are blocked. We should remove the synchronized modifier.
> The synchronized modifier appears to have been added by 
> https://issues.apache.org/jira/browse/MAPREDUCE-3537
> It could be removed if the Localizer doesn't depend on the current directory.
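For illustration only (not the actual DefaultContainerExecutor code; the class and 
method names below are placeholders), a minimal sketch of why a synchronized 
localizer-startup method serializes every LocalizerRunner on a node, and what 
dropping the modifier changes:

{code}
// Illustrative sketch only. With "synchronized" on startLocalizer, a stuck
// localizer holds the executor's monitor and blocks every other localizer on
// the node; dropping the modifier (as YARN-2730 proposes) lets them proceed
// in parallel.
public class LocalizerSketch {

  static class Executor {
    // Before: synchronized void startLocalizer(String container) { ... }
    // After: no synchronized modifier.
    void startLocalizer(String container) throws InterruptedException {
      System.out.println("localizing " + container);
      Thread.sleep(100); // stands in for downloading resources
    }
  }

  public static void main(String[] args) throws Exception {
    Executor exec = new Executor();
    for (int i = 0; i < 3; i++) {
      final String container = "container_" + i;
      new Thread(() -> {
        try {
          exec.startLocalizer(container);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      }).start();
    }
  }
}
{code}

With the modifier in place, the threads above would localize one container at a 
time; without it, they run concurrently.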



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2505) Support get/add/remove/change labels in RM REST API

2014-11-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195048#comment-14195048
 ] 

Hadoop QA commented on YARN-2505:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12678993/YARN-2505.12.patch
  against trunk revision 67f13b5.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5701//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5701//console

This message is automatically generated.

> Support get/add/remove/change labels in RM REST API
> ---
>
> Key: YARN-2505
> URL: https://issues.apache.org/jira/browse/YARN-2505
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Craig Welch
> Attachments: YARN-2505.1.patch, YARN-2505.11.patch, 
> YARN-2505.12.patch, YARN-2505.13.patch, YARN-2505.3.patch, YARN-2505.4.patch, 
> YARN-2505.5.patch, YARN-2505.6.patch, YARN-2505.7.patch, YARN-2505.8.patch, 
> YARN-2505.9.patch, YARN-2505.9.patch, YARN-2505.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2730) DefaultContainerExecutor runs only one localizer at a time

2014-11-03 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-2730:
-
Summary: DefaultContainerExecutor runs only one localizer at a time  (was: 
Only one localizer can run on a NodeManager at a time)

+1 for the latest patch, committing this.

> DefaultContainerExecutor runs only one localizer at a time
> --
>
> Key: YARN-2730
> URL: https://issues.apache.org/jira/browse/YARN-2730
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Siqi Li
>Assignee: Siqi Li
>Priority: Critical
> Attachments: YARN-2730.v1.patch, YARN-2730.v2.patch, 
> YARN-2730.v3.patch
>
>
> We are seeing that when one of the LocalizerRunners is stuck, the rest of the 
> LocalizerRunners are blocked. We should remove the synchronized modifier.
> The synchronized modifier appears to have been added by 
> https://issues.apache.org/jira/browse/MAPREDUCE-3537
> It could be removed if the Localizer doesn't depend on the current directory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2798) YarnClient doesn't need to translate Kerberos name of timeline DT renewer

2014-11-03 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195038#comment-14195038
 ] 

Vinod Kumar Vavilapalli commented on YARN-2798:
---

Looks good now, +1. Checking this in.

> YarnClient doesn't need to translate Kerberos name of timeline DT renewer
> -
>
> Key: YARN-2798
> URL: https://issues.apache.org/jira/browse/YARN-2798
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Reporter: Arpit Gupta
>Assignee: Zhijie Shen
>Priority: Blocker
> Attachments: YARN-2798.1.patch, YARN-2798.2.patch
>
>
> Now YarnClient will automatically get a timeline DT when submitting an app in 
> secure mode. It will try to parse yarn-site.xml/core-site.xml to get the RM 
> daemon's operating system user. However, the RM principal and auth_to_local 
> may not be properly presented to the client, so the client cannot translate 
> the principal to the daemon user correctly. On the other hand, 
> AbstractDelegationTokenIdentifier will do this translation when creating the 
> token. However, since the client has already translated the full principal 
> into a short user name (which may not be correct), the server, where the RM 
> principal and auth_to_local are always correct, can no longer apply the 
> translation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (YARN-2798) YarnClient doesn't need to translate Kerberos name of timeline DT renewer

2014-11-03 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194982#comment-14194982
 ] 

Zhijie Shen edited comment on YARN-2798 at 11/3/14 8:03 PM:


I don't have a quick setup for an RM HA, secure cluster, but since the mapping rule 
is applied everywhere in the cluster, I think it should work fine.

In fact, this issue is not an HA-related problem. However, in general, if we want 
DT renewal to work across RMs, we have to run the RMs as the same operating system 
user. Otherwise, if the DT renewer is set to the yarn user of RM1 and RM2 runs as 
yarn', RM2 can no longer renew the DT. This applies not just to the timeline DT, 
but to all the DTs that the RM is assigned to renew. Correct me if I'm wrong.
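
To make that concrete, a tiny illustrative check (not actual Hadoop code; the 
method name is hypothetical) of why the renewer recorded in the DT must match the 
operating system user of whichever RM attempts the renewal:

{code}
// Illustrative only; this mirrors the renewer check conceptually, it is not
// the Hadoop delegation-token secret manager code.
public class RenewerCheckSketch {

  /** recordedRenewer is the renewer stored in the token identifier at issue time. */
  static boolean canRenew(String recordedRenewer, String callerShortName) {
    return recordedRenewer.equals(callerShortName);
  }

  public static void main(String[] args) {
    String recordedRenewer = "yarn";  // DT issued while RM1 runs as "yarn"
    System.out.println(canRenew(recordedRenewer, "yarn"));   // true: RM1 can renew
    System.out.println(canRenew(recordedRenewer, "yarn'"));  // false: RM2 running as yarn' cannot
  }
}
{code}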


was (Author: zjshen):
I don't have a quick setup for an RM HA, secure cluster, but since the mapping rule 
is applied everywhere in the cluster, I think it should work fine.

> YarnClient doesn't need to translate Kerberos name of timeline DT renewer
> -
>
> Key: YARN-2798
> URL: https://issues.apache.org/jira/browse/YARN-2798
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Reporter: Arpit Gupta
>Assignee: Zhijie Shen
>Priority: Blocker
> Attachments: YARN-2798.1.patch, YARN-2798.2.patch
>
>
> Now YarnClient will automatically get a timeline DT when submitting an app in 
> secure mode. It will try to parse yarn-site.xml/core-site.xml to get the RM 
> daemon's operating system user. However, the RM principal and auth_to_local 
> may not be properly presented to the client, so the client cannot translate 
> the principal to the daemon user correctly. On the other hand, 
> AbstractDelegationTokenIdentifier will do this translation when creating the 
> token. However, since the client has already translated the full principal 
> into a short user name (which may not be correct), the server, where the RM 
> principal and auth_to_local are always correct, can no longer apply the 
> translation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2738) Add FairReservationSystem for FairScheduler

2014-11-03 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-2738:

Attachment: YARN-2738.002.patch

Thanks [~kasha] for the review.

1. Removed the TODO and opened YARN-2773.
2. Fixed.

[~subru], do you see any scenarios where not having per-queue configuration 
settings would make things difficult? I can see that the max and avg for 
CapacityOverTimePolicy might be the first things that people need to 
configure per queue.
[~kasha] I would prefer either no per-queue configuration (other than the 
 element that marks a queue for reservations) or the full set, instead of a 
partial one. Would you agree? 


> Add FairReservationSystem for FairScheduler
> ---
>
> Key: YARN-2738
> URL: https://issues.apache.org/jira/browse/YARN-2738
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-2738.001.patch, YARN-2738.002.patch
>
>
> Need to create a FairReservationSystem that will implement ReservationSystem 
> for FairScheduler



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2505) Support get/add/remove/change labels in RM REST API

2014-11-03 Thread Craig Welch (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch updated YARN-2505:
--
Attachment: YARN-2505.13.patch

OK, implemented the changes we came up with (switching to /add and /remove for 
cluster node-label POSTs, and changing POST node-to-labels to POST 
node-to-labels/replace). [~xgong] [~leftnoteasy], please have a look.

> Support get/add/remove/change labels in RM REST API
> ---
>
> Key: YARN-2505
> URL: https://issues.apache.org/jira/browse/YARN-2505
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Craig Welch
> Attachments: YARN-2505.1.patch, YARN-2505.11.patch, 
> YARN-2505.12.patch, YARN-2505.13.patch, YARN-2505.3.patch, YARN-2505.4.patch, 
> YARN-2505.5.patch, YARN-2505.6.patch, YARN-2505.7.patch, YARN-2505.8.patch, 
> YARN-2505.9.patch, YARN-2505.9.patch, YARN-2505.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2798) YarnClient doesn't need to translate Kerberos name of timeline DT renewer

2014-11-03 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194982#comment-14194982
 ] 

Zhijie Shen commented on YARN-2798:
---

I don't have a quick setup for an RM HA, secure cluster, but since the mapping rule 
is applied everywhere in the cluster, I think it should work fine.

> YarnClient doesn't need to translate Kerberos name of timeline DT renewer
> -
>
> Key: YARN-2798
> URL: https://issues.apache.org/jira/browse/YARN-2798
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Reporter: Arpit Gupta
>Assignee: Zhijie Shen
>Priority: Blocker
> Attachments: YARN-2798.1.patch, YARN-2798.2.patch
>
>
> Now YarnClient will automatically get a timeline DT when submitting an app in 
> secure mode. It will try to parse yarn-site.xml/core-site.xml to get the RM 
> daemon's operating system user. However, the RM principal and auth_to_local 
> may not be properly presented to the client, so the client cannot translate 
> the principal to the daemon user correctly. On the other hand, 
> AbstractDelegationTokenIdentifier will do this translation when creating the 
> token. However, since the client has already translated the full principal 
> into a short user name (which may not be correct), the server, where the RM 
> principal and auth_to_local are always correct, can no longer apply the 
> translation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2690) Make ReservationSystem and its dependent classes independent of Scheduler type

2014-11-03 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-2690:

Attachment: YARN-2690.004.patch

Fixed the javadoc warning caused by a typo.

Re 1: The information returned is specific to Scheduler queues, so I named it that 
way. Also, I am introducing another Reservation Configuration class which needs 
to be distinguished from this one.
Fixed 2 and 3.
   

> Make ReservationSystem and its dependent classes independent of Scheduler 
> type  
> 
>
> Key: YARN-2690
> URL: https://issues.apache.org/jira/browse/YARN-2690
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-2690.001.patch, YARN-2690.002.patch, 
> YARN-2690.002.patch, YARN-2690.003.patch, YARN-2690.004.patch
>
>
> A lot of common reservation classes depend on CapacityScheduler and 
> specifically its configuration. This jira is to make them ready for other 
> Schedulers by abstracting out the configuration.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1922) Process group remains alive after container process is killed externally

2014-11-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194966#comment-14194966
 ] 

Hadoop QA commented on YARN-1922:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12678987/YARN-1922.6.patch
  against trunk revision 67f13b5.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5700//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5700//console

This message is automatically generated.

> Process group remains alive after container process is killed externally
> 
>
> Key: YARN-1922
> URL: https://issues.apache.org/jira/browse/YARN-1922
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.4.0
> Environment: CentOS 6.4
>Reporter: Billie Rinaldi
>Assignee: Billie Rinaldi
> Attachments: YARN-1922.1.patch, YARN-1922.2.patch, YARN-1922.3.patch, 
> YARN-1922.4.patch, YARN-1922.5.patch, YARN-1922.6.patch
>
>
> If the main container process is killed externally, ContainerLaunch does not 
> kill the rest of the process group.  Before sending the event that results in 
> the ContainerLaunch.containerCleanup method being called, ContainerLaunch 
> sets the "completed" flag to true.  Then when cleaning up, it doesn't try to 
> read the pid file if the completed flag is true.  If it read the pid file, it 
> would proceed to send the container a kill signal.  In the case of the 
> DefaultContainerExecutor, this would kill the process group.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2505) Support get/add/remove/change labels in RM REST API

2014-11-03 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194963#comment-14194963
 ] 

Wangda Tan commented on YARN-2505:
--

Discussed offline with [~cwelch]; some suggestions:
- We should have replaceLabelsOnNode, as we have it in RMAdminCLI.
- We should have removeFromClusterNodeLabels, as we have addToClusterNodeLabels.
- Suggest making the replaceLabelsOnNode URL .../node-to-labels/replace, using 
POST; in the future we can have ../node-to-labels/remove(add).
- Suggest making removeFrom/addToClusterNodeLabels .../node-labels/add(remove), 
using POST, to keep it consistent with the replace/add/removeLabelsOnNode APIs.

Thanks,
Wangda
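
Purely as a sketch of the suggested URL shape (the host, port, path, and JSON 
payload below are assumptions based on the suggestion above, not the final 
YARN-2505 REST API), a plain-Java POST to a hypothetical .../node-labels/add 
endpoint:

{code}
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Sketch only: endpoint path and JSON shape are assumptions, not the final API.
public class NodeLabelsRestSketch {
  public static void main(String[] args) throws Exception {
    URL url = new URL("http://rm-host:8088/ws/v1/cluster/node-labels/add");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestMethod("POST");
    conn.setRequestProperty("Content-Type", "application/json");
    conn.setDoOutput(true);

    // Hypothetical payload adding two cluster node labels.
    String body = "{\"nodeLabels\":[\"gpu\",\"large-mem\"]}";
    try (OutputStream out = conn.getOutputStream()) {
      out.write(body.getBytes(StandardCharsets.UTF_8));
    }
    System.out.println("HTTP " + conn.getResponseCode());
  }
}
{code}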



> Support get/add/remove/change labels in RM REST API
> ---
>
> Key: YARN-2505
> URL: https://issues.apache.org/jira/browse/YARN-2505
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Craig Welch
> Attachments: YARN-2505.1.patch, YARN-2505.11.patch, 
> YARN-2505.12.patch, YARN-2505.3.patch, YARN-2505.4.patch, YARN-2505.5.patch, 
> YARN-2505.6.patch, YARN-2505.7.patch, YARN-2505.8.patch, YARN-2505.9.patch, 
> YARN-2505.9.patch, YARN-2505.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2730) Only one localizer can run on a NodeManager at a time

2014-11-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194947#comment-14194947
 ] 

Hadoop QA commented on YARN-2730:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12678985/YARN-2730.v3.patch
  against trunk revision 67f13b5.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5699//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5699//console

This message is automatically generated.

> Only one localizer can run on a NodeManager at a time
> -
>
> Key: YARN-2730
> URL: https://issues.apache.org/jira/browse/YARN-2730
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Siqi Li
>Assignee: Siqi Li
>Priority: Critical
> Attachments: YARN-2730.v1.patch, YARN-2730.v2.patch, 
> YARN-2730.v3.patch
>
>
> We are seeing that when one of the LocalizerRunners is stuck, the rest of the 
> LocalizerRunners are blocked. We should remove the synchronized modifier.
> The synchronized modifier appears to have been added by 
> https://issues.apache.org/jira/browse/MAPREDUCE-3537
> It could be removed if the Localizer doesn't depend on the current directory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2798) YarnClient doesn't need to translate Kerberos name of timeline DT renewer

2014-11-03 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194945#comment-14194945
 ] 

Jian He commented on YARN-2798:
---

Can you also check whether it works for RM HA, where the two RMs sit on different 
hosts? I checked; it should work as long as the two RMs use the same mapping rule.

> YarnClient doesn't need to translate Kerberos name of timeline DT renewer
> -
>
> Key: YARN-2798
> URL: https://issues.apache.org/jira/browse/YARN-2798
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Reporter: Arpit Gupta
>Assignee: Zhijie Shen
>Priority: Blocker
> Attachments: YARN-2798.1.patch, YARN-2798.2.patch
>
>
> Now YarnClient will automatically get a timeline DT when submitting an app in 
> secure mode. It will try to parse yarn-site.xml/core-site.xml to get the RM 
> daemon's operating system user. However, the RM principal and auth_to_local 
> may not be properly presented to the client, so the client cannot translate 
> the principal to the daemon user correctly. On the other hand, 
> AbstractDelegationTokenIdentifier will do this translation when creating the 
> token. However, since the client has already translated the full principal 
> into a short user name (which may not be correct), the server, where the RM 
> principal and auth_to_local are always correct, can no longer apply the 
> translation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2505) Support get/add/remove/change labels in RM REST API

2014-11-03 Thread Craig Welch (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch updated YARN-2505:
--
Attachment: YARN-2505.12.patch

At [~leftnoteasy]'s recommendation, switched the bulk operation for 
node-to-labels to a "replace" instead of an "add", as this is what we plan to do 
from the CLI.  [~xgong], can you have a look?  [~vinodkv] as well?

> Support get/add/remove/change labels in RM REST API
> ---
>
> Key: YARN-2505
> URL: https://issues.apache.org/jira/browse/YARN-2505
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Craig Welch
> Attachments: YARN-2505.1.patch, YARN-2505.11.patch, 
> YARN-2505.12.patch, YARN-2505.3.patch, YARN-2505.4.patch, YARN-2505.5.patch, 
> YARN-2505.6.patch, YARN-2505.7.patch, YARN-2505.8.patch, YARN-2505.9.patch, 
> YARN-2505.9.patch, YARN-2505.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2735) diskUtilizationPercentageCutoff and diskUtilizationSpaceCutoff are initialized twice in DirectoryCollection

2014-11-03 Thread Anubhav Dhoot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194893#comment-14194893
 ] 

Anubhav Dhoot commented on YARN-2735:
-

This looks like a trivial patch that should be okay without tests.
LGTM

> diskUtilizationPercentageCutoff and diskUtilizationSpaceCutoff are 
> initialized twice in DirectoryCollection
> ---
>
> Key: YARN-2735
> URL: https://issues.apache.org/jira/browse/YARN-2735
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.5.0
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Minor
> Attachments: YARN-2735.000.patch
>
>
> diskUtilizationPercentageCutoff and diskUtilizationSpaceCutoff are 
> initialized twice in DirectoryCollection
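
As a hypothetical illustration of the redundancy described above (not the real 
DirectoryCollection code), a field assigned both at declaration and again in the 
constructor; the fix is to keep only one of the two assignments:

{code}
// Hypothetical illustration only; field and constructor names are placeholders.
public class DirectoryCollectionSketch {
  // Initialized once here...
  private float diskUtilizationPercentageCutoff = 100.0F;
  private long diskUtilizationSpaceCutoff = 0;

  DirectoryCollectionSketch(float percentageCutoff, long spaceCutoff) {
    // ...and again here. Keeping only the constructor assignments (which honor
    // the caller's values) removes the redundant initialization.
    diskUtilizationPercentageCutoff = percentageCutoff;
    diskUtilizationSpaceCutoff = spaceCutoff;
  }

  public static void main(String[] args) {
    DirectoryCollectionSketch dc = new DirectoryCollectionSketch(90.0F, 1024L);
    System.out.println(dc.diskUtilizationPercentageCutoff + " "
        + dc.diskUtilizationSpaceCutoff);
  }
}
{code}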



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1922) Process group remains alive after container process is killed externally

2014-11-03 Thread Billie Rinaldi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Billie Rinaldi updated YARN-1922:
-
Attachment: YARN-1922.6.patch

Attaching a new patch.  Instead of using do/while (!completed.get()), this 
patch simply uses while(true), so that it always loops until the pid file 
appears or the maxKillWaitTime elapses.  [~vinodkv], does this address your 
concerns?
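
Roughly, the loop shape described above looks like the following sketch 
(maxKillWaitMs, the poll interval, and the pid-file lookup are placeholders, not 
the actual ContainerLaunch code):

{code}
import java.io.File;

// Sketch of the waiting loop described in the comment above; names are
// placeholders, not the real ContainerLaunch implementation.
public class PidWaitSketch {

  /** Returns the pid file if it shows up within maxKillWaitMs, else null. */
  static File waitForPidFile(File pidFile, long maxKillWaitMs, long pollMs)
      throws InterruptedException {
    long deadline = System.currentTimeMillis() + maxKillWaitMs;
    while (true) {                  // no early exit on a "completed" flag
      if (pidFile.exists()) {
        return pidFile;             // pid file appeared; caller can signal the process group
      }
      if (System.currentTimeMillis() >= deadline) {
        return null;                // gave up after maxKillWaitMs
      }
      Thread.sleep(pollMs);
    }
  }

  public static void main(String[] args) throws InterruptedException {
    File pid = waitForPidFile(new File("/tmp/container_0001.pid"), 2000, 100);
    System.out.println(pid == null ? "no pid file" : "found " + pid);
  }
}
{code}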

> Process group remains alive after container process is killed externally
> 
>
> Key: YARN-1922
> URL: https://issues.apache.org/jira/browse/YARN-1922
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.4.0
> Environment: CentOS 6.4
>Reporter: Billie Rinaldi
>Assignee: Billie Rinaldi
> Attachments: YARN-1922.1.patch, YARN-1922.2.patch, YARN-1922.3.patch, 
> YARN-1922.4.patch, YARN-1922.5.patch, YARN-1922.6.patch
>
>
> If the main container process is killed externally, ContainerLaunch does not 
> kill the rest of the process group.  Before sending the event that results in 
> the ContainerLaunch.containerCleanup method being called, ContainerLaunch 
> sets the "completed" flag to true.  Then when cleaning up, it doesn't try to 
> read the pid file if the completed flag is true.  If it read the pid file, it 
> would proceed to send the container a kill signal.  In the case of the 
> DefaultContainerExecutor, this would kill the process group.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


  1   2   >