[jira] [Commented] (YARN-1790) FairSchedule UI not showing apps table

2014-03-06 Thread bc Wong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922132#comment-13922132
 ] 

bc Wong commented on YARN-1790:
---

It seems the fix for YARN-1407 forgot to change FairSchedulerAppsBlock to use 
the user-facing app state.
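
For illustration only, here is a minimal sketch of the kind of change being 
described, assuming the apps block should filter on the user-facing 
YarnApplicationState rather than the internal RMAppState. The class and method 
below are hypothetical, and the converter call assumes a helper like 
RMServerUtils#createApplicationState exists on the RM side; treat both the 
helper and the call as assumptions, not the actual patch.

{code}
// Hypothetical sketch, not the actual patch: filter apps by the user-facing
// YarnApplicationState instead of the scheduler-internal RMAppState.
import java.util.EnumSet;

import org.apache.hadoop.yarn.api.records.YarnApplicationState;
import org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils;
import org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMApp;

final class AppsTableFilterSketch {
  /** True if the app should show up in the apps table for the requested states. */
  static boolean matches(RMApp app, EnumSet<YarnApplicationState> requested) {
    // Convert the internal state to the state users actually filter on.
    YarnApplicationState userFacing =
        RMServerUtils.createApplicationState(app.getState());
    return requested == null || requested.isEmpty()
        || requested.contains(userFacing);
  }
}
{code}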

 FairSchedule UI not showing apps table
 --

 Key: YARN-1790
 URL: https://issues.apache.org/jira/browse/YARN-1790
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.3.0
Reporter: bc Wong
Assignee: bc Wong
 Attachments: fs_ui.png


 There is a running job, which shows up in the summary table in the 
 FairScheduler UI, the queue display, etc. Just not in the apps table at the 
 bottom.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1774) FS: Submitting to non-leaf queue throws NPE

2014-03-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922175#comment-13922175
 ] 

Hadoop QA commented on YARN-1774:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12633063/yarn-1774-2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3274//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3274//console

This message is automatically generated.

 FS: Submitting to non-leaf queue throws NPE
 ---

 Key: YARN-1774
 URL: https://issues.apache.org/jira/browse/YARN-1774
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
Priority: Blocker
 Attachments: YARN-1774.patch, yarn-1774-2.patch


 If you create a hierarchy of queues and assign a job to parent queue, 
 FairScheduler quits with a NPE.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1685) Bugs around log URL

2014-03-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922177#comment-13922177
 ] 

Hadoop QA commented on YARN-1685:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12633060/YARN-1685.3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3273//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3273//console

This message is automatically generated.

 Bugs around log URL
 ---

 Key: YARN-1685
 URL: https://issues.apache.org/jira/browse/YARN-1685
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Mayank Bansal
Assignee: Zhijie Shen
 Attachments: YARN-1685-1.patch, YARN-1685.2.patch, YARN-1685.3.patch


 1. Log URL should be different when the container is running and finished
 2. Null case needs to be handled
 3. The way of constructing log URL should be corrected



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1790) FairSchedule UI not showing apps table

2014-03-06 Thread bc Wong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

bc Wong updated YARN-1790:
--

Attachment: fs_ui_fixed.png
0001-YARN-1790.-FairScheduler-UI-not-showing-apps-table.patch

Trivial fix. Also ported YARN-563 to FairScheduler UI. Tested manually (see 
screenshot).

 FairSchedule UI not showing apps table
 --

 Key: YARN-1790
 URL: https://issues.apache.org/jira/browse/YARN-1790
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.3.0
Reporter: bc Wong
Assignee: bc Wong
 Attachments: 
 0001-YARN-1790.-FairScheduler-UI-not-showing-apps-table.patch, fs_ui.png, 
 fs_ui_fixed.png


 There is a running job, which shows up in the summary table in the 
 FairScheduler UI, the queue display, etc. Just not in the apps table at the 
 bottom.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (YARN-1788) AppsCompleted/AppsKilled metric is incorrect when MR job is killed with yarn application -kill

2014-03-06 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev reassigned YARN-1788:
---

Assignee: Varun Vasudev

 AppsCompleted/AppsKilled metric is incorrect when MR job is killed with yarn 
 application -kill
 --

 Key: YARN-1788
 URL: https://issues.apache.org/jira/browse/YARN-1788
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Tassapol Athiapinya
Assignee: Varun Vasudev

 Run MR sleep job. Kill the application in RUNNING state. Observe RM metrics.
 Expecting AppsCompleted = 0/AppsKilled = 1
 Actual is AppsCompleted = 1/AppsKilled = 0



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1788) AppsCompleted/AppsKilled metric is incorrect when MR job is killed with yarn application -kill

2014-03-06 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-1788:


Attachment: apache-yarn-1788.0.patch

The fix is to use finalState instead of the getState() function when dispatching 
the AppRemovedSchedulerEvent. The patch applies the fix in the FinalTransition 
class in RMAppImpl and adds tests.
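
As a rough, non-authoritative sketch of that change (only the event class and 
the state types come from the description above; the surrounding class, method 
and handler names are illustrative):

{code}
// Illustrative only: the scheduler-removal event should carry the final state
// computed by the transition (e.g. KILLED), not the state machine's current
// state, which can already read as a completed state when the KILL is processed.
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.event.EventHandler;
import org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppState;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.AppRemovedSchedulerEvent;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.SchedulerEvent;

final class FinalTransitionSketch {
  static void notifyScheduler(EventHandler<SchedulerEvent> schedulerHandler,
      ApplicationId appId, RMAppState finalState) {
    // Before (per the description): new AppRemovedSchedulerEvent(appId, app.getState())
    // After: pass the computed finalState so the metrics count KILLED correctly.
    schedulerHandler.handle(new AppRemovedSchedulerEvent(appId, finalState));
  }
}
{code}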

 AppsCompleted/AppsKilled metric is incorrect when MR job is killed with yarn 
 application -kill
 --

 Key: YARN-1788
 URL: https://issues.apache.org/jira/browse/YARN-1788
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Tassapol Athiapinya
Assignee: Varun Vasudev
 Attachments: apache-yarn-1788.0.patch


 Run MR sleep job. Kill the application in RUNNING state. Observe RM metrics.
 Expecting AppsCompleted = 0/AppsKilled = 1
 Actual is AppsCompleted = 1/AppsKilled = 0



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-1791) Distributed cache issue using YARN

2014-03-06 Thread Ashish Kumar (JIRA)
Ashish Kumar created YARN-1791:
--

 Summary: Distributed cache issue using YARN
 Key: YARN-1791
 URL: https://issues.apache.org/jira/browse/YARN-1791
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Ashish Kumar


If I want to have two cache files, a/b/c and d/e/c, for an MR job, is there any 
way to access the Path of each file while reading it in a Map or Reduce task?
I'm using *job.addCacheFile(hdfsPath.toUri());* and then accessing all cache 
file paths using *context.getLocalCacheFiles()*, which returns paths like the 
following:

/yarn/?/?/?/1234/c and /yarn/?/?/?/2345/c

But these paths don't carry any folder-level info, so I'm not able to identify 
which path represents a/b/c. Is this a bug?
Please help.

Thanks,
Ashish



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1790) FairSchedule UI not showing apps table

2014-03-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922332#comment-13922332
 ] 

Hadoop QA commented on YARN-1790:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12633086/fs_ui_fixed.png
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3275//console

This message is automatically generated.

 FairSchedule UI not showing apps table
 --

 Key: YARN-1790
 URL: https://issues.apache.org/jira/browse/YARN-1790
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.3.0
Reporter: bc Wong
Assignee: bc Wong
 Attachments: 
 0001-YARN-1790.-FairScheduler-UI-not-showing-apps-table.patch, fs_ui.png, 
 fs_ui_fixed.png


 There is a running job, which shows up in the summary table in the 
 FairScheduler UI, the queue display, etc. Just not in the apps table at the 
 bottom.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1752) Unexpected Unregistered event at Attempt Launched state

2014-03-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922344#comment-13922344
 ] 

Hudson commented on YARN-1752:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1693 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1693/])
YARN-1752. Fixed ApplicationMasterService to reject unregister request if AM 
did not register before. Contributed by Rohith Sharma. (jianhe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1574623)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/exceptions/InvalidApplicationMasterRequestException.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAuditLogger.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockAM.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestApplicationMasterService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java


 Unexpected Unregistered event at Attempt Launched state
 ---

 Key: YARN-1752
 URL: https://issues.apache.org/jira/browse/YARN-1752
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Rohith
 Fix For: 2.4.0

 Attachments: YARN-1752.1.patch, YARN-1752.2.patch, YARN-1752.3.patch, 
 YARN-1752.4.patch, YARN-1752.5.patch


 {code}
 2014-02-21 14:56:03,453 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
 Can't handle this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 UNREGISTERED at LAUNCHED
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:647)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:103)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:733)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:714)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
   at java.lang.Thread.run(Thread.java:695)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1761) RMAdminCLI should check whether HA is enabled before executes transitionToActive/transitionToStandby

2014-03-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922345#comment-13922345
 ] 

Hudson commented on YARN-1761:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1693 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1693/])
YARN-1761. Modified RMAdmin CLI to check whether HA is enabled or not before it 
executes any of the HA admin related commands. Contributed by Xuan Gong. 
(vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1574661)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/RMAdminCLI.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMAdminCLI.java


 RMAdminCLI should check whether HA is enabled before executes 
 transitionToActive/transitionToStandby
 

 Key: YARN-1761
 URL: https://issues.apache.org/jira/browse/YARN-1761
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong
 Fix For: 2.4.0

 Attachments: YARN-1761.1.patch, YARN-1761.2.patch, YARN-1761.2.patch, 
 YARN-1761.3.patch, YARN-1761.3.patch, YARN-1761.4.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1785) FairScheduler treats app lookup failures as ERRORs

2014-03-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922351#comment-13922351
 ] 

Hudson commented on YARN-1785:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1693 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1693/])
YARN-1785. FairScheduler treats app lookup failures as ERRORs. (bc Wong via 
kasha) (kasha: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1574604)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java


 FairScheduler treats app lookup failures as ERRORs
 --

 Key: YARN-1785
 URL: https://issues.apache.org/jira/browse/YARN-1785
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.3.0
Reporter: bc Wong
Assignee: bc Wong
 Fix For: 2.4.0

 Attachments: 
 0001-YARN-1785.-FairScheduler-treats-app-lookup-failures-.patch


 When invoking the /ws/v1/cluster/apps endpoint, RM will eventually get to 
 RMAppImpl#createAndGetApplicationReport, which calls 
 RMAppAttemptImpl#getApplicationResourceUsageReport, which looks up the app in 
 the scheduler, which may or may not exist. So FairScheduler shouldn't log an 
 error for every lookup failure:
 {noformat}
 2014-02-17 08:23:21,240 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
 Request for appInfo of unknown attemptappattempt_1392419715319_0135_01
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1781) NM should allow users to specify max disk utilization for local disks

2014-03-06 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-1781:


Attachment: apache-yarn-1781.2.patch

Patch with code review comments incorporated.

 NM should allow users to specify max disk utilization for local disks
 -

 Key: YARN-1781
 URL: https://issues.apache.org/jira/browse/YARN-1781
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-yarn-1781.0.patch, apache-yarn-1781.1.patch, 
 apache-yarn-1781.2.patch


 This is related to YARN-257 (it's probably a sub-task?). Currently, the NM 
 does not detect full disks and allows full disks to be used by containers, 
 leading to repeated failures. YARN-257 deals with graceful handling of full 
 disks; this ticket is only about detection of full disks by the disk health 
 checkers.
 The NM should allow users to set a maximum disk utilization for local disks 
 and mark disks as bad once they exceed that utilization. At the very least, 
 the NM should detect full disks.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1788) AppsCompleted/AppsKilled metric is incorrect when MR job is killed with yarn application -kill

2014-03-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922390#comment-13922390
 ] 

Hadoop QA commented on YARN-1788:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12633091/apache-yarn-1788.0.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3276//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3276//console

This message is automatically generated.

 AppsCompleted/AppsKilled metric is incorrect when MR job is killed with yarn 
 application -kill
 --

 Key: YARN-1788
 URL: https://issues.apache.org/jira/browse/YARN-1788
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Tassapol Athiapinya
Assignee: Varun Vasudev
 Attachments: apache-yarn-1788.0.patch


 Run MR sleep job. Kill the application in RUNNING state. Observe RM metrics.
 Expecting AppsCompleted = 0/AppsKilled = 1
 Actual is AppsCompleted = 1/AppsKilled = 0



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1761) RMAdminCLI should check whether HA is enabled before executes transitionToActive/transitionToStandby

2014-03-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922400#comment-13922400
 ] 

Hudson commented on YARN-1761:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #501 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/501/])
YARN-1761. Modified RMAdmin CLI to check whether HA is enabled or not before it 
executes any of the HA admin related commands. Contributed by Xuan Gong. 
(vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1574661)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/RMAdminCLI.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMAdminCLI.java


 RMAdminCLI should check whether HA is enabled before executes 
 transitionToActive/transitionToStandby
 

 Key: YARN-1761
 URL: https://issues.apache.org/jira/browse/YARN-1761
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong
 Fix For: 2.4.0

 Attachments: YARN-1761.1.patch, YARN-1761.2.patch, YARN-1761.2.patch, 
 YARN-1761.3.patch, YARN-1761.3.patch, YARN-1761.4.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1752) Unexpected Unregistered event at Attempt Launched state

2014-03-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922399#comment-13922399
 ] 

Hudson commented on YARN-1752:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #501 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/501/])
YARN-1752. Fixed ApplicationMasterService to reject unregister request if AM 
did not register before. Contributed by Rohith Sharma. (jianhe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1574623)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/exceptions/InvalidApplicationMasterRequestException.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAuditLogger.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockAM.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestApplicationMasterService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java


 Unexpected Unregistered event at Attempt Launched state
 ---

 Key: YARN-1752
 URL: https://issues.apache.org/jira/browse/YARN-1752
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Rohith
 Fix For: 2.4.0

 Attachments: YARN-1752.1.patch, YARN-1752.2.patch, YARN-1752.3.patch, 
 YARN-1752.4.patch, YARN-1752.5.patch


 {code}
 2014-02-21 14:56:03,453 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
 Can't handle this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 UNREGISTERED at LAUNCHED
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:647)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:103)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:733)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:714)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
   at java.lang.Thread.run(Thread.java:695)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1785) FairScheduler treats app lookup failures as ERRORs

2014-03-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922406#comment-13922406
 ] 

Hudson commented on YARN-1785:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #501 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/501/])
YARN-1785. FairScheduler treats app lookup failures as ERRORs. (bc Wong via 
kasha) (kasha: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1574604)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java


 FairScheduler treats app lookup failures as ERRORs
 --

 Key: YARN-1785
 URL: https://issues.apache.org/jira/browse/YARN-1785
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.3.0
Reporter: bc Wong
Assignee: bc Wong
 Fix For: 2.4.0

 Attachments: 
 0001-YARN-1785.-FairScheduler-treats-app-lookup-failures-.patch


 When invoking the /ws/v1/cluster/apps endpoint, RM will eventually get to 
 RMAppImpl#createAndGetApplicationReport, which calls 
 RMAppAttemptImpl#getApplicationResourceUsageReport, which looks up the app in 
 the scheduler, which may or may not exist. So FairScheduler shouldn't log an 
 error for every lookup failure:
 {noformat}
 2014-02-17 08:23:21,240 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
 Request for appInfo of unknown attemptappattempt_1392419715319_0135_01
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1781) NM should allow users to specify max disk utilization for local disks

2014-03-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922457#comment-13922457
 ] 

Hadoop QA commented on YARN-1781:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12633107/apache-yarn-1781.2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3277//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3277//console

This message is automatically generated.

 NM should allow users to specify max disk utilization for local disks
 -

 Key: YARN-1781
 URL: https://issues.apache.org/jira/browse/YARN-1781
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-yarn-1781.0.patch, apache-yarn-1781.1.patch, 
 apache-yarn-1781.2.patch


 This is related to YARN-257 (it's probably a sub-task?). Currently, the NM 
 does not detect full disks and allows full disks to be used by containers, 
 leading to repeated failures. YARN-257 deals with graceful handling of full 
 disks; this ticket is only about detection of full disks by the disk health 
 checkers.
 The NM should allow users to set a maximum disk utilization for local disks 
 and mark disks as bad once they exceed that utilization. At the very least, 
 the NM should detect full disks.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (YARN-1791) Distributed cache issue using YARN

2014-03-06 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe resolved YARN-1791.
--

Resolution: Invalid

The distributed cache only preserves the basename of files and links them into 
the container's working directory.  If two names collide, one can use the URI 
fragment to provide an alternative name for the symlink.  For example, 
hdfs:/a/b/c#d will be seen as d in the container working directory rather 
than c.  If you require paths to be preserved, you can specify an archive 
(e.g. .tar.gz, .zip, etc.), which will be expanded when localized, and paths 
can exist within it.

In the future please use [mailto:u...@hadoop.apache.org] for asking questions.  
Apache JIRA is for reporting bugs and tracking features/improvements.
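
For reference, a minimal sketch of the fragment workaround described above, 
using the MapReduce Job API; the paths and link names are made up:

{code}
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class CacheFileFragmentExample {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "cache-fragment-demo");
    // Both files share the basename "c"; the URI fragment picks the symlink
    // name created in the container's working directory, so they no longer collide.
    job.addCacheFile(new URI("hdfs:///a/b/c#confA"));
    job.addCacheFile(new URI("hdfs:///d/e/c#confB"));
    // Inside the map/reduce task, open the links by name from the working
    // directory, e.g. new java.io.File("confA") and new java.io.File("confB").
  }
}
{code}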

 Distributed cache issue using YARN
 --

 Key: YARN-1791
 URL: https://issues.apache.org/jira/browse/YARN-1791
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Ashish Kumar

 If I want to have two cache files, a/b/c and d/e/c, for an MR job, is there 
 any way to access the Path of each file while reading it in a Map or Reduce task?
 I'm using *job.addCacheFile(hdfsPath.toUri());* and then accessing all cache 
 file paths using *context.getLocalCacheFiles()*, which returns paths like the 
 following:
 /yarn/?/?/?/1234/c and /yarn/?/?/?/2345/c
 But these paths don't carry any folder-level info, so I'm not able to identify 
 which path represents a/b/c. Is this a bug?
 Please help.
 Thanks,
 Ashish



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1761) RMAdminCLI should check whether HA is enabled before executes transitionToActive/transitionToStandby

2014-03-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922590#comment-13922590
 ] 

Hudson commented on YARN-1761:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1718 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1718/])
YARN-1761. Modified RMAdmin CLI to check whether HA is enabled or not before it 
executes any of the HA admin related commands. Contributed by Xuan Gong. 
(vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1574661)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/RMAdminCLI.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMAdminCLI.java


 RMAdminCLI should check whether HA is enabled before executes 
 transitionToActive/transitionToStandby
 

 Key: YARN-1761
 URL: https://issues.apache.org/jira/browse/YARN-1761
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong
 Fix For: 2.4.0

 Attachments: YARN-1761.1.patch, YARN-1761.2.patch, YARN-1761.2.patch, 
 YARN-1761.3.patch, YARN-1761.3.patch, YARN-1761.4.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1752) Unexpected Unregistered event at Attempt Launched state

2014-03-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922589#comment-13922589
 ] 

Hudson commented on YARN-1752:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1718 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1718/])
YARN-1752. Fixed ApplicationMasterService to reject unregister request if AM 
did not register before. Contributed by Rohith Sharma. (jianhe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1574623)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/exceptions/InvalidApplicationMasterRequestException.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAuditLogger.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockAM.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestApplicationMasterService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java


 Unexpected Unregistered event at Attempt Launched state
 ---

 Key: YARN-1752
 URL: https://issues.apache.org/jira/browse/YARN-1752
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Rohith
 Fix For: 2.4.0

 Attachments: YARN-1752.1.patch, YARN-1752.2.patch, YARN-1752.3.patch, 
 YARN-1752.4.patch, YARN-1752.5.patch


 {code}
 2014-02-21 14:56:03,453 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
 Can't handle this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 UNREGISTERED at LAUNCHED
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:647)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:103)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:733)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:714)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
   at java.lang.Thread.run(Thread.java:695)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1410) Handle RM fails over after getApplicationID() and before submitApplication().

2014-03-06 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-1410:


Attachment: YARN-1410.10.patch

 Handle RM fails over after getApplicationID() and before submitApplication().
 -

 Key: YARN-1410
 URL: https://issues.apache.org/jira/browse/YARN-1410
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Xuan Gong
 Attachments: YARN-1410-outline.patch, YARN-1410.1.patch, 
 YARN-1410.10.patch, YARN-1410.2.patch, YARN-1410.2.patch, YARN-1410.3.patch, 
 YARN-1410.4.patch, YARN-1410.5.patch, YARN-1410.6.patch, YARN-1410.7.patch, 
 YARN-1410.8.patch, YARN-1410.9.patch

   Original Estimate: 48h
  Remaining Estimate: 48h

 App submission involves
 1) creating an appId
 2) using that appId to submit an ApplicationSubmissionContext to the RM.
 The client may have obtained an appId from an RM, the RM may have failed 
 over, and the client may submit the app to the new RM.
 Since the new RM has a different notion of cluster timestamp (used to create 
 the app id), the new RM may reject the app submission, resulting in an 
 unexpected failure on the client side.
 The same may happen for other 2-step client API operations.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1410) Handle RM fails over after getApplicationID() and before submitApplication().

2014-03-06 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-1410:


Attachment: YARN-1410.10.patch

 Handle RM fails over after getApplicationID() and before submitApplication().
 -

 Key: YARN-1410
 URL: https://issues.apache.org/jira/browse/YARN-1410
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Xuan Gong
 Attachments: YARN-1410-outline.patch, YARN-1410.1.patch, 
 YARN-1410.10.patch, YARN-1410.10.patch, YARN-1410.2.patch, YARN-1410.2.patch, 
 YARN-1410.3.patch, YARN-1410.4.patch, YARN-1410.5.patch, YARN-1410.6.patch, 
 YARN-1410.7.patch, YARN-1410.8.patch, YARN-1410.9.patch

   Original Estimate: 48h
  Remaining Estimate: 48h

 App submission involves
 1) creating an appId
 2) using that appId to submit an ApplicationSubmissionContext to the RM.
 The client may have obtained an appId from an RM, the RM may have failed 
 over, and the client may submit the app to the new RM.
 Since the new RM has a different notion of cluster timestamp (used to create 
 the app id), the new RM may reject the app submission, resulting in an 
 unexpected failure on the client side.
 The same may happen for other 2-step client API operations.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1410) Handle RM fails over after getApplicationID() and before submitApplication().

2014-03-06 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922832#comment-13922832
 ] 

Xuan Gong commented on YARN-1410:
-

Created a new patch to address all the comments, which includes:
1. Create a new exception, ApplicationIdNotProvidedException. Before we do the 
application submission, we check the applicationId; if the applicationId is 
not provided in the ApplicationSubmissionContext, we throw 
ApplicationIdNotProvidedException. This requires the client to provide the 
ApplicationId before submission (a minimal sketch of this check follows below).

2. Added documentation in several places: GetNewApplicationResponse 
(explicitly saying the applicationId can be used to submit the application) and 
YarnClient#submitApplication.
* SubmitApplicationResponse: nothing changes here, so no new documentation was 
added for this class.
* ApplicationClientProtocol.getNewApplication(..) and 
ApplicationClientProtocol.submitApplication(..): no new documentation added. 
The current documentation has enough information about what clients need to do 
when we return an appId.

3. Modified the test cases:
* Modified TestYarnClient#testSubmitApplication() to validate that we get 
ApplicationIdNotProvidedException if the applicationId is not provided.
* Added a new test, TestSubmitApplicationWithRMHA, to test handling an RM 
failover before submitApplication().
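
Here is the minimal sketch of the check from point 1. The exception class is 
introduced by the patch itself, so treat its package and constructor as 
assumptions; the surrounding class and method are hypothetical.

{code}
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.exceptions.ApplicationIdNotProvidedException;

final class SubmitApplicationCheckSketch {
  /** Reject a submission whose context carries no ApplicationId. */
  static void checkApplicationIdIsSet(ApplicationSubmissionContext context)
      throws ApplicationIdNotProvidedException {
    if (context.getApplicationId() == null) {
      throw new ApplicationIdNotProvidedException(
          "ApplicationId is not provided in ApplicationSubmissionContext");
    }
  }
}
{code}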

 Handle RM fails over after getApplicationID() and before submitApplication().
 -

 Key: YARN-1410
 URL: https://issues.apache.org/jira/browse/YARN-1410
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Xuan Gong
 Attachments: YARN-1410-outline.patch, YARN-1410.1.patch, 
 YARN-1410.10.patch, YARN-1410.10.patch, YARN-1410.2.patch, YARN-1410.2.patch, 
 YARN-1410.3.patch, YARN-1410.4.patch, YARN-1410.5.patch, YARN-1410.6.patch, 
 YARN-1410.7.patch, YARN-1410.8.patch, YARN-1410.9.patch

   Original Estimate: 48h
  Remaining Estimate: 48h

 App submission involves
 1) creating appId
 2) using that appId to submit an ApplicationSubmissionContext to the user.
 The client may have obtained an appId from an RM, the RM may have failed 
 over, and the client may submit the app to the new RM.
 Since the new RM has a different notion of cluster timestamp (used to create 
 app id) the new RM may reject the app submission resulting in unexpected 
 failure on the client side.
 The same may happen for other 2 step client API operations.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1790) FairSchedule UI not showing apps table

2014-03-06 Thread bc Wong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

bc Wong updated YARN-1790:
--

Attachment: (was: 
0001-YARN-1790.-FairScheduler-UI-not-showing-apps-table.patch)

 FairSchedule UI not showing apps table
 --

 Key: YARN-1790
 URL: https://issues.apache.org/jira/browse/YARN-1790
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.3.0
Reporter: bc Wong
Assignee: bc Wong
 Attachments: 
 0001-YARN-1790.-FairScheduler-UI-not-showing-apps-table.patch, fs_ui.png, 
 fs_ui_fixed.png


 There is a running job, which shows up in the summary table in the 
 FairScheduler UI, the queue display, etc. Just not in the apps table at the 
 bottom.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1790) FairSchedule UI not showing apps table

2014-03-06 Thread bc Wong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

bc Wong updated YARN-1790:
--

Attachment: 0001-YARN-1790.-FairScheduler-UI-not-showing-apps-table.patch

Same patch with --no-prefix.

 FairSchedule UI not showing apps table
 --

 Key: YARN-1790
 URL: https://issues.apache.org/jira/browse/YARN-1790
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.3.0
Reporter: bc Wong
Assignee: bc Wong
 Attachments: 
 0001-YARN-1790.-FairScheduler-UI-not-showing-apps-table.patch, fs_ui.png, 
 fs_ui_fixed.png


 There is a running job, which shows up in the summary table in the 
 FairScheduler UI, the queue display, etc. Just not in the apps table at the 
 bottom.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1492) truly shared cache for jars (jobjar/libjar)

2014-03-06 Thread Chris Trezzo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922906#comment-13922906
 ] 

Chris Trezzo commented on YARN-1492:


bq. Do you want to go ahead and create sub-tasks? 

Will do. We have already made significant progress on implementation 
internally, so we should have a number of patches posted shortly.

 truly shared cache for jars (jobjar/libjar)
 ---

 Key: YARN-1492
 URL: https://issues.apache.org/jira/browse/YARN-1492
 Project: Hadoop YARN
  Issue Type: New Feature
Affects Versions: 2.0.4-alpha
Reporter: Sangjin Lee
Assignee: Sangjin Lee
 Attachments: shared_cache_design.pdf, shared_cache_design_v2.pdf, 
 shared_cache_design_v3.pdf, shared_cache_design_v4.pdf, 
 shared_cache_design_v5.pdf


 Currently there is the distributed cache that enables you to cache jars and 
 files so that attempts from the same job can reuse them. However, sharing is 
 limited with the distributed cache because it is normally on a per-job basis. 
 On a large cluster, sometimes copying of jobjars and libjars becomes so 
 prevalent that it consumes a large portion of the network bandwidth, not to 
 speak of defeating the purpose of bringing compute to where data is. This 
 is wasteful because in most cases code doesn't change much across many jobs.
 I'd like to propose and discuss feasibility of introducing a truly shared 
 cache so that multiple jobs from multiple users can share and cache jars. 
 This JIRA is to open the discussion.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1783) yarn application does not make any progress even when no other application is running when RM is being restarted in the background

2014-03-06 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922920#comment-13922920
 ] 

Xuan Gong commented on YARN-1783:
-

The logic to handle NodeAction.RESYNC looks good to me, but there is one more 
issue. It is very possible that a container's state is not yet COMPLETE when we 
generate the NodeStatus and send it to the RM, but becomes COMPLETE after we 
receive the response. In this patch we remove all the completed containers, so 
in that case we would remove this container from the context and its status 
would be missed.

 yarn application does not make any progress even when no other application is 
 running when RM is being restarted in the background
 --

 Key: YARN-1783
 URL: https://issues.apache.org/jira/browse/YARN-1783
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Arpit Gupta
Assignee: Jian He
Priority: Critical
 Attachments: YARN-1783.1.patch, YARN-1783.2.patch


 Noticed that during HA tests some tests took over 3 hours to run when the 
 test failed.
 Looking at the logs I see the application made no progress for a very long 
 time. However, if I look at the application log from YARN, it actually ran in 
 5 minutes.
 I am seeing the same behavior when the RM was being restarted in the background 
 and when both the RM and AM were being restarted. This does not happen for all 
 applications, but a few will hit this in the nightly run.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (YARN-1466) implement the cleaner service for the shared cache

2014-03-06 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee resolved YARN-1466.
---

Resolution: Invalid

I'll close out these JIRAs for YARN-1492, as the design has changed from the 
time these JIRAs were filed.

 implement the cleaner service for the shared cache
 --

 Key: YARN-1466
 URL: https://issues.apache.org/jira/browse/YARN-1466
 Project: Hadoop YARN
  Issue Type: New Feature
Reporter: Sangjin Lee
Assignee: Sangjin Lee





--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (YARN-1467) implement checksum verification for resource localization service for the shared cache

2014-03-06 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee resolved YARN-1467.
---

Resolution: Invalid

I'll close out these JIRAs for YARN-1492, as the design has changed from the 
time these JIRAs were filed.

 implement checksum verification for resource localization service for the 
 shared cache
 --

 Key: YARN-1467
 URL: https://issues.apache.org/jira/browse/YARN-1467
 Project: Hadoop YARN
  Issue Type: New Feature
Reporter: Sangjin Lee
Assignee: Sangjin Lee





--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1788) AppsCompleted/AppsKilled metric is incorrect when MR job is killed with yarn application -kill

2014-03-06 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-1788:
--

Priority: Critical  (was: Major)
Target Version/s: 2.4.0

Tx for the patch, Varun! Marking it for 2.4 as it seems like a bad bug.

The patch looks fine overall, but the test isn't very useful. 
TestRMAppTransitions and similar tests are basic unit tests that don't uncover 
a lot of the bugs that happen during integration. You should imitate 
TestRMRestart.testQueueMetricsOnRMRestart() without the restart part; that 
should strike a fine balance between a unit test, an integration test, and a 
real-life setup of starting clusters.
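
For illustration, a rough sketch of the suggested test shape, assuming the 
MockRM helpers submitApp/waitForState, the ClientRMService kill path, and the 
QueueMetrics getters; the test class and exact assertions are assumptions, not 
the final test:

{code}
import static org.junit.Assert.assertEquals;

import org.apache.hadoop.yarn.api.protocolrecords.KillApplicationRequest;
import org.apache.hadoop.yarn.server.resourcemanager.MockRM;
import org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMApp;
import org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppState;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics;
import org.junit.Test;

public class TestKilledAppQueueMetricsSketch {
  @Test
  public void killedAppIsCountedAsKilledNotCompleted() throws Exception {
    MockRM rm = new MockRM();
    rm.start();
    try {
      RMApp app = rm.submitApp(1024);  // submit a small app through the mock RM
      // Kill it through ClientRMService, the same path "yarn application -kill" uses.
      rm.getClientRMService().forceKillApplication(
          KillApplicationRequest.newInstance(app.getApplicationId()));
      rm.waitForState(app.getApplicationId(), RMAppState.KILLED);

      QueueMetrics metrics = rm.getResourceScheduler().getRootQueueMetrics();
      assertEquals(1, metrics.getAppsKilled());     // expected to count as killed
      assertEquals(0, metrics.getAppsCompleted());  // and not as completed
    } finally {
      rm.stop();
    }
  }
}
{code}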

 AppsCompleted/AppsKilled metric is incorrect when MR job is killed with yarn 
 application -kill
 --

 Key: YARN-1788
 URL: https://issues.apache.org/jira/browse/YARN-1788
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Tassapol Athiapinya
Assignee: Varun Vasudev
Priority: Critical
 Attachments: apache-yarn-1788.0.patch


 Run MR sleep job. Kill the application in RUNNING state. Observe RM metrics.
 Expecting AppsCompleted = 0/AppsKilled = 1
 Actual is AppsCompleted = 1/AppsKilled = 0



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-1792) Add a CLI to kill yarn container

2014-03-06 Thread Tassapol Athiapinya (JIRA)
Tassapol Athiapinya created YARN-1792:
-

 Summary: Add a CLI to kill yarn container
 Key: YARN-1792
 URL: https://issues.apache.org/jira/browse/YARN-1792
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: client
Affects Versions: 2.4.0
Reporter: Tassapol Athiapinya


One of my teammates saw an issue when there is a dangling container. The reason 
could have been a bug in the YARN application or an unexpected environment 
failure. It would be nice if YARN could handle this from the YARN framework. I 
suggest that YARN provide a CLI to kill container(s).

Security should be enforced. In the first phase, we could allow only the YARN 
admin to kill container(s).

The method should also work on both Linux and Windows platforms.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1783) yarn application does not make any progress even when no other application is running when RM is being restarted in the background

2014-03-06 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-1783:
--

Attachment: YARN-1783.3.patch

Thanks for catching this!
The new patch creates a separate collection to record the previously completed 
containers when getNodeStatus is called, and removes containers from the 
context only for those recorded completed containers.
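
A minimal sketch of that approach, with hypothetical field and method names (the 
real change lives in the NM's status updater; this only illustrates the 
record-then-remove bookkeeping):

{code}
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.yarn.api.records.ContainerId;
import org.apache.hadoop.yarn.api.records.ContainerState;
import org.apache.hadoop.yarn.api.records.ContainerStatus;

final class CompletedContainerTrackingSketch {
  // Containers reported as COMPLETE in the last NodeStatus we built.
  private final List<ContainerId> previouslyCompleted = new ArrayList<ContainerId>();

  /** Called while building NodeStatus: remember exactly what was reported. */
  void recordReported(List<ContainerStatus> reported) {
    for (ContainerStatus status : reported) {
      if (status.getState() == ContainerState.COMPLETE) {
        previouslyCompleted.add(status.getContainerId());
      }
    }
  }

  /**
   * Called after the RM response: only remove what was actually reported, so a
   * container that completed in between is not silently dropped.
   */
  List<ContainerId> containersSafeToRemove() {
    List<ContainerId> toRemove = new ArrayList<ContainerId>(previouslyCompleted);
    previouslyCompleted.clear();
    return toRemove;
  }
}
{code}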

 yarn application does not make any progress even when no other application is 
 running when RM is being restarted in the background
 --

 Key: YARN-1783
 URL: https://issues.apache.org/jira/browse/YARN-1783
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Arpit Gupta
Assignee: Jian He
Priority: Critical
 Attachments: YARN-1783.1.patch, YARN-1783.2.patch, YARN-1783.3.patch


 Noticed that during HA tests some tests took over 3 hours to run when the 
 test failed.
 Looking at the logs I see the application made no progress for a very long 
 time. However, if I look at the application log from YARN, it actually ran in 
 5 minutes.
 I am seeing the same behavior when the RM was being restarted in the background 
 and when both the RM and AM were being restarted. This does not happen for all 
 applications, but a few will hit this in the nightly run.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (YARN-1792) Add a CLI to kill yarn container

2014-03-06 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong reassigned YARN-1792:
---

Assignee: Xuan Gong

 Add a CLI to kill yarn container
 

 Key: YARN-1792
 URL: https://issues.apache.org/jira/browse/YARN-1792
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: client
Affects Versions: 2.4.0
Reporter: Tassapol Athiapinya
Assignee: Xuan Gong

 One of my teammates saw an issue when there is a dangling container. The 
 reason could have been a bug in the YARN application or an unexpected 
 environment failure. It would be nice if YARN could handle this from the YARN 
 framework. I suggest that YARN provide a CLI to kill container(s).
 Security should be enforced. In the first phase, we could allow only the YARN 
 admin to kill container(s).
 The method should also work on both Linux and Windows platforms.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1792) Add a CLI to kill yarn container

2014-03-06 Thread Tassapol Athiapinya (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tassapol Athiapinya updated YARN-1792:
--

Description: 
One of my teammates saw an issue when there was a dangling container. The reason 
could have been a bug in the YARN application or an unexpected environment 
failure. It would be nice if YARN could handle this from the YARN framework. I 
suggest that YARN provide a CLI to kill container(s).

Security should be enforced. In the first phase, we could allow only the YARN 
admin to kill container(s).

The method should also work on both Linux and Windows platforms.

  was:
One of my teammates saw an issue when there is a dangling container. The reason 
could have been a bug in the YARN application or an unexpected environment 
failure. It would be nice if YARN could handle this from the YARN framework. I 
suggest that YARN provide a CLI to kill container(s).

Security should be enforced. In the first phase, we could allow only the YARN 
admin to kill container(s).

The method should also work on both Linux and Windows platforms.


 Add a CLI to kill yarn container
 

 Key: YARN-1792
 URL: https://issues.apache.org/jira/browse/YARN-1792
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: client
Affects Versions: 2.4.0
Reporter: Tassapol Athiapinya
Assignee: Xuan Gong

 One of my teammates saw an issue when there was a dangling container. The 
 reason could have been a bug in the YARN application or an unexpected 
 environment failure. It would be nice if YARN could handle this from the YARN 
 framework. I suggest that YARN provide a CLI to kill container(s).
 Security should be enforced. In the first phase, we could allow only the YARN 
 admin to kill container(s).
 The method should also work on both Linux and Windows platforms.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1792) Add a CLI to kill yarn container

2014-03-06 Thread Ramya Sunil (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13922970#comment-13922970
 ] 

Ramya Sunil commented on YARN-1792:
---

Duplicate of YARN-1619

 Add a CLI to kill yarn container
 

 Key: YARN-1792
 URL: https://issues.apache.org/jira/browse/YARN-1792
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: client
Affects Versions: 2.4.0
Reporter: Tassapol Athiapinya
Assignee: Xuan Gong

 One of my teammates saw an issue when there was dangling container. The 
 reason could have been because of a bug in YARN application or unexpected 
 environment failure. It is nice if YARN can handle this from YARN framework. 
 I suggest YARN to provide a CLI to kill container(s).
 Security should be obeyed. In first phase, we could allow only YARN admin to 
 kill container(s). 
 The method should also work in both Linux and Windows platform.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-1793) yarn application -kill doesn't kill UnmanagedAMs

2014-03-06 Thread Karthik Kambatla (JIRA)
Karthik Kambatla created YARN-1793:
--

 Summary: yarn application -kill doesn't kill UnmanagedAMs
 Key: YARN-1793
 URL: https://issues.apache.org/jira/browse/YARN-1793
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical


Trying to kill an Unmanaged AM through the CLI (yarn application -kill id) logs a 
success, but doesn't actually kill the AM or reclaim the containers allocated 
to it.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1793) yarn application -kill doesn't kill UnmanagedAMs

2014-03-06 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13922989#comment-13922989
 ] 

Karthik Kambatla commented on YARN-1793:


{code}
if (application.isAppSafeToTerminate()) {
  RMAuditLogger.logSuccess(callerUGI.getShortUserName(),
      AuditConstants.KILL_APP_REQUEST, "ClientRMService", applicationId);
  return KillApplicationResponse.newInstance(true);
} else {
  this.rmContext.getDispatcher().getEventHandler()
      .handle(new RMAppEvent(applicationId, RMAppEventType.KILL));
  return KillApplicationResponse.newInstance(false);
}
{code}
Looks like we don't do anything but log and return if the app is unmanaged. If 
the AM continues to run, it continues to hold onto all the containers that were 
allocated to it. 

[~jianhe], [~vinodkv], [~bikassaha] - any thoughts off the top of your head? 

 yarn application -kill doesn't kill UnmanagedAMs
 

 Key: YARN-1793
 URL: https://issues.apache.org/jira/browse/YARN-1793
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical

 Trying to kill an Unmanaged AM through the CLI (yarn application -kill id) logs 
 a success, but doesn't actually kill the AM or reclaim the containers 
 allocated to it.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1790) FairSchedule UI not showing apps table

2014-03-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13922990#comment-13922990
 ] 

Hadoop QA commented on YARN-1790:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12633204/0001-YARN-1790.-FairScheduler-UI-not-showing-apps-table.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3279//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3279//console

This message is automatically generated.

 FairSchedule UI not showing apps table
 --

 Key: YARN-1790
 URL: https://issues.apache.org/jira/browse/YARN-1790
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.3.0
Reporter: bc Wong
Assignee: bc Wong
 Attachments: 
 0001-YARN-1790.-FairScheduler-UI-not-showing-apps-table.patch, fs_ui.png, 
 fs_ui_fixed.png


 There is a running job, which shows up in the summary table in the 
 FairScheduler UI, the queue display, etc. Just not in the apps table at the 
 bottom.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1685) Bugs around log URL

2014-03-06 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-1685:
--

Attachment: YARN-1685.4.patch

Uploaded a new patch:

1. Moved the logic of constructing the log URL pointing to the current timeline 
server into ApplicationHistoryManagerImpl, which ensures the log URL is correct 
in the RPC interface as well, and simplifies the changes.

2. Added a test case to verify the log URL delivered from the RPC interface.

 Bugs around log URL
 ---

 Key: YARN-1685
 URL: https://issues.apache.org/jira/browse/YARN-1685
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Mayank Bansal
Assignee: Zhijie Shen
 Attachments: YARN-1685-1.patch, YARN-1685.2.patch, YARN-1685.3.patch, 
 YARN-1685.4.patch


 1. Log URL should be different when the container is running and finished
 2. Null case needs to be handled
 3. The way of constructing log URL should be corrected



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1525) Web UI should redirect to active RM when HA is enabled.

2014-03-06 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13923039#comment-13923039
 ] 

Karthik Kambatla commented on YARN-1525:


bq. In the case where the standby RM, for which isStandbyMode() returns true and 
we start to buildRedirectPath, switches to being active, keeping the variable 
RMWebApp#standbyMode allows the behavior to be consistent from the time we 
first tested isStandbyMode() in RMDispatcher.service().
I see. This can be a little confusing. Can we change {{boolean isStandbyMode}} 
to {{void checkStandbyMode}} and have it set the field standbyMode? We can then 
access this field directly or through an accessor.

bq.  I've been using a temporary configuration (which you can find from 
RMWebApp.getRedirectPath()). I was actually resetting RMid back in the original 
code. It was necessary if I was not using a temporary configuration. But I'll 
remove them since I've been using temporary configuration.
Now I see why it was working fine. Still, I think we should also fix 
RMHAUtils#findActiveRMHAId to use a copy and not mutate the conf that is passed 
to it, as this is a util method and will likely be used in other places. 

We should also get rid of getting and setting rm-id in 
RMWebApp#buildRedirectPath.
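
To illustrate the copy-don't-mutate point, here is a minimal sketch of the shape 
RMHAUtils#findActiveRMHAId could take; the probing of each RM's HA service state 
is elided, and this is only a sketch of the suggestion, not the actual patch:
{code}
// Sketch only: work on a private copy so the caller's conf is never mutated.
public static String findActiveRMHAId(Configuration conf) {
  // The copy constructor clones the entries, so setting RM_HA_ID below does
  // not leak into the Configuration object owned by the caller.
  YarnConfiguration copy = new YarnConfiguration(conf);
  for (String rmId : copy.getStringCollection(YarnConfiguration.RM_HA_IDS)) {
    copy.set(YarnConfiguration.RM_HA_ID, rmId);
    // ... probe this RM's HA service state using the copied conf and
    // return rmId if it reports being active ...
  }
  return null; // no active RM found
}
{code}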

 Web UI should redirect to active RM when HA is enabled.
 ---

 Key: YARN-1525
 URL: https://issues.apache.org/jira/browse/YARN-1525
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Cindy Li
 Attachments: YARN1525.patch, YARN1525.patch, YARN1525.patch, 
 YARN1525.patch, YARN1525.patch.v1, YARN1525.patch.v2, YARN1525.patch.v3, 
 YARN1525.v7.patch, YARN1525.v7.patch, YARN1525.v8.patch, YARN1525.v9.patch, 
 Yarn1525.secure.patch, Yarn1525.secure.patch


 When failover happens, web UI should redirect to the current active rm.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (YARN-1792) Add a CLI to kill yarn container

2014-03-06 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli resolved YARN-1792.
---

Resolution: Duplicate

Closing as duplicate.

 Add a CLI to kill yarn container
 

 Key: YARN-1792
 URL: https://issues.apache.org/jira/browse/YARN-1792
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: client
Affects Versions: 2.4.0
Reporter: Tassapol Athiapinya
Assignee: Xuan Gong

 One of my teammates saw an issue when there was dangling container. The 
 reason could have been because of a bug in YARN application or unexpected 
 environment failure. It is nice if YARN can handle this from YARN framework. 
 I suggest YARN to provide a CLI to kill container(s).
 Security should be obeyed. In first phase, we could allow only YARN admin to 
 kill container(s). 
 The method should also work in both Linux and Windows platform.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (YARN-1619) Add cli to kill yarn container

2014-03-06 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong reassigned YARN-1619:
---

Assignee: Xuan Gong

 Add cli to kill yarn container
 --

 Key: YARN-1619
 URL: https://issues.apache.org/jira/browse/YARN-1619
 Project: Hadoop YARN
  Issue Type: New Feature
Reporter: Ramya Sunil
Assignee: Xuan Gong
 Fix For: 2.4.0


 It will be useful to have a generic cli tool to kill containers.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1783) yarn application does not make any progress even when no other application is running when RM is being restarted in the background

2014-03-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13923073#comment-13923073
 ] 

Hadoop QA commented on YARN-1783:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12633212/YARN-1783.3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3280//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3280//console

This message is automatically generated.

 yarn application does not make any progress even when no other application is 
 running when RM is being restarted in the background
 --

 Key: YARN-1783
 URL: https://issues.apache.org/jira/browse/YARN-1783
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Arpit Gupta
Assignee: Jian He
Priority: Critical
 Attachments: YARN-1783.1.patch, YARN-1783.2.patch, YARN-1783.3.patch


 Noticed that during HA tests some tests took over 3 hours to run when the 
 test failed.
 Looking at the logs I see the application made no progress for a very long 
 time. However, if I look at the application log from YARN, it actually ran in 
 5 minutes.
 I am seeing the same behavior when the RM was being restarted in the background 
 and when both the RM and AM were being restarted. This does not happen for all 
 applications, but a few will hit this in the nightly run.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1793) yarn application -kill doesn't kill UnmanagedAMs

2014-03-06 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13923099#comment-13923099
 ] 

Karthik Kambatla commented on YARN-1793:


What do you think about getting rid of this if-else altogether and creating the 
new RMAppEvent for kill in both cases? 
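
For concreteness, a rough sketch of that shape, reusing the names from the 
snippet quoted earlier in this thread; it assumes the same surrounding method in 
ClientRMService and that the RMApp state machine simply ignores KILL events for 
apps that are already done:
{code}
// Sketch only: always route the kill through the dispatcher and let the
// RMApp state machine decide what, if anything, is left to tear down.
this.rmContext.getDispatcher().getEventHandler()
    .handle(new RMAppEvent(applicationId, RMAppEventType.KILL));
RMAuditLogger.logSuccess(callerUGI.getShortUserName(),
    AuditConstants.KILL_APP_REQUEST, "ClientRMService", applicationId);
// 'false' tells the client the kill is in progress rather than already complete.
return KillApplicationResponse.newInstance(false);
{code}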

 yarn application -kill doesn't kill UnmanagedAMs
 

 Key: YARN-1793
 URL: https://issues.apache.org/jira/browse/YARN-1793
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical

 Trying to kill an Unmanaged AM through the CLI (yarn application -kill id) logs 
 a success, but doesn't actually kill the AM or reclaim the containers 
 allocated to it.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1793) yarn application -kill doesn't kill UnmanagedAMs

2014-03-06 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-1793:
---

Attachment: yarn-1793-0.patch

Simple patch that seems to fix the issue. Removed UnmanagedAM from the 
isAppSafeToTerminate check - there are only two uses of this method, and it 
looks like we want to treat UnmanagedAMs differently in both places. 

Looking for early feedback on whether this is an acceptable approach. In a way, 
this is similar to what ApplicationMasterService#finishApplicationMaster does.

TODO:
* Rename isAppSafeToTerminate - don't think it conveys what it is intended to do.
* Simplify ApplicationMasterService#finishApplicationMaster. We seem to be 
doing the same thing for both managed and unmanaged AMs, and the method can use 
some cleanup. 
* Unit tests where possible. 

 yarn application -kill doesn't kill UnmanagedAMs
 

 Key: YARN-1793
 URL: https://issues.apache.org/jira/browse/YARN-1793
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Attachments: yarn-1793-0.patch


 Trying to kill an Unmanaged AM through the CLI (yarn application -kill id) logs 
 a success, but doesn't actually kill the AM or reclaim the containers 
 allocated to it.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1771) many getFileStatus calls made from node manager for localizing a public distributed cache resource

2014-03-06 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13923176#comment-13923176
 ] 

Sangjin Lee commented on YARN-1771:
---

I have been looking into this from the perspective of reducing the number of 
unnecessary getFileStatus calls (and thereby reducing the pressure on the name 
node). So for now I'm gravitating towards a solution that caches the 
getFileStatus calls for the duration of a container initialization (i.e. 
resource localization). It would be pretty effective (reducing the number of 
calls from (m + 3)*n to n + (small constant)).
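
For instance, if m here is the directory depth (the example in the description 
shows 4 ancestor directories and 7 getfileinfo calls per jar, i.e. m + 3 = 7), 
then a job localizing n = 100 such public jars would drop from roughly 700 name 
node calls to about 100 plus a small constant.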

I'll upload a patch for your review shortly. Thanks!

 many getFileStatus calls made from node manager for localizing a public 
 distributed cache resource
 --

 Key: YARN-1771
 URL: https://issues.apache.org/jira/browse/YARN-1771
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Sangjin Lee
Assignee: Sangjin Lee
Priority: Critical

 We're observing that the getFileStatus calls are putting a fair amount of 
 load on the name node as part of checking the public-ness for localizing a 
 resource that belongs in the public cache.
 We see 7 getFileStatus calls made for each of these resources. We should look 
 into reducing the number of calls to the name node. One example:
 {noformat}
 2014-02-27 18:07:27,351 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
 2014-02-27 18:07:27,352 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
 2014-02-27 18:07:27,352 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724/tmp883330348 ...
 2014-02-27 18:07:27,353 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724 ...
 2014-02-27 18:07:27,353 INFO audit: ... cmd=getfileinfo   src=/tmp ...
 2014-02-27 18:07:27,354 INFO audit: ... cmd=getfileinfo   src=/...
 2014-02-27 18:07:27,354 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
 2014-02-27 18:07:27,355 INFO audit: ... cmd=open  
 src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1771) many getFileStatus calls made from node manager for localizing a public distributed cache resource

2014-03-06 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-1771:
--

Attachment: yarn-1771.patch

 many getFileStatus calls made from node manager for localizing a public 
 distributed cache resource
 --

 Key: YARN-1771
 URL: https://issues.apache.org/jira/browse/YARN-1771
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Sangjin Lee
Assignee: Sangjin Lee
Priority: Critical
 Attachments: yarn-1771.patch


 We're observing that the getFileStatus calls are putting a fair amount of 
 load on the name node as part of checking the public-ness for localizing a 
 resource that belongs in the public cache.
 We see 7 getFileStatus calls made for each of these resources. We should look 
 into reducing the number of calls to the name node. One example:
 {noformat}
 2014-02-27 18:07:27,351 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
 2014-02-27 18:07:27,352 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
 2014-02-27 18:07:27,352 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724/tmp883330348 ...
 2014-02-27 18:07:27,353 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724 ...
 2014-02-27 18:07:27,353 INFO audit: ... cmd=getfileinfo   src=/tmp ...
 2014-02-27 18:07:27,354 INFO audit: ... cmd=getfileinfo   src=/...
 2014-02-27 18:07:27,354 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
 2014-02-27 18:07:27,355 INFO audit: ... cmd=open  
 src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1771) many getFileStatus calls made from node manager for localizing a public distributed cache resource

2014-03-06 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13923226#comment-13923226
 ] 

Sangjin Lee commented on YARN-1771:
---

I have created a status cache at the LocalizerContext level, and let FSDownload 
utilize the cache when querying the file status for the parent directories.

I considered using a simple synchronized map and a ConcurrentHashMap, but settled 
on using Guava's LoadingCache. The issue with the localization pattern is that 
it is bursty. Most of the downloads happen in parallel, and thus most of these 
getFileStatus calls also go out in a burst. With a synchronized map, the 
problem is that these calls would be unnecessarily serialized (as each caller 
needs to acquire a global lock on the map). With a ConcurrentHashMap, calls can 
be concurrent, but with a simple ConcurrentMap usage it becomes harder to avoid 
extra getFileStatus calls.

The LoadingCache maintains concurrency *and* limits the getFileStatus calls to 
strictly one call per path (I added a unit test to verify that).
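
As a rough illustration of the approach (not the attached patch; the class name 
here is made up, and the cache is assumed to be scoped to a single container's 
localization):
{code}
import java.util.concurrent.ExecutionException;

import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.Path;

// Sketch of a per-localization stat cache: at most one getFileStatus call per
// distinct path, even when many download threads ask for the same ancestors.
public class StatCacheSketch {
  private final LoadingCache<Path, FileStatus> statCache;

  public StatCacheSketch(final Configuration conf) {
    this.statCache = CacheBuilder.newBuilder().build(
        new CacheLoader<Path, FileStatus>() {
          @Override
          public FileStatus load(Path path) throws Exception {
            // Runs at most once per key; concurrent callers for the same path
            // block on this load instead of issuing duplicate name node RPCs.
            return path.getFileSystem(conf).getFileStatus(path);
          }
        });
  }

  public FileStatus getFileStatus(Path path) throws ExecutionException {
    return statCache.get(path);
  }
}
{code}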

 many getFileStatus calls made from node manager for localizing a public 
 distributed cache resource
 --

 Key: YARN-1771
 URL: https://issues.apache.org/jira/browse/YARN-1771
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.3.0
Reporter: Sangjin Lee
Assignee: Sangjin Lee
Priority: Critical
 Attachments: yarn-1771.patch


 We're observing that the getFileStatus calls are putting a fair amount of 
 load on the name node as part of checking the public-ness for localizing a 
 resource that belongs in the public cache.
 We see 7 getFileStatus calls made for each of these resources. We should look 
 into reducing the number of calls to the name node. One example:
 {noformat}
 2014-02-27 18:07:27,351 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
 2014-02-27 18:07:27,352 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
 2014-02-27 18:07:27,352 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724/tmp883330348 ...
 2014-02-27 18:07:27,353 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724 ...
 2014-02-27 18:07:27,353 INFO audit: ... cmd=getfileinfo   src=/tmp ...
 2014-02-27 18:07:27,354 INFO audit: ... cmd=getfileinfo   src=/...
 2014-02-27 18:07:27,354 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
 2014-02-27 18:07:27,355 INFO audit: ... cmd=open  
 src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1341) Recover NMTokens upon nodemanager restart

2014-03-06 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-1341:
-

Attachment: YARN-1341.patch

Patch to enable the recovery of NMTokens. Like YARN-1338, it uses leveldb as the 
state store, or a null state store if recovery is not enabled.
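
For reference, a minimal sketch of the general shape of such a leveldb-backed 
store; the class name, key name, and serialization here are illustrative only 
and not the ones used by the patch:
{code}
import java.io.Closeable;
import java.io.File;
import java.io.IOException;

import org.iq80.leveldb.DB;
import org.iq80.leveldb.Options;

import static org.fusesource.leveldbjni.JniDBFactory.bytes;
import static org.fusesource.leveldbjni.JniDBFactory.factory;

// Sketch of a leveldb-backed store for serialized NMToken state that survives
// a nodemanager restart; a "null" variant would simply make these methods no-ops.
public class NMTokenStoreSketch implements Closeable {
  private static final String CURRENT_MASTER_KEY = "NMTokens/CurrentMasterKey";
  private final DB db;

  public NMTokenStoreSketch(File storeDir) throws IOException {
    Options options = new Options();
    options.createIfMissing(true);
    db = factory.open(storeDir, options);
  }

  public void storeCurrentMasterKey(byte[] serializedKey) {
    db.put(bytes(CURRENT_MASTER_KEY), serializedKey);
  }

  public byte[] loadCurrentMasterKey() {
    return db.get(bytes(CURRENT_MASTER_KEY)); // null if nothing has been stored yet
  }

  @Override
  public void close() throws IOException {
    db.close();
  }
}
{code}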

 Recover NMTokens upon nodemanager restart
 -

 Key: YARN-1341
 URL: https://issues.apache.org/jira/browse/YARN-1341
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.3.0
Reporter: Jason Lowe
 Attachments: YARN-1341.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1389) ApplicationClientProtocol and ApplicationHistoryProtocol should expose analogous APIs

2014-03-06 Thread Mayank Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13923278#comment-13923278
 ] 

Mayank Bansal commented on YARN-1389:
-

Thanks [~zjshen] for the review.
bq. 1. In javadoc of ApplicationClientProtocol, we shouldn't mention 
ApplicationHistoryServer, because the applications obtained from this protocol 
are all from RM cache instead of history store
Done
bq. 2. attempts won't be null, and getApplications doesn't throw 
ApplicationNotFoundException when getting an empty list of applications. Let's 
keep the behavior consistent. Same for getContainers. And in YarnClientImpl, 
don't process ApplicationAttemptNotFoundException and 
ContainerNotFoundException in the corresponding places.
In some cases it can be null, so keeping it that way. We wanted to avoid getting 
the status for the application.
bq. 3. TestClientRMService needs more test cases as well, like what you did in 
TestYarnClient
Done
bq. 4. Please test the new APIs in a presudo cluster to verify whether it works 
or not. Thanks!
Done, it works :)

 ApplicationClientProtocol and ApplicationHistoryProtocol should expose 
 analogous APIs
 -

 Key: YARN-1389
 URL: https://issues.apache.org/jira/browse/YARN-1389
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Mayank Bansal
Assignee: Mayank Bansal
 Attachments: YARN-1389-1.patch, YARN-1389-2.patch, YARN-1389-3.patch, 
 YARN-1389-4.patch


 As we plan to have the APIs in ApplicationHistoryProtocol to expose the 
 reports of *finished* application attempts and containers, we should do the 
 same for ApplicationClientProtocol, which will return the reports of 
 *running* attempts and containers.
 Later on, we can improve YarnClient to direct the query of running instance 
 to ApplicationClientProtocol, while that of finished instance to 
 ApplicationHistoryProtocol, making it transparent to the users.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1341) Recover NMTokens upon nodemanager restart

2014-03-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13923276#comment-13923276
 ] 

Hadoop QA commented on YARN-1341:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12633251/YARN-1341.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 11 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3283//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3283//console

This message is automatically generated.

 Recover NMTokens upon nodemanager restart
 -

 Key: YARN-1341
 URL: https://issues.apache.org/jira/browse/YARN-1341
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.3.0
Reporter: Jason Lowe
Assignee: Jason Lowe
 Attachments: YARN-1341.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1389) ApplicationClientProtocol and ApplicationHistoryProtocol should expose analogous APIs

2014-03-06 Thread Mayank Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13923280#comment-13923280
 ] 

Mayank Bansal commented on YARN-1389:
-

bq. I realize there will be an issue that may not have an immediate solution. 
Currently, if an application is finished, we can get all the finished 
containers of it from the history store. However, if an application is still 
running, YarnScheduler is going to remove the container out of its cache once 
the container is done. Therefore, we're unable to get the finished containers 
of a running application.
It seems that we need to cache RMContainer until the application is finished. 
Thoughts?

Yes, we should have this; right now there is an inconsistency in the finished 
containers for running apps. I will create another JIRA to track that.

Thanks,
Mayank

 ApplicationClientProtocol and ApplicationHistoryProtocol should expose 
 analogous APIs
 -

 Key: YARN-1389
 URL: https://issues.apache.org/jira/browse/YARN-1389
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Mayank Bansal
Assignee: Mayank Bansal
 Attachments: YARN-1389-1.patch, YARN-1389-2.patch, YARN-1389-3.patch, 
 YARN-1389-4.patch, YARN-1389-5.patch


 As we plan to have the APIs in ApplicationHistoryProtocol to expose the 
 reports of *finished* application attempts and containers, we should do the 
 same for ApplicationClientProtocol, which will return the reports of 
 *running* attempts and containers.
 Later on, we can improve YarnClient to direct the query of running instance 
 to ApplicationClientProtocol, while that of finished instance to 
 ApplicationHistoryProtocol, making it transparent to the users.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1389) ApplicationClientProtocol and ApplicationHistoryProtocol should expose analogous APIs

2014-03-06 Thread Mayank Bansal (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayank Bansal updated YARN-1389:


Attachment: YARN-1389-5.patch

Updating latest patch.

Thanks,
Mayank

 ApplicationClientProtocol and ApplicationHistoryProtocol should expose 
 analogous APIs
 -

 Key: YARN-1389
 URL: https://issues.apache.org/jira/browse/YARN-1389
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Mayank Bansal
Assignee: Mayank Bansal
 Attachments: YARN-1389-1.patch, YARN-1389-2.patch, YARN-1389-3.patch, 
 YARN-1389-4.patch, YARN-1389-5.patch


 As we plan to have the APIs in ApplicationHistoryProtocol to expose the 
 reports of *finished* application attempts and containers, we should do the 
 same for ApplicationClientProtocol, which will return the reports of 
 *running* attempts and containers.
 Later on, we can improve YarnClient to direct the query of running instance 
 to ApplicationClientProtocol, while that of finished instance to 
 ApplicationHistoryProtocol, making it transparent to the users.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1771) many getFileStatus calls made from node manager for localizing a public distributed cache resource

2014-03-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13923294#comment-13923294
 ] 

Hadoop QA commented on YARN-1771:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12633247/yarn-1771.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 2 
warning messages.
See 
https://builds.apache.org/job/PreCommit-YARN-Build/3282//artifact/trunk/patchprocess/diffJavadocWarnings.txt
 for details.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3282//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3282//console

This message is automatically generated.

 many getFileStatus calls made from node manager for localizing a public 
 distributed cache resource
 --

 Key: YARN-1771
 URL: https://issues.apache.org/jira/browse/YARN-1771
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.3.0
Reporter: Sangjin Lee
Assignee: Sangjin Lee
Priority: Critical
 Attachments: yarn-1771.patch


 We're observing that the getFileStatus calls are putting a fair amount of 
 load on the name node as part of checking the public-ness for localizing a 
 resource that belongs in the public cache.
 We see 7 getFileStatus calls made for each of these resources. We should look 
 into reducing the number of calls to the name node. One example:
 {noformat}
 2014-02-27 18:07:27,351 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
 2014-02-27 18:07:27,352 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
 2014-02-27 18:07:27,352 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724/tmp883330348 ...
 2014-02-27 18:07:27,353 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724 ...
 2014-02-27 18:07:27,353 INFO audit: ... cmd=getfileinfo   src=/tmp ...
 2014-02-27 18:07:27,354 INFO audit: ... cmd=getfileinfo   src=/...
 2014-02-27 18:07:27,354 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
 2014-02-27 18:07:27,355 INFO audit: ... cmd=open  
 src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1341) Recover NMTokens upon nodemanager restart

2014-03-06 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-1341:
-

Attachment: YARN-1341v2.patch

Revised patch without the addition of the state store to the NM context since 
it's not necessary for this change.

 Recover NMTokens upon nodemanager restart
 -

 Key: YARN-1341
 URL: https://issues.apache.org/jira/browse/YARN-1341
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.3.0
Reporter: Jason Lowe
Assignee: Jason Lowe
 Attachments: YARN-1341.patch, YARN-1341v2.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-1794) Yarn CLI only shows running containers for Running Applications

2014-03-06 Thread Mayank Bansal (JIRA)
Mayank Bansal created YARN-1794:
---

 Summary: Yarn CLI only shows running containers for Running 
Applications
 Key: YARN-1794
 URL: https://issues.apache.org/jira/browse/YARN-1794
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Mayank Bansal
Assignee: Mayank Bansal


As we plan to have the APIs in ApplicationHistoryProtocol to expose the reports 
of *finished* application attempts and containers, we should do the same for 
ApplicationClientProtocol, which will return the reports of *running* attempts 
and containers.

Later on, we can improve YarnClient to direct the query of running instance to 
ApplicationClientProtocol, while that of finished instance to 
ApplicationHistoryProtocol, making it transparent to the users.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1794) Yarn CLI only shows running containers for Running Applications

2014-03-06 Thread Mayank Bansal (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayank Bansal updated YARN-1794:


Description: (was: As we plan to have the APIs in 
ApplicationHistoryProtocol to expose the reports of *finished* application 
attempts and containers, we should do the same for ApplicationClientProtocol, 
which will return the reports of *running* attempts and containers.

Later on, we can improve YarnClient to direct the query of running instance to 
ApplicationClientProtocol, while that of finished instance to 
ApplicationHistoryProtocol, making it transparent to the users.)

 Yarn CLI only shows running containers for Running Applications
 ---

 Key: YARN-1794
 URL: https://issues.apache.org/jira/browse/YARN-1794
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Mayank Bansal
Assignee: Mayank Bansal





--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1794) Yarn CLI only shows running containers for Running Applications

2014-03-06 Thread Mayank Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13923297#comment-13923297
 ] 

Mayank Bansal commented on YARN-1794:
-

After YARN-1389 we have the capability to show attempts and containers for a 
running application; however, we cannot show finished containers for a running 
application until the app is finished.

Thanks,
Mayank

 Yarn CLI only shows running containers for Running Applications
 ---

 Key: YARN-1794
 URL: https://issues.apache.org/jira/browse/YARN-1794
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Mayank Bansal
Assignee: Mayank Bansal





--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1389) ApplicationClientProtocol and ApplicationHistoryProtocol should expose analogous APIs

2014-03-06 Thread Mayank Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13923301#comment-13923301
 ] 

Mayank Bansal commented on YARN-1389:
-

Opened YARN-1794 to track this issue.

Thanks,
Mayank

 ApplicationClientProtocol and ApplicationHistoryProtocol should expose 
 analogous APIs
 -

 Key: YARN-1389
 URL: https://issues.apache.org/jira/browse/YARN-1389
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Mayank Bansal
Assignee: Mayank Bansal
 Attachments: YARN-1389-1.patch, YARN-1389-2.patch, YARN-1389-3.patch, 
 YARN-1389-4.patch, YARN-1389-5.patch


 As we plan to have the APIs in ApplicationHistoryProtocol to expose the 
 reports of *finished* application attempts and containers, we should do the 
 same for ApplicationClientProtocol, which will return the reports of 
 *running* attempts and containers.
 Later on, we can improve YarnClient to direct the query of running instance 
 to ApplicationClientProtocol, while that of finished instance to 
 ApplicationHistoryProtocol, making it transparent to the users.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1621) Add CLI to list states of yarn container-IDs/hosts

2014-03-06 Thread Mayank Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13923302#comment-13923302
 ] 

Mayank Bansal commented on YARN-1621:
-

It should be covered by YARN-1389

Thanks,
Mayank

 Add CLI to list states of yarn container-IDs/hosts
 --

 Key: YARN-1621
 URL: https://issues.apache.org/jira/browse/YARN-1621
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.2.0
Reporter: Tassapol Athiapinya
 Fix For: 2.4.0


 As more applications are moved to YARN, we need generic CLI to list states of 
 yarn containers and their hosts. Today if YARN application running in a 
 container does hang, there is no way other than to manually kill its process.
 For each running application, it is useful to differentiate between 
 running/succeeded/failed/killed containers. 
 {code:title=proposed yarn cli}
 $ yarn application -list-containers appId status
 where status is one of running/succeeded/killed/failed/all
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1790) FairSchedule UI not showing apps table

2014-03-06 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13923306#comment-13923306
 ] 

Sandy Ryza commented on YARN-1790:
--

+1

 FairSchedule UI not showing apps table
 --

 Key: YARN-1790
 URL: https://issues.apache.org/jira/browse/YARN-1790
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.3.0
Reporter: bc Wong
Assignee: bc Wong
 Attachments: 
 0001-YARN-1790.-FairScheduler-UI-not-showing-apps-table.patch, fs_ui.png, 
 fs_ui_fixed.png


 There is a running job, which shows up in the summary table in the 
 FairScheduler UI, the queue display, etc. Just not in the apps table at the 
 bottom.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1525) Web UI should redirect to active RM when HA is enabled.

2014-03-06 Thread Cindy Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cindy Li updated YARN-1525:
---

Attachment: YARN1525.secure.v10.patch

Made changes according to Karthik's comments.

 Web UI should redirect to active RM when HA is enabled.
 ---

 Key: YARN-1525
 URL: https://issues.apache.org/jira/browse/YARN-1525
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Cindy Li
 Attachments: YARN1525.patch, YARN1525.patch, YARN1525.patch, 
 YARN1525.patch, YARN1525.patch.v1, YARN1525.patch.v2, YARN1525.patch.v3, 
 YARN1525.secure.v10.patch, YARN1525.v7.patch, YARN1525.v7.patch, 
 YARN1525.v8.patch, YARN1525.v9.patch, Yarn1525.secure.patch, 
 Yarn1525.secure.patch


 When failover happens, web UI should redirect to the current active rm.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1771) many getFileStatus calls made from node manager for localizing a public distributed cache resource

2014-03-06 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-1771:
--

Attachment: yarn-1771.patch

Fixed javadoc.

 many getFileStatus calls made from node manager for localizing a public 
 distributed cache resource
 --

 Key: YARN-1771
 URL: https://issues.apache.org/jira/browse/YARN-1771
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.3.0
Reporter: Sangjin Lee
Assignee: Sangjin Lee
Priority: Critical
 Attachments: yarn-1771.patch, yarn-1771.patch


 We're observing that the getFileStatus calls are putting a fair amount of 
 load on the name node as part of checking the public-ness for localizing a 
 resource that belongs in the public cache.
 We see 7 getFileStatus calls made for each of these resources. We should look 
 into reducing the number of calls to the name node. One example:
 {noformat}
 2014-02-27 18:07:27,351 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
 2014-02-27 18:07:27,352 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
 2014-02-27 18:07:27,352 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724/tmp883330348 ...
 2014-02-27 18:07:27,353 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724 ...
 2014-02-27 18:07:27,353 INFO audit: ... cmd=getfileinfo   src=/tmp ...
 2014-02-27 18:07:27,354 INFO audit: ... cmd=getfileinfo   src=/...
 2014-02-27 18:07:27,354 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
 2014-02-27 18:07:27,355 INFO audit: ... cmd=open  
 src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-1795) Oozie tests are flakey after YARN-713

2014-03-06 Thread Robert Kanter (JIRA)
Robert Kanter created YARN-1795:
---

 Summary: Oozie tests are flakey after YARN-713
 Key: YARN-1795
 URL: https://issues.apache.org/jira/browse/YARN-1795
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Robert Kanter


Running the Oozie unit tests against a Hadoop build with YARN-713 causes many 
of the tests to be flakey.  Doing some digging, I found that they were failing 
because some of the MR jobs were failing; I found this in the syslog of the 
failed jobs:
{noformat}
2014-03-05 16:18:23,452 INFO [AsyncDispatcher event handler] 
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report 
from attempt_1394064846476_0013_m_00_0: Container launch failed for 
container_1394064846476_0013_01_03 : 
org.apache.hadoop.security.token.SecretManager$InvalidToken: No NMToken sent 
for 192.168.1.77:50759
   at 
org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.newProxy(ContainerManagementProtocolProxy.java:206)
   at 
org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.init(ContainerManagementProtocolProxy.java:196)
   at 
org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy.getProxy(ContainerManagementProtocolProxy.java:117)
   at 
org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.getCMProxy(ContainerLauncherImpl.java:403)
   at 
org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:138)
   at 
org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:369)
   at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:744)
{noformat}

I did some debugging and found that the NMTokenCache has a different port 
number than what's being looked up.  For example, the NMTokenCache had one 
token with address 192.168.1.77:58217 but 
ContainerManagementProtocolProxy.java:119 is looking for 192.168.1.77:58213. 
The 58213 address comes from ContainerLauncherImpl's constructor. So when the 
Container is being launched it somehow has a different port than when the token 
was created.

Any ideas why the port numbers wouldn't match?
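
As a toy illustration of the symptom (plain Java, not YARN code; the addresses 
are taken from the numbers above), a cache keyed by the exact host:port string 
misses when the launcher later asks for the same host under a different port:
{code}
import java.util.HashMap;
import java.util.Map;

public class TokenKeyMismatch {
  public static void main(String[] args) {
    // Token stored under the port the AM was told about at allocation time.
    Map<String, String> nmTokens = new HashMap<String, String>();
    nmTokens.put("192.168.1.77:58217", "nm-token");

    // Port the ContainerLauncher ends up asking for when launching.
    String lookupKey = "192.168.1.77:58213";

    if (!nmTokens.containsKey(lookupKey)) {
      // Mirrors the InvalidToken message seen in the syslog above.
      System.err.println("No NMToken sent for " + lookupKey);
    }
  }
}
{code}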



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1341) Recover NMTokens upon nodemanager restart

2014-03-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13923372#comment-13923372
 ] 

Hadoop QA commented on YARN-1341:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12633265/YARN-1341v2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3285//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3285//console

This message is automatically generated.

 Recover NMTokens upon nodemanager restart
 -

 Key: YARN-1341
 URL: https://issues.apache.org/jira/browse/YARN-1341
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.3.0
Reporter: Jason Lowe
Assignee: Jason Lowe
 Attachments: YARN-1341.patch, YARN-1341v2.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1795) Oozie tests are flakey after YARN-713

2014-03-06 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-1795:
---

Priority: Critical  (was: Major)

 Oozie tests are flakey after YARN-713
 -

 Key: YARN-1795
 URL: https://issues.apache.org/jira/browse/YARN-1795
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Robert Kanter
Priority: Critical

 Running the Oozie unit tests against a Hadoop build with YARN-713 causes many 
 of the tests to be flakey.  Doing some digging, I found that they were 
 failing because some of the MR jobs were failing; I found this in the syslog 
 of the failed jobs:
 {noformat}
 2014-03-05 16:18:23,452 INFO [AsyncDispatcher event handler] 
 org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics 
 report from attempt_1394064846476_0013_m_00_0: Container launch failed 
 for container_1394064846476_0013_01_03 : 
 org.apache.hadoop.security.token.SecretManager$InvalidToken: No NMToken sent 
 for 192.168.1.77:50759
at 
 org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.newProxy(ContainerManagementProtocolProxy.java:206)
at 
 org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.init(ContainerManagementProtocolProxy.java:196)
at 
 org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy.getProxy(ContainerManagementProtocolProxy.java:117)
at 
 org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.getCMProxy(ContainerLauncherImpl.java:403)
at 
 org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:138)
at 
 org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:369)
at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
 {noformat}
 I did some debugging and found that the NMTokenCache has a different port 
 number than what's being looked up.  For example, the NMTokenCache had one 
 token with address 192.168.1.77:58217 but 
 ContainerManagementProtocolProxy.java:119 is looking for 192.168.1.77:58213. 
 The 58213 address comes from ContainerLauncherImpl's constructor. So when the 
 Container is being launched it somehow has a different port than when the 
 token was created.
 Any ideas why the port numbers wouldn't match?



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1795) Oozie tests are flakey after YARN-713

2014-03-06 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-1795:
---

Target Version/s: 2.4.0

 Oozie tests are flakey after YARN-713
 -

 Key: YARN-1795
 URL: https://issues.apache.org/jira/browse/YARN-1795
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Robert Kanter
Priority: Critical

 Running the Oozie unit tests against a Hadoop build with YARN-713 causes many 
 of the tests to be flakey.  Doing some digging, I found that they were 
 failing because some of the MR jobs were failing; I found this in the syslog 
 of the failed jobs:
 {noformat}
 2014-03-05 16:18:23,452 INFO [AsyncDispatcher event handler] 
 org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics 
 report from attempt_1394064846476_0013_m_00_0: Container launch failed 
 for container_1394064846476_0013_01_03 : 
 org.apache.hadoop.security.token.SecretManager$InvalidToken: No NMToken sent 
 for 192.168.1.77:50759
at 
 org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.newProxy(ContainerManagementProtocolProxy.java:206)
at 
 org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.init(ContainerManagementProtocolProxy.java:196)
at 
 org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy.getProxy(ContainerManagementProtocolProxy.java:117)
at 
 org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.getCMProxy(ContainerLauncherImpl.java:403)
at 
 org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:138)
at 
 org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:369)
at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
 {noformat}
 I did some debugging and found that the NMTokenCache has a different port 
 number than what's being looked up.  For example, the NMTokenCache had one 
 token with address 192.168.1.77:58217 but 
 ContainerManagementProtocolProxy.java:119 is looking for 192.168.1.77:58213. 
 The 58213 address comes from ContainerLauncherImpl's constructor. So when the 
 Container is being launched it somehow has a different port than when the 
 token was created.
 Any ideas why the port numbers wouldn't match?



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1796) container-executor shouldn't require o-r permissions

2014-03-06 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated YARN-1796:
-

Attachment: YARN-1796.patch

Simple patch attached to relax the mode check in the container-executor. This 
patch also takes the liberty of fixing an inaccurate code comment that was 
nearby.

 container-executor shouldn't require o-r permissions
 

 Key: YARN-1796
 URL: https://issues.apache.org/jira/browse/YARN-1796
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.4.0
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers
Priority: Minor
 Attachments: YARN-1796.patch


 The container-executor currently checks that other users don't have read 
 permissions. This is unnecessary and runs contrary to the Debian packaging 
 policy manual.
 This is the analogous fix for YARN that was done for MR1 in MAPREDUCE-2103.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-1796) container-executor shouldn't require o-r permissions

2014-03-06 Thread Aaron T. Myers (JIRA)
Aaron T. Myers created YARN-1796:


 Summary: container-executor shouldn't require o-r permissions
 Key: YARN-1796
 URL: https://issues.apache.org/jira/browse/YARN-1796
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.4.0
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers
Priority: Minor


The container-executor currently checks that other users don't have read 
permissions. This is unnecessary and runs contrary to the Debian packaging 
policy manual.

This is the analogous fix for YARN that was done for MR1 in MAPREDUCE-2103.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1525) Web UI should redirect to active RM when HA is enabled.

2014-03-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13923395#comment-13923395
 ] 

Hadoop QA commented on YARN-1525:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12633269/YARN1525.secure.v10.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The following test timeouts occurred in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

org.apache.hadoop.yarn.client.api.impl.TestNMClient

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3286//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3286//console

This message is automatically generated.

 Web UI should redirect to active RM when HA is enabled.
 ---

 Key: YARN-1525
 URL: https://issues.apache.org/jira/browse/YARN-1525
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Cindy Li
 Attachments: YARN1525.patch, YARN1525.patch, YARN1525.patch, 
 YARN1525.patch, YARN1525.patch.v1, YARN1525.patch.v2, YARN1525.patch.v3, 
 YARN1525.secure.v10.patch, YARN1525.v7.patch, YARN1525.v7.patch, 
 YARN1525.v8.patch, YARN1525.v9.patch, Yarn1525.secure.patch, 
 Yarn1525.secure.patch


 When failover happens, web UI should redirect to the current active rm.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1525) Web UI should redirect to active RM when HA is enabled.

2014-03-06 Thread Cindy Li (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13923398#comment-13923398
 ] 

Cindy Li commented on YARN-1525:


The timeout in org.apache.hadoop.yarn.client.api.impl.TestNMClient is unrelated to this patch.

 Web UI should redirect to active RM when HA is enabled.
 ---

 Key: YARN-1525
 URL: https://issues.apache.org/jira/browse/YARN-1525
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Cindy Li
 Attachments: YARN1525.patch, YARN1525.patch, YARN1525.patch, 
 YARN1525.patch, YARN1525.patch.v1, YARN1525.patch.v2, YARN1525.patch.v3, 
 YARN1525.secure.v10.patch, YARN1525.v7.patch, YARN1525.v7.patch, 
 YARN1525.v8.patch, YARN1525.v9.patch, Yarn1525.secure.patch, 
 Yarn1525.secure.patch


 When failover happens, web UI should redirect to the current active rm.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1525) Web UI should redirect to active RM when HA is enabled.

2014-03-06 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-1525:
---

Attachment: YARN1525.secure.v11.patch

Thanks Cindy. Posting a patch with cosmetic changes (formatting etc.); also, 
removed changes to ResourceTrackerPBClientImpl which seemed spurious. 

 Web UI should redirect to active RM when HA is enabled.
 ---

 Key: YARN-1525
 URL: https://issues.apache.org/jira/browse/YARN-1525
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Cindy Li
 Attachments: YARN1525.patch, YARN1525.patch, YARN1525.patch, 
 YARN1525.patch, YARN1525.patch.v1, YARN1525.patch.v2, YARN1525.patch.v3, 
 YARN1525.secure.v10.patch, YARN1525.secure.v11.patch, YARN1525.v7.patch, 
 YARN1525.v7.patch, YARN1525.v8.patch, YARN1525.v9.patch, 
 Yarn1525.secure.patch, Yarn1525.secure.patch


 When failover happens, web UI should redirect to the current active rm.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1525) Web UI should redirect to active RM when HA is enabled.

2014-03-06 Thread Cindy Li (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13923406#comment-13923406
 ] 

Cindy Li commented on YARN-1525:


That change was for another patch... OK, I should've removed it. Thanks for removing 
it.

 Web UI should redirect to active RM when HA is enabled.
 ---

 Key: YARN-1525
 URL: https://issues.apache.org/jira/browse/YARN-1525
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Cindy Li
 Attachments: YARN1525.patch, YARN1525.patch, YARN1525.patch, 
 YARN1525.patch, YARN1525.patch.v1, YARN1525.patch.v2, YARN1525.patch.v3, 
 YARN1525.secure.v10.patch, YARN1525.secure.v11.patch, YARN1525.v7.patch, 
 YARN1525.v7.patch, YARN1525.v8.patch, YARN1525.v9.patch, 
 Yarn1525.secure.patch, Yarn1525.secure.patch


 When failover happens, web UI should redirect to the current active rm.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1796) container-executor shouldn't require o-r permissions

2014-03-06 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13923433#comment-13923433
 ] 

Vinod Kumar Vavilapalli commented on YARN-1796:
---

I think I originally did that code in 1.x and 2.x. I know we were being 
excessively paranoid, but I haven't seen a reason why it should be opened up 
either. Where is the problem as it exists today?

 container-executor shouldn't require o-r permissions
 

 Key: YARN-1796
 URL: https://issues.apache.org/jira/browse/YARN-1796
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.4.0
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers
Priority: Minor
 Attachments: YARN-1796.patch


 The container-executor currently checks that other users don't have read 
 permissions. This is unnecessary and runs contrary to the debian packaging 
 policy manual.
 This is the analogous fix for YARN that was done for MR1 in MAPREDUCE-2103.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1410) Handle RM fails over after getApplicationID() and before submitApplication().

2014-03-06 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13923434#comment-13923434
 ] 

Vinod Kumar Vavilapalli commented on YARN-1410:
---

Okay, I went back and reread the thread. It seems like we have diverged again. 
The approach in the latest patch doesn't appear to be the same as what Bikas and 
you agreed upon. Is that true? [~bikassaha], can you confirm whether it is fine? We 
now blindly accept appIDs generated by a previous RM. Clearly, malicious users 
could generate appIDs (which is possible even today) - but there are a couple of 
ways in which we can fix that.

Originally, it was also suggested that we add the app-ID to the SubmitResponse - 
which we aren't doing anymore, since we blindly accept IDs from previous RMs in 
the latest patch. Is that fine?
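For readers following the thread: the reason a new RM can tell an appID came from a previous RM at all is that ApplicationId embeds the minting RM's cluster timestamp. A toy sketch of the strict versus relaxed acceptance being debated (simplified types and a toy acceptance rule, not the actual RM code):

{code}
// Simplified model of why an appID from a previous RM looks "foreign":
// the ID carries the cluster timestamp of the RM that minted it.
public class AppIdCheck {
    static final class AppId {
        final long clusterTimestamp;
        final int id;
        AppId(long clusterTimestamp, int id) {
            this.clusterTimestamp = clusterTimestamp;
            this.id = id;
        }
    }

    // Strict check: only accept IDs minted by this RM instance.
    static boolean acceptStrict(AppId appId, long myClusterTimestamp) {
        return appId.clusterTimestamp == myClusterTimestamp;
    }

    // Relaxed check (roughly what the latest patch is described as doing):
    // also accept IDs minted by an earlier RM, which is what lets a client
    // survive a failover between getApplicationID() and submitApplication(),
    // at the cost discussed above.
    static boolean acceptRelaxed(AppId appId, long myClusterTimestamp) {
        return appId.clusterTimestamp <= myClusterTimestamp;
    }

    public static void main(String[] args) {
        AppId fromOldRm = new AppId(1394064846476L, 13);
        long newRmTimestamp = 1394168341541L;
        System.out.println(acceptStrict(fromOldRm, newRmTimestamp));  // false
        System.out.println(acceptRelaxed(fromOldRm, newRmTimestamp)); // true
    }
}
{code}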

 Handle RM fails over after getApplicationID() and before submitApplication().
 -

 Key: YARN-1410
 URL: https://issues.apache.org/jira/browse/YARN-1410
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Xuan Gong
 Attachments: YARN-1410-outline.patch, YARN-1410.1.patch, 
 YARN-1410.10.patch, YARN-1410.10.patch, YARN-1410.2.patch, YARN-1410.2.patch, 
 YARN-1410.3.patch, YARN-1410.4.patch, YARN-1410.5.patch, YARN-1410.6.patch, 
 YARN-1410.7.patch, YARN-1410.8.patch, YARN-1410.9.patch

   Original Estimate: 48h
  Remaining Estimate: 48h

 App submission involves
 1) creating appId
 2) using that appId to submit an ApplicationSubmissionContext to the user.
 The client may have obtained an appId from an RM, the RM may have failed 
 over, and the client may submit the app to the new RM.
 Since the new RM has a different notion of cluster timestamp (used to create 
 app id) the new RM may reject the app submission resulting in unexpected 
 failure on the client side.
 The same may happen for other 2 step client API operations.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1795) Oozie tests are flakey after YARN-713

2014-03-06 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13923457#comment-13923457
 ] 

Vinod Kumar Vavilapalli commented on YARN-1795:
---

Per [~sseth], it is likely that the ports are getting confused because this is a 
MiniYARNCluster setup where multiple NMs run on the same machine. The bug seems 
valid, but maybe the analysis isn't; I'm not completely sure either way. It would 
be useful if you can capture the RM logs specifically for this container.

 Oozie tests are flakey after YARN-713
 -

 Key: YARN-1795
 URL: https://issues.apache.org/jira/browse/YARN-1795
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Robert Kanter
Priority: Critical

 Running the Oozie unit tests against a Hadoop build with YARN-713 causes many 
 of the tests to be flakey.  Doing some digging, I found that they were 
 failing because some of the MR jobs were failing; I found this in the syslog 
 of the failed jobs:
 {noformat}
 2014-03-05 16:18:23,452 INFO [AsyncDispatcher event handler] 
 org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics 
 report from attempt_1394064846476_0013_m_00_0: Container launch failed 
 for container_1394064846476_0013_01_03 : 
 org.apache.hadoop.security.token.SecretManager$InvalidToken: No NMToken sent 
 for 192.168.1.77:50759
at 
 org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.newProxy(ContainerManagementProtocolProxy.java:206)
at 
 org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.init(ContainerManagementProtocolProxy.java:196)
at 
 org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy.getProxy(ContainerManagementProtocolProxy.java:117)
at 
 org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.getCMProxy(ContainerLauncherImpl.java:403)
at 
 org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:138)
at 
 org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:369)
at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
 {noformat}
 I did some debugging and found that the NMTokenCache has a different port 
 number than what's being looked up.  For example, the NMTokenCache had one 
 token with address 192.168.1.77:58217 but 
 ContainerManagementProtocolProxy.java:119 is looking for 192.168.1.77:58213. 
 The 58213 address comes from ContainerLauncherImpl's constructor. So when the 
 Container is being launched it somehow has a different port than when the 
 token was created.
 Any ideas why the port numbers wouldn't match?



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1342) Recover container tokens upon nodemanager restart

2014-03-06 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-1342:
-

Attachment: YARN-1342.patch

Patch to recover container tokens after a restart.  This is very similar to the 
patch for YARN-1341 but for container tokens instead of NM tokens.
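As a rough sketch of the recovery idea only (the class and file names below are illustrative, not the patch's actual NM state-store code): persist the container-token master key whenever it rolls, and re-load it on NM startup so tokens issued before the restart still verify.

{code}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Illustrative only: persist the container-token master key bytes across a
// nodemanager restart so previously issued container tokens still verify.
public class ContainerTokenKeyStore {
    private final Path stateFile;

    public ContainerTokenKeyStore(Path stateDir) {
        this.stateFile = stateDir.resolve("container-token-master-key");
    }

    // Called whenever the secret manager rolls to a new master key.
    public void storeMasterKey(byte[] keyBytes) throws IOException {
        Files.createDirectories(stateFile.getParent());
        Files.write(stateFile, keyBytes,
            StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING);
    }

    // Called once during NM startup; returns null on a fresh install.
    public byte[] loadMasterKey() throws IOException {
        return Files.exists(stateFile) ? Files.readAllBytes(stateFile) : null;
    }

    public static void main(String[] args) throws IOException {
        ContainerTokenKeyStore store =
            new ContainerTokenKeyStore(Paths.get("/tmp/nm-recovery"));
        store.storeMasterKey(new byte[] {1, 2, 3});
        System.out.println(store.loadMasterKey().length); // 3, even after a "restart"
    }
}
{code}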

 Recover container tokens upon nodemanager restart
 -

 Key: YARN-1342
 URL: https://issues.apache.org/jira/browse/YARN-1342
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.3.0
Reporter: Jason Lowe
 Attachments: YARN-1342.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1525) Web UI should redirect to active RM when HA is enabled.

2014-03-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13923492#comment-13923492
 ] 

Hadoop QA commented on YARN-1525:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12633287/YARN1525.secure.v11.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3288//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3288//console

This message is automatically generated.

 Web UI should redirect to active RM when HA is enabled.
 ---

 Key: YARN-1525
 URL: https://issues.apache.org/jira/browse/YARN-1525
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Cindy Li
 Attachments: YARN1525.patch, YARN1525.patch, YARN1525.patch, 
 YARN1525.patch, YARN1525.patch.v1, YARN1525.patch.v2, YARN1525.patch.v3, 
 YARN1525.secure.v10.patch, YARN1525.secure.v11.patch, YARN1525.v7.patch, 
 YARN1525.v7.patch, YARN1525.v8.patch, YARN1525.v9.patch, 
 Yarn1525.secure.patch, Yarn1525.secure.patch


 When failover happens, web UI should redirect to the current active rm.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1793) yarn application -kill doesn't kill UnmanagedAMs

2014-03-06 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13923491#comment-13923491
 ] 

Jian He commented on YARN-1793:
---

Took a quick look at the patch. If I remember correctly, the special check in 
isAppSafeToTerminate for unmanaged AMs is there for this reason: 
https://issues.apache.org/jira/browse/YARN-540?focusedCommentId=13762533&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13762533

 yarn application -kill doesn't kill UnmanagedAMs
 

 Key: YARN-1793
 URL: https://issues.apache.org/jira/browse/YARN-1793
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Attachments: yarn-1793-0.patch


 Trying to kill an Unmanaged AM though CLI (yarn application -kill id) logs 
 a success, but doesn't actually kill the AM or reclaim the containers 
 allocated to it.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1780) Improve logging in timeline service

2014-03-06 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13923507#comment-13923507
 ] 

Vinod Kumar Vavilapalli commented on YARN-1780:
---

This looks good, +1. Checking this in.

 Improve logging in timeline service
 ---

 Key: YARN-1780
 URL: https://issues.apache.org/jira/browse/YARN-1780
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-1780.1.patch, YARN-1780.1.patch


 It's difficult to trace whether the client has successfully posted the entity 
 to the timeline service or not.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1787) yarn applicationattempt/container print wrong usage information

2014-03-06 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13923521#comment-13923521
 ] 

Vinod Kumar Vavilapalli commented on YARN-1787:
---

I am fairly sure you broke bin/yarn application etc. with the patch; can you 
please verify?

The patch looks fine overall, other than the bin/yarn changes.

Ideally, we should split the CLI into separate classes for apps, app attempts, 
etc. Will file a ticket.

The other thing is that -queue Queue Name shouldn't be a separate option; it 
should just be an argument to -movetoqueue. Will file a ticket for that also.
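To make that last point concrete, a hedged sketch with Commons CLI (the option wiring is illustrative only, not the actual ApplicationCLI code), where -movetoqueue consumes the application ID and the target queue directly, with no separate -queue option:

{code}
import org.apache.commons.cli.CommandLine;
import org.apache.commons.cli.GnuParser;
import org.apache.commons.cli.Option;
import org.apache.commons.cli.Options;
import org.apache.commons.cli.ParseException;

// Illustrative sketch of the suggested CLI shape: -movetoqueue takes the
// application ID and the target queue as its two arguments.
public class MoveToQueueCliSketch {
    public static void main(String[] args) throws ParseException {
        Options options = new Options();
        Option move = new Option("movetoqueue", true,
            "Moves the application to a different queue: <appId> <queue>");
        move.setArgs(2);                      // consumes both values
        move.setArgName("appId> <queue");     // help output reads "<appId> <queue>"
        options.addOption(move);

        CommandLine cl = new GnuParser().parse(options, args);
        if (cl.hasOption("movetoqueue")) {
            String[] vals = cl.getOptionValues("movetoqueue");
            System.out.println("move " + vals[0] + " to queue " + vals[1]);
        }
    }
}
{code}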



 yarn applicationattempt/container print wrong usage information
 ---

 Key: YARN-1787
 URL: https://issues.apache.org/jira/browse/YARN-1787
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: ApplicationCLI.java.rej, YARN-1787.1.patch, 
 YARN-1787.2.patch


 yarn applicationattempt prints:
 {code}
 Invalid Command Usage : 
 usage: application
  -appStates States Works with -list to filter applications
  based on input comma-separated list of
  application states. The valid application
  state can be one of the following:
  ALL,NEW,NEW_SAVING,SUBMITTED,ACCEPTED,RUN
  NING,FINISHED,FAILED,KILLED
  -appTypes Types   Works with -list to filter applications
  based on input comma-separated list of
  application types.
  -help   Displays help for all commands.
  -kill Application ID  Kills the application.
  -list arg List application attempts for aplication
  from AHS.
  -movetoqueue Application ID   Moves the application to a different
  queue.
  -queue Queue Name Works with the movetoqueue command to
  specify which queue to move an
  application to.
  -status Application IDPrints the status of the application.
 {code}
 yarn container prints:
 {code}
 Invalid Command Usage : 
 usage: application
  -appStates States Works with -list to filter applications
  based on input comma-separated list of
  application states. The valid application
  state can be one of the following:
  ALL,NEW,NEW_SAVING,SUBMITTED,ACCEPTED,RUN
  NING,FINISHED,FAILED,KILLED
  -appTypes Types   Works with -list to filter applications
  based on input comma-separated list of
  application types.
  -help   Displays help for all commands.
  -kill Application ID  Kills the application.
  -list arg List application attempts for aplication
  from AHS.
  -movetoqueue Application ID   Moves the application to a different
  queue.
  -queue Queue Name Works with the movetoqueue command to
  specify which queue to move an
  application to.
  -status Application IDPrints the status of the application.
 {code}
 Both commands print irrelevant yarn application usage information.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1795) Oozie tests are flakey after YARN-713

2014-03-06 Thread Robert Kanter (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kanter updated YARN-1795:


Attachment: syslog

org.apache.oozie.action.hadoop.TestMapReduceActionExecutor-output.txt

I've attached the output from one of the tests; the RM logs are intermixed in 
it, but it's easy to just grep for the container in question. 
 
I've also attached the syslog from one of the containers 
({{container_1394161202967_0004_01_04}}) that had the problem.  I modified 
the NMTokenCache to print out the tokens whenever getToken is called, so that's 
in there too.  

 Oozie tests are flakey after YARN-713
 -

 Key: YARN-1795
 URL: https://issues.apache.org/jira/browse/YARN-1795
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Robert Kanter
Priority: Critical
 Attachments: 
 org.apache.oozie.action.hadoop.TestMapReduceActionExecutor-output.txt, syslog


 Running the Oozie unit tests against a Hadoop build with YARN-713 causes many 
 of the tests to be flakey.  Doing some digging, I found that they were 
 failing because some of the MR jobs were failing; I found this in the syslog 
 of the failed jobs:
 {noformat}
 2014-03-05 16:18:23,452 INFO [AsyncDispatcher event handler] 
 org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics 
 report from attempt_1394064846476_0013_m_00_0: Container launch failed 
 for container_1394064846476_0013_01_03 : 
 org.apache.hadoop.security.token.SecretManager$InvalidToken: No NMToken sent 
 for 192.168.1.77:50759
at 
 org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.newProxy(ContainerManagementProtocolProxy.java:206)
at 
 org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.init(ContainerManagementProtocolProxy.java:196)
at 
 org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy.getProxy(ContainerManagementProtocolProxy.java:117)
at 
 org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.getCMProxy(ContainerLauncherImpl.java:403)
at 
 org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:138)
at 
 org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:369)
at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
 {noformat}
 I did some debugging and found that the NMTokenCache has a different port 
 number than what's being looked up.  For example, the NMTokenCache had one 
 token with address 192.168.1.77:58217 but 
 ContainerManagementProtocolProxy.java:119 is looking for 192.168.1.77:58213. 
 The 58213 address comes from ContainerLauncherImpl's constructor. So when the 
 Container is being launched it somehow has a different port than when the 
 token was created.
 Any ideas why the port numbers wouldn't match?



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1525) Web UI should redirect to active RM when HA is enabled.

2014-03-06 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13923531#comment-13923531
 ] 

Karthik Kambatla commented on YARN-1525:


Thanks Cindy. +1 on the latest patch. Committing this shortly. We can address 
any improvements in follow-up JIRAs.

 Web UI should redirect to active RM when HA is enabled.
 ---

 Key: YARN-1525
 URL: https://issues.apache.org/jira/browse/YARN-1525
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Cindy Li
 Attachments: YARN1525.patch, YARN1525.patch, YARN1525.patch, 
 YARN1525.patch, YARN1525.patch.v1, YARN1525.patch.v2, YARN1525.patch.v3, 
 YARN1525.secure.v10.patch, YARN1525.secure.v11.patch, YARN1525.v7.patch, 
 YARN1525.v7.patch, YARN1525.v8.patch, YARN1525.v9.patch, 
 Yarn1525.secure.patch, Yarn1525.secure.patch


 When failover happens, web UI should redirect to the current active rm.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1389) ApplicationClientProtocol and ApplicationHistoryProtocol should expose analogous APIs

2014-03-06 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-1389:
--

Target Version/s: 2.4.0

This is an important part of the generic-history feature for 2.4.

 ApplicationClientProtocol and ApplicationHistoryProtocol should expose 
 analogous APIs
 -

 Key: YARN-1389
 URL: https://issues.apache.org/jira/browse/YARN-1389
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Mayank Bansal
Assignee: Mayank Bansal
 Attachments: YARN-1389-1.patch, YARN-1389-2.patch, YARN-1389-3.patch, 
 YARN-1389-4.patch, YARN-1389-5.patch


 As we plan to have the APIs in ApplicationHistoryProtocol to expose the 
 reports of *finished* application attempts and containers, we should do the 
 same for ApplicationClientProtocol, which will return the reports of 
 *running* attempts and containers.
 Later on, we can improve YarnClient to direct the query of running instance 
 to ApplicationClientProtocol, while that of finished instance to 
 ApplicationHistoryProtocol, making it transparent to the users.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1621) Add CLI to list states of yarn container-IDs/hosts

2014-03-06 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13923540#comment-13923540
 ] 

Vinod Kumar Vavilapalli commented on YARN-1621:
---

It doesn't look like YARN-1389 is tracking filters for containers, so we need 
to track this separately.

 Add CLI to list states of yarn container-IDs/hosts
 --

 Key: YARN-1621
 URL: https://issues.apache.org/jira/browse/YARN-1621
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 2.2.0
Reporter: Tassapol Athiapinya
 Fix For: 2.4.0


 As more applications are moved to YARN, we need generic CLI to list states of 
 yarn containers and their hosts. Today if YARN application running in a 
 container does hang, there is no way other than to manually kill its process.
 For each running application, it is useful to differentiate between 
 running/succeeded/failed/killed containers. 
 {code:title=proposed yarn cli}
 $ yarn application -list-containers appId status
 where status is one of running/succeeded/killed/failed/all
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-1797) TestNodeManagerResync,jjjjjjjjjjjjj

2014-03-06 Thread Tsuyoshi OZAWA (JIRA)
Tsuyoshi OZAWA created YARN-1797:


 Summary: TestNodeManagerResync,j
 Key: YARN-1797
 URL: https://issues.apache.org/jira/browse/YARN-1797
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Tsuyoshi OZAWA






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-1798) TestContainerLaunch, TestContainersMonitor, TestNodeManagerShutdown, TestNodeStatusUpdater fails on Linux

2014-03-06 Thread Tsuyoshi OZAWA (JIRA)
Tsuyoshi OZAWA created YARN-1798:


 Summary: TestContainerLaunch, TestContainersMonitor, 
TestNodeManagerShutdown, TestNodeStatusUpdater fails on Linux
 Key: YARN-1798
 URL: https://issues.apache.org/jira/browse/YARN-1798
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Tsuyoshi OZAWA






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1798) TestContainerLaunch, TestContainersMonitor, TestNodeManagerShutdown, TestNodeStatusUpdater fails on Linux

2014-03-06 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13923545#comment-13923545
 ] 

Tsuyoshi OZAWA commented on YARN-1798:
--

Here is the failure summary from a local test run:

{code} 
Failed tests:   


  TestContainerLaunch.testDelayedKill:723
 -internalKillTest:679
 -BaseContainerManagerTest.waitForContainerState:254
 -BaseContainerManagerTest.waitForContainerState:276 ContainerState is not 
correct (timedout) expected:COMPLETE but was:RUNNING
  
TestContainerLaunch.testImmediateKill:728-internalKillTest:679-BaseContainerManagerTest.waitForContainerState:254
 -BaseContainerManagerTest.waitForContainerState:276 ContainerState is not 
correct (timedout) expected:COMPLETE but was:RUNNING
  TestContainerLaunch.testContainerEnvVariables:557 Process is not alive!
  TestContainerManager.testContainerLaunchAndStop:333 Process is not alive!
  TestContainersMonitor.testContainerKillOnMemoryOverflow:273 expected:143 
but was:0 
  TestNodeManagerShutdown.testKillContainersOnShutdown:153 Did not find sigterm 
message
  TestNodeStatusUpdater.testNodeStatusUpdaterRetryAndNMShutdown:1186 Containers 
not cleaned up when NM stopped

Tests in error: 


  TestNodeManagerResync.testKillContainersOnResync:91 ? Metrics Metrics source 
J...
 



Tests run: 203, Failures: 7, Errors: 1, Skipped: 1  
{code}



 TestContainerLaunch, TestContainersMonitor, TestNodeManagerShutdown, 
 TestNodeStatusUpdater fails on Linux
 -

 Key: YARN-1798
 URL: https://issues.apache.org/jira/browse/YARN-1798
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Tsuyoshi OZAWA





--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1798) TestContainerLaunch, TestContainersMonitor, TestNodeManagerShutdown, TestNodeStatusUpdater fails on Linux

2014-03-06 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13923546#comment-13923546
 ] 

Tsuyoshi OZAWA commented on YARN-1798:
--

Here is a result of test execution locally. 

 TestContainerLaunch, TestContainersMonitor, TestNodeManagerShutdown, 
 TestNodeStatusUpdater fails on Linux
 -

 Key: YARN-1798
 URL: https://issues.apache.org/jira/browse/YARN-1798
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Tsuyoshi OZAWA





--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1795) Oozie tests are flakey after YARN-713

2014-03-06 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13923552#comment-13923552
 ] 

Robert Kanter commented on YARN-1795:
-

Looking at the printouts I added to the NMTokenCache, I think I've figured out 
some more:
During the tests, we run 2 NodeManagers.  With YARN-713, the NMTokenCache only 
ever has 1 token in it; in the cases where a container is trying to use one NM 
and the token is for the other, we get the InvalidToken error.  I tried 
running without YARN-713, and the NMTokenCache usually has 2 tokens in it, so 
the containers are able to find the token in the NMTokenCache.  

I haven't had a chance to look into it more yet, but I did notice that YARN-713 
changes NMTokenSecretManagerInRM's createAndGetNMTokens method, which returns a 
list of tokens, to createAndGetNMToken, which returns a single token.  
Perhaps that has something to do with this?  
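If that is indeed the change, a toy model of the suspected regression (simplified; not the actual RM or AMRMClient code) is that only one NM token per response ever reaches the cache, so with two NodeManagers the lookup for the second NM misses:

{code}
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of the suspected regression: if the RM hands back one NM token
// per allocate response instead of a list covering every NM it allocated on,
// the AM side ends up knowing about only one of the two MiniYARNCluster NMs.
public class TwoNodeManagerTokenDemo {
    public static void main(String[] args) {
        Map<String, String> nmTokenCache = new HashMap<>();

        // Containers were allocated on both NMs...
        List<String> allocatedOn = Arrays.asList(
            "192.168.1.77:58217", "192.168.1.77:58213");

        // ...but only a single token came back with the response.
        nmTokenCache.put("192.168.1.77:58217", "nm-token");

        for (String nmAddr : allocatedOn) {
            if (!nmTokenCache.containsKey(nmAddr)) {
                // This is the "No NMToken sent for <addr>" situation.
                System.out.println("No NMToken sent for " + nmAddr);
            }
        }
    }
}
{code}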

 Oozie tests are flakey after YARN-713
 -

 Key: YARN-1795
 URL: https://issues.apache.org/jira/browse/YARN-1795
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Robert Kanter
Priority: Critical
 Attachments: 
 org.apache.oozie.action.hadoop.TestMapReduceActionExecutor-output.txt, syslog


 Running the Oozie unit tests against a Hadoop build with YARN-713 causes many 
 of the tests to be flakey.  Doing some digging, I found that they were 
 failing because some of the MR jobs were failing; I found this in the syslog 
 of the failed jobs:
 {noformat}
 2014-03-05 16:18:23,452 INFO [AsyncDispatcher event handler] 
 org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics 
 report from attempt_1394064846476_0013_m_00_0: Container launch failed 
 for container_1394064846476_0013_01_03 : 
 org.apache.hadoop.security.token.SecretManager$InvalidToken: No NMToken sent 
 for 192.168.1.77:50759
at 
 org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.newProxy(ContainerManagementProtocolProxy.java:206)
at 
 org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.init(ContainerManagementProtocolProxy.java:196)
at 
 org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy.getProxy(ContainerManagementProtocolProxy.java:117)
at 
 org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.getCMProxy(ContainerLauncherImpl.java:403)
at 
 org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:138)
at 
 org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:369)
at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
 {noformat}
 I did some debugging and found that the NMTokenCache has a different port 
 number than what's being looked up.  For example, the NMTokenCache had one 
 token with address 192.168.1.77:58217 but 
 ContainerManagementProtocolProxy.java:119 is looking for 192.168.1.77:58213. 
 The 58213 address comes from ContainerLauncherImpl's constructor. So when the 
 Container is being launched it somehow has a different port than when the 
 token was created.
 Any ideas why the port numbers wouldn't match?



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1525) Web UI should redirect to active RM when HA is enabled.

2014-03-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13923551#comment-13923551
 ] 

Hadoop QA commented on YARN-1525:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12633287/YARN1525.secure.v11.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3291//console

This message is automatically generated.

 Web UI should redirect to active RM when HA is enabled.
 ---

 Key: YARN-1525
 URL: https://issues.apache.org/jira/browse/YARN-1525
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Cindy Li
 Attachments: YARN1525.patch, YARN1525.patch, YARN1525.patch, 
 YARN1525.patch, YARN1525.patch.v1, YARN1525.patch.v2, YARN1525.patch.v3, 
 YARN1525.secure.v10.patch, YARN1525.secure.v11.patch, YARN1525.v7.patch, 
 YARN1525.v7.patch, YARN1525.v8.patch, YARN1525.v9.patch, 
 Yarn1525.secure.patch, Yarn1525.secure.patch


 When failover happens, web UI should redirect to the current active rm.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-1799) Enhance LocalDirAllocator in NM to consider DiskMaxUtilization cutoff

2014-03-06 Thread Sunil G (JIRA)
Sunil G created YARN-1799:
-

 Summary: Enhance LocalDirAllocator in NM to consider 
DiskMaxUtilization cutoff
 Key: YARN-1799
 URL: https://issues.apache.org/jira/browse/YARN-1799
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.3.0
Reporter: Sunil G


LocalDirAllocator provides paths to all tasks for their local writes.
It considers the good list of directories selected by the health-check 
mechanism in LocalDirsHandlerService.

getLocalPathForWrite() checks whether the requested size can be met by the 
capacity of the last-accessed directory.
If more tasks ask LocalDirAllocator for paths, the allocation is done based 
on the disk availability at that instant.
But the same path may already have been given to other tasks, which may 
still be writing to it sequentially.

It is better to also check against an upper cutoff on disk utilization.
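A minimal sketch of the kind of cutoff being proposed (the threshold value and method names are illustrative, not an actual LocalDirAllocator API):

{code}
import java.io.File;

// Illustrative only: pick a local directory for a write, but skip any
// directory whose projected utilization would exceed an upper cutoff,
// rather than trusting the instantaneous free-space number alone.
public class DirCutoffSketch {
    private static final double MAX_UTILIZATION = 0.90; // assumed cutoff

    public static File pickDirForWrite(File[] localDirs, long requestedBytes) {
        for (File dir : localDirs) {
            long total = dir.getTotalSpace();
            long usableAfterWrite = dir.getUsableSpace() - requestedBytes;
            if (total <= 0 || usableAfterWrite <= 0) {
                continue; // cannot satisfy the request at all
            }
            double projectedUtilization = 1.0 - (double) usableAfterWrite / total;
            if (projectedUtilization <= MAX_UTILIZATION) {
                return dir; // leaves headroom for writers already in flight
            }
        }
        return null; // no directory within the cutoff
    }

    public static void main(String[] args) {
        File picked = pickDirForWrite(new File[] { new File("/tmp") }, 1024L);
        System.out.println(picked);
    }
}
{code}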



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1781) NM should allow users to specify max disk utilization for local disks

2014-03-06 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13923563#comment-13923563
 ] 

Sunil G commented on YARN-1781:
---

I have created a separate JIRA YARN-1799 as per the comment from Vinod.

 NM should allow users to specify max disk utilization for local disks
 -

 Key: YARN-1781
 URL: https://issues.apache.org/jira/browse/YARN-1781
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-yarn-1781.0.patch, apache-yarn-1781.1.patch, 
 apache-yarn-1781.2.patch


 This is related to YARN-257 (it's probably a sub-task?). Currently, the NM 
 does not detect full disks and allows full disks to be used by containers, 
 leading to repeated failures. YARN-257 deals with graceful handling of full 
 disks. This ticket is only about detection of full disks by the disk health 
 checkers.
 The NM should allow users to set a maximum disk utilization for local disks 
 and mark disks as bad once they exceed that utilization. At the very least, 
 the NM should detect full disks.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1780) Improve logging in timeline service

2014-03-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13923564#comment-13923564
 ] 

Hudson commented on YARN-1780:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5280 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5280/])
YARN-1780. Improved logging in the Timeline client and server. Contributed by 
Zhijie Shen. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1575141)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/TimelineClientImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestTimelineClient.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/TimelineWebServices.java


 Improve logging in timeline service
 ---

 Key: YARN-1780
 URL: https://issues.apache.org/jira/browse/YARN-1780
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Fix For: 2.4.0

 Attachments: YARN-1780.1.patch, YARN-1780.1.patch


 It's difficult to trace whether the client has successfully posted the entity 
 to the timeline service or not.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1793) yarn application -kill doesn't kill UnmanagedAMs

2014-03-06 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-1793:
---

Attachment: yarn-1793-1.patch

 yarn application -kill doesn't kill UnmanagedAMs
 

 Key: YARN-1793
 URL: https://issues.apache.org/jira/browse/YARN-1793
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Attachments: yarn-1793-0.patch, yarn-1793-1.patch


 Trying to kill an Unmanaged AM though CLI (yarn application -kill id) logs 
 a success, but doesn't actually kill the AM or reclaim the containers 
 allocated to it.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1764) Handle RM fail overs after the submitApplication call.

2014-03-06 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13923566#comment-13923566
 ] 

Vinod Kumar Vavilapalli commented on YARN-1764:
---

I think we should also mark getApplicationReport() as idempotent in this 
patch itself, since the RM can fail over after submitApplication() has returned 
but *during* a getApplicationReport(). We will need to add some tests for this too.
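For reference, retries across a failover key off annotations on the protocol methods; a hedged sketch of marking a method idempotent (the interface below is a simplified stand-in, only the org.apache.hadoop.io.retry.Idempotent annotation is real):

{code}
import java.io.IOException;
import org.apache.hadoop.io.retry.Idempotent;

// Simplified stand-in for the client protocol: methods tagged @Idempotent
// may be transparently retried against the new RM after a failover.
public interface SketchClientProtocol {
    @Idempotent
    String getApplicationReport(String appId) throws IOException;
}
{code}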

 Handle RM fail overs after the submitApplication call.
 --

 Key: YARN-1764
 URL: https://issues.apache.org/jira/browse/YARN-1764
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-1764.1.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1793) yarn application -kill doesn't kill UnmanagedAMs

2014-03-06 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13923568#comment-13923568
 ] 

Karthik Kambatla commented on YARN-1793:


Thanks for digging up the reason, [~jianhe]. Can you take a look at the updated 
patch?

Looked into this more carefully. The updated patch changes both ClientRMService 
and ApplicationMasterService. I believe we have three cases based on the state and 
kind of application:
* Applications that have already reached a final state - do nothing, trivially 
log success.
* Applications that aren't in a final state yet - kill / unregister the 
application:
** UnmanagedAM - falsely acknowledge the kill / unregister so they don't retry
** ManagedAM - return false, so they keep retrying

Submitted the patch to see what Jenkins has to say. Still need to add unit 
tests.
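A compact sketch of the three cases above (the names are illustrative; they do not match the real ClientRMService/RMApp code):

{code}
// Illustrative decision logic only; names do not match the real RM classes.
public class KillDecisionSketch {
    enum Outcome { NOOP_SUCCESS, KILL_AND_ACK, KILL_AND_RETRY }

    static Outcome onKillRequest(boolean inFinalState, boolean unmanagedAM) {
        if (inFinalState) {
            return Outcome.NOOP_SUCCESS;   // already done, just log success
        }
        if (unmanagedAM) {
            return Outcome.KILL_AND_ACK;   // acknowledge so the client stops retrying
        }
        return Outcome.KILL_AND_RETRY;     // managed AM: report not-done so the client retries
    }

    public static void main(String[] args) {
        System.out.println(onKillRequest(true, false));   // NOOP_SUCCESS
        System.out.println(onKillRequest(false, true));   // KILL_AND_ACK
        System.out.println(onKillRequest(false, false));  // KILL_AND_RETRY
    }
}
{code}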



 yarn application -kill doesn't kill UnmanagedAMs
 

 Key: YARN-1793
 URL: https://issues.apache.org/jira/browse/YARN-1793
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Attachments: yarn-1793-0.patch, yarn-1793-1.patch


 Trying to kill an Unmanaged AM though CLI (yarn application -kill id) logs 
 a success, but doesn't actually kill the AM or reclaim the containers 
 allocated to it.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1389) ApplicationClientProtocol and ApplicationHistoryProtocol should expose analogous APIs

2014-03-06 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13923569#comment-13923569
 ] 

Zhijie Shen commented on YARN-1389:
---

Thanks for the update, Mayank! The patch is generally fine. Here are some 
additional comments:

1. is it simpler to use e instanceof NotFoundException?
{code}
+  // Even if history-service is enabled, treat all exceptions still the 
same
+  // except the following
+  if (e.getClass() != ApplicationNotFoundException.class
+      && e.getClass() != ApplicationAttemptNotFoundException.class) {
+throw e;
+  }
{code}
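For comparison, one way the instanceof form could look. The stand-in exception hierarchy below is assumed for illustration only; it presupposes a common NotFoundException superclass, which is what the question above hinges on:

{code}
public class NotFoundFilterSketch {
    // Stand-in hierarchy for illustration only; not the actual YARN classes.
    static class NotFoundException extends Exception {}
    static class ApplicationNotFoundException extends NotFoundException {}
    static class ApplicationAttemptNotFoundException extends NotFoundException {}

    static void rethrowUnlessNotFound(Exception e) throws Exception {
        // Equivalent to the two explicit getClass() comparisons above,
        // but also tolerant of further NotFoundException subclasses.
        if (!(e instanceof NotFoundException)) {
            throw e;
        }
        // else: swallow and fall through to the history service
    }

    public static void main(String[] args) throws Exception {
        rethrowUnlessNotFound(new ApplicationAttemptNotFoundException()); // swallowed
        rethrowUnlessNotFound(new IllegalStateException("boom"));         // rethrown
    }
}
{code}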

2. getFinishedStatus() is not necessary. You can directly do when() on 
getDiagnostics/getExitStatus/getState.
{code}
+ContainerStatus cs = mock(ContainerStatus.class);
+when(containerimpl.getFinishedStatus()).thenReturn(cs);
+when(containerimpl.getFinishedStatus().getDiagnostics()).thenReturn(N/A);
+when(containerimpl.getFinishedStatus().getExitStatus()).thenReturn(0);
+when(containerimpl.getFinishedStatus().getState()).thenReturn(
+ContainerState.COMPLETE);
{code}
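One reading of that suggestion, spelled out (Mockito only; ContainerStatus and the stubbed values are taken from the snippet above, the helper class is illustrative): stub the status mock directly and return it once.

{code}
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

import org.apache.hadoop.yarn.api.records.ContainerState;
import org.apache.hadoop.yarn.api.records.ContainerStatus;

// One reading of the suggestion: stub the ContainerStatus mock directly,
// then return it once, instead of chaining every when() through
// containerimpl.getFinishedStatus().
class FinishedStatusStubSketch {
    static ContainerStatus finishedStatus() {
        ContainerStatus cs = mock(ContainerStatus.class);
        when(cs.getDiagnostics()).thenReturn("N/A");
        when(cs.getExitStatus()).thenReturn(0);
        when(cs.getState()).thenReturn(ContainerState.COMPLETE);
        return cs;
    }
}
{code}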

3. There's a lot of code duplication in TestClientRMService. You can move the 
common code into a private createClientRMService method, which is called by 
your test methods.

4. Shouldn't we remove throw new YarnException("History service is not 
enabled."); in YarnClientImpl?

5. Shouldn't we assert a failure here, because the exception is not expected? 
Similarly in other test cases.
{code}
+} catch (ApplicationNotFoundException ex) {
+  Assert.assertEquals(ex.getMessage(),
+  Application with id ' + request.getApplicationAttemptId()
+  + ' doesn't exist in RM.);
+}
{code}

In addition to that, personally, I still object to throwing 
AppAttempt/Container not-found exceptions when getting an empty appattempt or 
container list. Let's assume the history service is disabled. Then, getting 
empty applications is allowed, while getting an empty appattempt/container 
list will result in an exception. The inconsistent behavior is going to 
confuse users. In particular, it is likely that a running application doesn't 
have any appattempt yet (e.g. the app is before ACCEPTED and it is the first 
attempt).

 ApplicationClientProtocol and ApplicationHistoryProtocol should expose 
 analogous APIs
 -

 Key: YARN-1389
 URL: https://issues.apache.org/jira/browse/YARN-1389
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Mayank Bansal
Assignee: Mayank Bansal
 Attachments: YARN-1389-1.patch, YARN-1389-2.patch, YARN-1389-3.patch, 
 YARN-1389-4.patch, YARN-1389-5.patch


 As we plan to have the APIs in ApplicationHistoryProtocol to expose the 
 reports of *finished* application attempts and containers, we should do the 
 same for ApplicationClientProtocol, which will return the reports of 
 *running* attempts and containers.
 Later on, we can improve YarnClient to direct the query of running instance 
 to ApplicationClientProtocol, while that of finished instance to 
 ApplicationHistoryProtocol, making it transparent to the users.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1799) Enhance LocalDirAllocator in NM to consider DiskMaxUtilization cutoff

2014-03-06 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13923570#comment-13923570
 ] 

Sunil G commented on YARN-1799:
---

I would like to take up this JIRA.

 Enhance LocalDirAllocator in NM to consider DiskMaxUtilization cutoff
 -

 Key: YARN-1799
 URL: https://issues.apache.org/jira/browse/YARN-1799
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.3.0
Reporter: Sunil G

 LocalDirAllocator provides paths to all tasks for their local writes.
 It considers the good list of directories selected by the health-check 
 mechanism in LocalDirsHandlerService.
 getLocalPathForWrite() checks whether the requested size can be met by the 
 capacity of the last-accessed directory.
 If more tasks ask LocalDirAllocator for paths, the allocation is done based 
 on the disk availability at that instant.
 But the same path may already have been given to other tasks, which may 
 still be writing to it sequentially.
 It is better to also check against an upper cutoff on disk utilization.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1389) ApplicationClientProtocol and ApplicationHistoryProtocol should expose analogous APIs

2014-03-06 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13923574#comment-13923574
 ] 

Zhijie Shen commented on YARN-1389:
---

I've tested the patch locally. yarn applicationattempt seems to be able to 
get and list the attempts of an RM-cached application. yarn container results 
in the following crash:

{code}
zjshen-mac-pc:Deployment zshen$ yarn container -status 
container_1394168341541_0003_01_01
14/03/06 21:06:51 INFO client.RMProxy: Connecting to ResourceManager at 
localhost/127.0.0.1:9104
14/03/06 21:06:51 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
14/03/06 21:06:51 INFO client.AHSProxy: Connecting to Application History 
server at /0.0.0.0:10200
Exception in thread main java.lang.NullPointerException: 
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.getDiagnosticsInfo(RMContainerImpl.java:253)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.createContainerReport(RMContainerImpl.java:439)
at 
org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getContainerReport(ClientRMService.java:413)
at 
org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getContainerReport(ApplicationClientProtocolPBServiceImpl.java:364)
at 
org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:349)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:605)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:932)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2071)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2067)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:394)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1597)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2065)

at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at 
org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
at 
org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:107)
at 
org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getContainerReport(ApplicationClientProtocolPBClientImpl.java:375)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:189)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy10.getContainerReport(Unknown Source)
at 
org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getContainerReport(YarnClientImpl.java:519)
at 
org.apache.hadoop.yarn.client.cli.ApplicationCLI.printContainerReport(ApplicationCLI.java:292)
at 
org.apache.hadoop.yarn.client.cli.ApplicationCLI.run(ApplicationCLI.java:150)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at 
org.apache.hadoop.yarn.client.cli.ApplicationCLI.main(ApplicationCLI.java:76)
Caused by: 
org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): 
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.getDiagnosticsInfo(RMContainerImpl.java:253)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.createContainerReport(RMContainerImpl.java:439)
at 
org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getContainerReport(ClientRMService.java:413)
at 
org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getContainerReport(ApplicationClientProtocolPBServiceImpl.java:364)
at 
org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:349)
at 

  1   2   >