[jira] [Commented] (YARN-2138) Cleanup notifyDone* methods in RMStateStore
[ https://issues.apache.org/jira/browse/YARN-2138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092524#comment-14092524 ] Varun Saxena commented on YARN-2138: Kindly refer to this link: http://stackoverflow.com/questions/7268732/get-an-applicable-patch-after-doing-svn-remove-and-svn-rename

Cleanup notifyDone* methods in RMStateStore
Key: YARN-2138 URL: https://issues.apache.org/jira/browse/YARN-2138 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Varun Saxena Attachments: YARN-2138.002.patch, YARN-2138.003.patch, YARN-2138.004.patch, YARN-2138.patch
The storedException passed into notifyDoneStoringApplication is always null, and similarly for the other notifyDone* methods. We can clean up these methods, as this control-flow path is not used anymore.
[jira] [Commented] (YARN-2138) Cleanup notifyDone* methods in RMStateStore
[ https://issues.apache.org/jira/browse/YARN-2138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092538#comment-14092538 ] Hadoop QA commented on YARN-2138:
{color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12660936/YARN-2138.004.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4584//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4584//console
This message is automatically generated.
[jira] [Commented] (YARN-2361) RMAppAttempt state machine entries for KILLED state has duplicate event entries
[ https://issues.apache.org/jira/browse/YARN-2361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092602#comment-14092602 ] Hudson commented on YARN-2361: FAILURE: Integrated in Hadoop-Yarn-trunk #641 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/641/]) YARN-2361. RMAppAttempt state machine entries for KILLED state has duplicate event entries. (Zhihai Xu via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1617190)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java

RMAppAttempt state machine entries for KILLED state has duplicate event entries
Key: YARN-2361 URL: https://issues.apache.org/jira/browse/YARN-2361 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.5.0 Reporter: zhihai xu Assignee: zhihai xu Priority: Trivial Fix For: 2.6.0 Attachments: YARN-2361.000.patch
Remove the duplicate entries in the EnumSet of event types in the RMAppAttempt state machine. The event RMAppAttemptEventType.EXPIRE is duplicated in the following code:
{code}
EnumSet.of(RMAppAttemptEventType.ATTEMPT_ADDED,
    RMAppAttemptEventType.EXPIRE,
    RMAppAttemptEventType.LAUNCHED,
    RMAppAttemptEventType.LAUNCH_FAILED,
    RMAppAttemptEventType.EXPIRE,
    RMAppAttemptEventType.REGISTERED,
    RMAppAttemptEventType.CONTAINER_ALLOCATED,
    RMAppAttemptEventType.UNREGISTERED,
    RMAppAttemptEventType.KILL,
    RMAppAttemptEventType.STATUS_UPDATE))
{code}
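For reference, a minimal sketch of what the deduplicated set would look like after the fix; the surrounding state-machine transition registration is abbreviated here:
{code}
// Each event type now appears exactly once; the second
// RMAppAttemptEventType.EXPIRE has been dropped.
EnumSet.of(RMAppAttemptEventType.ATTEMPT_ADDED,
    RMAppAttemptEventType.EXPIRE,
    RMAppAttemptEventType.LAUNCHED,
    RMAppAttemptEventType.LAUNCH_FAILED,
    RMAppAttemptEventType.REGISTERED,
    RMAppAttemptEventType.CONTAINER_ALLOCATED,
    RMAppAttemptEventType.UNREGISTERED,
    RMAppAttemptEventType.KILL,
    RMAppAttemptEventType.STATUS_UPDATE)
{code}
The duplicate is harmless at runtime (EnumSet.of simply ignores repeated elements) but misleading to readers, hence the Trivial priority.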
[jira] [Commented] (YARN-2337) ResourceManager sets ClientRMService in RMContext multiple times
[ https://issues.apache.org/jira/browse/YARN-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092601#comment-14092601 ] Hudson commented on YARN-2337: FAILURE: Integrated in Hadoop-Yarn-trunk #641 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/641/]) YARN-2337. ResourceManager sets ClientRMService in RMContext multiple times. (Zhihai Xu via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1617183)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java

ResourceManager sets ClientRMService in RMContext multiple times
Key: YARN-2337 URL: https://issues.apache.org/jira/browse/YARN-2337 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.5.0 Reporter: zhihai xu Assignee: zhihai xu Priority: Trivial Labels: newbie Fix For: 2.6.0 Attachments: YARN-2337.000.patch
Remove the duplicate function call (setClientRMService) in the ResourceManager class. rmContext.setClientRMService(clientRM); is duplicated in serviceInit of ResourceManager.
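A sketch of the cleanup described (a fragment, not the full serviceInit; createClientRMService() is the factory method ResourceManager uses for this service):
{code}
// serviceInit (abbreviated): create and register ClientRMService once.
clientRM = createClientRMService();
rmContext.setClientRMService(clientRM);  // single registration
addService(clientRM);
// ...no second rmContext.setClientRMService(clientRM) call later on.
{code}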
[jira] [Updated] (YARN-2403) TestNodeManagerResync fails occasionally in trunk
[ https://issues.apache.org/jira/browse/YARN-2403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-2403: Attachment: YARN-2403.patch

TestNodeManagerResync fails occasionally in trunk
Key: YARN-2403 URL: https://issues.apache.org/jira/browse/YARN-2403 Project: Hadoop YARN Issue Type: Test Reporter: Ted Yu Priority: Minor Attachments: YARN-2403.patch
From https://builds.apache.org/job/Hadoop-Yarn-trunk/640/ :
{code}
TestNodeManagerResync.testKillContainersOnResync:112->testContainerPreservationOnResyncImpl:146 expected:<2> but was:<1>
{code}
[jira] [Commented] (YARN-2403) TestNodeManagerResync fails occasionally in trunk
[ https://issues.apache.org/jira/browse/YARN-2403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092615#comment-14092615 ] Junping Du commented on YARN-2403: The variable registrationCount should be protected by volatile in a concurrent environment. Will deliver a quick patch to fix it.
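A minimal sketch of the point being made, assuming the counter lives in the test's NM stub and is written by one thread and read by another (the surrounding class is abbreviated):
{code}
// volatile guarantees that the assert-side thread sees the most recent
// write from the registration-side thread.
private volatile int registrationCount = 0;

// If the counter were also incremented concurrently from several threads,
// an AtomicInteger would be the safer choice:
// private final AtomicInteger registrationCount = new AtomicInteger(0);
{code}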
[jira] [Commented] (YARN-2403) TestNodeManagerResync fails occasionally in trunk
[ https://issues.apache.org/jira/browse/YARN-2403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092639#comment-14092639 ] Hadoop QA commented on YARN-2403:
{color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12660964/YARN-2403.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4585//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4585//console
This message is automatically generated.
[jira] [Commented] (YARN-2337) ResourceManager sets ClientRMService in RMContext multiple times
[ https://issues.apache.org/jira/browse/YARN-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092699#comment-14092699 ] Hudson commented on YARN-2337: ABORTED: Integrated in Hadoop-Hdfs-trunk #1834 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1834/]) YARN-2337. ResourceManager sets ClientRMService in RMContext multiple times. (Zhihai Xu via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1617183)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
[jira] [Commented] (YARN-2361) RMAppAttempt state machine entries for KILLED state has duplicate event entries
[ https://issues.apache.org/jira/browse/YARN-2361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092700#comment-14092700 ] Hudson commented on YARN-2361: ABORTED: Integrated in Hadoop-Hdfs-trunk #1834 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1834/]) YARN-2361. RMAppAttempt state machine entries for KILLED state has duplicate event entries. (Zhihai Xu via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1617190)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
[jira] [Commented] (YARN-2373) WebAppUtils Should Use configuration.getPassword for Accessing SSL Passwords
[ https://issues.apache.org/jira/browse/YARN-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092714#comment-14092714 ] Varun Vasudev commented on YARN-2373: [~lmccay] none of those should be a blocker. With regards to documentation, I was referring to pages like [this|http://hadoop.apache.org/docs/r2.4.1/hadoop-mapreduce-client/hadoop-mapreduce-client-core/EncryptedShuffle.html]. [~jianhe] can you please commit the patch?

WebAppUtils Should Use configuration.getPassword for Accessing SSL Passwords
Key: YARN-2373 URL: https://issues.apache.org/jira/browse/YARN-2373 Project: Hadoop YARN Issue Type: Bug Reporter: Larry McCay Attachments: YARN-2373.patch, YARN-2373.patch, YARN-2373.patch
As part of HADOOP-10904, this jira represents a change to WebAppUtils to uptake the use of the credential provider API through the new method on Configuration called getPassword. This provides an alternative to storing the passwords in clear text within the ssl-server.xml file while maintaining backward compatibility with that behavior.
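For context, a minimal, self-contained sketch of the Configuration.getPassword pattern this jira adopts (this is not the jira's actual patch; the fallback to the clear-text value in the Configuration is what preserves backward compatibility):
{code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;

public class GetPasswordExample {
  static String getPassword(Configuration conf, String key) {
    try {
      // Consults any configured credential providers first, then falls
      // back to the clear-text value stored in the Configuration itself.
      char[] pw = conf.getPassword(key);
      return pw != null ? new String(pw) : null;
    } catch (IOException ioe) {
      return null;  // provider failure: behave as if no password is set
    }
  }

  public static void main(String[] args) {
    Configuration sslConf = new Configuration(false);
    sslConf.addResource("ssl-server.xml");
    System.out.println(getPassword(sslConf, "ssl.server.truststore.password"));
  }
}
{code}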
[jira] [Commented] (YARN-2361) RMAppAttempt state machine entries for KILLED state has duplicate event entries
[ https://issues.apache.org/jira/browse/YARN-2361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092843#comment-14092843 ] Hudson commented on YARN-2361: SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1860 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1860/]) YARN-2361. RMAppAttempt state machine entries for KILLED state has duplicate event entries. (Zhihai Xu via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1617190)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
[jira] [Commented] (YARN-2337) ResourceManager sets ClientRMService in RMContext multiple times
[ https://issues.apache.org/jira/browse/YARN-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092842#comment-14092842 ] Hudson commented on YARN-2337: SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1860 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1860/]) YARN-2337. ResourceManager sets ClientRMService in RMContext multiple times. (Zhihai Xu via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1617183)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
[jira] [Updated] (YARN-1915) ClientToAMTokenMasterKey should be provided to AM at launch time
[ https://issues.apache.org/jira/browse/YARN-1915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-1915: Attachment: YARN-1915v2.patch Fixed findbugs warning.

ClientToAMTokenMasterKey should be provided to AM at launch time
Key: YARN-1915 URL: https://issues.apache.org/jira/browse/YARN-1915 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.2.0 Reporter: Hitesh Shah Assignee: Jason Lowe Priority: Critical Attachments: YARN-1915.patch, YARN-1915v2.patch
Currently, the AM receives the key as part of registration. This introduces a race where a client can connect to the AM before the AM has received the key. Current flow:
1) AM needs to start the client listening service in order to get host:port and send it to the RM as part of registration.
2) RM gets the port info in register() and transitions the app to RUNNING. Responds back with the client secret to the AM.
3) User asks the RM for a client token, gets it, and pings the AM. The AM hasn't received the client secret from the RM, so the RPC layer itself rejects the request.
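A sketch of the launch-time alternative the summary suggests: ship the master key in the AM container's launch credentials rather than in the register() response. This is only an illustration of the idea, not the attached patch; amContainer (the AM's ContainerLaunchContext), masterKeyBytes, and the secret-key name are all assumed here.
{code}
// RM side (illustrative): place the key in the AM's launch credentials so
// it is available before the client listening service starts.
Credentials credentials = new Credentials();
credentials.addSecretKey(new Text("ClientToAMMasterKey"), masterKeyBytes);
DataOutputBuffer dob = new DataOutputBuffer();
credentials.writeTokenStorageToStream(dob);
amContainer.setTokens(ByteBuffer.wrap(dob.getData(), 0, dob.getLength()));
{code}
With the key present at launch, step 3's race disappears: by the time the AM opens its client port, it can already validate client tokens.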
[jira] [Updated] (YARN-2277) Add Cross-Origin support to the ATS REST API
[ https://issues.apache.org/jira/browse/YARN-2277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles updated YARN-2277: Attachment: YARN-2277-v6.patch Addressed test failure with v6 of the patch.

Add Cross-Origin support to the ATS REST API
Key: YARN-2277 URL: https://issues.apache.org/jira/browse/YARN-2277 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jonathan Eagles Assignee: Jonathan Eagles Attachments: YARN-2277-CORS.patch, YARN-2277-JSONP.patch, YARN-2277-v2.patch, YARN-2277-v3.patch, YARN-2277-v3.patch, YARN-2277-v4.patch, YARN-2277-v5.patch, YARN-2277-v6.patch
As the Application Timeline Server is not provided with a built-in UI, it may make sense to enable JSONP or CORS REST API capabilities to allow a remote UI to access the data directly via javascript without cross-site browser blocks coming into play. An example client may be like http://api.jquery.com/jQuery.getJSON/. This can alleviate the need to create a local proxy cache.
[jira] [Commented] (YARN-1337) Recover containers upon nodemanager restart
[ https://issues.apache.org/jira/browse/YARN-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092916#comment-14092916 ] Junping Du commented on YARN-1337: Thanks [~jlowe] for updating the patch. Some trivial issues to fix below; otherwise it looks good to me:
{code}
+  public void addCompletedContainer(ContainerId containerId);
+
{code}
Better to add javadoc for the newly added (or moved from private) public method.
{code}
-  private volatile AtomicBoolean shouldLaunchContainer = new AtomicBoolean(false);
-  private volatile AtomicBoolean completed = new AtomicBoolean(false);
+  protected volatile AtomicBoolean shouldLaunchContainer =
+      new AtomicBoolean(false);
+  protected volatile AtomicBoolean completed = new AtomicBoolean(false);
{code}
volatile is unnecessary, as it was using AtomicBoolean already.

Recover containers upon nodemanager restart
Key: YARN-1337 URL: https://issues.apache.org/jira/browse/YARN-1337 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.3.0 Reporter: Jason Lowe Assignee: Jason Lowe Attachments: YARN-1337-v1.patch, YARN-1337-v2.patch
To support work-preserving NM restart we need to recover the state of the containers when the nodemanager went down. This includes informing the RM of containers that have exited in the interim and a strategy for dealing with the exit codes from those containers, along with how to reacquire the active containers and determine their exit codes when they terminate. The state of finished containers also needs to be recovered.
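A sketch of the reviewer's second point: the AtomicBoolean object itself provides the cross-thread visibility guarantees, so the field reference does not need volatile; marking it final is the more idiomatic choice.
{code}
// The reference never changes; all coordination goes through the
// AtomicBoolean's own atomic operations (get/set/compareAndSet).
protected final AtomicBoolean shouldLaunchContainer = new AtomicBoolean(false);
protected final AtomicBoolean completed = new AtomicBoolean(false);
{code}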
[jira] [Updated] (YARN-1813) Better error message for yarn logs when permission denied
[ https://issues.apache.org/jira/browse/YARN-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-1813: Attachment: YARN-1813.4.patch Refreshed the latest patch.

Better error message for yarn logs when permission denied
Key: YARN-1813 URL: https://issues.apache.org/jira/browse/YARN-1813 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.3.0 Reporter: Andrew Wang Assignee: Tsuyoshi OZAWA Priority: Minor Attachments: YARN-1813.1.patch, YARN-1813.2.patch, YARN-1813.2.patch, YARN-1813.3.patch, YARN-1813.4.patch
I ran some MR jobs as the hdfs user, and then forgot to sudo -u when grabbing the logs. yarn logs prints an error message like the following:
{noformat}
[andrew.wang@a2402 ~]$ yarn logs -applicationId application_1394482121761_0010
14/03/10 16:05:10 INFO client.RMProxy: Connecting to ResourceManager at a2402.halxg.cloudera.com/10.20.212.10:8032
Logs not available at /tmp/logs/andrew.wang/logs/application_1394482121761_0010
Log aggregation has not completed or is not enabled.
{noformat}
It'd be nicer if it said Permission denied or AccessControlException or something like that instead, since that's the real issue.
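A sketch of the kind of improvement being asked for, assuming the log reader lists the aggregated log directory via a FileSystem (this is an illustration, not the attached patch; remoteAppLogDir and the exact messages are placeholders):
{code}
// Distinguish permission problems from genuinely missing aggregated logs.
try {
  logDirFiles = fs.listStatus(remoteAppLogDir);
} catch (org.apache.hadoop.security.AccessControlException ace) {
  System.err.println("Permission denied reading " + remoteAppLogDir
      + ": " + ace.getMessage());
  return -1;
} catch (java.io.FileNotFoundException fnfe) {
  System.err.println("Logs not available at " + remoteAppLogDir
      + ". Log aggregation has not completed or is not enabled.");
  return -1;
}
{code}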
[jira] [Commented] (YARN-1915) ClientToAMTokenMasterKey should be provided to AM at launch time
[ https://issues.apache.org/jira/browse/YARN-1915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092949#comment-14092949 ] Hadoop QA commented on YARN-1915:
{color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12660996/YARN-1915v2.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4586//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4586//console
This message is automatically generated.
[jira] [Commented] (YARN-2277) Add Cross-Origin support to the ATS REST API
[ https://issues.apache.org/jira/browse/YARN-2277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092959#comment-14092959 ] Hadoop QA commented on YARN-2277:
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12661000/YARN-2277-v6.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common:
org.apache.hadoop.metrics2.impl.TestMetricsSystemImpl
org.apache.hadoop.ha.TestZKFailoverControllerStress
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4587//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4587//console
This message is automatically generated.
[jira] [Commented] (YARN-1813) Better error message for yarn logs when permission denied
[ https://issues.apache.org/jira/browse/YARN-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092989#comment-14092989 ] Hadoop QA commented on YARN-1813:
{color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12661008/YARN-1813.4.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4588//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4588//console
This message is automatically generated.
[jira] [Commented] (YARN-2308) NPE happened when RM restart after CapacityScheduler queue configuration changed
[ https://issues.apache.org/jira/browse/YARN-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093001#comment-14093001 ] chang li commented on YARN-2308: [~wangda] I have updated my patch according to your suggestion. The patch is uploaded. Thanks

NPE happened when RM restart after CapacityScheduler queue configuration changed
Key: YARN-2308 URL: https://issues.apache.org/jira/browse/YARN-2308 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Affects Versions: 2.6.0 Reporter: Wangda Tan Assignee: chang li Priority: Critical Attachments: jira2308.patch, jira2308.patch
I encountered an NPE when the RM restarted:
{code}
2014-07-16 07:22:46,957 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type APP_ATTEMPT_ADDED to the scheduler
java.lang.NullPointerException
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:566)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:922)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:98)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:594)
    at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
    at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:654)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:85)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:698)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:682)
    at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
    at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
    at java.lang.Thread.run(Thread.java:744)
{code}
And the RM then fails to restart. This is caused by a queue configuration change: I removed some queues and added new queues. So when the RM restarts, it tries to recover history applications, and when any of the queues of these applications has been removed, an NPE is raised.
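A sketch of the defensive check implied by the stack trace; the actual patch may instead reject or fail the recovered application, and the surrounding method body is abbreviated:
{code}
// CapacityScheduler.addApplicationAttempt (sketch): guard against a queue
// that no longer exists after capacity-scheduler.xml was changed.
CSQueue queue = getQueue(application.getQueue().getQueueName());
if (queue == null) {
  // Queue was removed from the configuration; surface a clean failure
  // instead of an NPE that aborts RM recovery entirely.
  LOG.error("Queue not found when recovering application " + applicationId);
  return;
}
{code}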
[jira] [Updated] (YARN-2308) NPE happened when RM restart after CapacityScheduler queue configuration changed
[ https://issues.apache.org/jira/browse/YARN-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chang li updated YARN-2308: Attachment: jira2308.patch Updated patch according to Wangda's advice.
[jira] [Commented] (YARN-2308) NPE happened when RM restart after CapacityScheduler queue configuration changed
[ https://issues.apache.org/jira/browse/YARN-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093009#comment-14093009 ] Hadoop QA commented on YARN-2308:
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12661016/jira2308.patch against trunk revision .
{color:red}-1 patch{color}. The patch command could not apply the patch.
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4590//console
This message is automatically generated.
[jira] [Commented] (YARN-2373) WebAppUtils Should Use configuration.getPassword for Accessing SSL Passwords
[ https://issues.apache.org/jira/browse/YARN-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093027#comment-14093027 ] Jian He commented on YARN-2373: First look, this seems to be a bug? Earlier the first parameter was sslConf.get(ssl.server.truststore.location), but now it's changed to the password:
{code}
.trustStore(getPassword(sslConf, WEB_APP_TRUSTSTORE_PASSWORD_KEY),
    sslConf.get(WEB_APP_TRUSTSTORE_PASSWORD_KEY),
{code}
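In other words, the suspected fix keeps the store location as the first argument and moves the getPassword lookup to the second. A sketch of that ordering (the surrounding builder call and any further arguments are abbreviated; the property name is the standard ssl-server.xml key):
{code}
// Location first, password (via the credential-provider-aware
// getPassword helper) second.
.trustStore(sslConf.get("ssl.server.truststore.location"),
    getPassword(sslConf, WEB_APP_TRUSTSTORE_PASSWORD_KEY),
{code}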
[jira] [Commented] (YARN-2277) Add Cross-Origin support to the ATS REST API
[ https://issues.apache.org/jira/browse/YARN-2277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093035#comment-14093035 ] Hadoop QA commented on YARN-2277:
{color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12661000/YARN-2277-v6.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4589//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4589//console
This message is automatically generated.
[jira] [Commented] (YARN-2400) TestAMRestart fails intermittently
[ https://issues.apache.org/jira/browse/YARN-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093040#comment-14093040 ] Xuan Gong commented on YARN-2400: Committed this addendum patch to trunk and branch-2. Thanks, Jian.

TestAMRestart fails intermittently
Key: YARN-2400 URL: https://issues.apache.org/jira/browse/YARN-2400 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Fix For: 2.6.0 Attachments: YARN-2240.2.patch, YARN-2400.1.patch
{noformat}
java.lang.AssertionError: AppAttempt state is not correct (timedout) expected:<ALLOCATED> but was:<SCHEDULED>
    at org.junit.Assert.fail(Assert.java:88)
    at org.junit.Assert.failNotEquals(Assert.java:743)
    at org.junit.Assert.assertEquals(Assert.java:118)
    at org.apache.hadoop.yarn.server.resourcemanager.MockAM.waitForState(MockAM.java:82)
    at org.apache.hadoop.yarn.server.resourcemanager.MockRM.sendAMLaunched(MockRM.java:417)
    at org.apache.hadoop.yarn.server.resourcemanager.MockRM.launchAM(MockRM.java:579)
    at org.apache.hadoop.yarn.server.resourcemanager.MockRM.launchAndRegisterAM(MockRM.java:586)
    at org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart.testShouldNotCountFailureToMaxAttemptRetry(TestAMRestart.java:389)
{noformat}
[jira] [Commented] (YARN-2373) WebAppUtils Should Use configuration.getPassword for Accessing SSL Passwords
[ https://issues.apache.org/jira/browse/YARN-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093094#comment-14093094 ] Larry McCay commented on YARN-2373: Thanks for the review, Jian. I will take a look today!
[jira] [Commented] (YARN-2138) Cleanup notifyDone* methods in RMStateStore
[ https://issues.apache.org/jira/browse/YARN-2138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093099#comment-14093099 ] Jian He commented on YARN-2138: [~varun_saxena], thanks for the input. committing this..
[jira] [Commented] (YARN-2405) NPE in FairSchedulerAppsBlock (scheduler page)
[ https://issues.apache.org/jira/browse/YARN-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093111#comment-14093111 ] Maysam Yabandeh commented on YARN-2405: The problem seems to be that the two separate lists that maintain the apps are not in sync. The list of apps is taken from
{code}
Map<ApplicationId, RMApp> rmContext.getRMApps()
{code}
and then looked up in the second list in AbstractYarnScheduler
{code}
Map<ApplicationId, SchedulerApplication> applications
{code}
via the following code:
{code}
public FSSchedulerApp getSchedulerApp(ApplicationAttemptId appAttemptId) {
  return (FSSchedulerApp) super.getApplicationAttempt(appAttemptId);
}

public T getApplicationAttempt(ApplicationAttemptId applicationAttemptId) {
  SchedulerApplication<T> app =
      applications.get(applicationAttemptId.getApplicationId());
  return app == null ? null : app.getCurrentAppAttempt();
}
{code}
which returns null if it does not find the app attempt. FairSchedulerAppsBlock does not check for the null return value, hence the NPE. By code inspection we found one case in which this could happen; not sure if it is the same case that we hit, though. Anyhow, checking for null return values from getSchedulerApp seems to be a broader fix that also covers the cases we have not yet discovered by code inspection. One scenario that could potentially result in a null return value is the following, in FairScheduler#addApplication:
{code}
RMApp rmApp = rmContext.getRMApps().get(applicationId);
FSLeafQueue queue = assignToQueue(rmApp, queueName, user);
if (queue == null) {
  return;
}
// Enforce ACLs
UserGroupInformation userUgi = UserGroupInformation.createRemoteUser(user);
if (...) {
  return;
}
SchedulerApplication application = new SchedulerApplication(queue, user);
applications.put(applicationId, application);
{code}
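A sketch of the null-safe lookup suggested above, assuming the fix lands in the web-layer accessor (the -1 "not available" sentinel and the exact rendering behavior are illustrative, not the committed change):
{code}
// Tolerate the rmContext and scheduler application maps being
// momentarily out of sync instead of throwing an NPE mid-render.
public int getAppFairShare(ApplicationAttemptId appAttemptId) {
  FSSchedulerApp app = scheduler.getSchedulerApp(appAttemptId);
  return app == null ? -1 : app.getFairShare().getMemory();
}
{code}
The render loop can then skip (or show a placeholder for) attempts whose scheduler state is missing, so one transient inconsistency no longer breaks the whole apps table and the appsTableData definition.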
[jira] [Commented] (YARN-2229) ContainerId can overflow with RM restart
[ https://issues.apache.org/jira/browse/YARN-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093109#comment-14093109 ] Jian He commented on YARN-2229: patch looks good to me. will commit in a day or two if no further comments.

ContainerId can overflow with RM restart
Key: YARN-2229 URL: https://issues.apache.org/jira/browse/YARN-2229 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: YARN-2229.1.patch, YARN-2229.10.patch, YARN-2229.10.patch, YARN-2229.11.patch, YARN-2229.2.patch, YARN-2229.2.patch, YARN-2229.3.patch, YARN-2229.4.patch, YARN-2229.5.patch, YARN-2229.6.patch, YARN-2229.7.patch, YARN-2229.8.patch, YARN-2229.9.patch
On YARN-2052, we changed the containerId format: the upper 10 bits are for the epoch, the lower 22 bits are for the sequence number of ids. This is for preserving the semantics of {{ContainerId#getId()}}, {{ContainerId#toString()}}, {{ContainerId#compareTo()}}, {{ContainerId#equals}}, and {{ConverterUtils#toContainerId}}. One concern is that the epoch can overflow after the RM restarts 1024 times. To avoid the problem, it's better to make containerId a long. We need to define the new container id format while preserving backward compatibility on this JIRA.
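The 10/22 bit split described above can be illustrated with plain bit arithmetic; this is a sketch of the format, not the actual ContainerId code:
{code}
// 32-bit id: upper 10 bits = epoch (RM restart generation),
// lower 22 bits = per-epoch container sequence number.
static int toId(int epoch, int sequence) {
  return (epoch << 22) | (sequence & 0x3fffff);
}

static int epochOf(int id)    { return id >>> 22; }
static int sequenceOf(int id) { return id & 0x3fffff; }
{code}
Since the epoch field saturates at 2^10 - 1 = 1023, the 1025th RM restart would wrap it; widening the id to a long removes that ceiling, but the wire and string formats then need a compatibility story.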
[jira] [Created] (YARN-2405) NPE in FairSchedulerAppsBlock (scheduler page)
Maysam Yabandeh created YARN-2405: Summary: NPE in FairSchedulerAppsBlock (scheduler page) Key: YARN-2405 URL: https://issues.apache.org/jira/browse/YARN-2405 Project: Hadoop YARN Issue Type: Bug Reporter: Maysam Yabandeh
FairSchedulerAppsBlock#render throws an NPE at this line:
{code}
int fairShare = fsinfo.getAppFairShare(attemptId);
{code}
This causes the scheduler page to not show the app, since the page lacks the definition of appsTableData:
{code}
Uncaught ReferenceError: appsTableData is not defined
{code}
The problem is temporary, meaning that it usually resolves by itself, either after a retry or after a few hours.
[jira] [Commented] (YARN-2397) RM web interface sometimes returns request is a replay error in secure mode
[ https://issues.apache.org/jira/browse/YARN-2397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093120#comment-14093120 ] Zhijie Shen commented on YARN-2397: [~vvasudev], thanks for the new patch! The logic of loading the simple auth filter seems to be still problematic:
{code}
// if security is not enabled and the default filter initializer has not
// been set, set the initializer to include the
// RMAuthenticationFilterInitializer which in turn will set up the simple
// auth filter.

String initializers = conf.get(filterInitializerConfKey);
if (!UserGroupInformation.isSecurityEnabled()) {
  if (initializersClasses == null || initializersClasses.length == 0) {
    conf.set(filterInitializerConfKey,
        RMAuthenticationFilterInitializer.class.getName());
    conf.set(authTypeKey, "simple");
  } else if (initializers.equals(StaticUserWebFilter.class.getName())) {
    conf.set(filterInitializerConfKey,
        RMAuthenticationFilterInitializer.class.getName() + ","
            + initializers);
    conf.set(authTypeKey, "simple");
  }
}
{code}
4 conditions need to be satisfied to load the kerberos+DT auth filter. Then, in the remaining cases, the simple auth filter should be loaded, right? Or do there intentionally exist cases in which neither the Kerberos+DT nor the simple auth filter is used? If it is the former scenario,
{code}
if (!UserGroupInformation.isSecurityEnabled()) {
{code}
the above code will cause any break except that of condition 1 to result in no auth filter at all. And it still makes the assumption that the filter initializers can only be those of auth and static user. However, initializersClasses can contain more than that (see YARN-2277). For the simple auth filter case, it's good to always use RMAuthenticationFilterInitializer or the standard AuthenticationFilterInitializer. The current code will cause AuthenticationFilterInitializer to be used under some configuration setups while RMAuthenticationFilterInitializer is used under the others.

RM web interface sometimes returns request is a replay error in secure mode
Key: YARN-2397 URL: https://issues.apache.org/jira/browse/YARN-2397 Project: Hadoop YARN Issue Type: Bug Reporter: Varun Vasudev Assignee: Varun Vasudev Priority: Critical Attachments: apache-yarn-2397.0.patch, apache-yarn-2397.1.patch
The RM web interface sometimes returns a request is a replay error if the default kerberos http filter is enabled. This is because it uses the new RMAuthenticationFilter in addition to the AuthenticationFilter. There is a workaround: set yarn.resourcemanager.webapp.delegation-token-auth-filter.enabled to false. This bug is to fix the code to use only the RMAuthenticationFilter and not both.
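A sketch of the restructuring the review seems to suggest: when security is off, always ensure the RM initializer is present, rather than special-casing only the empty and StaticUserWebFilter configurations. This is an illustration built from the variables in the quoted code, not the eventual patch:
{code}
if (!UserGroupInformation.isSecurityEnabled()) {
  String rmInit = RMAuthenticationFilterInitializer.class.getName();
  if (initializers == null || initializers.isEmpty()) {
    conf.set(filterInitializerConfKey, rmInit);
  } else if (!initializers.contains(rmInit)) {
    // Prepend instead of replace, so other configured initializers
    // (e.g. a CORS filter, see YARN-2277) keep working.
    conf.set(filterInitializerConfKey, rmInit + "," + initializers);
  }
  conf.set(authTypeKey, "simple");
}
{code}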
[jira] [Updated] (YARN-2308) NPE happened when RM restart after CapacityScheduler queue configuration changed
[ https://issues.apache.org/jira/browse/YARN-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chang li updated YARN-2308: Attachment: jira2308.patch Patch updated according to Wangda's suggestion.
[jira] [Commented] (YARN-2277) Add Cross-Origin support to the ATS REST API
[ https://issues.apache.org/jira/browse/YARN-2277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093199#comment-14093199 ] Zhijie Shen commented on YARN-2277: --- bq. I have provided a minimal CORS filter that will give us an idea if this is the direction to go. Based on the direction of this patch, the scope has widened to create a general CrossOriginFilter for use within all Hadoop REST APIs. Probably, we will want to split the different pieces us across JIRAs, umbrella, Filter and FilterInitializer, additional configuration, and individual REST servers. This way we can focus on the end goal of getting Tez UI done in a timely manner without forgetting completeness of CORS support. [~jeagles], thanks for your contribution! +1 for making the minimal CORS filter. Another concern is that if we upgrade jetty sometime in the future, we can reuse the cross-origin filter provided by it, rebase this on top of it. One additional suggestion is that we can start the CORS filter even in smaller scope: the timeline server only, which means we should move the filter/filter initializer to this sub module. Once it is made robust enough and proved to be reliable, we can promote it to hadoop-yarn-common or even hadoop-common. How do you think? Bellow are some detailed comments for the patch: 1. The prefix changes to yarn.timeline-service.http.cross-origin? {code} + public static final String PREFIX = hadoop.http.filter.cross.origin.; {code} 2. ALLOWED_ORIGINS - allowed-origins? Not to make the config name too long. {code} + // Filter configuration + public static final String ALLOWED_ORIGINS = access.control.allowed.origins; {code} 3. Should most of the methods in CrossOriginFilter be private? 4. Is it better to make it configurable as well? Why allowedMethods doesn't have PUT? In the doc: https://developer.mozilla.org/en-US/docs/Web/HTTP/Access_control_CORS, it seems that the headers can go beyond the following set. {code} + void initializeAllowedMethods(FilterConfig filterConfig) { +allowedMethods.add(GET); +allowedMethods.add(POST); +allowedMethods.add(HEAD); +LOG.info(Allowed Methods: + getAllowedMethodsHeader()); + } + + void initializeAllowedHeaders(FilterConfig filterConfig) { +allowedHeaders.add(X-Requested-With); +allowedHeaders.add(Content-Type); +allowedHeaders.add(Accept); +allowedHeaders.add(Origin); +LOG.info(Allowed Headers: + getAllowedHeadersHeader()); + } {code} 5. Should we include Access-Control-Max-Age? 6. Is it better to invoke doCrossFilter after chain.doFilter, in case devs are going to do something special in servlet with the res object directly? {code} + @Override + public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain) +throws IOException, ServletException { +doCrossFilter((HttpServletRequest) req, (HttpServletResponse) res); +chain.doFilter(req, res); + } {code} Add Cross-Origin support to the ATS REST API Key: YARN-2277 URL: https://issues.apache.org/jira/browse/YARN-2277 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jonathan Eagles Assignee: Jonathan Eagles Attachments: YARN-2277-CORS.patch, YARN-2277-JSONP.patch, YARN-2277-v2.patch, YARN-2277-v3.patch, YARN-2277-v3.patch, YARN-2277-v4.patch, YARN-2277-v5.patch, YARN-2277-v6.patch As the Application Timeline Server is not provided with built-in UI, it may make sense to enable JSONP or CORS Rest API capabilities to allow for remote UI to access the data directly via javascript without cross side server browser blocks coming into play. 
An example client might use jQuery.getJSON (http://api.jquery.com/jQuery.getJSON/). This can alleviate the need to create a local proxy cache. -- This message was sent by Atlassian JIRA (v6.2#6252)
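For orientation, here is a minimal sketch of the kind of servlet CORS filter being discussed. The class name, init parameter, and the specific header choices are illustrative assumptions, not the YARN-2277 patch:
{code}
import java.io.IOException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import javax.servlet.*;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class SimpleCrossOriginFilter implements Filter {
  private final Set<String> allowedOrigins = new HashSet<String>();

  @Override
  public void init(FilterConfig config) {
    // e.g. allowed-origins = "http://ui.example.com,http://other.example.com"
    String origins = config.getInitParameter("allowed-origins");
    if (origins != null) {
      allowedOrigins.addAll(Arrays.asList(origins.split(",")));
    }
  }

  @Override
  public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
      throws IOException, ServletException {
    HttpServletRequest httpReq = (HttpServletRequest) req;
    HttpServletResponse httpRes = (HttpServletResponse) res;
    String origin = httpReq.getHeader("Origin");
    // Echo the origin back only if it is on the allow list.
    if (origin != null && allowedOrigins.contains(origin)) {
      httpRes.setHeader("Access-Control-Allow-Origin", origin);
      httpRes.setHeader("Access-Control-Allow-Methods", "GET, POST, HEAD");
      httpRes.setHeader("Access-Control-Allow-Headers",
          "X-Requested-With, Content-Type, Accept, Origin");
    }
    chain.doFilter(req, res);
  }

  @Override
  public void destroy() {
  }
}
{code}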
[jira] [Commented] (YARN-2373) WebAppUtils Should Use configuration.getPassword for Accessing SSL Passwords
[ https://issues.apache.org/jira/browse/YARN-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093209#comment-14093209 ] Larry McCay commented on YARN-2373: --- You are absolutely right. I will have a new patch shortly. Thanks, again! WebAppUtils Should Use configuration.getPassword for Accessing SSL Passwords Key: YARN-2373 URL: https://issues.apache.org/jira/browse/YARN-2373 Project: Hadoop YARN Issue Type: Bug Reporter: Larry McCay Attachments: YARN-2373.patch, YARN-2373.patch, YARN-2373.patch As part of HADOOP-10904, this jira represents a change to WebAppUtils to uptake the use of the credential provider API through the new method on Configuration called getPassword. This provides an alternative to storing the passwords in clear text within the ssl-server.xml file while maintaining backward compatibility with that behavior. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2393) Fair Scheduler : Implement static fair share
[ https://issues.apache.org/jira/browse/YARN-2393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093217#comment-14093217 ] Ashwin Shankar commented on YARN-2393: -- [~ywskycn], on skimming through the patch at a high level, I had a quick comment. Apart from the configuration in the alloc XML, queues can get created dynamically when one uses QueuePlacementRules like primary group, nested user queue, etc. with create=true. Shouldn't we recompute static shares in these cases? Fair Scheduler : Implement static fair share Key: YARN-2393 URL: https://issues.apache.org/jira/browse/YARN-2393 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Reporter: Ashwin Shankar Assignee: Wei Yan Attachments: YARN-2393-1.patch Static fair share is a fair share allocation considering all (active/inactive) queues. It would be shown on the UI for better predictability of the finish time of applications. We would compute static fair share only when needed, like on queue creation or node addition/removal. Please see YARN-2026 for discussions on this. -- This message was sent by Atlassian JIRA (v6.2#6252)
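A hedged sketch of the recompute-on-dynamic-creation point raised above; the hook and method names are assumptions based on the discussion, not the committed patch:
{code}
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager;

class SteadyShareHook {
  private final QueueManager queueMgr;

  SteadyShareHook(QueueManager queueMgr) {
    this.queueMgr = queueMgr;
  }

  /** Called wherever a placement rule may create a queue on the fly. */
  void onQueuePlaced(String queueName) {
    // create=true mirrors QueuePlacementRule behavior: the leaf queue is
    // materialized on demand rather than read from the allocation file.
    FSLeafQueue queue = queueMgr.getLeafQueue(queueName, true);
    if (queue != null) {
      // Recompute the static/steady shares so the new queue is included,
      // just as on an allocation-file reload or a node change.
      queueMgr.getRootQueue().recomputeSteadyShares();
    }
  }
}
{code}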
[jira] [Commented] (YARN-2393) Fair Scheduler : Implement static fair share
[ https://issues.apache.org/jira/browse/YARN-2393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093222#comment-14093222 ] Wei Yan commented on YARN-2393: --- [~ashwinshankar77], thanks for the comment. Will check that. Fair Scheduler : Implement static fair share Key: YARN-2393 URL: https://issues.apache.org/jira/browse/YARN-2393 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Reporter: Ashwin Shankar Assignee: Wei Yan Attachments: YARN-2393-1.patch Static fair share is a fair share allocation considering all (active/inactive) queues. It would be shown on the UI for better predictability of the finish time of applications. We would compute static fair share only when needed, like on queue creation or node addition/removal. Please see YARN-2026 for discussions on this. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1198) Capacity Scheduler headroom calculation does not work as expected
[ https://issues.apache.org/jira/browse/YARN-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093216#comment-14093216 ] Craig Welch commented on YARN-1198: --- So, I'm in the process of putting together a patch to calculate the headroom in more cases, as described in this jira. It strikes me that one of the changes called for is to make headroom apply to the user+queue combination instead of to the application, as it does today. Today, headroom is per application; as I understand the jira, the suggestion is to establish the same headroom value for a given user+queue combination and to change the headroom simultaneously for all applications of that user+queue any time the headroom would change for any of them. This suggests that a reasonable approach might be to use the same resource instance for a given user+queue combination, instead of having one per application. Thoughts? Capacity Scheduler headroom calculation does not work as expected - Key: YARN-1198 URL: https://issues.apache.org/jira/browse/YARN-1198 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi Attachments: YARN-1198.1.patch Today headroom calculation (for the app) takes place only when * New node is added/removed from the cluster * New container is getting assigned to the application. However there are potentially lot of situations which are not considered for this calculation * If a container finishes then headroom for that application will change and should be notified to the AM accordingly. * If a single user has submitted multiple applications (app1 and app2) to the same queue then ** If app1's container finishes then not only app1's but also app2's AM should be notified about the change in headroom. ** Similarly if a container is assigned to any applications app1/app2 then both AM should be notified about their headroom. ** To simplify the whole communication process it is ideal to keep headroom per User per LeafQueue so that everyone gets the same picture (apps belonging to same user and submitted in same queue). * If a new user submits an application to the queue then all applications submitted by all users in that queue should be notified of the headroom change. * Also today headroom is an absolute number ( I think it should be normalized but then this is going to be not backward compatible..) * Also when admin user refreshes queue headroom has to be updated. These all are the potential bugs in headroom calculations -- This message was sent by Atlassian JIRA (v6.2#6252)
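A hedged sketch of the shared-instance idea proposed above; all names here are illustrative, not the actual YARN-1198 patch:
{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class HeadroomProvider {
  /** Mutable holder shared by every application of one user in one queue. */
  public static final class Headroom {
    private volatile long memoryMb;
    public long getMemoryMb() { return memoryMb; }
    void setMemoryMb(long mb) { memoryMb = mb; }
  }

  private final Map<String, Headroom> byUserQueue =
      new ConcurrentHashMap<String, Headroom>();

  /** All apps for (user, queue) receive the same Headroom instance. */
  public Headroom headroomFor(String user, String queue) {
    return byUserQueue.computeIfAbsent(user + "@" + queue,
        k -> new Headroom());
  }

  /** One in-place update becomes visible to every app sharing the holder. */
  public void update(String user, String queue, long memoryMb) {
    headroomFor(user, queue).setMemoryMb(memoryMb);
  }
}
{code}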
[jira] [Updated] (YARN-2373) WebAppUtils Should Use configuration.getPassword for Accessing SSL Passwords
[ https://issues.apache.org/jira/browse/YARN-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Larry McCay updated YARN-2373: -- Attachment: YARN-2373.patch Attaching new patch to address the issue identified through [~jianhe]'s review. Thanks again, Jian! WebAppUtils Should Use configuration.getPassword for Accessing SSL Passwords Key: YARN-2373 URL: https://issues.apache.org/jira/browse/YARN-2373 Project: Hadoop YARN Issue Type: Bug Reporter: Larry McCay Attachments: YARN-2373.patch, YARN-2373.patch, YARN-2373.patch, YARN-2373.patch As part of HADOOP-10904, this jira represents a change to WebAppUtils to uptake the use of the credential provider API through the new method on Configuration called getPassword. This provides an alternative to storing the passwords in clear text within the ssl-server.xml file while maintaining backward compatibility with that behavior. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2393) Fair Scheduler : Implement static fair share
[ https://issues.apache.org/jira/browse/YARN-2393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093258#comment-14093258 ] Hadoop QA commented on YARN-2393: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12661048/YARN-2393-1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4592//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4592//console This message is automatically generated. Fair Scheduler : Implement static fair share Key: YARN-2393 URL: https://issues.apache.org/jira/browse/YARN-2393 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Reporter: Ashwin Shankar Assignee: Wei Yan Attachments: YARN-2393-1.patch Static fair share is a fair share allocation considering all(active/inactive) queues.It would be shown on the UI for better predictability of finish time of applications. We would compute static fair share only when needed, like on queue creation, node added/removed. Please see YARN-2026 for discussions on this. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2308) NPE happened when RM restart after CapacityScheduler queue configuration changed
[ https://issues.apache.org/jira/browse/YARN-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093259#comment-14093259 ] Hadoop QA commented on YARN-2308: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12661044/jira2308.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4591//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4591//console This message is automatically generated. NPE happened when RM restart after CapacityScheduler queue configuration changed - Key: YARN-2308 URL: https://issues.apache.org/jira/browse/YARN-2308 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Affects Versions: 2.6.0 Reporter: Wangda Tan Assignee: chang li Priority: Critical Attachments: jira2308.patch, jira2308.patch I encountered a NPE when RM restart {code} 2014-07-16 07:22:46,957 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type APP_ATTEMPT_ADDED to the scheduler java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:566) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:922) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:98) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:594) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:654) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:85) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:698) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:682) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:744) {code} And RM will be failed to restart. 
This is caused by a queue configuration change: I removed some queues and added new ones. So when the RM restarts, it tries to recover past applications, and when any of these applications' queues has been removed, an NPE is raised. -- This message was sent by Atlassian JIRA (v6.2#6252)
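The shape of the guard this calls for, as a hedged sketch; the rejection path and message are assumptions, not the committed YARN-2308 fix:
{code}
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.server.resourcemanager.RMContext;
import org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppRejectedEvent;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueue;

class QueueRecoveryGuard {
  private final RMContext rmContext;

  QueueRecoveryGuard(RMContext rmContext) {
    this.rmContext = rmContext;
  }

  /** Returns true if the app can proceed; rejects it otherwise. */
  boolean validateQueue(CSQueue queue, String queueName, ApplicationId appId) {
    if (queue != null) {
      return true;
    }
    // The queue was removed from capacity-scheduler.xml before the restart;
    // reject the recovered application instead of dereferencing null later
    // in addApplicationAttempt.
    rmContext.getDispatcher().getEventHandler().handle(
        new RMAppRejectedEvent(appId,
            "Application " + appId + " submitted to nonexistent queue: "
                + queueName));
    return false;
  }
}
{code}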
[jira] [Updated] (YARN-2406) Move RM recovery related proto to yarn_server_resourcemanager_recovery.proto
[ https://issues.apache.org/jira/browse/YARN-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-2406: -- Issue Type: Sub-task (was: Improvement) Parent: YARN-128 Move RM recovery related proto to yarn_server_resourcemanager_recovery.proto Key: YARN-2406 URL: https://issues.apache.org/jira/browse/YARN-2406 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He Today most recovery related proto records are defined in yarn_server_resourcemanager_service_protos.proto which is inside YARN-API module. Since these records are internally used by RM only, we can move them to the yarn_server_resourcemanager_recovery.proto file inside RM-server module -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2406) Move RM recovery related proto to yarn_server_resourcemanager_recovery.proto
Jian He created YARN-2406: - Summary: Move RM recovery related proto to yarn_server_resourcemanager_recovery.proto Key: YARN-2406 URL: https://issues.apache.org/jira/browse/YARN-2406 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Today most recovery related proto records are defined in yarn_server_resourcemanager_service_protos.proto which is inside YARN-API module. Since these records are internally used by RM only, we can move them to the yarn_server_resourcemanager_recovery.proto file inside RM-server module -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2317) Update documentation about how to write YARN applications
[ https://issues.apache.org/jira/browse/YARN-2317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-2317: Attachment: YARN-2317-081114.patch Thanks [~zjshen]! I've addressed the points in your review. In general, this patch performs the following work on the "How to write YARN applications" webpage:
1. Update the document to use the latest clients rather than the old protocols, since the protocol-based approach is no longer encouraged. (Major change)
2. Replace sample code with the code in the latest version of the distributed shell. (Major change)
3. Update the FAQ and useful links section with the latest information. (Minor change)
With regard to your comments, I've fixed all of them following your suggestions. Specifically, to avoid confusion, I fixed issue 8 by directly removing the commented-out code. Update documentation about how to write YARN applications - Key: YARN-2317 URL: https://issues.apache.org/jira/browse/YARN-2317 Project: Hadoop YARN Issue Type: Sub-task Components: documentation Reporter: Li Lu Assignee: Li Lu Fix For: 2.6.0 Attachments: YARN-2317-071714.patch, YARN-2317-073014-1.patch, YARN-2317-073014.patch, YARN-2317-081114.patch Some information in WritingYarnApplications webpage is out-dated. Need some refresh work on this document to reflect the most recent changes in YARN APIs. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2308) NPE happened when RM restart after CapacityScheduler queue configuration changed
[ https://issues.apache.org/jira/browse/YARN-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chang li updated YARN-2308: --- Attachment: jira2308.patch NPE happened when RM restart after CapacityScheduler queue configuration changed - Key: YARN-2308 URL: https://issues.apache.org/jira/browse/YARN-2308 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Affects Versions: 2.6.0 Reporter: Wangda Tan Assignee: chang li Priority: Critical Attachments: jira2308.patch, jira2308.patch, jira2308.patch I encountered a NPE when RM restart {code} 2014-07-16 07:22:46,957 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type APP_ATTEMPT_ADDED to the scheduler java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:566) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:922) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:98) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:594) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:654) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:85) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:698) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:682) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:744) {code} And RM will be failed to restart. This is caused by queue configuration changed, I removed some queues and added new queues. So when RM restarts, it tries to recover history applications, and when any of queues of these applications removed, NPE will be raised. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2138) Cleanup notifyDone* methods in RMStateStore
[ https://issues.apache.org/jira/browse/YARN-2138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093351#comment-14093351 ] Hudson commented on YARN-2138: -- FAILURE: Integrated in Hadoop-trunk-Commit #6048 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6048/]) YARN-2138. Cleaned up notifyDone* APIs in RMStateStore. Contributed by Varun Saxena (jianhe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1617341) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppNewSavedEvent.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppUpdateSavedEvent.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/event/RMAppAttemptNewSavedEvent.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/event/RMAppAttemptUpdateSavedEvent.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreTestBase.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java Cleanup notifyDone* methods in RMStateStore --- Key: YARN-2138 URL: https://issues.apache.org/jira/browse/YARN-2138 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Varun Saxena Fix For: 2.6.0 Attachments: YARN-2138.002.patch, YARN-2138.003.patch, YARN-2138.004.patch, YARN-2138.patch The storedException passed into notifyDoneStoringApplication is always null. Similarly for other notifyDone* methods. We can clean up these methods as this control flow path is not used anymore. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2400) TestAMRestart fails intermittently
[ https://issues.apache.org/jira/browse/YARN-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093349#comment-14093349 ] Hudson commented on YARN-2400: -- FAILURE: Integrated in Hadoop-trunk-Commit #6048 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6048/]) YARN-2400: Addendum fix for TestAMRestart failure. Contributed by Jian He (xgong: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1617333) * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRestart.java TestAMRestart fails intermittently -- Key: YARN-2400 URL: https://issues.apache.org/jira/browse/YARN-2400 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Fix For: 2.6.0 Attachments: YARN-2240.2.patch, YARN-2400.1.patch java.lang.AssertionError: AppAttempt state is not correct (timedout) expected:ALLOCATED but was:SCHEDULED at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.apache.hadoop.yarn.server.resourcemanager.MockAM.waitForState(MockAM.java:82) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.sendAMLaunched(MockRM.java:417) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.launchAM(MockRM.java:579) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.launchAndRegisterAM(MockRM.java:586) at org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart.testShouldNotCountFailureToMaxAttemptRetry(TestAMRestart.java:389) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2373) WebAppUtils Should Use configuration.getPassword for Accessing SSL Passwords
[ https://issues.apache.org/jira/browse/YARN-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093459#comment-14093459 ] Jian He commented on YARN-2373: --- Larry, thanks for the update. But this seems to be a bug again: should it be getPassword(sslConf, WEB_APP_KEYSTORE_PASSWORD_KEY)?
{code}
sslConf.get(getPassword(sslConf, WEB_APP_KEYSTORE_PASSWORD_KEY))
{code}
The test covers the CredentialProvider and the newly added helper API. Can you add a test for the loadSslConfiguration method? WebAppUtils Should Use configuration.getPassword for Accessing SSL Passwords Key: YARN-2373 URL: https://issues.apache.org/jira/browse/YARN-2373 Project: Hadoop YARN Issue Type: Bug Reporter: Larry McCay Attachments: YARN-2373.patch, YARN-2373.patch, YARN-2373.patch, YARN-2373.patch As part of HADOOP-10904, this jira represents a change to WebAppUtils to uptake the use of the credential provider API through the new method on Configuration called getPassword. This provides an alternative to storing the passwords in clear text within the ssl-server.xml file while maintaining backward compatibility with that behavior. -- This message was sent by Atlassian JIRA (v6.2#6252)
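To make the distinction being pointed out concrete, a hedged sketch; the getPassword helper shape is inferred from the discussion and HADOOP-10904, not verified against the patch:
{code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;

class SslPasswordLookup {
  // Buggy form from the snippet above: the resolved password is used as a
  // *key* for a second config lookup, so the result is null or garbage.
  static String buggy(Configuration sslConf, String key) {
    return sslConf.get(getPassword(sslConf, key));
  }

  // Intended form: resolve the password itself via the credential provider,
  // falling back to the clear-text value in ssl-server.xml.
  static String fixed(Configuration sslConf, String key) {
    return getPassword(sslConf, key);
  }

  // Helper built on Configuration#getPassword (HADOOP-10904); the
  // null/error handling here is an assumption.
  static String getPassword(Configuration conf, String alias) {
    try {
      char[] pass = conf.getPassword(alias);
      return pass != null ? new String(pass) : null;
    } catch (IOException ioe) {
      return null;
    }
  }
}
{code}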
[jira] [Commented] (YARN-2308) NPE happened when RM restart after CapacityScheduler queue configuration changed
[ https://issues.apache.org/jira/browse/YARN-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093463#comment-14093463 ] Hadoop QA commented on YARN-2308: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12661066/jira2308.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4593//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4593//console This message is automatically generated. NPE happened when RM restart after CapacityScheduler queue configuration changed - Key: YARN-2308 URL: https://issues.apache.org/jira/browse/YARN-2308 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Affects Versions: 2.6.0 Reporter: Wangda Tan Assignee: chang li Priority: Critical Attachments: jira2308.patch, jira2308.patch, jira2308.patch I encountered a NPE when RM restart {code} 2014-07-16 07:22:46,957 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type APP_ATTEMPT_ADDED to the scheduler java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:566) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:922) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:98) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:594) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:654) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:85) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:698) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:682) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:744) {code} And RM will be failed to restart. 
This is caused by a queue configuration change: I removed some queues and added new ones. So when the RM restarts, it tries to recover past applications, and when any of these applications' queues has been removed, an NPE is raised. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2407) Users are not allowed to view their own jobs, denied by JobACLsManager
Yu Gao created YARN-2407: Summary: Users are not allowed to view their own jobs, denied by JobACLsManager Key: YARN-2407 URL: https://issues.apache.org/jira/browse/YARN-2407 Project: Hadoop YARN Issue Type: Bug Components: applications Affects Versions: 2.4.1 Reporter: Yu Gao Have a Hadoop 2.4.1 cluster with Yarn ACL enabled, and try to submit jobs as a non-admin user user1. The job could be finished successfully, but the running progress was not displayed correctly on the commad-line, and I got following in the corresponding ApplicationMaster log: INFO [IPC Server handler 0 on 56717] org.apache.hadoop.ipc.Server: IPC Server handler 0 on 56717, call org.apache.hadoop.mapreduce.v2.api.MRClientProtocolPB.getJobReport from 9.30.95.26:61024 Call#59 Retry#0 org.apache.hadoop.security.AccessControlException: User user1 cannot perform operation VIEW_JOB on job_1407456690588_0003 at org.apache.hadoop.mapreduce.v2.app.client.MRClientService$MRClientProtocolHandler.verifyAndGetJob(MRClientService.java:191) at org.apache.hadoop.mapreduce.v2.app.client.MRClientService$MRClientProtocolHandler.getJobReport(MRClientService.java:233) at org.apache.hadoop.mapreduce.v2.api.impl.pb.service.MRClientProtocolPBServiceImpl.getJobReport(MRClientProtocolPBServiceImpl.java:122) at org.apache.hadoop.yarn.proto.MRClientProtocol$MRClientProtocolService$2.callBlockingMethod(MRClientProtocol.java:275) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) at java.security.AccessController.doPrivileged(AccessController.java:366) at javax.security.auth.Subject.doAs(Subject.java:572) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1567) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2373) WebAppUtils Should Use configuration.getPassword for Accessing SSL Passwords
[ https://issues.apache.org/jira/browse/YARN-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093469#comment-14093469 ] Larry McCay commented on YARN-2373: --- You are right again. I am fixing it now and enhancing the test as you suggest. Apologies. WebAppUtils Should Use configuration.getPassword for Accessing SSL Passwords Key: YARN-2373 URL: https://issues.apache.org/jira/browse/YARN-2373 Project: Hadoop YARN Issue Type: Bug Reporter: Larry McCay Attachments: YARN-2373.patch, YARN-2373.patch, YARN-2373.patch, YARN-2373.patch As part of HADOOP-10904, this jira represents a change to WebAppUtils to uptake the use of the credential provider API through the new method on Configuration called getPassword. This provides an alternative to storing the passwords in clear text within the ssl-server.xml file while maintaining backward compatibility with that behavior. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2407) Users are not allowed to view their own jobs, denied by JobACLsManager
[ https://issues.apache.org/jira/browse/YARN-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093484#comment-14093484 ] Yu Gao commented on YARN-2407: -- After turning on debug, I got this in the ApplicationMaster log: DEBUG [IPC Server handler 0 on 36796] org.apache.hadoop.mapred.JobACLsManager: checkAccess job acls, jobOwner: yarn jobacl: VIEW_JOB user: user1 The jobOwner above is incorrect. It should be user1, since it was user1 who submitted the job. This error is caused by an incorrect implementation in JobImpl, which defines two user name fields: username, the user taken from the system property user.name, i.e. the container process owner; and userName, the value passed in via the JobImpl constructor, i.e. the end user who submitted the job. The JobImpl#checkAccess method should have used userName as the job owner, instead of username. Users are not allowed to view their own jobs, denied by JobACLsManager -- Key: YARN-2407 URL: https://issues.apache.org/jira/browse/YARN-2407 Project: Hadoop YARN Issue Type: Bug Components: applications Affects Versions: 2.4.1 Reporter: Yu Gao Have a Hadoop 2.4.1 cluster with Yarn ACL enabled, and try to submit jobs as a non-admin user user1. The job could be finished successfully, but the running progress was not displayed correctly on the commad-line, and I got following in the corresponding ApplicationMaster log: INFO [IPC Server handler 0 on 56717] org.apache.hadoop.ipc.Server: IPC Server handler 0 on 56717, call org.apache.hadoop.mapreduce.v2.api.MRClientProtocolPB.getJobReport from 9.30.95.26:61024 Call#59 Retry#0 org.apache.hadoop.security.AccessControlException: User user1 cannot perform operation VIEW_JOB on job_1407456690588_0003 at org.apache.hadoop.mapreduce.v2.app.client.MRClientService$MRClientProtocolHandler.verifyAndGetJob(MRClientService.java:191) at org.apache.hadoop.mapreduce.v2.app.client.MRClientService$MRClientProtocolHandler.getJobReport(MRClientService.java:233) at org.apache.hadoop.mapreduce.v2.api.impl.pb.service.MRClientProtocolPBServiceImpl.getJobReport(MRClientProtocolPBServiceImpl.java:122) at org.apache.hadoop.yarn.proto.MRClientProtocol$MRClientProtocolService$2.callBlockingMethod(MRClientProtocol.java:275) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) at java.security.AccessController.doPrivileged(AccessController.java:366) at javax.security.auth.Subject.doAs(Subject.java:572) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1567) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) -- This message was sent by Atlassian JIRA (v6.2#6252)
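The shape of the fix described above, as a hedged sketch; the class is simplified from the comment, not the literal YARN-2407 patch:
{code}
import java.util.Map;
import org.apache.hadoop.mapred.JobACLsManager;
import org.apache.hadoop.mapreduce.JobACL;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.security.authorize.AccessControlList;

class JobAccessCheck {
  private final JobACLsManager aclsManager;
  private final Map<JobACL, AccessControlList> jobACLs;
  private final String userName;  // end user from the JobImpl constructor

  JobAccessCheck(JobACLsManager aclsManager,
      Map<JobACL, AccessControlList> jobACLs, String userName) {
    this.aclsManager = aclsManager;
    this.jobACLs = jobACLs;
    this.userName = userName;
  }

  boolean checkAccess(UserGroupInformation callerUGI, JobACL jobOperation) {
    AccessControlList jobACL = jobACLs.get(jobOperation);
    if (jobACL == null) {
      return true;  // no ACL configured for this operation
    }
    // Key point from the comment above: pass userName (the submitting user)
    // as the job owner, not the "user.name" of the AM container process.
    return aclsManager.checkAccess(callerUGI, jobOperation, userName, jobACL);
  }
}
{code}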
[jira] [Updated] (YARN-2407) Users are not allowed to view their own jobs, denied by JobACLsManager
[ https://issues.apache.org/jira/browse/YARN-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu Gao updated YARN-2407: - Attachment: YARN-2407.patch Users are not allowed to view their own jobs, denied by JobACLsManager -- Key: YARN-2407 URL: https://issues.apache.org/jira/browse/YARN-2407 Project: Hadoop YARN Issue Type: Bug Components: applications Affects Versions: 2.4.1 Reporter: Yu Gao Attachments: YARN-2407.patch Have a Hadoop 2.4.1 cluster with Yarn ACL enabled, and try to submit jobs as a non-admin user user1. The job could be finished successfully, but the running progress was not displayed correctly on the commad-line, and I got following in the corresponding ApplicationMaster log: INFO [IPC Server handler 0 on 56717] org.apache.hadoop.ipc.Server: IPC Server handler 0 on 56717, call org.apache.hadoop.mapreduce.v2.api.MRClientProtocolPB.getJobReport from 9.30.95.26:61024 Call#59 Retry#0 org.apache.hadoop.security.AccessControlException: User user1 cannot perform operation VIEW_JOB on job_1407456690588_0003 at org.apache.hadoop.mapreduce.v2.app.client.MRClientService$MRClientProtocolHandler.verifyAndGetJob(MRClientService.java:191) at org.apache.hadoop.mapreduce.v2.app.client.MRClientService$MRClientProtocolHandler.getJobReport(MRClientService.java:233) at org.apache.hadoop.mapreduce.v2.api.impl.pb.service.MRClientProtocolPBServiceImpl.getJobReport(MRClientProtocolPBServiceImpl.java:122) at org.apache.hadoop.yarn.proto.MRClientProtocol$MRClientProtocolService$2.callBlockingMethod(MRClientProtocol.java:275) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) at java.security.AccessController.doPrivileged(AccessController.java:366) at javax.security.auth.Subject.doAs(Subject.java:572) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1567) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2407) Users are not allowed to view their own jobs, denied by JobACLsManager
[ https://issues.apache.org/jira/browse/YARN-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu Gao updated YARN-2407: - Description: Have a Hadoop 2.4.1 cluster with Yarn ACL enabled, and try to submit jobs as a non-admin user user1. The job could be finished successfully, but the running progress was not displayed correctly on the command-line, and I got following in the corresponding ApplicationMaster log: INFO [IPC Server handler 0 on 56717] org.apache.hadoop.ipc.Server: IPC Server handler 0 on 56717, call org.apache.hadoop.mapreduce.v2.api.MRClientProtocolPB.getJobReport from 9.30.95.26:61024 Call#59 Retry#0 org.apache.hadoop.security.AccessControlException: User user1 cannot perform operation VIEW_JOB on job_1407456690588_0003 at org.apache.hadoop.mapreduce.v2.app.client.MRClientService$MRClientProtocolHandler.verifyAndGetJob(MRClientService.java:191) at org.apache.hadoop.mapreduce.v2.app.client.MRClientService$MRClientProtocolHandler.getJobReport(MRClientService.java:233) at org.apache.hadoop.mapreduce.v2.api.impl.pb.service.MRClientProtocolPBServiceImpl.getJobReport(MRClientProtocolPBServiceImpl.java:122) at org.apache.hadoop.yarn.proto.MRClientProtocol$MRClientProtocolService$2.callBlockingMethod(MRClientProtocol.java:275) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) at java.security.AccessController.doPrivileged(AccessController.java:366) at javax.security.auth.Subject.doAs(Subject.java:572) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1567) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) was: Have a Hadoop 2.4.1 cluster with Yarn ACL enabled, and try to submit jobs as a non-admin user user1. 
The job could be finished successfully, but the running progress was not displayed correctly on the commad-line, and I got following in the corresponding ApplicationMaster log: INFO [IPC Server handler 0 on 56717] org.apache.hadoop.ipc.Server: IPC Server handler 0 on 56717, call org.apache.hadoop.mapreduce.v2.api.MRClientProtocolPB.getJobReport from 9.30.95.26:61024 Call#59 Retry#0 org.apache.hadoop.security.AccessControlException: User user1 cannot perform operation VIEW_JOB on job_1407456690588_0003 at org.apache.hadoop.mapreduce.v2.app.client.MRClientService$MRClientProtocolHandler.verifyAndGetJob(MRClientService.java:191) at org.apache.hadoop.mapreduce.v2.app.client.MRClientService$MRClientProtocolHandler.getJobReport(MRClientService.java:233) at org.apache.hadoop.mapreduce.v2.api.impl.pb.service.MRClientProtocolPBServiceImpl.getJobReport(MRClientProtocolPBServiceImpl.java:122) at org.apache.hadoop.yarn.proto.MRClientProtocol$MRClientProtocolService$2.callBlockingMethod(MRClientProtocol.java:275) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) at java.security.AccessController.doPrivileged(AccessController.java:366) at javax.security.auth.Subject.doAs(Subject.java:572) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1567) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) Users are not allowed to view their own jobs, denied by JobACLsManager -- Key: YARN-2407 URL: https://issues.apache.org/jira/browse/YARN-2407 Project: Hadoop YARN Issue Type: Bug Components: applications Affects Versions: 2.4.1 Reporter: Yu Gao Attachments: YARN-2407.patch Have a Hadoop 2.4.1 cluster with Yarn ACL enabled, and try to submit jobs as a non-admin user user1. The job could be finished successfully, but the running progress was not displayed correctly on the command-line, and I got following in the corresponding ApplicationMaster log: INFO [IPC Server handler 0 on 56717] org.apache.hadoop.ipc.Server: IPC Server handler 0 on 56717, call org.apache.hadoop.mapreduce.v2.api.MRClientProtocolPB.getJobReport from 9.30.95.26:61024 Call#59 Retry#0 org.apache.hadoop.security.AccessControlException: User user1 cannot perform operation VIEW_JOB on job_1407456690588_0003 at org.apache.hadoop.mapreduce.v2.app.client.MRClientService$MRClientProtocolHandler.verifyAndGetJob(MRClientService.java:191)
[jira] [Updated] (YARN-2373) WebAppUtils Should Use configuration.getPassword for Accessing SSL Passwords
[ https://issues.apache.org/jira/browse/YARN-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Larry McCay updated YARN-2373: -- Attachment: YARN-2373.patch Fixed the issue found by [~jianhe] and added a direct test of loadSslConfiguration. In order to test it, I had to add a new signature for loadSslConfiguration that accepts a Configuration instance providing the provider.path configuration for the CredentialProvider API. I made this new signature public static as well; I figured it may make sense for some consumers to provide their own configuration. Let me know if you would rather it not be made public. WebAppUtils Should Use configuration.getPassword for Accessing SSL Passwords Key: YARN-2373 URL: https://issues.apache.org/jira/browse/YARN-2373 Project: Hadoop YARN Issue Type: Bug Reporter: Larry McCay Attachments: YARN-2373.patch, YARN-2373.patch, YARN-2373.patch, YARN-2373.patch, YARN-2373.patch As part of HADOOP-10904, this jira represents a change to WebAppUtils to uptake the use of the credential provider API through the new method on Configuration called getPassword. This provides an alternative to storing the passwords in clear text within the ssl-server.xml file while maintaining backward compatibility with that behavior. -- This message was sent by Atlassian JIRA (v6.2#6252)
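A hedged sketch of what such an overload might look like; the configuration keys are standard Hadoop keys, but the method body is an assumption, not the attached patch:
{code}
import org.apache.hadoop.conf.Configuration;

class WebAppUtilsSketch {
  // Overload that accepts a caller-supplied Configuration so tests (and
  // other consumers) can inject the credential-provider path themselves.
  public static Configuration loadSslConfiguration(Configuration conf) {
    Configuration sslConf = new Configuration(false);
    sslConf.addResource(conf.get("hadoop.ssl.server.conf", "ssl-server.xml"));
    // Propagate the provider path so Configuration#getPassword can consult
    // the configured credential providers instead of clear text.
    String providers = conf.get("hadoop.security.credential.provider.path");
    if (providers != null) {
      sslConf.set("hadoop.security.credential.provider.path", providers);
    }
    return sslConf;
  }
}
{code}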
[jira] [Commented] (YARN-2407) Users are not allowed to view their own jobs, denied by JobACLsManager
[ https://issues.apache.org/jira/browse/YARN-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093543#comment-14093543 ] Hadoop QA commented on YARN-2407: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12661090/YARN-2407.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app: org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4594//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4594//console This message is automatically generated. Users are not allowed to view their own jobs, denied by JobACLsManager -- Key: YARN-2407 URL: https://issues.apache.org/jira/browse/YARN-2407 Project: Hadoop YARN Issue Type: Bug Components: applications Affects Versions: 2.4.1 Reporter: Yu Gao Attachments: YARN-2407.patch Have a Hadoop 2.4.1 cluster with Yarn ACL enabled, and try to submit jobs as a non-admin user user1. 
The job could be finished successfully, but the running progress was not displayed correctly on the command-line, and I got following in the corresponding ApplicationMaster log: INFO [IPC Server handler 0 on 56717] org.apache.hadoop.ipc.Server: IPC Server handler 0 on 56717, call org.apache.hadoop.mapreduce.v2.api.MRClientProtocolPB.getJobReport from 9.30.95.26:61024 Call#59 Retry#0 org.apache.hadoop.security.AccessControlException: User user1 cannot perform operation VIEW_JOB on job_1407456690588_0003 at org.apache.hadoop.mapreduce.v2.app.client.MRClientService$MRClientProtocolHandler.verifyAndGetJob(MRClientService.java:191) at org.apache.hadoop.mapreduce.v2.app.client.MRClientService$MRClientProtocolHandler.getJobReport(MRClientService.java:233) at org.apache.hadoop.mapreduce.v2.api.impl.pb.service.MRClientProtocolPBServiceImpl.getJobReport(MRClientProtocolPBServiceImpl.java:122) at org.apache.hadoop.yarn.proto.MRClientProtocol$MRClientProtocolService$2.callBlockingMethod(MRClientProtocol.java:275) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) at java.security.AccessController.doPrivileged(AccessController.java:366) at javax.security.auth.Subject.doAs(Subject.java:572) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1567) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1198) Capacity Scheduler headroom calculation does not work as expected
[ https://issues.apache.org/jira/browse/YARN-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093563#comment-14093563 ] Jason Lowe commented on YARN-1198: -- I think having a per-user-per-queue headroom computation and reusing it between applications for that user in that queue makes sense. I don't know of a case where the headroom of one app for a user in a queue should be computed differently than another app for the same user in the same queue. Capacity Scheduler headroom calculation does not work as expected - Key: YARN-1198 URL: https://issues.apache.org/jira/browse/YARN-1198 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi Attachments: YARN-1198.1.patch Today headroom calculation (for the app) takes place only when * New node is added/removed from the cluster * New container is getting assigned to the application. However there are potentially lot of situations which are not considered for this calculation * If a container finishes then headroom for that application will change and should be notified to the AM accordingly. * If a single user has submitted multiple applications (app1 and app2) to the same queue then ** If app1's container finishes then not only app1's but also app2's AM should be notified about the change in headroom. ** Similarly if a container is assigned to any applications app1/app2 then both AM should be notified about their headroom. ** To simplify the whole communication process it is ideal to keep headroom per User per LeafQueue so that everyone gets the same picture (apps belonging to same user and submitted in same queue). * If a new user submits an application to the queue then all applications submitted by all users in that queue should be notified of the headroom change. * Also today headroom is an absolute number ( I think it should be normalized but then this is going to be not backward compatible..) * Also when admin user refreshes queue headroom has to be updated. These all are the potential bugs in headroom calculations -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2399) FairScheduler: Merge AppSchedulable and FSSchedulerApp into FSAppAttempt
[ https://issues.apache.org/jira/browse/YARN-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093592#comment-14093592 ] Sandy Ryza commented on YARN-2399: -- I noticed in FSAppAttempt there are some instance variables mixed in with the functions. Not sure if it was like this already, but can we move them up to the top? FairScheduler: Merge AppSchedulable and FSSchedulerApp into FSAppAttempt Key: YARN-2399 URL: https://issues.apache.org/jira/browse/YARN-2399 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Affects Versions: 2.5.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: yarn-2399-1.patch, yarn-2399-2.patch FairScheduler has two data structures for an application, making the code hard to track. We should merge these for better maintainability in the long-term. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2317) Update documentation about how to write YARN applications
[ https://issues.apache.org/jira/browse/YARN-2317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093598#comment-14093598 ] Hadoop QA commented on YARN-2317: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12661059/YARN-2317-081114.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+0 tests included{color}. The patch appears to be a documentation patch that doesn't require tests. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4595//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4595//console This message is automatically generated. Update documentation about how to write YARN applications - Key: YARN-2317 URL: https://issues.apache.org/jira/browse/YARN-2317 Project: Hadoop YARN Issue Type: Sub-task Components: documentation Reporter: Li Lu Assignee: Li Lu Fix For: 2.6.0 Attachments: YARN-2317-071714.patch, YARN-2317-073014-1.patch, YARN-2317-073014.patch, YARN-2317-081114.patch Some information in WritingYarnApplications webpage is out-dated. Need some refresh work on this document to reflect the most recent changes in YARN APIs. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2399) FairScheduler: Merge AppSchedulable and FSSchedulerApp into FSAppAttempt
[ https://issues.apache.org/jira/browse/YARN-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093597#comment-14093597 ] Sandy Ryza commented on YARN-2399: -- Also, can we move all the methods that implement methods in Schedulable together?
{code}
+  // TODO (KK): Rename these
{code}
Rename these?
{code}
-new ConcurrentHashMap<ApplicationId,SchedulerApplication<FSSchedulerApp>>();
+new ConcurrentHashMap<ApplicationId,SchedulerApplication<FSAppAttempt>>();
{code}
Mind adding a space here after ApplicationId, because you're fixing this line anyway?
{code}
+  private FSAppAttempt mockAppSched(long startTime) {
+    FSAppAttempt schedApp = mock(FSAppAttempt.class);
+    when(schedApp.getStartTime()).thenReturn(startTime);
+    return schedApp;
+  }
{code}
Call this mockAppAttempt? Otherwise, LGTM. FairScheduler: Merge AppSchedulable and FSSchedulerApp into FSAppAttempt Key: YARN-2399 URL: https://issues.apache.org/jira/browse/YARN-2399 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Affects Versions: 2.5.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: yarn-2399-1.patch, yarn-2399-2.patch FairScheduler has two data structures for an application, making the code hard to track. We should merge these for better maintainability in the long-term. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2373) WebAppUtils Should Use configuration.getPassword for Accessing SSL Passwords
[ https://issues.apache.org/jira/browse/YARN-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093626#comment-14093626 ] Hadoop QA commented on YARN-2373: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12661096/YARN-2373.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4596//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4596//console This message is automatically generated. WebAppUtils Should Use configuration.getPassword for Accessing SSL Passwords Key: YARN-2373 URL: https://issues.apache.org/jira/browse/YARN-2373 Project: Hadoop YARN Issue Type: Bug Reporter: Larry McCay Attachments: YARN-2373.patch, YARN-2373.patch, YARN-2373.patch, YARN-2373.patch, YARN-2373.patch As part of HADOOP-10904, this jira represents a change to WebAppUtils to uptake the use of the credential provider API through the new method on Configuration called getPassword. This provides an alternative to storing the passwords in clear text within the ssl-server.xml file while maintaining backward compatibility with that behavior. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (YARN-1337) Recover containers upon nodemanager restart
[ https://issues.apache.org/jira/browse/YARN-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093640#comment-14093640 ] Jason Lowe edited comment on YARN-1337 at 8/12/14 1:54 AM: --- Thanks for taking another look, Junping. bq. Better to add javadoc for new added (or move from private) public method. I documented all of the NodeStatusUpdater methods and also the NMStateStoreService public methods that didn't already have javadocs. bq. volatile is unncessary as it was using AtomicBoolean already. Fixed. was (Author: jlowe): Thanks for taking another look, Junping. .bq Better to add javadoc for new added (or move from private) public method. I documented all of the NodeStatusUpdater methods and also the NMStateStoreService public methods that didn't already have javadocs. .bq volatile is unncessary as it was using AtomicBoolean already. Fixed. Recover containers upon nodemanager restart --- Key: YARN-1337 URL: https://issues.apache.org/jira/browse/YARN-1337 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.3.0 Reporter: Jason Lowe Assignee: Jason Lowe Attachments: YARN-1337-v1.patch, YARN-1337-v2.patch, YARN-1337-v3.patch To support work-preserving NM restart we need to recover the state of the containers when the nodemanager went down. This includes informing the RM of containers that have exited in the interim and a strategy for dealing with the exit codes from those containers along with how to reacquire the active containers and determine their exit codes when they terminate. The state of finished containers also needs to be recovered. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1337) Recover containers upon nodemanager restart
[ https://issues.apache.org/jira/browse/YARN-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-1337: - Attachment: YARN-1337-v3.patch Thanks for taking another look, Junping. bq. Better to add javadoc for newly added (or moved from private) public methods. I documented all of the NodeStatusUpdater methods and also the NMStateStoreService public methods that didn't already have javadocs. bq. volatile is unnecessary as it was using AtomicBoolean already. Fixed. Recover containers upon nodemanager restart --- Key: YARN-1337 URL: https://issues.apache.org/jira/browse/YARN-1337 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.3.0 Reporter: Jason Lowe Assignee: Jason Lowe Attachments: YARN-1337-v1.patch, YARN-1337-v2.patch, YARN-1337-v3.patch To support work-preserving NM restart we need to recover the state of the containers when the nodemanager went down. This includes informing the RM of containers that have exited in the interim and a strategy for dealing with the exit codes from those containers along with how to reacquire the active containers and determine their exit codes when they terminate. The state of finished containers also needs to be recovered. -- This message was sent by Atlassian JIRA (v6.2#6252)
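To make the recovery flow in the YARN-1337 description concrete, here is a hedged sketch, not the actual patch: the record fields and helper methods below are invented names, but the flow matches what is described, namely reacquiring live containers and reporting interim exits to the RM.
{code:java}
import java.util.List;

// Hypothetical illustration of work-preserving recovery; field and method
// names are assumptions, not the real NM classes.
public class ContainerRecoverySketch {
  static final int EXIT_CODE_LOST = -1; // placeholder, not a real YARN constant

  static class ContainerRecord {
    String containerId;
    long pid;           // process id persisted before the NM went down
    boolean completed;  // true if the container finished before the restart
    int exitCode;
  }

  void recover(List<ContainerRecord> records) {
    for (ContainerRecord r : records) {
      if (r.completed) {
        // Finished-container state was recovered; surface it to the RM.
        reportToRM(r.containerId, r.exitCode);
      } else if (isProcessAlive(r.pid)) {
        // Still running: reacquire and resume monitoring for its exit code.
        reacquire(r.containerId, r.pid);
      } else {
        // Exited while the NM was down; the true exit code is unknowable.
        reportToRM(r.containerId, EXIT_CODE_LOST);
      }
    }
  }

  boolean isProcessAlive(long pid) { return false; /* stub */ }
  void reacquire(String containerId, long pid) { /* stub */ }
  void reportToRM(String containerId, int exitCode) { /* stub */ }
}
{code}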
[jira] [Updated] (YARN-2378) Adding support for moving apps between queues in Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subramaniam Venkatraman Krishnan updated YARN-2378: --- Attachment: YARN-2378.patch Adding support for moving apps between queues in Capacity Scheduler --- Key: YARN-2378 URL: https://issues.apache.org/jira/browse/YARN-2378 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Subramaniam Venkatraman Krishnan Assignee: Subramaniam Venkatraman Krishnan Labels: capacity-scheduler Attachments: YARN-2378.patch, YARN-2378.patch, YARN-2378.patch As discussed with [~leftnoteasy] and [~jianhe], we are breaking up YARN-1707 to smaller patches for manageability. This JIRA will address adding support for moving apps between queues in Capacity Scheduler. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2378) Adding support for moving apps between queues in Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093648#comment-14093648 ] Subramaniam Venkatraman Krishnan commented on YARN-2378: Thanks [~vvasudev] for resolving the host issue. The only test case that failed, TestAMRestart, passes consistently for me. Thanks for your feedback [~leftnoteasy]. I am uploading a new patch that addresses all your comments. Additionally, based on our offline discussion and comments in YARN-807, I have also added pending apps to CapacityScheduler#getAppsInQueue() and refactored moveAllApps into AbstractYarnScheduler. I ran all the relevant test cases and things look good. Adding support for moving apps between queues in Capacity Scheduler --- Key: YARN-2378 URL: https://issues.apache.org/jira/browse/YARN-2378 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Subramaniam Venkatraman Krishnan Assignee: Subramaniam Venkatraman Krishnan Labels: capacity-scheduler Attachments: YARN-2378.patch, YARN-2378.patch, YARN-2378.patch As discussed with [~leftnoteasy] and [~jianhe], we are breaking up YARN-1707 to smaller patches for manageability. This JIRA will address adding support for moving apps between queues in Capacity Scheduler. -- This message was sent by Atlassian JIRA (v6.2#6252)
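For readers following the comment above, a hedged sketch of what a scheduler-agnostic moveAllApps could look like; the types are simplified stand-ins, not the actual AbstractYarnScheduler code:
{code:java}
import java.util.List;

// Simplified stand-in types: the real scheduler uses ApplicationAttemptId and
// YarnException, and per the comment above getAppsInQueue() now includes
// pending apps as well as running ones.
abstract class AbstractSchedulerSketch {
  abstract List<String> getAppsInQueue(String queueName);
  abstract void moveApplication(String appId, String targetQueue);

  /** Move every app (running and pending) from one queue to another. */
  public void moveAllApps(String sourceQueue, String destQueue) {
    List<String> apps = getAppsInQueue(sourceQueue);
    if (apps == null) {
      throw new IllegalArgumentException("Unknown queue: " + sourceQueue);
    }
    for (String appId : apps) {
      moveApplication(appId, destQueue);
    }
  }
}
{code}
Keeping the loop in a shared base class is what lets each concrete scheduler supply only the queue lookup and single-app move.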
[jira] [Commented] (YARN-2331) Distinguish shutdown during supervision vs. shutdown for rolling upgrade
[ https://issues.apache.org/jira/browse/YARN-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093655#comment-14093655 ] Junping Du commented on YARN-2331: -- [~jlowe], for a rolling upgrade when the NM is not supervised, I think another way is to add an RM Admin command line that brings down a specific NM without killing its containers (by notifying the RMNode and heartbeating back), given there is no admin port on the NM so far. An NM shutdown without supervision (whether a decommission or an occasional failure) wouldn't trigger this CLI, and so wouldn't preserve running containers. Thoughts? Distinguish shutdown during supervision vs. shutdown for rolling upgrade Key: YARN-2331 URL: https://issues.apache.org/jira/browse/YARN-2331 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.6.0 Reporter: Jason Lowe When the NM is shutting down with restart support enabled there are scenarios we'd like to distinguish and behave accordingly: # The NM is running under supervision. In that case containers should be preserved so the automatic restart can recover them. # The NM is not running under supervision and a rolling upgrade is not being performed. In that case the shutdown should kill all containers since it is unlikely the NM will be restarted in a timely manner to recover them. # The NM is not running under supervision and a rolling upgrade is being performed. In that case the shutdown should not kill all containers since a restart is imminent due to the rolling upgrade and the containers will be recovered. -- This message was sent by Atlassian JIRA (v6.2#6252)
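The three scenarios in the YARN-2331 description boil down to a small decision; a sketch under the assumption of two invented flags (the real patch may wire this differently):
{code:java}
// Flag names are invented for illustration only.
public final class ShutdownPolicySketch {
  /**
   * Scenarios 1 and 3 preserve containers (a supervised restart or a rolling
   * upgrade will recover them); scenario 2 kills them, since no timely
   * restart is expected.
   */
  public static boolean preserveContainersOnShutdown(
      boolean underSupervision, boolean rollingUpgradeInProgress) {
    return underSupervision || rollingUpgradeInProgress;
  }
}
{code}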
[jira] [Updated] (YARN-2033) Investigate merging generic-history into the Timeline Store
[ https://issues.apache.org/jira/browse/YARN-2033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-2033: -- Attachment: YARN-2033.5.patch Rebased the patch against the latest trunk and made ApplicationHistoryManagerOnTimelineStore throw NotFoundException instead of returning null, to be consistent with the existing behavior. The patch also includes some minor improvements. Investigate merging generic-history into the Timeline Store --- Key: YARN-2033 URL: https://issues.apache.org/jira/browse/YARN-2033 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Zhijie Shen Attachments: ProposalofStoringYARNMetricsintotheTimelineStore.pdf, YARN-2033.1.patch, YARN-2033.2.patch, YARN-2033.3.patch, YARN-2033.4.patch, YARN-2033.5.patch, YARN-2033.Prototype.patch, YARN-2033_ALL.1.patch, YARN-2033_ALL.2.patch, YARN-2033_ALL.3.patch, YARN-2033_ALL.4.patch Having two different stores isn't amenable to generic insights on what's happening with applications. This is to investigate porting generic-history into the Timeline Store. One goal is to try to keep most of the client-side interfaces as close as possible to what we have today. -- This message was sent by Atlassian JIRA (v6.2#6252)
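A hedged sketch of the null-to-exception change described in the update above; the method shape and the lookup helper are assumptions, not the actual ApplicationHistoryManagerOnTimelineStore code:
{code:java}
import org.apache.hadoop.yarn.webapp.NotFoundException;

// Illustrative only: raise NotFoundException rather than returning null so
// callers observe the same behavior as the existing generic-history service.
class HistoryLookupSketch {
  Object getApplication(String appId) {
    Object entity = lookupTimelineEntity(appId); // placeholder lookup
    if (entity == null) {
      throw new NotFoundException(
          "The entity for application " + appId + " doesn't exist in the timeline store");
    }
    return entity;
  }

  Object lookupTimelineEntity(String appId) { return null; /* stub */ }
}
{code}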
[jira] [Commented] (YARN-1337) Recover containers upon nodemanager restart
[ https://issues.apache.org/jira/browse/YARN-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093684#comment-14093684 ] Junping Du commented on YARN-1337: -- Latest patch looks good to me. +1 pending on Jenkins' test. Recover containers upon nodemanager restart --- Key: YARN-1337 URL: https://issues.apache.org/jira/browse/YARN-1337 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.3.0 Reporter: Jason Lowe Assignee: Jason Lowe Attachments: YARN-1337-v1.patch, YARN-1337-v2.patch, YARN-1337-v3.patch To support work-preserving NM restart we need to recover the state of the containers when the nodemanager went down. This includes informing the RM of containers that have exited in the interim and a strategy for dealing with the exit codes from those containers along with how to reacquire the active containers and determine their exit codes when they terminate. The state of finished containers also needs to be recovered. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2378) Adding support for moving apps between queues in Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093693#comment-14093693 ] Hadoop QA commented on YARN-2378: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12661117/YARN-2378.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4597//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4597//console This message is automatically generated. Adding support for moving apps between queues in Capacity Scheduler --- Key: YARN-2378 URL: https://issues.apache.org/jira/browse/YARN-2378 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Subramaniam Venkatraman Krishnan Assignee: Subramaniam Venkatraman Krishnan Labels: capacity-scheduler Attachments: YARN-2378.patch, YARN-2378.patch, YARN-2378.patch As discussed with [~leftnoteasy] and [~jianhe], we are breaking up YARN-1707 to smaller patches for manageability. This JIRA will address adding support for moving apps between queues in Capacity Scheduler. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Synchronization among Mappers in map-reduce task
Hi Folks,

I have been writing a map-reduce application where the input file contains records, and every field in a record is separated by a delimiter. In addition, the user provides a list of columns that he wants to look up in a master properties file (stored in HDFS). If such a column (let's say it is a key) is present in the master properties file, the code gets the corresponding value and updates the key with this value in the record. If the key is not present in the master properties file, the code creates a new value for this key, writes it to the properties file, and also updates the record. I have written this application and tested it, and everything has worked fine so far.

*e.g.:* *I/P Record:* This | is | the | test | record *Columns:* 2,4 (that means the code will look up only the fields *is* and *test* in the master properties file.)

Here, I have a question. *Q 1:* In the case where my input file is huge and is split across multiple mappers, I was getting the exception below, where all the other mapper tasks were failing. *Also, initially when I started the job, my master properties file was empty.* In my code I have a check: if this file (master properties) doesn't exist, create a new empty file before submitting the job itself. e.g.: if I have 4 splits of data, then 3 map tasks fail. But after this, all the failed map tasks restart and the job eventually succeeds.

So, *here is the question: is it possible to make sure that when one of the mapper tasks is writing to a file, the others wait until the first one is finished?* I have read that mapper tasks don't interact with each other. Also, what will happen in the scenario where I start multiple parallel map-reduce jobs and all of them work on the same properties file? *Is there any way to have synchronization between two independent map-reduce jobs?* I have also read that ZooKeeper can be used in such scenarios. Is that correct?
Error: com.techidiocy.hadoop.filesystem.api.exceptions.HDFSFileSystemException: IOException - failed while appending data to the file - Failed to create file [/user/cloudera/lob/master/bank.properties] for [DFSClient_attempt_1407778869492_0032_m_02_0_1618418105_1] on client [10.X.X.17], because this file is already being created by [DFSClient_attempt_1407778869492_0032_m_05_0_-949968337_1] on [10.X.X.17]
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2548)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInternal(FSNamesystem.java:2377)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInt(FSNamesystem.java:2612)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:2575)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.append(NameNodeRpcServer.java:522)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.append(ClientNamenodeProtocolServerSideTranslatorPB.java:373)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)
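On the synchronization question above: mappers indeed cannot coordinate through MapReduce itself, and the exception is HDFS's single-writer lease check rejecting a second concurrent append to the same file. If the shared-file design is kept, one common approach is a distributed lock in ZooKeeper, e.g. via Apache Curator; a minimal sketch, where the connection string and lock path are placeholders:
{code:java}
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.locks.InterProcessMutex;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class PropertiesFileLock {
  public static void main(String[] args) throws Exception {
    // Connect to the ZooKeeper ensemble (placeholder address).
    CuratorFramework client = CuratorFrameworkFactory.newClient(
        "zk1:2181", new ExponentialBackoffRetry(1000, 3));
    client.start();

    // All mappers (and independent jobs) locking the same path serialize here,
    // even across JVMs and nodes.
    InterProcessMutex lock = new InterProcessMutex(client, "/locks/master-properties");
    lock.acquire();
    try {
      // Critical section: read and append the shared properties file in HDFS.
    } finally {
      lock.release();
      client.close();
    }
  }
}
{code}
Note that this serializes all writers and will throttle the job. An often simpler design is to avoid the shared mutable file entirely, for example by having mappers emit the unresolved keys and merging them into the properties file in a single reducer or a post-processing step.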
[jira] [Commented] (YARN-2378) Adding support for moving apps between queues in Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093695#comment-14093695 ] Wangda Tan commented on YARN-2378: -- [~subru], I've run the previously failed test locally and it passed, matching the latest Jenkins result. LGTM, +1. [~jianhe], would you like to take a look at this? Thanks, Wangda Adding support for moving apps between queues in Capacity Scheduler --- Key: YARN-2378 URL: https://issues.apache.org/jira/browse/YARN-2378 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Subramaniam Venkatraman Krishnan Assignee: Subramaniam Venkatraman Krishnan Labels: capacity-scheduler Attachments: YARN-2378.patch, YARN-2378.patch, YARN-2378.patch As discussed with [~leftnoteasy] and [~jianhe], we are breaking up YARN-1707 to smaller patches for manageability. This JIRA will address adding support for moving apps between queues in Capacity Scheduler. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-415) Capture memory utilization at the app-level for chargeback
[ https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093696#comment-14093696 ] Wangda Tan commented on YARN-415: - [~jianhe], would you like to take a look at it? Capture memory utilization at the app-level for chargeback -- Key: YARN-415 URL: https://issues.apache.org/jira/browse/YARN-415 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager Affects Versions: 0.23.6 Reporter: Kendall Thrapp Assignee: Andrey Klochkov Attachments: YARN-415--n10.patch, YARN-415--n2.patch, YARN-415--n3.patch, YARN-415--n4.patch, YARN-415--n5.patch, YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, YARN-415--n9.patch, YARN-415.201405311749.txt, YARN-415.201406031616.txt, YARN-415.201406262136.txt, YARN-415.201407042037.txt, YARN-415.201407071542.txt, YARN-415.201407171553.txt, YARN-415.201407172144.txt, YARN-415.201407232237.txt, YARN-415.201407242148.txt, YARN-415.201407281816.txt, YARN-415.201408062232.txt, YARN-415.201408080204.txt, YARN-415.201408092006.txt, YARN-415.patch For the purpose of chargeback, I'd like to be able to compute the cost of an application in terms of cluster resource usage. To start out, I'd like to get the memory utilization of an application. The unit should be MB-seconds or something similar and, from a chargeback perspective, the memory amount should be the memory reserved for the application, as even if the app didn't use all that memory, no one else was able to use it. (reserved ram for container 1 * lifetime of container 1) + (reserved ram for container 2 * lifetime of container 2) + ... + (reserved ram for container n * lifetime of container n) It'd be nice to have this at the app level instead of the job level because: 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't appear on the job history server). 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm). This new metric should be available both through the RM UI and RM Web Services REST API. -- This message was sent by Atlassian JIRA (v6.2#6252)
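A worked example of the chargeback formula in the YARN-415 description above; ContainerUsage is a hypothetical record for this sketch, not a YARN type:
{code:java}
import java.util.Arrays;
import java.util.List;

public class ChargebackSketch {
  // Hypothetical per-container usage record.
  static class ContainerUsage {
    final long reservedMb;
    final long lifetimeSeconds;
    ContainerUsage(long reservedMb, long lifetimeSeconds) {
      this.reservedMb = reservedMb;
      this.lifetimeSeconds = lifetimeSeconds;
    }
  }

  /** Sum of (reserved memory * lifetime) over all containers, in MB-seconds. */
  static long memoryMbSeconds(List<ContainerUsage> containers) {
    long total = 0;
    for (ContainerUsage c : containers) {
      total += c.reservedMb * c.lifetimeSeconds; // reserved, not actually used
    }
    return total;
  }

  public static void main(String[] args) {
    // Two containers: 2048 MB for 600 s and 1024 MB for 300 s
    // => 2048*600 + 1024*300 = 1,536,000 MB-seconds.
    System.out.println(memoryMbSeconds(Arrays.asList(
        new ContainerUsage(2048, 600), new ContainerUsage(1024, 300))));
  }
}
{code}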
[jira] [Commented] (YARN-2308) NPE happened when RM restart after CapacityScheduler queue configuration changed
[ https://issues.apache.org/jira/browse/YARN-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093714#comment-14093714 ] Wangda Tan commented on YARN-2308: -- [~lichangleo], thanks for updating. I think the following line is not necessary: bq. +conf.setBoolean(YarnConfiguration.RM_WORK_PRESERVING_RECOVERY_ENABLED, true); I just tried it locally; removing it should be fine. Besides this, LGTM, +1. [~zjshen], could you take a look at this? Thanks, Wangda NPE happened when RM restart after CapacityScheduler queue configuration changed - Key: YARN-2308 URL: https://issues.apache.org/jira/browse/YARN-2308 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Affects Versions: 2.6.0 Reporter: Wangda Tan Assignee: chang li Priority: Critical Attachments: jira2308.patch, jira2308.patch, jira2308.patch I encountered an NPE when the RM restarted {code}
2014-07-16 07:22:46,957 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type APP_ATTEMPT_ADDED to the scheduler
java.lang.NullPointerException
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:566)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:922)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:98)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:594)
	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:654)
	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:85)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:698)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:682)
	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
	at java.lang.Thread.run(Thread.java:744)
{code} and the RM then fails to restart. This is caused by a queue configuration change: I removed some queues and added new ones. So when the RM restarts, it tries to recover past applications, and when the queue of any of these applications has been removed, an NPE is raised. -- This message was sent by Atlassian JIRA (v6.2#6252)
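The failure mode above reduces to an unchecked queue lookup during recovery; a hedged sketch of the shape of the bug and one possible guard (names only approximate the CapacityScheduler, and this is not the actual code or fix):
{code:java}
import java.util.Map;

// Approximate shape only; the real code lives in
// CapacityScheduler#addApplicationAttempt.
class QueueLookupSketch {
  Map<String, Object> queues; // queue name -> queue object

  void addApplicationAttempt(String queueName, String attemptId) {
    Object queue = queues.get(queueName); // null if the queue was removed from config
    if (queue == null) {
      // Without a guard like this, dereferencing 'queue' below throws the
      // NPE seen in the stack trace and recovery aborts the whole RM. One
      // possible fix is to fail just the recovered app instead of crashing.
      throw new IllegalStateException("Queue " + queueName
          + " no longer exists; cannot recover attempt " + attemptId);
    }
    // queue.submitApplicationAttempt(...);
  }
}
{code}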
[jira] [Created] (YARN-2408) Resource Request REST API for YARN
Renan DelValle created YARN-2408: Summary: Resource Request REST API for YARN Key: YARN-2408 URL: https://issues.apache.org/jira/browse/YARN-2408 Project: Hadoop YARN Issue Type: New Feature Components: webapp Reporter: Renan DelValle Priority: Minor I’m proposing a new REST API for YARN which exposes a snapshot of the Resource Requests that exist inside of the Scheduler. My motivation behind this new feature is to allow external software to monitor the amount of resources being requested, to gain more insight into cluster usage than is already provided. The API can also be used by external software to detect a starved application and alert the appropriate users and/or sys admin so that the problem may be remedied. Here is the proposed API: {code:xml}
<resourceRequests>
  <MB>96256</MB>
  <VCores>94</VCores>
  <appMaster>
    <applicationId>application_</applicationId>
    <applicationAttemptId>appattempt_</applicationAttemptId>
    <queueName>default</queueName>
    <totalPendingMB>96256</totalPendingMB>
    <totalPendingVCores>94</totalPendingVCores>
    <numResourceRequests>3</numResourceRequests>
    <resourceRequests>
      <request>
        <MB>1024</MB>
        <VCores>1</VCores>
        <resourceName>/default-rack</resourceName>
        <numContainers>94</numContainers>
        <relaxLocality>true</relaxLocality>
        <priority>20</priority>
      </request>
      <request>
        <MB>1024</MB>
        <VCores>1</VCores>
        <resourceName>*</resourceName>
        <numContainers>94</numContainers>
        <relaxLocality>true</relaxLocality>
        <priority>20</priority>
      </request>
      <request>
        <MB>1024</MB>
        <VCores>1</VCores>
        <resourceName>master</resourceName>
        <numContainers>94</numContainers>
        <relaxLocality>true</relaxLocality>
        <priority>20</priority>
      </request>
    </resourceRequests>
  </appMaster>
</resourceRequests>
{code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2408) Resource Request REST API for YARN
[ https://issues.apache.org/jira/browse/YARN-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Renan DelValle updated YARN-2408: - Attachment: YARN-2408.patch Resource Request REST API for YARN -- Key: YARN-2408 URL: https://issues.apache.org/jira/browse/YARN-2408 Project: Hadoop YARN Issue Type: New Feature Components: webapp Reporter: Renan DelValle Priority: Minor Labels: features Attachments: YARN-2408.patch I’m proposing a new REST API for YARN which exposes a snapshot of the Resource Requests that exist inside of the Scheduler. My motivation behind this new feature is to allow external software to monitor the amount of resources being requested, to gain more insight into cluster usage than is already provided. The API can also be used by external software to detect a starved application and alert the appropriate users and/or sys admin so that the problem may be remedied. Here is the proposed API: {code:xml}
<resourceRequests>
  <MB>96256</MB>
  <VCores>94</VCores>
  <appMaster>
    <applicationId>application_</applicationId>
    <applicationAttemptId>appattempt_</applicationAttemptId>
    <queueName>default</queueName>
    <totalPendingMB>96256</totalPendingMB>
    <totalPendingVCores>94</totalPendingVCores>
    <numResourceRequests>3</numResourceRequests>
    <resourceRequests>
      <request>
        <MB>1024</MB>
        <VCores>1</VCores>
        <resourceName>/default-rack</resourceName>
        <numContainers>94</numContainers>
        <relaxLocality>true</relaxLocality>
        <priority>20</priority>
      </request>
      <request>
        <MB>1024</MB>
        <VCores>1</VCores>
        <resourceName>*</resourceName>
        <numContainers>94</numContainers>
        <relaxLocality>true</relaxLocality>
        <priority>20</priority>
      </request>
      <request>
        <MB>1024</MB>
        <VCores>1</VCores>
        <resourceName>master</resourceName>
        <numContainers>94</numContainers>
        <relaxLocality>true</relaxLocality>
        <priority>20</priority>
      </request>
    </resourceRequests>
  </appMaster>
</resourceRequests>
{code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1337) Recover containers upon nodemanager restart
[ https://issues.apache.org/jira/browse/YARN-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093733#comment-14093733 ] Hadoop QA commented on YARN-1337: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12661113/YARN-1337-v3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 9 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4598//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4598//console This message is automatically generated. Recover containers upon nodemanager restart --- Key: YARN-1337 URL: https://issues.apache.org/jira/browse/YARN-1337 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.3.0 Reporter: Jason Lowe Assignee: Jason Lowe Attachments: YARN-1337-v1.patch, YARN-1337-v2.patch, YARN-1337-v3.patch To support work-preserving NM restart we need to recover the state of the containers when the nodemanager went down. This includes informing the RM of containers that have exited in the interim and a strategy for dealing with the exit codes from those containers along with how to reacquire the active containers and determine their exit codes when they terminate. The state of finished containers also needs to be recovered. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2033) Investigate merging generic-history into the Timeline Store
[ https://issues.apache.org/jira/browse/YARN-2033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093742#comment-14093742 ] Hadoop QA commented on YARN-2033: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12661126/YARN-2033.5.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 16 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4599//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4599//console This message is automatically generated. Investigate merging generic-history into the Timeline Store --- Key: YARN-2033 URL: https://issues.apache.org/jira/browse/YARN-2033 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Zhijie Shen Attachments: ProposalofStoringYARNMetricsintotheTimelineStore.pdf, YARN-2033.1.patch, YARN-2033.2.patch, YARN-2033.3.patch, YARN-2033.4.patch, YARN-2033.5.patch, YARN-2033.Prototype.patch, YARN-2033_ALL.1.patch, YARN-2033_ALL.2.patch, YARN-2033_ALL.3.patch, YARN-2033_ALL.4.patch Having two different stores isn't amenable to generic insights on what's happening with applications. This is to investigate porting generic-history into the Timeline Store. One goal is to try to keep most of the client-side interfaces as close as possible to what we have today. -- This message was sent by Atlassian JIRA (v6.2#6252)