[jira] [Commented] (YARN-1591) TestResourceTrackerService fails randomly on trunk

2014-03-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13932908#comment-13932908
 ] 

Hadoop QA commented on YARN-1591:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12634353/YARN-1591.3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3343//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3343//console

This message is automatically generated.

 TestResourceTrackerService fails randomly on trunk
 --

 Key: YARN-1591
 URL: https://issues.apache.org/jira/browse/YARN-1591
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
 Attachments: YARN-1591.1.patch, YARN-1591.2.patch, YARN-1591.3.patch


 As evidenced by Jenkins at 
 https://issues.apache.org/jira/browse/YARN-1041?focusedCommentId=13868621page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13868621.
 It's failing randomly on trunk on my local box too 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-90) NodeManager should identify failed disks becoming good back again

2014-03-13 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-90?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-90:
--

Attachment: apache-yarn-90.2.patch

Fixed issue that caused the patch application to fail.

 NodeManager should identify failed disks becoming good back again
 -

 Key: YARN-90
 URL: https://issues.apache.org/jira/browse/YARN-90
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Ravi Gummadi
Assignee: Varun Vasudev
 Attachments: YARN-90.1.patch, YARN-90.patch, YARN-90.patch, 
 YARN-90.patch, YARN-90.patch, apache-yarn-90.0.patch, apache-yarn-90.1.patch, 
 apache-yarn-90.2.patch


 MAPREDUCE-3121 makes NodeManager identify disk failures. But once a disk goes 
 down, it is marked as failed forever. To reuse that disk (after it becomes 
 good), NodeManager needs restart. This JIRA is to improve NodeManager to 
 reuse good disks(which could be bad some time back).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1389) ApplicationClientProtocol and ApplicationHistoryProtocol should expose analogous APIs

2014-03-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13932949#comment-13932949
 ] 

Hadoop QA commented on YARN-1389:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12634349/YARN-1389.11.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart
  
org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3342//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3342//console

This message is automatically generated.

 ApplicationClientProtocol and ApplicationHistoryProtocol should expose 
 analogous APIs
 -

 Key: YARN-1389
 URL: https://issues.apache.org/jira/browse/YARN-1389
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Mayank Bansal
Assignee: Mayank Bansal
 Attachments: YARN-1389-1.patch, YARN-1389-2.patch, YARN-1389-3.patch, 
 YARN-1389-4.patch, YARN-1389-5.patch, YARN-1389-6.patch, YARN-1389-7.patch, 
 YARN-1389-8.patch, YARN-1389-9.patch, YARN-1389.10.patch, YARN-1389.11.patch


 As we plan to have the APIs in ApplicationHistoryProtocol to expose the 
 reports of *finished* application attempts and containers, we should do the 
 same for ApplicationClientProtocol, which will return the reports of 
 *running* attempts and containers.
 Later on, we can improve YarnClient to direct the query of running instance 
 to ApplicationClientProtocol, while that of finished instance to 
 ApplicationHistoryProtocol, making it transparent to the users.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1389) ApplicationClientProtocol and ApplicationHistoryProtocol should expose analogous APIs

2014-03-13 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13932950#comment-13932950
 ] 

Zhijie Shen commented on YARN-1389:
---

Test failures are not related.

 ApplicationClientProtocol and ApplicationHistoryProtocol should expose 
 analogous APIs
 -

 Key: YARN-1389
 URL: https://issues.apache.org/jira/browse/YARN-1389
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Mayank Bansal
Assignee: Mayank Bansal
 Attachments: YARN-1389-1.patch, YARN-1389-2.patch, YARN-1389-3.patch, 
 YARN-1389-4.patch, YARN-1389-5.patch, YARN-1389-6.patch, YARN-1389-7.patch, 
 YARN-1389-8.patch, YARN-1389-9.patch, YARN-1389.10.patch, YARN-1389.11.patch


 As we plan to have the APIs in ApplicationHistoryProtocol to expose the 
 reports of *finished* application attempts and containers, we should do the 
 same for ApplicationClientProtocol, which will return the reports of 
 *running* attempts and containers.
 Later on, we can improve YarnClient to direct the query of running instance 
 to ApplicationClientProtocol, while that of finished instance to 
 ApplicationHistoryProtocol, making it transparent to the users.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1577) Unmanaged AM is broken because of YARN-1493

2014-03-13 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13932957#comment-13932957
 ] 

Zhijie Shen commented on YARN-1577:
---

Hi, Naren. YARN-1389 is resolved you can make use of 
ApplicationClientProtocol#getApplicationAttemptReport to get the attempt state.

 Unmanaged AM is broken because of YARN-1493
 ---

 Key: YARN-1577
 URL: https://issues.apache.org/jira/browse/YARN-1577
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.3.0
Reporter: Jian He
Assignee: Naren Koneru
Priority: Blocker

 Today unmanaged AM client is waiting for app state to be Accepted to launch 
 the AM. This is broken since we changed in YARN-1493 to start the attempt 
 after the application is Accepted. We may need to introduce an attempt state 
 report that client can rely on to query the attempt state and choose to 
 launch the unmanaged AM.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1389) ApplicationClientProtocol and ApplicationHistoryProtocol should expose analogous APIs

2014-03-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13932960#comment-13932960
 ] 

Hudson commented on YARN-1389:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5316 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5316/])
YARN-1389. Made ApplicationClientProtocol and ApplicationHistoryProtocol expose 
analogous getApplication(s)/Attempt(s)/Container(s) APIs. Contributed by Mayank 
Bansal. (zjshen: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1577052)
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientRedirect.java
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationClientProtocol.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/applicationclient_protocol.proto
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/YarnClientImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/client/ApplicationClientProtocolPBClientImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/service/ApplicationClientProtocolPBServiceImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttempt.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainer.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainerImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java


 ApplicationClientProtocol and ApplicationHistoryProtocol should expose 
 analogous APIs
 -

 Key: YARN-1389
 URL: https://issues.apache.org/jira/browse/YARN-1389
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Mayank Bansal
Assignee: Mayank Bansal
 Fix For: 2.4.0

 Attachments: YARN-1389-1.patch, YARN-1389-2.patch, YARN-1389-3.patch, 
 YARN-1389-4.patch, YARN-1389-5.patch, YARN-1389-6.patch, YARN-1389-7.patch, 
 YARN-1389-8.patch, YARN-1389-9.patch, YARN-1389.10.patch, YARN-1389.11.patch


 As we plan to have the APIs in ApplicationHistoryProtocol to expose the 
 reports of *finished* application attempts and containers, we should do the 
 same for ApplicationClientProtocol, which will return the reports of 
 *running* attempts and containers.
 Later on, we can improve YarnClient to direct the query of running instance 
 to ApplicationClientProtocol, while that of finished instance to 
 ApplicationHistoryProtocol, making it transparent to the users.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-90) NodeManager should identify failed disks becoming good back again

2014-03-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-90?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13932976#comment-13932976
 ] 

Hadoop QA commented on YARN-90:
---

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12634358/apache-yarn-90.2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3344//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3344//console

This message is automatically generated.

 NodeManager should identify failed disks becoming good back again
 -

 Key: YARN-90
 URL: https://issues.apache.org/jira/browse/YARN-90
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Ravi Gummadi
Assignee: Varun Vasudev
 Attachments: YARN-90.1.patch, YARN-90.patch, YARN-90.patch, 
 YARN-90.patch, YARN-90.patch, apache-yarn-90.0.patch, apache-yarn-90.1.patch, 
 apache-yarn-90.2.patch


 MAPREDUCE-3121 makes NodeManager identify disk failures. But once a disk goes 
 down, it is marked as failed forever. To reuse that disk (after it becomes 
 good), NodeManager needs restart. This JIRA is to improve NodeManager to 
 reuse good disks(which could be bad some time back).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1705) Cluster metrics are off after failover

2014-03-13 Thread Rohith (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith updated YARN-1705:
-

Attachment: YARN-1705.1.patch

Hi, 
 I attached patch that handles  
   1. transtion Active-StandBy-Active. Basically, clearing off 
cache(cluster metrics and queue metrics).

I remain with one open point that should cluster metrics take care of recovered 
applications(Finished,Killed and Failed)? :-(

Please give your suggestions.

 Cluster metrics are off after failover
 --

 Key: YARN-1705
 URL: https://issues.apache.org/jira/browse/YARN-1705
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Karthik Kambatla
Assignee: Rohith
 Attachments: YARN-1705.1.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-1829) CapacityScheduler can't schedule job after misconfiguration

2014-03-13 Thread PengZhang (JIRA)
PengZhang created YARN-1829:
---

 Summary: CapacityScheduler can't schedule job after 
misconfiguration
 Key: YARN-1829
 URL: https://issues.apache.org/jira/browse/YARN-1829
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Reporter: PengZhang


CapacityScheduler will validate new configuration to make sure all existing 
queues are still present. But it seems not enough:
1.When we change one queue(name A) from leaf to parent, it will pass validation 
and add it's new child(X) to queues. And later root.reinitialize() will fail 
because of queue type has changed.
2.Then we add new parent queue(name B) with children(X), and change queue(A)'s 
state to STOPPED. This will apply successfully. but job submitted to queue(X) 
can never be scheduled. Because LeafQueue(X) has already been added in phase 1, 
and it's parent points to A which is STOPPED. 

 root   
 /   
A 
queues: root, A


  root  
  /
 A
/
X
reinitialize failed, but X is added to queues
queues: root, A, X

  root 
  / \
 A   B
\
 X
new node X will not replace old one
queues: root, A, X(value is not LeafQueue that in the tree)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1829) CapacityScheduler can't schedule job after misconfiguration

2014-03-13 Thread PengZhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

PengZhang updated YARN-1829:


Description: 
CapacityScheduler will validate new configuration to make sure all existing 
queues are still present. But it seems not enough:
1.When we change one queue(name A) from leaf to parent, it will pass validation 
and add it's new child(X) to queues. And later root.reinitialize() will fail 
because of queue type has changed.
2.Then we add new parent queue(name B) with children(X), and change queue(A)'s 
state to STOPPED. This will apply successfully. but job submitted to queue(X) 
can never be scheduled. Because LeafQueue(X) has already been added in phase 1, 
and it's parent points to A which is STOPPED. 

 root   
 /   
A 
queues: root, A


  root  
  /
 A
/
X
reinitialize failed, but X is added to queues
queues: root, A, X

  root 
  / \
 A   B
nbspnbsp  \
nbspnbsp  X
new node X will not replace old one
queues: root, A, X(value is not LeafQueue that in the tree)

  was:
CapacityScheduler will validate new configuration to make sure all existing 
queues are still present. But it seems not enough:
1.When we change one queue(name A) from leaf to parent, it will pass validation 
and add it's new child(X) to queues. And later root.reinitialize() will fail 
because of queue type has changed.
2.Then we add new parent queue(name B) with children(X), and change queue(A)'s 
state to STOPPED. This will apply successfully. but job submitted to queue(X) 
can never be scheduled. Because LeafQueue(X) has already been added in phase 1, 
and it's parent points to A which is STOPPED. 

 root   
 /   
A 
queues: root, A


  root  
  /
 A
/
X
reinitialize failed, but X is added to queues
queues: root, A, X

  root 
  / \
 A   B
\
 X
new node X will not replace old one
queues: root, A, X(value is not LeafQueue that in the tree)


 CapacityScheduler can't schedule job after misconfiguration
 ---

 Key: YARN-1829
 URL: https://issues.apache.org/jira/browse/YARN-1829
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Reporter: PengZhang

 CapacityScheduler will validate new configuration to make sure all existing 
 queues are still present. But it seems not enough:
 1.When we change one queue(name A) from leaf to parent, it will pass 
 validation and add it's new child(X) to queues. And later root.reinitialize() 
 will fail because of queue type has changed.
 2.Then we add new parent queue(name B) with children(X), and change 
 queue(A)'s state to STOPPED. This will apply successfully. but job submitted 
 to queue(X) can never be scheduled. Because LeafQueue(X) has already been 
 added in phase 1, and it's parent points to A which is STOPPED. 
  root   
  /   
 A 
 queues: root, A
   root  
   /
  A
 /
 X
 reinitialize failed, but X is added to queues
 queues: root, A, X
   root 
   / \
  A   B
 nbspnbsp  \
 nbspnbsp  X
 new node X will not replace old one
 queues: root, A, X(value is not LeafQueue that in the tree)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1829) CapacityScheduler can't schedule job after misconfiguration

2014-03-13 Thread PengZhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

PengZhang updated YARN-1829:


Description: 
CapacityScheduler will validate new configuration to make sure all existing 
queues are still present. But it seems not enough:
1.When we change one queue(name A) from leaf to parent, it will pass validation 
and add it's new child(X) to queues. And later root.reinitialize() will fail 
because of queue type has changed.
2.Then we add new parent queue(name B) with children(X), and change queue(A)'s 
state to STOPPED. This will apply successfully. but job submitted to queue(X) 
can never be scheduled. Because LeafQueue(X) has already been added in phase 1, 
and it's parent points to A which is STOPPED. 

 root   
 /   
A 
queues: root, A


  root  
  /
 A
/
X
reinitialize failed, but X is added to queues
queues: root, A, X

  root 
  / \
 A   B
\
 X
new node X will not replace old one
queues: root, A, X(value is not LeafQueue that in the tree)

  was:
CapacityScheduler will validate new configuration to make sure all existing 
queues are still present. But it seems not enough:
1.When we change one queue(name A) from leaf to parent, it will pass validation 
and add it's new child(X) to queues. And later root.reinitialize() will fail 
because of queue type has changed.
2.Then we add new parent queue(name B) with children(X), and change queue(A)'s 
state to STOPPED. This will apply successfully. but job submitted to queue(X) 
can never be scheduled. Because LeafQueue(X) has already been added in phase 1, 
and it's parent points to A which is STOPPED. 

 root   
 /   
A 
queues: root, A


  root  
  /
 A
/
X
reinitialize failed, but X is added to queues
queues: root, A, X

  root 
  / \
 A   B
nbspnbsp  \
nbspnbsp  X
new node X will not replace old one
queues: root, A, X(value is not LeafQueue that in the tree)


 CapacityScheduler can't schedule job after misconfiguration
 ---

 Key: YARN-1829
 URL: https://issues.apache.org/jira/browse/YARN-1829
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Reporter: PengZhang

 CapacityScheduler will validate new configuration to make sure all existing 
 queues are still present. But it seems not enough:
 1.When we change one queue(name A) from leaf to parent, it will pass 
 validation and add it's new child(X) to queues. And later root.reinitialize() 
 will fail because of queue type has changed.
 2.Then we add new parent queue(name B) with children(X), and change 
 queue(A)'s state to STOPPED. This will apply successfully. but job submitted 
 to queue(X) can never be scheduled. Because LeafQueue(X) has already been 
 added in phase 1, and it's parent points to A which is STOPPED. 
  root   
  /   
 A 
 queues: root, A
   root  
   /
  A
 /
 X
 reinitialize failed, but X is added to queues
 queues: root, A, X
   root 
   / \
  A   B
 \
  X
 new node X will not replace old one
 queues: root, A, X(value is not LeafQueue that in the tree)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1829) CapacityScheduler can't schedule job after misconfiguration

2014-03-13 Thread PengZhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

PengZhang updated YARN-1829:


Description: 
CapacityScheduler will validate new configuration to make sure all existing 
queues are still present. But it seems not enough:
1.When we change one queue(name A) from leaf to parent, it will pass validation 
and add it's new child(X) to queues. And later root.reinitialize() will fail 
because of queue type has changed.
2.Then we add new parent queue(name B) with children(X), and change queue(A)'s 
state to STOPPED. This will apply successfully. but job submitted to queue(X) 
can never be scheduled. Because LeafQueue(X) has already been added in phase 1, 
and it's parent points to A which is STOPPED. 

{code}
 root   
 /   
A 
queues: root, A


  root  
  /
 A
/
X
reinitialize failed, but X is added to queues
queues: root, A, X

  root 
  / \
 A   B
  \
   X
new node X will not replace old one
queues: root, A, X(value is not LeafQueue that in the tree)
{code}

  was:
CapacityScheduler will validate new configuration to make sure all existing 
queues are still present. But it seems not enough:
1.When we change one queue(name A) from leaf to parent, it will pass validation 
and add it's new child(X) to queues. And later root.reinitialize() will fail 
because of queue type has changed.
2.Then we add new parent queue(name B) with children(X), and change queue(A)'s 
state to STOPPED. This will apply successfully. but job submitted to queue(X) 
can never be scheduled. Because LeafQueue(X) has already been added in phase 1, 
and it's parent points to A which is STOPPED. 

 root   
 /   
A 
queues: root, A


  root  
  /
 A
/
X
reinitialize failed, but X is added to queues
queues: root, A, X

  root 
  / \
 A   B
\
 X
new node X will not replace old one
queues: root, A, X(value is not LeafQueue that in the tree)


 CapacityScheduler can't schedule job after misconfiguration
 ---

 Key: YARN-1829
 URL: https://issues.apache.org/jira/browse/YARN-1829
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Reporter: PengZhang

 CapacityScheduler will validate new configuration to make sure all existing 
 queues are still present. But it seems not enough:
 1.When we change one queue(name A) from leaf to parent, it will pass 
 validation and add it's new child(X) to queues. And later root.reinitialize() 
 will fail because of queue type has changed.
 2.Then we add new parent queue(name B) with children(X), and change 
 queue(A)'s state to STOPPED. This will apply successfully. but job submitted 
 to queue(X) can never be scheduled. Because LeafQueue(X) has already been 
 added in phase 1, and it's parent points to A which is STOPPED. 
 {code}
  root   
  /   
 A 
 queues: root, A
   root  
   /
  A
 /
 X
 reinitialize failed, but X is added to queues
 queues: root, A, X
   root 
   / \
  A   B
   \
X
 new node X will not replace old one
 queues: root, A, X(value is not LeafQueue that in the tree)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1824) Make Windows client work with Linux/Unix cluster

2014-03-13 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13933059#comment-13933059
 ] 

Steve Loughran commented on YARN-1824:
--

My main point was : client code should not have to know the value of 
{{yarn.application.classpath}} as it is something that YARN itself knows, and 
which a client can only get wrong

to quote the Distributed Shell client
{code}
   // At some point we should not be required to add 
// the hadoop specific classpaths to the env. 
// It should be provided out of the box. 
// For now setting all required classpaths including
// the classpath to . for the application jar
{code}

If there was an env variable YARN_APPLICATION_LIB which you could use when 
setting up a classpath, most of the pain in setting up a yarn AM classpath 
would be avoided

 Make Windows client work with Linux/Unix cluster
 

 Key: YARN-1824
 URL: https://issues.apache.org/jira/browse/YARN-1824
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Jian He
Assignee: Jian He
 Fix For: 2.4.0

 Attachments: YARN-1824.1.patch, YARN-1824.1.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1542) Add unit test for public resource on viewfs

2014-03-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13933095#comment-13933095
 ] 

Hadoop QA commented on YARN-1542:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12631699/YARN-1542.v03.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common:

  org.apache.hadoop.yarn.util.TestFSDownload

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3345//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3345//console

This message is automatically generated.

 Add unit test for public resource on viewfs
 ---

 Key: YARN-1542
 URL: https://issues.apache.org/jira/browse/YARN-1542
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Reporter: Gera Shegalov
Assignee: Gera Shegalov
 Attachments: YARN-1542.v01.patch, YARN-1542.v02.patch, 
 YARN-1542.v03.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1812) Job stays in PREP state for long time after RM Restarts

2014-03-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13933106#comment-13933106
 ] 

Hudson commented on YARN-1812:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #508 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/508/])
YARN-1812. Fixed ResourceManager to synchrously renew tokens after recovery and 
thus recover app itself synchronously and avoid races with resyncing 
NodeManagers. Contributed by Jian He. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1576843)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/RMHATestBase.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestAppManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRM.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRestart.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestDelegationTokenRenewer.java


 Job stays in PREP state for long time after RM Restarts
 ---

 Key: YARN-1812
 URL: https://issues.apache.org/jira/browse/YARN-1812
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Yesha Vora
Assignee: Jian He
 Fix For: 2.4.0

 Attachments: YARN-1812.1.patch, YARN-1812.2.patch, YARN-1812.3.patch


 Steps followed:
 1) start a sort job with 80 maps and 5 reducers
 2) restart Resource manager when 60 maps and 0 reducers are finished
 3) Wait for job to come out of PREP state.
 The job does not come out of PREP state after 7-8 mins.
 After waiting for 7-8 mins, test kills the job.
 However, Sort job should not take this long time to come out of PREP state



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1816) Succeeded application remains in accepted after RM restart

2014-03-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13933109#comment-13933109
 ] 

Hudson commented on YARN-1816:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #508 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/508/])
YARN-1816. Fixed ResourceManager to get RMApp correctly handle ATTEMPT_FINISHED 
event at ACCEPTED state that can happen after RM restarts. Contributed by Jian 
He. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1576911)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java


 Succeeded application remains in accepted after RM restart
 --

 Key: YARN-1816
 URL: https://issues.apache.org/jira/browse/YARN-1816
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.4.0
Reporter: Arpit Gupta
Assignee: Jian He
 Fix For: 2.4.0

 Attachments: YARN-1816.1.patch


 {code}
 2014-03-10 18:07:31,944|beaver.machine|INFO|Application-Id
 Application-NameApplication-Type  User   Queue
State Final-State Progress 
Tracking-URL
 2014-03-10 18:07:31,945|beaver.machine|INFO|application_1394449508064_0008
 test_mapred_ha_multiple_job_nn-rm-1-min-5-jobs_1394449960-4
 MAPREDUCEhrt_qa defaultACCEPTED   
 SUCCEEDED 100% 
 http://hostname:19888/jobhistory/job/job_1394449508064_0008
 2014-03-10 18:08:02,125|beaver.machine|INFO|RUNNING: /usr/bin/yarn 
 application -list -appStates NEW,NEW_SAVING,SUBMITTED,ACCEPTED,RUNNING
 2014-03-10 18:08:03,198|beaver.machine|INFO|14/03/10 18:08:03 INFO 
 client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
 2014-03-10 18:08:03,238|beaver.machine|INFO|Total number of applications 
 (application-types: [] and states: [NEW, NEW_SAVING, SUBMITTED, ACCEPTED, 
 RUNNING]):1
 2014-03-10 18:08:03,239|beaver.machine|INFO|Application-Id
 Application-NameApplication-Type  User   Queue
State Final-State Progress 
Tracking-URL
 2014-03-10 18:08:03,239|beaver.machine|INFO|application_1394449508064_0008
 test_mapred_ha_multiple_job_nn-rm-1-min-5-jobs_1394449960-4
 MAPREDUCEhrt_qa defaultACCEPTED   
 SUCCEEDED 100% 
 http://hostname:19888/jobhistory/job/job_1394449508064_0008
 2014-03-10 18:08:33,390|beaver.machine|INFO|RUNNING: /usr/bin/yarn 
 application -list -appStates NEW,NEW_SAVING,SUBMITTED,ACCEPTED,RUNNING
 2014-03-10 18:08:34,437|beaver.machine|INFO|14/03/10 18:08:34 INFO 
 client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
 2014-03-10 18:08:34,477|beaver.machine|INFO|Total number of applications 
 (application-types: [] and states: [NEW, NEW_SAVING, SUBMITTED, ACCEPTED, 
 RUNNING]):1
 2014-03-10 18:08:34,477|beaver.machine|INFO|Application-Id
 Application-NameApplication-Type  User   Queue
State Final-State Progress 
Tracking-URL
 2014-03-10 18:08:34,478|beaver.machine|INFO|application_1394449508064_0008
 test_mapred_ha_multiple_job_nn-rm-1-min-5-jobs_1394449960-4
 MAPREDUCEhrt_qa defaultACCEPTED   
 SUCCEEDED 100% 
 http://hostname:19888/jobhistory/job/job_1394449508064_0008
 2014-03-10 18:09:04,628|beaver.machine|INFO|RUNNING: /usr/bin/yarn 
 application -list -appStates NEW,NEW_SAVING,SUBMITTED,ACCEPTED,RUNNING
 2014-03-10 18:09:05,688|beaver.machine|INFO|14/03/10 18:09:05 INFO 
 client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
 2014-03-10 18:09:05,728|beaver.machine|INFO|Total number of applications 
 (application-types: [] and states: [NEW, NEW_SAVING, SUBMITTED, ACCEPTED, 
 RUNNING]):1
 2014-03-10 18:09:05,728|beaver.machine|INFO|Application-Id
 Application-NameApplication-Type  User   Queue
State Final-State Progress 
Tracking-URL
 

[jira] [Commented] (YARN-1789) ApplicationSummary does not escape newlines in the app name

2014-03-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13933103#comment-13933103
 ] 

Hudson commented on YARN-1789:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #508 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/508/])
YARN-1789. ApplicationSummary does not escape newlines in the app name. 
Contributed by Tsuyoshi OZAWA (jlowe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1576960)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestAppManager.java


 ApplicationSummary does not escape newlines in the app name
 ---

 Key: YARN-1789
 URL: https://issues.apache.org/jira/browse/YARN-1789
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Akira AJISAKA
Assignee: Tsuyoshi OZAWA
Priority: Minor
  Labels: newbie
 Fix For: 2.4.0

 Attachments: YARN-1789.1.patch


 YARN-side of MAPREDUCE-5778.
 ApplicationSummary is not escaping newlines in the app name. This can result 
 in an application summary log entry that spans multiple lines when users are 
 expecting one-app-per-line output.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1444) RM crashes when node resource request sent without corresponding off-switch request

2014-03-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13933102#comment-13933102
 ] 

Hudson commented on YARN-1444:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #508 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/508/])
YARN-1444. Fix CapacityScheduler to deal with cases where applications specify 
host/rack requests without off-switch request. Contributed by Wangda Tan. 
(acmurthy: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1576751)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestFifoScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java


 RM crashes when node resource request sent without corresponding off-switch 
 request
 ---

 Key: YARN-1444
 URL: https://issues.apache.org/jira/browse/YARN-1444
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client, resourcemanager
Reporter: Robert Grandl
Assignee: Wangda Tan
Priority: Blocker
 Fix For: 2.4.0

 Attachments: yarn-1444.ver1.patch, yarn-1444.ver2.patch


 I have tried to force reducers to execute on certain nodes. What I did is I 
 changed for reduce tasks, the 
 RMContainerRequestor#addResourceRequest(req.priority, ResourceRequest.ANY, 
 req.capability) to RMContainerRequestor#addResourceRequest(req.priority, 
 HOST_NAME, req.capability). 
 However, this change lead to RM crashes when reducers needs to be assigned 
 with the following exception:
 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type NODE_UPDATE to the scheduler
 java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:841)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:640)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:554)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:695)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:739)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:86)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:549)
 at java.lang.Thread.run(Thread.java:722)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1389) ApplicationClientProtocol and ApplicationHistoryProtocol should expose analogous APIs

2014-03-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13933111#comment-13933111
 ] 

Hudson commented on YARN-1389:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #508 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/508/])
YARN-1389. Made ApplicationClientProtocol and ApplicationHistoryProtocol expose 
analogous getApplication(s)/Attempt(s)/Container(s) APIs. Contributed by Mayank 
Bansal. (zjshen: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1577052)
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientRedirect.java
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationClientProtocol.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/applicationclient_protocol.proto
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/YarnClientImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/client/ApplicationClientProtocolPBClientImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/service/ApplicationClientProtocolPBServiceImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttempt.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainer.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainerImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java


 ApplicationClientProtocol and ApplicationHistoryProtocol should expose 
 analogous APIs
 -

 Key: YARN-1389
 URL: https://issues.apache.org/jira/browse/YARN-1389
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Mayank Bansal
Assignee: Mayank Bansal
 Fix For: 2.4.0

 Attachments: YARN-1389-1.patch, YARN-1389-2.patch, YARN-1389-3.patch, 
 YARN-1389-4.patch, YARN-1389-5.patch, YARN-1389-6.patch, YARN-1389-7.patch, 
 YARN-1389-8.patch, YARN-1389-9.patch, YARN-1389.10.patch, YARN-1389.11.patch


 As we plan to have the APIs in ApplicationHistoryProtocol to expose the 
 reports of *finished* application attempts and containers, we should do the 
 same for ApplicationClientProtocol, which will return the reports of 
 *running* attempts and containers.
 Later on, we can improve YarnClient to direct the query of running instance 
 to ApplicationClientProtocol, while that of finished instance to 
 ApplicationHistoryProtocol, making it transparent to the users.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1816) Succeeded application remains in accepted after RM restart

2014-03-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13933247#comment-13933247
 ] 

Hudson commented on YARN-1816:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1700 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1700/])
YARN-1816. Fixed ResourceManager to get RMApp correctly handle ATTEMPT_FINISHED 
event at ACCEPTED state that can happen after RM restarts. Contributed by Jian 
He. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1576911)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java


 Succeeded application remains in accepted after RM restart
 --

 Key: YARN-1816
 URL: https://issues.apache.org/jira/browse/YARN-1816
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.4.0
Reporter: Arpit Gupta
Assignee: Jian He
 Fix For: 2.4.0

 Attachments: YARN-1816.1.patch


 {code}
 2014-03-10 18:07:31,944|beaver.machine|INFO|Application-Id
 Application-NameApplication-Type  User   Queue
State Final-State Progress 
Tracking-URL
 2014-03-10 18:07:31,945|beaver.machine|INFO|application_1394449508064_0008
 test_mapred_ha_multiple_job_nn-rm-1-min-5-jobs_1394449960-4
 MAPREDUCEhrt_qa defaultACCEPTED   
 SUCCEEDED 100% 
 http://hostname:19888/jobhistory/job/job_1394449508064_0008
 2014-03-10 18:08:02,125|beaver.machine|INFO|RUNNING: /usr/bin/yarn 
 application -list -appStates NEW,NEW_SAVING,SUBMITTED,ACCEPTED,RUNNING
 2014-03-10 18:08:03,198|beaver.machine|INFO|14/03/10 18:08:03 INFO 
 client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
 2014-03-10 18:08:03,238|beaver.machine|INFO|Total number of applications 
 (application-types: [] and states: [NEW, NEW_SAVING, SUBMITTED, ACCEPTED, 
 RUNNING]):1
 2014-03-10 18:08:03,239|beaver.machine|INFO|Application-Id
 Application-NameApplication-Type  User   Queue
State Final-State Progress 
Tracking-URL
 2014-03-10 18:08:03,239|beaver.machine|INFO|application_1394449508064_0008
 test_mapred_ha_multiple_job_nn-rm-1-min-5-jobs_1394449960-4
 MAPREDUCEhrt_qa defaultACCEPTED   
 SUCCEEDED 100% 
 http://hostname:19888/jobhistory/job/job_1394449508064_0008
 2014-03-10 18:08:33,390|beaver.machine|INFO|RUNNING: /usr/bin/yarn 
 application -list -appStates NEW,NEW_SAVING,SUBMITTED,ACCEPTED,RUNNING
 2014-03-10 18:08:34,437|beaver.machine|INFO|14/03/10 18:08:34 INFO 
 client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
 2014-03-10 18:08:34,477|beaver.machine|INFO|Total number of applications 
 (application-types: [] and states: [NEW, NEW_SAVING, SUBMITTED, ACCEPTED, 
 RUNNING]):1
 2014-03-10 18:08:34,477|beaver.machine|INFO|Application-Id
 Application-NameApplication-Type  User   Queue
State Final-State Progress 
Tracking-URL
 2014-03-10 18:08:34,478|beaver.machine|INFO|application_1394449508064_0008
 test_mapred_ha_multiple_job_nn-rm-1-min-5-jobs_1394449960-4
 MAPREDUCEhrt_qa defaultACCEPTED   
 SUCCEEDED 100% 
 http://hostname:19888/jobhistory/job/job_1394449508064_0008
 2014-03-10 18:09:04,628|beaver.machine|INFO|RUNNING: /usr/bin/yarn 
 application -list -appStates NEW,NEW_SAVING,SUBMITTED,ACCEPTED,RUNNING
 2014-03-10 18:09:05,688|beaver.machine|INFO|14/03/10 18:09:05 INFO 
 client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
 2014-03-10 18:09:05,728|beaver.machine|INFO|Total number of applications 
 (application-types: [] and states: [NEW, NEW_SAVING, SUBMITTED, ACCEPTED, 
 RUNNING]):1
 2014-03-10 18:09:05,728|beaver.machine|INFO|Application-Id
 Application-NameApplication-Type  User   Queue
State Final-State Progress 
Tracking-URL
 

[jira] [Commented] (YARN-1789) ApplicationSummary does not escape newlines in the app name

2014-03-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13933241#comment-13933241
 ] 

Hudson commented on YARN-1789:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1700 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1700/])
YARN-1789. ApplicationSummary does not escape newlines in the app name. 
Contributed by Tsuyoshi OZAWA (jlowe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1576960)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestAppManager.java


 ApplicationSummary does not escape newlines in the app name
 ---

 Key: YARN-1789
 URL: https://issues.apache.org/jira/browse/YARN-1789
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Akira AJISAKA
Assignee: Tsuyoshi OZAWA
Priority: Minor
  Labels: newbie
 Fix For: 2.4.0

 Attachments: YARN-1789.1.patch


 YARN-side of MAPREDUCE-5778.
 ApplicationSummary is not escaping newlines in the app name. This can result 
 in an application summary log entry that spans multiple lines when users are 
 expecting one-app-per-line output.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1812) Job stays in PREP state for long time after RM Restarts

2014-03-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13933244#comment-13933244
 ] 

Hudson commented on YARN-1812:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1700 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1700/])
YARN-1812. Fixed ResourceManager to synchrously renew tokens after recovery and 
thus recover app itself synchronously and avoid races with resyncing 
NodeManagers. Contributed by Jian He. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1576843)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/RMHATestBase.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestAppManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRM.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRestart.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestDelegationTokenRenewer.java


 Job stays in PREP state for long time after RM Restarts
 ---

 Key: YARN-1812
 URL: https://issues.apache.org/jira/browse/YARN-1812
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Yesha Vora
Assignee: Jian He
 Fix For: 2.4.0

 Attachments: YARN-1812.1.patch, YARN-1812.2.patch, YARN-1812.3.patch


 Steps followed:
 1) start a sort job with 80 maps and 5 reducers
 2) restart Resource manager when 60 maps and 0 reducers are finished
 3) Wait for job to come out of PREP state.
 The job does not come out of PREP state after 7-8 mins.
 After waiting for 7-8 mins, test kills the job.
 However, Sort job should not take this long time to come out of PREP state



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1444) RM crashes when node resource request sent without corresponding off-switch request

2014-03-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13933240#comment-13933240
 ] 

Hudson commented on YARN-1444:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1700 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1700/])
YARN-1444. Fix CapacityScheduler to deal with cases where applications specify 
host/rack requests without off-switch request. Contributed by Wangda Tan. 
(acmurthy: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1576751)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestFifoScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java


 RM crashes when node resource request sent without corresponding off-switch 
 request
 ---

 Key: YARN-1444
 URL: https://issues.apache.org/jira/browse/YARN-1444
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client, resourcemanager
Reporter: Robert Grandl
Assignee: Wangda Tan
Priority: Blocker
 Fix For: 2.4.0

 Attachments: yarn-1444.ver1.patch, yarn-1444.ver2.patch


 I have tried to force reducers to execute on certain nodes. What I did is I 
 changed for reduce tasks, the 
 RMContainerRequestor#addResourceRequest(req.priority, ResourceRequest.ANY, 
 req.capability) to RMContainerRequestor#addResourceRequest(req.priority, 
 HOST_NAME, req.capability). 
 However, this change lead to RM crashes when reducers needs to be assigned 
 with the following exception:
 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type NODE_UPDATE to the scheduler
 java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:841)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:640)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:554)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:695)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:739)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:86)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:549)
 at java.lang.Thread.run(Thread.java:722)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1591) TestResourceTrackerService fails randomly on trunk

2014-03-13 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-1591:
-

Attachment: YARN-1591.3.patch

Fixed not to throw YarnRuntimeException when InterruptedException is thrown in 
EventDispatcher#handle.
IIUC, throwing YarnRuntimeException in EventDispatcher#handle is not handled in 
AsyncDispatcher and leads needless crash. We should exit the thread gracefully 
in that case.

 TestResourceTrackerService fails randomly on trunk
 --

 Key: YARN-1591
 URL: https://issues.apache.org/jira/browse/YARN-1591
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
 Attachments: YARN-1591.1.patch, YARN-1591.2.patch, YARN-1591.3.patch, 
 YARN-1591.3.patch


 As evidenced by Jenkins at 
 https://issues.apache.org/jira/browse/YARN-1041?focusedCommentId=13868621page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13868621.
 It's failing randomly on trunk on my local box too 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1444) RM crashes when node resource request sent without corresponding off-switch request

2014-03-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13933341#comment-13933341
 ] 

Hudson commented on YARN-1444:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1725 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1725/])
YARN-1444. Fix CapacityScheduler to deal with cases where applications specify 
host/rack requests without off-switch request. Contributed by Wangda Tan. 
(acmurthy: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1576751)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestFifoScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java


 RM crashes when node resource request sent without corresponding off-switch 
 request
 ---

 Key: YARN-1444
 URL: https://issues.apache.org/jira/browse/YARN-1444
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client, resourcemanager
Reporter: Robert Grandl
Assignee: Wangda Tan
Priority: Blocker
 Fix For: 2.4.0

 Attachments: yarn-1444.ver1.patch, yarn-1444.ver2.patch


 I have tried to force reducers to execute on certain nodes. What I did is I 
 changed for reduce tasks, the 
 RMContainerRequestor#addResourceRequest(req.priority, ResourceRequest.ANY, 
 req.capability) to RMContainerRequestor#addResourceRequest(req.priority, 
 HOST_NAME, req.capability). 
 However, this change lead to RM crashes when reducers needs to be assigned 
 with the following exception:
 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type NODE_UPDATE to the scheduler
 java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:841)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:640)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:554)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:695)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:739)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:86)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:549)
 at java.lang.Thread.run(Thread.java:722)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1816) Succeeded application remains in accepted after RM restart

2014-03-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13933348#comment-13933348
 ] 

Hudson commented on YARN-1816:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1725 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1725/])
YARN-1816. Fixed ResourceManager to get RMApp correctly handle ATTEMPT_FINISHED 
event at ACCEPTED state that can happen after RM restarts. Contributed by Jian 
He. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1576911)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java


 Succeeded application remains in accepted after RM restart
 --

 Key: YARN-1816
 URL: https://issues.apache.org/jira/browse/YARN-1816
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.4.0
Reporter: Arpit Gupta
Assignee: Jian He
 Fix For: 2.4.0

 Attachments: YARN-1816.1.patch


 {code}
 2014-03-10 18:07:31,944|beaver.machine|INFO|Application-Id
 Application-NameApplication-Type  User   Queue
State Final-State Progress 
Tracking-URL
 2014-03-10 18:07:31,945|beaver.machine|INFO|application_1394449508064_0008
 test_mapred_ha_multiple_job_nn-rm-1-min-5-jobs_1394449960-4
 MAPREDUCEhrt_qa defaultACCEPTED   
 SUCCEEDED 100% 
 http://hostname:19888/jobhistory/job/job_1394449508064_0008
 2014-03-10 18:08:02,125|beaver.machine|INFO|RUNNING: /usr/bin/yarn 
 application -list -appStates NEW,NEW_SAVING,SUBMITTED,ACCEPTED,RUNNING
 2014-03-10 18:08:03,198|beaver.machine|INFO|14/03/10 18:08:03 INFO 
 client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
 2014-03-10 18:08:03,238|beaver.machine|INFO|Total number of applications 
 (application-types: [] and states: [NEW, NEW_SAVING, SUBMITTED, ACCEPTED, 
 RUNNING]):1
 2014-03-10 18:08:03,239|beaver.machine|INFO|Application-Id
 Application-NameApplication-Type  User   Queue
State Final-State Progress 
Tracking-URL
 2014-03-10 18:08:03,239|beaver.machine|INFO|application_1394449508064_0008
 test_mapred_ha_multiple_job_nn-rm-1-min-5-jobs_1394449960-4
 MAPREDUCEhrt_qa defaultACCEPTED   
 SUCCEEDED 100% 
 http://hostname:19888/jobhistory/job/job_1394449508064_0008
 2014-03-10 18:08:33,390|beaver.machine|INFO|RUNNING: /usr/bin/yarn 
 application -list -appStates NEW,NEW_SAVING,SUBMITTED,ACCEPTED,RUNNING
 2014-03-10 18:08:34,437|beaver.machine|INFO|14/03/10 18:08:34 INFO 
 client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
 2014-03-10 18:08:34,477|beaver.machine|INFO|Total number of applications 
 (application-types: [] and states: [NEW, NEW_SAVING, SUBMITTED, ACCEPTED, 
 RUNNING]):1
 2014-03-10 18:08:34,477|beaver.machine|INFO|Application-Id
 Application-NameApplication-Type  User   Queue
State Final-State Progress 
Tracking-URL
 2014-03-10 18:08:34,478|beaver.machine|INFO|application_1394449508064_0008
 test_mapred_ha_multiple_job_nn-rm-1-min-5-jobs_1394449960-4
 MAPREDUCEhrt_qa defaultACCEPTED   
 SUCCEEDED 100% 
 http://hostname:19888/jobhistory/job/job_1394449508064_0008
 2014-03-10 18:09:04,628|beaver.machine|INFO|RUNNING: /usr/bin/yarn 
 application -list -appStates NEW,NEW_SAVING,SUBMITTED,ACCEPTED,RUNNING
 2014-03-10 18:09:05,688|beaver.machine|INFO|14/03/10 18:09:05 INFO 
 client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
 2014-03-10 18:09:05,728|beaver.machine|INFO|Total number of applications 
 (application-types: [] and states: [NEW, NEW_SAVING, SUBMITTED, ACCEPTED, 
 RUNNING]):1
 2014-03-10 18:09:05,728|beaver.machine|INFO|Application-Id
 Application-NameApplication-Type  User   Queue
State Final-State Progress 

[jira] [Commented] (YARN-1591) TestResourceTrackerService fails randomly on trunk

2014-03-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13933355#comment-13933355
 ] 

Hadoop QA commented on YARN-1591:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12634436/YARN-1591.3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3346//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3346//console

This message is automatically generated.

 TestResourceTrackerService fails randomly on trunk
 --

 Key: YARN-1591
 URL: https://issues.apache.org/jira/browse/YARN-1591
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
 Attachments: YARN-1591.1.patch, YARN-1591.2.patch, YARN-1591.3.patch, 
 YARN-1591.3.patch


 As evidenced by Jenkins at 
 https://issues.apache.org/jira/browse/YARN-1041?focusedCommentId=13868621page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13868621.
 It's failing randomly on trunk on my local box too 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1812) Job stays in PREP state for long time after RM Restarts

2014-03-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13933345#comment-13933345
 ] 

Hudson commented on YARN-1812:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1725 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1725/])
YARN-1812. Fixed ResourceManager to synchrously renew tokens after recovery and 
thus recover app itself synchronously and avoid races with resyncing 
NodeManagers. Contributed by Jian He. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1576843)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/RMHATestBase.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestAppManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRM.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRestart.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestDelegationTokenRenewer.java


 Job stays in PREP state for long time after RM Restarts
 ---

 Key: YARN-1812
 URL: https://issues.apache.org/jira/browse/YARN-1812
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Yesha Vora
Assignee: Jian He
 Fix For: 2.4.0

 Attachments: YARN-1812.1.patch, YARN-1812.2.patch, YARN-1812.3.patch


 Steps followed:
 1) start a sort job with 80 maps and 5 reducers
 2) restart Resource manager when 60 maps and 0 reducers are finished
 3) Wait for job to come out of PREP state.
 The job does not come out of PREP state after 7-8 mins.
 After waiting for 7-8 mins, test kills the job.
 However, Sort job should not take this long time to come out of PREP state



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1789) ApplicationSummary does not escape newlines in the app name

2014-03-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13933342#comment-13933342
 ] 

Hudson commented on YARN-1789:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1725 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1725/])
YARN-1789. ApplicationSummary does not escape newlines in the app name. 
Contributed by Tsuyoshi OZAWA (jlowe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1576960)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestAppManager.java


 ApplicationSummary does not escape newlines in the app name
 ---

 Key: YARN-1789
 URL: https://issues.apache.org/jira/browse/YARN-1789
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Akira AJISAKA
Assignee: Tsuyoshi OZAWA
Priority: Minor
  Labels: newbie
 Fix For: 2.4.0

 Attachments: YARN-1789.1.patch


 YARN-side of MAPREDUCE-5778.
 ApplicationSummary is not escaping newlines in the app name. This can result 
 in an application summary log entry that spans multiple lines when users are 
 expecting one-app-per-line output.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1389) ApplicationClientProtocol and ApplicationHistoryProtocol should expose analogous APIs

2014-03-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13933350#comment-13933350
 ] 

Hudson commented on YARN-1389:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1725 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1725/])
YARN-1389. Made ApplicationClientProtocol and ApplicationHistoryProtocol expose 
analogous getApplication(s)/Attempt(s)/Container(s) APIs. Contributed by Mayank 
Bansal. (zjshen: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1577052)
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientRedirect.java
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationClientProtocol.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/applicationclient_protocol.proto
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/YarnClientImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/client/ApplicationClientProtocolPBClientImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/service/ApplicationClientProtocolPBServiceImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttempt.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainer.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainerImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java


 ApplicationClientProtocol and ApplicationHistoryProtocol should expose 
 analogous APIs
 -

 Key: YARN-1389
 URL: https://issues.apache.org/jira/browse/YARN-1389
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Mayank Bansal
Assignee: Mayank Bansal
 Fix For: 2.4.0

 Attachments: YARN-1389-1.patch, YARN-1389-2.patch, YARN-1389-3.patch, 
 YARN-1389-4.patch, YARN-1389-5.patch, YARN-1389-6.patch, YARN-1389-7.patch, 
 YARN-1389-8.patch, YARN-1389-9.patch, YARN-1389.10.patch, YARN-1389.11.patch


 As we plan to have the APIs in ApplicationHistoryProtocol to expose the 
 reports of *finished* application attempts and containers, we should do the 
 same for ApplicationClientProtocol, which will return the reports of 
 *running* attempts and containers.
 Later on, we can improve YarnClient to direct the query of running instance 
 to ApplicationClientProtocol, while that of finished instance to 
 ApplicationHistoryProtocol, making it transparent to the users.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1512) Enhance CS to decouple scheduling from node heartbeats

2014-03-13 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-1512:


Attachment: YARN-1512.patch

Updated, ready I believe.

 Enhance CS to decouple scheduling from node heartbeats
 --

 Key: YARN-1512
 URL: https://issues.apache.org/jira/browse/YARN-1512
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Arun C Murthy
Assignee: Arun C Murthy
 Attachments: YARN-1512.patch, YARN-1512.patch, YARN-1512.patch


 Enhance CS to decouple scheduling from node heartbeats; a prototype has 
 improved latency significantly.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1512) Enhance CS to decouple scheduling from node heartbeats

2014-03-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13933449#comment-13933449
 ] 

Hadoop QA commented on YARN-1512:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12634446/YARN-1512.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3347//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3347//console

This message is automatically generated.

 Enhance CS to decouple scheduling from node heartbeats
 --

 Key: YARN-1512
 URL: https://issues.apache.org/jira/browse/YARN-1512
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Arun C Murthy
Assignee: Arun C Murthy
 Attachments: YARN-1512.patch, YARN-1512.patch, YARN-1512.patch


 Enhance CS to decouple scheduling from node heartbeats; a prototype has 
 improved latency significantly.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1769) CapacityScheduler: Improve reservations

2014-03-13 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13933478#comment-13933478
 ] 

Jonathan Eagles commented on YARN-1769:
---

Hi, Tom. Can you comment on the findbugs warnings that are introduced as part 
of this patch when you get a chance?

 CapacityScheduler:  Improve reservations
 

 Key: YARN-1769
 URL: https://issues.apache.org/jira/browse/YARN-1769
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler
Affects Versions: 2.3.0
Reporter: Thomas Graves
Assignee: Thomas Graves
 Attachments: YARN-1769.patch, YARN-1769.patch


 Currently the CapacityScheduler uses reservations in order to handle requests 
 for large containers and the fact there might not currently be enough space 
 available on a single host.
 The current algorithm for reservations is to reserve as many containers as 
 currently required and then it will start to reserve more above that after a 
 certain number of re-reservations (currently biased against larger 
 containers).  Anytime it hits the limit of number reserved it stops looking 
 at any other nodes. This results in potentially missing nodes that have 
 enough space to fullfill the request.   
 The other place for improvement is currently reservations count against your 
 queue capacity.  If you have reservations you could hit the various limits 
 which would then stop you from looking further at that node.  
 The above 2 cases can cause an application requesting a larger container to 
 take a long time to gets it resources.  
 We could improve upon both of those by simply continuing to look at incoming 
 nodes to see if we could potentially swap out a reservation for an actual 
 allocation. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1577) Unmanaged AM is broken because of YARN-1493

2014-03-13 Thread Naren Koneru (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13933541#comment-13933541
 ] 

Naren Koneru commented on YARN-1577:


Hi Zhijie, Nice, thanks for letting me know. I will use that in llama and also 
submit a patch for yarn unmanagedamlauncher later today.

regards
Naren 

 Unmanaged AM is broken because of YARN-1493
 ---

 Key: YARN-1577
 URL: https://issues.apache.org/jira/browse/YARN-1577
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.3.0
Reporter: Jian He
Assignee: Naren Koneru
Priority: Blocker

 Today unmanaged AM client is waiting for app state to be Accepted to launch 
 the AM. This is broken since we changed in YARN-1493 to start the attempt 
 after the application is Accepted. We may need to introduce an attempt state 
 report that client can rely on to query the attempt state and choose to 
 launch the unmanaged AM.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-1830) TestRMRestart.testQueueMetricsOnRMRestart failure

2014-03-13 Thread Karthik Kambatla (JIRA)
Karthik Kambatla created YARN-1830:
--

 Summary: TestRMRestart.testQueueMetricsOnRMRestart failure
 Key: YARN-1830
 URL: https://issues.apache.org/jira/browse/YARN-1830
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Karthik Kambatla


TestRMRestart.testQueueMetricsOnRMRestart fails intermittently as follows 
(reported on YARN-1815):

{noformat}
java.lang.AssertionError: expected:37 but was:38
...
at 
org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.assertQueueMetrics(TestRMRestart.java:1728)
at 
org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testQueueMetricsOnRMRestart(TestRMRestart.java:1682)
{noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1815) RM should recover only Managed AMs

2014-03-13 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13933580#comment-13933580
 ] 

Karthik Kambatla commented on YARN-1815:


The tests pass locally. Filed YARN-1830 for TestRMRestart failure and YARN-1591 
covers TestRMRestart failure.

[~vinodkv] - mind taking a look at the updated patch? 

 RM should recover only Managed AMs
 --

 Key: YARN-1815
 URL: https://issues.apache.org/jira/browse/YARN-1815
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Attachments: Unmanaged AM recovery.png, yarn-1815-1.patch, 
 yarn-1815-2.patch, yarn-1815-2.patch


 RM should not recover unmanaged AMs until YARN-1823 is fixed. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1591) TestResourceTrackerService fails randomly on trunk

2014-03-13 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated YARN-1591:


Assignee: Tsuyoshi OZAWA  (was: Vinod Kumar Vavilapalli)

 TestResourceTrackerService fails randomly on trunk
 --

 Key: YARN-1591
 URL: https://issues.apache.org/jira/browse/YARN-1591
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-1591.1.patch, YARN-1591.2.patch, YARN-1591.3.patch, 
 YARN-1591.3.patch


 As evidenced by Jenkins at 
 https://issues.apache.org/jira/browse/YARN-1041?focusedCommentId=13868621page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13868621.
 It's failing randomly on trunk on my local box too 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1830) TestRMRestart.testQueueMetricsOnRMRestart failure

2014-03-13 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13933586#comment-13933586
 ] 

Zhijie Shen commented on YARN-1830:
---

See the same failure reported on YARN-1389

 TestRMRestart.testQueueMetricsOnRMRestart failure
 -

 Key: YARN-1830
 URL: https://issues.apache.org/jira/browse/YARN-1830
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Karthik Kambatla

 TestRMRestart.testQueueMetricsOnRMRestart fails intermittently as follows 
 (reported on YARN-1815):
 {noformat}
 java.lang.AssertionError: expected:37 but was:38
 ...
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.assertQueueMetrics(TestRMRestart.java:1728)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testQueueMetricsOnRMRestart(TestRMRestart.java:1682)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1512) Enhance CS to decouple scheduling from node heartbeats

2014-03-13 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13933582#comment-13933582
 ] 

Arun C Murthy commented on YARN-1512:
-

Looks like YARN-1591 tracks the failure with TestResourceTrackerService.

 Enhance CS to decouple scheduling from node heartbeats
 --

 Key: YARN-1512
 URL: https://issues.apache.org/jira/browse/YARN-1512
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Arun C Murthy
Assignee: Arun C Murthy
 Attachments: YARN-1512.patch, YARN-1512.patch, YARN-1512.patch


 Enhance CS to decouple scheduling from node heartbeats; a prototype has 
 improved latency significantly.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1795) After YARN-713, using FairScheduler can cause an InvalidToken Exception for NMTokens

2014-03-13 Thread Robert Kanter (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kanter updated YARN-1795:


Description: 
Running the Oozie unit tests against a Hadoop build with YARN-713 causes many 
of the tests to be flakey.  Doing some digging, I found that they were failing 
because some of the MR jobs were failing; I found this in the syslog of the 
failed jobs:
{noformat}
2014-03-05 16:18:23,452 INFO [AsyncDispatcher event handler] 
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report 
from attempt_1394064846476_0013_m_00_0: Container launch failed for 
container_1394064846476_0013_01_03 : 
org.apache.hadoop.security.token.SecretManager$InvalidToken: No NMToken sent 
for 192.168.1.77:50759
   at 
org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.newProxy(ContainerManagementProtocolProxy.java:206)
   at 
org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.init(ContainerManagementProtocolProxy.java:196)
   at 
org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy.getProxy(ContainerManagementProtocolProxy.java:117)
   at 
org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.getCMProxy(ContainerLauncherImpl.java:403)
   at 
org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:138)
   at 
org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:369)
   at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:744)
{noformat}

I did some debugging and found that the NMTokenCache has a different port 
number than what's being looked up.  For example, the NMTokenCache had one 
token with address 192.168.1.77:58217 but 
ContainerManagementProtocolProxy.java:119 is looking for 192.168.1.77:58213. 
The 58213 address comes from ContainerLauncherImpl's constructor. So when the 
Container is being launched it somehow has a different port than when the token 
was created.

Any ideas why the port numbers wouldn't match?

Update: This also happens in an actual cluster, not just Oozie's unit tests

  was:
Running the Oozie unit tests against a Hadoop build with YARN-713 causes many 
of the tests to be flakey.  Doing some digging, I found that they were failing 
because some of the MR jobs were failing; I found this in the syslog of the 
failed jobs:
{noformat}
2014-03-05 16:18:23,452 INFO [AsyncDispatcher event handler] 
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report 
from attempt_1394064846476_0013_m_00_0: Container launch failed for 
container_1394064846476_0013_01_03 : 
org.apache.hadoop.security.token.SecretManager$InvalidToken: No NMToken sent 
for 192.168.1.77:50759
   at 
org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.newProxy(ContainerManagementProtocolProxy.java:206)
   at 
org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.init(ContainerManagementProtocolProxy.java:196)
   at 
org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy.getProxy(ContainerManagementProtocolProxy.java:117)
   at 
org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.getCMProxy(ContainerLauncherImpl.java:403)
   at 
org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:138)
   at 
org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:369)
   at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:744)
{noformat}

I did some debugging and found that the NMTokenCache has a different port 
number than what's being looked up.  For example, the NMTokenCache had one 
token with address 192.168.1.77:58217 but 
ContainerManagementProtocolProxy.java:119 is looking for 192.168.1.77:58213. 
The 58213 address comes from ContainerLauncherImpl's constructor. So when the 
Container is being launched it somehow has a different port than when the token 
was created.

Any ideas why the port numbers wouldn't match?

Summary: After YARN-713, using FairScheduler can cause an InvalidToken 
Exception for NMTokens  (was: Oozie tests are flakey after YARN-713)

We've now seen this problem in an actual cluster, not just Oozie's unit tests; 
so this is definitely a problem and not something funny we're 

[jira] [Updated] (YARN-1795) After YARN-713, using FairScheduler can cause an InvalidToken Exception for NMTokens

2014-03-13 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-1795:
---

Priority: Blocker  (was: Critical)

 After YARN-713, using FairScheduler can cause an InvalidToken Exception for 
 NMTokens
 

 Key: YARN-1795
 URL: https://issues.apache.org/jira/browse/YARN-1795
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Robert Kanter
Priority: Blocker
 Attachments: 
 org.apache.oozie.action.hadoop.TestMapReduceActionExecutor-output.txt, syslog


 Running the Oozie unit tests against a Hadoop build with YARN-713 causes many 
 of the tests to be flakey.  Doing some digging, I found that they were 
 failing because some of the MR jobs were failing; I found this in the syslog 
 of the failed jobs:
 {noformat}
 2014-03-05 16:18:23,452 INFO [AsyncDispatcher event handler] 
 org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics 
 report from attempt_1394064846476_0013_m_00_0: Container launch failed 
 for container_1394064846476_0013_01_03 : 
 org.apache.hadoop.security.token.SecretManager$InvalidToken: No NMToken sent 
 for 192.168.1.77:50759
at 
 org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.newProxy(ContainerManagementProtocolProxy.java:206)
at 
 org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy$ContainerManagementProtocolProxyData.init(ContainerManagementProtocolProxy.java:196)
at 
 org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy.getProxy(ContainerManagementProtocolProxy.java:117)
at 
 org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.getCMProxy(ContainerLauncherImpl.java:403)
at 
 org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:138)
at 
 org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:369)
at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
 {noformat}
 I did some debugging and found that the NMTokenCache has a different port 
 number than what's being looked up.  For example, the NMTokenCache had one 
 token with address 192.168.1.77:58217 but 
 ContainerManagementProtocolProxy.java:119 is looking for 192.168.1.77:58213. 
 The 58213 address comes from ContainerLauncherImpl's constructor. So when the 
 Container is being launched it somehow has a different port than when the 
 token was created.
 Any ideas why the port numbers wouldn't match?
 Update: This also happens in an actual cluster, not just Oozie's unit tests



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1811) RM HA: AM link broken if the AM is on nodes other than RM

2014-03-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13933637#comment-13933637
 ] 

Hadoop QA commented on YARN-1811:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12634241/YARN-1811.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy:

  
org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3348//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3348//console

This message is automatically generated.

 RM HA: AM link broken if the AM is on nodes other than RM
 -

 Key: YARN-1811
 URL: https://issues.apache.org/jira/browse/YARN-1811
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Robert Kanter
Assignee: Robert Kanter
 Attachments: YARN-1811.patch, YARN-1811.patch, YARN-1811.patch


 When using RM HA, if you click on the Application Master link in the RM web 
 UI while the job is running, you get an Error 500:



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1815) RM should recover only Managed AMs

2014-03-13 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13933639#comment-13933639
 ] 

Jian He commented on YARN-1815:
---

Thanks Karthik for the patch.
For now, it should be fine to move UMA to Failed state as UMA is not saving the 
final state and RM restart doesn’t support UMA. The core change looks good.

Test case:  we need a more thorough test case to test UMA is moved to Failed 
state after RM restarts using two MockRMs like the ones in TestRMRestart. The 
bigger problem is that if Unmanged application is not added back to the 
completedApps in RMAppManager after RM restart via the FinalTransition, it'll 
never be removed from state store. We remove the applications from state store 
when completedApps in RMAppManager go beyond the max-app-limit.

 RM should recover only Managed AMs
 --

 Key: YARN-1815
 URL: https://issues.apache.org/jira/browse/YARN-1815
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Attachments: Unmanaged AM recovery.png, yarn-1815-1.patch, 
 yarn-1815-2.patch, yarn-1815-2.patch


 RM should not recover unmanaged AMs until YARN-1823 is fixed. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1591) TestResourceTrackerService fails randomly on trunk

2014-03-13 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13933820#comment-13933820
 ] 

Tsuyoshi OZAWA commented on YARN-1591:
--

test failures looks not related. [~jianhe], can you take a look?

 TestResourceTrackerService fails randomly on trunk
 --

 Key: YARN-1591
 URL: https://issues.apache.org/jira/browse/YARN-1591
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-1591.1.patch, YARN-1591.2.patch, YARN-1591.3.patch, 
 YARN-1591.3.patch


 As evidenced by Jenkins at 
 https://issues.apache.org/jira/browse/YARN-1041?focusedCommentId=13868621page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13868621.
 It's failing randomly on trunk on my local box too 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1811) RM HA: AM link broken if the AM is on nodes other than RM

2014-03-13 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13933825#comment-13933825
 ] 

Robert Kanter commented on YARN-1811:
-

TestResourceTrackerService is flakey (and fails without the patch): YARN-1591

 RM HA: AM link broken if the AM is on nodes other than RM
 -

 Key: YARN-1811
 URL: https://issues.apache.org/jira/browse/YARN-1811
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Robert Kanter
Assignee: Robert Kanter
 Attachments: YARN-1811.patch, YARN-1811.patch, YARN-1811.patch


 When using RM HA, if you click on the Application Master link in the RM web 
 UI while the job is running, you get an Error 500:



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1769) CapacityScheduler: Improve reservations

2014-03-13 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13933831#comment-13933831
 ] 

Thomas Graves commented on YARN-1769:
-

The findbugs can be ignored.   They are talking about inconsistent 
synchronization but of the class variable that is sometimes referenced inside 
synchronized functions and sometimes not.  It doesn't matter if that variable 
is accessed synchronized or not.  I'll add it to the excludes files.

 CapacityScheduler:  Improve reservations
 

 Key: YARN-1769
 URL: https://issues.apache.org/jira/browse/YARN-1769
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler
Affects Versions: 2.3.0
Reporter: Thomas Graves
Assignee: Thomas Graves
 Attachments: YARN-1769.patch, YARN-1769.patch


 Currently the CapacityScheduler uses reservations in order to handle requests 
 for large containers and the fact there might not currently be enough space 
 available on a single host.
 The current algorithm for reservations is to reserve as many containers as 
 currently required and then it will start to reserve more above that after a 
 certain number of re-reservations (currently biased against larger 
 containers).  Anytime it hits the limit of number reserved it stops looking 
 at any other nodes. This results in potentially missing nodes that have 
 enough space to fullfill the request.   
 The other place for improvement is currently reservations count against your 
 queue capacity.  If you have reservations you could hit the various limits 
 which would then stop you from looking further at that node.  
 The above 2 cases can cause an application requesting a larger container to 
 take a long time to gets it resources.  
 We could improve upon both of those by simply continuing to look at incoming 
 nodes to see if we could potentially swap out a reservation for an actual 
 allocation. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-1831) Job should be marked as Falied if it is recovered from commit.

2014-03-13 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-1831:


 Summary: Job should be marked as Falied if it is recovered from 
commit.
 Key: YARN-1831
 URL: https://issues.apache.org/jira/browse/YARN-1831
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Yesha Vora


If Resource manager is restarted when a job is in commit state, The job is not 
able to recovered after RM restart and it is marked as Killed. 

The job status should be Failed instead killed. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1769) CapacityScheduler: Improve reservations

2014-03-13 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated YARN-1769:


Attachment: YARN-1769.patch

exclude findbugs warnings.

 CapacityScheduler:  Improve reservations
 

 Key: YARN-1769
 URL: https://issues.apache.org/jira/browse/YARN-1769
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler
Affects Versions: 2.3.0
Reporter: Thomas Graves
Assignee: Thomas Graves
 Attachments: YARN-1769.patch, YARN-1769.patch, YARN-1769.patch


 Currently the CapacityScheduler uses reservations in order to handle requests 
 for large containers and the fact there might not currently be enough space 
 available on a single host.
 The current algorithm for reservations is to reserve as many containers as 
 currently required and then it will start to reserve more above that after a 
 certain number of re-reservations (currently biased against larger 
 containers).  Anytime it hits the limit of number reserved it stops looking 
 at any other nodes. This results in potentially missing nodes that have 
 enough space to fullfill the request.   
 The other place for improvement is currently reservations count against your 
 queue capacity.  If you have reservations you could hit the various limits 
 which would then stop you from looking further at that node.  
 The above 2 cases can cause an application requesting a larger container to 
 take a long time to gets it resources.  
 We could improve upon both of those by simply continuing to look at incoming 
 nodes to see if we could potentially swap out a reservation for an actual 
 allocation. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1769) CapacityScheduler: Improve reservations

2014-03-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13933860#comment-13933860
 ] 

Hadoop QA commented on YARN-1769:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12634508/YARN-1769.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3349//console

This message is automatically generated.

 CapacityScheduler:  Improve reservations
 

 Key: YARN-1769
 URL: https://issues.apache.org/jira/browse/YARN-1769
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler
Affects Versions: 2.3.0
Reporter: Thomas Graves
Assignee: Thomas Graves
 Attachments: YARN-1769.patch, YARN-1769.patch, YARN-1769.patch


 Currently the CapacityScheduler uses reservations in order to handle requests 
 for large containers and the fact there might not currently be enough space 
 available on a single host.
 The current algorithm for reservations is to reserve as many containers as 
 currently required and then it will start to reserve more above that after a 
 certain number of re-reservations (currently biased against larger 
 containers).  Anytime it hits the limit of number reserved it stops looking 
 at any other nodes. This results in potentially missing nodes that have 
 enough space to fullfill the request.   
 The other place for improvement is currently reservations count against your 
 queue capacity.  If you have reservations you could hit the various limits 
 which would then stop you from looking further at that node.  
 The above 2 cases can cause an application requesting a larger container to 
 take a long time to gets it resources.  
 We could improve upon both of those by simply continuing to look at incoming 
 nodes to see if we could potentially swap out a reservation for an actual 
 allocation. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (YARN-1831) Job should be marked as Falied if it is recovered from commit.

2014-03-13 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong reassigned YARN-1831:
---

Assignee: Xuan Gong

 Job should be marked as Falied if it is recovered from commit.
 --

 Key: YARN-1831
 URL: https://issues.apache.org/jira/browse/YARN-1831
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Yesha Vora
Assignee: Xuan Gong

 If Resource manager is restarted when a job is in commit state, The job is 
 not able to recovered after RM restart and it is marked as Killed. 
 The job status should be Failed instead killed. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1831) Job should be marked as Falied if it is recovered from commit.

2014-03-13 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13933869#comment-13933869
 ] 

Xuan Gong commented on YARN-1831:
-

Close this as duplicate

 Job should be marked as Falied if it is recovered from commit.
 --

 Key: YARN-1831
 URL: https://issues.apache.org/jira/browse/YARN-1831
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Yesha Vora
Assignee: Xuan Gong

 If Resource manager is restarted when a job is in commit state, The job is 
 not able to be recovered after RM restart and it is marked as Killed. 
 The job status should be Failed instead killed. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1831) Job should be marked as Falied if it is recovered from commit.

2014-03-13 Thread Yesha Vora (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yesha Vora updated YARN-1831:
-

Description: 
If Resource manager is restarted when a job is in commit state, The job is not 
able to be recovered after RM restart and it is marked as Killed. 

The job status should be Failed instead killed. 

  was:
If Resource manager is restarted when a job is in commit state, The job is not 
able to recovered after RM restart and it is marked as Killed. 

The job status should be Failed instead killed. 


 Job should be marked as Falied if it is recovered from commit.
 --

 Key: YARN-1831
 URL: https://issues.apache.org/jira/browse/YARN-1831
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Yesha Vora
Assignee: Xuan Gong

 If Resource manager is restarted when a job is in commit state, The job is 
 not able to be recovered after RM restart and it is marked as Killed. 
 The job status should be Failed instead killed. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1811) RM HA: AM link broken if the AM is on nodes other than RM

2014-03-13 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13933866#comment-13933866
 ] 

Karthik Kambatla commented on YARN-1811:


Thanks Robert.

Comments:
# WebAppUtils#getProxyHostsAndPortsForAmFilter is a little dense for my liking. 
We should probably add comments for the various ifs and fors :)
# Nit: Okay with not fixing it. I find using Joiner more readable. 
{code}
StringBuilder sb = new StringBuilder();
for (String proxy : proxies) {
  sb.append(proxy.split(:)[0]).append(AmIpFilter.PROXY_HOSTS_DELIMITER);
}
sb.setLength(sb.length() - 1);
{code}
# AmIpFilter has a couple of public fields we are removing. We can leave them 
there for compatibility sake (in theory) and may be deprecate them as well. If 
others involved think okay, we should probably just make AmIpFilter @Private. 
# AmIpFilter#findRedirectUrl - we could use a MapString (host:port), 
proxyUriBase, so we don't need the following for loop.
{code}
  for (String proxyUriBase : proxyUriBases) {
try {
  URL url = new URL(proxyUriBase);
  if (host.equals(url.getHost() + : + url.getPort())) {
addr = proxyUriBase;
break;
  }
} catch(MalformedURLException e) {
  // ignore
}
  }
{code}
# Also, we should at least log the MalformedURLException above and not add to 
the map. 

 RM HA: AM link broken if the AM is on nodes other than RM
 -

 Key: YARN-1811
 URL: https://issues.apache.org/jira/browse/YARN-1811
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Robert Kanter
Assignee: Robert Kanter
 Attachments: YARN-1811.patch, YARN-1811.patch, YARN-1811.patch


 When using RM HA, if you click on the Application Master link in the RM web 
 UI while the job is running, you get an Error 500:



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1831) Job should be marked as Falied if it is recovered from commit.

2014-03-13 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13933867#comment-13933867
 ] 

Xuan Gong commented on YARN-1831:
-

create a MapReduce ticket. Let us start from there: 
https://issues.apache.org/jira/browse/MAPREDUCE-5795

 Job should be marked as Falied if it is recovered from commit.
 --

 Key: YARN-1831
 URL: https://issues.apache.org/jira/browse/YARN-1831
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Yesha Vora
Assignee: Xuan Gong

 If Resource manager is restarted when a job is in commit state, The job is 
 not able to be recovered after RM restart and it is marked as Killed. 
 The job status should be Failed instead killed. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (YARN-1831) Job should be marked as Falied if it is recovered from commit.

2014-03-13 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong resolved YARN-1831.
-

Resolution: Duplicate

 Job should be marked as Falied if it is recovered from commit.
 --

 Key: YARN-1831
 URL: https://issues.apache.org/jira/browse/YARN-1831
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Yesha Vora
Assignee: Xuan Gong

 If Resource manager is restarted when a job is in commit state, The job is 
 not able to be recovered after RM restart and it is marked as Killed. 
 The job status should be Failed instead killed. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1811) RM HA: AM link broken if the AM is on nodes other than RM

2014-03-13 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13933873#comment-13933873
 ] 

Robert Kanter commented on YARN-1811:
-

I'll make those changes and put up a new patch.  I think we should make 
AmIpFilter {{@Private}}; my understanding is that its meant only to be used 
internally by Yarn for the AM anyway.

 RM HA: AM link broken if the AM is on nodes other than RM
 -

 Key: YARN-1811
 URL: https://issues.apache.org/jira/browse/YARN-1811
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Robert Kanter
Assignee: Robert Kanter
 Attachments: YARN-1811.patch, YARN-1811.patch, YARN-1811.patch


 When using RM HA, if you click on the Application Master link in the RM web 
 UI while the job is running, you get an Error 500:



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1769) CapacityScheduler: Improve reservations

2014-03-13 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated YARN-1769:


Attachment: YARN-1769.patch

upmerge patch to latest.

 CapacityScheduler:  Improve reservations
 

 Key: YARN-1769
 URL: https://issues.apache.org/jira/browse/YARN-1769
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler
Affects Versions: 2.3.0
Reporter: Thomas Graves
Assignee: Thomas Graves
 Attachments: YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, 
 YARN-1769.patch


 Currently the CapacityScheduler uses reservations in order to handle requests 
 for large containers and the fact there might not currently be enough space 
 available on a single host.
 The current algorithm for reservations is to reserve as many containers as 
 currently required and then it will start to reserve more above that after a 
 certain number of re-reservations (currently biased against larger 
 containers).  Anytime it hits the limit of number reserved it stops looking 
 at any other nodes. This results in potentially missing nodes that have 
 enough space to fullfill the request.   
 The other place for improvement is currently reservations count against your 
 queue capacity.  If you have reservations you could hit the various limits 
 which would then stop you from looking further at that node.  
 The above 2 cases can cause an application requesting a larger container to 
 take a long time to gets it resources.  
 We could improve upon both of those by simply continuing to look at incoming 
 nodes to see if we could potentially swap out a reservation for an actual 
 allocation. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1815) RM should recover only Managed AMs

2014-03-13 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13933938#comment-13933938
 ] 

Jian He commented on YARN-1815:
---

bq. it should be fine to move UMA to Failed state as UMA is not saving the 
final state
On a second thought, if the UMA just successfully finished, and it will also be 
moved to FAILD state after RM restart? this doesn't seem right.

 RM should recover only Managed AMs
 --

 Key: YARN-1815
 URL: https://issues.apache.org/jira/browse/YARN-1815
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Attachments: Unmanaged AM recovery.png, yarn-1815-1.patch, 
 yarn-1815-2.patch, yarn-1815-2.patch


 RM should not recover unmanaged AMs until YARN-1823 is fixed. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1769) CapacityScheduler: Improve reservations

2014-03-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13933943#comment-13933943
 ] 

Hadoop QA commented on YARN-1769:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12634513/YARN-1769.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified test files.

{color:red}-1 javac{color:red}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3350//console

This message is automatically generated.

 CapacityScheduler:  Improve reservations
 

 Key: YARN-1769
 URL: https://issues.apache.org/jira/browse/YARN-1769
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler
Affects Versions: 2.3.0
Reporter: Thomas Graves
Assignee: Thomas Graves
 Attachments: YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, 
 YARN-1769.patch


 Currently the CapacityScheduler uses reservations in order to handle requests 
 for large containers and the fact there might not currently be enough space 
 available on a single host.
 The current algorithm for reservations is to reserve as many containers as 
 currently required and then it will start to reserve more above that after a 
 certain number of re-reservations (currently biased against larger 
 containers).  Anytime it hits the limit of number reserved it stops looking 
 at any other nodes. This results in potentially missing nodes that have 
 enough space to fullfill the request.   
 The other place for improvement is currently reservations count against your 
 queue capacity.  If you have reservations you could hit the various limits 
 which would then stop you from looking further at that node.  
 The above 2 cases can cause an application requesting a larger container to 
 take a long time to gets it resources.  
 We could improve upon both of those by simply continuing to look at incoming 
 nodes to see if we could potentially swap out a reservation for an actual 
 allocation. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1591) TestResourceTrackerService fails randomly on trunk

2014-03-13 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13934005#comment-13934005
 ] 

Jian He commented on YARN-1591:
---

In the scope of this jira, the reason TestResourceTrackerService is failing 
because testNodeRegistrationWithContainers and 
testNodeRegistrationWithContainers is not stopping RM, causing cluster metrics 
already exists Exception, so stopping those two RMs should be enough ? btw. 
there's already a global variable rm to record the RM and the RM is stopped in 
the tearDown(), we may use that.
 

 TestResourceTrackerService fails randomly on trunk
 --

 Key: YARN-1591
 URL: https://issues.apache.org/jira/browse/YARN-1591
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-1591.1.patch, YARN-1591.2.patch, YARN-1591.3.patch, 
 YARN-1591.3.patch


 As evidenced by Jenkins at 
 https://issues.apache.org/jira/browse/YARN-1041?focusedCommentId=13868621page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13868621.
 It's failing randomly on trunk on my local box too 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1809) Synchronize RM and Generic History Service Web-UIs

2014-03-13 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13934027#comment-13934027
 ] 

Zhijie Shen commented on YARN-1809:
---

AHS web-UI is still not able to show tags due to YARN-1462

 Synchronize RM and Generic History Service Web-UIs
 --

 Key: YARN-1809
 URL: https://issues.apache.org/jira/browse/YARN-1809
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-1809.1.patch


 After YARN-953, the web-UI of generic history service is provide more 
 information than that of RM, the details about app attempt and container. 
 It's good to provide similar web-UIs, but retrieve the data from separate 
 source, i.e., RM cache and history store respectively.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1811) RM HA: AM link broken if the AM is on nodes other than RM

2014-03-13 Thread Robert Kanter (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kanter updated YARN-1811:


Attachment: YARN-1811.patch

New patch addresses Karthik's comments.

 RM HA: AM link broken if the AM is on nodes other than RM
 -

 Key: YARN-1811
 URL: https://issues.apache.org/jira/browse/YARN-1811
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Robert Kanter
Assignee: Robert Kanter
 Attachments: YARN-1811.patch, YARN-1811.patch, YARN-1811.patch, 
 YARN-1811.patch


 When using RM HA, if you click on the Application Master link in the RM web 
 UI while the job is running, you get an Error 500:



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (YARN-1822) Revisit AM link being broken for work preserving restart

2014-03-13 Thread Robert Kanter (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kanter resolved YARN-1822.
-

Resolution: Invalid

YARN-1811 is being done differently, and this is no longer needed

 Revisit AM link being broken for work preserving restart
 

 Key: YARN-1822
 URL: https://issues.apache.org/jira/browse/YARN-1822
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Robert Kanter

 We should revisit the issue in YARN-1811 as it may require changes once we 
 have work-preserving restarts.  
 Currently, the AmIpFilter is given the active RM at AM 
 initialization/startup, so when the RM fails over and the AM is restarted, 
 this gets recalculated properly.  However, with work-preserving restart, this 
 will now point to the inactive RM.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1591) TestResourceTrackerService fails randomly on trunk

2014-03-13 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-1591:
-

Attachment: YARN-1591.5.patch

[~jianhe] oops, I've overlooked rm is defined locally in the test cases. Than k 
you for pointing out. +1 on your idea. Updated to patch to use RM defined in 
class field. 

 TestResourceTrackerService fails randomly on trunk
 --

 Key: YARN-1591
 URL: https://issues.apache.org/jira/browse/YARN-1591
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-1591.1.patch, YARN-1591.2.patch, YARN-1591.3.patch, 
 YARN-1591.3.patch, YARN-1591.5.patch


 As evidenced by Jenkins at 
 https://issues.apache.org/jira/browse/YARN-1041?focusedCommentId=13868621page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13868621.
 It's failing randomly on trunk on my local box too 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1717) Enable offline deletion of entries in leveldb timeline store

2014-03-13 Thread Billie Rinaldi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Billie Rinaldi updated YARN-1717:
-

Attachment: YARN-1717.10.patch

[~zjshen], thanks for the review.  I have implemented your suggestions in the 
attached patch, with the following notes.

bq. 2. Should these aging mechanism related configs have a leveldb section in 
the config name? Because they're only related to the leveldb impl.
I moved ttl-interval-ms to the leveldb section, but kept ttl-ms and ttl-enable 
in the timeline store section since I think those could be useful for all 
stores.

bq. 5. It seems not necessary to refactor getEntity into two methods, doesn't 
it?
Thanks for pointing this out.  I was able to remove a number of changes that 
were only needed for the old deletion strategy.

bq. 7. In discardOldEntities, if one IOException happens, is it good to move on 
with the following discarding operations?
I added a catch for the exception, logged an error, and continued deletions for 
the next entity type.

 Enable offline deletion of entries in leveldb timeline store
 

 Key: YARN-1717
 URL: https://issues.apache.org/jira/browse/YARN-1717
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Billie Rinaldi
Assignee: Billie Rinaldi
 Attachments: YARN-1717.1.patch, YARN-1717.10.patch, 
 YARN-1717.2.patch, YARN-1717.3.patch, YARN-1717.4.patch, YARN-1717.5.patch, 
 YARN-1717.6-extra.patch, YARN-1717.6.patch, YARN-1717.7.patch, 
 YARN-1717.8.patch, YARN-1717.9.patch


 The leveldb timeline store implementation needs the following:
 * better documentation of its internal structures
 * internal changes to enable deleting entities
 ** never overwrite existing primary filter entries
 ** add hidden reverse pointers to related entities



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (YARN-1536) Cleanup: Get rid of ResourceManager#get*SecretManager() methods and use the RMContext methods instead

2014-03-13 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot reassigned YARN-1536:
---

Assignee: Anubhav Dhoot  (was: Karthik Kambatla)

 Cleanup: Get rid of ResourceManager#get*SecretManager() methods and use the 
 RMContext methods instead
 -

 Key: YARN-1536
 URL: https://issues.apache.org/jira/browse/YARN-1536
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Karthik Kambatla
Assignee: Anubhav Dhoot
Priority: Minor
  Labels: newbie

 Both ResourceManager and RMContext have methods to access the secret 
 managers, and it should be safe (cleaner) to get rid of the ResourceManager 
 methods.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1809) Synchronize RM and Generic History Service Web-UIs

2014-03-13 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-1809:
--

Attachment: YARN-1809.2.patch

Upload a new patch with the following changes:

1. Rebase against YARN-1389
2. Do more refactoring on App(s)/Attempt/Container page classes
3. Fix the bugs

I've done some local test for App(s)/Attempt/Container pages of RM webUI, which 
is so far so good, exception some error caused by log URL, which will be 
handled in YARN-1685.

 Synchronize RM and Generic History Service Web-UIs
 --

 Key: YARN-1809
 URL: https://issues.apache.org/jira/browse/YARN-1809
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-1809.1.patch, YARN-1809.2.patch


 After YARN-953, the web-UI of generic history service is provide more 
 information than that of RM, the details about app attempt and container. 
 It's good to provide similar web-UIs, but retrieve the data from separate 
 source, i.e., RM cache and history store respectively.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1811) RM HA: AM link broken if the AM is on nodes other than RM

2014-03-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13934155#comment-13934155
 ] 

Hadoop QA commented on YARN-1811:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12634541/YARN-1811.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy:

  org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart
  
org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3351//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3351//console

This message is automatically generated.

 RM HA: AM link broken if the AM is on nodes other than RM
 -

 Key: YARN-1811
 URL: https://issues.apache.org/jira/browse/YARN-1811
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Robert Kanter
Assignee: Robert Kanter
 Attachments: YARN-1811.patch, YARN-1811.patch, YARN-1811.patch, 
 YARN-1811.patch


 When using RM HA, if you click on the Application Master link in the RM web 
 UI while the job is running, you get an Error 500:



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-808) ApplicationReport does not clearly tell that the attempt is running or not

2014-03-13 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13934165#comment-13934165
 ] 

Zhijie Shen commented on YARN-808:
--

After YARN-1389, we have separate APIs to get the application attempt 
report(s), where we can get the application attempt state. IMHO, we no longer 
need to have additional attempt state in application report. Any idea?

 ApplicationReport does not clearly tell that the attempt is running or not
 --

 Key: YARN-808
 URL: https://issues.apache.org/jira/browse/YARN-808
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.1.0-beta
Reporter: Bikas Saha
Assignee: Xuan Gong
 Attachments: YARN-808.1.patch


 When an app attempt fails and is being retried, ApplicationReport immediately 
 gives the new attemptId and non-null values of host etc. There is no way for 
 clients to know that the attempt is running other than connecting to it and 
 timing out on invalid host. Solution would be to expose the attempt state or 
 return a null value for host instead of N/A



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1717) Enable offline deletion of entries in leveldb timeline store

2014-03-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13934169#comment-13934169
 ] 

Hadoop QA commented on YARN-1717:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12634551/YARN-1717.10.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3353//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3353//console

This message is automatically generated.

 Enable offline deletion of entries in leveldb timeline store
 

 Key: YARN-1717
 URL: https://issues.apache.org/jira/browse/YARN-1717
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Billie Rinaldi
Assignee: Billie Rinaldi
 Attachments: YARN-1717.1.patch, YARN-1717.10.patch, 
 YARN-1717.2.patch, YARN-1717.3.patch, YARN-1717.4.patch, YARN-1717.5.patch, 
 YARN-1717.6-extra.patch, YARN-1717.6.patch, YARN-1717.7.patch, 
 YARN-1717.8.patch, YARN-1717.9.patch


 The leveldb timeline store implementation needs the following:
 * better documentation of its internal structures
 * internal changes to enable deleting entities
 ** never overwrite existing primary filter entries
 ** add hidden reverse pointers to related entities



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests

2014-03-13 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13934174#comment-13934174
 ] 

Arun C Murthy commented on YARN-796:


Back to this, some thoughts:

* Admin interface
** Labels are specified by admins (node configuration, dynamic add/remove via 
rmadmin).
** Each scheduler (CS, FS) can pick how they want labels specified in their 
configs
** Dynamically added labels are, initially, not persisted across RM restarts. 
So, these need to be manually edited into capacity-scheduler.xml etc.
** By default, all nodes have a *default* label, but admins can explicitly set 
a list of labels and drop the *default* label.
** Queues have label ACLs i.e. admins can specify, per queue, what labels can 
be used by applications per queue

* End-user interface
** Applications can ask for containers on nodes with specific labels as part of 
the RR; however, host-specific RRs with labels are illegal i.e. labels are 
allowed only for  rack  * RRs: results in InvalidResourceRequestException
** RR with a non-existent label (point in time) is illegal: results in 
InvalidResourceRequestException
** RR with label without appropriate ACL results in 
InvalidResourceRequestException (do we want a special 
InvalidResourceRequestACLException?)
** Initially, RRs can ask for multiple labels with the expectation that it's an 
AND operation




 Allow for (admin) labels on nodes and resource-requests
 ---

 Key: YARN-796
 URL: https://issues.apache.org/jira/browse/YARN-796
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Arun C Murthy
Assignee: Arun C Murthy

 It will be useful for admins to specify labels for nodes. Examples of labels 
 are OS, processor architecture etc.
 We should expose these labels and allow applications to specify labels on 
 resource-requests.
 Obviously we need to support admin operations on adding/removing node labels.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1591) TestResourceTrackerService fails randomly on trunk

2014-03-13 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13934176#comment-13934176
 ] 

Jian He commented on YARN-1591:
---

Thanks for the patch [~ozawa] ! 
Patch looks good, +1

 TestResourceTrackerService fails randomly on trunk
 --

 Key: YARN-1591
 URL: https://issues.apache.org/jira/browse/YARN-1591
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-1591.1.patch, YARN-1591.2.patch, YARN-1591.3.patch, 
 YARN-1591.3.patch, YARN-1591.5.patch


 As evidenced by Jenkins at 
 https://issues.apache.org/jira/browse/YARN-1041?focusedCommentId=13868621page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13868621.
 It's failing randomly on trunk on my local box too 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (YARN-796) Allow for (admin) labels on nodes and resource-requests

2014-03-13 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13934174#comment-13934174
 ] 

Arun C Murthy edited comment on YARN-796 at 3/13/14 10:00 PM:
--

Back to this, some thoughts:

* Admin interface
** Labels are specified by admins (node configuration, dynamic add/remove via 
rmadmin).
** Each scheduler (CS, FS) can pick how they want labels specified in their 
configs
** Dynamically added labels are, initially, not persisted across RM restarts. 
So, these need to be manually edited into yarn-site.xml, ACLs into 
capacity-scheduler.xml etc.
** By default, all nodes have a *default* label, but admins can explicitly set 
a list of labels and drop the *default* label.
** Queues have label ACLs i.e. admins can specify, per queue, what labels can 
be used by applications per queue

* End-user interface
** Applications can ask for containers on nodes with specific labels as part of 
the RR; however, host-specific RRs with labels are illegal i.e. labels are 
allowed only for  rack  * RRs: results in InvalidResourceRequestException
** RR with a non-existent label (point in time) is illegal: results in 
InvalidResourceRequestException
** RR with label without appropriate ACL results in 
InvalidResourceRequestException (do we want a special 
InvalidResourceRequestACLException?)
** Initially, RRs can ask for multiple labels with the expectation that it's an 
AND operation





was (Author: acmurthy):
Back to this, some thoughts:

* Admin interface
** Labels are specified by admins (node configuration, dynamic add/remove via 
rmadmin).
** Each scheduler (CS, FS) can pick how they want labels specified in their 
configs
** Dynamically added labels are, initially, not persisted across RM restarts. 
So, these need to be manually edited into capacity-scheduler.xml etc.
** By default, all nodes have a *default* label, but admins can explicitly set 
a list of labels and drop the *default* label.
** Queues have label ACLs i.e. admins can specify, per queue, what labels can 
be used by applications per queue

* End-user interface
** Applications can ask for containers on nodes with specific labels as part of 
the RR; however, host-specific RRs with labels are illegal i.e. labels are 
allowed only for  rack  * RRs: results in InvalidResourceRequestException
** RR with a non-existent label (point in time) is illegal: results in 
InvalidResourceRequestException
** RR with label without appropriate ACL results in 
InvalidResourceRequestException (do we want a special 
InvalidResourceRequestACLException?)
** Initially, RRs can ask for multiple labels with the expectation that it's an 
AND operation




 Allow for (admin) labels on nodes and resource-requests
 ---

 Key: YARN-796
 URL: https://issues.apache.org/jira/browse/YARN-796
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Arun C Murthy
Assignee: Arun C Murthy

 It will be useful for admins to specify labels for nodes. Examples of labels 
 are OS, processor architecture etc.
 We should expose these labels and allow applications to specify labels on 
 resource-requests.
 Obviously we need to support admin operations on adding/removing node labels.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1811) RM HA: AM link broken if the AM is on nodes other than RM

2014-03-13 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13934182#comment-13934182
 ] 

Karthik Kambatla commented on YARN-1811:


Changes look good to me. I ll defer the @Private on AmIpFilter to Vinod.

[~vinodkv] - can you take a look at the latest patch from Robert? 

 RM HA: AM link broken if the AM is on nodes other than RM
 -

 Key: YARN-1811
 URL: https://issues.apache.org/jira/browse/YARN-1811
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Robert Kanter
Assignee: Robert Kanter
 Attachments: YARN-1811.patch, YARN-1811.patch, YARN-1811.patch, 
 YARN-1811.patch


 When using RM HA, if you click on the Application Master link in the RM web 
 UI while the job is running, you get an Error 500:



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1811) RM HA: AM link broken if the AM is on nodes other than RM

2014-03-13 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13934181#comment-13934181
 ] 

Robert Kanter commented on YARN-1811:
-

Both failures look untreated and already have JIRAs: TestResourceTrackerService 
(YARN-1591) and TestRMRestart (YARN-1830)

 RM HA: AM link broken if the AM is on nodes other than RM
 -

 Key: YARN-1811
 URL: https://issues.apache.org/jira/browse/YARN-1811
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Robert Kanter
Assignee: Robert Kanter
 Attachments: YARN-1811.patch, YARN-1811.patch, YARN-1811.patch, 
 YARN-1811.patch


 When using RM HA, if you click on the Application Master link in the RM web 
 UI while the job is running, you get an Error 500:



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests

2014-03-13 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13934189#comment-13934189
 ] 

Sandy Ryza commented on YARN-796:
-

Makes a lot of sense to me.  One nit:

bq. Each scheduler (CS, FS) can pick how they want labels specified in their 
configs
Correct me if I'm misunderstanding what you mean here, but currently neither 
scheduler has node-specific stuff in its configuration.  Updating the scheduler 
config when a node is added or removed from the cluster seems cumbersome.  
Should labels not be included in the NodeManager configuration like Resources 
are?

 Allow for (admin) labels on nodes and resource-requests
 ---

 Key: YARN-796
 URL: https://issues.apache.org/jira/browse/YARN-796
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Arun C Murthy
Assignee: Arun C Murthy

 It will be useful for admins to specify labels for nodes. Examples of labels 
 are OS, processor architecture etc.
 We should expose these labels and allow applications to specify labels on 
 resource-requests.
 Obviously we need to support admin operations on adding/removing node labels.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1591) TestResourceTrackerService fails randomly on trunk

2014-03-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13934200#comment-13934200
 ] 

Hadoop QA commented on YARN-1591:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12634550/YARN-1591.5.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The following test timeouts occurred in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3352//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3352//console

This message is automatically generated.

 TestResourceTrackerService fails randomly on trunk
 --

 Key: YARN-1591
 URL: https://issues.apache.org/jira/browse/YARN-1591
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-1591.1.patch, YARN-1591.2.patch, YARN-1591.3.patch, 
 YARN-1591.3.patch, YARN-1591.5.patch


 As evidenced by Jenkins at 
 https://issues.apache.org/jira/browse/YARN-1041?focusedCommentId=13868621page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13868621.
 It's failing randomly on trunk on my local box too 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1536) Cleanup: Get rid of ResourceManager#get*SecretManager() methods and use the RMContext methods instead

2014-03-13 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-1536:


Attachment: yarn-1536.patch

 Cleanup: Get rid of ResourceManager#get*SecretManager() methods and use the 
 RMContext methods instead
 -

 Key: YARN-1536
 URL: https://issues.apache.org/jira/browse/YARN-1536
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Karthik Kambatla
Assignee: Anubhav Dhoot
Priority: Minor
  Labels: newbie
 Attachments: yarn-1536.patch


 Both ResourceManager and RMContext have methods to access the secret 
 managers, and it should be safe (cleaner) to get rid of the ResourceManager 
 methods.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests

2014-03-13 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13934246#comment-13934246
 ] 

Arun C Murthy commented on YARN-796:


[~sandyr] - Sorry, if it wasn't clear. I meant the ACLs for labels should be 
specified in each scheduler. 

So, for e.g.:

{noformat}
  property
nameyarn.scheduler.capacity.root.A.labels/name
valuelabelA, labelX/value
  /property

  property
nameyarn.scheduler.capacity.root.B.labels/name
valuelabelB, labelY/value
  /property
{noformat}

Makes sense?

 Allow for (admin) labels on nodes and resource-requests
 ---

 Key: YARN-796
 URL: https://issues.apache.org/jira/browse/YARN-796
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Arun C Murthy
Assignee: Arun C Murthy

 It will be useful for admins to specify labels for nodes. Examples of labels 
 are OS, processor architecture etc.
 We should expose these labels and allow applications to specify labels on 
 resource-requests.
 Obviously we need to support admin operations on adding/removing node labels.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1809) Synchronize RM and Generic History Service Web-UIs

2014-03-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13934247#comment-13934247
 ] 

Hadoop QA commented on YARN-1809:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12634560/YARN-1809.2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart
  
org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3354//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/3354//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-common.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3354//console

This message is automatically generated.

 Synchronize RM and Generic History Service Web-UIs
 --

 Key: YARN-1809
 URL: https://issues.apache.org/jira/browse/YARN-1809
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-1809.1.patch, YARN-1809.2.patch


 After YARN-953, the web-UI of generic history service is provide more 
 information than that of RM, the details about app attempt and container. 
 It's good to provide similar web-UIs, but retrieve the data from separate 
 source, i.e., RM cache and history store respectively.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1591) TestResourceTrackerService fails randomly on trunk

2014-03-13 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13934245#comment-13934245
 ] 

Tsuyoshi OZAWA commented on YARN-1591:
--

[~jianhe], what do you think about the timeout? I've never found the timeout 
locally. Should timeout value of testGetNextHeartBeatInterval be larger?

 TestResourceTrackerService fails randomly on trunk
 --

 Key: YARN-1591
 URL: https://issues.apache.org/jira/browse/YARN-1591
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-1591.1.patch, YARN-1591.2.patch, YARN-1591.3.patch, 
 YARN-1591.3.patch, YARN-1591.5.patch


 As evidenced by Jenkins at 
 https://issues.apache.org/jira/browse/YARN-1041?focusedCommentId=13868621page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13868621.
 It's failing randomly on trunk on my local box too 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1809) Synchronize RM and Generic History Service Web-UIs

2014-03-13 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-1809:
--

Attachment: YARN-1809.3.patch

Fix the findbugs

 Synchronize RM and Generic History Service Web-UIs
 --

 Key: YARN-1809
 URL: https://issues.apache.org/jira/browse/YARN-1809
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-1809.1.patch, YARN-1809.2.patch, YARN-1809.3.patch


 After YARN-953, the web-UI of generic history service is provide more 
 information than that of RM, the details about app attempt and container. 
 It's good to provide similar web-UIs, but retrieve the data from separate 
 source, i.e., RM cache and history store respectively.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1717) Enable offline deletion of entries in leveldb timeline store

2014-03-13 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13934288#comment-13934288
 ] 

Zhijie Shen commented on YARN-1717:
---

Some minor things on the patch:
1. Rename the class to EntityDiscardThread or something?
{code}
+  private class DeletionThread extends Thread {
{code}

2. Have a warn level log here?
{code}
+  } catch (InterruptedException ignored) {
+  }
{code}

Another arguable issue is: it is possible that the entity is expired according 
to its TS, while part of its events is still in TTL. We do deletion according 
to entity's TS and at the entity's granularity, thus, the events that are still 
alive are likely to be deleted as well.

 Enable offline deletion of entries in leveldb timeline store
 

 Key: YARN-1717
 URL: https://issues.apache.org/jira/browse/YARN-1717
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Billie Rinaldi
Assignee: Billie Rinaldi
 Attachments: YARN-1717.1.patch, YARN-1717.10.patch, 
 YARN-1717.2.patch, YARN-1717.3.patch, YARN-1717.4.patch, YARN-1717.5.patch, 
 YARN-1717.6-extra.patch, YARN-1717.6.patch, YARN-1717.7.patch, 
 YARN-1717.8.patch, YARN-1717.9.patch


 The leveldb timeline store implementation needs the following:
 * better documentation of its internal structures
 * internal changes to enable deleting entities
 ** never overwrite existing primary filter entries
 ** add hidden reverse pointers to related entities



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests

2014-03-13 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13934293#comment-13934293
 ] 

Alejandro Abdelnur commented on YARN-796:
-

Arun, doing a recap on the config, is this what you mean?

ResourceManager {{yarn-site.xml}} would specify the valid labels systemwide 
(you didn't suggest this, but it prevent label typos going unnoticed):

{code}
property
  nameyarn.resourcemanager.valid-labels/name
  valuelabelA, labelB, labelX/value
/properties
{code}

NodeManagers yarn-site.xml would specify the labels of the node:

{code}
property
  nameyarn.nodemanager.labels/name
  valuelabelA, labelX/value
/properties
{code}

Scheduler configuration, in its queue configuration would specify what labels 
can be used when requesting allocations in that queue:

{code}
property
  nameyarn.scheduler.capacity.root.A.allowed-labels/name
  valuelabelA/value
/properties
{code}


 Allow for (admin) labels on nodes and resource-requests
 ---

 Key: YARN-796
 URL: https://issues.apache.org/jira/browse/YARN-796
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Arun C Murthy
Assignee: Arun C Murthy

 It will be useful for admins to specify labels for nodes. Examples of labels 
 are OS, processor architecture etc.
 We should expose these labels and allow applications to specify labels on 
 resource-requests.
 Obviously we need to support admin operations on adding/removing node labels.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1591) TestResourceTrackerService fails randomly on trunk

2014-03-13 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13934304#comment-13934304
 ] 

Jian He commented on YARN-1591:
---

Seems one more issue here... somehow the TestResourceTrackerService test suit 
crashes randomly.  I think you found a good clue earlier 
bq. I found a test failure by an uncaught exception after running lots tests. 


 TestResourceTrackerService fails randomly on trunk
 --

 Key: YARN-1591
 URL: https://issues.apache.org/jira/browse/YARN-1591
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-1591.1.patch, YARN-1591.2.patch, YARN-1591.3.patch, 
 YARN-1591.3.patch, YARN-1591.5.patch


 As evidenced by Jenkins at 
 https://issues.apache.org/jira/browse/YARN-1041?focusedCommentId=13868621page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13868621.
 It's failing randomly on trunk on my local box too 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1824) Make Windows client work with Linux/Unix cluster

2014-03-13 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13934330#comment-13934330
 ] 

Vinod Kumar Vavilapalli commented on YARN-1824:
---

Also call DEFAULT_YARN_APPLICATION_CLASSPATH_CROSS_ENV to be 
DEFAULT_YARN_CROSS_PLATFORM_APPLICATION_CLASSPATH ?

 Make Windows client work with Linux/Unix cluster
 

 Key: YARN-1824
 URL: https://issues.apache.org/jira/browse/YARN-1824
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-1824.1.patch, YARN-1824.1.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1771) many getFileStatus calls made from node manager for localizing a public distributed cache resource

2014-03-13 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated YARN-1771:


Issue Type: Improvement  (was: Bug)

 many getFileStatus calls made from node manager for localizing a public 
 distributed cache resource
 --

 Key: YARN-1771
 URL: https://issues.apache.org/jira/browse/YARN-1771
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.3.0
Reporter: Sangjin Lee
Assignee: Sangjin Lee
Priority: Critical
 Attachments: yarn-1771.patch, yarn-1771.patch, yarn-1771.patch, 
 yarn-1771.patch


 We're observing that the getFileStatus calls are putting a fair amount of 
 load on the name node as part of checking the public-ness for localizing a 
 resource that belong in the public cache.
 We see 7 getFileStatus calls made for each of these resource. We should look 
 into reducing the number of calls to the name node. One example:
 {noformat}
 2014-02-27 18:07:27,351 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
 2014-02-27 18:07:27,352 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
 2014-02-27 18:07:27,352 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724/tmp883330348 ...
 2014-02-27 18:07:27,353 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724 ...
 2014-02-27 18:07:27,353 INFO audit: ... cmd=getfileinfo   src=/tmp ...
 2014-02-27 18:07:27,354 INFO audit: ... cmd=getfileinfo   src=/...
 2014-02-27 18:07:27,354 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
 2014-02-27 18:07:27,355 INFO audit: ... cmd=open  
 src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1658) Webservice should redirect to active RM when HA is enabled.

2014-03-13 Thread Cindy Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cindy Li updated YARN-1658:
---

Attachment: YARN1658.2.patch

Thanks Vinod for the comment. I've changed it to reduce overriding to only to 
get the filter class. Upload the latest patch. Tested on a secure cluster.

 Webservice should redirect to active RM when HA is enabled.
 ---

 Key: YARN-1658
 URL: https://issues.apache.org/jira/browse/YARN-1658
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Cindy Li
Assignee: Cindy Li
  Labels: YARN
 Attachments: YARN1658.1.patch, YARN1658.2.patch, YARN1658.patch


 When HA is enabled, web service to standby RM should be redirected to the 
 active RM. This is a related Jira to YARN-1525.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1658) Webservice should redirect to active RM when HA is enabled.

2014-03-13 Thread Cindy Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cindy Li updated YARN-1658:
---

Attachment: YARN1658.3.patch

 Webservice should redirect to active RM when HA is enabled.
 ---

 Key: YARN-1658
 URL: https://issues.apache.org/jira/browse/YARN-1658
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Cindy Li
Assignee: Cindy Li
  Labels: YARN
 Attachments: YARN1658.1.patch, YARN1658.2.patch, YARN1658.3.patch, 
 YARN1658.patch


 When HA is enabled, web service to standby RM should be redirected to the 
 active RM. This is a related Jira to YARN-1525.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1536) Cleanup: Get rid of ResourceManager#get*SecretManager() methods and use the RMContext methods instead

2014-03-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13934363#comment-13934363
 ] 

Hadoop QA commented on YARN-1536:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12634571/yarn-1536.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 14 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests:

  org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart

  The following test timeouts occurred in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests:

org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3355//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3355//console

This message is automatically generated.

 Cleanup: Get rid of ResourceManager#get*SecretManager() methods and use the 
 RMContext methods instead
 -

 Key: YARN-1536
 URL: https://issues.apache.org/jira/browse/YARN-1536
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.3.0
Reporter: Karthik Kambatla
Assignee: Anubhav Dhoot
Priority: Minor
  Labels: newbie
 Attachments: yarn-1536.patch


 Both ResourceManager and RMContext have methods to access the secret 
 managers, and it should be safe (cleaner) to get rid of the ResourceManager 
 methods.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1809) Synchronize RM and Generic History Service Web-UIs

2014-03-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13934368#comment-13934368
 ] 

Hadoop QA commented on YARN-1809:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12634576/YARN-1809.3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService
  org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3356//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3356//console

This message is automatically generated.

 Synchronize RM and Generic History Service Web-UIs
 --

 Key: YARN-1809
 URL: https://issues.apache.org/jira/browse/YARN-1809
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-1809.1.patch, YARN-1809.2.patch, YARN-1809.3.patch


 After YARN-953, the web-UI of generic history service is provide more 
 information than that of RM, the details about app attempt and container. 
 It's good to provide similar web-UIs, but retrieve the data from separate 
 source, i.e., RM cache and history store respectively.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1591) TestResourceTrackerService fails randomly on trunk

2014-03-13 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-1591:
-

Attachment: YARN-1591.6.patch

This patch works well on local.

 TestResourceTrackerService fails randomly on trunk
 --

 Key: YARN-1591
 URL: https://issues.apache.org/jira/browse/YARN-1591
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-1591.1.patch, YARN-1591.2.patch, YARN-1591.3.patch, 
 YARN-1591.3.patch, YARN-1591.5.patch, YARN-1591.6.patch


 As evidenced by Jenkins at 
 https://issues.apache.org/jira/browse/YARN-1041?focusedCommentId=13868621page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13868621.
 It's failing randomly on trunk on my local box too 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1771) many getFileStatus calls made from node manager for localizing a public distributed cache resource

2014-03-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13934396#comment-13934396
 ] 

Hudson commented on YARN-1771:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5325 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5325/])
YARN-1771. Reduce the number of NameNode operations during localization of
public resources using a cache. Contributed by Sangjin Lee (cdouglas: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1577391)
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileUtil.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/test/java/org/apache/hadoop/mapred/TestLocalDistributedCacheManager.java
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/FSDownload.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestFSDownload.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalizerContext.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java


 many getFileStatus calls made from node manager for localizing a public 
 distributed cache resource
 --

 Key: YARN-1771
 URL: https://issues.apache.org/jira/browse/YARN-1771
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.3.0
Reporter: Sangjin Lee
Assignee: Sangjin Lee
Priority: Critical
 Fix For: 3.0.0, 2.4.0

 Attachments: yarn-1771.patch, yarn-1771.patch, yarn-1771.patch, 
 yarn-1771.patch


 We're observing that the getFileStatus calls are putting a fair amount of 
 load on the name node as part of checking the public-ness for localizing a 
 resource that belong in the public cache.
 We see 7 getFileStatus calls made for each of these resource. We should look 
 into reducing the number of calls to the name node. One example:
 {noformat}
 2014-02-27 18:07:27,351 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
 2014-02-27 18:07:27,352 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
 2014-02-27 18:07:27,352 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724/tmp883330348 ...
 2014-02-27 18:07:27,353 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724 ...
 2014-02-27 18:07:27,353 INFO audit: ... cmd=getfileinfo   src=/tmp ...
 2014-02-27 18:07:27,354 INFO audit: ... cmd=getfileinfo   src=/...
 2014-02-27 18:07:27,354 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
 2014-02-27 18:07:27,355 INFO audit: ... cmd=open  
 src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1771) many getFileStatus calls made from node manager for localizing a public distributed cache resource

2014-03-13 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13934426#comment-13934426
 ] 

Sangjin Lee commented on YARN-1771:
---

Thanks Chris! It would be great if you could commit this to branch-2.4 too...

 many getFileStatus calls made from node manager for localizing a public 
 distributed cache resource
 --

 Key: YARN-1771
 URL: https://issues.apache.org/jira/browse/YARN-1771
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.3.0
Reporter: Sangjin Lee
Assignee: Sangjin Lee
Priority: Critical
 Fix For: 3.0.0, 2.4.0

 Attachments: yarn-1771.patch, yarn-1771.patch, yarn-1771.patch, 
 yarn-1771.patch


 We're observing that the getFileStatus calls are putting a fair amount of 
 load on the name node as part of checking the public-ness for localizing a 
 resource that belong in the public cache.
 We see 7 getFileStatus calls made for each of these resource. We should look 
 into reducing the number of calls to the name node. One example:
 {noformat}
 2014-02-27 18:07:27,351 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
 2014-02-27 18:07:27,352 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
 2014-02-27 18:07:27,352 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724/tmp883330348 ...
 2014-02-27 18:07:27,353 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724 ...
 2014-02-27 18:07:27,353 INFO audit: ... cmd=getfileinfo   src=/tmp ...
 2014-02-27 18:07:27,354 INFO audit: ... cmd=getfileinfo   src=/...
 2014-02-27 18:07:27,354 INFO audit: ... cmd=getfileinfo   
 src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
 2014-02-27 18:07:27,355 INFO audit: ... cmd=open  
 src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1591) TestResourceTrackerService fails randomly on trunk

2014-03-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1393#comment-1393
 ] 

Hadoop QA commented on YARN-1591:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12634600/YARN-1591.6.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3357//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3357//console

This message is automatically generated.

 TestResourceTrackerService fails randomly on trunk
 --

 Key: YARN-1591
 URL: https://issues.apache.org/jira/browse/YARN-1591
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-1591.1.patch, YARN-1591.2.patch, YARN-1591.3.patch, 
 YARN-1591.3.patch, YARN-1591.5.patch, YARN-1591.6.patch


 As evidenced by Jenkins at 
 https://issues.apache.org/jira/browse/YARN-1041?focusedCommentId=13868621page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13868621.
 It's failing randomly on trunk on my local box too 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1658) Webservice should redirect to active RM when HA is enabled.

2014-03-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13934447#comment-13934447
 ] 

Hadoop QA commented on YARN-1658:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12634598/YARN1658.3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService
  org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart

  The following test timeouts occurred in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

org.apache.hadoop.yarn.client.api.impl.TestNMClient

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3358//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3358//console

This message is automatically generated.

 Webservice should redirect to active RM when HA is enabled.
 ---

 Key: YARN-1658
 URL: https://issues.apache.org/jira/browse/YARN-1658
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Cindy Li
Assignee: Cindy Li
  Labels: YARN
 Attachments: YARN1658.1.patch, YARN1658.2.patch, YARN1658.3.patch, 
 YARN1658.patch


 When HA is enabled, web service to standby RM should be redirected to the 
 active RM. This is a related Jira to YARN-1525.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1591) TestResourceTrackerService fails randomly on trunk

2014-03-13 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13934482#comment-13934482
 ] 

Tsuyoshi OZAWA commented on YARN-1591:
--

 Fixed not to throw YarnRuntimeException when InterruptedException is thrown 
 in EventDispatcher#handle. IIUC, throwing YarnRuntimeException in 
 EventDispatcher#handle is not handled in AsyncDispatcher and leads needless 
 crash. We should exit the thread gracefully in that case.

latest patch include this fix.

 TestResourceTrackerService fails randomly on trunk
 --

 Key: YARN-1591
 URL: https://issues.apache.org/jira/browse/YARN-1591
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-1591.1.patch, YARN-1591.2.patch, YARN-1591.3.patch, 
 YARN-1591.3.patch, YARN-1591.5.patch, YARN-1591.6.patch


 As evidenced by Jenkins at 
 https://issues.apache.org/jira/browse/YARN-1041?focusedCommentId=13868621page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13868621.
 It's failing randomly on trunk on my local box too 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1658) Webservice should redirect to active RM when HA is enabled.

2014-03-13 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13934487#comment-13934487
 ] 

Vinod Kumar Vavilapalli commented on YARN-1658:
---

Both failures are existing issues: TestResourceTrackerService tracked at 
YARN-1591 and TestRMRestart at YARN-1830.

The latest patch looks good to me. +1. Checking this in.

 Webservice should redirect to active RM when HA is enabled.
 ---

 Key: YARN-1658
 URL: https://issues.apache.org/jira/browse/YARN-1658
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Cindy Li
Assignee: Cindy Li
  Labels: YARN
 Attachments: YARN1658.1.patch, YARN1658.2.patch, YARN1658.3.patch, 
 YARN1658.patch


 When HA is enabled, web service to standby RM should be redirected to the 
 active RM. This is a related Jira to YARN-1525.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1658) Webservice should redirect to active RM when HA is enabled.

2014-03-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13934519#comment-13934519
 ] 

Hudson commented on YARN-1658:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5326 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5326/])
YARN-1658. Modified web-app framework to let standby RMs redirect web-service 
calls to the active RM. Contributed by Cindy Li. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1577408)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMFailover.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/Dispatcher.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/Router.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/WebApp.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMDispatcher.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebApp.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebAppFilter.java


 Webservice should redirect to active RM when HA is enabled.
 ---

 Key: YARN-1658
 URL: https://issues.apache.org/jira/browse/YARN-1658
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Cindy Li
Assignee: Cindy Li
  Labels: YARN
 Fix For: 2.4.0

 Attachments: YARN1658.1.patch, YARN1658.2.patch, YARN1658.3.patch, 
 YARN1658.patch


 When HA is enabled, web service to standby RM should be redirected to the 
 active RM. This is a related Jira to YARN-1525.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests

2014-03-13 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13934522#comment-13934522
 ] 

Junping Du commented on YARN-796:
-

bq. ResourceManager yarn-site.xml would specify the valid labels systemwide 
(you didn't suggest this, but it prevent label typos going unnoticed):
I don't think typo of label is a big issue. Restricting labels in RM side 
potentially prevent to add new label for new application on new registering 
nodes as we don't have things to refresh yarn-site config dynamically. Isn't it?

 Allow for (admin) labels on nodes and resource-requests
 ---

 Key: YARN-796
 URL: https://issues.apache.org/jira/browse/YARN-796
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Arun C Murthy
Assignee: Arun C Murthy

 It will be useful for admins to specify labels for nodes. Examples of labels 
 are OS, processor architecture etc.
 We should expose these labels and allow applications to specify labels on 
 resource-requests.
 Obviously we need to support admin operations on adding/removing node labels.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


  1   2   >