[jira] [Commented] (YARN-2069) CS queue level preemption should respect user-limits

2014-07-10 Thread Mayank Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14057170#comment-14057170
 ] 

Mayank Bansal commented on YARN-2069:
-

I just verified: I rebased the patch, compiled, and tested it. The patch doesn't 
seem to be the problem.

Thanks,
Mayank

 CS queue level preemption should respect user-limits
 

 Key: YARN-2069
 URL: https://issues.apache.org/jira/browse/YARN-2069
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Vinod Kumar Vavilapalli
Assignee: Mayank Bansal
 Attachments: YARN-2069-trunk-1.patch, YARN-2069-trunk-2.patch, 
 YARN-2069-trunk-3.patch, YARN-2069-trunk-4.patch, YARN-2069-trunk-5.patch


 This is different from (even if related to, and likely shares code with) 
 YARN-2113.
 YARN-2113 focuses on making sure that even if a queue has its guaranteed 
 capacity, its individual users are treated in line with their limits, 
 irrespective of when they join in.
 This JIRA is about respecting user-limits while preempting containers to 
 balance queue capacities.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2088) Fix code bug in GetApplicationsRequestPBImpl#mergeLocalToBuilder

2014-07-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14057171#comment-14057171
 ] 

Hadoop QA commented on YARN-2088:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12646030/YARN-2088.v1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common:

  org.apache.hadoop.yarn.util.TestFSDownload

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4253//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4253//console

This message is automatically generated.

 Fix code bug in GetApplicationsRequestPBImpl#mergeLocalToBuilder
 

 Key: YARN-2088
 URL: https://issues.apache.org/jira/browse/YARN-2088
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Binglin Chang
Assignee: Binglin Chang
 Attachments: YARN-2088.v1.patch


 Some fields (sets, lists) are added to the proto builders multiple times; we need to 
 clear those fields before adding, otherwise the resulting proto contains extra 
 contents.
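 A minimal, self-contained sketch of the clear-before-add pattern described above, 
 using a hypothetical builder as a stand-in for the real generated proto builder:
{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Illustrative-only sketch of the bug pattern: if a PBImpl merges its local
 * list into the proto builder on every call without clearing the repeated
 * field first, the field accumulates duplicates. The Builder here is a
 * hypothetical stand-in, not the generated GetApplicationsRequestProto code.
 */
public class MergeLocalToBuilderSketch {

  /** Stand-in for a generated proto builder with a repeated field. */
  static class Builder {
    final List<String> applicationTypes = new ArrayList<String>();
    void addAllApplicationTypes(List<String> types) { applicationTypes.addAll(types); }
    void clearApplicationTypes() { applicationTypes.clear(); }
  }

  private final Builder builder = new Builder();
  private final List<String> localTypes = Arrays.asList("MAPREDUCE", "TEZ");

  /** Buggy version: each merge keeps appending to the repeated field. */
  void mergeLocalToBuilderBuggy() {
    builder.addAllApplicationTypes(localTypes);
  }

  /** Fixed version: clear the repeated field before re-adding, so merging is idempotent. */
  void mergeLocalToBuilderFixed() {
    builder.clearApplicationTypes();
    builder.addAllApplicationTypes(localTypes);
  }

  public static void main(String[] args) {
    MergeLocalToBuilderSketch buggy = new MergeLocalToBuilderSketch();
    buggy.mergeLocalToBuilderBuggy();
    buggy.mergeLocalToBuilderBuggy();
    System.out.println(buggy.builder.applicationTypes.size()); // 4: extra contents

    MergeLocalToBuilderSketch fixed = new MergeLocalToBuilderSketch();
    fixed.mergeLocalToBuilderFixed();
    fixed.mergeLocalToBuilderFixed();
    System.out.println(fixed.builder.applicationTypes.size()); // 2: no duplicates
  }
}
{code}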



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2181) Add preemption info to RM Web UI and add logs when preemption occurs

2014-07-10 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-2181:
-

Attachment: YARN-2181.patch

 Add preemption info to RM Web UI and add logs when preemption occurs
 

 Key: YARN-2181
 URL: https://issues.apache.org/jira/browse/YARN-2181
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, webapp
Affects Versions: 2.4.0
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, 
 YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, 
 YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, 
 YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, 
 YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, 
 YARN-2181.patch, YARN-2181.patch, application page-1.png, application page.png


 We need to add preemption info to the RM web page so that administrators/users 
 can better understand the preemption that happened on an app, etc. 
 The RM logs should also have the following properties:
 * Logs are retrievable while an application is still running, and are flushed often.
 * Logs can distinguish between AM container preemption and task container 
 preemption, with the container ID shown.
 * Logs should be at INFO level (a hypothetical example follows this list).
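 A hypothetical illustration of a log call that would satisfy the last two bullets 
 (the class and message format are assumptions, not the actual YARN-2181 patch):
{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

// Hypothetical sketch only; not the code in the attached patch.
class PreemptionLogSketch {
  private static final Log LOG = LogFactory.getLog(PreemptionLogSketch.class);

  // containerId and isAMContainer would come from the RMContainer being preempted.
  void logPreemption(String containerId, boolean isAMContainer) {
    // INFO level, so it is visible in normal RM logs while the application is still running.
    LOG.info("Preempting " + (isAMContainer ? "AM container " : "task container ")
        + containerId);
  }
}
{code}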



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2069) CS queue level preemption should respect user-limits

2014-07-10 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14057175#comment-14057175
 ] 

Wangda Tan commented on YARN-2069:
--

Hi Mayank,
Can you re-kick Jenkins manually?

Thanks,
Wangda

 CS queue level preemption should respect user-limits
 

 Key: YARN-2069
 URL: https://issues.apache.org/jira/browse/YARN-2069
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Vinod Kumar Vavilapalli
Assignee: Mayank Bansal
 Attachments: YARN-2069-trunk-1.patch, YARN-2069-trunk-2.patch, 
 YARN-2069-trunk-3.patch, YARN-2069-trunk-4.patch, YARN-2069-trunk-5.patch


 This is different from (even if related to, and likely shares code with) 
 YARN-2113.
 YARN-2113 focuses on making sure that even if a queue has its guaranteed 
 capacity, its individual users are treated in line with their limits, 
 irrespective of when they join in.
 This JIRA is about respecting user-limits while preempting containers to 
 balance queue capacities.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1408) Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins

2014-07-10 Thread Mayank Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14057200#comment-14057200
 ] 

Mayank Bansal commented on YARN-1408:
-

Thanks [~sunilg] for the patch.

The patch looks good; there are some minor comments:
1. Your current patch is not applying on trunk. Please rebase it on trunk.

2. There are a lot of unwanted formatting changes; can you please revert them? 
Some examples are as follows:
{code}
-  .currentTimeMillis());
+.currentTimeMillis());
{code}

{code}
-RMContainer rmContainer =
-new RMContainerImpl(container, attemptId, node.getNodeID(),
-  applications.get(attemptId.getApplicationId()).getUser(), rmContext,
-  status.getCreationTime());
+RMContainer rmContainer = new RMContainerImpl(container, attemptId,
+node.getNodeID(), applications.get(attemptId.getApplicationId())
+.getUser(), rmContext, status.getCreationTime());
{code}
Please check for this throughout the patch.



 Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task 
 timeout for 30mins
 --

 Key: YARN-1408
 URL: https://issues.apache.org/jira/browse/YARN-1408
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.2.0
Reporter: Sunil G
Assignee: Sunil G
 Attachments: Yarn-1408.1.patch, Yarn-1408.2.patch, Yarn-1408.3.patch, 
 Yarn-1408.4.patch, Yarn-1408.5.patch, Yarn-1408.6.patch, Yarn-1408.7.patch, 
 Yarn-1408.patch


 Capacity preemption is enabled as follows:
  *  yarn.resourcemanager.scheduler.monitor.enable = true
  *  
 yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy
 Queues: a, b
 Capacity of Queue A = 80%
 Capacity of Queue B = 20%
 Step 1: Assign a big jobA to queue a which uses the full cluster capacity.
 Step 2: Submit a jobB to queue b which would use less than 20% of the cluster 
 capacity.
 A jobA task that is using queue b's capacity gets preempted and killed.
 This caused the following problem:
 1. A new container got allocated for jobA in Queue A as per a node update 
 from an NM.
 2. This container was preempted immediately by the preemption policy.
 An ACQUIRED at KILLED invalid-state exception occurred when the next AM 
 heartbeat reached the RM:
 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
 Can't handle this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 ACQUIRED at KILLED
 This also caused the task to hit a 30-minute timeout, as the container 
 was already killed by preemption:
 attempt_1380289782418_0003_m_00_0 Timed out after 1800 secs



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2208) AMRMTokenManager need to have a way to roll over AMRMToken

2014-07-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14057205#comment-14057205
 ] 

Hadoop QA commented on YARN-2208:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12654946/YARN-2208.6.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.yarn.util.TestFSDownload
  
org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4254//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4254//console

This message is automatically generated.

 AMRMTokenManager need to have a way to roll over AMRMToken
 --

 Key: YARN-2208
 URL: https://issues.apache.org/jira/browse/YARN-2208
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-2208.1.patch, YARN-2208.2.patch, YARN-2208.3.patch, 
 YARN-2208.4.patch, YARN-2208.5.patch, YARN-2208.5.patch, YARN-2208.6.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2181) Add preemption info to RM Web UI and add logs when preemption occurs

2014-07-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14057217#comment-14057217
 ] 

Hadoop QA commented on YARN-2181:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12654947/YARN-2181.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 7 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4255//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4255//console

This message is automatically generated.

 Add preemption info to RM Web UI and add logs when preemption occurs
 

 Key: YARN-2181
 URL: https://issues.apache.org/jira/browse/YARN-2181
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, webapp
Affects Versions: 2.4.0
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, 
 YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, 
 YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, 
 YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, 
 YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, 
 YARN-2181.patch, YARN-2181.patch, application page-1.png, application page.png


 We need to add preemption info to the RM web page so that administrators/users 
 can better understand the preemption that happened on an app, etc. 
 The RM logs should also have the following properties:
 * Logs are retrievable while an application is still running, and are flushed often.
 * Logs can distinguish between AM container preemption and task container 
 preemption, with the container ID shown.
 * Logs should be at INFO level.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2208) AMRMTokenManager need to have a way to roll over AMRMToken

2014-07-10 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14057219#comment-14057219
 ] 

Xuan Gong commented on YARN-2208:
-

bq. org.apache.hadoop.yarn.util.TestFSDownload

This is an unrelated test case failure.

bq. 
org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart

This test passes successfully locally.

 AMRMTokenManager need to have a way to roll over AMRMToken
 --

 Key: YARN-2208
 URL: https://issues.apache.org/jira/browse/YARN-2208
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-2208.1.patch, YARN-2208.2.patch, YARN-2208.3.patch, 
 YARN-2208.4.patch, YARN-2208.5.patch, YARN-2208.5.patch, YARN-2208.6.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2088) Fix code bug in GetApplicationsRequestPBImpl#mergeLocalToBuilder

2014-07-10 Thread Binglin Chang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14057302#comment-14057302
 ] 

Binglin Chang commented on YARN-2088:
-

Hi [~jianhe] or [~djp], it looks like there are no more comments. Would you help 
get this committed? Thanks.

 Fix code bug in GetApplicationsRequestPBImpl#mergeLocalToBuilder
 

 Key: YARN-2088
 URL: https://issues.apache.org/jira/browse/YARN-2088
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Binglin Chang
Assignee: Binglin Chang
 Attachments: YARN-2088.v1.patch


 Some fields (sets, lists) are added to the proto builders multiple times; we need to 
 clear those fields before adding, otherwise the resulting proto contains extra 
 contents.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2258) Aggregation of MR job logs failing when Resourcemanager switches

2014-07-10 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14057332#comment-14057332
 ] 

Wangda Tan commented on YARN-2258:
--

Hi [~nishan],
According to the log you provided, I think this is a duplicate of YARN-1885. 
YARN-1885 is targeted for release in 2.5.0 (coming soon).
I'll close this issue. Please reopen it if you encounter such problems after you 
upgrade to 2.5.0.

Thanks,
Wangda

 Aggregation of MR job logs failing when Resourcemanager switches
 

 Key: YARN-2258
 URL: https://issues.apache.org/jira/browse/YARN-2258
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: log-aggregation, nodemanager
Affects Versions: 2.4.0
Reporter: Nishan Shetty

 1. Install RM in HA mode.
 2. Run a job with many tasks.
 3. Induce an RM switchover while the job is in progress.
 Observe that log aggregation fails for the job that is running when the 
 ResourceManager switchover is induced.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (YARN-2258) Aggregation of MR job logs failing when Resourcemanager switches

2014-07-10 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan resolved YARN-2258.
--

Resolution: Duplicate
  Assignee: Wangda Tan

 Aggregation of MR job logs failing when Resourcemanager switches
 

 Key: YARN-2258
 URL: https://issues.apache.org/jira/browse/YARN-2258
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: log-aggregation, nodemanager
Affects Versions: 2.4.0
Reporter: Nishan Shetty
Assignee: Wangda Tan

 1. Install RM in HA mode.
 2. Run a job with many tasks.
 3. Induce an RM switchover while the job is in progress.
 Observe that log aggregation fails for the job that is running when the 
 ResourceManager switchover is induced.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2257) Add user to queue mappings to automatically place users' apps into specific queues

2014-07-10 Thread Patrick Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14057339#comment-14057339
 ] 

Patrick Liu commented on YARN-2257:
---

Hi, Vinod,
I think we could inject the user-queue mapping judgement in 'RMAppManager's 
method 'protected synchronized void submitApplication':
  // Sanity checks
  if (submissionContext.getQueue() == null) {
    submissionContext.setQueue(YarnConfiguration.DEFAULT_QUEUE_NAME);
  }
  if (submissionContext.getApplicationName() == null) {
    submissionContext.setApplicationName(
        YarnConfiguration.DEFAULT_APPLICATION_NAME);
  }
All applications submitted to YARN are launched by 'RMAppManager'.
'RMAppManager' does the sanity checks, creates an 'RMAppImpl' instance, and finally 
sends the 'new RMAppEvent(applicationId, RMAppEventType.START)' event.
When 'RMAppImpl' receives the event, it changes the state machine and performs 
the transition. The transition launches the 'RMAppAttemptImpl' and starts it.
Then the app is scheduled by the specific scheduler. 
The only thing we need to inject is the queue in the submissionContext, 
like this:
// Precondition: set user-as-default-queue to false in yarn-site.xml
if (QueuePlacementRule.hasMappingForUser(user)) {
  submissionContext.setQueue(QueuePlacementRule.getQueue(user));
} else {
  submissionContext.setQueue(YarnConfiguration.DEFAULT_QUEUE_NAME);
}

 Add user to queue mappings to automatically place users' apps into specific 
 queues
 --

 Key: YARN-2257
 URL: https://issues.apache.org/jira/browse/YARN-2257
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Reporter: Patrick Liu
Assignee: Vinod Kumar Vavilapalli
  Labels: features

 Currently, the fair scheduler supports two modes: a default queue, or an individual 
 queue for each user.
 Apparently, the default queue is not a good option, because resources 
 cannot be managed per user or group.
 However, an individual queue for each user is not good enough either, especially when 
 connecting YARN with Hive. There will be an increasing number of Hive users in a corporate 
 environment. If we create a queue for every user, resource management will be 
 hard to maintain.
 I think the problem can be solved like this (sketched after this list):
 1. Define a user-queue mapping in fair-scheduler.xml. Inside each queue, use 
 aclSubmitApps to control users' ability to submit.
 2. Each time a user submits an app to YARN, if the user is mapped to a queue, 
 the app will be scheduled to that queue; otherwise, the app will be submitted 
 to the default queue.
 3. If the user does not pass the aclSubmitApps limits, the app will not be accepted.
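 A small, hypothetical sketch of the placement decision described in steps 1-3 above 
 (the mapping and ACL structures are illustrative stand-ins, not existing FairScheduler APIs):
{code}
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in; not the actual FairScheduler placement code. */
class QueuePlacementSketch {
  private final Map<String, String> userToQueue;         // user-queue mapping from fair-scheduler.xml
  private final Map<String, Set<String>> aclSubmitApps;  // queue -> users allowed to submit

  QueuePlacementSketch(Map<String, String> userToQueue,
                       Map<String, Set<String>> aclSubmitApps) {
    this.userToQueue = userToQueue;
    this.aclSubmitApps = aclSubmitApps;
  }

  /** Returns the queue to place the app in, or null if the submission is rejected. */
  String placeApp(String user) {
    // Step 2: use the mapped queue if there is one, otherwise fall back to default.
    String queue = userToQueue.containsKey(user) ? userToQueue.get(user) : "default";
    // Step 3: reject the app if the queue's aclSubmitApps does not allow this user.
    Set<String> allowed = aclSubmitApps.get(queue);
    if (allowed != null && !allowed.contains(user)) {
      return null;
    }
    return queue;
  }
}
{code}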



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2131) Add a way to format the RMStateStore

2014-07-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14057352#comment-14057352
 ] 

Hudson commented on YARN-2131:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #609 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/609/])
YARN-2131. Add a way to format the RMStateStore. (Robert Kanter via kasha) 
(kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1609278)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/bin/yarn
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/MemoryRMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/NullRMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreTestBase.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestFSRMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/YarnCommands.apt.vm


 Add a way to format the RMStateStore
 

 Key: YARN-2131
 URL: https://issues.apache.org/jira/browse/YARN-2131
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Karthik Kambatla
Assignee: Robert Kanter
 Fix For: 2.6.0

 Attachments: YARN-2131.patch, YARN-2131.patch


 There are cases when we don't want to recover past applications, but do want to recover 
 applications going forward. To do this, one has to clear the store. Today, 
 there is no easy way to do this, and users have to understand how each store 
 works.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1366) AM should implement Resync with the ApplicationMasterService instead of shutting down

2014-07-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14057354#comment-14057354
 ] 

Hudson commented on YARN-1366:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #609 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/609/])
YARN-1366. Changed AMRMClient to re-register with RM and send outstanding 
requests back to RM on work-preserving RM restart. Contributed by Rohith 
(jianhe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1609254)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/pom.xml
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/AMRMClient.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/async/impl/AMRMClientAsyncImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/AMRMClientImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/async/impl/TestAMRMClientAsync.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAMRMClientOnRMRestart.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/resources/core-site.xml


 AM should implement Resync with the ApplicationMasterService instead of 
 shutting down
 -

 Key: YARN-1366
 URL: https://issues.apache.org/jira/browse/YARN-1366
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Rohith
 Fix For: 2.5.0

 Attachments: YARN-1366.1.patch, YARN-1366.10.patch, 
 YARN-1366.11.patch, YARN-1366.12.patch, YARN-1366.13.patch, 
 YARN-1366.2.patch, YARN-1366.3.patch, YARN-1366.4.patch, YARN-1366.5.patch, 
 YARN-1366.6.patch, YARN-1366.7.patch, YARN-1366.8.patch, YARN-1366.9.patch, 
 YARN-1366.patch, YARN-1366.prototype.patch, YARN-1366.prototype.patch


 The ApplicationMasterService currently sends a resync response, to which the 
 AM responds by shutting down. The AM behavior is expected to change to 
 resyncing with the RM instead. Resync means resetting the allocate RPC 
 sequence number to 0, after which the AM should send its entire outstanding request to 
 the RM. Note that if the AM is making its first allocate call to the RM, then 
 things should proceed as normal without needing a resync. The RM will 
 return all containers that have completed since the RM last synced with the 
 AM. Some container completions may be reported more than once.
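 A minimal, hypothetical sketch of the AM-side behavior described above; the types 
 below are stand-ins, not the real AMRMClient/ApplicationMasterProtocol API:
{code}
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-ins for the allocate protocol; not the actual YARN classes. */
class ResyncSketch {
  interface Scheduler {                          // stand-in for the RM side of allocate
    Response allocate(int responseId, List<String> asks);
  }
  static class Response {
    boolean resync;                              // RM asked the AM to resync
    List<String> completedContainers = new ArrayList<String>();
  }

  private final Scheduler rm;
  private final List<String> outstandingAsks = new ArrayList<String>(); // entire outstanding request
  private final Set<String> seenCompletions = new HashSet<String>();    // completions may repeat
  private int responseId = 0;                                           // allocate RPC sequence number

  ResyncSketch(Scheduler rm) { this.rm = rm; }

  void heartbeat(List<String> newAsks) {
    outstandingAsks.addAll(newAsks);
    Response r = rm.allocate(responseId++, newAsks);
    if (r.resync) {
      // Resync: reset the sequence number to 0 and resend the entire outstanding request.
      responseId = 0;
      r = rm.allocate(responseId++, new ArrayList<String>(outstandingAsks));
    }
    // The RM may report the same completed container more than once after a resync.
    for (String containerId : r.completedContainers) {
      if (seenCompletions.add(containerId)) {
        handleCompleted(containerId);
      }
    }
  }

  private void handleCompleted(String containerId) { /* release bookkeeping, etc. */ }
}
{code}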



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2267) Auxiliary Service support in RM

2014-07-10 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14057415#comment-14057415
 ] 

Naganarasimha G R commented on YARN-2267:
-


Some scenarios for supporting an Auxiliary Service in the RM:
Scenario 1: [Overload control]
a. A monitor plugin inside the RM can open an RPC port and receive feedback 
from other components in the cluster (NM, HBase, etc.).
b. Based on that feedback, the monitor plugin can take actions such as removing a 
particular NM or changing the capacity of an NM, etc. 

Scenario 2: [Alarming module]
a. Any state changes, such as the RM moving to standby/active or an NM being added, 
removed, or decommissioned, can easily be reported to a central monitoring 
service [instead of the existing pull model through the REST APIs].
b. The plugin should also be able to register with the RM for the critical state 
changes mentioned above, so that they can be reported.



 Auxiliary Service support in RM
 ---

 Key: YARN-2267
 URL: https://issues.apache.org/jira/browse/YARN-2267
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Reporter: Naganarasimha G R

 Currently the RM does not have a provision to run any auxiliary services. For 
 health/monitoring in the RM, it is better to have a plugin mechanism in the RM itself, 
 similar to the NM.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2272) UI issues in timeline server

2014-07-10 Thread Nishan Shetty (JIRA)
Nishan Shetty created YARN-2272:
---

 Summary: UI issues in timeline server
 Key: YARN-2272
 URL: https://issues.apache.org/jira/browse/YARN-2272
 Project: Hadoop YARN
  Issue Type: Bug
  Components: timelineserver
Affects Versions: 2.4.0
Reporter: Nishan Shetty


Links to the NodeManager are not working in the timeline server



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2272) UI issues in timeline server

2014-07-10 Thread Nishan Shetty (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nishan Shetty updated YARN-2272:


Priority: Minor  (was: Major)

 UI issues in timeline server
 

 Key: YARN-2272
 URL: https://issues.apache.org/jira/browse/YARN-2272
 Project: Hadoop YARN
  Issue Type: Bug
  Components: timelineserver
Affects Versions: 2.4.0
Reporter: Nishan Shetty
Priority: Minor

 Links to the NodeManager are not working in the timeline server



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2272) UI issues in timeline server

2014-07-10 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14057440#comment-14057440
 ] 

Zhijie Shen commented on YARN-2272:
---

[~nishan], thanks for reporting the issue. It has been documented before in 
YARN-1884. Please refer to that JIRA for why the link doesn't work. I agree 
that we can close the current JIRA as a duplicate of YARN-1884.

 UI issues in timeline server
 

 Key: YARN-2272
 URL: https://issues.apache.org/jira/browse/YARN-2272
 Project: Hadoop YARN
  Issue Type: Bug
  Components: timelineserver
Affects Versions: 2.4.0
Reporter: Nishan Shetty
Priority: Minor

 Links to the NodeManager are not working in the timeline server



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2272) UI issues in timeline server

2014-07-10 Thread Nishan Shetty (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14057454#comment-14057454
 ] 

Nishan Shetty commented on YARN-2272:
-

Thanks [~zjshen] for looking into the issue.
I will close this as a duplicate of YARN-1884.

 UI issues in timeline server
 

 Key: YARN-2272
 URL: https://issues.apache.org/jira/browse/YARN-2272
 Project: Hadoop YARN
  Issue Type: Bug
  Components: timelineserver
Affects Versions: 2.4.0
Reporter: Nishan Shetty
Priority: Minor

 Links to the NodeManager are not working in the timeline server



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (YARN-2272) UI issues in timeline server

2014-07-10 Thread Nishan Shetty (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nishan Shetty resolved YARN-2272.
-

Resolution: Duplicate

 UI issues in timeline server
 

 Key: YARN-2272
 URL: https://issues.apache.org/jira/browse/YARN-2272
 Project: Hadoop YARN
  Issue Type: Bug
  Components: timelineserver
Affects Versions: 2.4.0
Reporter: Nishan Shetty
Priority: Minor

 Links to the NodeManager are not working in the timeline server



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2258) Aggregation of MR job logs failing when Resourcemanager switches

2014-07-10 Thread Nishan Shetty (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14057460#comment-14057460
 ] 

Nishan Shetty commented on YARN-2258:
-

Thanks [~vinodkv] and [~leftnoteasy] for looking into the issue

 Aggregation of MR job logs failing when Resourcemanager switches
 

 Key: YARN-2258
 URL: https://issues.apache.org/jira/browse/YARN-2258
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: log-aggregation, nodemanager
Affects Versions: 2.4.0
Reporter: Nishan Shetty
Assignee: Wangda Tan

 1. Install RM in HA mode.
 2. Run a job with many tasks.
 3. Induce an RM switchover while the job is in progress.
 Observe that log aggregation fails for the job that is running when the 
 ResourceManager switchover is induced.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1366) AM should implement Resync with the ApplicationMasterService instead of shutting down

2014-07-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14057472#comment-14057472
 ] 

Hudson commented on YARN-1366:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1800 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1800/])
YARN-1366. Changed AMRMClient to re-register with RM and send outstanding 
requests back to RM on work-preserving RM restart. Contributed by Rohith 
(jianhe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1609254)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/pom.xml
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/AMRMClient.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/async/impl/AMRMClientAsyncImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/AMRMClientImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/async/impl/TestAMRMClientAsync.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAMRMClientOnRMRestart.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/resources/core-site.xml


 AM should implement Resync with the ApplicationMasterService instead of 
 shutting down
 -

 Key: YARN-1366
 URL: https://issues.apache.org/jira/browse/YARN-1366
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Rohith
 Fix For: 2.5.0

 Attachments: YARN-1366.1.patch, YARN-1366.10.patch, 
 YARN-1366.11.patch, YARN-1366.12.patch, YARN-1366.13.patch, 
 YARN-1366.2.patch, YARN-1366.3.patch, YARN-1366.4.patch, YARN-1366.5.patch, 
 YARN-1366.6.patch, YARN-1366.7.patch, YARN-1366.8.patch, YARN-1366.9.patch, 
 YARN-1366.patch, YARN-1366.prototype.patch, YARN-1366.prototype.patch


 The ApplicationMasterService currently sends a resync response, to which the 
 AM responds by shutting down. The AM behavior is expected to change to 
 resyncing with the RM instead. Resync means resetting the allocate RPC 
 sequence number to 0, after which the AM should send its entire outstanding request to 
 the RM. Note that if the AM is making its first allocate call to the RM, then 
 things should proceed as normal without needing a resync. The RM will 
 return all containers that have completed since the RM last synced with the 
 AM. Some container completions may be reported more than once.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2228) TimelineServer should load pseudo authentication filter when authentication = simple

2014-07-10 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2228:
--

Attachment: YARN-2228.3.patch

 TimelineServer should load pseudo authentication filter when authentication = 
 simple
 

 Key: YARN-2228
 URL: https://issues.apache.org/jira/browse/YARN-2228
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-2228.1.patch, YARN-2228.2.patch, YARN-2228.3.patch


 When Kerberos authentication is not enabled, we should let the timeline 
 server work with the pseudo authentication filter. In this way, the server is 
 able to detect the request user by checking user.name.
 On the other hand, the timeline client should append user.name in the non-secure 
 case as well, so that ACLs can keep working in this case. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2228) TimelineServer should load pseudo authentication filter when authentication = simple

2014-07-10 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14057504#comment-14057504
 ] 

Zhijie Shen commented on YARN-2228:
---

Vinod, thanks for your review. Please check my responses below.

bq. Not sure if we can rename this to be better, but if possible we should.

The yarn.timeline-service prefix is meant to indicate which component the 
configurations are related to, and http.authentication. is meant to stay as close 
as possible to the original hadoop.http.authentication. prefix. Does that make sense?

bq. After this patch, owner should never be empty, right? We can reject 
requests when we cannot figure out the submission user.

Via TimelineClient, the owner is always set, no matter whether it is pseudo or Kerberos 
authentication. However, users can choose to bypass TimelineClient and post 
entities to the timeline server through the REST API directly. 
Personally, I prefer to accept the anonymous user, in case some users want to 
ignore security altogether. For example, when testing functionality, users may 
not want to append user.name= every time they compose a URL.

bq. I am not able find the magic that is automatically putting the 
PseudoAuthFilter into the configuration. It also seems like 
TimelineAuthenticationFilterInitializer is always added irrespective of 
security.

It is based on the agreement that ACLs need to work in insecure mode (i.e., type 
= simple) as well. Given this agreement, I always need to use 
TimelineAuthenticationFilterInitializer to load TimelineAuthenticationFilter, 
which extracts the user information from the request. When type = simple, 
the user information comes from the URL param. On the other hand, if we don't 
load the authentication filter in insecure mode, the timeline server is unable 
to know the user of a request.

By default the authentication type is simple, so the parent class of 
TimelineAuthenticationFilter (i.e., AuthenticationFilter) is going to load 
PseudoAuthenticationFilter. The magic is within AuthenticationFilter#init.
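To make the two points above concrete, here is a small, hypothetical sketch of the 
prefix/default idea; the constant and method names are illustrative, not the actual 
TimelineAuthenticationFilterInitializer:
{code}
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch only; not the code in the attached patch.
class TimelineAuthFilterConfigSketch {
  static final String PREFIX = "yarn.timeline-service.http-authentication.";

  /** Strip the prefix so settings reach the filter under the plain names it expects. */
  static Map<String, String> filterConfig(Map<String, String> conf) {
    Map<String, String> out = new HashMap<String, String>();
    // Default to simple, so the filter falls back to pseudo authentication
    // (user identified by the user.name URL param) when Kerberos is not configured.
    out.put("type", "simple");
    for (Map.Entry<String, String> e : conf.entrySet()) {
      if (e.getKey().startsWith(PREFIX)) {
        out.put(e.getKey().substring(PREFIX.length()), e.getValue());
      }
    }
    return out;
  }
}
{code}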

bq. It doesn't seem like we had tests to validate delegationtoken based access 
to TimelineServer?

The whole authentication part is lacking test cases. Given the work in 
HADOOP-10799, we may take advantage of the DT authentication stack in common, which 
will mitigate the problem, because the relevant test cases are promoted to 
common together with the DT authentication stack. After that, we can evaluate which 
UTs are missing for the timeline server scenario. For now, let's just file a 
ticket to track the UT work. What do you think?

Please comment further on these points. I've addressed the remaining comments 
in the newly uploaded patch.



 TimelineServer should load pseudo authentication filter when authentication = 
 simple
 

 Key: YARN-2228
 URL: https://issues.apache.org/jira/browse/YARN-2228
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-2228.1.patch, YARN-2228.2.patch, YARN-2228.3.patch


 When Kerberos authentication is not enabled, we should let the timeline 
 server work with the pseudo authentication filter. In this way, the server is 
 able to detect the request user by checking user.name.
 On the other hand, the timeline client should append user.name in the non-secure 
 case as well, so that ACLs can keep working in this case. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1366) AM should implement Resync with the ApplicationMasterService instead of shutting down

2014-07-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14057539#comment-14057539
 ] 

Hudson commented on YARN-1366:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1827 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1827/])
YARN-1366. Changed AMRMClient to re-register with RM and send outstanding 
requests back to RM on work-preserving RM restart. Contributed by Rohith 
(jianhe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1609254)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/pom.xml
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/AMRMClient.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/async/impl/AMRMClientAsyncImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/AMRMClientImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/async/impl/TestAMRMClientAsync.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAMRMClientOnRMRestart.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/resources/core-site.xml


 AM should implement Resync with the ApplicationMasterService instead of 
 shutting down
 -

 Key: YARN-1366
 URL: https://issues.apache.org/jira/browse/YARN-1366
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Rohith
 Fix For: 2.5.0

 Attachments: YARN-1366.1.patch, YARN-1366.10.patch, 
 YARN-1366.11.patch, YARN-1366.12.patch, YARN-1366.13.patch, 
 YARN-1366.2.patch, YARN-1366.3.patch, YARN-1366.4.patch, YARN-1366.5.patch, 
 YARN-1366.6.patch, YARN-1366.7.patch, YARN-1366.8.patch, YARN-1366.9.patch, 
 YARN-1366.patch, YARN-1366.prototype.patch, YARN-1366.prototype.patch


 The ApplicationMasterService currently sends a resync response, to which the 
 AM responds by shutting down. The AM behavior is expected to change to 
 resyncing with the RM instead. Resync means resetting the allocate RPC 
 sequence number to 0, after which the AM should send its entire outstanding request to 
 the RM. Note that if the AM is making its first allocate call to the RM, then 
 things should proceed as normal without needing a resync. The RM will 
 return all containers that have completed since the RM last synced with the 
 AM. Some container completions may be reported more than once.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2131) Add a way to format the RMStateStore

2014-07-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14057537#comment-14057537
 ] 

Hudson commented on YARN-2131:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1827 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1827/])
YARN-2131. Add a way to format the RMStateStore. (Robert Kanter via kasha) 
(kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1609278)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/bin/yarn
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/MemoryRMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/NullRMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreTestBase.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestFSRMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/YarnCommands.apt.vm


 Add a way to format the RMStateStore
 

 Key: YARN-2131
 URL: https://issues.apache.org/jira/browse/YARN-2131
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Karthik Kambatla
Assignee: Robert Kanter
 Fix For: 2.6.0

 Attachments: YARN-2131.patch, YARN-2131.patch


 There are cases when we don't want to recover past applications, but do want to recover 
 applications going forward. To do this, one has to clear the store. Today, 
 there is no easy way to do this, and users have to understand how each store 
 works.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2228) TimelineServer should load pseudo authentication filter when authentication = simple

2014-07-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14057545#comment-14057545
 ] 

Hadoop QA commented on YARN-2228:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12654990/YARN-2228.3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice:

  org.apache.hadoop.yarn.util.TestFSDownload
  
org.apache.hadoop.yarn.server.applicationhistoryservice.TestMemoryApplicationHistoryStore

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4256//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4256//console

This message is automatically generated.

 TimelineServer should load pseudo authentication filter when authentication = 
 simple
 

 Key: YARN-2228
 URL: https://issues.apache.org/jira/browse/YARN-2228
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-2228.1.patch, YARN-2228.2.patch, YARN-2228.3.patch


 When Kerberos authentication is not enabled, we should let the timeline 
 server work with the pseudo authentication filter. In this way, the server is 
 able to detect the request user by checking user.name.
 On the other hand, the timeline client should append user.name in the non-secure 
 case as well, so that ACLs can keep working in this case. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1994) Expose YARN/MR endpoints on multiple interfaces

2014-07-10 Thread Milan Potocnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Milan Potocnik updated YARN-1994:
-

Attachment: YARN-1994.4.patch

Hi guys,

I have attached a slightly updated version of the patch which incorporates your 
changes to the tests and configuration. The only differences are:
 - Added logic for the Timeline service
 - Put all JHS-related bind options under MR_HISTORY_BIND_HOST, instead of 
having 4 options
 - Did some minor code cleanup

Thanks for reviewing and pushing this!


 Expose YARN/MR endpoints on multiple interfaces
 ---

 Key: YARN-1994
 URL: https://issues.apache.org/jira/browse/YARN-1994
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager, resourcemanager, webapp
Affects Versions: 2.4.0
Reporter: Arpit Agarwal
Assignee: Craig Welch
 Attachments: YARN-1994.0.patch, YARN-1994.1.patch, YARN-1994.2.patch, 
 YARN-1994.3.patch, YARN-1994.4.patch


 YARN and MapReduce daemons currently do not support specifying a wildcard 
 address for the server endpoints. This prevents the endpoints from being 
 accessible from all interfaces on a multihomed machine.
 Note that if we do specify INADDR_ANY for any of the options, it will break 
 clients, as they will attempt to connect to 0.0.0.0. We need a solution that 
 allows specifying a hostname or IP address for clients while requesting a 
 wildcard bind for the servers.
 (The list of endpoints is in a comment below.)
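 A minimal, hypothetical sketch of the kind of solution being requested (the handling 
 and names are illustrative, not the final YARN configuration keys): bind the server to 
 a separately configured bind host, possibly the wildcard, while clients keep using the 
 advertised address.
{code}
import java.net.InetSocketAddress;

/** Hypothetical illustration only; not the actual YARN-1994 implementation. */
class BindHostSketch {
  /**
   * advertisedAddr is the "host:port" clients are told to connect to,
   * e.g. "rm.example.com:8032"; bindHost may be "0.0.0.0" for a wildcard bind.
   */
  static InetSocketAddress serverBindAddress(String advertisedAddr, String bindHost) {
    String[] parts = advertisedAddr.split(":");
    int port = Integer.parseInt(parts[1]);
    // Listen on the bind host if one is configured, without changing the
    // advertised hostname that clients resolve and connect to.
    String host = (bindHost == null || bindHost.isEmpty()) ? parts[0] : bindHost;
    return new InetSocketAddress(host, port);
  }
}
{code}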



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2228) TimelineServer should load pseudo authentication filter when authentication = simple

2014-07-10 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2228:
--

Attachment: YARN-2228.4.patch

Relaxed the criterion for TestMemoryApplicationHistoryStore; the other test 
failure seems to be transient and unrelated.

 TimelineServer should load pseudo authentication filter when authentication = 
 simple
 

 Key: YARN-2228
 URL: https://issues.apache.org/jira/browse/YARN-2228
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-2228.1.patch, YARN-2228.2.patch, YARN-2228.3.patch, 
 YARN-2228.4.patch


 When Kerberos authentication is not enabled, we should let the timeline 
 server work with the pseudo authentication filter. In this way, the server is 
 able to detect the request user by checking user.name.
 On the other hand, the timeline client should append user.name in the non-secure 
 case as well, so that ACLs can keep working in this case. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1408) Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins

2014-07-10 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-1408:
--

Attachment: Yarn-1408.8.patch

Thank you [~mayank_bansal] for the review. I have updated the patch against trunk 
and fixed the formatting problems. Kindly check.

 Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task 
 timeout for 30mins
 --

 Key: YARN-1408
 URL: https://issues.apache.org/jira/browse/YARN-1408
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.2.0
Reporter: Sunil G
Assignee: Sunil G
 Attachments: Yarn-1408.1.patch, Yarn-1408.2.patch, Yarn-1408.3.patch, 
 Yarn-1408.4.patch, Yarn-1408.5.patch, Yarn-1408.6.patch, Yarn-1408.7.patch, 
 Yarn-1408.8.patch, Yarn-1408.patch


 Capacity preemption is enabled as follows:
  *  yarn.resourcemanager.scheduler.monitor.enable = true
  *  
 yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy
 Queues: a, b
 Capacity of Queue A = 80%
 Capacity of Queue B = 20%
 Step 1: Assign a big jobA to queue a which uses the full cluster capacity.
 Step 2: Submit a jobB to queue b which would use less than 20% of the cluster 
 capacity.
 A jobA task that is using queue b's capacity gets preempted and killed.
 This caused the following problem:
 1. A new container got allocated for jobA in Queue A as per a node update 
 from an NM.
 2. This container was preempted immediately by the preemption policy.
 An ACQUIRED at KILLED invalid-state exception occurred when the next AM 
 heartbeat reached the RM:
 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
 Can't handle this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 ACQUIRED at KILLED
 This also caused the task to hit a 30-minute timeout, as the container 
 was already killed by preemption:
 attempt_1380289782418_0003_m_00_0 Timed out after 1800 secs



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1408) Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins

2014-07-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14057605#comment-14057605
 ] 

Hadoop QA commented on YARN-1408:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12655007/Yarn-1408.8.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4259//console

This message is automatically generated.

 Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task 
 timeout for 30mins
 --

 Key: YARN-1408
 URL: https://issues.apache.org/jira/browse/YARN-1408
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.2.0
Reporter: Sunil G
Assignee: Sunil G
 Attachments: Yarn-1408.1.patch, Yarn-1408.2.patch, Yarn-1408.3.patch, 
 Yarn-1408.4.patch, Yarn-1408.5.patch, Yarn-1408.6.patch, Yarn-1408.7.patch, 
 Yarn-1408.8.patch, Yarn-1408.patch


 Capacity preemption is enabled as follows.
  *  yarn.resourcemanager.scheduler.monitor.enable= true ,
  *  
 yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy
 Queue = a,b
 Capacity of Queue A = 80%
 Capacity of Queue B = 20%
 Step 1: Assign a big jobA on queue a which uses full cluster capacity
 Step 2: Submitted a jobB to queue b  which would use less than 20% of cluster 
 capacity
 A jobA task which uses queue b capacity has been preempted and killed.
 This caused the problem below:
 1. A new container got allocated for jobA in Queue A as per a node update 
 from an NM.
 2. This container was preempted immediately as per the preemption policy.
 Here an ACQUIRED at KILLED invalid state exception came when the next AM 
 heartbeat reached the RM.
 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
 Can't handle this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 ACQUIRED at KILLED
 This also caused the task to time out after 30 minutes, as this container 
 was already killed by preemption.
 attempt_1380289782418_0003_m_00_0 Timed out after 1800 secs



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1408) Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins

2014-07-10 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-1408:
--

Attachment: (was: Yarn-1408.8.patch)

 Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task 
 timeout for 30mins
 --

 Key: YARN-1408
 URL: https://issues.apache.org/jira/browse/YARN-1408
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.2.0
Reporter: Sunil G
Assignee: Sunil G
 Attachments: Yarn-1408.1.patch, Yarn-1408.2.patch, Yarn-1408.3.patch, 
 Yarn-1408.4.patch, Yarn-1408.5.patch, Yarn-1408.6.patch, Yarn-1408.7.patch, 
 Yarn-1408.patch


 Capacity preemption is enabled as follows.
  *  yarn.resourcemanager.scheduler.monitor.enable= true ,
  *  
 yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy
 Queue = a,b
 Capacity of Queue A = 80%
 Capacity of Queue B = 20%
 Step 1: Assign a big jobA on queue a which uses full cluster capacity
 Step 2: Submitted a jobB to queue b  which would use less than 20% of cluster 
 capacity
 A jobA task which uses queue b capacity has been preempted and killed.
 This caused the problem below:
 1. A new container got allocated for jobA in Queue A as per a node update 
 from an NM.
 2. This container was preempted immediately as per the preemption policy.
 Here an ACQUIRED at KILLED invalid state exception came when the next AM 
 heartbeat reached the RM.
 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
 Can't handle this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 ACQUIRED at KILLED
 This also caused the task to time out after 30 minutes, as this container 
 was already killed by preemption.
 attempt_1380289782418_0003_m_00_0 Timed out after 1800 secs



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1408) Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins

2014-07-10 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-1408:
--

Attachment: Yarn-1408.8.patch

Reattaching patch again as there was a test case problem. 

 Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task 
 timeout for 30mins
 --

 Key: YARN-1408
 URL: https://issues.apache.org/jira/browse/YARN-1408
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.2.0
Reporter: Sunil G
Assignee: Sunil G
 Attachments: Yarn-1408.1.patch, Yarn-1408.2.patch, Yarn-1408.3.patch, 
 Yarn-1408.4.patch, Yarn-1408.5.patch, Yarn-1408.6.patch, Yarn-1408.7.patch, 
 Yarn-1408.8.patch, Yarn-1408.patch


 Capacity preemption is enabled as follows.
  *  yarn.resourcemanager.scheduler.monitor.enable= true ,
  *  
 yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy
 Queue = a,b
 Capacity of Queue A = 80%
 Capacity of Queue B = 20%
 Step 1: Assign a big jobA on queue a which uses full cluster capacity
 Step 2: Submitted a jobB to queue b  which would use less than 20% of cluster 
 capacity
 A jobA task which uses queue b capacity has been preempted and killed.
 This caused the problem below:
 1. A new container got allocated for jobA in Queue A as per a node update 
 from an NM.
 2. This container was preempted immediately as per the preemption policy.
 Here an ACQUIRED at KILLED invalid state exception came when the next AM 
 heartbeat reached the RM.
 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
 Can't handle this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 ACQUIRED at KILLED
 This also caused the task to time out after 30 minutes, as this container 
 was already killed by preemption.
 attempt_1380289782418_0003_m_00_0 Timed out after 1800 secs



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2181) Add preemption info to RM Web UI and add logs when preemption occurs

2014-07-10 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-2181:
--

Attachment: YARN-2181.patch

Patch looks good overall. Removed the unused RMAppAttempt#isPreempted method and 
did a few code refactors in RMContainerImpl#updatePreemptionMetrics.

 Add preemption info to RM Web UI and add logs when preemption occurs
 

 Key: YARN-2181
 URL: https://issues.apache.org/jira/browse/YARN-2181
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, webapp
Affects Versions: 2.4.0
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, 
 YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, 
 YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, 
 YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, 
 YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, 
 YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, application page-1.png, 
 application page.png


 We need to add preemption info to the RM web page so that administrators/users 
 can better understand the preemption that happened on an app, etc. 
 And RM logs should have the following properties:
 * Logs are retrievable when an application is still running and often flushed.
 * Can distinguish between AM container preemption and task container 
 preemption with container ID shown.
 * Should be INFO level log.
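
Against the three log requirements above, a minimal illustrative sketch (class, method and parameter names are assumed, not the committed YARN-2181 code) of an INFO line that distinguishes the AM container from task containers:
{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.yarn.api.records.ContainerId;

public class PreemptionLogger {
  private static final Log LOG = LogFactory.getLog(PreemptionLogger.class);

  // 'amContainerId' is whatever the attempt reports as its master container;
  // passing it in keeps this sketch self-contained.
  public static void logPreemption(ContainerId preempted, ContainerId amContainerId) {
    boolean isAM = preempted.equals(amContainerId);
    // INFO level so the line is visible (and flushed) while the app is running.
    LOG.info("Preempting " + (isAM ? "AM" : "task") + " container " + preempted);
  }
}
{code}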



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1408) Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins

2014-07-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14057764#comment-14057764
 ] 

Hadoop QA commented on YARN-1408:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12655031/Yarn-1408.8.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4260//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4260//console

This message is automatically generated.

 Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task 
 timeout for 30mins
 --

 Key: YARN-1408
 URL: https://issues.apache.org/jira/browse/YARN-1408
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.2.0
Reporter: Sunil G
Assignee: Sunil G
 Attachments: Yarn-1408.1.patch, Yarn-1408.2.patch, Yarn-1408.3.patch, 
 Yarn-1408.4.patch, Yarn-1408.5.patch, Yarn-1408.6.patch, Yarn-1408.7.patch, 
 Yarn-1408.8.patch, Yarn-1408.patch


 Capacity preemption is enabled as follows.
  *  yarn.resourcemanager.scheduler.monitor.enable= true ,
  *  
 yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy
 Queue = a,b
 Capacity of Queue A = 80%
 Capacity of Queue B = 20%
 Step 1: Assign a big jobA on queue a which uses full cluster capacity
 Step 2: Submitted a jobB to queue b  which would use less than 20% of cluster 
 capacity
 A jobA task which uses queue b capacity has been preempted and killed.
 This caused the problem below:
 1. A new container got allocated for jobA in Queue A as per a node update 
 from an NM.
 2. This container was preempted immediately as per the preemption policy.
 Here an ACQUIRED at KILLED invalid state exception came when the next AM 
 heartbeat reached the RM.
 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
 Can't handle this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 ACQUIRED at KILLED
 This also caused the task to time out after 30 minutes, as this container 
 was already killed by preemption.
 attempt_1380289782418_0003_m_00_0 Timed out after 1800 secs



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2208) AMRMTokenManager need to have a way to roll over AMRMToken

2014-07-10 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14057787#comment-14057787
 ] 

Jian He commented on YARN-2208:
---

Looks good overall.

1. Maybe change passwords to use a ConcurrentHashMap and use a read/write lock to 
guard nextMasterKey/currentMasterKey for better concurrency, as this is a chatty 
class.
2. Put on the same line (string quotes below restored; they appear stripped in the mail):
{code}
+ " ms and AMRMTokenKeyActivationDelay: " + this.activationDelay
+ " ms");
  }
else if (identifier.getKeyId() == this.currentMasterKey.getMasterKey()
{code}
3. Info level for easier debugging, while stabilizing this feature.
 {code}
if (LOG.isDebugEnabled()) {
  LOG.debug("Activating next master key with id: "
  + this.nextMasterKey.getMasterKey().getKeyId());
}
{code}
4. createAndGetAMRMToken, add info log here also.
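
A minimal sketch of the concurrency pattern suggested in points 1-3 (field and class names are assumed, not the actual AMRMTokenSecretManager code): passwords in a ConcurrentHashMap, the current/next master keys behind a read/write lock, and activation logged at INFO:
{code}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class MasterKeyRollover {
  private static final Log LOG = LogFactory.getLog(MasterKeyRollover.class);

  // Point 1: lock-free lookups on the hot password path.
  private final ConcurrentHashMap<String, byte[]> passwords =
      new ConcurrentHashMap<String, byte[]>();
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private int currentKeyId;
  private Integer nextKeyId; // null until a roll-over is pending

  // Readers (e.g. every AM heartbeat) only take the read lock.
  public int getCurrentKeyId() {
    lock.readLock().lock();
    try {
      return currentKeyId;
    } finally {
      lock.readLock().unlock();
    }
  }

  // Activation is rare, so the write lock here is cheap.
  public void activateNextMasterKey() {
    lock.writeLock().lock();
    try {
      if (nextKeyId != null) {
        // Point 3: INFO while the feature is being stabilized.
        LOG.info("Activating next master key with id: " + nextKeyId);
        currentKeyId = nextKeyId;
        nextKeyId = null;
      }
    } finally {
      lock.writeLock().unlock();
    }
  }
}
{code}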

 AMRMTokenManager need to have a way to roll over AMRMToken
 --

 Key: YARN-2208
 URL: https://issues.apache.org/jira/browse/YARN-2208
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-2208.1.patch, YARN-2208.2.patch, YARN-2208.3.patch, 
 YARN-2208.4.patch, YARN-2208.5.patch, YARN-2208.5.patch, YARN-2208.6.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2273) Flapping node caused NPE in FairScheduler

2014-07-10 Thread Andy Skelton (JIRA)
Andy Skelton created YARN-2273:
--

 Summary: Flapping node caused NPE in FairScheduler
 Key: YARN-2273
 URL: https://issues.apache.org/jira/browse/YARN-2273
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler, resourcemanager
Affects Versions: 2.3.0
 Environment: cdh5.0.2 wheezy
Reporter: Andy Skelton


One DN experienced memory errors and entered a cycle of rebooting and rejoining 
the cluster. After the second time the node went away, the RM produced this:
{code}
2014-07-09 21:47:36,571 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
Application attempt appattempt_1404858438119_4352_01 released container 
container_1404858438119_4352_01_04 on node: host: 
node-A16-R09-19.hadoop.dfw.wordpress.com:8041 #containers=0 
available=memory:8192, vCores:8 used=memory:0, vCores:0 with event: KILL
2014-07-09 21:47:36,571 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
Removed node node-A16-R09-19.hadoop.dfw.wordpress.com:8041 cluster capacity: 
memory:335872, vCores:328
2014-07-09 21:47:36,571 ERROR 
org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread 
Thread[ContinuousScheduling,5,main] threw an Exception.
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$NodeAvailableResourceComparator.compare(FairScheduler.java:1044)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$NodeAvailableResourceComparator.compare(FairScheduler.java:1040)
at java.util.TimSort.countRunAndMakeAscending(TimSort.java:329)
at java.util.TimSort.sort(TimSort.java:203)
at java.util.TimSort.sort(TimSort.java:173)
at java.util.Arrays.sort(Arrays.java:659)
at java.util.Collections.sort(Collections.java:217)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousScheduling(FairScheduler.java:1012)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.access$600(FairScheduler.java:124)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$2.run(FairScheduler.java:1306)
at java.lang.Thread.run(Thread.java:744)
{code}

A few minutes later YARN was crippled. The RM was running and jobs could be 
submitted but containers were not assigned and no progress was made. Restarting 
the RM resolved it.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2181) Add preemption info to RM Web UI and add logs when preemption occurs

2014-07-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14057816#comment-14057816
 ] 

Hadoop QA commented on YARN-2181:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12655043/YARN-2181.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 7 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4261//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4261//console

This message is automatically generated.

 Add preemption info to RM Web UI and add logs when preemption occurs
 

 Key: YARN-2181
 URL: https://issues.apache.org/jira/browse/YARN-2181
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, webapp
Affects Versions: 2.4.0
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, 
 YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, 
 YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, 
 YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, 
 YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, 
 YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, application page-1.png, 
 application page.png


 We need to add preemption info to the RM web page so that administrators/users 
 can better understand the preemption that happened on an app, etc. 
 And RM logs should have the following properties:
 * Logs are retrievable when an application is still running and often flushed.
 * Can distinguish between AM container preemption and task container 
 preemption with container ID shown.
 * Should be INFO level log.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2181) Add preemption info to RM Web UI and add logs when preemption occurs

2014-07-10 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14057840#comment-14057840
 ] 

Jian He commented on YARN-2181:
---

committing this

 Add preemption info to RM Web UI and add logs when preemption occurs
 

 Key: YARN-2181
 URL: https://issues.apache.org/jira/browse/YARN-2181
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, webapp
Affects Versions: 2.4.0
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, 
 YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, 
 YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, 
 YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, 
 YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, 
 YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, application page-1.png, 
 application page.png


 We need to add preemption info to the RM web page so that administrators/users 
 can better understand the preemption that happened on an app, etc. 
 And RM logs should have the following properties:
 * Logs are retrievable when an application is still running and often flushed.
 * Can distinguish between AM container preemption and task container 
 preemption with container ID shown.
 * Should be INFO level log.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2273) NPE in ContinuousScheduling Thread crippled RM after DN flap

2014-07-10 Thread Andy Skelton (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Skelton updated YARN-2273:
---

Description: 
One DN experienced memory errors and entered a cycle of rebooting and rejoining 
the cluster. After the second time the node went away, the RM produced this:
{code}
2014-07-09 21:47:36,571 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
Application attempt appattempt_1404858438119_4352_01 released container 
container_1404858438119_4352_01_04 on node: host: 
node-A16-R09-19.hadoop.dfw.wordpress.com:8041 #containers=0 
available=memory:8192, vCores:8 used=memory:0, vCores:0 with event: KILL
2014-07-09 21:47:36,571 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
Removed node node-A16-R09-19.hadoop.dfw.wordpress.com:8041 cluster capacity: 
memory:335872, vCores:328
2014-07-09 21:47:36,571 ERROR 
org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread 
Thread[ContinuousScheduling,5,main] threw an Exception.
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$NodeAvailableResourceComparator.compare(FairScheduler.java:1044)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$NodeAvailableResourceComparator.compare(FairScheduler.java:1040)
at java.util.TimSort.countRunAndMakeAscending(TimSort.java:329)
at java.util.TimSort.sort(TimSort.java:203)
at java.util.TimSort.sort(TimSort.java:173)
at java.util.Arrays.sort(Arrays.java:659)
at java.util.Collections.sort(Collections.java:217)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousScheduling(FairScheduler.java:1012)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.access$600(FairScheduler.java:124)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$2.run(FairScheduler.java:1306)
at java.lang.Thread.run(Thread.java:744)
{code}

A few cycles later YARN was crippled. The RM was running and jobs could be 
submitted but containers were not assigned and no progress was made. Restarting 
the RM resolved it.

  was:
One DN experienced memory errors and entered a cycle of rebooting and rejoining 
the cluster. After the second time the node went away, the RM produced this:
{code}
2014-07-09 21:47:36,571 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
Application attempt appattempt_1404858438119_4352_01 released container 
container_1404858438119_4352_01_04 on node: host: 
node-A16-R09-19.hadoop.dfw.wordpress.com:8041 #containers=0 
available=memory:8192, vCores:8 used=memory:0, vCores:0 with event: KILL
2014-07-09 21:47:36,571 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
Removed node node-A16-R09-19.hadoop.dfw.wordpress.com:8041 cluster capacity: 
memory:335872, vCores:328
2014-07-09 21:47:36,571 ERROR 
org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread 
Thread[ContinuousScheduling,5,main] threw an Exception.
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$NodeAvailableResourceComparator.compare(FairScheduler.java:1044)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$NodeAvailableResourceComparator.compare(FairScheduler.java:1040)
at java.util.TimSort.countRunAndMakeAscending(TimSort.java:329)
at java.util.TimSort.sort(TimSort.java:203)
at java.util.TimSort.sort(TimSort.java:173)
at java.util.Arrays.sort(Arrays.java:659)
at java.util.Collections.sort(Collections.java:217)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousScheduling(FairScheduler.java:1012)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.access$600(FairScheduler.java:124)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$2.run(FairScheduler.java:1306)
at java.lang.Thread.run(Thread.java:744)
{code}

A few minutes later YARN was crippled. The RM was running and jobs could be 
submitted but containers were not assigned and no progress was made. Restarting 
the RM resolved it.

Summary: NPE in ContinuousScheduling Thread crippled RM after DN flap  
(was: Flapping node caused NPE in FairScheduler)

 NPE in ContinuousScheduling Thread crippled RM after DN flap
 

 Key: YARN-2273
 URL: https://issues.apache.org/jira/browse/YARN-2273
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler, resourcemanager
Affects Versions: 2.3.0
 Environment: cdh5.0.2 

[jira] [Commented] (YARN-2131) Add a way to format the RMStateStore

2014-07-10 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14057893#comment-14057893
 ] 

Robert Kanter commented on YARN-2131:
-

Makes sense to me.  I'll do an addendum patch to rename the command and use 
multi operation.

 Add a way to format the RMStateStore
 

 Key: YARN-2131
 URL: https://issues.apache.org/jira/browse/YARN-2131
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Karthik Kambatla
Assignee: Robert Kanter
 Fix For: 2.6.0

 Attachments: YARN-2131.patch, YARN-2131.patch


 There are cases when we don't want to recover past applications, but recover 
 applications going forward. To do this, one has to clear the store. Today, 
 there is no easy way to do this and users should understand how each store 
 works.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2181) Add preemption info to RM Web UI and add logs when preemption occurs

2014-07-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14057890#comment-14057890
 ] 

Hudson commented on YARN-2181:
--

FAILURE: Integrated in Hadoop-trunk-Commit #5861 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5861/])
YARN-2181. Added preemption info to logs and RM web UI. Contributed by Wangda 
Tan (jianhe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1609561)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMApp.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppMetrics.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttempt.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptMetrics.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainerImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/AppBlock.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/AppInfo.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockAM.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/MockAsm.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/MockRMApp.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerTestBase.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesApps.java


 Add preemption info to RM Web UI and add logs when preemption occurs
 

 Key: YARN-2181
 URL: https://issues.apache.org/jira/browse/YARN-2181
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, webapp
Affects Versions: 2.4.0
Reporter: Wangda Tan
Assignee: Wangda Tan
 Fix For: 2.5.0

 Attachments: YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, 
 YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, 
 YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, 
 YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, 
 YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, 
 YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, application page-1.png, 
 application page.png


 We need to add preemption info to the RM web page so that administrators/users 
 can better understand the preemption that happened on an app, etc. 
 And RM logs should have the following properties:
 * Logs are retrievable when an application is still running and often flushed.
 * Can distinguish between AM container 

[jira] [Commented] (YARN-2273) NPE in ContinuousScheduling Thread crippled RM after DN flap

2014-07-10 Thread Wei Yan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14057930#comment-14057930
 ] 

Wei Yan commented on YARN-2273:
---

Thanks for the catch, [~skeltoac].

A quick guess is that the NodeAvailableResourceComparator doesn't check whether 
the node is still alive when doing the comparison. A node may be removed during 
the sorting process. I'll re-check it.
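
A minimal sketch of that guess (assumed shape, not the actual fix): a comparator that tolerates a node disappearing from the scheduler's map mid-sort instead of dereferencing a null lookup and killing the thread with an NPE:
{code}
import java.util.Comparator;
import java.util.Map;

public class NullSafeNodeComparator implements Comparator<String> {
  // nodeId -> available memory; entries may vanish while the sort is running.
  private final Map<String, Long> availableByNodeId;

  public NullSafeNodeComparator(Map<String, Long> availableByNodeId) {
    this.availableByNodeId = availableByNodeId;
  }

  @Override
  public int compare(String n1, String n2) {
    // A node that disappeared mid-sort simply sorts last instead of throwing.
    Long a1 = availableByNodeId.get(n1);
    Long a2 = availableByNodeId.get(n2);
    long v1 = (a1 == null) ? -1L : a1;
    long v2 = (a2 == null) ? -1L : a2;
    return Long.compare(v2, v1); // descending by available resource
  }
}
{code}
Taking a snapshot of the node values before sorting would also keep the comparator consistent for the duration of the sort.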

 NPE in ContinuousScheduling Thread crippled RM after DN flap
 

 Key: YARN-2273
 URL: https://issues.apache.org/jira/browse/YARN-2273
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler, resourcemanager
Affects Versions: 2.3.0
 Environment: cdh5.0.2 wheezy
Reporter: Andy Skelton

 One DN experienced memory errors and entered a cycle of rebooting and 
 rejoining the cluster. After the second time the node went away, the RM 
 produced this:
 {code}
 2014-07-09 21:47:36,571 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
 Application attempt appattempt_1404858438119_4352_01 released container 
 container_1404858438119_4352_01_04 on node: host: 
 node-A16-R09-19.hadoop.dfw.wordpress.com:8041 #containers=0 
 available=memory:8192, vCores:8 used=memory:0, vCores:0 with event: KILL
 2014-07-09 21:47:36,571 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
 Removed node node-A16-R09-19.hadoop.dfw.wordpress.com:8041 cluster capacity: 
 memory:335872, vCores:328
 2014-07-09 21:47:36,571 ERROR 
 org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread 
 Thread[ContinuousScheduling,5,main] threw an Exception.
 java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$NodeAvailableResourceComparator.compare(FairScheduler.java:1044)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$NodeAvailableResourceComparator.compare(FairScheduler.java:1040)
   at java.util.TimSort.countRunAndMakeAscending(TimSort.java:329)
   at java.util.TimSort.sort(TimSort.java:203)
   at java.util.TimSort.sort(TimSort.java:173)
   at java.util.Arrays.sort(Arrays.java:659)
   at java.util.Collections.sort(Collections.java:217)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousScheduling(FairScheduler.java:1012)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.access$600(FairScheduler.java:124)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$2.run(FairScheduler.java:1306)
   at java.lang.Thread.run(Thread.java:744)
 {code}
 A few cycles later YARN was crippled. The RM was running and jobs could be 
 submitted but containers were not assigned and no progress was made. 
 Restarting the RM resolved it.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2274) FairScheduler: Add debug information about cluster capacity, availability and reservations

2014-07-10 Thread Karthik Kambatla (JIRA)
Karthik Kambatla created YARN-2274:
--

 Summary: FairScheduler: Add debug information about cluster 
capacity, availability and reservations
 Key: YARN-2274
 URL: https://issues.apache.org/jira/browse/YARN-2274
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.4.1
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla


FairScheduler logs have little information on cluster capacity and 
availability. Need this information to debug production issues. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2131) Add a way to format the RMStateStore

2014-07-10 Thread Robert Kanter (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kanter updated YARN-2131:


Attachment: YARN-2131_addendum.patch

The addendum patch renames the command.  

However, I was looking into making the ZK change, and I'm not sure it makes 
sense to do that.  To build up the list of delete Ops, we need to get all of 
the children, and there's no get _all_ children call; so we have to 
recursively do this ourselves.  And we can't use another list of Ops for this 
because it's a discovery operation.  That is, if the structure looks like 
this:
{noformat}
- A
  | - B
      | - C
{noformat}
given that we start off only knowing A, we can't know that C exists until we 
know that B exists; and these each require a call to ZK.  
Because we already have to recursively call ZK to discover the nodes to delete, 
we may as well delete them at the same time, right?
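
For illustration, the discover-and-delete recursion described above, written against the raw ZooKeeper client API only (error handling, retries and the actual RMStateStore paths are omitted):
{code}
import java.util.List;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;

public class ZkRecursiveDelete {
  // Children are only discoverable one level at a time, so each level costs a
  // round trip to ZK; deleting as we unwind avoids a second traversal.
  public static void deleteRecursively(ZooKeeper zk, String path)
      throws KeeperException, InterruptedException {
    List<String> children = zk.getChildren(path, false);
    for (String child : children) {
      deleteRecursively(zk, path + "/" + child);
    }
    zk.delete(path, -1); // -1 = delete regardless of version
  }
}
{code}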


Also, I agree with Karthik's earlier comment that it would be good to 
eventually replace all of the ZooKeeper code with Curator code.  It handles 
most if not all of the connection stuff, provides useful convenience methods, 
and implements a lot of useful recipes (e.g. leader latch, locks, etc).  We've 
been using Curator extensively for Oozie HA.

 Add a way to format the RMStateStore
 

 Key: YARN-2131
 URL: https://issues.apache.org/jira/browse/YARN-2131
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Karthik Kambatla
Assignee: Robert Kanter
 Fix For: 2.6.0

 Attachments: YARN-2131.patch, YARN-2131.patch, 
 YARN-2131_addendum.patch


 There are cases when we don't want to recover past applications, but recover 
 applications going forward. To do this, one has to clear the store. Today, 
 there is no easy way to do this and users should understand how each store 
 works.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2274) FairScheduler: Add debug information about cluster capacity, availability and reservations

2014-07-10 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2274:
---

Priority: Trivial  (was: Major)

 FairScheduler: Add debug information about cluster capacity, availability and 
 reservations
 --

 Key: YARN-2274
 URL: https://issues.apache.org/jira/browse/YARN-2274
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.4.1
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Trivial

 FairScheduler logs have little information on cluster capacity and 
 availability. Need this information to debug production issues. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2274) FairScheduler: Add debug information about cluster capacity, availability and reservations

2014-07-10 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2274:
---

Issue Type: Improvement  (was: Bug)

 FairScheduler: Add debug information about cluster capacity, availability and 
 reservations
 --

 Key: YARN-2274
 URL: https://issues.apache.org/jira/browse/YARN-2274
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Affects Versions: 2.4.1
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Trivial

 FairScheduler logs have little information on cluster capacity and 
 availability. Need this information to debug production issues. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2274) FairScheduler: Add debug information about cluster capacity, availability and reservations

2014-07-10 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2274:
---

Attachment: yarn-2274-1.patch

Reviewers - please feel free to suggest logging any other basic information. 

 FairScheduler: Add debug information about cluster capacity, availability and 
 reservations
 --

 Key: YARN-2274
 URL: https://issues.apache.org/jira/browse/YARN-2274
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Affects Versions: 2.4.1
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Trivial
 Attachments: yarn-2274-1.patch


 FairScheduler logs have little information on cluster capacity and 
 availability. Need this information to debug production issues. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2026) Fair scheduler : Fair share for inactive queues causes unfair allocation in some scenarios

2014-07-10 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14058009#comment-14058009
 ] 

Sandy Ryza commented on YARN-2026:
--

I think Ashwin makes a good point.

I think displaying both is reasonable if we present it in a careful way.  For 
example, it might make sense to add tooltips that explain the difference.

 Fair scheduler : Fair share for inactive queues causes unfair allocation in 
 some scenarios
 --

 Key: YARN-2026
 URL: https://issues.apache.org/jira/browse/YARN-2026
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Reporter: Ashwin Shankar
Assignee: Ashwin Shankar
  Labels: scheduler
 Attachments: YARN-2026-v1.txt, YARN-2026-v2.txt


 Problem 1 - While using hierarchical queues in the fair scheduler, there are a few 
 scenarios where we have seen a leaf queue with the least fair share take the 
 majority of the cluster and starve a sibling parent queue which has a greater 
 weight/fair share, and preemption doesn't kick in to reclaim resources.
 The root cause seems to be that the fair share of a parent queue is distributed 
 to all its children irrespective of whether a child is an active or an inactive (no 
 apps running) queue. Preemption based on fair share kicks in only if the 
 usage of a queue is less than 50% of its fair share and if it has demands 
 greater than that. When there are many queues under a parent queue (with a high 
 fair share), each child queue's fair share becomes really low. As a result, when 
 only a few of these child queues have apps running, they reach their *tiny* fair 
 share quickly and preemption doesn't happen even if other leaf 
 queues (non-sibling) are hogging the cluster.
 This can be solved by dividing the fair share of a parent queue only among its 
 active child queues.
 Here is an example describing the problem and proposed solution:
 root.lowPriorityQueue is a leaf queue with weight 2
 root.HighPriorityQueue is parent queue with weight 8
 root.HighPriorityQueue has 10 child leaf queues : 
 root.HighPriorityQueue.childQ(1..10)
 The above config results in root.HighPriorityQueue having an 80% fair share, 
 and each of its ten child queues would have an 8% fair share. Preemption would 
 happen only if a child queue's usage is below 4% (0.5*8=4). 
 Let's say at the moment no apps are running in any of the 
 root.HighPriorityQueue.childQ(1..10) queues and a few apps are running in 
 root.lowPriorityQueue, which is taking up 95% of the cluster.
 Up till this point, the behavior of FS is correct.
 Now, let's say root.HighPriorityQueue.childQ1 got a big job which requires 30% 
 of the cluster. It would get only the available 5% of the cluster, and 
 preemption wouldn't kick in since it is above 4% (half its fair share). This is bad 
 considering childQ1 is under a high-priority parent queue which has an *80% fair 
 share*.
 Until root.lowPriorityQueue starts relinquishing containers, we would see the 
 following allocation on the scheduler page:
 *root.lowPriorityQueue = 95%*
 *root.HighPriorityQueue.childQ1=5%*
 This can be solved by distributing a parent’s fair share only to active 
 queues.
 So in the example above, since childQ1 is the only active queue 
 under root.HighPriorityQueue, it would get all of its parent's fair share, i.e. 
 80%.
 This would cause preemption to reclaim the 30% needed by childQ1 from 
 root.lowPriorityQueue after fairSharePreemptionTimeout seconds.
 Problem 2 - Also note that a similar situation can happen between 
 root.HighPriorityQueue.childQ1 and root.HighPriorityQueue.childQ2 if childQ2 
 hogs the cluster. childQ2 can take up 95% of the cluster and childQ1 would be stuck 
 at 5% until childQ2 starts relinquishing containers. We would like each of 
 childQ1 and childQ2 to get half of root.HighPriorityQueue's fair share, i.e. 
 40%, which would ensure childQ1 gets up to 40% of the resources if needed through 
 preemption.
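
A minimal sketch of the proposed fix (illustrative only, not the attached patch): distribute a parent's share by weight across active children only. With the numbers above (parent share 80%, ten equal-weight children, only childQ1 active), childQ1's fair share becomes 80% instead of 8%, so the 50% preemption threshold becomes 40% rather than 4%.
{code}
import java.util.List;

public class ActiveFairShare {
  public static class Queue {
    final double weight;
    final boolean active; // has at least one runnable app
    double fairShare;
    Queue(double weight, boolean active) { this.weight = weight; this.active = active; }
  }

  // Split 'parentShare' by weight over active children only; inactive queues get 0.
  public static void distribute(double parentShare, List<Queue> children) {
    double activeWeight = 0;
    for (Queue q : children) {
      if (q.active) {
        activeWeight += q.weight;
      }
    }
    for (Queue q : children) {
      q.fairShare = (q.active && activeWeight > 0)
          ? parentShare * q.weight / activeWeight
          : 0;
    }
  }
}
{code}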



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2204) TestAMRestart#testAMRestartWithExistingContainers assumes CapacityScheduler

2014-07-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14058050#comment-14058050
 ] 

Hudson commented on YARN-2204:
--

FAILURE: Integrated in Hadoop-trunk-Commit #5862 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5862/])
YARN-2224. Fix CHANGES.txt. This was committed as YARN-2204 before. (kasha: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1609582)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt


 TestAMRestart#testAMRestartWithExistingContainers assumes CapacityScheduler
 ---

 Key: YARN-2204
 URL: https://issues.apache.org/jira/browse/YARN-2204
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.5.0
Reporter: Robert Kanter
Assignee: Robert Kanter
Priority: Trivial
 Fix For: 2.5.0

 Attachments: YARN-2204.patch, YARN-2204_addendum.patch, 
 YARN-2204_addendum.patch


 TestAMRestart#testAMRestartWithExistingContainers assumes CapacityScheduler



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2224) Explicitly enable vmem check in TestContainersMonitor#testContainerKillOnMemoryOverflow

2014-07-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14058049#comment-14058049
 ] 

Hudson commented on YARN-2224:
--

FAILURE: Integrated in Hadoop-trunk-Commit #5862 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5862/])
YARN-2224. Fix CHANGES.txt. This was committed as YARN-2204 before. (kasha: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1609582)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt


 Explicitly enable vmem check in 
 TestContainersMonitor#testContainerKillOnMemoryOverflow
 ---

 Key: YARN-2224
 URL: https://issues.apache.org/jira/browse/YARN-2224
 Project: Hadoop YARN
  Issue Type: Test
  Components: nodemanager
Affects Versions: 2.4.1
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
Priority: Trivial
  Labels: newbie
 Fix For: 2.5.0

 Attachments: YARN-2224.patch


 If the default setting DEFAULT_NM_VMEM_CHECK_ENABLED is set to false, the test 
 will fail. Make the test not rely on the default settings; just let it verify 
 that once the setting is turned on it actually does the memory check. 
 See YARN-2225, which suggests we turn the default off.
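
A minimal sketch of what "explicitly enable" means here (test plumbing only, not the attached patch): the test builds its own configuration and sets the flag itself, so a changed default cannot break it:
{code}
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class VmemCheckTestConfig {
  // Explicitly turn the vmem check on instead of relying on
  // DEFAULT_NM_VMEM_CHECK_ENABLED.
  public static YarnConfiguration conf() {
    YarnConfiguration conf = new YarnConfiguration();
    conf.setBoolean(YarnConfiguration.NM_VMEM_CHECK_ENABLED, true);
    return conf;
  }
}
{code}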



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2088) Fix code bug in GetApplicationsRequestPBImpl#mergeLocalToBuilder

2014-07-10 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14058056#comment-14058056
 ] 

Jian He commented on YARN-2088:
---

patch looks good, committing

 Fix code bug in GetApplicationsRequestPBImpl#mergeLocalToBuilder
 

 Key: YARN-2088
 URL: https://issues.apache.org/jira/browse/YARN-2088
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Binglin Chang
Assignee: Binglin Chang
 Attachments: YARN-2088.v1.patch


 Some fields (set, list) are added to proto builders multiple times; we need to 
 clear those fields before adding, otherwise the resulting proto contains extra 
 contents.
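
A minimal sketch of the bug pattern (a plain list stands in for a protobuf repeated field; the real GetApplicationsRequestPBImpl code is not reproduced): if the merge can run more than once, the builder field must be cleared before the local values are re-added, otherwise every merge appends another copy:
{code}
import java.util.ArrayList;
import java.util.List;

public class MergeLocalToBuilderSketch {
  private final List<String> builderField = new ArrayList<String>(); // proto "repeated" stand-in
  private List<String> localField;                                   // the PBImpl's local cache

  void mergeLocalToBuilder() {
    if (localField == null) {
      return;
    }
    builderField.clear();          // the missing step: without it, values accumulate
    builderField.addAll(localField);
  }
}
{code}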



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2274) FairScheduler: Add debug information about cluster capacity, availability and reservations

2014-07-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14058066#comment-14058066
 ] 

Hadoop QA commented on YARN-2274:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12655089/yarn-2274-1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4262//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4262//console

This message is automatically generated.

 FairScheduler: Add debug information about cluster capacity, availability and 
 reservations
 --

 Key: YARN-2274
 URL: https://issues.apache.org/jira/browse/YARN-2274
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Affects Versions: 2.4.1
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Trivial
 Attachments: yarn-2274-1.patch


 FairScheduler logs have little information on cluster capacity and 
 availability. Need this information to debug production issues. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2208) AMRMTokenManager need to have a way to roll over AMRMToken

2014-07-10 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-2208:


Attachment: YARN-2208.7.patch

 AMRMTokenManager need to have a way to roll over AMRMToken
 --

 Key: YARN-2208
 URL: https://issues.apache.org/jira/browse/YARN-2208
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-2208.1.patch, YARN-2208.2.patch, YARN-2208.3.patch, 
 YARN-2208.4.patch, YARN-2208.5.patch, YARN-2208.5.patch, YARN-2208.6.patch, 
 YARN-2208.7.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2088) Fix code bug in GetApplicationsRequestPBImpl#mergeLocalToBuilder

2014-07-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14058072#comment-14058072
 ] 

Hudson commented on YARN-2088:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5863 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5863/])
YARN-2088. Fixed a bug in GetApplicationsRequestPBImpl#mergeLocalToBuilder. 
Contributed by Binglin Chang (jianhe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1609584)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/GetApplicationsRequestPBImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/TestGetApplicationsRequest.java


 Fix code bug in GetApplicationsRequestPBImpl#mergeLocalToBuilder
 

 Key: YARN-2088
 URL: https://issues.apache.org/jira/browse/YARN-2088
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Binglin Chang
Assignee: Binglin Chang
 Fix For: 2.5.0

 Attachments: YARN-2088.v1.patch


 Some fields (set, list) are added to proto builders multiple times; we need to 
 clear those fields before adding, otherwise the resulting proto contains extra 
 contents.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2131) Add a way to format the RMStateStore

2014-07-10 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14058068#comment-14058068
 ] 

Jian He commented on YARN-2131:
---

bq. there's no get all children call;
I see. 

I haven't worked with Curator, but it makes sense to use it if it makes the code 
cleaner and simpler.

 Add a way to format the RMStateStore
 

 Key: YARN-2131
 URL: https://issues.apache.org/jira/browse/YARN-2131
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Karthik Kambatla
Assignee: Robert Kanter
 Fix For: 2.6.0

 Attachments: YARN-2131.patch, YARN-2131.patch, 
 YARN-2131_addendum.patch


 There are cases when we don't want to recover past applications, but recover 
 applications going forward. To do this, one has to clear the store. Today, 
 there is no easy way to do this and users should understand how each store 
 works.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2208) AMRMTokenManager need to have a way to roll over AMRMToken

2014-07-10 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14058069#comment-14058069
 ] 

Xuan Gong commented on YARN-2208:
-

bq. 1. Maybe change passwords to use concurrentHashMap and use read/write lock 
guard nextMasterKey/currentMasterKey for better concurrency as this is a chatty 
class.

DONE

bq. 2. Put In the same line

DONE

bq. 3. Info level for easier debugging, while stabilizing this feature.

DONE

bq. 4. createAndGetAMRMToken, add info log here also.

ADDED

 AMRMTokenManager need to have a way to roll over AMRMToken
 --

 Key: YARN-2208
 URL: https://issues.apache.org/jira/browse/YARN-2208
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-2208.1.patch, YARN-2208.2.patch, YARN-2208.3.patch, 
 YARN-2208.4.patch, YARN-2208.5.patch, YARN-2208.5.patch, YARN-2208.6.patch, 
 YARN-2208.7.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2244) FairScheduler missing handling of containers for unknown application attempts

2014-07-10 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-2244:


Attachment: YARN-2244.002.patch

The build seemed to fail on HDFS native generation, unrelated to this patch. 
Uploading the same patch again to retrigger the build.

 FairScheduler missing handling of containers for unknown application attempts 
 --

 Key: YARN-2244
 URL: https://issues.apache.org/jira/browse/YARN-2244
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
Priority: Critical
 Attachments: YARN-2224.patch, YARN-2244.001.patch, YARN-2244.002.patch


 We are missing the changes from patch MAPREDUCE-3596 in the FairScheduler. Among other 
 fixes that were common across schedulers, there were some scheduler-specific 
 fixes added to handle containers for unknown application attempts. Without 
 these, the fair scheduler simply logs that an unknown container was found and 
 continues to let it run. 
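
For illustration only (the eventual patch's exact behavior is not described here), the safer reaction implied by the description is to kill such a container rather than log and let it run. A hypothetical sketch with made-up names:
{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class UnknownAttemptHandler {
  private static final Log LOG = LogFactory.getLog(UnknownAttemptHandler.class);

  // Stands in for whatever kill path the scheduler would use.
  public interface ContainerKiller {
    void kill(String containerId);
  }

  public static void handle(String attemptId, String containerId,
                            boolean attemptKnown, ContainerKiller killer) {
    if (!attemptKnown) {
      LOG.info("Unknown application attempt " + attemptId
          + ", killing container " + containerId);
      killer.kill(containerId); // instead of silently letting it keep running
    }
  }
}
{code}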



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2274) FairScheduler: Add debug information about cluster capacity, availability and reservations

2014-07-10 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14058085#comment-14058085
 ] 

Sandy Ryza commented on YARN-2274:
--

Demanded resources could also be a useful statistic to report.  The update 
thread typically runs twice every second, so it might make sense to log only every 
5th update or something to avoid a flood of messages.
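
A minimal sketch of the throttling idea (names are made up, not the attached patch): count update passes and emit the capacity/availability/demand line only every 5th pass:
{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class ThrottledClusterLog {
  private static final Log LOG = LogFactory.getLog(ThrottledClusterLog.class);
  private static final int LOG_EVERY_N_UPDATES = 5;
  private long updateCount = 0;

  // Called once per scheduler update pass (roughly twice a second).
  public void onUpdate(String capacity, String available, String demand) {
    if (updateCount++ % LOG_EVERY_N_UPDATES == 0 && LOG.isDebugEnabled()) {
      LOG.debug("Cluster capacity=" + capacity + ", available=" + available
          + ", demand=" + demand);
    }
  }
}
{code}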

 FairScheduler: Add debug information about cluster capacity, availability and 
 reservations
 --

 Key: YARN-2274
 URL: https://issues.apache.org/jira/browse/YARN-2274
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Affects Versions: 2.4.1
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Trivial
 Attachments: yarn-2274-1.patch


 FairScheduler logs have little information on cluster capacity and 
 availability. Need this information to debug production issues. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2181) Add preemption info to RM Web UI and add logs when preemption occurs

2014-07-10 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14058086#comment-14058086
 ] 

Wangda Tan commented on YARN-2181:
--

Thanks Jian and Vinod for review and commit!

 Add preemption info to RM Web UI and add logs when preemption occurs
 

 Key: YARN-2181
 URL: https://issues.apache.org/jira/browse/YARN-2181
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, webapp
Affects Versions: 2.4.0
Reporter: Wangda Tan
Assignee: Wangda Tan
 Fix For: 2.5.0

 Attachments: YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, 
 YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, 
 YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, 
 YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, 
 YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, 
 YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, application page-1.png, 
 application page.png


 We need to add preemption info to the RM web page so that administrators/users 
 can better understand the preemption that happened on an app, etc. 
 And RM logs should have the following properties:
 * Logs are retrievable when an application is still running and often flushed.
 * Can distinguish between AM container preemption and task container 
 preemption with container ID shown.
 * Should be INFO level log.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2269) External links need to be removed from YARN UI

2014-07-10 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-2269:


Assignee: Craig Welch

 External links need to be removed from YARN UI
 --

 Key: YARN-2269
 URL: https://issues.apache.org/jira/browse/YARN-2269
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Yesha Vora
Assignee: Craig Welch
  Labels: security
 Attachments: YARN-2269.0.patch


 Accessing an external link from the YARN UI can disclose the delegation parameter 
 to a 3rd party in a secure cluster. Thus, all external links must be removed from 
 the YARN Web UI.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2269) External links need to be removed from YARN UI

2014-07-10 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14058114#comment-14058114
 ] 

Xuan Gong commented on YARN-2269:
-

+1 LGTM. 
Committed to trunk and branch-2. Thanks, Craig!

 External links need to be removed from YARN UI
 --

 Key: YARN-2269
 URL: https://issues.apache.org/jira/browse/YARN-2269
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Yesha Vora
Assignee: Craig Welch
  Labels: security
 Fix For: 2.5.0

 Attachments: YARN-2269.0.patch


 Accessing external links from the YARN UI can disclose the delegation parameter to a 3rd 
 party in a secure cluster. Thus, all external links must be removed from the YARN 
 web UI.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2269) External links need to be removed from YARN UI

2014-07-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14058120#comment-14058120
 ] 

Hudson commented on YARN-2269:
--

FAILURE: Integrated in Hadoop-trunk-Commit #5865 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5865/])
YARN-2269. Remove external links from YARN UI. Contributed by Craig Welch 
(xgong: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1609590)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/view/FooterBlock.java


 External links need to be removed from YARN UI
 --

 Key: YARN-2269
 URL: https://issues.apache.org/jira/browse/YARN-2269
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Yesha Vora
Assignee: Craig Welch
  Labels: security
 Fix For: 2.5.0

 Attachments: YARN-2269.0.patch


 Accessing external links from the YARN UI can disclose the delegation parameter to a 3rd 
 party in a secure cluster. Thus, all external links must be removed from the YARN 
 web UI.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2208) AMRMTokenManager need to have a way to roll over AMRMToken

2014-07-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14058123#comment-14058123
 ] 

Hadoop QA commented on YARN-2208:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12655103/YARN-2208.7.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.yarn.util.TestFSDownload

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4263//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4263//console

This message is automatically generated.

 AMRMTokenManager need to have a way to roll over AMRMToken
 --

 Key: YARN-2208
 URL: https://issues.apache.org/jira/browse/YARN-2208
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-2208.1.patch, YARN-2208.2.patch, YARN-2208.3.patch, 
 YARN-2208.4.patch, YARN-2208.5.patch, YARN-2208.5.patch, YARN-2208.6.patch, 
 YARN-2208.7.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1366) AM should implement Resync with the ApplicationMasterService instead of shutting down

2014-07-10 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-1366:
--

Target Version/s: 2.6.0  (was: 2.5.0)

 AM should implement Resync with the ApplicationMasterService instead of 
 shutting down
 -

 Key: YARN-1366
 URL: https://issues.apache.org/jira/browse/YARN-1366
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Rohith
 Fix For: 2.5.0

 Attachments: YARN-1366.1.patch, YARN-1366.10.patch, 
 YARN-1366.11.patch, YARN-1366.12.patch, YARN-1366.13.patch, 
 YARN-1366.2.patch, YARN-1366.3.patch, YARN-1366.4.patch, YARN-1366.5.patch, 
 YARN-1366.6.patch, YARN-1366.7.patch, YARN-1366.8.patch, YARN-1366.9.patch, 
 YARN-1366.patch, YARN-1366.prototype.patch, YARN-1366.prototype.patch


 The ApplicationMasterService currently sends a resync response to which the 
 AM responds by shutting down. The AM behavior is expected to change to 
 resyncing with the RM. Resync means resetting the allocate RPC 
 sequence number to 0 and the AM should send its entire outstanding request to 
 the RM. Note that if the AM is making its first allocate call to the RM then 
 things should proceed like normal without needing a resync. The RM will 
 return all containers that have completed since the RM last synced with the 
 AM. Some container completions may be reported more than once.
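(A minimal sketch of the resync handling described above; the class and field 
names are hypothetical placeholders, not the actual AMRMClient internals.)

{code}
import java.util.ArrayList;
import java.util.List;

// Hypothetical AM-side handling: on a resync response, reset the allocate
// sequence number to 0 and replay every outstanding resource request.
public class ResyncSketch {
  private int responseId = 0;                               // allocate RPC sequence number
  private final List<String> outstandingAsks = new ArrayList<String>(); // stand-in for ResourceRequests

  void onAllocateResponse(boolean resyncRequested) {
    if (resyncRequested) {
      responseId = 0;                                       // restart the sequence at 0
      resendOutstandingRequests();                          // the RM lost our state; send it all again
    } else {
      responseId++;
    }
  }

  private void resendOutstandingRequests() {
    for (String ask : outstandingAsks) {
      // A real AM would re-add each pending ResourceRequest to its next allocate call.
      System.out.println("re-sending ask: " + ask);
    }
  }
}
{code}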



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2088) Fix code bug in GetApplicationsRequestPBImpl#mergeLocalToBuilder

2014-07-10 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-2088:
--

Fix Version/s: (was: 2.5.0)
   2.6.0

 Fix code bug in GetApplicationsRequestPBImpl#mergeLocalToBuilder
 

 Key: YARN-2088
 URL: https://issues.apache.org/jira/browse/YARN-2088
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Binglin Chang
Assignee: Binglin Chang
 Fix For: 2.6.0

 Attachments: YARN-2088.v1.patch


 Some fields (set, list) are added to proto builders multiple times; we need to 
 clear those fields before adding, otherwise the resulting proto contains extra 
 contents.
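(A minimal, self-contained sketch of the clear-before-add pattern the description 
refers to; the FakeRequestBuilder below stands in for a generated protobuf builder 
and is not the actual GetApplicationsRequestProto code.)

{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Stand-in for a protobuf builder with a repeated field; clearX()/addAllX()
// mirror the generated-builder methods the fix relies on.
class FakeRequestBuilder {
  private final List<String> applicationTypes = new ArrayList<String>();
  FakeRequestBuilder clearApplicationTypes() { applicationTypes.clear(); return this; }
  FakeRequestBuilder addAllApplicationTypes(Iterable<String> types) {
    for (String t : types) { applicationTypes.add(t); }
    return this;
  }
  List<String> getApplicationTypes() { return applicationTypes; }
}

public class MergeSketch {
  // Clear before add, so merging twice does not duplicate the repeated field.
  static void mergeLocalToBuilder(FakeRequestBuilder builder, Set<String> localTypes) {
    if (localTypes != null) {
      builder.clearApplicationTypes();
      builder.addAllApplicationTypes(localTypes);
    }
  }

  public static void main(String[] args) {
    FakeRequestBuilder b = new FakeRequestBuilder();
    Set<String> types = new LinkedHashSet<String>(Arrays.asList("MAPREDUCE", "SPARK"));
    mergeLocalToBuilder(b, types);
    mergeLocalToBuilder(b, types); // without clear(), this would double the list
    System.out.println(b.getApplicationTypes()); // [MAPREDUCE, SPARK]
  }
}
{code}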



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2208) AMRMTokenManager need to have a way to roll over AMRMToken

2014-07-10 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14058124#comment-14058124
 ] 

Xuan Gong commented on YARN-2208:
-

The test case failure, org.apache.hadoop.yarn.util.TestFSDownload, is unrelated.

 AMRMTokenManager need to have a way to roll over AMRMToken
 --

 Key: YARN-2208
 URL: https://issues.apache.org/jira/browse/YARN-2208
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-2208.1.patch, YARN-2208.2.patch, YARN-2208.3.patch, 
 YARN-2208.4.patch, YARN-2208.5.patch, YARN-2208.5.patch, YARN-2208.6.patch, 
 YARN-2208.7.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2088) Fix code bug in GetApplicationsRequestPBImpl#mergeLocalToBuilder

2014-07-10 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-2088:
--

Target Version/s: 2.6.0  (was: 2.5.0)

 Fix code bug in GetApplicationsRequestPBImpl#mergeLocalToBuilder
 

 Key: YARN-2088
 URL: https://issues.apache.org/jira/browse/YARN-2088
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Binglin Chang
Assignee: Binglin Chang
 Fix For: 2.6.0

 Attachments: YARN-2088.v1.patch


 Some fields (set, list) are added to proto builders multiple times; we need to 
 clear those fields before adding, otherwise the resulting proto contains extra 
 contents.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2269) External links need to be removed from YARN UI

2014-07-10 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-2269:


Fix Version/s: (was: 2.5.0)
   2.6.0

 External links need to be removed from YARN UI
 --

 Key: YARN-2269
 URL: https://issues.apache.org/jira/browse/YARN-2269
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Yesha Vora
Assignee: Craig Welch
  Labels: security
 Fix For: 2.6.0

 Attachments: YARN-2269.0.patch


 Accessing external links from the YARN UI can disclose the delegation parameter to a 3rd 
 party in a secure cluster. Thus, all external links must be removed from the YARN 
 web UI.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2244) FairScheduler missing handling of containers for unknown application attempts

2014-07-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14058128#comment-14058128
 ] 

Hadoop QA commented on YARN-2244:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12655104/YARN-2244.002.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4264//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4264//console

This message is automatically generated.

 FairScheduler missing handling of containers for unknown application attempts 
 --

 Key: YARN-2244
 URL: https://issues.apache.org/jira/browse/YARN-2244
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
Priority: Critical
 Attachments: YARN-2224.patch, YARN-2244.001.patch, YARN-2244.002.patch


 We are missing changes in patch MAPREDUCE-3596 in FairScheduler. Among other 
 fixes that were common across schedulers, there were some scheduler specific 
 fixes added to handle containers for unknown application attempts. Without 
 these, the fair scheduler simply logs that an unknown container was found and 
 continues to let it run. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2181) Add preemption info to RM Web UI and add logs when preemption occurs

2014-07-10 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14058131#comment-14058131
 ] 

Chris Nauroth commented on YARN-2181:
-

I'm seeing a compilation error on branch-2 that appears to be related to this 
patch.  (See below.)  I think YARN-2022 would need to get merged to branch-2 to 
resolve this.  I'll comment over there too.

{code}
[ERROR] 
/Users/chris/svn/hadoop-common-branch-2/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptMetrics.java:[61,18]
 cannot find symbol
[ERROR] symbol  : method isAMContainer()
[ERROR] location: interface 
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainer
{code}


 Add preemption info to RM Web UI and add logs when preemption occurs
 

 Key: YARN-2181
 URL: https://issues.apache.org/jira/browse/YARN-2181
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, webapp
Affects Versions: 2.4.0
Reporter: Wangda Tan
Assignee: Wangda Tan
 Fix For: 2.5.0

 Attachments: YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, 
 YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, 
 YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, 
 YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, 
 YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, 
 YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, application page-1.png, 
 application page.png


 We need to add preemption info to the RM web page so that administrators/users 
 can get a better understanding of the preemption that happened on an app, etc. 
 And RM logs should have the following properties:
 * Logs are retrievable when an application is still running and often flushed.
 * Can distinguish between AM container preemption and task container 
 preemption with container ID shown.
 * Should be INFO level log.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2263) CSQueueUtils.computeMaxActiveApplicationsPerUser may cause deadlock for nested MapReduce jobs

2014-07-10 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14058132#comment-14058132
 ] 

Jason Lowe commented on YARN-2263:
--

1 is an appropriate lower bound since we don't ever want the maximum number of 
applications for a user to be zero or less.  (That would be a worthless queue 
since we could submit jobs to it but no jobs would activate.) 

I'm assuming it only causes a deadlock in the case where the active job submits 
and waits for the completion of other jobs?  If it simply submits jobs and 
exits then even if the queue is so tiny that only 1 active job per user is 
allowed then the jobs should eventually complete (assuming sufficient resources 
to launch an AM _and_ at least one task simultaneously if this is MapReduce).

If the concern is that the queue can be too small to allow running more than 
one application simultaneously for a user and some app frameworks might not 
like that, then yes that could be an issue.  However I'm not sure that is 
YARN's problem to solve.  I could have an application framework that for 
whatever reason requires 10 jobs to be running simultaneously to work.  There 
could definitely be a queue config that will not allow that to run properly 
because the queue is too small to support 10 simultaneous applications by a 
single user.  Should YARN handle this scenario?  If so, how would it detect it, 
and what should it do to mitigate it?  I would argue the same applies to the 
simpler job-launching-job-and-waiting scenario.  Some queues are going to be 
too small to support that.

Users can work around issues like this with smarter queue setups.  This is 
touched upon in MAPREDUCE-4304 and elsewhere for the Oozie case which is a 
similar scenario.  We can setup a separate queue for the launcher jobs separate 
from a queue where the other jobs run.  That way we can't accidentally fill the 
cluster/queue with just launcher jobs and deadlock.
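(To make the concern concrete, a small example using the formula quoted in the 
description below; the queue values are made up.)

{code}
public class MaxActiveAppsExample {
  // Same computation as CSQueueUtils.computeMaxActiveApplicationsPerUser
  // (copied from the description below).
  static int computeMaxActiveApplicationsPerUser(
      int maxActiveApplications, int userLimit, float userLimitFactor) {
    return Math.max(
        (int) Math.ceil(
            maxActiveApplications * (userLimit / 100.0f) * userLimitFactor),
        1);
  }

  public static void main(String[] args) {
    // Tiny queue: 1 active app in total, 100% user limit, factor 1 -> floor of 1.
    // One app per user can still activate, but a job that submits a child job
    // into the same queue and waits for it will block.
    System.out.println(computeMaxActiveApplicationsPerUser(1, 100, 1.0f));  // 1
    // Larger queue: 20 active apps, 25% user limit, factor 1 -> 5 per user.
    System.out.println(computeMaxActiveApplicationsPerUser(20, 25, 1.0f));  // 5
  }
}
{code}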

 CSQueueUtils.computeMaxActiveApplicationsPerUser may cause deadlock for 
 nested MapReduce jobs
 -

 Key: YARN-2263
 URL: https://issues.apache.org/jira/browse/YARN-2263
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 0.23.10, 2.4.1
Reporter: Chen He

 computeMaxActiveApplicationsPerUser() has a lower bound of 1. For a nested 
 MapReduce job that launches new MapReduce jobs from its mappers/reducers, this 
 can cause the job to get stuck.
 public static int computeMaxActiveApplicationsPerUser(
     int maxActiveApplications, int userLimit, float userLimitFactor) {
   return Math.max(
       (int) Math.ceil(
           maxActiveApplications * (userLimit / 100.0f) * userLimitFactor),
       1);
 }



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2022) Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy

2014-07-10 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14058133#comment-14058133
 ] 

Chris Nauroth commented on YARN-2022:
-

Does this still need to be merged to branch-2?  YARN-2181 was just committed to 
branch-2.  It depends on the new {{RMContainer#isAMContainer}} method, so I'm 
seeing a compilation error on branch-2 now.

 Preempting an Application Master container can be kept as least priority when 
 multiple applications are marked for preemption by 
 ProportionalCapacityPreemptionPolicy
 -

 Key: YARN-2022
 URL: https://issues.apache.org/jira/browse/YARN-2022
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Sunil G
Assignee: Sunil G
 Fix For: 2.5.0

 Attachments: YARN-2022-DesignDraft.docx, YARN-2022.10.patch, 
 YARN-2022.2.patch, YARN-2022.3.patch, YARN-2022.4.patch, YARN-2022.5.patch, 
 YARN-2022.6.patch, YARN-2022.7.patch, YARN-2022.8.patch, YARN-2022.9.patch, 
 Yarn-2022.1.patch


 Cluster Size = 16GB [2NM's]
 Queue A Capacity = 50%
 Queue B Capacity = 50%
 Consider there are 3 applications running in Queue A which has taken the full 
 cluster capacity. 
 J1 = 2GB AM + 1GB * 4 Maps
 J2 = 2GB AM + 1GB * 4 Maps
 J3 = 2GB AM + 1GB * 2 Maps
 Another Job J4 is submitted in Queue B [J4 needs a 2GB AM + 1GB * 2 Maps ].
 Currently in this scenario, Job J3 will get killed, including its AM.
 It would be better if the AM could be given the least priority among multiple applications. 
 In this same scenario, map tasks from J3 and J2 can be preempted.
 Later when cluster is free, maps can be allocated to these Jobs.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (YARN-2263) CSQueueUtils.computeMaxActiveApplicationsPerUser may cause deadlock for nested MapReduce jobs

2014-07-10 Thread Chen He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen He resolved YARN-2263.
---

Resolution: Won't Fix

 CSQueueUtils.computeMaxActiveApplicationsPerUser may cause deadlock for 
 nested MapReduce jobs
 -

 Key: YARN-2263
 URL: https://issues.apache.org/jira/browse/YARN-2263
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 0.23.10, 2.4.1
Reporter: Chen He

 computeMaxActiveApplicationsPerUser() has a lower bound of 1. For a nested 
 MapReduce job that launches new MapReduce jobs from its mappers/reducers, this 
 can cause the job to get stuck.
 public static int computeMaxActiveApplicationsPerUser(
     int maxActiveApplications, int userLimit, float userLimitFactor) {
   return Math.max(
       (int) Math.ceil(
           maxActiveApplications * (userLimit / 100.0f) * userLimitFactor),
       1);
 }



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2263) CSQueueUtils.computeMaxActiveApplicationsPerUser may cause deadlock for nested MapReduce jobs

2014-07-10 Thread Chen He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14058137#comment-14058137
 ] 

Chen He commented on YARN-2263:
---

Thank you for the comments, Jason Lowe. I will close it.

 CSQueueUtils.computeMaxActiveApplicationsPerUser may cause deadlock for 
 nested MapReduce jobs
 -

 Key: YARN-2263
 URL: https://issues.apache.org/jira/browse/YARN-2263
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 0.23.10, 2.4.1
Reporter: Chen He

 computeMaxActiveApplicationsPerUser() has a lower bound of 1. For a nested 
 MapReduce job that launches new MapReduce jobs from its mappers/reducers, this 
 can cause the job to get stuck.
 public static int computeMaxActiveApplicationsPerUser(
     int maxActiveApplications, int userLimit, float userLimitFactor) {
   return Math.max(
       (int) Math.ceil(
           maxActiveApplications * (userLimit / 100.0f) * userLimitFactor),
       1);
 }



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1367) After restart NM should resync with the RM without killing containers

2014-07-10 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-1367:
--

Fix Version/s: (was: 2.5.0)
   2.6.0

 After restart NM should resync with the RM without killing containers
 -

 Key: YARN-1367
 URL: https://issues.apache.org/jira/browse/YARN-1367
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Anubhav Dhoot
 Fix For: 2.6.0

 Attachments: YARN-1367.001.patch, YARN-1367.002.patch, 
 YARN-1367.003.patch, YARN-1367.prototype.patch


 After RM restart, the RM sends a resync response to NMs that heartbeat to it. 
  Upon receiving the resync response, the NM kills all containers and 
 re-registers with the RM. The NM should be changed to not kill the containers 
 and instead inform the RM about all currently running containers including 
 their allocations etc. After the re-register, the NM should send all pending 
 container completions to the RM as usual.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2181) Add preemption info to RM Web UI and add logs when preemption occurs

2014-07-10 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-2181:
--

Fix Version/s: (was: 2.5.0)
   2.6.0

 Add preemption info to RM Web UI and add logs when preemption occurs
 

 Key: YARN-2181
 URL: https://issues.apache.org/jira/browse/YARN-2181
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, webapp
Affects Versions: 2.4.0
Reporter: Wangda Tan
Assignee: Wangda Tan
 Fix For: 2.6.0

 Attachments: YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, 
 YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, 
 YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, 
 YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, 
 YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, 
 YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, application page-1.png, 
 application page.png


 We need to add preemption info to the RM web page so that administrators/users 
 can get a better understanding of the preemption that happened on an app, etc. 
 And RM logs should have the following properties:
 * Logs are retrievable when an application is still running and often flushed.
 * Can distinguish between AM container preemption and task container 
 preemption with container ID shown.
 * Should be INFO level log.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-611) Add an AM retry count reset window to YARN RM

2014-07-10 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-611:
---

Attachment: YARN-611.4.patch

 Add an AM retry count reset window to YARN RM
 -

 Key: YARN-611
 URL: https://issues.apache.org/jira/browse/YARN-611
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.0.3-alpha
Reporter: Chris Riccomini
Assignee: Xuan Gong
 Attachments: YARN-611.1.patch, YARN-611.2.patch, YARN-611.3.patch, 
 YARN-611.4.patch


 YARN currently has the following config:
 yarn.resourcemanager.am.max-retries
 This config defaults to 2, and defines how many times to retry a failed AM 
 before failing the whole YARN job. YARN counts an AM as failed if the node 
 that it was running on dies (the NM will timeout, which counts as a failure 
 for the AM), or if the AM dies.
 This configuration is insufficient for long running (or infinitely running) 
 YARN jobs, since the machine (or NM) that the AM is running on will 
 eventually need to be restarted (or the machine/NM will fail). In such an 
 event, the AM has not done anything wrong, but this is counted as a failure 
 by the RM. Since the retry count for the AM is never reset, eventually, at 
 some point, the number of machine/NM failures will result in the AM failure 
 count going above the configured value for 
 yarn.resourcemanager.am.max-retries. Once this happens, the RM will mark the 
 job as failed, and shut it down. This behavior is not ideal.
 I propose that we add a second configuration:
 yarn.resourcemanager.am.retry-count-window-ms
 This configuration would define a window of time that would define when an AM 
 is well behaved, and it's safe to reset its failure count back to zero. 
 Every time an AM fails the RmAppImpl would check the last time that the AM 
 failed. If the last failure was less than retry-count-window-ms ago, and the 
 new failure count is > max-retries, then the job should fail. If the AM has 
 never failed, the retry count is < max-retries, or if the last failure was 
 OUTSIDE the retry-count-window-ms, then the job should be restarted. 
 Additionally, if the last failure was outside the retry-count-window-ms, then 
 the failure count should be set back to 0.
 This would give developers a way to have well-behaved AMs run forever, while 
 still failing mis-behaving AMs after a short period of time.
 I think the work to be done here is to change the RmAppImpl to actually look 
 at app.attempts, and see if there have been more than max-retries failures in 
 the last retry-count-window-ms milliseconds. If there have, then the job 
 should fail, if not, then the job should go forward. Additionally, we might 
 also need to add an endTime in either RMAppAttemptImpl or 
 RMAppFailedAttemptEvent, so that the RmAppImpl can check the time of the 
 failure.
 Thoughts?



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-611) Add an AM retry count reset window to YARN RM

2014-07-10 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14058161#comment-14058161
 ] 

Xuan Gong commented on YARN-611:


Created a new patch based on Vinod's suggestion. Also moved all the logic about how to 
decide whether this is the last attempt from RMApp to ApplicationRetryPolicy.
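(A minimal sketch of the window-based check described in the issue; the class and 
field names here are hypothetical and are not the patch's ApplicationRetryPolicy.)

{code}
// Hypothetical sliding-window retry check: a failure that happens after the
// window has elapsed resets the count, so a long-running, well-behaved AM
// never exhausts its retries.
public class RetryWindowSketch {
  private final int maxRetries;   // e.g. yarn.resourcemanager.am.max-retries
  private final long windowMs;    // e.g. yarn.resourcemanager.am.retry-count-window-ms
  private int failureCount = 0;
  private long lastFailureTime = -1;

  RetryWindowSketch(int maxRetries, long windowMs) {
    this.maxRetries = maxRetries;
    this.windowMs = windowMs;
  }

  /** Returns true if the application should be failed after this AM failure. */
  boolean onAttemptFailed(long now) {
    if (lastFailureTime >= 0 && now - lastFailureTime > windowMs) {
      failureCount = 0;                 // last failure fell outside the window: reset
    }
    failureCount++;
    lastFailureTime = now;
    return failureCount > maxRetries;   // inside the window and over the limit: fail
  }
}
{code}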

 Add an AM retry count reset window to YARN RM
 -

 Key: YARN-611
 URL: https://issues.apache.org/jira/browse/YARN-611
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.0.3-alpha
Reporter: Chris Riccomini
Assignee: Xuan Gong
 Attachments: YARN-611.1.patch, YARN-611.2.patch, YARN-611.3.patch, 
 YARN-611.4.patch


 YARN currently has the following config:
 yarn.resourcemanager.am.max-retries
 This config defaults to 2, and defines how many times to retry a failed AM 
 before failing the whole YARN job. YARN counts an AM as failed if the node 
 that it was running on dies (the NM will timeout, which counts as a failure 
 for the AM), or if the AM dies.
 This configuration is insufficient for long running (or infinitely running) 
 YARN jobs, since the machine (or NM) that the AM is running on will 
 eventually need to be restarted (or the machine/NM will fail). In such an 
 event, the AM has not done anything wrong, but this is counted as a failure 
 by the RM. Since the retry count for the AM is never reset, eventually, at 
 some point, the number of machine/NM failures will result in the AM failure 
 count going above the configured value for 
 yarn.resourcemanager.am.max-retries. Once this happens, the RM will mark the 
 job as failed, and shut it down. This behavior is not ideal.
 I propose that we add a second configuration:
 yarn.resourcemanager.am.retry-count-window-ms
 This configuration would define a window of time that would define when an AM 
 is well behaved, and it's safe to reset its failure count back to zero. 
 Every time an AM fails the RmAppImpl would check the last time that the AM 
 failed. If the last failure was less than retry-count-window-ms ago, and the 
 new failure count is > max-retries, then the job should fail. If the AM has 
 never failed, the retry count is < max-retries, or if the last failure was 
 OUTSIDE the retry-count-window-ms, then the job should be restarted. 
 Additionally, if the last failure was outside the retry-count-window-ms, then 
 the failure count should be set back to 0.
 This would give developers a way to have well-behaved AMs run forever, while 
 still failing mis-behaving AMs after a short period of time.
 I think the work to be done here is to change the RmAppImpl to actually look 
 at app.attempts, and see if there have been more than max-retries failures in 
 the last retry-count-window-ms milliseconds. If there have, then the job 
 should fail, if not, then the job should go forward. Additionally, we might 
 also need to add an endTime in either RMAppAttemptImpl or 
 RMAppFailedAttemptEvent, so that the RmAppImpl can check the time of the 
 failure.
 Thoughts?



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Moved] (YARN-2275) When log aggregation not enabled, message should point to NM HTTP port, not IPC port

2014-07-10 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli moved MAPREDUCE-5185 to YARN-2275:
--

  Component/s: (was: jobhistoryserver)
   log-aggregation
Affects Version/s: (was: 2.0.4-alpha)
   2.0.4-alpha
  Key: YARN-2275  (was: MAPREDUCE-5185)
  Project: Hadoop YARN  (was: Hadoop Map/Reduce)

 When log aggregation not enabled, message should point to NM HTTP port, not 
 IPC port 
 -

 Key: YARN-2275
 URL: https://issues.apache.org/jira/browse/YARN-2275
 Project: Hadoop YARN
  Issue Type: Bug
  Components: log-aggregation
Affects Versions: 2.0.4-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: MAPREDUCE5185-01.patch


 When I try to get a container's logs in the JHS without log aggregation 
 enabled, I get a message that looks like this:
 Aggregation is not enabled. Try the nodemanager at sandy-ThinkPad-T530:33224
 This could be a lot more helpful by actually pointing to the URL that would show 
 the container logs on the NM.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2275) When log aggregation not enabled, message should point to NM HTTP port, not IPC port

2014-07-10 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14058163#comment-14058163
 ] 

Vinod Kumar Vavilapalli commented on YARN-2275:
---

AggregatedLogsBlock is hosted on a server that is not the nodemanager - today 
the MR JobHistoryServer, and the TimelineServer in the near future. So it 
cannot look into the NM's config.

 When log aggregation not enabled, message should point to NM HTTP port, not 
 IPC port 
 -

 Key: YARN-2275
 URL: https://issues.apache.org/jira/browse/YARN-2275
 Project: Hadoop YARN
  Issue Type: Bug
  Components: log-aggregation
Affects Versions: 2.0.4-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: MAPREDUCE5185-01.patch


 When I try to get a container's logs in the JHS without log aggregation 
 enabled, I get a message that looks like this:
 Aggregation is not enabled. Try the nodemanager at sandy-ThinkPad-T530:33224
 This could be a lot more helpful by actually pointing to the URL that would show 
 the container logs on the NM.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2238) filtering on UI sticks even if I move away from the page

2014-07-10 Thread Garth Goodson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14058176#comment-14058176
 ] 

Garth Goodson commented on YARN-2238:
-

We have the same issue and it is very annoying for our users.  If the front 
page is being filtered, the filter should be clearable from that page.

 filtering on UI sticks even if I move away from the page
 

 Key: YARN-2238
 URL: https://issues.apache.org/jira/browse/YARN-2238
 Project: Hadoop YARN
  Issue Type: Bug
  Components: webapp
Affects Versions: 2.4.0
Reporter: Sangjin Lee
 Attachments: filtered.png


 The main data table in many web pages (RM, AM, etc.) seems to show an 
 unexpected filtering behavior.
 If I filter the table by typing something in the key or value field (or I 
 suspect any search field), the data table gets filtered. The example I used 
 is the job configuration page for a MR job. That is expected.
 However, when I move away from that page and visit any other web page of the 
 same type (e.g. a job configuration page), the page is rendered with the 
 filtering! That is unexpected.
 What's even stranger is that it does not render the filtering term. As a 
 result, I have a page that's mysteriously filtered but doesn't tell me what 
 it's filtering on.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Reopened] (YARN-2131) Add a way to format the RMStateStore

2014-07-10 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli reopened YARN-2131:
---


Reopening for the addendum.

One other thing that occurred to me was running the RM while the format is in 
progress, or vice-versa. The Namenode solves this issue with a lock file. We can do 
the same here.

Irrespective of the approach, I think handling the above is a major blocker for 
this feature/patch. Let's try to do that here too.
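(For illustration, a minimal sketch of the lock-file idea; the lock path and helper 
names are assumptions, and a real implementation would go through the store's own 
storage APIs.)

{code}
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Sketch of mutual exclusion between a running RM and a format in progress:
// whoever creates the lock file first wins; CREATE_NEW fails atomically if it exists.
public class FormatLockSketch {
  private static final Path LOCK = Paths.get("/tmp/rmstatestore.lock"); // assumed location

  static void acquireOrFail(String owner) throws IOException {
    try {
      Files.write(LOCK, owner.getBytes("UTF-8"), StandardOpenOption.CREATE_NEW);
    } catch (FileAlreadyExistsException e) {
      throw new IOException("State store is locked by another process: " + LOCK, e);
    }
  }

  static void release() throws IOException {
    Files.deleteIfExists(LOCK);
  }
}
{code}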

 Add a way to format the RMStateStore
 

 Key: YARN-2131
 URL: https://issues.apache.org/jira/browse/YARN-2131
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Karthik Kambatla
Assignee: Robert Kanter
 Fix For: 2.6.0

 Attachments: YARN-2131.patch, YARN-2131.patch, 
 YARN-2131_addendum.patch


 There are cases when we don't want to recover past applications, but recover 
 applications going forward. To do this, one has to clear the store. Today, 
 there is no easy way to do this and users should understand how each store 
 works.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2259) NM-Local dir cleanup failing when Resourcemanager switches

2014-07-10 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14058200#comment-14058200
 ] 

Jason Lowe commented on YARN-2259:
--

This sounds like the NM wasn't notified of the application completing and 
therefore didn't process the cleanup.  Possibly a duplicate of YARN-1421?

 NM-Local dir cleanup failing when Resourcemanager switches
 --

 Key: YARN-2259
 URL: https://issues.apache.org/jira/browse/YARN-2259
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.4.0
 Environment: 
Reporter: Nishan Shetty
 Attachments: Capture.PNG


 Induce an RM switchover while a job is in progress.
 Observe that NM-Local dir cleanup fails when the ResourceManager switches.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1421) Node managers will not receive application finish event where containers ran before RM restart

2014-07-10 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14058201#comment-14058201
 ] 

Jason Lowe commented on YARN-1421:
--

Was this fixed by YARN-1885?

 Node managers will not receive application finish event where containers ran 
 before RM restart
 --

 Key: YARN-1421
 URL: https://issues.apache.org/jira/browse/YARN-1421
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Omkar Vinit Joshi
Assignee: Omkar Vinit Joshi
Priority: Critical

 Problem :- Today for every application we track the node managers where 
 containers ran. So when application finishes it notifies all those node 
 managers about application finish event (via node manager heartbeat). However 
 if rm restarts then we forget this past information and those node managers 
 will never get application finish event and will keep reporting finished 
 applications.
 Proposed Solution :- Instead of remembering the node managers where 
 containers ran for this particular application it would be better if we 
 depend on node manager heartbeat to take this decision. i.e. when node 
 manager heartbeats saying it is running application (app1, app2) then we 
 should check those application's status in RM's memory 
 {code}rmContext.getRMApps(){code} and if either they are not found (very old 
 applications) or they are in their final state (FINISHED, KILLED, FAILED) 
 then we should immediately notify the node manager about the application 
 finish event. By doing this we are reducing the state which we need to store 
 at RM after restart.
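(A minimal sketch of the heartbeat-side check proposed above; the types are 
simplified placeholders, not the real RMContext/RMApp classes.)

{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Instead of remembering which NMs ran containers for an app, decide from the
// NM's own heartbeat report whether the app is unknown or already finished.
public class FinishedAppCheckSketch {
  enum AppState { RUNNING, FINISHED, KILLED, FAILED }
  static final Set<AppState> FINAL_STATES =
      new HashSet<AppState>(Arrays.asList(AppState.FINISHED, AppState.KILLED, AppState.FAILED));

  /** Returns the apps reported as running by the NM that it should clean up. */
  static List<String> appsToCleanup(List<String> nmRunningApps, Map<String, AppState> rmApps) {
    List<String> cleanup = new ArrayList<String>();
    for (String appId : nmRunningApps) {
      AppState state = rmApps.get(appId);
      if (state == null || FINAL_STATES.contains(state)) {
        cleanup.add(appId); // unknown to the RM, or already in a final state
      }
    }
    return cleanup;
  }
}
{code}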



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2274) FairScheduler: Add debug information about cluster capacity, availability and reservations

2014-07-10 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2274:
---

Attachment: yarn-2274-2.patch

Thanks Sandy. Updated patch includes demand and skips a few updates before 
spitting out debug info. 

 FairScheduler: Add debug information about cluster capacity, availability and 
 reservations
 --

 Key: YARN-2274
 URL: https://issues.apache.org/jira/browse/YARN-2274
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Affects Versions: 2.4.1
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Trivial
 Attachments: yarn-2274-1.patch, yarn-2274-2.patch


 FairScheduler logs have little information on cluster capacity and 
 availability. Need this information to debug production issues. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2276) Branch-2 cannot build

2014-07-10 Thread Fengdong Yu (JIRA)
Fengdong Yu created YARN-2276:
-

 Summary: Branch-2 cannot build
 Key: YARN-2276
 URL: https://issues.apache.org/jira/browse/YARN-2276
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.5.0
Reporter: Fengdong Yu


[ERROR] COMPILATION ERROR : 
[INFO] -
[ERROR] 
/home/yufengdong/svn/letv-hadoop/hadoop-2.0/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptMetrics.java:[61,18]
 error: cannot find symbol




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2276) Branch-2 cannot build

2014-07-10 Thread Fengdong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengdong Yu updated YARN-2276:
--

Description: 
[ERROR] COMPILATION ERROR : 
[INFO] -
[ERROR] 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptMetrics.java:[61,18]
 error: cannot find symbol


  was:
[ERROR] COMPILATION ERROR : 
[INFO] -
[ERROR] 
/home/yufengdong/svn/letv-hadoop/hadoop-2.0/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptMetrics.java:[61,18]
 error: cannot find symbol



 Branch-2 cannot build
 -

 Key: YARN-2276
 URL: https://issues.apache.org/jira/browse/YARN-2276
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.5.0
Reporter: Fengdong Yu

 [ERROR] COMPILATION ERROR : 
 [INFO] -
 [ERROR] 
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptMetrics.java:[61,18]
  error: cannot find symbol



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2276) Branch-2 cannot build

2014-07-10 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14058245#comment-14058245
 ] 

Zhijie Shen commented on YARN-2276:
---

It's related to 
[YARN-2181|https://issues.apache.org/jira/browse/YARN-2181?focusedCommentId=14058131page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14058131]

 Branch-2 cannot build
 -

 Key: YARN-2276
 URL: https://issues.apache.org/jira/browse/YARN-2276
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.5.0
Reporter: Fengdong Yu

 [ERROR] COMPILATION ERROR : 
 [INFO] -
 [ERROR] 
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptMetrics.java:[61,18]
  error: cannot find symbol



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (YARN-2276) Branch-2 cannot build

2014-07-10 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen resolved YARN-2276.
---

Resolution: Fixed
  Assignee: Zhijie Shen

Merged YARN-2022 into branch-2.

 Branch-2 cannot build
 -

 Key: YARN-2276
 URL: https://issues.apache.org/jira/browse/YARN-2276
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.5.0
Reporter: Fengdong Yu
Assignee: Zhijie Shen

 [ERROR] COMPILATION ERROR : 
 [INFO] -
 [ERROR] 
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptMetrics.java:[61,18]
  error: cannot find symbol



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2022) Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy

2014-07-10 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14058271#comment-14058271
 ] 

Zhijie Shen commented on YARN-2022:
---

I merged YARN-2022 to branch-2, and the compilation error was gone.

 Preempting an Application Master container can be kept as least priority when 
 multiple applications are marked for preemption by 
 ProportionalCapacityPreemptionPolicy
 -

 Key: YARN-2022
 URL: https://issues.apache.org/jira/browse/YARN-2022
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Sunil G
Assignee: Sunil G
 Fix For: 2.5.0

 Attachments: YARN-2022-DesignDraft.docx, YARN-2022.10.patch, 
 YARN-2022.2.patch, YARN-2022.3.patch, YARN-2022.4.patch, YARN-2022.5.patch, 
 YARN-2022.6.patch, YARN-2022.7.patch, YARN-2022.8.patch, YARN-2022.9.patch, 
 Yarn-2022.1.patch


 Cluster Size = 16GB [2NM's]
 Queue A Capacity = 50%
 Queue B Capacity = 50%
 Consider there are 3 applications running in Queue A which has taken the full 
 cluster capacity. 
 J1 = 2GB AM + 1GB * 4 Maps
 J2 = 2GB AM + 1GB * 4 Maps
 J3 = 2GB AM + 1GB * 2 Maps
 Another Job J4 is submitted in Queue B [J4 needs a 2GB AM + 1GB * 2 Maps ].
 Currently in this scenario, Job J3 will get killed, including its AM.
 It would be better if the AM could be given the least priority among multiple applications. 
 In this same scenario, map tasks from J3 and J2 can be preempted.
 Later when cluster is free, maps can be allocated to these Jobs.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2274) FairScheduler: Add debug information about cluster capacity, availability and reservations

2014-07-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14058275#comment-14058275
 ] 

Hadoop QA commented on YARN-2274:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12655131/yarn-2274-2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4266//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4266//console

This message is automatically generated.

 FairScheduler: Add debug information about cluster capacity, availability and 
 reservations
 --

 Key: YARN-2274
 URL: https://issues.apache.org/jira/browse/YARN-2274
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Affects Versions: 2.4.1
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Trivial
 Attachments: yarn-2274-1.patch, yarn-2274-2.patch


 FairScheduler logs have little information on cluster capacity and 
 availability. Need this information to debug production issues. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2276) Branch-2 cannot build

2014-07-10 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2276:
--

Assignee: (was: Zhijie Shen)

 Branch-2 cannot build
 -

 Key: YARN-2276
 URL: https://issues.apache.org/jira/browse/YARN-2276
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.5.0
Reporter: Fengdong Yu

 [ERROR] COMPILATION ERROR : 
 [INFO] -
 [ERROR] 
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptMetrics.java:[61,18]
  error: cannot find symbol



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2181) Add preemption info to RM Web UI and add logs when preemption occurs

2014-07-10 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14058289#comment-14058289
 ] 

Tsuyoshi OZAWA commented on YARN-2181:
--

Hi Chris, good catch. I confirmed that branch-2 compiles and passes all 
tests after applying YARN-2022.10.patch.

 Add preemption info to RM Web UI and add logs when preemption occurs
 

 Key: YARN-2181
 URL: https://issues.apache.org/jira/browse/YARN-2181
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, webapp
Affects Versions: 2.4.0
Reporter: Wangda Tan
Assignee: Wangda Tan
 Fix For: 2.6.0

 Attachments: YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, 
 YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, 
 YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, 
 YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, 
 YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, 
 YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, application page-1.png, 
 application page.png


 We need to add preemption info to the RM web page so that administrators/users 
 can get a better understanding of the preemption that happened on an app, etc. 
 And RM logs should have the following properties:
 * Logs are retrievable when an application is still running and often flushed.
 * Can distinguish between AM container preemption and task container 
 preemption with container ID shown.
 * Should be INFO level log.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2276) Branch-2 cannot build

2014-07-10 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14058291#comment-14058291
 ] 

Tsuyoshi OZAWA commented on YARN-2276:
--

As Chris mentioned on YARN-2181, we need to merge YARN-2022 into branch-2 to 
fix the compilation. If we handle this problem on YARN-2181, we can close this JIRA as 
a duplicate.

 Branch-2 cannot build
 -

 Key: YARN-2276
 URL: https://issues.apache.org/jira/browse/YARN-2276
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.5.0
Reporter: Fengdong Yu

 [ERROR] COMPILATION ERROR : 
 [INFO] -
 [ERROR] 
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptMetrics.java:[61,18]
  error: cannot find symbol



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2181) Add preemption info to RM Web UI and add logs when preemption occurs

2014-07-10 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14058293#comment-14058293
 ] 

Wangda Tan commented on YARN-2181:
--

Hi [~ozawa],
Zhijie has already committed YARN-2022 to branch-2. You can update and try.
Thanks,
Wangda

 Add preemption info to RM Web UI and add logs when preemption occurs
 

 Key: YARN-2181
 URL: https://issues.apache.org/jira/browse/YARN-2181
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, webapp
Affects Versions: 2.4.0
Reporter: Wangda Tan
Assignee: Wangda Tan
 Fix For: 2.6.0

 Attachments: YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, 
 YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, 
 YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, 
 YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, 
 YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, 
 YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, application page-1.png, 
 application page.png


 We need to add preemption info to the RM web page so that administrators/users 
 can get a better understanding of the preemption that happened on an app, etc. 
 And RM logs should have the following properties:
 * Logs are retrievable when an application is still running and often flushed.
 * Can distinguish between AM container preemption and task container 
 preemption with container ID shown.
 * Should be INFO level log.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2276) Branch-2 cannot build

2014-07-10 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14058294#comment-14058294
 ] 

Tsuyoshi OZAWA commented on YARN-2276:
--

Zhijie, thank you for your work!

 Branch-2 cannot build
 -

 Key: YARN-2276
 URL: https://issues.apache.org/jira/browse/YARN-2276
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.5.0
Reporter: Fengdong Yu

 [ERROR] COMPILATION ERROR : 
 [INFO] -
 [ERROR] 
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptMetrics.java:[61,18]
  error: cannot find symbol



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2181) Add preemption info to RM Web UI and add logs when preemption occurs

2014-07-10 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14058295#comment-14058295
 ] 

Jian He commented on YARN-2181:
---

 [~zjshen] merged YARN-2022 to branch-2. This issue should be resolved now.

 Add preemption info to RM Web UI and add logs when preemption occurs
 

 Key: YARN-2181
 URL: https://issues.apache.org/jira/browse/YARN-2181
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, webapp
Affects Versions: 2.4.0
Reporter: Wangda Tan
Assignee: Wangda Tan
 Fix For: 2.6.0

 Attachments: YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, 
 YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, 
 YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, 
 YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, 
 YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, 
 YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, application page-1.png, 
 application page.png


 We need to add preemption info to the RM web page so that administrators/users 
 can get a better understanding of the preemption that happened on an app, etc. 
 And RM logs should have the following properties:
 * Logs are retrievable when an application is still running and often flushed.
 * Can distinguish between AM container preemption and task container 
 preemption with container ID shown.
 * Should be INFO level log.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2277) Add JSONP support to the ATS REST API

2014-07-10 Thread Jonathan Eagles (JIRA)
Jonathan Eagles created YARN-2277:
-

 Summary: Add JSONP support to the ATS REST API
 Key: YARN-2277
 URL: https://issues.apache.org/jira/browse/YARN-2277
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jonathan Eagles


As the Application Timeline Server is provided with a built-in UI, it may make 
sense to enable JSONP REST API capabilities to allow a remote UI to access 
the data directly via JavaScript without cross-site browser restrictions 
coming into play.

Example client may be like
http://api.jquery.com/jQuery.getJSON/ 

This can alleviate the need to create a local proxy cache.
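(A minimal sketch of the server-side JSONP wrapping this would involve; the helper 
name and the "callback" parameter are assumptions, not the eventual API.)

{code}
// JSONP: wrap the JSON payload in a caller-supplied callback so a remote page
// can load it via a script tag (or jQuery.getJSON with "callback=?") without
// being blocked by the browser's same-origin policy.
public class JsonpSketch {
  static String wrap(String callbackParam, String json) {
    if (callbackParam == null || callbackParam.isEmpty()) {
      return json;                          // plain JSON when no callback is requested
    }
    // The response would be served as application/javascript instead of application/json.
    return callbackParam + "(" + json + ");";
  }

  public static void main(String[] args) {
    System.out.println(wrap("handleEntities", "{\"entities\":[]}"));
    // -> handleEntities({"entities":[]});
  }
}
{code}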



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2277) Add JSONP support to the ATS REST API

2014-07-10 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated YARN-2277:
--

Attachment: YARN-2277.patch

Starter patch to get the conversation started.

 Add JSONP support to the ATS REST API
 -

 Key: YARN-2277
 URL: https://issues.apache.org/jira/browse/YARN-2277
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jonathan Eagles
 Attachments: YARN-2277.patch


 As the Application Timeline Server is provided with a built-in UI, it may make 
 sense to enable JSONP REST API capabilities to allow a remote UI to access 
 the data directly via JavaScript without cross-site browser restrictions 
 coming into play.
 Example client may be like
 http://api.jquery.com/jQuery.getJSON/ 
 This can alleviate the need to create a local proxy cache.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

