[jira] [Updated] (YARN-2142) Add one service to check the nodes' TRUST status

2014-07-03 Thread anders (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anders updated YARN-2142:
-

Attachment: trust .patch

> Add one service to check the nodes' TRUST status 
> -
>
> Key: YARN-2142
> URL: https://issues.apache.org/jira/browse/YARN-2142
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, resourcemanager, scheduler, webapp
> Environment: OS:Ubuntu 13.04; 
> JAVA:OpenJDK 7u51-2.4.4-0
> Only in branch-2.2.0.
>Reporter: anders
>Priority: Minor
>  Labels: features
> Attachments: trust .patch, trust.patch, trust.patch, trust.patch, 
> trust001.patch, trust002.patch, trust003.patch, trust2.patch
>
>   Original Estimate: 1m
>  Remaining Estimate: 1m
>
> Because of our critical computing environment, we must check every node's 
> TRUST status in the cluster (we can get the TRUST status via the API of the 
> OAT server), so I added this feature to Hadoop's scheduler.
> Through the TRUST check service, a node can get its own TRUST status and 
> then, through the heartbeat, send that status to the ResourceManager for 
> scheduling.
> In the scheduling step, if a node's TRUST status is 'false', it will be 
> skipped until its TRUST status turns to 'true'.
> ***The logic of this feature is similar to the node health check service.***
> ***Only in branch-2.2.0, not in trunk***
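A minimal sketch of what such a TRUST-check service could look like, modeled on the node health-check service the description mentions. {{OATClient}} and its {{isTrusted()}} method are hypothetical stand-ins for the OAT server API, not part of the attached patch:

{code}
import java.util.Timer;
import java.util.TimerTask;
import org.apache.hadoop.service.AbstractService;

// Hedged sketch: poll the OAT server periodically and cache the node's TRUST
// status so the NM heartbeat can piggyback it to the ResourceManager.
public class TrustCheckService extends AbstractService {
  private final OATClient oatClient;        // hypothetical OAT API wrapper
  private volatile boolean trusted = true;  // last TRUST status seen
  private Timer timer;

  public TrustCheckService(OATClient oatClient) {
    super(TrustCheckService.class.getName());
    this.oatClient = oatClient;
  }

  @Override
  protected void serviceStart() throws Exception {
    timer = new Timer("TrustCheck", true);
    timer.scheduleAtFixedRate(new TimerTask() {
      @Override
      public void run() {
        // Nodes reported untrusted would be skipped by the scheduler until
        // the OAT server reports them trusted again.
        trusted = oatClient.isTrusted(getConfig().get("yarn.nodemanager.hostname"));
      }
    }, 0, 60 * 1000L);
    super.serviceStart();
  }

  // Read when building the node status sent in each heartbeat.
  public boolean isTrusted() {
    return trusted;
  }
}
{code}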



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2045) Data persisted in NM should be versioned

2014-07-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14052136#comment-14052136
 ] 

Hadoop QA commented on YARN-2045:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12654030/YARN-2045.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4199//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4199//console

This message is automatically generated.

> Data persisted in NM should be versioned
> 
>
> Key: YARN-2045
> URL: https://issues.apache.org/jira/browse/YARN-2045
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.4.1
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: YARN-2045.patch
>
>
> As a split task from YARN-667, we want to add version info to NM-related 
> data, including:
> - NodeManager local LevelDB state
> - NodeManager directory structure



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2045) Data persisted in NM should be versioned

2014-07-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14052131#comment-14052131
 ] 

Hadoop QA commented on YARN-2045:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12654030/YARN-2045.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4198//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4198//console

This message is automatically generated.

> Data persisted in NM should be versioned
> 
>
> Key: YARN-2045
> URL: https://issues.apache.org/jira/browse/YARN-2045
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.4.1
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: YARN-2045.patch
>
>
> As a split task from YARN-667, we want to add version info to NM-related 
> data, including:
> - NodeManager local LevelDB state
> - NodeManager directory structure



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1408) Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins

2014-07-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14052119#comment-14052119
 ] 

Hadoop QA commented on YARN-1408:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12654025/Yarn-1408.7.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4197//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4197//console

This message is automatically generated.

> Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task 
> timeout for 30mins
> --
>
> Key: YARN-1408
> URL: https://issues.apache.org/jira/browse/YARN-1408
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.2.0
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: Yarn-1408.1.patch, Yarn-1408.2.patch, Yarn-1408.3.patch, 
> Yarn-1408.4.patch, Yarn-1408.5.patch, Yarn-1408.6.patch, Yarn-1408.7.patch, 
> Yarn-1408.patch
>
>
> Capacity preemption is enabled as follows.
>  *  yarn.resourcemanager.scheduler.monitor.enable=true,
>  *  
> yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy
> Queue = a,b
> Capacity of Queue A = 80%
> Capacity of Queue B = 20%
> Step 1: Submit a big jobA to queue a which uses the full cluster capacity
> Step 2: Submit a jobB to queue b which would use less than 20% of the cluster 
> capacity
> JobA tasks that use queue b's capacity are preempted and killed.
> This caused the problem below:
> 1. A new container was allocated for jobA in Queue A as per a node update 
> from an NM.
> 2. This container was immediately preempted.
> Here the ACQUIRED at KILLED invalid state exception occurred when the next AM 
> heartbeat reached the RM:
> ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
> Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
> ACQUIRED at KILLED
> This also caused the task to time out for 30 minutes, as this container was 
> already killed by preemption:
> attempt_1380289782418_0003_m_00_0 Timed out after 1800 secs
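For context, a hedged sketch of one way to tolerate the stray event; the actual patch may solve it differently (e.g. by recovering the container's ResourceRequests on preemption, as later comments discuss):

{code}
// Hedged sketch, not the committed fix: register an ignore-transition so a
// late ACQUIRED event on an already-KILLED container is a no-op instead of
// an InvalidStateTransitonException.
private static final StateMachineFactory<RMContainerImpl, RMContainerState,
    RMContainerEventType, RMContainerEvent> stateMachineFactory =
  new StateMachineFactory<RMContainerImpl, RMContainerState,
      RMContainerEventType, RMContainerEvent>(RMContainerState.NEW)
    // ... existing transitions ...
    .addTransition(RMContainerState.KILLED, RMContainerState.KILLED,
        RMContainerEventType.ACQUIRED)  // container already killed: ignore
    .installTopology();
{code}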



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2045) Data persisted in NM should be versioned

2014-07-03 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-2045:
-

Attachment: YARN-2045.patch

Uploaded a patch for this JIRA. In this patch:
- replaced DB_SCHEMA_VERSION (a string) with NMStateVersion, which has a 
majorVersion and a minorVersion
- added a compatibility check at NMStateStore start
- added tests for compatible/incompatible version changes.
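A rough sketch of the compatibility rule described above; the class shape and method names are guesses at the patch, not the committed code:

{code}
import java.io.IOException;

// Hedged sketch: the major version gates compatibility, the minor version
// covers compatible, rolling changes to the NM state schema.
public class NMStateVersion {
  private final int majorVersion;
  private final int minorVersion;

  public NMStateVersion(int major, int minor) {
    this.majorVersion = major;
    this.minorVersion = minor;
  }

  // State written by 1.x is readable by any 1.y; 1.x -> 2.0 is not.
  public boolean isCompatibleTo(NMStateVersion other) {
    return majorVersion == other.majorVersion;
  }

  // Called at NMStateStore start (sketch): fail fast on incompatible data.
  public void checkCompatible(NMStateVersion loadedFromDb) throws IOException {
    if (!isCompatibleTo(loadedFromDb)) {
      throw new IOException("Incompatible NM state version "
          + loadedFromDb.majorVersion + "." + loadedFromDb.minorVersion
          + ", current is " + majorVersion + "." + minorVersion);
    }
  }
}
{code}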

> Data persisted in NM should be versioned
> 
>
> Key: YARN-2045
> URL: https://issues.apache.org/jira/browse/YARN-2045
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.4.1
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: YARN-2045.patch
>
>
> As a split task from YARN-667, we want to add version info to NM-related 
> data, including:
> - NodeManager local LevelDB state
> - NodeManager directory structure



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1408) Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins

2014-07-03 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14052110#comment-14052110
 ] 

Wangda Tan commented on YARN-1408:
--

[~sunilg],
Thanks for the update; just two minor comments:

1) Can we move RMContainerImpl.setResourceRequests to RMContainer to avoid 
type casts such as
{code}
((RMContainerImpl)rmContainer).setResourceRequests(resourceRequestList);
{code}

2) For TestCapacityScheduler and TestFairScheduler, can we verify that, after 
recovery, the RRs in AppSchedulingInfo contain \{node, rack, any\}?
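For reference, comment 1) amounts to declaring the setter on the interface so call sites stay cast-free; a sketch, not the final signature:

{code}
// Sketch of comment 1): declare the setter on the RMContainer interface...
public interface RMContainer extends EventHandler<RMContainerEvent> {
  // ... existing methods ...
  void setResourceRequests(List<ResourceRequest> requests);
}

// ...so callers no longer need the concrete-type cast:
rmContainer.setResourceRequests(resourceRequestList);
{code}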

> Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task 
> timeout for 30mins
> --
>
> Key: YARN-1408
> URL: https://issues.apache.org/jira/browse/YARN-1408
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.2.0
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: Yarn-1408.1.patch, Yarn-1408.2.patch, Yarn-1408.3.patch, 
> Yarn-1408.4.patch, Yarn-1408.5.patch, Yarn-1408.6.patch, Yarn-1408.7.patch, 
> Yarn-1408.patch
>
>
> Capacity preemption is enabled as follows.
>  *  yarn.resourcemanager.scheduler.monitor.enable=true,
>  *  
> yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy
> Queue = a,b
> Capacity of Queue A = 80%
> Capacity of Queue B = 20%
> Step 1: Submit a big jobA to queue a which uses the full cluster capacity
> Step 2: Submit a jobB to queue b which would use less than 20% of the cluster 
> capacity
> JobA tasks that use queue b's capacity are preempted and killed.
> This caused the problem below:
> 1. A new container was allocated for jobA in Queue A as per a node update 
> from an NM.
> 2. This container was immediately preempted.
> Here the ACQUIRED at KILLED invalid state exception occurred when the next AM 
> heartbeat reached the RM:
> ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
> Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
> ACQUIRED at KILLED
> This also caused the task to time out for 30 minutes, as this container was 
> already killed by preemption:
> attempt_1380289782418_0003_m_00_0 Timed out after 1800 secs



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2181) Add preemption info to RM Web UI

2014-07-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14052102#comment-14052102
 ] 

Hadoop QA commented on YARN-2181:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12654022/YARN-2181.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4196//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4196//console

This message is automatically generated.

> Add preemption info to RM Web UI
> 
>
> Key: YARN-2181
> URL: https://issues.apache.org/jira/browse/YARN-2181
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, webapp
>Affects Versions: 2.4.0
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, 
> YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, 
> YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, 
> YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, 
> application page.png, queue page.png
>
>
> We need to add preemption info to the RM web page so that administrators and 
> users can better understand the preemption that happened to an app, etc.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1408) Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins

2014-07-03 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-1408:
--

Attachment: Yarn-1408.7.patch


Thank you [~leftnoteasy] for the comments.

1.
bq.I would suggest to modify existing appSchedulingInfo.allocate to return list 
of RRs.
Yes, this is better. I have modified appSchedulingInfo.allocate to return the 
list of RRs.

2.
bq.make it parametrized for Fair/Capacity/FIFO.
I could see that the Fair scheduler test cases work by extending a TestBase 
class rather than clearly using MockRM.
So I kept a separate test case for Fair for now. Please share your thoughts on 
this.

I have uploaded an updated patch; please help review it.
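A hedged sketch of the reworked contract from point 1; the signature is approximate, not the committed one:

{code}
// Hedged sketch of point 1: allocate() hands back the ResourceRequests it
// consumed, so the caller can stash them on the RMContainer and replay them
// if the container is later preempted.
public synchronized List<ResourceRequest> allocate(NodeType type,
    SchedulerNode node, Priority priority, ResourceRequest request,
    Container container) {
  List<ResourceRequest> consumed = new ArrayList<ResourceRequest>();
  // ... existing bookkeeping that decrements the outstanding {node, rack,
  // any} requests, adding a copy of each decremented request to 'consumed' ...
  return consumed;
}
{code}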

> Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task 
> timeout for 30mins
> --
>
> Key: YARN-1408
> URL: https://issues.apache.org/jira/browse/YARN-1408
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.2.0
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: Yarn-1408.1.patch, Yarn-1408.2.patch, Yarn-1408.3.patch, 
> Yarn-1408.4.patch, Yarn-1408.5.patch, Yarn-1408.6.patch, Yarn-1408.7.patch, 
> Yarn-1408.patch
>
>
> Capacity preemption is enabled as follows.
>  *  yarn.resourcemanager.scheduler.monitor.enable=true,
>  *  
> yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy
> Queue = a,b
> Capacity of Queue A = 80%
> Capacity of Queue B = 20%
> Step 1: Submit a big jobA to queue a which uses the full cluster capacity
> Step 2: Submit a jobB to queue b which would use less than 20% of the cluster 
> capacity
> JobA tasks that use queue b's capacity are preempted and killed.
> This caused the problem below:
> 1. A new container was allocated for jobA in Queue A as per a node update 
> from an NM.
> 2. This container was immediately preempted.
> Here the ACQUIRED at KILLED invalid state exception occurred when the next AM 
> heartbeat reached the RM:
> ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
> Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
> ACQUIRED at KILLED
> This also caused the task to time out for 30 minutes, as this container was 
> already killed by preemption:
> attempt_1380289782418_0003_m_00_0 Timed out after 1800 secs



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2181) Add preemption info to RM Web UI

2014-07-03 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-2181:
-

Attachment: YARN-2181.patch

> Add preemption info to RM Web UI
> 
>
> Key: YARN-2181
> URL: https://issues.apache.org/jira/browse/YARN-2181
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, webapp
>Affects Versions: 2.4.0
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, 
> YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, 
> YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, 
> YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, 
> application page.png, queue page.png
>
>
> We need to add preemption info to the RM web page so that administrators and 
> users can better understand the preemption that happened to an app, etc.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2250) FairScheduler.findLowestCommonAncestorQueue returns null when queues not identical

2014-07-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14051969#comment-14051969
 ] 

Hadoop QA commented on YARN-2250:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12653990/YARN-2250-2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4195//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4195//console

This message is automatically generated.

> FairScheduler.findLowestCommonAncestorQueue returns null when queues not 
> identical
> --
>
> Key: YARN-2250
> URL: https://issues.apache.org/jira/browse/YARN-2250
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.4.0, 2.4.1
>Reporter: Krisztian Horvath
> Attachments: YARN-2250-1.patch, YARN-2250-2.patch
>
>
> We need to update the queue metrics up to the lowest common ancestor of the 
> target and source queues. This method fails to retrieve the right queue.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2229) Making ContainerId long type

2014-07-03 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14051966#comment-14051966
 ] 

Tsuyoshi OZAWA commented on YARN-2229:
--

This javac warning is introduced by marking the {{newContainerId}} method 
without epoch as @Deprecated. [~jianhe], could you review it?
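The warning is the expected by-product of deprecating the old overload; roughly (signatures approximate, not the patch's):

{code}
// Hedged sketch, signatures approximate: the epoch-less factory method stays
// for compatibility but is @Deprecated, so any remaining internal caller
// produces the javac warning seen in the QA run.
@Deprecated
public static ContainerId newContainerId(ApplicationAttemptId attemptId,
    int containerId) {
  return newContainerId(attemptId, (long) containerId);  // epoch bits = 0
}
{code}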

> Making ContainerId long type
> 
>
> Key: YARN-2229
> URL: https://issues.apache.org/jira/browse/YARN-2229
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Tsuyoshi OZAWA
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-2229.1.patch, YARN-2229.2.patch, YARN-2229.2.patch, 
> YARN-2229.3.patch, YARN-2229.4.patch, YARN-2229.5.patch
>
>
> On YARN-2052, we changed the containerId format: the upper 10 bits are for 
> the epoch, and the lower 22 bits are for the sequence number of the ids. This 
> preserves the semantics of {{ContainerId#getId()}}, {{ContainerId#toString()}}, 
> {{ContainerId#compareTo()}}, {{ContainerId#equals}}, and 
> {{ConverterUtils#toContainerId}}. One concern is that the epoch can overflow 
> after the RM restarts 1024 times.
> To avoid the problem, it's better to make containerId a long. We need to 
> define the new container id format while preserving backward compatibility on 
> this JIRA.
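The packing being outgrown, as a hedged sketch; constants are illustrative, not the committed ones. With 10 epoch bits, 2^10 = 1024, hence the overflow after 1024 restarts:

{code}
// Illustrative sketch of the YARN-2052 layout: a 32-bit id holding a 10-bit
// epoch and a 22-bit sequence number. The epoch wraps after 2^10 = 1024 RM
// restarts; widening the id to a long is what this JIRA proposes.
static final int SEQUENCE_BITS = 22;
static final int SEQUENCE_MASK = (1 << SEQUENCE_BITS) - 1;

static int packContainerId(int epoch, int sequence) {
  return (epoch << SEQUENCE_BITS) | (sequence & SEQUENCE_MASK);
}

static int epochOf(int id)    { return id >>> SEQUENCE_BITS; }
static int sequenceOf(int id) { return id & SEQUENCE_MASK; }
{code}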



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2026) Fair scheduler : Fair share for inactive queues causes unfair allocation in some scenarios

2014-07-03 Thread Ashwin Shankar (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14051954#comment-14051954
 ] 

Ashwin Shankar commented on YARN-2026:
--

[~sandyr], did you have any comments ?

> Fair scheduler : Fair share for inactive queues causes unfair allocation in 
> some scenarios
> --
>
> Key: YARN-2026
> URL: https://issues.apache.org/jira/browse/YARN-2026
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Reporter: Ashwin Shankar
>Assignee: Ashwin Shankar
>  Labels: scheduler
> Attachments: YARN-2026-v1.txt, YARN-2026-v2.txt
>
>
> Problem 1: While using hierarchical queues in the fair scheduler, there are a 
> few scenarios where we have seen a leaf queue with the least fair share take 
> the majority of the cluster and starve a sibling parent queue which has a 
> greater weight/fair share, while preemption doesn't kick in to reclaim 
> resources.
> The root cause seems to be that the fair share of a parent queue is 
> distributed to all its children irrespective of whether each is an active or 
> an inactive (no apps running) queue. Preemption based on fair share kicks in 
> only if the usage of a queue is less than 50% of its fair share and if it has 
> demands greater than that. When there are many queues under a parent queue 
> (with a high fair share), the child queues' fair share becomes really low. As 
> a result, when only a few of these child queues have apps running, they reach 
> their *tiny* fair share quickly and preemption doesn't happen even if other 
> leaf queues (non-siblings) are hogging the cluster.
> This can be solved by dividing the fair share of the parent queue only among 
> active child queues.
> Here is an example describing the problem and the proposed solution:
> root.lowPriorityQueue is a leaf queue with weight 2
> root.HighPriorityQueue is a parent queue with weight 8
> root.HighPriorityQueue has 10 child leaf queues: 
> root.HighPriorityQueue.childQ(1..10)
> The above config results in root.HighPriorityQueue having an 80% fair share, 
> and each of its ten child queues would have an 8% fair share. Preemption 
> would happen only if a child queue is <4% (0.5*8=4).
> Let's say at the moment no apps are running in any of the 
> root.HighPriorityQueue.childQ(1..10) and a few apps are running in 
> root.lowPriorityQueue, which is taking up 95% of the cluster.
> Up to this point, the behavior of FS is correct.
> Now, let's say root.HighPriorityQueue.childQ1 gets a big job which requires 
> 30% of the cluster. It would get only the available 5% of the cluster, and 
> preemption wouldn't kick in since it is above 4% (half its fair share). This 
> is bad considering childQ1 is under a high-priority parent queue which has an 
> *80% fair share*.
> Until root.lowPriorityQueue starts relinquishing containers, we would see the 
> following allocation on the scheduler page:
> *root.lowPriorityQueue = 95%*
> *root.HighPriorityQueue.childQ1 = 5%*
> This can be solved by distributing a parent's fair share only among active 
> queues.
> So in the example above, since childQ1 is the only active queue under 
> root.HighPriorityQueue, it would get all of its parent's fair share, i.e. 
> 80%.
> This would cause preemption to reclaim the 30% needed by childQ1 from 
> root.lowPriorityQueue after fairSharePreemptionTimeout seconds.
> Problem 2: Also note that a similar situation can happen between 
> root.HighPriorityQueue.childQ1 and root.HighPriorityQueue.childQ2 if childQ2 
> hogs the cluster. childQ2 can take up 95% of the cluster and childQ1 would be 
> stuck at 5% until childQ2 starts relinquishing containers. We would like each 
> of childQ1 and childQ2 to get half of the root.HighPriorityQueue fair share, 
> i.e. 40%, which would ensure childQ1 gets up to 40% of resources if needed 
> through preemption.
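A hedged sketch of the proposed redistribution, with simplified types rather than the FairScheduler internals; with parentShare = 0.8 it yields 0.8 for a lone active child and 0.4 each for two equally weighted active children, matching the numbers above:

{code}
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hedged sketch: split a parent's fair share over *active* children only
// (queues with running apps), by weight, instead of over all children.
static Map<String, Double> activeFairShares(double parentShare,
    Map<String, Double> childWeights, Set<String> activeChildren) {
  double totalActiveWeight = 0;
  for (String q : activeChildren) {
    totalActiveWeight += childWeights.get(q);
  }
  Map<String, Double> shares = new HashMap<String, Double>();
  for (String q : activeChildren) {
    shares.put(q, parentShare * childWeights.get(q) / totalActiveWeight);
  }
  return shares;
}
{code}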



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2250) FairScheduler.findLowestCommonAncestorQueue returns null when queues not identical

2014-07-03 Thread Krisztian Horvath (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14051918#comment-14051918
 ] 

Krisztian Horvath commented on YARN-2250:
-

Indeed, made changes accordingly. Thanks.

> FairScheduler.findLowestCommonAncestorQueue returns null when queues not 
> identical
> --
>
> Key: YARN-2250
> URL: https://issues.apache.org/jira/browse/YARN-2250
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.4.0, 2.4.1
>Reporter: Krisztian Horvath
> Attachments: YARN-2250-1.patch, YARN-2250-2.patch
>
>
> We need to update the queue metrics up to the lowest common ancestor of the 
> target and source queues. This method fails to retrieve the right queue.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2250) FairScheduler.findLowestCommonAncestorQueue returns null when queues not identical

2014-07-03 Thread Krisztian Horvath (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Horvath updated YARN-2250:


Attachment: YARN-2250-2.patch

> FairScheduler.findLowestCommonAncestorQueue returns null when queues not 
> identical
> --
>
> Key: YARN-2250
> URL: https://issues.apache.org/jira/browse/YARN-2250
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.4.0, 2.4.1
>Reporter: Krisztian Horvath
> Attachments: YARN-2250-1.patch, YARN-2250-2.patch
>
>
> We need to update the queue metrics up to the lowest common ancestor of the 
> target and source queues. This method fails to retrieve the right queue.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2131) Add a way to nuke the RMStateStore

2014-07-03 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14051866#comment-14051866
 ] 

Sangjin Lee commented on YARN-2131:
---

LGTM. I'd let committers chime in though.

> Add a way to nuke the RMStateStore
> --
>
> Key: YARN-2131
> URL: https://issues.apache.org/jira/browse/YARN-2131
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.4.0
>Reporter: Karthik Kambatla
>Assignee: Robert Kanter
> Attachments: YARN-2131.patch, YARN-2131.patch
>
>
> There are cases where we don't want to recover past applications but do want 
> to recover applications going forward. To do this, one has to clear the 
> store. Today, there is no easy way to do this, and users have to understand 
> how each store works.
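A hedged guess at the shape of the change: one delete hook on the store base class that each backend implements. The method name and wiring here are assumptions, not the committed API:

{code}
// Hedged sketch, names are guesses: a single abstract hook lets operators
// clear recovery state without knowing whether it lives in ZooKeeper,
// LevelDB, or HDFS.
public abstract class RMStateStore extends AbstractService {
  /** Blow away all recovery state. The RM must not be running. */
  public abstract void deleteStore() throws Exception;
}
// e.g. the ZK-backed store would recursively delete its root znode, and the
// FileSystem-backed store would delete its root directory.
{code}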



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2131) Add a way to nuke the RMStateStore

2014-07-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14051845#comment-14051845
 ] 

Hadoop QA commented on YARN-2131:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12653972/YARN-2131.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4194//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4194//console

This message is automatically generated.

> Add a way to nuke the RMStateStore
> --
>
> Key: YARN-2131
> URL: https://issues.apache.org/jira/browse/YARN-2131
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.4.0
>Reporter: Karthik Kambatla
>Assignee: Robert Kanter
> Attachments: YARN-2131.patch, YARN-2131.patch
>
>
> There are cases where we don't want to recover past applications but do want 
> to recover applications going forward. To do this, one has to clear the 
> store. Today, there is no easy way to do this, and users have to understand 
> how each store works.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2131) Add a way to nuke the RMStateStore

2014-07-03 Thread Robert Kanter (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kanter updated YARN-2131:


Attachment: YARN-2131.patch

Good point.  I've uploaded a new patch that adds documentation.

> Add a way to nuke the RMStateStore
> --
>
> Key: YARN-2131
> URL: https://issues.apache.org/jira/browse/YARN-2131
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.4.0
>Reporter: Karthik Kambatla
>Assignee: Robert Kanter
> Attachments: YARN-2131.patch, YARN-2131.patch
>
>
> There are cases where we don't want to recover past applications but do want 
> to recover applications going forward. To do this, one has to clear the 
> store. Today, there is no easy way to do this, and users have to understand 
> how each store works.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1367) After restart NM should resync with the RM without killing containers

2014-07-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14051793#comment-14051793
 ] 

Hadoop QA commented on YARN-1367:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12653970/YARN-1367.003.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4193//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4193//console

This message is automatically generated.

> After restart NM should resync with the RM without killing containers
> -
>
> Key: YARN-1367
> URL: https://issues.apache.org/jira/browse/YARN-1367
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Bikas Saha
>Assignee: Anubhav Dhoot
> Attachments: YARN-1367.001.patch, YARN-1367.002.patch, 
> YARN-1367.003.patch, YARN-1367.prototype.patch
>
>
> After an RM restart, the RM sends a resync response to NMs that heartbeat to 
> it. Upon receiving the resync response, the NM kills all containers and 
> re-registers with the RM. The NM should be changed to not kill the containers 
> and instead inform the RM about all currently running containers, including 
> their allocations etc. After re-registering, the NM should send all pending 
> container completions to the RM as usual.
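A hedged sketch of the changed resync path in the NM's status updater; method and flag names are approximations, not the patch's:

{code}
// Hedged sketch, names approximate: on RESYNC, re-register with the RM and
// report still-running containers instead of killing them.
private void onResyncFromRM() throws YarnException, IOException {
  if (keepContainersOnResync) {          // new behavior, behind a config flag
    // Registration carries an NMContainerStatus for every live container so
    // the restarted RM can rebuild its view of this node's allocations.
    registerWithRM(getLiveContainerStatuses());
    sendPendingContainerCompletions();   // then drain completions as usual
  } else {
    killAllContainers();                 // previous behavior
    registerWithRM(Collections.<NMContainerStatus>emptyList());
  }
}
{code}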



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1367) After restart NM should resync with the RM without killing containers

2014-07-03 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-1367:


Attachment: YARN-1367.003.patch

> After restart NM should resync with the RM without killing containers
> -
>
> Key: YARN-1367
> URL: https://issues.apache.org/jira/browse/YARN-1367
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Bikas Saha
>Assignee: Anubhav Dhoot
> Attachments: YARN-1367.001.patch, YARN-1367.002.patch, 
> YARN-1367.003.patch, YARN-1367.prototype.patch
>
>
> After an RM restart, the RM sends a resync response to NMs that heartbeat to 
> it. Upon receiving the resync response, the NM kills all containers and 
> re-registers with the RM. The NM should be changed to not kill the containers 
> and instead inform the RM about all currently running containers, including 
> their allocations etc. After re-registering, the NM should send all pending 
> container completions to the RM as usual.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1367) After restart NM should resync with the RM without killing containers

2014-07-03 Thread Anubhav Dhoot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14051769#comment-14051769
 ] 

Anubhav Dhoot commented on YARN-1367:
-

I was not seeing the RM shoot unknown containers because of YARN-2244.
Changed it to use the config as before.


> After restart NM should resync with the RM without killing containers
> -
>
> Key: YARN-1367
> URL: https://issues.apache.org/jira/browse/YARN-1367
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Bikas Saha
>Assignee: Anubhav Dhoot
> Attachments: YARN-1367.001.patch, YARN-1367.002.patch, 
> YARN-1367.prototype.patch
>
>
> After an RM restart, the RM sends a resync response to NMs that heartbeat to 
> it. Upon receiving the resync response, the NM kills all containers and 
> re-registers with the RM. The NM should be changed to not kill the containers 
> and instead inform the RM about all currently running containers, including 
> their allocations etc. After re-registering, the NM should send all pending 
> container completions to the RM as usual.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2250) FairScheduler.findLowestCommonAncestorQueue returns null when queues not identical

2014-07-03 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated YARN-2250:
-

Summary: FairScheduler.findLowestCommonAncestorQueue returns null when 
queues not identical  (was: Moving apps between queues - FairScheduler)

> FairScheduler.findLowestCommonAncestorQueue returns null when queues not 
> identical
> --
>
> Key: YARN-2250
> URL: https://issues.apache.org/jira/browse/YARN-2250
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.4.0, 2.4.1
>Reporter: Krisztian Horvath
> Attachments: YARN-2250-1.patch
>
>
> We need to update the queue metrics up to the lowest common ancestor of the 
> target and source queues. This method fails to retrieve the right queue.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2250) Moving apps between queues - FairScheduler

2014-07-03 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14051692#comment-14051692
 ] 

Sandy Ryza commented on YARN-2250:
--

I think the bug can be fixed by replacing name1.substring(lastPeriodIndex) with 
name1.substring(0, lastPeriodIndex).  I tried this out and all your tests 
passed.
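For context, the off-by-one: {{substring(lastPeriodIndex)}} keeps the dotted tail, while the ancestor walk needs the prefix. A self-contained sketch of the corrected idea (not the exact FairScheduler source):

{code}
// Queue names embed their ancestry ("root.a.b"), so the lowest common
// ancestor is the longest shared dot-delimited prefix. The fix keeps the
// prefix: substring(0, lastPeriodIndex), not substring(lastPeriodIndex).
static String lowestCommonAncestor(String name1, String name2) {
  while (!(name2.equals(name1) || name2.startsWith(name1 + "."))) {
    int lastPeriodIndex = name1.lastIndexOf('.');
    if (lastPeriodIndex < 0) {
      return "root";                              // all queues share root
    }
    name1 = name1.substring(0, lastPeriodIndex);  // drop the last segment
  }
  return name1;
}
{code}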

> Moving apps between queues - FairScheduler
> --
>
> Key: YARN-2250
> URL: https://issues.apache.org/jira/browse/YARN-2250
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.4.0, 2.4.1
>Reporter: Krisztian Horvath
> Attachments: YARN-2250-1.patch
>
>
> We need to update the queue metrics up to the lowest common ancestor of the 
> target and source queues. This method fails to retrieve the right queue.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2228) TimelineServer should load pseudo authentication filter when authentication = simple

2014-07-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14051601#comment-14051601
 ] 

Hadoop QA commented on YARN-2228:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12653852/YARN-2228.2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4192//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4192//console

This message is automatically generated.

> TimelineServer should load pseudo authentication filter when authentication = 
> simple
> 
>
> Key: YARN-2228
> URL: https://issues.apache.org/jira/browse/YARN-2228
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
> Attachments: YARN-2228.1.patch, YARN-2228.2.patch
>
>
> When Kerberos authentication is not enabled, we should let the timeline 
> server work with the pseudo authentication filter. In this way, the server is 
> able to detect the request user by checking "user.name".
> On the other hand, the timeline client should append "user.name" in the 
> non-secure case as well, so that ACLs keep working in this case.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2131) Add a way to nuke the RMStateStore

2014-07-03 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14051584#comment-14051584
 ] 

Sangjin Lee commented on YARN-2131:
---

Looks good to me. Just curious, does this need to be documented in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/YarnCommands.apt.vm?

> Add a way to nuke the RMStateStore
> --
>
> Key: YARN-2131
> URL: https://issues.apache.org/jira/browse/YARN-2131
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.4.0
>Reporter: Karthik Kambatla
>Assignee: Robert Kanter
> Attachments: YARN-2131.patch
>
>
> There are cases where we don't want to recover past applications but do want 
> to recover applications going forward. To do this, one has to clear the 
> store. Today, there is no easy way to do this, and users have to understand 
> how each store works.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2232) ClientRMService doesn't allow delegation token owner to cancel their own token in secure mode

2014-07-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14051562#comment-14051562
 ] 

Hudson commented on YARN-2232:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1820 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1820/])
YARN-2232. Fixed ResourceManager to allow DelegationToken owners to be able to 
cancel their own tokens in secure mode. Contributed by Varun Vasudev. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1607484)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java


> ClientRMService doesn't allow delegation token owner to cancel their own 
> token in secure mode
> -
>
> Key: YARN-2232
> URL: https://issues.apache.org/jira/browse/YARN-2232
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Fix For: 2.5.0
>
> Attachments: apache-yarn-2232.0.patch, apache-yarn-2232.1.patch, 
> apache-yarn-2232.2.patch
>
>
> The ClientRMService doesn't allow delegation token owners to cancel their own 
> tokens. The root cause is this piece of code from the cancelDelegationToken 
> function -
> {noformat}
> String user = getRenewerForToken(token);
> ...
> private String getRenewerForToken(Token token) 
> throws IOException {
>   UserGroupInformation user = UserGroupInformation.getCurrentUser();
>   UserGroupInformation loginUser = UserGroupInformation.getLoginUser();
>   // we can always renew our own tokens
>   return loginUser.getUserName().equals(user.getUserName())
>   ? token.decodeIdentifier().getRenewer().toString()
>   : user.getShortUserName();
> }
> {noformat}
> It ends up passing the user short name to the cancelToken function whereas 
> AbstractDelegationTokenSecretManager::cancelToken expects the full user name. 
> This bug occurs in secure mode and is not an issue with simple auth.
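A hedged sketch of the shape of the fix (the committed patch may differ): return the full user name for the self-cancel path, since {{AbstractDelegationTokenSecretManager#cancelToken}} compares against the token owner's full name:

{code}
// Hedged sketch of the fix: use getUserName() (full name) instead of
// getShortUserName(), which broke cancellation in secure mode.
private String getRenewerForToken(Token<RMDelegationTokenIdentifier> token)
    throws IOException {
  UserGroupInformation user = UserGroupInformation.getCurrentUser();
  UserGroupInformation loginUser = UserGroupInformation.getLoginUser();
  // we can always renew our own tokens
  return loginUser.getUserName().equals(user.getUserName())
      ? token.decodeIdentifier().getRenewer().toString()
      : user.getUserName();   // was: user.getShortUserName()
}
{code}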



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2022) Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy

2014-07-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14051569#comment-14051569
 ] 

Hudson commented on YARN-2022:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1820 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1820/])
YARN-2022. Fixing CHANGES.txt to be correctly placed. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1607486)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt


> Preempting an Application Master container can be kept as least priority when 
> multiple applications are marked for preemption by 
> ProportionalCapacityPreemptionPolicy
> -
>
> Key: YARN-2022
> URL: https://issues.apache.org/jira/browse/YARN-2022
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.4.0
>Reporter: Sunil G
>Assignee: Sunil G
> Fix For: 2.5.0
>
> Attachments: YARN-2022-DesignDraft.docx, YARN-2022.10.patch, 
> YARN-2022.2.patch, YARN-2022.3.patch, YARN-2022.4.patch, YARN-2022.5.patch, 
> YARN-2022.6.patch, YARN-2022.7.patch, YARN-2022.8.patch, YARN-2022.9.patch, 
> Yarn-2022.1.patch
>
>
> Cluster Size = 16GB [2NM's]
> Queue A Capacity = 50%
> Queue B Capacity = 50%
> Consider there are 3 applications running in Queue A which has taken the full 
> cluster capacity. 
> J1 = 2GB AM + 1GB * 4 Maps
> J2 = 2GB AM + 1GB * 4 Maps
> J3 = 2GB AM + 1GB * 2 Maps
> Another Job J4 is submitted in Queue B [J4 needs a 2GB AM + 1GB * 2 Maps ].
> Currently in this scenario, job J3 will get killed, including its AM.
> It is better if the AM container can be given the least priority among 
> multiple applications. In this same scenario, map tasks from J3 and J2 could 
> be preempted instead.
> Later, when the cluster is free, maps can be reallocated to these jobs.
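The gist of the fix, as a hedged sketch (not the committed code): order preemption candidates so AM containers are taken only after all non-AM containers. The {{isAMContainer()}} accessor is assumed for illustration:

{code}
// Hedged sketch: when picking containers to preempt, push AM containers to
// the back so they are only killed if non-AM containers cannot satisfy the
// resources to reclaim.
Collections.sort(candidates, new Comparator<RMContainer>() {
  @Override
  public int compare(RMContainer a, RMContainer b) {
    boolean aIsAM = a.isAMContainer();   // assumed accessor for illustration
    boolean bIsAM = b.isAMContainer();
    if (aIsAM != bIsAM) {
      return aIsAM ? 1 : -1;             // non-AM containers preempted first
    }
    return 0;                            // otherwise keep existing order
  }
});
{code}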



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2241) ZKRMStateStore: On startup, show nicer messages if znodes already exist

2014-07-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14051572#comment-14051572
 ] 

Hudson commented on YARN-2241:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1820 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1820/])
YARN-2241. ZKRMStateStore: On startup, show nicer messages if znodes already 
exist. (Robert Kanter via kasha) (kasha: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1607473)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java


> ZKRMStateStore: On startup, show nicer messages if znodes already exist
> ---
>
> Key: YARN-2241
> URL: https://issues.apache.org/jira/browse/YARN-2241
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.5.0
>Reporter: Robert Kanter
>Assignee: Robert Kanter
>Priority: Minor
> Fix For: 2.5.0
>
> Attachments: YARN-2241.patch, YARN-2241.patch
>
>
> When using the RMZKStateStore, if you restart the RM, you get a bunch of 
> stack traces with messages like 
> {{org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = 
> NodeExists for /rmstore}}.  This is expected as these nodes already exist 
> from before.  We should catch these and print nicer messages.
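The shape of the cleanup, sketched with an illustrative log message (not the committed wording):

{code}
// Hedged sketch: an existing znode is the normal restart case, so catch it
// and log quietly instead of letting the stack trace escape.
try {
  zkClient.create(path, data, zkAcl, CreateMode.PERSISTENT);
} catch (KeeperException.NodeExistsException e) {
  LOG.debug(path + " znode already exists; expected on RM restart");
}
{code}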



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2065) AM cannot create new containers after restart-NM token from previous attempt used

2014-07-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14051567#comment-14051567
 ] 

Hudson commented on YARN-2065:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1820 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1820/])
YARN-2065 AM cannot create new containers after restart (stevel: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1607441)
* /hadoop/common/trunk
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/TestContainerManagerSecurity.java


> AM cannot create new containers after restart-NM token from previous attempt 
> used
> -
>
> Key: YARN-2065
> URL: https://issues.apache.org/jira/browse/YARN-2065
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Steve Loughran
>Assignee: Jian He
> Fix For: 2.5.0
>
> Attachments: YARN-2065-002.patch, YARN-2065-003.patch, 
> YARN-2065.1.patch
>
>
> Slider AM Restart failing (SLIDER-34). The AM comes back up, but it cannot 
> create new containers.
> The Slider minicluster test {{TestKilledAM}} can replicate this reliably: it 
> kills the AM, then kills a container while the AM is down, which triggers the 
> reallocation of a container, leading to this failure.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2233) Implement web services to create, renew and cancel delegation tokens

2014-07-03 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14051556#comment-14051556
 ] 

Zhijie Shen commented on YARN-2233:
---

+1 LGTM. [~vinodkv], do you want to have a second look at this blocker issue?

> Implement web services to create, renew and cancel delegation tokens
> 
>
> Key: YARN-2233
> URL: https://issues.apache.org/jira/browse/YARN-2233
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
>Priority: Blocker
> Attachments: apache-yarn-2233.0.patch, apache-yarn-2233.1.patch, 
> apache-yarn-2233.2.patch
>
>
> Implement functionality to create, renew and cancel delegation tokens.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2228) TimelineServer should load pseudo authentication filter when authentication = simple

2014-07-03 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2228:
--

Attachment: YARN-2228.2.patch

bq. Can you split up the individual tests so that the conditions are easier to 
understand? Something like

Nice refactoring suggestion. Thanks!

bq. Maybe we should just use the hadoop.http.authentication.* instead of a new 
subset?

I chose to have timeline-prefixed configuration names to prevent affecting 
other components. Before we added HTTP authentication for the RM and the 
timeline server, WebHDFS was the only component using this feature (to support 
Oozie). Now, if we keep using these configurations in core-site.xml, all three 
daemons are going to have the same settings (unless we prepare different 
core-site.xml files), while I think it would be good to allow flexible 
configuration for each individual daemon.

> TimelineServer should load pseudo authentication filter when authentication = 
> simple
> 
>
> Key: YARN-2228
> URL: https://issues.apache.org/jira/browse/YARN-2228
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
> Attachments: YARN-2228.1.patch, YARN-2228.2.patch
>
>
> When Kerberos authentication is not enabled, we should let the timeline 
> server work with the pseudo authentication filter. In this way, the server is 
> able to detect the request user by checking "user.name".
> On the other hand, the timeline client should append "user.name" in the 
> non-secure case as well, so that ACLs keep working in this case.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2242) Improve exception information on AM launch crashes

2014-07-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14051503#comment-14051503
 ] 

Hudson commented on YARN-2242:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5821 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5821/])
YARN-2242. Improve exception information on AM launch crashes. (Contributed by 
Li Lu) (junping_du: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1607655)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java


> Improve exception information on AM launch crashes
> --
>
> Key: YARN-2242
> URL: https://issues.apache.org/jira/browse/YARN-2242
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Li Lu
>Assignee: Li Lu
> Fix For: 2.6.0
>
> Attachments: YARN-2242-070114-1.patch, YARN-2242-070114.patch, 
> YARN-2242-070115-1.patch, YARN-2242-070115-2.patch, YARN-2242-070115.patch
>
>
> Currently, each time the AM container crashes during launch, both the console 
> and the web UI only report a ShellExitCodeException. This is not only 
> unhelpful, but sometimes confusing. With the help of the log aggregator, 
> container logs are actually aggregated and can be very helpful for debugging. 
> One possible way to improve the whole process is to send a "pointer" to the 
> aggregated logs to the programmer when reporting exception information.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2241) ZKRMStateStore: On startup, show nicer messages if znodes already exist

2014-07-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14051476#comment-14051476
 ] 

Hudson commented on YARN-2241:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1793 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1793/])
YARN-2241. ZKRMStateStore: On startup, show nicer messages if znodes already 
exist. (Robert Kanter via kasha) (kasha: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1607473)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java


> ZKRMStateStore: On startup, show nicer messages if znodes already exist
> ---
>
> Key: YARN-2241
> URL: https://issues.apache.org/jira/browse/YARN-2241
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.5.0
>Reporter: Robert Kanter
>Assignee: Robert Kanter
>Priority: Minor
> Fix For: 2.5.0
>
> Attachments: YARN-2241.patch, YARN-2241.patch
>
>
> When using the ZKRMStateStore, if you restart the RM, you get a number of 
> stack traces with messages like 
> {{org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = 
> NodeExists for /rmstore}}.  This is expected, as these znodes already exist 
> from the previous run.  We should catch these exceptions and print nicer 
> messages.
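
A plausible shape for the fix, sketched directly against the plain ZooKeeper 
API (the helper class is made up; the real change lives in ZKRMStateStore):

{noformat}
// Hedged sketch: treat NodeExists as an expected condition on RM restart and
// log a short message instead of letting the stack trace bubble up.
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class SafeCreate {
  public static void createIfAbsent(ZooKeeper zk, String path, byte[] data)
      throws KeeperException, InterruptedException {
    try {
      zk.create(path, data, ZooDefs.Ids.OPEN_ACL_UNSAFE,
          CreateMode.PERSISTENT);
    } catch (KeeperException.NodeExistsException e) {
      System.out.println(path + " already exists from a previous run; "
          + "no need to recreate it.");
    }
  }
}
{noformat}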



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2232) ClientRMService doesn't allow delegation token owner to cancel their own token in secure mode

2014-07-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14051467#comment-14051467
 ] 

Hudson commented on YARN-2232:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1793 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1793/])
YARN-2232. Fixed ResourceManager to allow DelegationToken owners to be able to 
cancel their own tokens in secure mode. Contributed by Varun Vasudev. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1607484)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java


> ClientRMService doesn't allow delegation token owner to cancel their own 
> token in secure mode
> -
>
> Key: YARN-2232
> URL: https://issues.apache.org/jira/browse/YARN-2232
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Fix For: 2.5.0
>
> Attachments: apache-yarn-2232.0.patch, apache-yarn-2232.1.patch, 
> apache-yarn-2232.2.patch
>
>
> The ClientRMService doesn't allow delegation token owners to cancel their own 
> tokens. The root cause is this piece of code from the cancelDelegationToken 
> function:
> {noformat}
> String user = getRenewerForToken(token);
> ...
> private String getRenewerForToken(Token token) 
> throws IOException {
>   UserGroupInformation user = UserGroupInformation.getCurrentUser();
>   UserGroupInformation loginUser = UserGroupInformation.getLoginUser();
>   // we can always renew our own tokens
>   return loginUser.getUserName().equals(user.getUserName())
>   ? token.decodeIdentifier().getRenewer().toString()
>   : user.getShortUserName();
> }
> {noformat}
> It ends up passing the user's short name to the cancelToken function, whereas 
> AbstractDelegationTokenSecretManager::cancelToken expects the full user name. 
> This bug occurs in secure mode and is not an issue with simple auth.
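
Based purely on the description above, one way the fix could look (a hedged 
sketch, not necessarily the committed patch) is to hand back the full user 
name so it matches what the secret manager compares against:

{noformat}
import java.io.IOException;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.security.token.Token;
import org.apache.hadoop.yarn.security.client.RMDelegationTokenIdentifier;

class RenewerForTokenFix {
  // Hedged sketch: identical to the snippet above except for the last line,
  // which returns the full user name instead of the short name.
  static String getRenewerForToken(Token<RMDelegationTokenIdentifier> token)
      throws IOException {
    UserGroupInformation user = UserGroupInformation.getCurrentUser();
    UserGroupInformation loginUser = UserGroupInformation.getLoginUser();
    // we can always renew our own tokens
    return loginUser.getUserName().equals(user.getUserName())
        ? token.decodeIdentifier().getRenewer().toString()
        : user.getUserName();   // was: user.getShortUserName()
  }
}
{noformat}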



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2065) AM cannot create new containers after restart-NM token from previous attempt used

2014-07-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14051472#comment-14051472
 ] 

Hudson commented on YARN-2065:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1793 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1793/])
YARN-2065 AM cannot create new containers after restart (stevel: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1607441)
* /hadoop/common/trunk
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/TestContainerManagerSecurity.java


> AM cannot create new containers after restart-NM token from previous attempt 
> used
> -
>
> Key: YARN-2065
> URL: https://issues.apache.org/jira/browse/YARN-2065
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Steve Loughran
>Assignee: Jian He
> Fix For: 2.5.0
>
> Attachments: YARN-2065-002.patch, YARN-2065-003.patch, 
> YARN-2065.1.patch
>
>
> Slider AM restart is failing (SLIDER-34). The AM comes back up, but it cannot 
> create new containers.
> The Slider minicluster test {{TestKilledAM}} can replicate this reliably: it 
> kills the AM, then kills a container while the AM is down, which triggers a 
> reallocation of a container, leading to this failure.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2022) Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy

2014-07-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14051474#comment-14051474
 ] 

Hudson commented on YARN-2022:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1793 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1793/])
YARN-2022. Fixing CHANGES.txt to be correctly placed. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1607486)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt


> Preempting an Application Master container can be kept as least priority when 
> multiple applications are marked for preemption by 
> ProportionalCapacityPreemptionPolicy
> -
>
> Key: YARN-2022
> URL: https://issues.apache.org/jira/browse/YARN-2022
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.4.0
>Reporter: Sunil G
>Assignee: Sunil G
> Fix For: 2.5.0
>
> Attachments: YARN-2022-DesignDraft.docx, YARN-2022.10.patch, 
> YARN-2022.2.patch, YARN-2022.3.patch, YARN-2022.4.patch, YARN-2022.5.patch, 
> YARN-2022.6.patch, YARN-2022.7.patch, YARN-2022.8.patch, YARN-2022.9.patch, 
> Yarn-2022.1.patch
>
>
> Cluster Size = 16GB [2NM's]
> Queue A Capacity = 50%
> Queue B Capacity = 50%
> Consider there are 3 applications running in Queue A, which has taken the 
> full cluster capacity:
> J1 = 2GB AM + 1GB * 4 maps
> J2 = 2GB AM + 1GB * 4 maps
> J3 = 2GB AM + 1GB * 2 maps
> Another job, J4, is submitted in Queue B [J4 needs a 2GB AM + 1GB * 2 maps ].
> Currently in this scenario, job J3 will get killed, including its AM.
> It would be better if the AM could be given the least priority among multiple 
> applications. In this same scenario, map tasks from J3 and J2 could be 
> preempted instead; later, when the cluster frees up, maps could be 
> reallocated to those jobs.
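
To make the proposed ordering concrete, here is a small self-contained sketch 
(illustrative only, not the ProportionalCapacityPreemptionPolicy code) that 
sorts preemption candidates so AM containers are considered last:

{noformat}
// Hedged sketch: non-AM containers sort first, so maps from J3 and J2 would
// be preempted before any AM container is touched.
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class PreemptionOrder {
  static class Candidate {
    final String id;
    final boolean isAmContainer;
    Candidate(String id, boolean isAmContainer) {
      this.id = id;
      this.isAmContainer = isAmContainer;
    }
  }

  public static void main(String[] args) {
    List<Candidate> candidates = new ArrayList<Candidate>();
    candidates.add(new Candidate("J3-AM", true));
    candidates.add(new Candidate("J3-map-1", false));
    candidates.add(new Candidate("J2-map-4", false));
    Collections.sort(candidates, new Comparator<Candidate>() {
      @Override
      public int compare(Candidate a, Candidate b) {
        return Boolean.compare(a.isAmContainer, b.isAmContainer);
      }
    });
    for (Candidate c : candidates) {
      System.out.println(c.id);  // maps print before the AM container
    }
  }
}
{noformat}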



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2232) ClientRMService doesn't allow delegation token owner to cancel their own token in secure mode

2014-07-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14051335#comment-14051335
 ] 

Hudson commented on YARN-2232:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #602 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/602/])
YARN-2232. Fixed ResourceManager to allow DelegationToken owners to be able to 
cancel their own tokens in secure mode. Contributed by Varun Vasudev. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1607484)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMService.java


> ClientRMService doesn't allow delegation token owner to cancel their own 
> token in secure mode
> -
>
> Key: YARN-2232
> URL: https://issues.apache.org/jira/browse/YARN-2232
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Fix For: 2.5.0
>
> Attachments: apache-yarn-2232.0.patch, apache-yarn-2232.1.patch, 
> apache-yarn-2232.2.patch
>
>
> The ClientRMService doesn't allow delegation token owners to cancel their own 
> tokens. The root cause is this piece of code from the cancelDelegationToken 
> function:
> {noformat}
> String user = getRenewerForToken(token);
> ...
> private String getRenewerForToken(Token token) 
> throws IOException {
>   UserGroupInformation user = UserGroupInformation.getCurrentUser();
>   UserGroupInformation loginUser = UserGroupInformation.getLoginUser();
>   // we can always renew our own tokens
>   return loginUser.getUserName().equals(user.getUserName())
>   ? token.decodeIdentifier().getRenewer().toString()
>   : user.getShortUserName();
> }
> {noformat}
> It ends up passing the user's short name to the cancelToken function, whereas 
> AbstractDelegationTokenSecretManager::cancelToken expects the full user name. 
> This bug occurs in secure mode and is not an issue with simple auth.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2065) AM cannot create new containers after restart-NM token from previous attempt used

2014-07-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14051339#comment-14051339
 ] 

Hudson commented on YARN-2065:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #602 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/602/])
YARN-2065 AM cannot create new containers after restart (stevel: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1607441)
* /hadoop/common/trunk
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/TestContainerManagerSecurity.java


> AM cannot create new containers after restart-NM token from previous attempt 
> used
> -
>
> Key: YARN-2065
> URL: https://issues.apache.org/jira/browse/YARN-2065
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Steve Loughran
>Assignee: Jian He
> Fix For: 2.5.0
>
> Attachments: YARN-2065-002.patch, YARN-2065-003.patch, 
> YARN-2065.1.patch
>
>
> Slider AM restart is failing (SLIDER-34). The AM comes back up, but it cannot 
> create new containers.
> The Slider minicluster test {{TestKilledAM}} can replicate this reliably: it 
> kills the AM, then kills a container while the AM is down, which triggers a 
> reallocation of a container, leading to this failure.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2241) ZKRMStateStore: On startup, show nicer messages if znodes already exist

2014-07-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14051343#comment-14051343
 ] 

Hudson commented on YARN-2241:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #602 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/602/])
YARN-2241. ZKRMStateStore: On startup, show nicer messages if znodes already 
exist. (Robert Kanter via kasha) (kasha: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1607473)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java


> ZKRMStateStore: On startup, show nicer messages if znodes already exist
> ---
>
> Key: YARN-2241
> URL: https://issues.apache.org/jira/browse/YARN-2241
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.5.0
>Reporter: Robert Kanter
>Assignee: Robert Kanter
>Priority: Minor
> Fix For: 2.5.0
>
> Attachments: YARN-2241.patch, YARN-2241.patch
>
>
> When using the RMZKStateStore, if you restart the RM, you get a bunch of 
> stack traces with messages like 
> {{org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = 
> NodeExists for /rmstore}}.  This is expected as these nodes already exist 
> from before.  We should catch these and print nicer messages.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2022) Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy

2014-07-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14051341#comment-14051341
 ] 

Hudson commented on YARN-2022:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #602 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/602/])
YARN-2022. Fixing CHANGES.txt to be correctly placed. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1607486)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt


> Preempting an Application Master container can be kept as least priority when 
> multiple applications are marked for preemption by 
> ProportionalCapacityPreemptionPolicy
> -
>
> Key: YARN-2022
> URL: https://issues.apache.org/jira/browse/YARN-2022
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.4.0
>Reporter: Sunil G
>Assignee: Sunil G
> Fix For: 2.5.0
>
> Attachments: YARN-2022-DesignDraft.docx, YARN-2022.10.patch, 
> YARN-2022.2.patch, YARN-2022.3.patch, YARN-2022.4.patch, YARN-2022.5.patch, 
> YARN-2022.6.patch, YARN-2022.7.patch, YARN-2022.8.patch, YARN-2022.9.patch, 
> Yarn-2022.1.patch
>
>
> Cluster Size = 16GB [2NM's]
> Queue A Capacity = 50%
> Queue B Capacity = 50%
> Consider there are 3 applications running in Queue A, which has taken the 
> full cluster capacity:
> J1 = 2GB AM + 1GB * 4 maps
> J2 = 2GB AM + 1GB * 4 maps
> J3 = 2GB AM + 1GB * 2 maps
> Another job, J4, is submitted in Queue B [J4 needs a 2GB AM + 1GB * 2 maps ].
> Currently in this scenario, job J3 will get killed, including its AM.
> It would be better if the AM could be given the least priority among multiple 
> applications. In this same scenario, map tasks from J3 and J2 could be 
> preempted instead; later, when the cluster frees up, maps could be 
> reallocated to those jobs.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2250) Moving apps between queues - FairScheduler

2014-07-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14051299#comment-14051299
 ] 

Hadoop QA commented on YARN-2250:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12653793/YARN-2250-1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4191//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4191//console

This message is automatically generated.

> Moving apps between queues - FairScheduler
> --
>
> Key: YARN-2250
> URL: https://issues.apache.org/jira/browse/YARN-2250
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.4.0, 2.4.1
>Reporter: Krisztian Horvath
> Attachments: YARN-2250-1.patch
>
>
> We need to update the queue metrics only up to the lowest common ancestor of 
> the target and source queues, but this lookup method fails to retrieve the 
> right queue.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2250) Moving apps between queues - FairScheduler

2014-07-03 Thread Krisztian Horvath (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14051241#comment-14051241
 ] 

Krisztian Horvath commented on YARN-2250:
-

Do I need to create a review request for this?

> Moving apps between queues - FairScheduler
> --
>
> Key: YARN-2250
> URL: https://issues.apache.org/jira/browse/YARN-2250
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.4.0, 2.4.1
>Reporter: Krisztian Horvath
> Attachments: YARN-2250-1.patch
>
>
> We need to update the queue metrics only up to the lowest common ancestor of 
> the target and source queues, but this lookup method fails to retrieve the 
> right queue.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2250) Moving apps between queues - FairScheduler

2014-07-03 Thread Krisztian Horvath (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Horvath updated YARN-2250:


Attachment: YARN-2250-1.patch

> Moving apps between queues - FairScheduler
> --
>
> Key: YARN-2250
> URL: https://issues.apache.org/jira/browse/YARN-2250
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.4.0, 2.4.1
>Reporter: Krisztian Horvath
> Attachments: YARN-2250-1.patch
>
>
> We need to update the queue metrics only up to the lowest common ancestor of 
> the target and source queues, but this lookup method fails to retrieve the 
> right queue.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2229) Making ContainerId long type

2014-07-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14051218#comment-14051218
 ] 

Hadoop QA commented on YARN-2229:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12653777/YARN-2229.5.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 1305 javac 
compiler warnings (more than the trunk's current 1258 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4190//testReport/
Javac warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/4190//artifact/trunk/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4190//console

This message is automatically generated.

> Making ContainerId long type
> 
>
> Key: YARN-2229
> URL: https://issues.apache.org/jira/browse/YARN-2229
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Tsuyoshi OZAWA
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-2229.1.patch, YARN-2229.2.patch, YARN-2229.2.patch, 
> YARN-2229.3.patch, YARN-2229.4.patch, YARN-2229.5.patch
>
>
> In YARN-2052, we changed the containerId format: the upper 10 bits are for 
> the epoch, and the lower 22 bits are for the container sequence number. This 
> preserves the semantics of {{ContainerId#getId()}}, {{ContainerId#toString()}}, 
> {{ContainerId#compareTo()}}, {{ContainerId#equals}}, and 
> {{ConverterUtils#toContainerId}}. One concern is that the epoch can overflow 
> after the RM restarts 1024 times.
> To avoid that problem, it's better to make containerId a long. On this JIRA 
> we need to define the new container Id format while preserving backward 
> compatibility.
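
For concreteness, the current int layout described above boils down to plain 
bit arithmetic (an illustrative sketch, not the YARN-2052 source):

{noformat}
// Hedged sketch of the 32-bit layout: upper 10 bits hold the epoch, lower
// 22 bits hold the container sequence number.
public class ContainerIdBits {
  static final int SEQ_BITS = 22;
  static final int SEQ_MASK = (1 << SEQ_BITS) - 1;

  static int encode(int epoch, int sequence) {
    return (epoch << SEQ_BITS) | (sequence & SEQ_MASK);
  }

  static int epochOf(int id) {
    return id >>> SEQ_BITS;  // only 10 bits: overflows after 1024 epochs
  }

  static int sequenceOf(int id) {
    return id & SEQ_MASK;
  }

  public static void main(String[] args) {
    int id = encode(3, 42);
    System.out.println(epochOf(id) + " / " + sequenceOf(id));  // 3 / 42
  }
}
{noformat}

Widening the id to a long would give the epoch far more headroom, which is 
the change proposed here.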



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2250) Moving apps between queues - FairScheduler

2014-07-03 Thread Krisztian Horvath (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14051217#comment-14051217
 ] 

Krisztian Horvath commented on YARN-2250:
-

I'm preparing a patch for this that you can then review.

> Moving apps between queues - FairScheduler
> --
>
> Key: YARN-2250
> URL: https://issues.apache.org/jira/browse/YARN-2250
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.4.0, 2.4.1
>Reporter: Krisztian Horvath
>
> We need to update the queue metrics only up to the lowest common ancestor of 
> the target and source queues, but this lookup method fails to retrieve the 
> right queue.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2250) Moving apps between queues - FairScheduler

2014-07-03 Thread Krisztian Horvath (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14051192#comment-14051192
 ] 

Krisztian Horvath commented on YARN-2250:
-

Hi Sandy,

The problem is not that the metrics are not updated, but that the 
findLowestCommonAncestorQueue method always returns null, causing the update 
to always propagate upwards all the way to the root.

root.queue1.a
root.queue1.b

The common ancestor should be root.queue1, but null is returned, so the 
update goes all the way up to root (see the sketch below).
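
To illustrate the expected behaviour on dotted queue names, a standalone 
sketch (not the FairScheduler code itself):

{noformat}
// Hedged sketch: lowest common ancestor of two dotted queue names, e.g.
// lca("root.queue1.a", "root.queue1.b") should be "root.queue1".
public class QueueLca {
  static String lca(String a, String b) {
    String[] pa = a.split("\\.");
    String[] pb = b.split("\\.");
    StringBuilder out = new StringBuilder();
    for (int i = 0; i < Math.min(pa.length, pb.length); i++) {
      if (!pa[i].equals(pb[i])) {
        break;
      }
      if (out.length() > 0) {
        out.append('.');
      }
      out.append(pa[i]);
    }
    return out.length() == 0 ? null : out.toString();
  }

  public static void main(String[] args) {
    // Prints "root.queue1": the queue at which the metrics update should stop.
    System.out.println(lca("root.queue1.a", "root.queue1.b"));
  }
}
{noformat}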

> Moving apps between queues - FairScheduler
> --
>
> Key: YARN-2250
> URL: https://issues.apache.org/jira/browse/YARN-2250
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.4.0, 2.4.1
>Reporter: Krisztian Horvath
>
> We need to update the queue metrics only up to the lowest common ancestor of 
> the target and source queues, but this lookup method fails to retrieve the 
> right queue.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2250) Moving apps between queues - FairScheduler

2014-07-03 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14051169#comment-14051169
 ] 

Sandy Ryza commented on YARN-2250:
--

Hi Krisztian,
Would you mind including an example of a situation where the metrics end up 
wrong?

> Moving apps between queues - FairScheduler
> --
>
> Key: YARN-2250
> URL: https://issues.apache.org/jira/browse/YARN-2250
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.4.0, 2.4.1
>Reporter: Krisztian Horvath
>
> We need to update the queue metrics only up to the lowest common ancestor of 
> the target and source queues, but this lookup method fails to retrieve the 
> right queue.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2250) Moving apps between queues - FairScheduler

2014-07-03 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated YARN-2250:
-

Target Version/s: 2.6.0
   Fix Version/s: (was: 3.0.0)

> Moving apps between queues - FairScheduler
> --
>
> Key: YARN-2250
> URL: https://issues.apache.org/jira/browse/YARN-2250
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.4.0, 2.4.1
>Reporter: Krisztian Horvath
>
> We need to update the queue metrics only up to the lowest common ancestor of 
> the target and source queues, but this lookup method fails to retrieve the 
> right queue.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2250) Moving apps between queues - FairScheduler

2014-07-03 Thread Krisztian Horvath (JIRA)
Krisztian Horvath created YARN-2250:
---

 Summary: Moving apps between queues - FairScheduler
 Key: YARN-2250
 URL: https://issues.apache.org/jira/browse/YARN-2250
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.4.1, 2.4.0
Reporter: Krisztian Horvath
 Fix For: 3.0.0


We need to update the queue metrics only up to the lowest common ancestor of 
the target and source queues, but this lookup method fails to retrieve the 
right queue.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2229) Making ContainerId long type

2014-07-03 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-2229:
-

Attachment: YARN-2229.5.patch

Fixed the javac and findbugs warnings.

> Making ContainerId long type
> 
>
> Key: YARN-2229
> URL: https://issues.apache.org/jira/browse/YARN-2229
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Tsuyoshi OZAWA
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-2229.1.patch, YARN-2229.2.patch, YARN-2229.2.patch, 
> YARN-2229.3.patch, YARN-2229.4.patch, YARN-2229.5.patch
>
>
> In YARN-2052, we changed the containerId format: the upper 10 bits are for 
> the epoch, and the lower 22 bits are for the container sequence number. This 
> preserves the semantics of {{ContainerId#getId()}}, {{ContainerId#toString()}}, 
> {{ContainerId#compareTo()}}, {{ContainerId#equals}}, and 
> {{ConverterUtils#toContainerId}}. One concern is that the epoch can overflow 
> after the RM restarts 1024 times.
> To avoid that problem, it's better to make containerId a long. On this JIRA 
> we need to define the new container Id format while preserving backward 
> compatibility.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1366) AM should implement Resync with the ApplicationMasterService instead of shutting down

2014-07-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14051141#comment-14051141
 ] 

Hadoop QA commented on YARN-1366:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12653772/YARN-1366.12.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4189//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4189//console

This message is automatically generated.

> AM should implement Resync with the ApplicationMasterService instead of 
> shutting down
> -
>
> Key: YARN-1366
> URL: https://issues.apache.org/jira/browse/YARN-1366
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Bikas Saha
>Assignee: Rohith
> Attachments: YARN-1366.1.patch, YARN-1366.10.patch, 
> YARN-1366.11.patch, YARN-1366.12.patch, YARN-1366.2.patch, YARN-1366.3.patch, 
> YARN-1366.4.patch, YARN-1366.5.patch, YARN-1366.6.patch, YARN-1366.7.patch, 
> YARN-1366.8.patch, YARN-1366.9.patch, YARN-1366.patch, 
> YARN-1366.prototype.patch, YARN-1366.prototype.patch
>
>
> The ApplicationMasterService currently sends a resync response, to which the 
> AM responds by shutting down. The AM behavior is expected to change to 
> resyncing with the RM instead. Resync means resetting the allocate RPC 
> sequence number to 0, after which the AM should send its entire outstanding 
> request to the RM. Note that if the AM is making its first allocate call to 
> the RM, things should proceed as normal without needing a resync. The RM 
> will return all containers that have completed since it last synced with the 
> AM, so some container completions may be reported more than once.
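
A toy sketch of that contract (the types below are made up; the real 
client-side change lives in the attached patches): on resync, reset the 
sequence number to 0 and resend everything still outstanding.

{noformat}
// Hedged sketch of the resync behaviour described above.
import java.util.ArrayList;
import java.util.List;

public class ResyncSketch {
  int responseId = 7;  // allocate RPC sequence number
  final List<String> outstanding = new ArrayList<String>();

  void onResync() {
    responseId = 0;  // reset the allocate sequence number
    for (String request : outstanding) {
      System.out.println("resending: " + request);
    }
    // The RM may now report some container completions more than once,
    // so the AM has to tolerate duplicates.
  }

  public static void main(String[] args) {
    ResyncSketch am = new ResyncSketch();
    am.outstanding.add("container-request-1");
    am.onResync();
  }
}
{noformat}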



--
This message was sent by Atlassian JIRA
(v6.2#6252)