[jira] [Updated] (YARN-2142) Add one service to check the nodes' TRUST status

2014-06-30 Thread anders (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anders updated YARN-2142:
-

Attachment: trust003.patch

 Add one service to check the nodes' TRUST status 
 -

 Key: YARN-2142
 URL: https://issues.apache.org/jira/browse/YARN-2142
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: nodemanager, resourcemanager, scheduler, webapp
Affects Versions: 2.2.0
 Environment: OS:Ubuntu 13.04; 
 JAVA:OpenJDK 7u51-2.4.4-0
Reporter: anders
Priority: Minor
  Labels: patch
 Fix For: 2.2.0

 Attachments: test.patch, trust.patch, trust.patch, trust.patch, 
 trust001.patch, trust002.patch, trust003.patch

   Original Estimate: 1m
  Remaining Estimate: 1m

 Because of our critical computing environment, we must check every node's TRUST 
 status in the cluster (the TRUST status can be obtained through the API of the 
 OAT server), so I added this feature to Hadoop's scheduling.
 Through the TRUST check service, a node can obtain its own TRUST status and then 
 send it to the resource manager via the heartbeat, for use in scheduling.
 In the scheduling step, if a node's TRUST status is 'false', the node is skipped 
 until its TRUST status turns to 'true'.
 ***The logic of this feature is similar to the node health check service.
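 A rough sketch of how such a check could plug into the NodeManager is below. It 
 only illustrates the idea described above; OatClient and the method names are 
 assumptions, not the classes added in trust003.patch.
 {code}
 // Hedged sketch only -- names (OatClient, TrustCheckService) are illustrative.
 public class TrustCheckService {
   /** Assumed thin client for the OAT server's attestation API. */
   public interface OatClient {
     boolean isTrusted(String hostname);
   }
 
   private final OatClient oat;
   private volatile boolean trusted = true;
 
   public TrustCheckService(OatClient oat) {
     this.oat = oat;
   }
 
   /** Runs periodically on the NodeManager; the result is sent on the RM heartbeat. */
   public void refresh(String hostname) {
     trusted = oat.isTrusted(hostname);
   }
 
   /** The scheduler would skip allocating on this node while this returns false. */
   public boolean isNodeTrusted() {
     return trusted;
   }
 }
 {code}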



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1366) AM should implement Resync with the ApplicationMasterService instead of shutting down

2014-06-30 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14047412#comment-14047412
 ] 

Rohith commented on YARN-1366:
--

Thank you for reviewing the patch. I will update the patch soon. 

One update on the comment:
bq. testAMRMClientResendsRequestsOnRMRestart seems not testing re-sending 
pendingReleases across RM restart, because the pending releases seems already 
decremented to zero before restart happens
The test verifies the pending releases as well. Before the restart, the 1st 
container is released; the code below releases only one container. One way to 
make the test stronger is to assert the number of containers before/after this 
code, as in the sketch after the code block.
{noformat}
int pendingRelease = 0;
Iterator<Container> it = allocatedContainers.iterator();
while (it.hasNext()) {
  amClient.releaseAssignedContainer(it.next().getId());
  pendingRelease++;
  it.remove();
  break;// remove one container
}
{noformat}
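For example, the stronger check could look roughly like this (a sketch only; it 
reuses the variables from the snippet above and plain JUnit asserts):
{noformat}
// Sketch of the suggested assertion, not part of the current patch:
// verify the counts before and after the single release above.
int before = allocatedContainers.size();
// ... run the release loop shown above ...
Assert.assertEquals(1, pendingRelease);
Assert.assertEquals(before - 1, allocatedContainers.size());
{noformat}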

 AM should implement Resync with the ApplicationMasterService instead of 
 shutting down
 -

 Key: YARN-1366
 URL: https://issues.apache.org/jira/browse/YARN-1366
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Rohith
 Attachments: YARN-1366.1.patch, YARN-1366.2.patch, YARN-1366.3.patch, 
 YARN-1366.4.patch, YARN-1366.5.patch, YARN-1366.6.patch, YARN-1366.7.patch, 
 YARN-1366.patch, YARN-1366.prototype.patch, YARN-1366.prototype.patch


 The ApplicationMasterService currently sends a resync response, to which the 
 AM responds by shutting down. The AM behavior is expected to change to 
 resyncing with the RM instead. Resync means resetting the allocate RPC 
 sequence number to 0, after which the AM should re-send all of its outstanding 
 requests to the RM. Note that if the AM is making its first allocate call to the 
 RM, things should proceed as normal without needing a resync. The RM will 
 return all containers that have completed since the RM last synced with the 
 AM, so some container completions may be reported more than once.
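 A minimal sketch of the intended AM-side handling (not the actual AMRMClientImpl 
 change; lastResponseId and the two helper methods are assumed names):
 {code}
 // On a resync command, reset the sequence number and re-send everything
 // outstanding instead of shutting down.
 void onAllocateResponse(AllocateResponse response) {
   if (response.getAMCommand() == AMCommand.AM_RESYNC) {
     lastResponseId = 0;             // reset the allocate RPC sequence number to 0
     resendOutstandingRequests();    // assumed helper: asks, pending releases, blacklist
     return;
   }
   // Completed containers may be reported more than once after a resync,
   // so completion handling should be idempotent.
   handleCompletedContainers(response.getCompletedContainersStatuses());
 }
 {code}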



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2142) Add one service to check the nodes' TRUST status

2014-06-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14047413#comment-14047413
 ] 

Hadoop QA commented on YARN-2142:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12653105/trust003.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4139//console

This message is automatically generated.

 Add one service to check the nodes' TRUST status 
 -

 Key: YARN-2142
 URL: https://issues.apache.org/jira/browse/YARN-2142
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: nodemanager, resourcemanager, scheduler, webapp
Affects Versions: 2.2.0
 Environment: OS:Ubuntu 13.04; 
 JAVA:OpenJDK 7u51-2.4.4-0
Reporter: anders
Priority: Minor
  Labels: patch
 Fix For: 2.2.0

 Attachments: test.patch, trust.patch, trust.patch, trust.patch, 
 trust001.patch, trust002.patch, trust003.patch

   Original Estimate: 1m
  Remaining Estimate: 1m

 Because of our critical computing environment, we must check every node's TRUST 
 status in the cluster (the TRUST status can be obtained through the API of the 
 OAT server), so I added this feature to Hadoop's scheduling.
 Through the TRUST check service, a node can obtain its own TRUST status and then 
 send it to the resource manager via the heartbeat, for use in scheduling.
 In the scheduling step, if a node's TRUST status is 'false', the node is skipped 
 until its TRUST status turns to 'true'.
 ***The logic of this feature is similar to the node health check service.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2142) Add one service to check the nodes' TRUST status

2014-06-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14047425#comment-14047425
 ] 

Hadoop QA commented on YARN-2142:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12653106/trust2.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4140//console

This message is automatically generated.

 Add one service to check the nodes' TRUST status 
 -

 Key: YARN-2142
 URL: https://issues.apache.org/jira/browse/YARN-2142
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: nodemanager, resourcemanager, scheduler, webapp
Affects Versions: 2.2.0
 Environment: OS:Ubuntu 13.04; 
 JAVA:OpenJDK 7u51-2.4.4-0
Reporter: anders
Priority: Minor
  Labels: patch
 Fix For: 2.2.0

 Attachments: test.patch, trust.patch, trust.patch, trust.patch, 
 trust001.patch, trust002.patch, trust003.patch, trust2.patch

   Original Estimate: 1m
  Remaining Estimate: 1m

 Because of our critical computing environment, we must check every node's TRUST 
 status in the cluster (the TRUST status can be obtained through the API of the 
 OAT server), so I added this feature to Hadoop's scheduling.
 Through the TRUST check service, a node can obtain its own TRUST status and then 
 send it to the resource manager via the heartbeat, for use in scheduling.
 In the scheduling step, if a node's TRUST status is 'false', the node is skipped 
 until its TRUST status turns to 'true'.
 ***The logic of this feature is similar to the node health check service.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1366) AM should implement Resync with the ApplicationMasterService instead of shutting down

2014-06-30 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14047432#comment-14047432
 ] 

Jian He commented on YARN-1366:
---

bq. The test does verification of pending release too
I see.
- Can we move release.addAll(this.pendingRelease); into the first 
isResyncCommand check? Similarly for 
blacklistAdditions.addAll(this.blacklistedNodes); (see the sketch below).
- Please add some comments to the pendingRelease and release variables.
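Something along these lines (a sketch of the suggestion, not the actual patch; the 
shape of the surrounding method is assumed):
{code}
if (isResyncCommand(allocateResponse)) {
  // re-send everything outstanding in one place on resync
  release.addAll(this.pendingRelease);
  blacklistAdditions.addAll(this.blacklistedNodes);
}
{code}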

 AM should implement Resync with the ApplicationMasterService instead of 
 shutting down
 -

 Key: YARN-1366
 URL: https://issues.apache.org/jira/browse/YARN-1366
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Rohith
 Attachments: YARN-1366.1.patch, YARN-1366.2.patch, YARN-1366.3.patch, 
 YARN-1366.4.patch, YARN-1366.5.patch, YARN-1366.6.patch, YARN-1366.7.patch, 
 YARN-1366.patch, YARN-1366.prototype.patch, YARN-1366.prototype.patch


 The ApplicationMasterService currently sends a resync response, to which the 
 AM responds by shutting down. The AM behavior is expected to change to 
 resyncing with the RM instead. Resync means resetting the allocate RPC 
 sequence number to 0, after which the AM should re-send all of its outstanding 
 requests to the RM. Note that if the AM is making its first allocate call to the 
 RM, things should proceed as normal without needing a resync. The RM will 
 return all containers that have completed since the RM last synced with the 
 AM, so some container completions may be reported more than once.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2142) Add one service to check the nodes' TRUST status

2014-06-30 Thread anders (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anders updated YARN-2142:
-

Environment: 
OS:Ubuntu 13.04; 
JAVA:OpenJDK 7u51-2.4.4-0
Only in branch-2.2.0.

  was:
OS:Ubuntu 13.04; 
JAVA:OpenJDK 7u51-2.4.4-0


 Add one service to check the nodes' TRUST status 
 -

 Key: YARN-2142
 URL: https://issues.apache.org/jira/browse/YARN-2142
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: nodemanager, resourcemanager, scheduler, webapp
Affects Versions: 2.2.0
 Environment: OS:Ubuntu 13.04; 
 JAVA:OpenJDK 7u51-2.4.4-0
 Only in branch-2.2.0.
Reporter: anders
Priority: Minor
  Labels: patch
 Fix For: 2.2.0

 Attachments: test.patch, trust.patch, trust.patch, trust.patch, 
 trust001.patch, trust002.patch, trust003.patch, trust2.patch

   Original Estimate: 1m
  Remaining Estimate: 1m

 Because of our critical computing environment, we must check every node's TRUST 
 status in the cluster (the TRUST status can be obtained through the API of the 
 OAT server), so I added this feature to Hadoop's scheduling.
 Through the TRUST check service, a node can obtain its own TRUST status and then 
 send it to the resource manager via the heartbeat, for use in scheduling.
 In the scheduling step, if a node's TRUST status is 'false', the node is skipped 
 until its TRUST status turns to 'true'.
 ***The logic of this feature is similar to the node health check service.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2142) Add one service to check the nodes' TRUST status

2014-06-30 Thread anders (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anders updated YARN-2142:
-

Description: 
Because of our critical computing environment, we must check every node's TRUST 
status in the cluster (the TRUST status can be obtained through the API of the 
OAT server), so I added this feature to Hadoop's scheduling.
Through the TRUST check service, a node can obtain its own TRUST status and then 
send it to the resource manager via the heartbeat, for use in scheduling.
In the scheduling step, if a node's TRUST status is 'false', the node is skipped 
until its TRUST status turns to 'true'.

***The logic of this feature is similar to the node health check service.



  was:
Because of our critical computing environment, we must check every node's TRUST 
status in the cluster (the TRUST status can be obtained through the API of the 
OAT server), so I added this feature to Hadoop's scheduling.
Through the TRUST check service, a node can obtain its own TRUST status and then 
send it to the resource manager via the heartbeat, for use in scheduling.
In the scheduling step, if a node's TRUST status is 'false', the node is skipped 
until its TRUST status turns to 'true'.

***The logic of this feature is similar to the node health check service.



 Add one service to check the nodes' TRUST status 
 -

 Key: YARN-2142
 URL: https://issues.apache.org/jira/browse/YARN-2142
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: nodemanager, resourcemanager, scheduler, webapp
Affects Versions: 2.2.0
 Environment: OS:Ubuntu 13.04; 
 JAVA:OpenJDK 7u51-2.4.4-0
 Only in branch-2.2.0.
Reporter: anders
Priority: Minor
  Labels: patch
 Fix For: 2.2.0

 Attachments: test.patch, trust.patch, trust.patch, trust.patch, 
 trust001.patch, trust002.patch, trust003.patch, trust2.patch

   Original Estimate: 1m
  Remaining Estimate: 1m

 Because of our critical computing environment, we must check every node's TRUST 
 status in the cluster (the TRUST status can be obtained through the API of the 
 OAT server), so I added this feature to Hadoop's scheduling.
 Through the TRUST check service, a node can obtain its own TRUST status and then 
 send it to the resource manager via the heartbeat, for use in scheduling.
 In the scheduling step, if a node's TRUST status is 'false', the node is skipped 
 until its TRUST status turns to 'true'.
 ***The logic of this feature is similar to the node health check service.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2142) Add one service to check the nodes' TRUST status

2014-06-30 Thread anders (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anders updated YARN-2142:
-

Description: 
Because of our critical computing environment, we must check every node's TRUST 
status in the cluster (the TRUST status can be obtained through the API of the 
OAT server), so I added this feature to Hadoop's scheduling.
Through the TRUST check service, a node can obtain its own TRUST status and then 
send it to the resource manager via the heartbeat, for use in scheduling.
In the scheduling step, if a node's TRUST status is 'false', the node is skipped 
until its TRUST status turns to 'true'.

***The logic of this feature is similar to the node health check service.
***Only in branch-2.2.0, not in trunk***


  was:
Because of our critical computing environment, we must check every node's TRUST 
status in the cluster (the TRUST status can be obtained through the API of the 
OAT server), so I added this feature to Hadoop's scheduling.
Through the TRUST check service, a node can obtain its own TRUST status and then 
send it to the resource manager via the heartbeat, for use in scheduling.
In the scheduling step, if a node's TRUST status is 'false', the node is skipped 
until its TRUST status turns to 'true'.

***The logic of this feature is similar to the node health check service.
--Only in branch-2.2.0, not in trunk--



 Add one service to check the nodes' TRUST status 
 -

 Key: YARN-2142
 URL: https://issues.apache.org/jira/browse/YARN-2142
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: nodemanager, resourcemanager, scheduler, webapp
Affects Versions: 2.2.0
 Environment: OS:Ubuntu 13.04; 
 JAVA:OpenJDK 7u51-2.4.4-0
 Only in branch-2.2.0.
Reporter: anders
Priority: Minor
  Labels: patch
 Fix For: 2.2.0

 Attachments: test.patch, trust.patch, trust.patch, trust.patch, 
 trust001.patch, trust002.patch, trust003.patch, trust2.patch

   Original Estimate: 1m
  Remaining Estimate: 1m

 Because of our critical computing environment, we must check every node's TRUST 
 status in the cluster (the TRUST status can be obtained through the API of the 
 OAT server), so I added this feature to Hadoop's scheduling.
 Through the TRUST check service, a node can obtain its own TRUST status and then 
 send it to the resource manager via the heartbeat, for use in scheduling.
 In the scheduling step, if a node's TRUST status is 'false', the node is skipped 
 until its TRUST status turns to 'true'.
 ***The logic of this feature is similar to the node health check service.
 ***Only in branch-2.2.0, not in trunk***



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2142) Add one service to check the nodes' TRUST status

2014-06-30 Thread anders (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anders updated YARN-2142:
-

Description: 
Because of our critical computing environment, we must check every node's TRUST 
status in the cluster (the TRUST status can be obtained through the API of the 
OAT server), so I added this feature to Hadoop's scheduling.
Through the TRUST check service, a node can obtain its own TRUST status and then 
send it to the resource manager via the heartbeat, for use in scheduling.
In the scheduling step, if a node's TRUST status is 'false', the node is skipped 
until its TRUST status turns to 'true'.

***The logic of this feature is similar to the node health check service.
Only in branch-2.2.0, not in trunk


  was:
Because of our critical computing environment, we must check every node's TRUST 
status in the cluster (the TRUST status can be obtained through the API of the 
OAT server), so I added this feature to Hadoop's scheduling.
Through the TRUST check service, a node can obtain its own TRUST status and then 
send it to the resource manager via the heartbeat, for use in scheduling.
In the scheduling step, if a node's TRUST status is 'false', the node is skipped 
until its TRUST status turns to 'true'.

***The logic of this feature is similar to the node health check service.




 Add one service to check the nodes' TRUST status 
 -

 Key: YARN-2142
 URL: https://issues.apache.org/jira/browse/YARN-2142
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: nodemanager, resourcemanager, scheduler, webapp
Affects Versions: 2.2.0
 Environment: OS:Ubuntu 13.04; 
 JAVA:OpenJDK 7u51-2.4.4-0
 Only in branch-2.2.0.
Reporter: anders
Priority: Minor
  Labels: patch
 Fix For: 2.2.0

 Attachments: test.patch, trust.patch, trust.patch, trust.patch, 
 trust001.patch, trust002.patch, trust003.patch, trust2.patch

   Original Estimate: 1m
  Remaining Estimate: 1m

 Because of our critical computing environment, we must check every node's TRUST 
 status in the cluster (the TRUST status can be obtained through the API of the 
 OAT server), so I added this feature to Hadoop's scheduling.
 Through the TRUST check service, a node can obtain its own TRUST status and then 
 send it to the resource manager via the heartbeat, for use in scheduling.
 In the scheduling step, if a node's TRUST status is 'false', the node is skipped 
 until its TRUST status turns to 'true'.
 ***The logic of this feature is similar to the node health check service.
 Only in branch-2.2.0, not in trunk



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2142) Add one service to check the nodes' TRUST status

2014-06-30 Thread anders (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anders updated YARN-2142:
-

Description: 
Because of our critical computing environment, we must check every node's TRUST 
status in the cluster (the TRUST status can be obtained through the API of the 
OAT server), so I added this feature to Hadoop's scheduling.
Through the TRUST check service, a node can obtain its own TRUST status and then 
send it to the resource manager via the heartbeat, for use in scheduling.
In the scheduling step, if a node's TRUST status is 'false', the node is skipped 
until its TRUST status turns to 'true'.

***The logic of this feature is similar to the node health check service.
--Only in branch-2.2.0, not in trunk--


  was:
Because of our critical computing environment, we must check every node's TRUST 
status in the cluster (the TRUST status can be obtained through the API of the 
OAT server), so I added this feature to Hadoop's scheduling.
Through the TRUST check service, a node can obtain its own TRUST status and then 
send it to the resource manager via the heartbeat, for use in scheduling.
In the scheduling step, if a node's TRUST status is 'false', the node is skipped 
until its TRUST status turns to 'true'.

***The logic of this feature is similar to the node health check service.
Only in branch-2.2.0, not in trunk



 Add one service to check the nodes' TRUST status 
 -

 Key: YARN-2142
 URL: https://issues.apache.org/jira/browse/YARN-2142
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: nodemanager, resourcemanager, scheduler, webapp
Affects Versions: 2.2.0
 Environment: OS:Ubuntu 13.04; 
 JAVA:OpenJDK 7u51-2.4.4-0
 Only in branch-2.2.0.
Reporter: anders
Priority: Minor
  Labels: patch
 Fix For: 2.2.0

 Attachments: test.patch, trust.patch, trust.patch, trust.patch, 
 trust001.patch, trust002.patch, trust003.patch, trust2.patch

   Original Estimate: 1m
  Remaining Estimate: 1m

 Because of our critical computing environment, we must check every node's TRUST 
 status in the cluster (the TRUST status can be obtained through the API of the 
 OAT server), so I added this feature to Hadoop's scheduling.
 Through the TRUST check service, a node can obtain its own TRUST status and then 
 send it to the resource manager via the heartbeat, for use in scheduling.
 In the scheduling step, if a node's TRUST status is 'false', the node is skipped 
 until its TRUST status turns to 'true'.
 ***The logic of this feature is similar to the node health check service.
 --Only in branch-2.2.0, not in trunk--



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2142) Add one service to check the nodes' TRUST status

2014-06-30 Thread anders (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anders updated YARN-2142:
-

Target Version/s:   (was: 2.2.0)
   Fix Version/s: (was: 2.2.0)

 Add one service to check the nodes' TRUST status 
 -

 Key: YARN-2142
 URL: https://issues.apache.org/jira/browse/YARN-2142
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: nodemanager, resourcemanager, scheduler, webapp
Affects Versions: 2.2.0
 Environment: OS:Ubuntu 13.04; 
 JAVA:OpenJDK 7u51-2.4.4-0
 Only in branch-2.2.0.
Reporter: anders
Priority: Minor
  Labels: patch
 Attachments: test.patch, trust.patch, trust.patch, trust.patch, 
 trust001.patch, trust002.patch, trust003.patch, trust2.patch

   Original Estimate: 1m
  Remaining Estimate: 1m

 Because of our critical computing environment, we must check every node's TRUST 
 status in the cluster (the TRUST status can be obtained through the API of the 
 OAT server), so I added this feature to Hadoop's scheduling.
 Through the TRUST check service, a node can obtain its own TRUST status and then 
 send it to the resource manager via the heartbeat, for use in scheduling.
 In the scheduling step, if a node's TRUST status is 'false', the node is skipped 
 until its TRUST status turns to 'true'.
 ***The logic of this feature is similar to the node health check service.
 ***Only in branch-2.2.0, not in trunk***



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-570) Time strings are formatted in different timezones

2014-06-30 Thread Akira AJISAKA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14047459#comment-14047459
 ] 

Akira AJISAKA commented on YARN-570:


Thanks [~ozawa] for the comment.
Attaching a patch to change {{renderHadoopDate}} to return almost the same 
format.
JavaScript doesn't support the z pattern (e.g. PDT, JST, ...), so the patch 
outputs the Z pattern (e.g. -0800, +0900, ...) instead.
In my environment, the date is rendered as follows:
{code}
Mon Jun 30 16:48:18 +0900 2014 // EEE MMM dd hh:mm:ss Z 
{code}
I think it's better to render the date in Java instead of JavaScript so that the 
format is the same everywhere.
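For instance, formatting on the Java side could be as simple as the sketch below 
(the exact pattern to standardize on is still open; startTime is a placeholder):
{code}
// Sketch only: produce the same pattern and numeric timezone offset on every page.
SimpleDateFormat fmt = new SimpleDateFormat("EEE MMM dd HH:mm:ss Z yyyy");
String rendered = fmt.format(new Date(startTime));  // e.g. "Mon Jun 30 16:48:18 +0900 2014"
{code}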

 Time strings are formatted in different timezones
 ---

 Key: YARN-570
 URL: https://issues.apache.org/jira/browse/YARN-570
 Project: Hadoop YARN
  Issue Type: Bug
  Components: webapp
Affects Versions: 2.2.0
Reporter: Peng Zhang
Assignee: Akira AJISAKA
 Attachments: MAPREDUCE-5141.patch, YARN-570.2.patch


 Time strings on different pages are displayed in different timezones.
 If a time is rendered by renderHadoopDate() in yarn.dt.plugins.js, it appears as 
 Wed, 10 Apr 2013 08:29:56 GMT.
 If it is formatted by format() in yarn.util.Times, it appears as 
 10-Apr-2013 16:29:56.
 Same value, but a different timezone.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-570) Time strings are formatted in different timezones

2014-06-30 Thread Akira AJISAKA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira AJISAKA updated YARN-570:
---

Attachment: YARN-570.3.patch

 Time strings are formatted in different timezones
 ---

 Key: YARN-570
 URL: https://issues.apache.org/jira/browse/YARN-570
 Project: Hadoop YARN
  Issue Type: Bug
  Components: webapp
Affects Versions: 2.2.0
Reporter: Peng Zhang
Assignee: Akira AJISAKA
 Attachments: MAPREDUCE-5141.patch, YARN-570.2.patch, YARN-570.3.patch


 Time strings on different pages are displayed in different timezones.
 If a time is rendered by renderHadoopDate() in yarn.dt.plugins.js, it appears as 
 Wed, 10 Apr 2013 08:29:56 GMT.
 If it is formatted by format() in yarn.util.Times, it appears as 
 10-Apr-2013 16:29:56.
 Same value, but a different timezone.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-570) Time strings are formatted in different timezones

2014-06-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14047473#comment-14047473
 ] 

Hadoop QA commented on YARN-570:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12653110/YARN-570.3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4141//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4141//console

This message is automatically generated.

 Time strings are formatted in different timezones
 ---

 Key: YARN-570
 URL: https://issues.apache.org/jira/browse/YARN-570
 Project: Hadoop YARN
  Issue Type: Bug
  Components: webapp
Affects Versions: 2.2.0
Reporter: Peng Zhang
Assignee: Akira AJISAKA
 Attachments: MAPREDUCE-5141.patch, YARN-570.2.patch, YARN-570.3.patch


 Time strings on different pages are displayed in different timezones.
 If a time is rendered by renderHadoopDate() in yarn.dt.plugins.js, it appears as 
 Wed, 10 Apr 2013 08:29:56 GMT.
 If it is formatted by format() in yarn.util.Times, it appears as 
 10-Apr-2013 16:29:56.
 Same value, but a different timezone.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2181) Add preemption info to RM Web UI

2014-06-30 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-2181:
-

Attachment: YARN-2181.patch

 Add preemption info to RM Web UI
 

 Key: YARN-2181
 URL: https://issues.apache.org/jira/browse/YARN-2181
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, webapp
Affects Versions: 2.4.0
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, 
 YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, 
 YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, 
 YARN-2181.patch, application page.png, queue page.png


 We need to add preemption info to the RM web pages so that administrators/users 
 can better understand the preemption that happened to an app, etc. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2181) Add preemption info to RM Web UI

2014-06-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14047506#comment-14047506
 ] 

Hadoop QA commented on YARN-2181:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12653118/YARN-2181.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4142//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4142//console

This message is automatically generated.

 Add preemption info to RM Web UI
 

 Key: YARN-2181
 URL: https://issues.apache.org/jira/browse/YARN-2181
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, webapp
Affects Versions: 2.4.0
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, 
 YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, 
 YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, 
 YARN-2181.patch, application page.png, queue page.png


 We need to add preemption info to the RM web pages so that administrators/users 
 can better understand the preemption that happened to an app, etc. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2052) ContainerId creation after work preserving restart is broken

2014-06-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14047563#comment-14047563
 ] 

Hudson commented on YARN-2052:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #599 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/599/])
YARN-2052. Embedded an epoch number in container id to ensure the uniqueness of 
container id after RM restarts. Contributed by Tsuyoshi OZAWA (jianhe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1606557)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/server/yarn_server_resourcemanager_service_protos.proto
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContext.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/MemoryRMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/NullRMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/Epoch.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/impl/pb/EpochPBImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestWorkPreservingRMRestart.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreTestBase.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestFSRMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestSchedulerApplicationAttempt.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFSSchedulerApp.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestMaxRunningAppsEnforcer.java


 ContainerId creation after work preserving restart is broken
 

 Key: YARN-2052
 URL: https://issues.apache.org/jira/browse/YARN-2052
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Tsuyoshi OZAWA
 

[jira] [Created] (YARN-2232) ClientRMService doesn't allow delegation token owner to cancel their own token

2014-06-30 Thread Varun Vasudev (JIRA)
Varun Vasudev created YARN-2232:
---

 Summary: ClientRMService doesn't allow delegation token owner to 
cancel their own token
 Key: YARN-2232
 URL: https://issues.apache.org/jira/browse/YARN-2232
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-yarn-2232.0.patch

The ClientRMService doesn't allow delegation token owners to cancel their own 
tokens. The root cause is this piece of code from the cancelDelegationToken 
function -
{noformat}
String user = getRenewerForToken(token);
...

private String getRenewerForToken(Token<RMDelegationTokenIdentifier> token) 
throws IOException {
  UserGroupInformation user = UserGroupInformation.getCurrentUser();
  UserGroupInformation loginUser = UserGroupInformation.getLoginUser();
  // we can always renew our own tokens
  return loginUser.getUserName().equals(user.getUserName())
  ? token.decodeIdentifier().getRenewer().toString()
  : user.getShortUserName();
}
{noformat}

It ends up passing the user short name to the cancelToken function whereas 
AbstractDelegationTokenSecretManager::cancelToken expects the full user name.
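One possible fix (a sketch only; this may or may not be what 
apache-yarn-2232.0.patch does) is to return the full user name so it matches what 
AbstractDelegationTokenSecretManager::cancelToken compares against:
{noformat}
private String getRenewerForToken(Token<RMDelegationTokenIdentifier> token)
    throws IOException {
  UserGroupInformation user = UserGroupInformation.getCurrentUser();
  UserGroupInformation loginUser = UserGroupInformation.getLoginUser();
  // we can always renew our own tokens
  return loginUser.getUserName().equals(user.getUserName())
      ? token.decodeIdentifier().getRenewer().toString()
      : user.getUserName();   // full user name instead of the short name
}
{noformat}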



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2232) ClientRMService doesn't allow delegation token owner to cancel their own token

2014-06-30 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-2232:


Attachment: apache-yarn-2232.0.patch

Uploaded patch with fix.

 ClientRMService doesn't allow delegation token owner to cancel their own token
 --

 Key: YARN-2232
 URL: https://issues.apache.org/jira/browse/YARN-2232
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-yarn-2232.0.patch


 The ClientRMService doesn't allow delegation token owners to cancel their own 
 tokens. The root cause is this piece of code from the cancelDelegationToken 
 function -
 {noformat}
 String user = getRenewerForToken(token);
 ...
 private String getRenewerForToken(Token<RMDelegationTokenIdentifier> token) 
 throws IOException {
   UserGroupInformation user = UserGroupInformation.getCurrentUser();
   UserGroupInformation loginUser = UserGroupInformation.getLoginUser();
   // we can always renew our own tokens
   return loginUser.getUserName().equals(user.getUserName())
   ? token.decodeIdentifier().getRenewer().toString()
   : user.getShortUserName();
 }
 {noformat}
 It ends up passing the user short name to the cancelToken function whereas 
 AbstractDelegationTokenSecretManager::cancelToken expects the full user name.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2232) ClientRMService doesn't allow delegation token owner to cancel their own token in secure mode

2014-06-30 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-2232:


Summary: ClientRMService doesn't allow delegation token owner to cancel 
their own token in secure mode  (was: ClientRMService doesn't allow delegation 
token owner to cancel their own token)

 ClientRMService doesn't allow delegation token owner to cancel their own 
 token in secure mode
 -

 Key: YARN-2232
 URL: https://issues.apache.org/jira/browse/YARN-2232
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-yarn-2232.0.patch


 The ClientRMService doesn't allow delegation token owners to cancel their own 
 tokens. The root cause is this piece of code from the cancelDelegationToken 
 function -
 {noformat}
 String user = getRenewerForToken(token);
 ...
 private String getRenewerForToken(Token<RMDelegationTokenIdentifier> token) 
 throws IOException {
   UserGroupInformation user = UserGroupInformation.getCurrentUser();
   UserGroupInformation loginUser = UserGroupInformation.getLoginUser();
   // we can always renew our own tokens
   return loginUser.getUserName().equals(user.getUserName())
   ? token.decodeIdentifier().getRenewer().toString()
   : user.getShortUserName();
 }
 {noformat}
 It ends up passing the user short name to the cancelToken function whereas 
 AbstractDelegationTokenSecretManager::cancelToken expects the full user name. 
 This bug occurs in secure mode and is not an issue with simple auth.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2232) ClientRMService doesn't allow delegation token owner to cancel their own token

2014-06-30 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-2232:


Description: 
The ClientRMService doesn't allow delegation token owners to cancel their own 
tokens. The root cause is this piece of code from the cancelDelegationToken 
function -
{noformat}
String user = getRenewerForToken(token);
...

private String getRenewerForToken(Token<RMDelegationTokenIdentifier> token) 
throws IOException {
  UserGroupInformation user = UserGroupInformation.getCurrentUser();
  UserGroupInformation loginUser = UserGroupInformation.getLoginUser();
  // we can always renew our own tokens
  return loginUser.getUserName().equals(user.getUserName())
  ? token.decodeIdentifier().getRenewer().toString()
  : user.getShortUserName();
}
{noformat}

It ends up passing the user short name to the cancelToken function whereas 
AbstractDelegationTokenSecretManager::cancelToken expects the full user name. 
This bug occurs in secure mode and is not an issue with simple auth.

  was:
The ClientRMService doesn't allow delegation token owners to cancel their own 
tokens. The root cause is this piece of code from the cancelDelegationToken 
function -
{noformat}
String user = getRenewerForToken(token);
...

private String getRenewerForToken(Token<RMDelegationTokenIdentifier> token) 
throws IOException {
  UserGroupInformation user = UserGroupInformation.getCurrentUser();
  UserGroupInformation loginUser = UserGroupInformation.getLoginUser();
  // we can always renew our own tokens
  return loginUser.getUserName().equals(user.getUserName())
  ? token.decodeIdentifier().getRenewer().toString()
  : user.getShortUserName();
}
{noformat}

It ends up passing the user short name to the cancelToken function whereas 
AbstractDelegationTokenSecretManager::cancelToken expects the full user name.


 ClientRMService doesn't allow delegation token owner to cancel their own token
 --

 Key: YARN-2232
 URL: https://issues.apache.org/jira/browse/YARN-2232
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-yarn-2232.0.patch


 The ClientRMService doesn't allow delegation token owners to cancel their own 
 tokens. The root cause is this piece of code from the cancelDelegationToken 
 function -
 {noformat}
 String user = getRenewerForToken(token);
 ...
 private String getRenewerForToken(Token<RMDelegationTokenIdentifier> token) 
 throws IOException {
   UserGroupInformation user = UserGroupInformation.getCurrentUser();
   UserGroupInformation loginUser = UserGroupInformation.getLoginUser();
   // we can always renew our own tokens
   return loginUser.getUserName().equals(user.getUserName())
   ? token.decodeIdentifier().getRenewer().toString()
   : user.getShortUserName();
 }
 {noformat}
 It ends up passing the user short name to the cancelToken function whereas 
 AbstractDelegationTokenSecretManager::cancelToken expects the full user name. 
 This bug occurs in secure mode and is not an issue with simple auth.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2233) Implement web services to create, renew and cancel delegation tokens

2014-06-30 Thread Varun Vasudev (JIRA)
Varun Vasudev created YARN-2233:
---

 Summary: Implement web services to create, renew and cancel 
delegation tokens
 Key: YARN-2233
 URL: https://issues.apache.org/jira/browse/YARN-2233
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Varun Vasudev
Assignee: Varun Vasudev


Implement functionality to create, renew and cancel delegation tokens.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2233) Implement web services to create, renew and cancel delegation tokens

2014-06-30 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-2233:


Attachment: apache-yarn-2233.0.patch

Uploaded patch.

 Implement web services to create, renew and cancel delegation tokens
 

 Key: YARN-2233
 URL: https://issues.apache.org/jira/browse/YARN-2233
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-yarn-2233.0.patch


 Implement functionality to create, renew and cancel delegation tokens.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2233) Implement web services to create, renew and cancel delegation tokens

2014-06-30 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14047596#comment-14047596
 ] 

Varun Vasudev commented on YARN-2233:
-

Adding blocker because one test assumes that owners can cancel their own 
delegation tokens. I'll update the patch if YARN-2232 is marked invalid.

 Implement web services to create, renew and cancel delegation tokens
 

 Key: YARN-2233
 URL: https://issues.apache.org/jira/browse/YARN-2233
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-yarn-2233.0.patch


 Implement functionality to create, renew and cancel delegation tokens.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2232) ClientRMService doesn't allow delegation token owner to cancel their own token in secure mode

2014-06-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14047599#comment-14047599
 ] 

Hadoop QA commented on YARN-2232:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12653139/apache-yarn-2232.0.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:red}-1 javac{color:red}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4143//console

This message is automatically generated.

 ClientRMService doesn't allow delegation token owner to cancel their own 
 token in secure mode
 -

 Key: YARN-2232
 URL: https://issues.apache.org/jira/browse/YARN-2232
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-yarn-2232.0.patch


 The ClientRMService doesn't allow delegation token owners to cancel their own 
 tokens. The root cause is this piece of code from the cancelDelegationToken 
 function -
 {noformat}
 String user = getRenewerForToken(token);
 ...
 private String getRenewerForToken(Token<RMDelegationTokenIdentifier> token) 
 throws IOException {
   UserGroupInformation user = UserGroupInformation.getCurrentUser();
   UserGroupInformation loginUser = UserGroupInformation.getLoginUser();
   // we can always renew our own tokens
   return loginUser.getUserName().equals(user.getUserName())
   ? token.decodeIdentifier().getRenewer().toString()
   : user.getShortUserName();
 }
 {noformat}
 It ends up passing the user short name to the cancelToken function whereas 
 AbstractDelegationTokenSecretManager::cancelToken expects the full user name. 
 This bug occurs in secure mode and is not an issue with simple auth.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1342) Recover container tokens upon nodemanager restart

2014-06-30 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14047598#comment-14047598
 ] 

Devaraj K commented on YARN-1342:
-

Unfortunately this patch has also gone stale; could you rebase it? Thanks.

 Recover container tokens upon nodemanager restart
 -

 Key: YARN-1342
 URL: https://issues.apache.org/jira/browse/YARN-1342
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.3.0
Reporter: Jason Lowe
Assignee: Jason Lowe
 Attachments: YARN-1342.patch, YARN-1342v2.patch, 
 YARN-1342v3-and-YARN-1987.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2232) ClientRMService doesn't allow delegation token owner to cancel their own token in secure mode

2014-06-30 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-2232:


Attachment: apache-yarn-2232.1.patch

Uploaded a new patch removing the unnecessary includes that caused the compilation error.

 ClientRMService doesn't allow delegation token owner to cancel their own 
 token in secure mode
 -

 Key: YARN-2232
 URL: https://issues.apache.org/jira/browse/YARN-2232
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-yarn-2232.0.patch, apache-yarn-2232.1.patch


 The ClientRMService doesn't allow delegation token owners to cancel their own 
 tokens. The root cause is this piece of code from the cancelDelegationToken 
 function -
 {noformat}
 String user = getRenewerForToken(token);
 ...
 private String getRenewerForToken(Token<RMDelegationTokenIdentifier> token) 
 throws IOException {
   UserGroupInformation user = UserGroupInformation.getCurrentUser();
   UserGroupInformation loginUser = UserGroupInformation.getLoginUser();
   // we can always renew our own tokens
   return loginUser.getUserName().equals(user.getUserName())
   ? token.decodeIdentifier().getRenewer().toString()
   : user.getShortUserName();
 }
 {noformat}
 It ends up passing the user short name to the cancelToken function whereas 
 AbstractDelegationTokenSecretManager::cancelToken expects the full user name. 
 This bug occurs in secure mode and is not an issue with simple auth.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1408) Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins

2014-06-30 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-1408:
--

Attachment: Yarn-1408.6.patch

Thank you very much [~leftnoteasy] for the comments.

bq. ResourceRequest stored in RMContainerImpl should include rack/any RR,
+1. Yes, RackLocal was missed, and while recreating the request there will be 
problems as you mentioned (relaxLocality).

bq. You can edit appSchedulingInfo.allocate to return a list a RRs
I think we can add a new API in appSchedulingInfo to return the list of 
ResourceRequests (node local, rack local and any), roughly as sketched below.

I updated the patch as per the comments; kindly check and share your opinion.
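The new API could look something like this (method name and shape are 
assumptions, not the final Yarn-1408.6.patch):
{code}
// Return the node-local, rack-local and ANY requests for a priority so that
// RMContainerImpl can store the complete list for re-creation after preemption.
public synchronized List<ResourceRequest> getAllResourceRequests(Priority priority) {
  List<ResourceRequest> ret = new ArrayList<ResourceRequest>();
  Map<String, ResourceRequest> requestsAtPriority = this.requests.get(priority);
  if (requestsAtPriority != null) {
    ret.addAll(requestsAtPriority.values());  // node-local, rack-local and ANY entries
  }
  return ret;
}
{code}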

 Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task 
 timeout for 30mins
 --

 Key: YARN-1408
 URL: https://issues.apache.org/jira/browse/YARN-1408
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.2.0
Reporter: Sunil G
Assignee: Sunil G
 Attachments: Yarn-1408.1.patch, Yarn-1408.2.patch, Yarn-1408.3.patch, 
 Yarn-1408.4.patch, Yarn-1408.5.patch, Yarn-1408.6.patch, Yarn-1408.patch


 Capacity preemption is enabled as follows.
  *  yarn.resourcemanager.scheduler.monitor.enable= true ,
  *  
 yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy
 Queue = a,b
 Capacity of Queue A = 80%
 Capacity of Queue B = 20%
 Step 1: Assign a big jobA to queue a, which uses the full cluster capacity.
 Step 2: Submit a jobB to queue b, which would use less than 20% of the cluster 
 capacity.
 The jobA task that uses queue b's capacity is preempted and killed.
 This caused the problem below:
 1. A new container was allocated for jobA in Queue A as per a node update 
 from an NM.
 2. This container was immediately preempted.
 Here an ACQUIRED at KILLED invalid-state exception occurred when the next AM 
 heartbeat reached the RM.
 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
 Can't handle this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 ACQUIRED at KILLED
 This also caused the task to time out for 30 minutes, as this container 
 had already been killed by preemption.
 attempt_1380289782418_0003_m_00_0 Timed out after 1800 secs



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2232) ClientRMService doesn't allow delegation token owner to cancel their own token in secure mode

2014-06-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14047655#comment-14047655
 ] 

Hadoop QA commented on YARN-2232:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12653148/apache-yarn-2232.1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4144//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4144//console

This message is automatically generated.

 ClientRMService doesn't allow delegation token owner to cancel their own 
 token in secure mode
 -

 Key: YARN-2232
 URL: https://issues.apache.org/jira/browse/YARN-2232
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-yarn-2232.0.patch, apache-yarn-2232.1.patch


 The ClientRMService doesn't allow delegation token owners to cancel their own 
 tokens. The root cause is this piece of code from the cancelDelegationToken 
 function -
 {noformat}
 String user = getRenewerForToken(token);
 ...
 private String getRenewerForToken(Token<RMDelegationTokenIdentifier> token) 
 throws IOException {
   UserGroupInformation user = UserGroupInformation.getCurrentUser();
   UserGroupInformation loginUser = UserGroupInformation.getLoginUser();
   // we can always renew our own tokens
   return loginUser.getUserName().equals(user.getUserName())
   ? token.decodeIdentifier().getRenewer().toString()
   : user.getShortUserName();
 }
 {noformat}
 It ends up passing the user short name to the cancelToken function whereas 
 AbstractDelegationTokenSecretManager::cancelToken expects the full user name. 
 This bug occurs in secure mode and is not an issue with simple auth.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2232) ClientRMService doesn't allow delegation token owner to cancel their own token in secure mode

2014-06-30 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14047656#comment-14047656
 ] 

Varun Vasudev commented on YARN-2232:
-

Test failure is unrelated.

 ClientRMService doesn't allow delegation token owner to cancel their own 
 token in secure mode
 -

 Key: YARN-2232
 URL: https://issues.apache.org/jira/browse/YARN-2232
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-yarn-2232.0.patch, apache-yarn-2232.1.patch


 The ClientRMService doesn't allow delegation token owners to cancel their own 
 tokens. The root cause is this piece of code from the cancelDelegationToken 
 function -
 {noformat}
 String user = getRenewerForToken(token);
 ...
 private String getRenewerForToken(Token<RMDelegationTokenIdentifier> token) 
 throws IOException {
   UserGroupInformation user = UserGroupInformation.getCurrentUser();
   UserGroupInformation loginUser = UserGroupInformation.getLoginUser();
   // we can always renew our own tokens
   return loginUser.getUserName().equals(user.getUserName())
   ? token.decodeIdentifier().getRenewer().toString()
   : user.getShortUserName();
 }
 {noformat}
 It ends up passing the user short name to the cancelToken function whereas 
 AbstractDelegationTokenSecretManager::cancelToken expects the full user name. 
 This bug occurs in secure mode and is not an issue with simple auth.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2052) ContainerId creation after work preserving restart is broken

2014-06-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14047665#comment-14047665
 ] 

Hudson commented on YARN-2052:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1817 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1817/])
YARN-2052. Embedded an epoch number in container id to ensure the uniqueness of 
container id after RM restarts. Contributed by Tsuyoshi OZAWA (jianhe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1606557)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/server/yarn_server_resourcemanager_service_protos.proto
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContext.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/MemoryRMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/NullRMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/Epoch.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/impl/pb/EpochPBImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestWorkPreservingRMRestart.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreTestBase.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestFSRMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestSchedulerApplicationAttempt.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFSSchedulerApp.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestMaxRunningAppsEnforcer.java


 ContainerId creation after work preserving restart is broken
 

 Key: YARN-2052
 URL: https://issues.apache.org/jira/browse/YARN-2052
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Tsuyoshi 

[jira] [Updated] (YARN-1366) AM should implement Resync with the ApplicationMasterService instead of shutting down

2014-06-30 Thread Rohith (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith updated YARN-1366:
-

Attachment: YARN-1366.8.patch

I updated the patch addressing comments.

bq. isApplicationMasterRegistered is actually not an argument, maybe throw 
ApplicationMasterNotRegisteredException in this case ? 
DONE
bq. pom.xml format: use spaces instead of tabs 
DONE
bq. Not related to this jira. Current ApplicationMasterService does not allow 
multiple registers. Application may want to update its tracking url etc. Should 
we make AMS accept multiple registers ? 
DONE
bq. please add some comments to pendingRelease and release variables
DONE

 AM should implement Resync with the ApplicationMasterService instead of 
 shutting down
 -

 Key: YARN-1366
 URL: https://issues.apache.org/jira/browse/YARN-1366
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Rohith
 Attachments: YARN-1366.1.patch, YARN-1366.2.patch, YARN-1366.3.patch, 
 YARN-1366.4.patch, YARN-1366.5.patch, YARN-1366.6.patch, YARN-1366.7.patch, 
 YARN-1366.8.patch, YARN-1366.patch, YARN-1366.prototype.patch, 
 YARN-1366.prototype.patch


 The ApplicationMasterService currently sends a resync response to which the 
 AM responds by shutting down. The AM behavior is expected to change to 
 calling resyncing with the RM. Resync means resetting the allocate RPC 
 sequence number to 0 and the AM should send its entire outstanding request to 
 the RM. Note that if the AM is making its first allocate call to the RM then 
 things should proceed like normal without needing a resync. The RM will 
 return all containers that have completed since the RM last synced with the 
 AM. Some container completions may be reported more than once.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2052) ContainerId creation after work preserving restart is broken

2014-06-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14047686#comment-14047686
 ] 

Hudson commented on YARN-2052:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1790 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1790/])
YARN-2052. Embedded an epoch number in container id to ensure the uniqueness of 
container id after RM restarts. Contributed by Tsuyoshi OZAWA (jianhe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1606557)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/server/yarn_server_resourcemanager_service_protos.proto
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContext.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/MemoryRMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/NullRMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/Epoch.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/records/impl/pb/EpochPBImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestWorkPreservingRMRestart.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreTestBase.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestFSRMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestSchedulerApplicationAttempt.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFSSchedulerApp.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestMaxRunningAppsEnforcer.java


 ContainerId creation after work preserving restart is broken
 

 Key: YARN-2052
 URL: https://issues.apache.org/jira/browse/YARN-2052
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Tsuyoshi OZAWA
   

[jira] [Created] (YARN-2234) Incorrect description in RM audit logs while refreshing Admin ACL

2014-06-30 Thread Varun Saxena (JIRA)
Varun Saxena created YARN-2234:
--

 Summary: Incorrect description in RM audit logs while refreshing 
Admin ACL
 Key: YARN-2234
 URL: https://issues.apache.org/jira/browse/YARN-2234
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Varun Saxena


In AdminService#refreshAdminAcls (AdminService.java:446), the failure RM audit 
log that is generated when the RM is not active has the following description:
  "ResourceManager is not active. Can not refresh user-groups."

This should instead be changed to "ResourceManager is not active. Can not 
refresh admin ACLs."
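
As an illustration only (assuming the usual five-argument 
RMAuditLogger.logFailure(user, operation, perm, target, description) form), the 
change amounts to:
{code}
// Before, in AdminService#refreshAdminAcls:
RMAuditLogger.logFailure(user.getShortUserName(), "refreshAdminAcls",
    adminAcl.toString(), "AdminService",
    "ResourceManager is not active. Can not refresh user-groups.");

// After: the description matches the operation actually being refused.
RMAuditLogger.logFailure(user.getShortUserName(), "refreshAdminAcls",
    adminAcl.toString(), "AdminService",
    "ResourceManager is not active. Can not refresh admin ACLs.");
{code}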



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1408) Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins

2014-06-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14047694#comment-14047694
 ] 

Hadoop QA commented on YARN-1408:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12653155/Yarn-1408.6.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4145//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4145//console

This message is automatically generated.

 Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task 
 timeout for 30mins
 --

 Key: YARN-1408
 URL: https://issues.apache.org/jira/browse/YARN-1408
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.2.0
Reporter: Sunil G
Assignee: Sunil G
 Attachments: Yarn-1408.1.patch, Yarn-1408.2.patch, Yarn-1408.3.patch, 
 Yarn-1408.4.patch, Yarn-1408.5.patch, Yarn-1408.6.patch, Yarn-1408.patch


 Capacity preemption is enabled as follows.
  *  yarn.resourcemanager.scheduler.monitor.enable= true ,
  *  
 yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy
 Queue = a,b
 Capacity of Queue A = 80%
 Capacity of Queue B = 20%
 Step 1: Assign a big jobA to queue a which uses the full cluster capacity.
 Step 2: Submit a jobB to queue b which would use less than 20% of cluster 
 capacity.
 The JobA task which uses queue b capacity is preempted and killed.
 This caused the below problem:
 1. A new container got allocated for jobA in Queue A as per a node update 
 from an NM.
 2. This container was then immediately preempted.
 Here an "ACQUIRED at KILLED" invalid state exception came when the next AM 
 heartbeat reached the RM.
 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
 Can't handle this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 ACQUIRED at KILLED
 This also caused the task to hit a 30-minute timeout, as this container 
 was already killed by preemption.
 attempt_1380289782418_0003_m_00_0 Timed out after 1800 secs



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2235) No success Audit log is present when Refresh Service ACL operation is performed using rmadmin

2014-06-30 Thread Varun Saxena (JIRA)
Varun Saxena created YARN-2235:
--

 Summary: No success Audit log is present when Refresh Service ACL 
operation is performed using rmadmin
 Key: YARN-2235
 URL: https://issues.apache.org/jira/browse/YARN-2235
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.5.0
Reporter: Varun Saxena


No success audit log is written when the Refresh Service ACL operation is 
performed using rmadmin.

In the AdminService#refreshServiceAcls method, only the failure audit log is present.
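
A minimal sketch of the missing call, assuming the usual 
RMAuditLogger.logSuccess(user, operation, target) form (illustrative, not the 
actual patch):
{code}
// In AdminService#refreshServiceAcls, after the ACLs are reloaded successfully:
RMAuditLogger.logSuccess(
    UserGroupInformation.getCurrentUser().getShortUserName(),
    "refreshServiceAcls", "AdminService");
{code}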





--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1366) AM should implement Resync with the ApplicationMasterService instead of shutting down

2014-06-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14047718#comment-14047718
 ] 

Hadoop QA commented on YARN-1366:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12653162/YARN-1366.8.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The following test timeouts occurred in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client:

org.apache.hadoop.yarn.client.TestResourceTrackerOnHA

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4146//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4146//console

This message is automatically generated.

 AM should implement Resync with the ApplicationMasterService instead of 
 shutting down
 -

 Key: YARN-1366
 URL: https://issues.apache.org/jira/browse/YARN-1366
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Rohith
 Attachments: YARN-1366.1.patch, YARN-1366.2.patch, YARN-1366.3.patch, 
 YARN-1366.4.patch, YARN-1366.5.patch, YARN-1366.6.patch, YARN-1366.7.patch, 
 YARN-1366.8.patch, YARN-1366.patch, YARN-1366.prototype.patch, 
 YARN-1366.prototype.patch


 The ApplicationMasterService currently sends a resync response to which the 
 AM responds by shutting down. The AM behavior is expected to change to 
 calling resyncing with the RM. Resync means resetting the allocate RPC 
 sequence number to 0 and the AM should send its entire outstanding request to 
 the RM. Note that if the AM is making its first allocate call to the RM then 
 things should proceed like normal without needing a resync. The RM will 
 return all containers that have completed since the RM last synced with the 
 AM. Some container completions may be reported more than once.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1408) Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins

2014-06-30 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14047755#comment-14047755
 ] 

Wangda Tan commented on YARN-1408:
--

Hi [~sunilg],
Thanks for updating the patch, overall approach LGTM, some comments,

1)
bq. I think we can have a new api in appSchedulingInfo to return list of 
ResourceRequests (node local, rack local and any).
I would suggest modifying the existing appSchedulingInfo.allocate to return the 
list of RRs. The outstanding-resource decrement logic already exists in 
allocate(); we can simply add each decremented RR to a list and return it. To 
me that list is a natural by-product of ASI.allocate.
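
A rough sketch of the suggested shape (signature approximated; not the actual patch):
{code}
// AppSchedulingInfo.allocate() already decrements outstanding requests; the
// idea is to collect each decremented ResourceRequest and hand the list back
// so the scheduler can re-add it if the container is preempted later.
public synchronized List<ResourceRequest> allocate(NodeType type,
    SchedulerNode node, Priority priority, ResourceRequest request,
    Container container) {
  List<ResourceRequest> resourceRequests = new ArrayList<ResourceRequest>();
  // ... existing NODE_LOCAL / RACK_LOCAL / OFF_SWITCH decrement logic, adding
  // each affected request (or a clone of it) to resourceRequests ...
  return resourceRequests;
}
{code}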

2)
{code}
if (type.equals(NodeType.NODE_LOCAL)) {
  list.add(nodeRequests.get(hostName));
}
{code}
It's better to clone the RR instead of adding a reference to the list. It works 
as-is, but cloning lets us set #containers correctly and prevents future changes 
to the RR inside ASI from leaking out.
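
For example, a cloned copy could be added instead of the live reference (a 
sketch, assuming the standard ResourceRequest.newInstance factory):
{code}
ResourceRequest nodeLocal = nodeRequests.get(hostName);
// Clone so later mutations inside AppSchedulingInfo do not leak into the
// returned list, and record that exactly one container was satisfied here.
list.add(ResourceRequest.newInstance(nodeLocal.getPriority(),
    nodeLocal.getResourceName(), nodeLocal.getCapability(), 1));
{code}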

3) TestCapacityScheduler:
It's good to have a test for FairScheduler here too. I think we can put the 
test in org.apache.hadoop.yarn.server.resourcemanager.scheduler and make it 
parameterized for Fair/Capacity/FIFO.

Two minor comments for TestCapacityScheduler.
3.1 
{code}
for (ResourceRequest request : requests) {
  // Skip the OffRack and RackLocal resource requests.
  if (request.getResourceName().equals(node.getRackName())
  || request.getResourceName().equals(ResourceRequest.ANY)) {
Assert.assertEquals(request.getNumContainers(), 1);
continue;
  }
  
  // Resource request must have added back in RM after preempt event 
handling.
  Assert.assertNotNull(app.getResourceRequest(request.getPriority(),
request.getResourceName()));
}
{code}
We can make it simpler to,
{code}
for (ResourceRequest request : requests) {
  // Resource request must have added back in RM after preempt event 
handling.
  Assert.assertEquals(1, app.getResourceRequest(request.getPriority(),
request.getResourceName()).getNumContainers());
}
{code}
Because we added them back, there's no difference between node/rack/any.

3.2
{code}
// allocate container
List<Container> containers = am1.allocate(new ArrayList<ResourceRequest>(),
    new ArrayList<ContainerId>()).getAllocatedContainers();

{code}
Should we wait for the containers to be allocated in a while loop? This works 
now because we previously called rm1.waitForState(nm1, ...), but it's better to 
wait for container allocation explicitly.
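
A minimal sketch of the kind of wait loop meant here, assuming the MockAM/MockNM 
helpers already used in the test (illustrative only):
{code}
// Poll allocate() until the scheduler has actually handed out a container,
// instead of relying on an earlier waitForState() call for the timing.
List<Container> containers = new ArrayList<Container>();
while (containers.isEmpty()) {
  nm1.nodeHeartbeat(true);   // trigger another scheduling cycle
  containers.addAll(am1.allocate(new ArrayList<ResourceRequest>(),
      new ArrayList<ContainerId>()).getAllocatedContainers());
  Thread.sleep(100);
}
{code}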

Thanks,
Wangda

 Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task 
 timeout for 30mins
 --

 Key: YARN-1408
 URL: https://issues.apache.org/jira/browse/YARN-1408
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.2.0
Reporter: Sunil G
Assignee: Sunil G
 Attachments: Yarn-1408.1.patch, Yarn-1408.2.patch, Yarn-1408.3.patch, 
 Yarn-1408.4.patch, Yarn-1408.5.patch, Yarn-1408.6.patch, Yarn-1408.patch


 Capacity preemption is enabled as follows.
  *  yarn.resourcemanager.scheduler.monitor.enable= true ,
  *  
 yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy
 Queue = a,b
 Capacity of Queue A = 80%
 Capacity of Queue B = 20%
 Step 1: Assign a big jobA to queue a which uses the full cluster capacity.
 Step 2: Submit a jobB to queue b which would use less than 20% of cluster 
 capacity.
 The JobA task which uses queue b capacity is preempted and killed.
 This caused the below problem:
 1. A new container got allocated for jobA in Queue A as per a node update 
 from an NM.
 2. This container was then immediately preempted.
 Here an "ACQUIRED at KILLED" invalid state exception came when the next AM 
 heartbeat reached the RM.
 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
 Can't handle this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 ACQUIRED at KILLED
 This also caused the task to hit a 30-minute timeout, as this container 
 was already killed by preemption.
 attempt_1380289782418_0003_m_00_0 Timed out after 1800 secs



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2216) TestRMApplicationHistoryWriter sometimes fails in trunk

2014-06-30 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14047839#comment-14047839
 ] 

Xuan Gong commented on YARN-2216:
-

+1 LGTM

 TestRMApplicationHistoryWriter sometimes fails in trunk
 ---

 Key: YARN-2216
 URL: https://issues.apache.org/jira/browse/YARN-2216
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Ted Yu
Assignee: Zhijie Shen
Priority: Minor
 Attachments: TestRMApplicationHistoryWriter.stack, YARN-2216.1.patch


 From https://builds.apache.org/job/Hadoop-Yarn-trunk/595/ :
 {code}
 testRMWritingMassiveHistory(org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter)
   Time elapsed: 33.469 sec  <<< FAILURE!
 java.lang.AssertionError: expected:<1> but was:<7156>
   at org.junit.Assert.fail(Assert.java:88)
   at org.junit.Assert.failNotEquals(Assert.java:743)
   at org.junit.Assert.assertEquals(Assert.java:118)
   at org.junit.Assert.assertEquals(Assert.java:555)
   at org.junit.Assert.assertEquals(Assert.java:542)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter.testRMWritingMassiveHistory(TestRMApplicationHistoryWriter.java:430)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter.testRMWritingMassiveHistory(TestRMApplicationHistoryWriter.java:391)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2216) TestRMApplicationHistoryWriter sometimes fails in trunk

2014-06-30 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14047840#comment-14047840
 ] 

Xuan Gong commented on YARN-2216:
-

Committed to trunk and branch-2. Thanks Zhijie !

 TestRMApplicationHistoryWriter sometimes fails in trunk
 ---

 Key: YARN-2216
 URL: https://issues.apache.org/jira/browse/YARN-2216
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Ted Yu
Assignee: Zhijie Shen
Priority: Minor
 Attachments: TestRMApplicationHistoryWriter.stack, YARN-2216.1.patch


 From https://builds.apache.org/job/Hadoop-Yarn-trunk/595/ :
 {code}
 testRMWritingMassiveHistory(org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter)
   Time elapsed: 33.469 sec  <<< FAILURE!
 java.lang.AssertionError: expected:<1> but was:<7156>
   at org.junit.Assert.fail(Assert.java:88)
   at org.junit.Assert.failNotEquals(Assert.java:743)
   at org.junit.Assert.assertEquals(Assert.java:118)
   at org.junit.Assert.assertEquals(Assert.java:555)
   at org.junit.Assert.assertEquals(Assert.java:542)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter.testRMWritingMassiveHistory(TestRMApplicationHistoryWriter.java:430)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter.testRMWritingMassiveHistory(TestRMApplicationHistoryWriter.java:391)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1341) Recover NMTokens upon nodemanager restart

2014-06-30 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14047910#comment-14047910
 ] 

Devaraj K commented on YARN-1341:
-

Sorry for coming late here. 

+1 for limiting the implementation/discussion to the JIRA title and handling 
the other cases in their respective JIRAs.

In addition to option 1), I'd consider bringing the NM down if it fails to store 
RM keys a certain (configurable) number of times consecutively. We could also 
make the behavior itself (i.e. whether to tear down the NM or not) configurable, 
letting users decide whether RM-key state-store failures should take the NM 
down.

Similarly, for Container/Application state-store failures, the NM can mark that 
Container/Application as failed and report it to the RM. These can be discussed 
in more detail in the corresponding JIRAs, YARN-1337 and YARN-1354. However, 
for all these NM state-store operations, we could think of adding retries 
before throwing the IOException (see the sketch below). 

Thoughts?
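
As a concrete illustration of the retry idea above, a generic sketch (names and 
structure are assumptions, not the actual NM code):
{code}
import java.io.IOException;

public final class RetryingStateStore {
  /** Hypothetical hook representing a single state-store write. */
  public interface StoreOp {
    void run() throws IOException;
  }

  /** Retry a store operation a bounded number of times before giving up. */
  public static void storeWithRetries(StoreOp op, int maxAttempts)
      throws IOException {
    IOException lastFailure = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        op.run();
        return;                       // success, nothing to report
      } catch (IOException e) {
        lastFailure = e;              // remember the failure and retry
      }
    }
    throw lastFailure;                // all attempts exhausted
  }
}
{code}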


 Recover NMTokens upon nodemanager restart
 -

 Key: YARN-1341
 URL: https://issues.apache.org/jira/browse/YARN-1341
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.3.0
Reporter: Jason Lowe
Assignee: Jason Lowe
 Attachments: YARN-1341.patch, YARN-1341v2.patch, YARN-1341v3.patch, 
 YARN-1341v4-and-YARN-1987.patch, YARN-1341v5.patch, YARN-1341v6.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2236) Shared Cache uploader service on the Node Manager

2014-06-30 Thread Chris Trezzo (JIRA)
Chris Trezzo created YARN-2236:
--

 Summary: Shared Cache uploader service on the Node Manager
 Key: YARN-2236
 URL: https://issues.apache.org/jira/browse/YARN-2236
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Chris Trezzo
Assignee: Chris Trezzo


Implement the shared cache uploader service on the node manager.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2236) Shared Cache uploader service on the Node Manager

2014-06-30 Thread Chris Trezzo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo updated YARN-2236:
---

Attachment: YARN-2236-trunk-v1.patch

Attached is a v1 patch on 
trunk+YARN-2179,YARN-2180,YARN-2183,YARN-2186,YARN-2188,YARN-2189,YARN-2203.

 Shared Cache uploader service on the Node Manager
 -

 Key: YARN-2236
 URL: https://issues.apache.org/jira/browse/YARN-2236
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Chris Trezzo
Assignee: Chris Trezzo
 Attachments: YARN-2236-trunk-v1.patch


 Implement the shared cache uploader service on the node manager.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2217) Shared cache client side changes

2014-06-30 Thread Chris Trezzo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo updated YARN-2217:
---

Attachment: YARN-2217-trunk-v2.patch

Attached is a re-based v2 patch on 
trunk+YARN-2179,YARN-2180,YARN-2183,YARN-2186,YARN-2188,YARN-2189,YARN-2203,YARN-2236.

 Shared cache client side changes
 

 Key: YARN-2217
 URL: https://issues.apache.org/jira/browse/YARN-2217
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Chris Trezzo
Assignee: Chris Trezzo
 Attachments: YARN-2217-trunk-v1.patch, YARN-2217-trunk-v2.patch


 Implement the client side changes for the shared cache.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2022) Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy

2014-06-30 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-2022:
--

Attachment: YARN-2022.9.patch

Thank you [~vinodkv] for the comments.
bq.The other option that we may do is to skip the config completely and 
hard-code the skipping.
Yes. I also feel we can hardcode for now.
bq.RMContainer usually is a read only interface. setMasterContainer() doesn't 
belong here, please move it to RMContainerImpl.
I moved setMasterContainer to RMContainerImpl. In that case, if I need to 
invoke setMasterContainer, I may have to raise an event to RMContainerImpl 
(a downcast may not be good). Please share your opinion.
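
For reference, the downcast alternative would look roughly like this (a sketch, 
not the committed code):
{code}
// The scheduler already holds the concrete implementation, so a guarded cast
// avoids widening the read-only RMContainer interface with a setter.
if (rmContainer instanceof RMContainerImpl) {
  ((RMContainerImpl) rmContainer).setMasterContainer(container);
}
{code}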

I also fixed other comments. Kindly review the updated patch.

 Preempting an Application Master container can be kept as least priority when 
 multiple applications are marked for preemption by 
 ProportionalCapacityPreemptionPolicy
 -

 Key: YARN-2022
 URL: https://issues.apache.org/jira/browse/YARN-2022
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Sunil G
Assignee: Sunil G
 Attachments: YARN-2022-DesignDraft.docx, YARN-2022.2.patch, 
 YARN-2022.3.patch, YARN-2022.4.patch, YARN-2022.5.patch, YARN-2022.6.patch, 
 YARN-2022.7.patch, YARN-2022.8.patch, YARN-2022.9.patch, Yarn-2022.1.patch


 Cluster Size = 16GB [2NM's]
 Queue A Capacity = 50%
 Queue B Capacity = 50%
 Consider there are 3 applications running in Queue A which has taken the full 
 cluster capacity. 
 J1 = 2GB AM + 1GB * 4 Maps
 J2 = 2GB AM + 1GB * 4 Maps
 J3 = 2GB AM + 1GB * 2 Maps
 Another Job J4 is submitted in Queue B [J4 needs a 2GB AM + 1GB * 2 Maps ].
 Currently in this scenario, Jobs J3 will get killed including its AM.
 It is better if AM can be given least priority among multiple applications. 
 In this same scenario, map tasks from J3 and J2 can be preempted.
 Later when cluster is free, maps can be allocated to these Jobs.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2147) client lacks delegation token exception details when application submit fails

2014-06-30 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14047946#comment-14047946
 ] 

Daryn Sharp commented on YARN-2147:
---

Code looks fine.  Currently the test verifies that the stringified token is in 
the exception's message.  However, since the mock throws an exception that 
explicitly contains the stringified token, we don't know whether the code change 
is actually catching the exception and adding the token.  The mock should throw 
a generic string, say "boom", and the caught exception should then be checked 
against something like "Failed to renew token: token: boom".
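
A rough sketch of the test shape being asked for (names such as rmClient and 
submissionContext are assumptions; the mock setup is elided):
{code}
// The mocked renewer is configured to throw new IOException("boom"); the test
// then only asserts that the RM-side wrapper added the token details.
try {
  rmClient.submitApplication(submissionContext);
  Assert.fail("submission should fail when token renewal fails");
} catch (Exception e) {
  Assert.assertTrue(e.getMessage().contains("Failed to renew token"));
  Assert.assertTrue(e.getMessage().contains("boom"));
}
{code}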

 client lacks delegation token exception details when application submit fails
 -

 Key: YARN-2147
 URL: https://issues.apache.org/jira/browse/YARN-2147
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Jason Lowe
Assignee: Chen He
Priority: Minor
 Attachments: YARN-2147-v2.patch, YARN-2147-v3.patch, 
 YARN-2147-v4.patch, YARN-2147.patch


 When an client submits an application and the delegation token process fails 
 the client can lack critical details needed to understand the nature of the 
 error.  Only the message of the error exception is conveyed to the client, 
 which sometimes isn't enough to debug.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2190) Provide a Windows container executor that can limit memory and CPU

2014-06-30 Thread Chuan Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chuan Liu updated YARN-2190:


Attachment: YARN-2190.1.patch

Here is a new patch that addressed all the comments above.

 Provide a Windows container executor that can limit memory and CPU
 --

 Key: YARN-2190
 URL: https://issues.apache.org/jira/browse/YARN-2190
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: nodemanager
Reporter: Chuan Liu
Assignee: Chuan Liu
 Attachments: YARN-2190-prototype.patch, YARN-2190.1.patch


 Yarn default container executor on Windows does not set the resource limit on 
 the containers currently. The memory limit is enforced by a separate 
 monitoring thread. The container implementation on Windows uses Job Object 
 right now. The latest Windows (8 or later) API allows CPU and memory limits 
 on the job objects. We want to create a Windows container executor that sets 
 the limits on job objects, thus providing resource enforcement at the OS level.
 http://msdn.microsoft.com/en-us/library/windows/desktop/ms686216(v=vs.85).aspx



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1366) AM should implement Resync with the ApplicationMasterService instead of shutting down

2014-06-30 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14047962#comment-14047962
 ] 

Jian He commented on YARN-1366:
---

Thanks for updating, some more comments:
- “blacklistRemovals.addAll(blacklistToRemove);” — do we even need this in the 
isResyncCommand check? The RM after restart will have forgotten all previously 
blacklisted nodes anyway.
- does the code below need to be synchronized? (a sketch follows this list)
{code}
for (Map<String, TreeMap<Resource, ResourceRequestInfo>> rr :
    remoteRequestsTable.values()) {
  for (Map<Resource, ResourceRequestInfo> capabalities : rr.values()) {
for (ResourceRequestInfo request : capabalities.values()) {
  addResourceRequestToAsk(request.remoteRequest);
}
  }
}
{code}
- “isApplicationMasterRegistered = false;” is not needed in allocate and 
unregisterApplicationMaster.
- Instead of adding a new core-site.xml file, we can just set the config on the 
test code's conf object.
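
Regarding the synchronization question above, a minimal sketch of one way to 
guard the iteration (an assumption; the client may already serialize these calls 
elsewhere):
{code}
// Take a consistent snapshot of the outstanding requests so concurrent
// addContainerRequest()/allocate() calls cannot mutate the table mid-iteration.
synchronized (remoteRequestsTable) {
  for (Map<String, TreeMap<Resource, ResourceRequestInfo>> rr :
      remoteRequestsTable.values()) {
    for (Map<Resource, ResourceRequestInfo> capabilities : rr.values()) {
      for (ResourceRequestInfo request : capabilities.values()) {
        addResourceRequestToAsk(request.remoteRequest);
      }
    }
  }
}
{code}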

 AM should implement Resync with the ApplicationMasterService instead of 
 shutting down
 -

 Key: YARN-1366
 URL: https://issues.apache.org/jira/browse/YARN-1366
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Rohith
 Attachments: YARN-1366.1.patch, YARN-1366.2.patch, YARN-1366.3.patch, 
 YARN-1366.4.patch, YARN-1366.5.patch, YARN-1366.6.patch, YARN-1366.7.patch, 
 YARN-1366.8.patch, YARN-1366.patch, YARN-1366.prototype.patch, 
 YARN-1366.prototype.patch


 The ApplicationMasterService currently sends a resync response to which the 
 AM responds by shutting down. The AM behavior is expected to change to 
 calling resyncing with the RM. Resync means resetting the allocate RPC 
 sequence number to 0 and the AM should send its entire outstanding request to 
 the RM. Note that if the AM is making its first allocate call to the RM then 
 things should proceed like normal without needing a resync. The RM will 
 return all containers that have completed since the RM last synced with the 
 AM. Some container completions may be reported more than once.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2190) Provide a Windows container executor that can limit memory and CPU

2014-06-30 Thread Chuan Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chuan Liu updated YARN-2190:


Attachment: YARN-2190.2.patch

Slightly updated with some indentation fixes.

 Provide a Windows container executor that can limit memory and CPU
 --

 Key: YARN-2190
 URL: https://issues.apache.org/jira/browse/YARN-2190
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: nodemanager
Reporter: Chuan Liu
Assignee: Chuan Liu
 Attachments: YARN-2190-prototype.patch, YARN-2190.1.patch, 
 YARN-2190.2.patch


 Yarn default container executor on Windows does not set the resource limit on 
 the containers currently. The memory limit is enforced by a separate 
 monitoring thread. The container implementation on Windows uses Job Object 
 right now. The latest Windows (8 or later) API allows CPU and memory limits 
 on the job objects. We want to create a Windows container executor that sets 
 the limits on job objects, thus providing resource enforcement at the OS level.
 http://msdn.microsoft.com/en-us/library/windows/desktop/ms686216(v=vs.85).aspx



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2022) Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy

2014-06-30 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14048010#comment-14048010
 ] 

Vinod Kumar Vavilapalli commented on YARN-2022:
---

bq. I moved setMasterContainer to RMContainerImpl. In that case, if I need to 
invoke setMasterContainer, I may have to raise an event to RMContainerImpl 
(a downcast may not be good). Please share your opinion.
It's okay to do a cast. The main focus of the reader interfaces is to hide 
readers from the implementations and not accidentally invoke methods that 
mutate state (as opposed to enabling alternative implementations).

 Preempting an Application Master container can be kept as least priority when 
 multiple applications are marked for preemption by 
 ProportionalCapacityPreemptionPolicy
 -

 Key: YARN-2022
 URL: https://issues.apache.org/jira/browse/YARN-2022
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Sunil G
Assignee: Sunil G
 Attachments: YARN-2022-DesignDraft.docx, YARN-2022.2.patch, 
 YARN-2022.3.patch, YARN-2022.4.patch, YARN-2022.5.patch, YARN-2022.6.patch, 
 YARN-2022.7.patch, YARN-2022.8.patch, YARN-2022.9.patch, Yarn-2022.1.patch


 Cluster Size = 16GB [2NM's]
 Queue A Capacity = 50%
 Queue B Capacity = 50%
 Consider there are 3 applications running in Queue A which has taken the full 
 cluster capacity. 
 J1 = 2GB AM + 1GB * 4 Maps
 J2 = 2GB AM + 1GB * 4 Maps
 J3 = 2GB AM + 1GB * 2 Maps
 Another Job J4 is submitted in Queue B [J4 needs a 2GB AM + 1GB * 2 Maps ].
 Currently in this scenario, Jobs J3 will get killed including its AM.
 It is better if AM can be given least priority among multiple applications. 
 In this same scenario, map tasks from J3 and J2 can be preempted.
 Later when cluster is free, maps can be allocated to these Jobs.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2216) TestRMApplicationHistoryWriter sometimes fails in trunk

2014-06-30 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-2216:
--

Fix Version/s: 2.5.0

 TestRMApplicationHistoryWriter sometimes fails in trunk
 ---

 Key: YARN-2216
 URL: https://issues.apache.org/jira/browse/YARN-2216
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Ted Yu
Assignee: Zhijie Shen
Priority: Minor
 Fix For: 2.5.0

 Attachments: TestRMApplicationHistoryWriter.stack, YARN-2216.1.patch


 From https://builds.apache.org/job/Hadoop-Yarn-trunk/595/ :
 {code}
 testRMWritingMassiveHistory(org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter)
   Time elapsed: 33.469 sec  <<< FAILURE!
 java.lang.AssertionError: expected:<1> but was:<7156>
   at org.junit.Assert.fail(Assert.java:88)
   at org.junit.Assert.failNotEquals(Assert.java:743)
   at org.junit.Assert.assertEquals(Assert.java:118)
   at org.junit.Assert.assertEquals(Assert.java:555)
   at org.junit.Assert.assertEquals(Assert.java:542)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter.testRMWritingMassiveHistory(TestRMApplicationHistoryWriter.java:430)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter.testRMWritingMassiveHistory(TestRMApplicationHistoryWriter.java:391)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2190) Provide a Windows container executor that can limit memory and CPU

2014-06-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14048016#comment-14048016
 ] 

Hadoop QA commented on YARN-2190:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12653201/YARN-2190.1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4147//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4147//console

This message is automatically generated.

 Provide a Windows container executor that can limit memory and CPU
 --

 Key: YARN-2190
 URL: https://issues.apache.org/jira/browse/YARN-2190
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: nodemanager
Reporter: Chuan Liu
Assignee: Chuan Liu
 Attachments: YARN-2190-prototype.patch, YARN-2190.1.patch, 
 YARN-2190.2.patch


 Yarn default container executor on Windows does not set the resource limit on 
 the containers currently. The memory limit is enforced by a separate 
 monitoring thread. The container implementation on Windows uses Job Object 
 right now. The latest Windows (8 or later) API allows CPU and memory limits 
 on the job objects. We want to create a Windows container executor that sets 
 the limits on job objects, thus providing resource enforcement at the OS level.
 http://msdn.microsoft.com/en-us/library/windows/desktop/ms686216(v=vs.85).aspx



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2190) Provide a Windows container executor that can limit memory and CPU

2014-06-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14048051#comment-14048051
 ] 

Hadoop QA commented on YARN-2190:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12653206/YARN-2190.2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:

  org.apache.hadoop.ha.TestZKFailoverControllerStress

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4148//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4148//console

This message is automatically generated.

 Provide a Windows container executor that can limit memory and CPU
 --

 Key: YARN-2190
 URL: https://issues.apache.org/jira/browse/YARN-2190
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: nodemanager
Reporter: Chuan Liu
Assignee: Chuan Liu
 Attachments: YARN-2190-prototype.patch, YARN-2190.1.patch, 
 YARN-2190.2.patch


 Yarn default container executor on Windows does not set the resource limit on 
 the containers currently. The memory limit is enforced by a separate 
 monitoring thread. The container implementation on Windows uses Job Object 
 right now. The latest Windows (8 or later) API allows CPU and memory limits 
 on the job objects. We want to create a Windows container executor that sets 
 the limits on job objects, thus providing resource enforcement at the OS level.
 http://msdn.microsoft.com/en-us/library/windows/desktop/ms686216(v=vs.85).aspx



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2022) Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy

2014-06-30 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-2022:
--

Attachment: (was: YARN-2022.9.patch)

 Preempting an Application Master container can be kept as least priority when 
 multiple applications are marked for preemption by 
 ProportionalCapacityPreemptionPolicy
 -

 Key: YARN-2022
 URL: https://issues.apache.org/jira/browse/YARN-2022
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Sunil G
Assignee: Sunil G
 Attachments: YARN-2022-DesignDraft.docx, YARN-2022.2.patch, 
 YARN-2022.3.patch, YARN-2022.4.patch, YARN-2022.5.patch, YARN-2022.6.patch, 
 YARN-2022.7.patch, YARN-2022.8.patch, Yarn-2022.1.patch


 Cluster Size = 16GB [2NM's]
 Queue A Capacity = 50%
 Queue B Capacity = 50%
 Consider there are 3 applications running in Queue A which has taken the full 
 cluster capacity. 
 J1 = 2GB AM + 1GB * 4 Maps
 J2 = 2GB AM + 1GB * 4 Maps
 J3 = 2GB AM + 1GB * 2 Maps
 Another Job J4 is submitted in Queue B [J4 needs a 2GB AM + 1GB * 2 Maps ].
 Currently in this scenario, Jobs J3 will get killed including its AM.
 It is better if AM can be given least priority among multiple applications. 
 In this same scenario, map tasks from J3 and J2 can be preempted.
 Later when cluster is free, maps can be allocated to these Jobs.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1366) AM should implement Resync with the ApplicationMasterService instead of shutting down

2014-06-30 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14048079#comment-14048079
 ] 

Jian He commented on YARN-1366:
---

- we may also check whether pendingRelease is empty, to avoid unnecessary loops.
{code}
if (!allocateResponse.getCompletedContainersStatuses().isEmpty()) {
  removePendingReleaseRequests(allocateResponse
  .getCompletedContainersStatuses());
}
{code}
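
A small sketch of the combined guard (one possible reading of the suggestion):
{code}
// Skip the scan entirely when there is nothing pending to release.
if (!pendingRelease.isEmpty()
    && !allocateResponse.getCompletedContainersStatuses().isEmpty()) {
  removePendingReleaseRequests(allocateResponse
      .getCompletedContainersStatuses());
}
{code}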

 AM should implement Resync with the ApplicationMasterService instead of 
 shutting down
 -

 Key: YARN-1366
 URL: https://issues.apache.org/jira/browse/YARN-1366
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Rohith
 Attachments: YARN-1366.1.patch, YARN-1366.2.patch, YARN-1366.3.patch, 
 YARN-1366.4.patch, YARN-1366.5.patch, YARN-1366.6.patch, YARN-1366.7.patch, 
 YARN-1366.8.patch, YARN-1366.patch, YARN-1366.prototype.patch, 
 YARN-1366.prototype.patch


 The ApplicationMasterService currently sends a resync response to which the 
 AM responds by shutting down. The AM behavior is expected to change to 
 calling resyncing with the RM. Resync means resetting the allocate RPC 
 sequence number to 0 and the AM should send its entire outstanding request to 
 the RM. Note that if the AM is making its first allocate call to the RM then 
 things should proceed like normal without needing a resync. The RM will 
 return all containers that have completed since the RM last synced with the 
 AM. Some container completions may be reported more than once.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2022) Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy

2014-06-30 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-2022:
--

Attachment: YARN-2022.9.patch

Thank you [~vinodkv]. I updated the patch as per the comment. Please review.

 Preempting an Application Master container can be kept as least priority when 
 multiple applications are marked for preemption by 
 ProportionalCapacityPreemptionPolicy
 -

 Key: YARN-2022
 URL: https://issues.apache.org/jira/browse/YARN-2022
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Sunil G
Assignee: Sunil G
 Attachments: YARN-2022-DesignDraft.docx, YARN-2022.2.patch, 
 YARN-2022.3.patch, YARN-2022.4.patch, YARN-2022.5.patch, YARN-2022.6.patch, 
 YARN-2022.7.patch, YARN-2022.8.patch, YARN-2022.9.patch, Yarn-2022.1.patch


 Cluster Size = 16GB [2NM's]
 Queue A Capacity = 50%
 Queue B Capacity = 50%
 Consider there are 3 applications running in Queue A which has taken the full 
 cluster capacity. 
 J1 = 2GB AM + 1GB * 4 Maps
 J2 = 2GB AM + 1GB * 4 Maps
 J3 = 2GB AM + 1GB * 2 Maps
 Another Job J4 is submitted in Queue B [J4 needs a 2GB AM + 1GB * 2 Maps ].
 Currently in this scenario, Jobs J3 will get killed including its AM.
 It is better if AM can be given least priority among multiple applications. 
 In this same scenario, map tasks from J3 and J2 can be preempted.
 Later when cluster is free, maps can be allocated to these Jobs.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2147) client lacks delegation token exception details when application submit fails

2014-06-30 Thread Chen He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen He updated YARN-2147:
--

Attachment: YARN-2147-v5.patch

Thank you [~daryn]. I updated the patch as you suggested.

 client lacks delegation token exception details when application submit fails
 -

 Key: YARN-2147
 URL: https://issues.apache.org/jira/browse/YARN-2147
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Jason Lowe
Assignee: Chen He
Priority: Minor
 Attachments: YARN-2147-v2.patch, YARN-2147-v3.patch, 
 YARN-2147-v4.patch, YARN-2147-v5.patch, YARN-2147.patch


 When an client submits an application and the delegation token process fails 
 the client can lack critical details needed to understand the nature of the 
 error.  Only the message of the error exception is conveyed to the client, 
 which sometimes isn't enough to debug.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2147) client lacks delegation token exception details when application submit fails

2014-06-30 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14048118#comment-14048118
 ] 

Varun Vasudev commented on YARN-2147:
-

The patch looks mostly good. Agree with [~daryn] on testing the message. 
Question about the timeout - it doesn't look like the test needs a timeout; any 
particular reason why you added it?

 client lacks delegation token exception details when application submit fails
 -

 Key: YARN-2147
 URL: https://issues.apache.org/jira/browse/YARN-2147
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Jason Lowe
Assignee: Chen He
Priority: Minor
 Attachments: YARN-2147-v2.patch, YARN-2147-v3.patch, 
 YARN-2147-v4.patch, YARN-2147-v5.patch, YARN-2147.patch


 When a client submits an application and the delegation token process fails, 
 the client can lack the critical details needed to understand the nature of the 
 error.  Only the message of the error exception is conveyed to the client, 
 which sometimes isn't enough to debug.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2229) Making ContainerId long type

2014-06-30 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14048129#comment-14048129
 ] 

Tsuyoshi OZAWA commented on YARN-2229:
--

I took some time to think about the new format. IMHO, if we need to assure 
backward compatibility, we have to keep the current format and deal with the 
overflow. The current idea is as follows:
* ContainerId will have the epoch as a 64-bit field based on the value stored in 
ZKRMStateStore.
* {{ContainerId#compareTo}}, {{ContainerId#equals}} use the epoch field to deal 
with the overflow.
* {{ContainerId#toString}} will show the 64-bit epoch as a suffix of the current 
container id. {{ConverterUtils#toContainerId}} will be updated to parse the epoch.

The 64 bits are used like this:
{code}
|54bits|10bits|
|0|0| // initial value 
|0|1023| // before overflow
|1|1024==0| // overflowed. toString shows only lower 10 bits for backward 
compatibility.
|1|1|
|2|1024==0|
{code}

The old {{ConverterUtil#toContainerId}} can still parse the new format of 
{{ContainerId#toString}}. One problem is that the result of the old 
{{ConverterUtil#toContainerId}} cannot account for the overflow of the epoch. [~jianhe], 
[~bikassaha], what do you think?
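
To make the layout concrete, a small illustration of the split (a sketch under the assumptions in this comment, not the actual {{ContainerId}} code): the full 64-bit epoch drives ordering and equality, while only its lower 10 bits are shown for the backward-compatible {{toString}} output.
{code}
// Sketch: 64-bit epoch, with only the lower 10 bits kept visible in toString().
final class EpochSketch {
  private final long epoch;          // full 64-bit epoch from the state store

  EpochSketch(long epoch) { this.epoch = epoch; }

  // compareTo/equals would use the full epoch, so overflow of the visible
  // 10 bits is harmless for ordering and equality.
  int compareEpoch(EpochSketch other) {
    return Long.compare(this.epoch, other.epoch);
  }

  // toString would show only the lower 10 bits (0..1023) to keep the current id format.
  long visibleEpoch() {
    return epoch & 0x3FF;            // 10-bit mask
  }
}
{code}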

 Making ContainerId long type
 

 Key: YARN-2229
 URL: https://issues.apache.org/jira/browse/YARN-2229
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA

 On YARN-2052, we changed the containerId format: the upper 10 bits are for the 
 epoch, the lower 22 bits are for the sequence number of ids. This is for preserving the 
 semantics of {{ContainerId#getId()}}, {{ContainerId#toString()}}, 
 {{ContainerId#compareTo()}}, {{ContainerId#equals}}, and 
 {{ConverterUtils#toContainerId}}. One concern is that the epoch can overflow after the RM 
 restarts 1024 times.
 To avoid the problem, it's better to make containerId long. We need to define 
 the new format of the container id while preserving backward compatibility in this 
 JIRA.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2022) Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy

2014-06-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14048170#comment-14048170
 ] 

Hadoop QA commented on YARN-2022:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12653224/YARN-2022.9.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4149//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4149//console

This message is automatically generated.

 Preempting an Application Master container can be kept as least priority when 
 multiple applications are marked for preemption by 
 ProportionalCapacityPreemptionPolicy
 -

 Key: YARN-2022
 URL: https://issues.apache.org/jira/browse/YARN-2022
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Sunil G
Assignee: Sunil G
 Attachments: YARN-2022-DesignDraft.docx, YARN-2022.2.patch, 
 YARN-2022.3.patch, YARN-2022.4.patch, YARN-2022.5.patch, YARN-2022.6.patch, 
 YARN-2022.7.patch, YARN-2022.8.patch, YARN-2022.9.patch, Yarn-2022.1.patch


 Cluster Size = 16GB [2NM's]
 Queue A Capacity = 50%
 Queue B Capacity = 50%
 Consider there are 3 applications running in Queue A which has taken the full 
 cluster capacity. 
 J1 = 2GB AM + 1GB * 4 Maps
 J2 = 2GB AM + 1GB * 4 Maps
 J3 = 2GB AM + 1GB * 2 Maps
 Another Job J4 is submitted in Queue B [J4 needs a 2GB AM + 1GB * 2 Maps ].
 Currently in this scenario, Job J3 will get killed, including its AM.
 It is better if the AM can be given the least priority among multiple applications. 
 In this same scenario, map tasks from J3 and J2 can be preempted.
 Later, when the cluster is free, maps can be allocated to these jobs.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2147) client lacks delegation token exception details when application submit fails

2014-06-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14048180#comment-14048180
 ] 

Hadoop QA commented on YARN-2147:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12653227/YARN-2147-v5.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4150//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4150//console

This message is automatically generated.

 client lacks delegation token exception details when application submit fails
 -

 Key: YARN-2147
 URL: https://issues.apache.org/jira/browse/YARN-2147
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Jason Lowe
Assignee: Chen He
Priority: Minor
 Attachments: YARN-2147-v2.patch, YARN-2147-v3.patch, 
 YARN-2147-v4.patch, YARN-2147-v5.patch, YARN-2147.patch


 When a client submits an application and the delegation token process fails, 
 the client can lack the critical details needed to understand the nature of the 
 error.  Only the message of the error exception is conveyed to the client, 
 which sometimes isn't enough to debug.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1713) Implement getnewapplication and submitapp as part of RM web service

2014-06-30 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14048185#comment-14048185
 ] 

Vinod Kumar Vavilapalli commented on YARN-1713:
---

bq. AppSubmissionContextInfo - AppSubmissionSubmissionContextInfo
Sorry, typo. I meant the full name, ApplicationSubmissionContextInfo.

 Implement getnewapplication and submitapp as part of RM web service
 ---

 Key: YARN-1713
 URL: https://issues.apache.org/jira/browse/YARN-1713
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Varun Vasudev
Assignee: Varun Vasudev
Priority: Blocker
 Attachments: apache-yarn-1713.3.patch, apache-yarn-1713.4.patch, 
 apache-yarn-1713.5.patch, apache-yarn-1713.6.patch, apache-yarn-1713.7.patch, 
 apache-yarn-1713.8.patch, apache-yarn-1713.cumulative.2.patch, 
 apache-yarn-1713.cumulative.3.patch, apache-yarn-1713.cumulative.4.patch, 
 apache-yarn-1713.cumulative.patch, apache-yarn-1713.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1367) After restart NM should resync with the RM without killing containers

2014-06-30 Thread Anubhav Dhoot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14048208#comment-14048208
 ] 

Anubhav Dhoot commented on YARN-1367:
-

I had it that way, but after discussion it seemed like depending on a config might 
make it cumbersome. I am worried about what happens when we have a mismatch 
between the RM and the NM. For example, if the NM does not kill containers (setting on) and 
the RM is not expecting containers to be preserved (setting off), then the 
containers could be running without the RM accounting for them.

 After restart NM should resync with the RM without killing containers
 -

 Key: YARN-1367
 URL: https://issues.apache.org/jira/browse/YARN-1367
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Anubhav Dhoot
 Attachments: YARN-1367.001.patch, YARN-1367.002.patch, 
 YARN-1367.prototype.patch


 After RM restart, the RM sends a resync response to NMs that heartbeat to it. 
  Upon receiving the resync response, the NM kills all containers and 
 re-registers with the RM. The NM should be changed to not kill the containers 
 and instead inform the RM about all currently running containers, including 
 their allocations etc. After the re-register, the NM should send all pending 
 container completions to the RM as usual.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2052) ContainerId creation after work preserving restart is broken

2014-06-30 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14048216#comment-14048216
 ] 

Tsuyoshi OZAWA commented on YARN-2052:
--

Thank you for the review and comments, Jian, Vinod, and Bikas!

 ContainerId creation after work preserving restart is broken
 

 Key: YARN-2052
 URL: https://issues.apache.org/jira/browse/YARN-2052
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Fix For: 2.5.0

 Attachments: YARN-2052.1.patch, YARN-2052.10.patch, 
 YARN-2052.11.patch, YARN-2052.12.patch, YARN-2052.2.patch, YARN-2052.3.patch, 
 YARN-2052.4.patch, YARN-2052.5.patch, YARN-2052.6.patch, YARN-2052.7.patch, 
 YARN-2052.8.patch, YARN-2052.9.patch, YARN-2052.9.patch


 Container ids are made unique by using the app identifier and appending a 
 monotonically increasing sequence number to it. Since container creation is a 
 high-churn activity, the RM does not store the sequence number per app. So 
 after restart it does not know what the new sequence number should be for new 
 allocations.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2052) ContainerId creation after work preserving restart is broken

2014-06-30 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14048225#comment-14048225
 ] 

Tsuyoshi OZAWA commented on YARN-2052:
--

I'm planning to define the epoch format on YARN-2229 first and change the toString 
behavior on YARN-2182.

 ContainerId creation after work preserving restart is broken
 

 Key: YARN-2052
 URL: https://issues.apache.org/jira/browse/YARN-2052
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Fix For: 2.5.0

 Attachments: YARN-2052.1.patch, YARN-2052.10.patch, 
 YARN-2052.11.patch, YARN-2052.12.patch, YARN-2052.2.patch, YARN-2052.3.patch, 
 YARN-2052.4.patch, YARN-2052.5.patch, YARN-2052.6.patch, YARN-2052.7.patch, 
 YARN-2052.8.patch, YARN-2052.9.patch, YARN-2052.9.patch


 Container ids are made unique by using the app identifier and appending a 
 monotonically increasing sequence number to it. Since container creation is a 
 high-churn activity, the RM does not store the sequence number per app. So 
 after restart it does not know what the new sequence number should be for new 
 allocations.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2237) MRAppMaster changes for AMRMToken roll-up

2014-06-30 Thread Xuan Gong (JIRA)
Xuan Gong created YARN-2237:
---

 Summary: MRAppMaster changes for AMRMToken roll-up
 Key: YARN-2237
 URL: https://issues.apache.org/jira/browse/YARN-2237
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1367) After restart NM should resync with the RM without killing containers

2014-06-30 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14048291#comment-14048291
 ] 

Jian He commented on YARN-1367:
---

In that case, the RM should still be able to shoot unknown containers. 
I think the point is that in the future we will only support work-preserving 
restart, and the newly added command will be useless at that point. This config 
is only a temporary solution for testing and stabilizing.

 After restart NM should resync with the RM without killing containers
 -

 Key: YARN-1367
 URL: https://issues.apache.org/jira/browse/YARN-1367
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Anubhav Dhoot
 Attachments: YARN-1367.001.patch, YARN-1367.002.patch, 
 YARN-1367.prototype.patch


 After RM restart, the RM sends a resync response to NMs that heartbeat to it. 
  Upon receiving the resync response, the NM kills all containers and 
 re-registers with the RM. The NM should be changed to not kill the containers 
 and instead inform the RM about all currently running containers, including 
 their allocations etc. After the re-register, the NM should send all pending 
 container completions to the RM as usual.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2022) Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy

2014-06-30 Thread Mayank Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14048308#comment-14048308
 ] 

Mayank Bansal commented on YARN-2022:
-

Thanks [~sunilg] for the patch.

{code}
public void setAMContainer(boolean isAMContainer) {
  this.isAMContainer = isAMContainer;
}
{code}

There should be a write lock around it as well.
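
For illustration, the guarded setter could look roughly like this (a sketch assuming a {{ReentrantReadWriteLock}}-backed {{writeLock}} field like the one used for the other mutators; not the committed change):
{code}
// Illustrative only: guarding the setter with the write lock, mirroring the
// read/write-lock pattern used for other mutators.
import java.util.concurrent.locks.ReentrantReadWriteLock;

class AmFlagSketch {
  private final ReentrantReadWriteLock.WriteLock writeLock =
      new ReentrantReadWriteLock().writeLock();
  private boolean isAMContainer;

  public void setAMContainer(boolean isAMContainer) {
    try {
      writeLock.lock();
      this.isAMContainer = isAMContainer;
    } finally {
      writeLock.unlock();
    }
  }
}
{code}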

  

 Preempting an Application Master container can be kept as least priority when 
 multiple applications are marked for preemption by 
 ProportionalCapacityPreemptionPolicy
 -

 Key: YARN-2022
 URL: https://issues.apache.org/jira/browse/YARN-2022
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Sunil G
Assignee: Sunil G
 Attachments: YARN-2022-DesignDraft.docx, YARN-2022.2.patch, 
 YARN-2022.3.patch, YARN-2022.4.patch, YARN-2022.5.patch, YARN-2022.6.patch, 
 YARN-2022.7.patch, YARN-2022.8.patch, YARN-2022.9.patch, Yarn-2022.1.patch


 Cluster Size = 16GB [2NM's]
 Queue A Capacity = 50%
 Queue B Capacity = 50%
 Consider there are 3 applications running in Queue A which has taken the full 
 cluster capacity. 
 J1 = 2GB AM + 1GB * 4 Maps
 J2 = 2GB AM + 1GB * 4 Maps
 J3 = 2GB AM + 1GB * 2 Maps
 Another Job J4 is submitted in Queue B [J4 needs a 2GB AM + 1GB * 2 Maps ].
 Currently in this scenario, Job J3 will get killed, including its AM.
 It is better if the AM can be given the least priority among multiple applications. 
 In this same scenario, map tasks from J3 and J2 can be preempted.
 Later, when the cluster is free, maps can be allocated to these jobs.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2001) Threshold for RM to accept requests from AM after failover

2014-06-30 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14048333#comment-14048333
 ] 

Vinod Kumar Vavilapalli commented on YARN-2001:
---

+1 for the general idea. I suppose you will implement the node-threshold 
separately?

There are a lot of reasons why it makes sense for the scheduler to pause for a 
while. Mind adding some of them here and to the config documentation? 
Insufficient state, etc. Are there more issues?

It'd be great to add some tests too.
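
For illustration, the threshold being discussed could look roughly like the sketch below (hypothetical config and method names; the actual patch may differ):
{code}
// Sketch only: after failover the RM could hold off scheduling until either enough
// nodes have re-registered or a timeout has elapsed. The config keys hinted at in
// the comments and the isSchedulable() hook are hypothetical.
class SchedulingThresholdSketch {
  private final long startTimeMs = System.currentTimeMillis();
  private final int nodeThreshold;   // e.g. "yarn.resourcemanager.recovery.node-threshold"
  private final long waitMs;         // e.g. "yarn.resourcemanager.recovery.scheduling-wait-ms"

  SchedulingThresholdSketch(int nodeThreshold, long waitMs) {
    this.nodeThreshold = nodeThreshold;
    this.waitMs = waitMs;
  }

  boolean isSchedulable(int nodesRegistered) {
    return nodesRegistered >= nodeThreshold
        || System.currentTimeMillis() - startTimeMs >= waitMs;
  }
}
{code}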

 Threshold for RM to accept requests from AM after failover
 --

 Key: YARN-2001
 URL: https://issues.apache.org/jira/browse/YARN-2001
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-2001.1.patch


 After failover, the RM may require a certain threshold to determine whether it's 
 safe to make scheduling decisions and start accepting new container requests 
 from AMs. The threshold could be a certain number of nodes, i.e. the RM waits 
 until a certain number of nodes have joined before accepting new container 
 requests. Or it could simply be a timeout; only after the timeout does the RM accept 
 new requests. 
 NMs that join after the threshold can be treated as new NMs and instructed to 
 kill all their containers.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2022) Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy

2014-06-30 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14048344#comment-14048344
 ] 

Vinod Kumar Vavilapalli commented on YARN-2022:
---

The latest patch looks much better to me, save for Mayank's comment above. +1 
otherwise.

 Preempting an Application Master container can be kept as least priority when 
 multiple applications are marked for preemption by 
 ProportionalCapacityPreemptionPolicy
 -

 Key: YARN-2022
 URL: https://issues.apache.org/jira/browse/YARN-2022
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Sunil G
Assignee: Sunil G
 Attachments: YARN-2022-DesignDraft.docx, YARN-2022.2.patch, 
 YARN-2022.3.patch, YARN-2022.4.patch, YARN-2022.5.patch, YARN-2022.6.patch, 
 YARN-2022.7.patch, YARN-2022.8.patch, YARN-2022.9.patch, Yarn-2022.1.patch


 Cluster Size = 16GB [2NM's]
 Queue A Capacity = 50%
 Queue B Capacity = 50%
 Consider there are 3 applications running in Queue A which has taken the full 
 cluster capacity. 
 J1 = 2GB AM + 1GB * 4 Maps
 J2 = 2GB AM + 1GB * 4 Maps
 J3 = 2GB AM + 1GB * 2 Maps
 Another Job J4 is submitted in Queue B [J4 needs a 2GB AM + 1GB * 2 Maps ].
 Currently in this scenario, Job J3 will get killed, including its AM.
 It is better if the AM can be given the least priority among multiple applications. 
 In this same scenario, map tasks from J3 and J2 can be preempted.
 Later, when the cluster is free, maps can be allocated to these jobs.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1039) Add parameter for YARN resource requests to indicate long lived

2014-06-30 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14048345#comment-14048345
 ] 

Carlo Curino commented on YARN-1039:


Hi guys, I am just tuning in now... (apologies if I am misinterpreting the 
conversation), but it seems that some of the proposed changes resemble what we 
were proposing for the reservation work in YARN-1051. In the sub-task YARN-1708 we 
propose an extension of ResourceRequest that expresses the duration (or 
leaseDuration if you prefer) for which resources will be reserved. The same 
concept could be used here as a hint from the user about how long they expect to 
hold onto the resources. What I am suggesting is that having a time 
associated with a ResourceRequest could serve both purposes, and be a generally 
useful hint to the RM.

Thoughts?
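
To make the suggestion concrete, a hypothetical sketch of a request carrying such a duration hint (the class and field here are illustrative only, not an existing YARN API):
{code}
// Hypothetical sketch: a ResourceRequest-like record carrying an expected
// duration (or lease) hint, usable both for reservations and for marking
// long-lived containers. Not an actual YARN API.
class ResourceRequestWithDuration {
  private final int numContainers;
  private final long memoryMb;
  private final long expectedDurationMs;   // how long the AM expects to hold the resources

  ResourceRequestWithDuration(int numContainers, long memoryMb, long expectedDurationMs) {
    this.numContainers = numContainers;
    this.memoryMb = memoryMb;
    this.expectedDurationMs = expectedDurationMs;
  }

  long getExpectedDurationMs() { return expectedDurationMs; }
}
{code}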

 Add parameter for YARN resource requests to indicate long lived
 -

 Key: YARN-1039
 URL: https://issues.apache.org/jira/browse/YARN-1039
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 3.0.0, 2.1.1-beta
Reporter: Steve Loughran
Assignee: Craig Welch
 Attachments: YARN-1039.1.patch, YARN-1039.2.patch, YARN-1039.3.patch


 A container request could support a new parameter, long-lived. This could be 
 used by a scheduler that would know not to host the service on a transient 
 (cloud: spot-priced) node.
 Schedulers could also decide whether or not to allocate multiple long-lived 
 containers on the same node.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2022) Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy

2014-06-30 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14048352#comment-14048352
 ] 

Wangda Tan commented on YARN-2022:
--

Just went through the main logic and tests; besides Mayank's comment, LGTM, +1.

 Preempting an Application Master container can be kept as least priority when 
 multiple applications are marked for preemption by 
 ProportionalCapacityPreemptionPolicy
 -

 Key: YARN-2022
 URL: https://issues.apache.org/jira/browse/YARN-2022
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Sunil G
Assignee: Sunil G
 Attachments: YARN-2022-DesignDraft.docx, YARN-2022.2.patch, 
 YARN-2022.3.patch, YARN-2022.4.patch, YARN-2022.5.patch, YARN-2022.6.patch, 
 YARN-2022.7.patch, YARN-2022.8.patch, YARN-2022.9.patch, Yarn-2022.1.patch


 Cluster Size = 16GB [2NM's]
 Queue A Capacity = 50%
 Queue B Capacity = 50%
 Consider there are 3 applications running in Queue A which has taken the full 
 cluster capacity. 
 J1 = 2GB AM + 1GB * 4 Maps
 J2 = 2GB AM + 1GB * 4 Maps
 J3 = 2GB AM + 1GB * 2 Maps
 Another Job J4 is submitted in Queue B [J4 needs a 2GB AM + 1GB * 2 Maps ].
 Currently in this scenario, Job J3 will get killed, including its AM.
 It is better if the AM can be given the least priority among multiple applications. 
 In this same scenario, map tasks from J3 and J2 can be preempted.
 Later, when the cluster is free, maps can be allocated to these jobs.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2238) filtering on UI sticks even if I move away from the page

2014-06-30 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-2238:
--

Attachment: filtered.png

Screenshot of such a filtered page.

Note that it clearly says it's showing 2 entries filtered from 601 entries, but 
all the search fields are blank.

 filtering on UI sticks even if I move away from the page
 

 Key: YARN-2238
 URL: https://issues.apache.org/jira/browse/YARN-2238
 Project: Hadoop YARN
  Issue Type: Bug
  Components: webapp
Affects Versions: 2.4.0
Reporter: Sangjin Lee
 Attachments: filtered.png


 The main data table in many web pages (RM, AM, etc.) seems to show an 
 unexpected filtering behavior.
 If I filter the table by typing something in the key or value field (or I 
 suspect any search field), the data table gets filtered. The example I used 
 is the job configuration page for a MR job. That is expected.
 However, when I move away from that page and visit any other web page of the 
 same type (e.g. a job configuration page), the page is rendered with the 
 filtering! That is unexpected.
 What's even stranger is that it does not render the filtering term. As a 
 result, I have a page that's mysteriously filtered but doesn't tell me what 
 it's filtering on.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2238) filtering on UI sticks even if I move away from the page

2014-06-30 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-2238:
-

 Summary: filtering on UI sticks even if I move away from the page
 Key: YARN-2238
 URL: https://issues.apache.org/jira/browse/YARN-2238
 Project: Hadoop YARN
  Issue Type: Bug
  Components: webapp
Affects Versions: 2.4.0
Reporter: Sangjin Lee
 Attachments: filtered.png

The main data table in many web pages (RM, AM, etc.) seems to show an 
unexpected filtering behavior.

If I filter the table by typing something in the key or value field (or I 
suspect any search field), the data table gets filtered. The example I used is 
the job configuration page for a MR job. That is expected.

However, when I move away from that page and visit any other web page of the 
same type (e.g. a job configuration page), the page is rendered with the 
filtering! That is unexpected.

What's even stranger is that it does not render the filtering term. As a 
result, I have a page that's mysteriously filtered but doesn't tell me what 
it's filtering on.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2238) filtering on UI sticks even if I move away from the page

2014-06-30 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14048365#comment-14048365
 ] 

Sangjin Lee commented on YARN-2238:
---

Could this be related to YARN-237?

 filtering on UI sticks even if I move away from the page
 

 Key: YARN-2238
 URL: https://issues.apache.org/jira/browse/YARN-2238
 Project: Hadoop YARN
  Issue Type: Bug
  Components: webapp
Affects Versions: 2.4.0
Reporter: Sangjin Lee
 Attachments: filtered.png


 The main data table in many web pages (RM, AM, etc.) seems to show an 
 unexpected filtering behavior.
 If I filter the table by typing something in the key or value field (or I 
 suspect any search field), the data table gets filtered. The example I used 
 is the job configuration page for a MR job. That is expected.
 However, when I move away from that page and visit any other web page of the 
 same type (e.g. a job configuration page), the page is rendered with the 
 filtering! That is unexpected.
 What's even stranger is that it does not render the filtering term. As a 
 result, I have a page that's mysteriously filtered but doesn't tell me what 
 it's filtering on.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2238) filtering on UI sticks even if I move away from the page

2014-06-30 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14048379#comment-14048379
 ] 

Sangjin Lee commented on YARN-2238:
---

A minor clarification: if the filtering was done via the top right search 
field, the search term is shown when you go to another page. So at least you 
know the filtering is being done by that search term.

On the other hand, if the filtering was done via the bottom search fields (key, 
value, or source chain), the search term is NOT shown when you go to another 
page (but the filtering is still done).

 filtering on UI sticks even if I move away from the page
 

 Key: YARN-2238
 URL: https://issues.apache.org/jira/browse/YARN-2238
 Project: Hadoop YARN
  Issue Type: Bug
  Components: webapp
Affects Versions: 2.4.0
Reporter: Sangjin Lee
 Attachments: filtered.png


 The main data table in many web pages (RM, AM, etc.) seems to show an 
 unexpected filtering behavior.
 If I filter the table by typing something in the key or value field (or I 
 suspect any search field), the data table gets filtered. The example I used 
 is the job configuration page for a MR job. That is expected.
 However, when I move away from that page and visit any other web page of the 
 same type (e.g. a job configuration page), the page is rendered with the 
 filtering! That is unexpected.
 What's even stranger is that it does not render the filtering term. As a 
 result, I have a page that's mysteriously filtered but doesn't tell me what 
 it's filtering on.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-941) RM Should have a way to update the tokens it has for a running application

2014-06-30 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14048384#comment-14048384
 ] 

Vinod Kumar Vavilapalli commented on YARN-941:
--

[~bcwalrus],

Your point about us remembering the AMRMTokens is right, I stand corrected.

Responses inline

bq. The attacker gains access to the persistence store, where the RM stores its 
id, password map. In this case, all bets are off. Neither solution is more 
secure than the other.
That is correct. This problem is handled by securing the state-store, which we 
require today.

bq. The attacker snoops an insecure RPC channel and reads valid tokens from the 
network. The proper solution is to turn on RPC privacy. The token replacement 
patch does not offer any real protection. On the contrary, it may give people a 
false sense of security, which would be worse.
RPC privacy is a very expensive solution for AM-RM communication. First, it 
needs setup so the AM/RM have access to key infrastructure - having this burden on 
all applications is not reasonable. This is compounded by the fact that we use 
AMRMTokens in non-secure mode too. Second, AM-RM communication is a very 
chatty protocol, so the overhead is likely huge.

You are right that token renewal+replacement is a mitigative solution, but 
the plus is that it can do so without that much cost.

bq. The attacker mounts a cryptographic attack, or somehow manages to guess a 
valid id, password pair. Token replacement is better because it limits the 
exposure. But this attack is very unlikely. And we can counter that by using a 
stronger hash function.
Unfortunately, with long-running services (the focus of this JIRA), this attack 
and its success are not as unlikely. This is the very reason why we roll 
master-keys every so often in the first place. For long-running services, 
AMRMTokens play a role very similar to that of master-keys between Hadoop daemons.

Overall, I think token replacement is not as complex as you may think it is. 
The evidence for that is the redistribution of NMTokens that we *already* do. 
And as I see, Xuan already has a patch.

Our client, for all our efforts, is already fat. The way we handled the burden 
of NMTokens etc. is by having a smarter client that takes care of the 
replacement for users. We can do the same for AMRMTokens too.
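
As a rough illustration of the client-side replacement idea (a sketch only; the wiring that obtains the rolled token is hypothetical, though {{UserGroupInformation#addToken}} is the existing mechanism for installing credentials on the current user):
{code}
// Sketch only: a library-side helper that installs a freshly rolled AMRMToken on
// the current UGI so user code does not have to manage the replacement itself.
import java.io.IOException;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.security.token.Token;
import org.apache.hadoop.security.token.TokenIdentifier;

final class AmrmTokenReplacerSketch {
  static void installNewToken(Token<? extends TokenIdentifier> newToken) throws IOException {
    // New RPC connections select tokens from the UGI's credentials, so adding the
    // rolled token here lets later calls to the RM authenticate with it.
    UserGroupInformation.getCurrentUser().addToken(newToken);
  }
}
{code}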

 RM Should have a way to update the tokens it has for a running application
 --

 Key: YARN-941
 URL: https://issues.apache.org/jira/browse/YARN-941
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Robert Joseph Evans
Assignee: Xuan Gong
 Attachments: YARN-941.preview.2.patch, YARN-941.preview.3.patch, 
 YARN-941.preview.4.patch, YARN-941.preview.patch


 When an application is submitted to the RM it includes with it a set of 
 tokens that the RM will renew on behalf of the application, that will be 
 passed to the AM when the application is launched, and will be used when 
 launching the application to access HDFS to download files on behalf of the 
 application.
 For long lived applications/services these tokens can expire, and then the 
 tokens that the AM has will be invalid, and the tokens that the RM had will 
 also not work to launch a new AM.
 We need to provide an API that will allow the RM to replace the current 
 tokens for this application with a new set.  To avoid any real race issues, I 
 think this API should be something that the AM calls, so that the client can 
 connect to the AM with a new set of tokens it got using kerberos, then the AM 
 can inform the RM of the new set of tokens and quickly update its tokens 
 internally to use these new ones.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2181) Add preemption info to RM Web UI

2014-06-30 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-2181:
-

Attachment: YARN-2181.patch

Merged RMAppAttemptPreemptionMetrics into RMAppPreemptionMetrics so that the 
attempt's preemption metrics are always consistent with the app's preemption metrics.

 Add preemption info to RM Web UI
 

 Key: YARN-2181
 URL: https://issues.apache.org/jira/browse/YARN-2181
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, webapp
Affects Versions: 2.4.0
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, 
 YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, 
 YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, 
 YARN-2181.patch, YARN-2181.patch, application page.png, queue page.png


 We need to add preemption info to the RM web page so that administrators/users can 
 better understand the preemption that happened on an app, etc. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2236) Shared Cache uploader service on the Node Manager

2014-06-30 Thread Chris Trezzo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo updated YARN-2236:
---

Attachment: YARN-2236-trunk-v2.patch

Attached is a v2 patch (to fix a small issue introduced during rebase).

 Shared Cache uploader service on the Node Manager
 -

 Key: YARN-2236
 URL: https://issues.apache.org/jira/browse/YARN-2236
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Chris Trezzo
Assignee: Chris Trezzo
 Attachments: YARN-2236-trunk-v1.patch, YARN-2236-trunk-v2.patch


 Implement the shared cache uploader service on the node manager.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2181) Add preemption info to RM Web UI

2014-06-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14048438#comment-14048438
 ] 

Hadoop QA commented on YARN-2181:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12653284/YARN-2181.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4151//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4151//console

This message is automatically generated.

 Add preemption info to RM Web UI
 

 Key: YARN-2181
 URL: https://issues.apache.org/jira/browse/YARN-2181
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, webapp
Affects Versions: 2.4.0
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, 
 YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, 
 YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, 
 YARN-2181.patch, YARN-2181.patch, application page.png, queue page.png


 We need to add preemption info to the RM web page so that administrators/users can 
 better understand the preemption that happened on an app, etc. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1492) truly shared cache for jars (jobjar/libjar)

2014-06-30 Thread Chris Trezzo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14048444#comment-14048444
 ] 

Chris Trezzo commented on YARN-1492:


All patches for the shared cache have now been posted. Please let me know if I 
can do anything else to facilitate code reviews and make the review process as 
easy as possible. Thanks in advance!

 truly shared cache for jars (jobjar/libjar)
 ---

 Key: YARN-1492
 URL: https://issues.apache.org/jira/browse/YARN-1492
 Project: Hadoop YARN
  Issue Type: New Feature
Affects Versions: 2.0.4-alpha
Reporter: Sangjin Lee
Assignee: Chris Trezzo
 Attachments: YARN-1492-all-trunk-v1.patch, shared_cache_design.pdf, 
 shared_cache_design_v2.pdf, shared_cache_design_v3.pdf, 
 shared_cache_design_v4.pdf, shared_cache_design_v5.pdf


 Currently there is the distributed cache that enables you to cache jars and 
 files so that attempts from the same job can reuse them. However, sharing is 
 limited with the distributed cache because it is normally on a per-job basis. 
 On a large cluster, sometimes copying of jobjars and libjars becomes so 
 prevalent that it consumes a large portion of the network bandwidth, not to 
 speak of defeating the purpose of bringing compute to where data is. This 
 is wasteful because in most cases code doesn't change much across many jobs.
 I'd like to propose and discuss feasibility of introducing a truly shared 
 cache so that multiple jobs from multiple users can share and cache jars. 
 This JIRA is to open the discussion.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2229) Making ContainerId long type

2014-06-30 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-2229:
-

Attachment: YARN-2229-wip.01.patch

 Making ContainerId long type
 

 Key: YARN-2229
 URL: https://issues.apache.org/jira/browse/YARN-2229
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-2229-wip.01.patch


 On YARN-2052, we changed containerId format: upper 10 bits are for epoch, 
 lower 22 bits are for sequence number of Ids. This is for preserving 
 semantics of {{ContainerId#getId()}}, {{ContainerId#toString()}}, 
 {{ContainerId#compareTo()}}, {{ContainerId#equals}}, and 
 {{ConverterUtils#toContainerId}}. One concern is epoch can overflow after RM 
 restarts 1024 times.
 To avoid the problem, its better to make containerId long. We need to define 
 the new format of container Id with preserving backward compatibility on this 
 JIRA.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2229) Making ContainerId long type

2014-06-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14048495#comment-14048495
 ] 

Hadoop QA commented on YARN-2229:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12653313/YARN-2229-wip.01.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4152//console

This message is automatically generated.

 Making ContainerId long type
 

 Key: YARN-2229
 URL: https://issues.apache.org/jira/browse/YARN-2229
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-2229-wip.01.patch


 On YARN-2052, we changed containerId format: upper 10 bits are for epoch, 
 lower 22 bits are for sequence number of Ids. This is for preserving 
 semantics of {{ContainerId#getId()}}, {{ContainerId#toString()}}, 
 {{ContainerId#compareTo()}}, {{ContainerId#equals}}, and 
 {{ConverterUtils#toContainerId}}. One concern is epoch can overflow after RM 
 restarts 1024 times.
 To avoid the problem, its better to make containerId long. We need to define 
 the new format of container Id with preserving backward compatibility on this 
 JIRA.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2181) Add preemption info to RM Web UI

2014-06-30 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-2181:
-

Attachment: YARN-2181.patch

 Add preemption info to RM Web UI
 

 Key: YARN-2181
 URL: https://issues.apache.org/jira/browse/YARN-2181
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, webapp
Affects Versions: 2.4.0
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, 
 YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, 
 YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, 
 YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, application page.png, 
 queue page.png


 We need to add preemption info to the RM web page so that administrators/users can 
 better understand the preemption that happened on an app, etc. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2022) Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy

2014-06-30 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-2022:
--

Attachment: YARN-2022.10.patch

Thank you [~mayank_bansal], [~vinodkv] and [~leftnoteasy] for the comments.

I have updated the patch. Please review.

 Preempting an Application Master container can be kept as least priority when 
 multiple applications are marked for preemption by 
 ProportionalCapacityPreemptionPolicy
 -

 Key: YARN-2022
 URL: https://issues.apache.org/jira/browse/YARN-2022
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Sunil G
Assignee: Sunil G
 Attachments: YARN-2022-DesignDraft.docx, YARN-2022.10.patch, 
 YARN-2022.2.patch, YARN-2022.3.patch, YARN-2022.4.patch, YARN-2022.5.patch, 
 YARN-2022.6.patch, YARN-2022.7.patch, YARN-2022.8.patch, YARN-2022.9.patch, 
 Yarn-2022.1.patch


 Cluster Size = 16GB [2NM's]
 Queue A Capacity = 50%
 Queue B Capacity = 50%
 Consider there are 3 applications running in Queue A which has taken the full 
 cluster capacity. 
 J1 = 2GB AM + 1GB * 4 Maps
 J2 = 2GB AM + 1GB * 4 Maps
 J3 = 2GB AM + 1GB * 2 Maps
 Another Job J4 is submitted in Queue B [J4 needs a 2GB AM + 1GB * 2 Maps ].
 Currently in this scenario, Job J3 will get killed, including its AM.
 It is better if the AM can be given the least priority among multiple applications. 
 In this same scenario, map tasks from J3 and J2 can be preempted.
 Later, when the cluster is free, maps can be allocated to these jobs.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2022) Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy

2014-06-30 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-2022:
--

Attachment: (was: YARN-2022.10.patch)

 Preempting an Application Master container can be kept as least priority when 
 multiple applications are marked for preemption by 
 ProportionalCapacityPreemptionPolicy
 -

 Key: YARN-2022
 URL: https://issues.apache.org/jira/browse/YARN-2022
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Sunil G
Assignee: Sunil G
 Attachments: YARN-2022-DesignDraft.docx, YARN-2022.10.patch, 
 YARN-2022.2.patch, YARN-2022.3.patch, YARN-2022.4.patch, YARN-2022.5.patch, 
 YARN-2022.6.patch, YARN-2022.7.patch, YARN-2022.8.patch, YARN-2022.9.patch, 
 Yarn-2022.1.patch


 Cluster Size = 16GB [2NM's]
 Queue A Capacity = 50%
 Queue B Capacity = 50%
 Consider there are 3 applications running in Queue A which has taken the full 
 cluster capacity. 
 J1 = 2GB AM + 1GB * 4 Maps
 J2 = 2GB AM + 1GB * 4 Maps
 J3 = 2GB AM + 1GB * 2 Maps
 Another Job J4 is submitted in Queue B [J4 needs a 2GB AM + 1GB * 2 Maps ].
 Currently in this scenario, Job J3 will get killed, including its AM.
 It is better if the AM can be given the least priority among multiple applications. 
 In this same scenario, map tasks from J3 and J2 can be preempted.
 Later, when the cluster is free, maps can be allocated to these jobs.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2022) Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy

2014-06-30 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-2022:
--

Attachment: YARN-2022.10.patch

 Preempting an Application Master container can be kept as least priority when 
 multiple applications are marked for preemption by 
 ProportionalCapacityPreemptionPolicy
 -

 Key: YARN-2022
 URL: https://issues.apache.org/jira/browse/YARN-2022
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Sunil G
Assignee: Sunil G
 Attachments: YARN-2022-DesignDraft.docx, YARN-2022.10.patch, 
 YARN-2022.2.patch, YARN-2022.3.patch, YARN-2022.4.patch, YARN-2022.5.patch, 
 YARN-2022.6.patch, YARN-2022.7.patch, YARN-2022.8.patch, YARN-2022.9.patch, 
 Yarn-2022.1.patch


 Cluster Size = 16GB [2NM's]
 Queue A Capacity = 50%
 Queue B Capacity = 50%
 Consider there are 3 applications running in Queue A which has taken the full 
 cluster capacity. 
 J1 = 2GB AM + 1GB * 4 Maps
 J2 = 2GB AM + 1GB * 4 Maps
 J3 = 2GB AM + 1GB * 2 Maps
 Another Job J4 is submitted in Queue B [J4 needs a 2GB AM + 1GB * 2 Maps ].
 Currently in this scenario, Job J3 will get killed, including its AM.
 It is better if the AM can be given the least priority among multiple applications. 
 In this same scenario, map tasks from J3 and J2 can be preempted.
 Later, when the cluster is free, maps can be allocated to these jobs.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1366) AM should implement Resync with the ApplicationMasterService instead of shutting down

2014-06-30 Thread Rohith (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith updated YARN-1366:
-

Attachment: YARN-1366.9.patch

Thanks for reviewing the patch. I updated the patch as per the comments. Please review 
the updated patch.

bq. “blacklistRemovals.addAll(blacklistToRemove);”, we don't need to add this 
in isResyncCommand check? as RM after restart will just forget all previously 
blacklisted nodes.
DONE.

bq. below code needs synchronize ? 
Yes

bq. “isApplicationMasterRegistered = false;” not needed in allocate and 
unregisterApplicationMaster.
DONE. Removed this variable itself since it is not used.

bq. we may check pendingRelease isEmpty as well to avoid unnecessary loops
DONE

bq. Instead of adding a new core-site.xml file, we can just set the config in 
the test code conf object.
core-site.xml has to be there since SecurityUtil.java loads configurations 
during class loading. It cannot be passed through the conf object.
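
The class-loading point can be illustrated with a small sketch (not the actual SecurityUtil code): a static initializer runs once when the class is loaded, so any configuration it consults must already be on the classpath (e.g. a test core-site.xml) before any test code can populate a Configuration object.
{code}
// Illustration only (not SecurityUtil itself): values read in a static initializer
// are fixed at class-load time, which is why a classpath resource such as a test
// core-site.xml is needed rather than a Configuration object set up later.
final class StaticInitSketch {
  static final String AUTH_MODE;
  static {
    // Runs once, when the JVM loads this class - before test setUp() methods run.
    AUTH_MODE = System.getProperty("sketch.auth.mode", "simple");
  }
}
{code}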

 AM should implement Resync with the ApplicationMasterService instead of 
 shutting down
 -

 Key: YARN-1366
 URL: https://issues.apache.org/jira/browse/YARN-1366
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Rohith
 Attachments: YARN-1366.1.patch, YARN-1366.2.patch, YARN-1366.3.patch, 
 YARN-1366.4.patch, YARN-1366.5.patch, YARN-1366.6.patch, YARN-1366.7.patch, 
 YARN-1366.8.patch, YARN-1366.9.patch, YARN-1366.patch, 
 YARN-1366.prototype.patch, YARN-1366.prototype.patch


 The ApplicationMasterService currently sends a resync response to which the 
 AM responds by shutting down. The AM behavior is expected to change to 
 resyncing with the RM. Resync means resetting the allocate RPC 
 sequence number to 0 and the AM should send its entire outstanding request to 
 the RM. Note that if the AM is making its first allocate call to the RM then 
 things should proceed like normal without needing a resync. The RM will 
 return all containers that have completed since the RM last synced with the 
 AM. Some container completions may be reported more than once.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2181) Add preemption info to RM Web UI

2014-06-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14048547#comment-14048547
 ] 

Hadoop QA commented on YARN-2181:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12653319/YARN-2181.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4153//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4153//console

This message is automatically generated.

 Add preemption info to RM Web UI
 

 Key: YARN-2181
 URL: https://issues.apache.org/jira/browse/YARN-2181
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, webapp
Affects Versions: 2.4.0
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, 
 YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, 
 YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, 
 YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, application page.png, 
 queue page.png


 We need to add preemption info to the RM web page so that administrators/users can 
 better understand the preemption that happened on an app, etc. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1366) AM should implement Resync with the ApplicationMasterService instead of shutting down

2014-06-30 Thread Rohith (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith updated YARN-1366:
-

Attachment: (was: YARN-1366.9.patch)

 AM should implement Resync with the ApplicationMasterService instead of 
 shutting down
 -

 Key: YARN-1366
 URL: https://issues.apache.org/jira/browse/YARN-1366
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Rohith
 Attachments: YARN-1366.1.patch, YARN-1366.2.patch, YARN-1366.3.patch, 
 YARN-1366.4.patch, YARN-1366.5.patch, YARN-1366.6.patch, YARN-1366.7.patch, 
 YARN-1366.8.patch, YARN-1366.patch, YARN-1366.prototype.patch, 
 YARN-1366.prototype.patch


 The ApplicationMasterService currently sends a resync response to which the 
 AM responds by shutting down. The AM behavior is expected to change to 
 resyncing with the RM. Resync means resetting the allocate RPC 
 sequence number to 0 and the AM should send its entire outstanding request to 
 the RM. Note that if the AM is making its first allocate call to the RM then 
 things should proceed like normal without needing a resync. The RM will 
 return all containers that have completed since the RM last synced with the 
 AM. Some container completions may be reported more than once.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1366) AM should implement Resync with the ApplicationMasterService instead of shutting down

2014-06-30 Thread Rohith (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith updated YARN-1366:
-

Attachment: YARN-1366.9.patch

Updated the patch with test case corrections.

 AM should implement Resync with the ApplicationMasterService instead of 
 shutting down
 -

 Key: YARN-1366
 URL: https://issues.apache.org/jira/browse/YARN-1366
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Rohith
 Attachments: YARN-1366.1.patch, YARN-1366.2.patch, YARN-1366.3.patch, 
 YARN-1366.4.patch, YARN-1366.5.patch, YARN-1366.6.patch, YARN-1366.7.patch, 
 YARN-1366.8.patch, YARN-1366.9.patch, YARN-1366.patch, 
 YARN-1366.prototype.patch, YARN-1366.prototype.patch


 The ApplicationMasterService currently sends a resync response to which the 
 AM responds by shutting down. The AM behavior is expected to change to 
 resyncing with the RM. Resync means resetting the allocate RPC 
 sequence number to 0 and the AM should send its entire outstanding request to 
 the RM. Note that if the AM is making its first allocate call to the RM then 
 things should proceed like normal without needing a resync. The RM will 
 return all containers that have completed since the RM last synced with the 
 AM. Some container completions may be reported more than once.



--
This message was sent by Atlassian JIRA
(v6.2#6252)