[jira] [Commented] (YARN-1872) TestDistributedShell occasionally fails in trunk

2014-06-27 Thread Hong Zhiguo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14045574#comment-14045574
 ] 

Hong Zhiguo commented on YARN-1872:
---

Hi, Vinod, this is not just a test failure. It occurs frequently in our real 
cluster. When this happens, the application remains running forever until it's 
killed manually.

And I don't think the fix is just a workaround until YARN-1902 gets in. It 
makes DistributedShell make fewer assumptions about the outside environment 
and be more tolerant of its unexpected behavior.

 TestDistributedShell occasionally fails in trunk
 

 Key: YARN-1872
 URL: https://issues.apache.org/jira/browse/YARN-1872
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Hong Zhiguo
 Attachments: TestDistributedShell.out, YARN-1872.patch


 From https://builds.apache.org/job/Hadoop-Yarn-trunk/520/console :
 TestDistributedShell#testDSShellWithCustomLogPropertyFile failed and 
 TestDistributedShell#testDSShell timed out.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2201) TestRMWebServicesAppsModification dependent on yarn-default.xml

2014-06-27 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14045593#comment-14045593
 ] 

Zhijie Shen commented on YARN-2201:
---

+1 for the fix. Will commit it late tomorrow, given that [~rchiang] wants to 
investigate more.

 TestRMWebServicesAppsModification dependent on yarn-default.xml
 ---

 Key: YARN-2201
 URL: https://issues.apache.org/jira/browse/YARN-2201
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Ray Chiang
Assignee: Varun Vasudev
  Labels: test
 Attachments: apache-yarn-2201.0.patch, apache-yarn-2201.1.patch, 
 apache-yarn-2201.2.patch, apache-yarn-2201.3.patch


 TestRMWebServicesAppsModification.java has some errors that are 
 yarn-default.xml dependent.  By changing yarn-default.xml properties, I'm 
 seeing the following errors:
 1) Changing yarn.resourcemanager.scheduler.class from 
 capacity.CapacityScheduler to fair.FairScheduler gives the error:
 Running 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
 Tests run: 10, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 79.047 sec 
  FAILURE! - in 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
 testSingleAppKillUnauthorized[1](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification)
   Time elapsed: 3.22 sec   FAILURE!
 java.lang.AssertionError: expected:Forbidden but was:Accepted
 at org.junit.Assert.fail(Assert.java:88)
 at org.junit.Assert.failNotEquals(Assert.java:743)
 at org.junit.Assert.assertEquals(Assert.java:118)
 at org.junit.Assert.assertEquals(Assert.java:144)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKillUnauthorized(TestRMWebServicesAppsModification.java:458)
 2) Changing yarn.acl.enable from false to true results in the following 
 errors:
 Running 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
 Tests run: 10, Failures: 4, Errors: 0, Skipped: 0, Time elapsed: 49.044 sec 
  FAILURE! - in 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
 testSingleAppKill[0](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification)
   Time elapsed: 2.986 sec   FAILURE!
 java.lang.AssertionError: expected:Accepted but was:Unauthorized
 at org.junit.Assert.fail(Assert.java:88)
 at org.junit.Assert.failNotEquals(Assert.java:743)
 at org.junit.Assert.assertEquals(Assert.java:118)
 at org.junit.Assert.assertEquals(Assert.java:144)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKill(TestRMWebServicesAppsModification.java:287)
 testSingleAppKillInvalidState[0](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification)
   Time elapsed: 2.258 sec   FAILURE!
 java.lang.AssertionError: expected:Bad Request but was:Unauthorized
 at org.junit.Assert.fail(Assert.java:88)
 at org.junit.Assert.failNotEquals(Assert.java:743)
 at org.junit.Assert.assertEquals(Assert.java:118)
 at org.junit.Assert.assertEquals(Assert.java:144)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKillInvalidState(TestRMWebServicesAppsModification.java:369)
 testSingleAppKillUnauthorized[0](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification)
   Time elapsed: 2.263 sec   FAILURE!
 java.lang.AssertionError: expected:Forbidden but was:Unauthorized
 at org.junit.Assert.fail(Assert.java:88)
 at org.junit.Assert.failNotEquals(Assert.java:743)
 at org.junit.Assert.assertEquals(Assert.java:118)
 at org.junit.Assert.assertEquals(Assert.java:144)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKillUnauthorized(TestRMWebServicesAppsModification.java:458)
 testSingleAppKillInvalidId[0](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification)
   Time elapsed: 0.214 sec   FAILURE!
 java.lang.AssertionError: expected:Not Found but was:Unauthorized
 at org.junit.Assert.fail(Assert.java:88)
 at org.junit.Assert.failNotEquals(Assert.java:743)
 at org.junit.Assert.assertEquals(Assert.java:118)
 at org.junit.Assert.assertEquals(Assert.java:144)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKillInvalidId(TestRMWebServicesAppsModification.java:482)
 I'm opening this JIRA as a discussion for the best way to fix this.  I've got 
 a few 

[jira] [Commented] (YARN-2163) WebUI: Order of AppId in apps table should be consistent with ApplicationId.compareTo().

2014-06-27 Thread Ravi Prakash (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14045615#comment-14045615
 ] 

Ravi Prakash commented on YARN-2163:


Looks good to me. Committing momentarily

 WebUI: Order of AppId in apps table should be consistent with 
 ApplicationId.compareTo().
 

 Key: YARN-2163
 URL: https://issues.apache.org/jira/browse/YARN-2163
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, webapp
Affects Versions: 2.4.0
Reporter: Wangda Tan
Priority: Minor
 Attachments: YARN-2163.patch, apps page.png


 Currently, AppId is treated as numeric, so the applications table is sorted 
 by the int-typed id only (not including the cluster timestamp); see the 
 attached screenshot. The order of AppId in the web page should be consistent 
 with ApplicationId.compareTo().
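 For illustration only (this is not the attached patch), a minimal Java sketch 
 of an ordering consistent with ApplicationId.compareTo(): compare the cluster 
 timestamp first and the int id second. The AppId class below is a hypothetical 
 stand-in for the real ApplicationId record.
 {code}
import java.util.Arrays;
import java.util.Comparator;

public class AppIdOrderingSketch {
  // Hypothetical stand-in for org.apache.hadoop.yarn.api.records.ApplicationId
  static final class AppId {
    final long clusterTimestamp;
    final int id;
    AppId(long clusterTimestamp, int id) {
      this.clusterTimestamp = clusterTimestamp;
      this.id = id;
    }
    @Override public String toString() {
      return "application_" + clusterTimestamp + "_" + String.format("%04d", id);
    }
  }

  public static void main(String[] args) {
    AppId[] apps = {
      new AppId(1403850000000L, 12),   // older RM start, larger id
      new AppId(1403900000000L, 3),    // newer RM start, smaller id
    };
    // Sorting by the int id alone (what the web UI currently does) would put
    // the newer application first; comparing the timestamp first fixes this.
    Arrays.sort(apps, Comparator
        .comparingLong((AppId a) -> a.clusterTimestamp)
        .thenComparingInt(a -> a.id));
    System.out.println(Arrays.toString(apps));
  }
}
 {code}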



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2201) TestRMWebServicesAppsModification dependent on yarn-default.xml

2014-06-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14045616#comment-14045616
 ] 

Hadoop QA commented on YARN-2201:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12652758/apache-yarn-2201.3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4110//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4110//console

This message is automatically generated.

 TestRMWebServicesAppsModification dependent on yarn-default.xml
 ---

 Key: YARN-2201
 URL: https://issues.apache.org/jira/browse/YARN-2201
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Ray Chiang
Assignee: Varun Vasudev
  Labels: test
 Attachments: apache-yarn-2201.0.patch, apache-yarn-2201.1.patch, 
 apache-yarn-2201.2.patch, apache-yarn-2201.3.patch


 TestRMWebServicesAppsModification.java has some errors that are 
 yarn-default.xml dependent.  By changing yarn-default.xml properties, I'm 
 seeing the following errors:
 1) Changing yarn.resourcemanager.scheduler.class from 
 capacity.CapacityScheduler to fair.FairScheduler gives the error:
 Running 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
 Tests run: 10, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 79.047 sec 
  FAILURE! - in 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
 testSingleAppKillUnauthorized[1](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification)
   Time elapsed: 3.22 sec   FAILURE!
 java.lang.AssertionError: expected:Forbidden but was:Accepted
 at org.junit.Assert.fail(Assert.java:88)
 at org.junit.Assert.failNotEquals(Assert.java:743)
 at org.junit.Assert.assertEquals(Assert.java:118)
 at org.junit.Assert.assertEquals(Assert.java:144)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKillUnauthorized(TestRMWebServicesAppsModification.java:458)
 2) Changing yarn.acl.enable from false to true results in the following 
 errors:
 Running 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
 Tests run: 10, Failures: 4, Errors: 0, Skipped: 0, Time elapsed: 49.044 sec 
  FAILURE! - in 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
 testSingleAppKill[0](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification)
   Time elapsed: 2.986 sec   FAILURE!
 java.lang.AssertionError: expected:Accepted but was:Unauthorized
 at org.junit.Assert.fail(Assert.java:88)
 at org.junit.Assert.failNotEquals(Assert.java:743)
 at org.junit.Assert.assertEquals(Assert.java:118)
 at org.junit.Assert.assertEquals(Assert.java:144)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKill(TestRMWebServicesAppsModification.java:287)
 testSingleAppKillInvalidState[0](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification)
   Time elapsed: 2.258 sec   FAILURE!
 java.lang.AssertionError: expected:Bad Request but was:Unauthorized
 at org.junit.Assert.fail(Assert.java:88)
 at org.junit.Assert.failNotEquals(Assert.java:743)
 at org.junit.Assert.assertEquals(Assert.java:118)
 at org.junit.Assert.assertEquals(Assert.java:144)
 at 
 

[jira] [Updated] (YARN-2163) WebUI: Order of AppId in apps table should be consistent with ApplicationId.compareTo().

2014-06-27 Thread Ravi Prakash (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Prakash updated YARN-2163:
---

Target Version/s: 2.5.0

 WebUI: Order of AppId in apps table should be consistent with 
 ApplicationId.compareTo().
 

 Key: YARN-2163
 URL: https://issues.apache.org/jira/browse/YARN-2163
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, webapp
Affects Versions: 2.4.0
Reporter: Wangda Tan
Priority: Minor
 Fix For: 3.0.0, 2.5.0

 Attachments: YARN-2163.patch, apps page.png


 Currently, AppId is treated as numeric, so the applications table is sorted 
 by the int-typed id only (not including the cluster timestamp); see the 
 attached screenshot. The order of AppId in the web page should be consistent 
 with ApplicationId.compareTo().



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2181) Add preemption info to RM Web UI

2014-06-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14045625#comment-14045625
 ] 

Hadoop QA commented on YARN-2181:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12652759/YARN-2181.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 7 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4111//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4111//console

This message is automatically generated.

 Add preemption info to RM Web UI
 

 Key: YARN-2181
 URL: https://issues.apache.org/jira/browse/YARN-2181
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, webapp
Affects Versions: 2.4.0
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, 
 YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, YARN-2181.patch, 
 YARN-2181.patch, YARN-2181.patch, application page.png, queue page.png


 We need add preemption info to RM web page to make administrator/user get 
 more understanding about preemption happened on app/queue, etc. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2052) ContainerId creation after work preserving restart is broken

2014-06-27 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-2052:
-

Attachment: YARN-2052.9.patch

Let me kick Jenkins CI again to clarify the reason for the test failure.

 ContainerId creation after work preserving restart is broken
 

 Key: YARN-2052
 URL: https://issues.apache.org/jira/browse/YARN-2052
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-2052.1.patch, YARN-2052.2.patch, YARN-2052.3.patch, 
 YARN-2052.4.patch, YARN-2052.5.patch, YARN-2052.6.patch, YARN-2052.7.patch, 
 YARN-2052.8.patch, YARN-2052.9.patch, YARN-2052.9.patch


 Container ids are made unique by using the app identifier and appending a 
 monotonically increasing sequence number to it. Since container creation is a 
 high churn activity the RM does not store the sequence number per app. So 
 after restart it does not know what the new sequence number should be for new 
 allocations.
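 As a hedged illustration of the id scheme described above (the class, its 
 fields, and the id formatting are assumptions, not the YARN implementation), 
 a minimal Java sketch of a per-attempt container-id generator and why its 
 in-memory counter is lost across an RM restart:
 {code}
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch only: the RM keeps an in-memory counter per application
// attempt, so the counter is lost when the RM restarts.
public class ContainerIdSketch {
  private final long clusterTimestamp;   // e.g. RM start time
  private final int appId;               // per-cluster application number
  private final int attemptId;           // application attempt number
  private final AtomicInteger sequence = new AtomicInteger(0);

  public ContainerIdSketch(long clusterTimestamp, int appId, int attemptId) {
    this.clusterTimestamp = clusterTimestamp;
    this.appId = appId;
    this.attemptId = attemptId;
  }

  // Each allocation appends the next value of a monotonically increasing
  // sequence to the app/attempt identifier,
  // e.g. container_1403850000000_0001_01_000042
  public String nextContainerId() {
    return String.format("container_%d_%04d_%02d_%06d",
        clusterTimestamp, appId, attemptId, sequence.incrementAndGet());
  }

  // After a work-preserving RM restart the in-memory sequence starts from 0
  // again, so newly allocated ids can collide with ids of running containers.
}
 {code}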



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2163) WebUI: Order of AppId in apps table should be consistent with ApplicationId.compareTo().

2014-06-27 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14045668#comment-14045668
 ] 

Wangda Tan commented on YARN-2163:
--

Thanks [~raviprak] for review and commit!

 WebUI: Order of AppId in apps table should be consistent with 
 ApplicationId.compareTo().
 

 Key: YARN-2163
 URL: https://issues.apache.org/jira/browse/YARN-2163
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, webapp
Affects Versions: 2.4.0
Reporter: Wangda Tan
Assignee: Wangda Tan
Priority: Minor
 Fix For: 3.0.0, 2.5.0

 Attachments: YARN-2163.patch, apps page.png


 Currently, AppId is treated as numeric, so the applications table is sorted 
 by the int-typed id only (not including the cluster timestamp); see the 
 attached screenshot. The order of AppId in the web page should be consistent 
 with ApplicationId.compareTo().



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1810) YARN RM Webapp Application page Issue

2014-06-27 Thread Peng Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14045679#comment-14045679
 ] 

Peng Zhang commented on YARN-1810:
--

I updated the field number in $('#apps').dataTable().fnFilter(q, 3, true); from 
3 to 4; after clicking the "default" queue bar, applications no longer 
disappear.

But I found this fnFilter query is carried over to the Applications page. As we 
have multiple queues, if I click one of them on the scheduler page and then go 
to the Applications page, only applications of the clicked queue are shown; the 
other applications are filtered out. Because no filter query is shown on the 
page, this may cause confusion.



 YARN RM Webapp Application page Issue
 -

 Key: YARN-1810
 URL: https://issues.apache.org/jira/browse/YARN-1810
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, webapp
Affects Versions: 2.3.0
Reporter: Ethan Setnik
 Attachments: Screen Shot 2014-03-10 at 3.59.54 PM.png, Screen Shot 
 2014-03-11 at 1.40.12 PM.png


 When browsing the ResourceManager's web interface I am presented with the 
 attached screenshot.
 I can't understand why it does not show the applications, even though there 
 is no search text.  The application counts show the correct values relative 
 to the submissions, successes, and failures.
 Also see the text in the screenshot:
 Showing 0 to 0 of 0 entries (filtered from 19 total entries)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2052) ContainerId creation after work preserving restart is broken

2014-06-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14045681#comment-14045681
 ] 

Hadoop QA commented on YARN-2052:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12652767/YARN-2052.9.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 7 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4112//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4112//console

This message is automatically generated.

 ContainerId creation after work preserving restart is broken
 

 Key: YARN-2052
 URL: https://issues.apache.org/jira/browse/YARN-2052
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-2052.1.patch, YARN-2052.2.patch, YARN-2052.3.patch, 
 YARN-2052.4.patch, YARN-2052.5.patch, YARN-2052.6.patch, YARN-2052.7.patch, 
 YARN-2052.8.patch, YARN-2052.9.patch, YARN-2052.9.patch


 Container ids are made unique by using the app identifier and appending a 
 monotonically increasing sequence number to it. Since container creation is a 
 high churn activity the RM does not store the sequence number per app. So 
 after restart it does not know what the new sequence number should be for new 
 allocations.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1810) YARN RM Webapp Application page Issue

2014-06-27 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14045697#comment-14045697
 ] 

Wangda Tan commented on YARN-1810:
--

I've uploaded a simple fix to YARN-2104, please kindly review!
[~peng.zhang], good suggestion, could you create a JIRA to track it? 

 YARN RM Webapp Application page Issue
 -

 Key: YARN-1810
 URL: https://issues.apache.org/jira/browse/YARN-1810
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, webapp
Affects Versions: 2.3.0
Reporter: Ethan Setnik
 Attachments: Screen Shot 2014-03-10 at 3.59.54 PM.png, Screen Shot 
 2014-03-11 at 1.40.12 PM.png


 When browsing the ResourceManager's web interface I am presented with the 
 attached screenshot.
 I can't understand why it does not show the applications, even though there 
 is no search text.  The application counts show the correct values relative 
 to the submissions, successes, and failures.
 Also see the text in the screenshot:
 Showing 0 to 0 of 0 entries (filtered from 19 total entries)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2104) Scheduler queue filter failed to work because index of queue column changed

2014-06-27 Thread Peng Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14045708#comment-14045708
 ] 

Peng Zhang commented on YARN-2104:
--

Looks good to me.

 Scheduler queue filter failed to work because index of queue column changed
 ---

 Key: YARN-2104
 URL: https://issues.apache.org/jira/browse/YARN-2104
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, webapp
Affects Versions: 2.4.0
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-2104.patch


 YARN-563 added,
 {code}
 + th(".type", "Application Type").
 {code}
 to the application table, which moves the queue column's index from 3 to 4. 
 And in the scheduler page, the queue column index is hard-coded to 3 when 
 filtering applications by queue name,
 {code}
   if (q == 'root') q = '';,
   else q = '^' + q.substr(q.lastIndexOf('.') + 1) + '$';,
   $('#apps').dataTable().fnFilter(q, 3, true);,
 {code}
 So the queue filter will not work for the applications page.
 Reproduce steps: (Thanks to Bo Yang for pointing this out)
 {code}
 1) In default setup, there’s a default queue under root queue
 2) Run an arbitrary application, you can find it in “Applications” page
 3) Click “Default” queue in scheduler page
 4) Click “Applications”, no application will show here
 5) Click “Root” queue in scheduler page
 6) Click “Applications”, application will show again
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2142) Add one service to check the nodes' TRUST status

2014-06-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14045713#comment-14045713
 ] 

Hadoop QA commented on YARN-2142:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12652787/trust.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4114//console

This message is automatically generated.

 Add one service to check the nodes' TRUST status 
 -

 Key: YARN-2142
 URL: https://issues.apache.org/jira/browse/YARN-2142
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: nodemanager, resourcemanager, scheduler
Affects Versions: 2.2.0
 Environment: OS:Ubuntu 13.04; 
 JAVA:OpenJDK 7u51-2.4.4-0
Reporter: anders
Priority: Minor
  Labels: patch
 Fix For: 2.2.0

 Attachments: test.patch, trust.patch, trust.patch, trust.patch, 
 trust.patch

   Original Estimate: 1m
  Remaining Estimate: 1m

 Because of the critical computing environment, we must test every node's 
 TRUST status in the cluster (we can get the TRUST status via the API of the 
 OAT server), so I added this feature into Hadoop's scheduling.
 Via the TRUST check service, a node can get its own TRUST status and then, 
 through the heartbeat, send the TRUST status to the resource manager for 
 scheduling.
 In the scheduling step, if the node's TRUST status is 'false', it will be 
 abandoned until its TRUST status turns to 'true'.
 ***The logic of this feature is similar to the node health check service.
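 For illustration only, a minimal Java sketch of such a check service under 
 the stated assumptions; OatClient and all other names here are hypothetical 
 and not taken from the attached patches:
 {code}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Mirrors the idea of the node health check service: poll the TRUST status
// periodically and report it with the heartbeat.
public class TrustCheckServiceSketch {
  public interface OatClient {           // assumed wrapper around the OAT server API
    boolean isTrusted(String nodeId);
  }

  private final OatClient oatClient;
  private final String nodeId;
  private final AtomicBoolean trusted = new AtomicBoolean(true);
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  public TrustCheckServiceSketch(OatClient oatClient, String nodeId) {
    this.oatClient = oatClient;
    this.nodeId = nodeId;
  }

  public void start(long intervalSeconds) {
    // Periodically refresh the TRUST status; the NM heartbeat would read
    // isTrusted() and forward it to the RM, which then skips untrusted nodes.
    scheduler.scheduleAtFixedRate(
        () -> trusted.set(oatClient.isTrusted(nodeId)),
        0, intervalSeconds, TimeUnit.SECONDS);
  }

  public boolean isTrusted() {
    return trusted.get();
  }

  public void stop() {
    scheduler.shutdownNow();
  }
}
 {code}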



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2142) Add one service to check the nodes' TRUST status

2014-06-27 Thread anders (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anders updated YARN-2142:
-

Attachment: trust.patch

 Add one service to check the nodes' TRUST status 
 -

 Key: YARN-2142
 URL: https://issues.apache.org/jira/browse/YARN-2142
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: nodemanager, resourcemanager, scheduler
Affects Versions: 2.2.0
 Environment: OS:Ubuntu 13.04; 
 JAVA:OpenJDK 7u51-2.4.4-0
Reporter: anders
Priority: Minor
  Labels: patch
 Fix For: 2.2.0

 Attachments: test.patch, trust.patch, trust.patch, trust.patch, 
 trust.patch

   Original Estimate: 1m
  Remaining Estimate: 1m

 Because of the critical computing environment, we must test every node's 
 TRUST status in the cluster (we can get the TRUST status via the API of the 
 OAT server), so I added this feature into Hadoop's scheduling.
 Via the TRUST check service, a node can get its own TRUST status and then, 
 through the heartbeat, send the TRUST status to the resource manager for 
 scheduling.
 In the scheduling step, if the node's TRUST status is 'false', it will be 
 abandoned until its TRUST status turns to 'true'.
 ***The logic of this feature is similar to the node health check service.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2221) WebUI: RM scheduler page's queue filter status will affect application page

2014-06-27 Thread Peng Zhang (JIRA)
Peng Zhang created YARN-2221:


 Summary: WebUI: RM scheduler page's queue filter status will 
affect application page
 Key: YARN-2221
 URL: https://issues.apache.org/jira/browse/YARN-2221
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, webapp
Affects Versions: 2.4.0
Reporter: Peng Zhang
Priority: Minor


The apps queue filter added by clicking a queue bar on the scheduler page 
affects the display of the applications page.
No filter query is shown on the applications page, which causes confusion.
Also, we cannot reset the filter query on the applications page; we must go 
back to the scheduler page and click the root queue to reset it. 

Reproduce steps: 
{code}
1) Configure two queues under root (A and B)
2) Run some apps using queue A and B respectively
3) Click “A” queue in scheduler page
4) Click “Applications”, only apps of queue A show
5) Click “B” queue in scheduler page
6) Click “Applications”, only apps of queue B show
{code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1810) YARN RM Webapp Application page Issue

2014-06-27 Thread Peng Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14045717#comment-14045717
 ] 

Peng Zhang commented on YARN-1810:
--

OK, I created JIRA: https://issues.apache.org/jira/browse/YARN-2221

 YARN RM Webapp Application page Issue
 -

 Key: YARN-1810
 URL: https://issues.apache.org/jira/browse/YARN-1810
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, webapp
Affects Versions: 2.3.0
Reporter: Ethan Setnik
 Attachments: Screen Shot 2014-03-10 at 3.59.54 PM.png, Screen Shot 
 2014-03-11 at 1.40.12 PM.png


 When browsing the ResourceManager's web interface I am presented with the 
 attached screenshot.
 I can't understand why it does not show the applications, even though there 
 is no search text.  The application counts show the correct values relative 
 to the submissions, successes, and failures.
 Also see the text in the screenshot:
 Showing 0 to 0 of 0 entries (filtered from 19 total entries)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2052) ContainerId creation after work preserving restart is broken

2014-06-27 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14045720#comment-14045720
 ] 

Tsuyoshi OZAWA commented on YARN-2052:
--

The test failure of TestRMApplicationHistoryWriter is filed as YARN-2216. This 
failure is not related to this JIRA.

[~jianhe] [~vinodkv], can you take a look, please?

 ContainerId creation after work preserving restart is broken
 

 Key: YARN-2052
 URL: https://issues.apache.org/jira/browse/YARN-2052
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-2052.1.patch, YARN-2052.2.patch, YARN-2052.3.patch, 
 YARN-2052.4.patch, YARN-2052.5.patch, YARN-2052.6.patch, YARN-2052.7.patch, 
 YARN-2052.8.patch, YARN-2052.9.patch, YARN-2052.9.patch


 Container ids are made unique by using the app identifier and appending a 
 monotonically increasing sequence number to it. Since container creation is a 
 high churn activity the RM does not store the sequence number per app. So 
 after restart it does not know what the new sequence number should be for new 
 allocations.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2142) Add one service to check the nodes' TRUST status

2014-06-27 Thread anders (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anders updated YARN-2142:
-

Attachment: (was: trust.patch)

 Add one service to check the nodes' TRUST status 
 -

 Key: YARN-2142
 URL: https://issues.apache.org/jira/browse/YARN-2142
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: nodemanager, resourcemanager, scheduler
Affects Versions: 2.2.0
 Environment: OS:Ubuntu 13.04; 
 JAVA:OpenJDK 7u51-2.4.4-0
Reporter: anders
Priority: Minor
  Labels: patch
 Fix For: 2.2.0

 Attachments: test.patch, trust.patch, trust.patch, trust.patch, 
 trust001.patch

   Original Estimate: 1m
  Remaining Estimate: 1m

 Because of the critical computing environment, we must test every node's 
 TRUST status in the cluster (we can get the TRUST status via the API of the 
 OAT server), so I added this feature into Hadoop's scheduling.
 Via the TRUST check service, a node can get its own TRUST status and then, 
 through the heartbeat, send the TRUST status to the resource manager for 
 scheduling.
 In the scheduling step, if the node's TRUST status is 'false', it will be 
 abandoned until its TRUST status turns to 'true'.
 ***The logic of this feature is similar to the node health check service.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2142) Add one service to check the nodes' TRUST status

2014-06-27 Thread anders (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anders updated YARN-2142:
-

Attachment: trust001.patch

modify the xml

 Add one service to check the nodes' TRUST status 
 -

 Key: YARN-2142
 URL: https://issues.apache.org/jira/browse/YARN-2142
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: nodemanager, resourcemanager, scheduler
Affects Versions: 2.2.0
 Environment: OS:Ubuntu 13.04; 
 JAVA:OpenJDK 7u51-2.4.4-0
Reporter: anders
Priority: Minor
  Labels: patch
 Fix For: 2.2.0

 Attachments: test.patch, trust.patch, trust.patch, trust.patch, 
 trust001.patch

   Original Estimate: 1m
  Remaining Estimate: 1m

 Because of the critical computing environment, we must test every node's 
 TRUST status in the cluster (we can get the TRUST status via the API of the 
 OAT server), so I added this feature into Hadoop's scheduling.
 Via the TRUST check service, a node can get its own TRUST status and then, 
 through the heartbeat, send the TRUST status to the resource manager for 
 scheduling.
 In the scheduling step, if the node's TRUST status is 'false', it will be 
 abandoned until its TRUST status turns to 'true'.
 ***The logic of this feature is similar to the node health check service.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2201) TestRMWebServicesAppsModification dependent on yarn-default.xml

2014-06-27 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14045726#comment-14045726
 ] 

Varun Vasudev commented on YARN-2201:
-

Test failure is unrelated.

 TestRMWebServicesAppsModification dependent on yarn-default.xml
 ---

 Key: YARN-2201
 URL: https://issues.apache.org/jira/browse/YARN-2201
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Ray Chiang
Assignee: Varun Vasudev
  Labels: test
 Attachments: apache-yarn-2201.0.patch, apache-yarn-2201.1.patch, 
 apache-yarn-2201.2.patch, apache-yarn-2201.3.patch


 TestRMWebServicesAppsModification.java has some errors that are 
 yarn-default.xml dependent.  By changing yarn-default.xml properties, I'm 
 seeing the following errors:
 1) Changing yarn.resourcemanager.scheduler.class from 
 capacity.CapacityScheduler to fair.FairScheduler gives the error:
 Running 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
 Tests run: 10, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 79.047 sec 
  FAILURE! - in 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
 testSingleAppKillUnauthorized[1](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification)
   Time elapsed: 3.22 sec   FAILURE!
 java.lang.AssertionError: expected:Forbidden but was:Accepted
 at org.junit.Assert.fail(Assert.java:88)
 at org.junit.Assert.failNotEquals(Assert.java:743)
 at org.junit.Assert.assertEquals(Assert.java:118)
 at org.junit.Assert.assertEquals(Assert.java:144)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKillUnauthorized(TestRMWebServicesAppsModification.java:458)
 2) Changing yarn.acl.enable from false to true results in the following 
 errors:
 Running 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
 Tests run: 10, Failures: 4, Errors: 0, Skipped: 0, Time elapsed: 49.044 sec 
  FAILURE! - in 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
 testSingleAppKill[0](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification)
   Time elapsed: 2.986 sec   FAILURE!
 java.lang.AssertionError: expected:Accepted but was:Unauthorized
 at org.junit.Assert.fail(Assert.java:88)
 at org.junit.Assert.failNotEquals(Assert.java:743)
 at org.junit.Assert.assertEquals(Assert.java:118)
 at org.junit.Assert.assertEquals(Assert.java:144)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKill(TestRMWebServicesAppsModification.java:287)
 testSingleAppKillInvalidState[0](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification)
   Time elapsed: 2.258 sec   FAILURE!
 java.lang.AssertionError: expected:Bad Request but was:Unauthorized
 at org.junit.Assert.fail(Assert.java:88)
 at org.junit.Assert.failNotEquals(Assert.java:743)
 at org.junit.Assert.assertEquals(Assert.java:118)
 at org.junit.Assert.assertEquals(Assert.java:144)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKillInvalidState(TestRMWebServicesAppsModification.java:369)
 testSingleAppKillUnauthorized[0](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification)
   Time elapsed: 2.263 sec   FAILURE!
 java.lang.AssertionError: expected:Forbidden but was:Unauthorized
 at org.junit.Assert.fail(Assert.java:88)
 at org.junit.Assert.failNotEquals(Assert.java:743)
 at org.junit.Assert.assertEquals(Assert.java:118)
 at org.junit.Assert.assertEquals(Assert.java:144)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKillUnauthorized(TestRMWebServicesAppsModification.java:458)
 testSingleAppKillInvalidId[0](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification)
   Time elapsed: 0.214 sec   FAILURE!
 java.lang.AssertionError: expected:Not Found but was:Unauthorized
 at org.junit.Assert.fail(Assert.java:88)
 at org.junit.Assert.failNotEquals(Assert.java:743)
 at org.junit.Assert.assertEquals(Assert.java:118)
 at org.junit.Assert.assertEquals(Assert.java:144)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKillInvalidId(TestRMWebServicesAppsModification.java:482)
 I'm opening this JIRA as a discussion for the best way to fix this.  I've got 
 a few ideas, but I would like to get some feedback about 

[jira] [Commented] (YARN-570) Time strings are formated in different timezone

2014-06-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14045740#comment-14045740
 ] 

Hadoop QA commented on YARN-570:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12644756/YARN-570.2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4115//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4115//console

This message is automatically generated.

 Time strings are formated in different timezone
 ---

 Key: YARN-570
 URL: https://issues.apache.org/jira/browse/YARN-570
 Project: Hadoop YARN
  Issue Type: Bug
  Components: webapp
Affects Versions: 2.2.0
Reporter: Peng Zhang
Assignee: Akira AJISAKA
 Attachments: MAPREDUCE-5141.patch, YARN-570.2.patch


 Time strings on different pages are displayed in different timezones.
 If it is rendered by renderHadoopDate() in yarn.dt.plugins.js, it appears as 
 Wed, 10 Apr 2013 08:29:56 GMT
 If it is formatted by format() in yarn.util.Times, it appears as 10-Apr-2013 
 16:29:56
 Same value, but different timezone.
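 For illustration, a minimal Java sketch of the symptom: the same epoch 
 millisecond rendered with a GMT formatter versus a local-timezone formatter. 
 The format patterns and the UTC+8 zone are assumptions for the example, not 
 the exact ones used by yarn.dt.plugins.js or yarn.util.Times:
 {code}
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class TimezoneSketch {
  public static void main(String[] args) {
    long ts = 1365582596000L;  // one fixed instant

    SimpleDateFormat gmt = new SimpleDateFormat("EEE, dd MMM yyyy HH:mm:ss 'GMT'");
    gmt.setTimeZone(TimeZone.getTimeZone("GMT"));

    SimpleDateFormat local = new SimpleDateFormat("dd-MMM-yyyy HH:mm:ss");
    local.setTimeZone(TimeZone.getTimeZone("Asia/Shanghai")); // assumed local zone

    System.out.println(gmt.format(new Date(ts)));    // Wed, 10 Apr 2013 08:29:56 GMT
    System.out.println(local.format(new Date(ts)));  // 10-Apr-2013 16:29:56
  }
}
 {code}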



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2104) Scheduler queue filter failed to work because index of queue column changed

2014-06-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14045742#comment-14045742
 ] 

Hadoop QA commented on YARN-2104:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12652783/YARN-2104.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4113//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4113//console

This message is automatically generated.

 Scheduler queue filter failed to work because index of queue column changed
 ---

 Key: YARN-2104
 URL: https://issues.apache.org/jira/browse/YARN-2104
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, webapp
Affects Versions: 2.4.0
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-2104.patch


 YARN-563 added,
 {code}
 + th(".type", "Application Type").
 {code}
 to the application table, which moves the queue column's index from 3 to 4. 
 And in the scheduler page, the queue column index is hard-coded to 3 when 
 filtering applications by queue name,
 {code}
   if (q == 'root') q = '';,
   else q = '^' + q.substr(q.lastIndexOf('.') + 1) + '$';,
   $('#apps').dataTable().fnFilter(q, 3, true);,
 {code}
 So the queue filter will not work for the applications page.
 Reproduce steps: (Thanks to Bo Yang for pointing this out)
 {code}
 1) In default setup, there’s a default queue under root queue
 2) Run an arbitrary application, you can find it in “Applications” page
 3) Click “Default” queue in scheduler page
 4) Click “Applications”, no application will show here
 5) Click “Root” queue in scheduler page
 6) Click “Applications”, application will show again
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2142) Add one service to check the nodes' TRUST status

2014-06-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14045745#comment-14045745
 ] 

Hadoop QA commented on YARN-2142:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12652788/trust001.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

  {color:red}-1 javac{color}.  The applied patch generated 1266 javac 
compiler warnings (more than the trunk's current 1258 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-auth.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4116//testReport/
Javac warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/4116//artifact/trunk/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4116//console

This message is automatically generated.

 Add one service to check the nodes' TRUST status 
 -

 Key: YARN-2142
 URL: https://issues.apache.org/jira/browse/YARN-2142
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: nodemanager, resourcemanager, scheduler
Affects Versions: 2.2.0
 Environment: OS:Ubuntu 13.04; 
 JAVA:OpenJDK 7u51-2.4.4-0
Reporter: anders
Priority: Minor
  Labels: patch
 Fix For: 2.2.0

 Attachments: test.patch, trust.patch, trust.patch, trust.patch, 
 trust001.patch

   Original Estimate: 1m
  Remaining Estimate: 1m

 Because of the critical computing environment, we must test every node's 
 TRUST status in the cluster (we can get the TRUST status via the API of the 
 OAT server), so I added this feature into Hadoop's scheduling.
 Via the TRUST check service, a node can get its own TRUST status and then, 
 through the heartbeat, send the TRUST status to the resource manager for 
 scheduling.
 In the scheduling step, if the node's TRUST status is 'false', it will be 
 abandoned until its TRUST status turns to 'true'.
 ***The logic of this feature is similar to the node health check service.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2130) Cleanup: Adding getRMAppManager, getQueueACLsManager, getApplicationACLsManager to RMContext

2014-06-27 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-2130:
-

Attachment: YARN-2130.6.patch

Rebased on trunk.

 Cleanup: Adding getRMAppManager, getQueueACLsManager, 
 getApplicationACLsManager to RMContext
 

 Key: YARN-2130
 URL: https://issues.apache.org/jira/browse/YARN-2130
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-2130.1.patch, YARN-2130.2.patch, YARN-2130.3.patch, 
 YARN-2130.4.patch, YARN-2130.5.patch, YARN-2130.6.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-570) Time strings are formated in different timezone

2014-06-27 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14045772#comment-14045772
 ] 

Tsuyoshi OZAWA commented on YARN-570:
-

[~qwertymaniac], Thank you for the review. If we want to make the time format 
completely the same, we need to change lots of places to use the same format 
function. As a temporary fix that addresses this issue first, Akira's patch 
looks good to me. What do you think?

I think the timezone difference confuses users frequently, so we should fix it 
in the next release (2.5.0).

 Time strings are formated in different timezone
 ---

 Key: YARN-570
 URL: https://issues.apache.org/jira/browse/YARN-570
 Project: Hadoop YARN
  Issue Type: Bug
  Components: webapp
Affects Versions: 2.2.0
Reporter: Peng Zhang
Assignee: Akira AJISAKA
 Attachments: MAPREDUCE-5141.patch, YARN-570.2.patch


 Time strings on different pages are displayed in different timezones.
 If it is rendered by renderHadoopDate() in yarn.dt.plugins.js, it appears as 
 Wed, 10 Apr 2013 08:29:56 GMT
 If it is formatted by format() in yarn.util.Times, it appears as 10-Apr-2013 
 16:29:56
 Same value, but different timezone.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2130) Cleanup: Adding getRMAppManager, getQueueACLsManager, getApplicationACLsManager to RMContext

2014-06-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14045795#comment-14045795
 ] 

Hadoop QA commented on YARN-2130:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12652792/YARN-2130.6.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 17 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4117//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4117//console

This message is automatically generated.

 Cleanup: Adding getRMAppManager, getQueueACLsManager, 
 getApplicationACLsManager to RMContext
 

 Key: YARN-2130
 URL: https://issues.apache.org/jira/browse/YARN-2130
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-2130.1.patch, YARN-2130.2.patch, YARN-2130.3.patch, 
 YARN-2130.4.patch, YARN-2130.5.patch, YARN-2130.6.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2130) Cleanup: Adding getRMAppManager, getQueueACLsManager, getApplicationACLsManager to RMContext

2014-06-27 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14045799#comment-14045799
 ] 

Tsuyoshi OZAWA commented on YARN-2130:
--

The test failure of TestRMApplicationHistoryWriter is not related and the issue 
is filed as YARN-2216. 

 Cleanup: Adding getRMAppManager, getQueueACLsManager, 
 getApplicationACLsManager to RMContext
 

 Key: YARN-2130
 URL: https://issues.apache.org/jira/browse/YARN-2130
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-2130.1.patch, YARN-2130.2.patch, YARN-2130.3.patch, 
 YARN-2130.4.patch, YARN-2130.5.patch, YARN-2130.6.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1514) Utility to benchmark ZKRMStateStore#loadState for ResourceManager-HA

2014-06-27 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14045926#comment-14045926
 ] 

Tsuyoshi OZAWA commented on YARN-1514:
--

[~kkambatl], could you take a look at this JIRA? This perf tool is useful and I 
hope to include this feature in the 2.5.0 release.

 Utility to benchmark ZKRMStateStore#loadState for ResourceManager-HA
 

 Key: YARN-1514
 URL: https://issues.apache.org/jira/browse/YARN-1514
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Fix For: 2.5.0

 Attachments: YARN-1514.1.patch, YARN-1514.2.patch, 
 YARN-1514.wip-2.patch, YARN-1514.wip.patch


 ZKRMStateStore is very sensitive to ZNode-related operations, as discussed in 
 YARN-1307, YARN-1378 and so on. In particular, ZKRMStateStore#loadState is 
 called when an RM-HA cluster does a failover. Therefore, its execution time 
 impacts the failover time of RM-HA.
 We need a utility to benchmark the execution time of ZKRMStateStore#loadState 
 as a development tool.
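 As a rough illustration of the kind of measurement such a utility would make 
 (RMStateStoreLike and its loadState() below are hypothetical stand-ins, not 
 the actual ZKRMStateStore API):
 {code}
public class LoadStateBenchmarkSketch {
  public interface RMStateStoreLike {
    Object loadState() throws Exception;   // stand-in for ZKRMStateStore#loadState
  }

  // Returns the elapsed time of one loadState() call in milliseconds; this is
  // the operation that gates RM-HA failover time.
  public static long timeLoadState(RMStateStoreLike store) throws Exception {
    long start = System.nanoTime();
    store.loadState();
    return (System.nanoTime() - start) / 1_000_000L;
  }
}
 {code}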



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-570) Time strings are formated in different timezone

2014-06-27 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14045945#comment-14045945
 ] 

Tsuyoshi OZAWA commented on YARN-570:
-

{quote}
The format of JavaScript Date.toLocaleString() varies by the browser. 
{quote}

One alternative to make the formats the same is to change {{renderHadoopDate}} 
to return the same format as {{yarn.util.Times#format()}} does, instead of 
using {{Date#toLocaleString}}. [~ajisakaa], [~qwertymaniac], what do you think?

 Time strings are formated in different timezone
 ---

 Key: YARN-570
 URL: https://issues.apache.org/jira/browse/YARN-570
 Project: Hadoop YARN
  Issue Type: Bug
  Components: webapp
Affects Versions: 2.2.0
Reporter: Peng Zhang
Assignee: Akira AJISAKA
 Attachments: MAPREDUCE-5141.patch, YARN-570.2.patch


 Time strings on different pages are displayed in different timezones.
 If it is rendered by renderHadoopDate() in yarn.dt.plugins.js, it appears as 
 Wed, 10 Apr 2013 08:29:56 GMT
 If it is formatted by format() in yarn.util.Times, it appears as 10-Apr-2013 
 16:29:56
 Same value, but different timezone.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1341) Recover NMTokens upon nodemanager restart

2014-06-27 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14045959#comment-14045959
 ] 

Jason Lowe commented on YARN-1341:
--

Agree it's not ideal to discuss handling state store errors for all NM 
components in this JIRA.  In general I'd prefer to discuss and address each 
case with the corresponding JIRA, e.g.: application state store errors 
discussed and addressed in YARN-1354, container state store errors in 
YARN-1337, etc.  If we feel there's significant utility to committing a JIRA 
before all the issues are addressed then we can file one or more followup JIRAs 
to track those outstanding issues.  That's the normal process we follow with 
other features/fixes as well.  

So if we follow that process then we're back to the discussion about RM master 
keys not being able to be stored in the state store.  The choices we've 
discussed are:

1) Log an error, update the master key in memory, and continue
2) Log an error, _not_ update the master key in memory, and continue
3) Log an error and tear down the NM

I'd prefer 1) since that is the option that preserves the most work in all 
scenarios I can think of, and I don't know of a scenario where 2) would handle 
it better.  However I could be convinced given the right scenario.  I'd really 
rather avoid 3) since that seems like a severe way to handle the error and 
guarantees work is lost.

Oh there is one more handling scenario we briefly discussed where we flag the 
NM as undesirable.  When that occurs we don't shoot the containers that are 
running, but we avoid adding new containers since the node is having issues 
(i.e.: a drain-decommission).  I feel that would be a separate JIRA since it 
needs YARN-914, and we'd still need to decide how to handle the error until the 
decommission is complete (i.e.: choice 1 or 2 above).
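
To make option 1) concrete, a self-contained sketch could look like the 
following; the KeyStore interface and names here are hypothetical stand-ins, 
not the NM's real state-store API:

{code}
import java.io.IOException;

public class Option1Sketch {
  interface KeyStore {
    void storeMasterKey(byte[] key) throws IOException;
  }

  private final KeyStore stateStore;
  private byte[] currentMasterKey;

  Option1Sketch(KeyStore stateStore) {
    this.stateStore = stateStore;
  }

  void onNewRMMasterKey(byte[] newKey) {
    try {
      stateStore.storeMasterKey(newKey);
    } catch (IOException e) {
      // Option 1: persisting failed, but we still honor the RM's key rollover
      // so running containers and future NMTokens keep working.
      System.err.println("Unable to store new RM master key, continuing: " + e);
    }
    currentMasterKey = newKey;  // update in-memory state regardless of store outcome
  }
}
{code}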

 Recover NMTokens upon nodemanager restart
 -

 Key: YARN-1341
 URL: https://issues.apache.org/jira/browse/YARN-1341
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: 2.3.0
Reporter: Jason Lowe
Assignee: Jason Lowe
 Attachments: YARN-1341.patch, YARN-1341v2.patch, YARN-1341v3.patch, 
 YARN-1341v4-and-YARN-1987.patch, YARN-1341v5.patch, YARN-1341v6.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1408) Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins

2014-06-27 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-1408:
--

Attachment: Yarn-1408.5.patch

Hi [~vinodkv] [~leftnoteasy]
Please find the initial patch.

Some information about the patch (a rough sketch of the first rule follows below):
* While recovering a ResourceRequest, if a matching entry is already present in 
the scheduling info, the number of containers is incremented; otherwise it is 
added as a new entry.
* An off-rack request is also added during recovery if the stored request is 
not off-rack.
* The AM may have asked for node-local containers on other hosts, which may not 
be recoverable.

Kindly review.
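
For illustration, the first rule could look roughly like this (a sketch with a 
hypothetical map keyed by resource name, not the actual scheduler data 
structures in the patch):

{code}
import java.util.Map;
import org.apache.hadoop.yarn.api.records.ResourceRequest;

public class RecoverRequestSketch {
  static void recoverResourceRequest(Map<String, ResourceRequest> schedulingInfo,
      ResourceRequest recovered) {
    ResourceRequest existing = schedulingInfo.get(recovered.getResourceName());
    if (existing != null) {
      // A matching entry already exists: bump its container count.
      existing.setNumContainers(
          existing.getNumContainers() + recovered.getNumContainers());
    } else {
      // No matching entry yet: add the recovered request as a new one.
      schedulingInfo.put(recovered.getResourceName(), recovered);
    }
  }
}
{code}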

 Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task 
 timeout for 30mins
 --

 Key: YARN-1408
 URL: https://issues.apache.org/jira/browse/YARN-1408
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.2.0
Reporter: Sunil G
Assignee: Sunil G
 Attachments: Yarn-1408.1.patch, Yarn-1408.2.patch, Yarn-1408.3.patch, 
 Yarn-1408.4.patch, Yarn-1408.5.patch, Yarn-1408.patch


 Capacity preemption is enabled as follows.
  *  yarn.resourcemanager.scheduler.monitor.enable= true ,
  *  
 yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy
 Queue = a,b
 Capacity of Queue A = 80%
 Capacity of Queue B = 20%
 Step 1: Submit a big jobA to queue a, which uses the full cluster capacity.
 Step 2: Submit a jobB to queue b, which would use less than 20% of the 
 cluster capacity.
 A jobA task that is using queue b's capacity is preempted and killed.
 This caused the following problem:
 1. A new container was allocated for jobA in queue a, based on a node update 
 from an NM.
 2. This container was preempted immediately by the preemption policy.
 An Invalid State exception (ACQUIRED at KILLED) was raised when the next AM 
 heartbeat reached the RM:
 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
 Can't handle this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 ACQUIRED at KILLED
 This also caused the task to time out for 30 minutes, as this container 
 was already killed by preemption.
 attempt_1380289782418_0003_m_00_0 Timed out after 1800 secs



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1408) Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins

2014-06-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14045970#comment-14045970
 ] 

Hadoop QA commented on YARN-1408:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12652828/Yarn-1408.5.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4119//console

This message is automatically generated.

 Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task 
 timeout for 30mins
 --

 Key: YARN-1408
 URL: https://issues.apache.org/jira/browse/YARN-1408
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.2.0
Reporter: Sunil G
Assignee: Sunil G
 Attachments: Yarn-1408.1.patch, Yarn-1408.2.patch, Yarn-1408.3.patch, 
 Yarn-1408.4.patch, Yarn-1408.5.patch, Yarn-1408.patch


 Capacity preemption is enabled as follows.
  *  yarn.resourcemanager.scheduler.monitor.enable= true ,
  *  
 yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy
 Queue = a,b
 Capacity of Queue A = 80%
 Capacity of Queue B = 20%
 Step 1: Submit a big jobA to queue a, which uses the full cluster capacity.
 Step 2: Submit a jobB to queue b, which would use less than 20% of the 
 cluster capacity.
 A jobA task that is using queue b's capacity is preempted and killed.
 This caused the following problem:
 1. A new container was allocated for jobA in queue a, based on a node update 
 from an NM.
 2. This container was preempted immediately by the preemption policy.
 An Invalid State exception (ACQUIRED at KILLED) was raised when the next AM 
 heartbeat reached the RM:
 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
 Can't handle this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 ACQUIRED at KILLED
 This also caused the task to time out for 30 minutes, as this container 
 was already killed by preemption.
 attempt_1380289782418_0003_m_00_0 Timed out after 1800 secs



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2178) TestApplicationMasterService sometimes fails in trunk

2014-06-27 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14045974#comment-14045974
 ] 

Tsuyoshi OZAWA commented on YARN-2178:
--

[~mitdesai] [~ted_yu] FYI: I use this bash script to reproduce timing bugs: 
https://github.com/oza/failchecker

{code}
$ ./failchecker TestApplicationMasterService
{code}

This script runs the specified test iteratively until it fails.

 TestApplicationMasterService sometimes fails in trunk
 -

 Key: YARN-2178
 URL: https://issues.apache.org/jira/browse/YARN-2178
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Ted Yu
Priority: Minor
  Labels: test

 From https://builds.apache.org/job/Hadoop-Yarn-trunk/587/ :
 {code}
 Running 
 org.apache.hadoop.yarn.server.resourcemanager.TestApplicationMasterService
 Tests run: 4, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 55.763 sec 
  FAILURE! - in 
 org.apache.hadoop.yarn.server.resourcemanager.TestApplicationMasterService
 testInvalidContainerReleaseRequest(org.apache.hadoop.yarn.server.resourcemanager.TestApplicationMasterService)
   Time elapsed: 41.336 sec   FAILURE!
 java.lang.AssertionError: AppAttempt state is not correct (timedout) 
 expected:ALLOCATED but was:SCHEDULED
   at org.junit.Assert.fail(Assert.java:88)
   at org.junit.Assert.failNotEquals(Assert.java:743)
   at org.junit.Assert.assertEquals(Assert.java:118)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.MockAM.waitForState(MockAM.java:82)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.MockRM.sendAMLaunched(MockRM.java:401)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestApplicationMasterService.testInvalidContainerReleaseRequest(TestApplicationMasterService.java:143)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2034) Description for yarn.nodemanager.localizer.cache.target-size-mb is incorrect

2014-06-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14045991#comment-14045991
 ] 

Hadoop QA commented on YARN-2034:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12644210/YARN-2034.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4118//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4118//console

This message is automatically generated.

 Description for yarn.nodemanager.localizer.cache.target-size-mb is incorrect
 

 Key: YARN-2034
 URL: https://issues.apache.org/jira/browse/YARN-2034
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 0.23.10, 2.4.0
Reporter: Jason Lowe
Assignee: Chen He
Priority: Minor
 Attachments: YARN-2034.patch


 The description in yarn-default.xml for 
 yarn.nodemanager.localizer.cache.target-size-mb says that it is a setting per 
 local directory, but according to the code it's a setting for the entire node.
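 For example (the numbers here are purely illustrative): with the property set 
 to 10240 and four local dirs configured, the localizer cache target is 10 GB 
 for the whole node, not 10 GB per directory (40 GB in total).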



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2034) Description for yarn.nodemanager.localizer.cache.target-size-mb is incorrect

2014-06-27 Thread Chen He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen He updated YARN-2034:
--

Labels: documentation  (was: )

 Description for yarn.nodemanager.localizer.cache.target-size-mb is incorrect
 

 Key: YARN-2034
 URL: https://issues.apache.org/jira/browse/YARN-2034
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 0.23.10, 2.4.0
Reporter: Jason Lowe
Assignee: Chen He
Priority: Minor
  Labels: documentation
 Attachments: YARN-2034.patch


 The description in yarn-default.xml for 
 yarn.nodemanager.localizer.cache.target-size-mb says that it is a setting per 
 local directory, but according to the code it's a setting for the entire node.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2034) Description for yarn.nodemanager.localizer.cache.target-size-mb is incorrect

2014-06-27 Thread Chen He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen He updated YARN-2034:
--

Attachment: YARN-2034.patch

resubmit to trigger HadoopQA

 Description for yarn.nodemanager.localizer.cache.target-size-mb is incorrect
 

 Key: YARN-2034
 URL: https://issues.apache.org/jira/browse/YARN-2034
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 0.23.10, 2.4.0
Reporter: Jason Lowe
Assignee: Chen He
Priority: Minor
  Labels: documentation
 Attachments: YARN-2034.patch, YARN-2034.patch


 The description in yarn-default.xml for 
 yarn.nodemanager.localizer.cache.target-size-mb says that it is a setting per 
 local directory, but according to the code it's a setting for the entire node.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2222) Helper script: looping tests until it fails

2014-06-27 Thread Tsuyoshi OZAWA (JIRA)
Tsuyoshi OZAWA created YARN-2222:


 Summary: Helper script: looping tests until it fails
 Key: YARN-2222
 URL: https://issues.apache.org/jira/browse/YARN-2222
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA


Some tests can fail intermittently because of timing bugs. To reproduce the 
test failure, it's useful to add a script which launches the specified test 
repeatedly until it fails.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2034) Description for yarn.nodemanager.localizer.cache.target-size-mb is incorrect

2014-06-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14046019#comment-14046019
 ] 

Hadoop QA commented on YARN-2034:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12652832/YARN-2034.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+0 tests included{color}.  The patch appears to be a 
documentation patch that doesn't require tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4120//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4120//console

This message is automatically generated.

 Description for yarn.nodemanager.localizer.cache.target-size-mb is incorrect
 

 Key: YARN-2034
 URL: https://issues.apache.org/jira/browse/YARN-2034
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 0.23.10, 2.4.0
Reporter: Jason Lowe
Assignee: Chen He
Priority: Minor
  Labels: documentation
 Attachments: YARN-2034.patch, YARN-2034.patch


 The description in yarn-default.xml for 
 yarn.nodemanager.localizer.cache.target-size-mb says that it is a setting per 
 local directory, but according to the code it's a setting for the entire node.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2034) Description for yarn.nodemanager.localizer.cache.target-size-mb is incorrect

2014-06-27 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14046028#comment-14046028
 ] 

Tsuyoshi OZAWA commented on YARN-2034:
--

+1(non-binding)

 Description for yarn.nodemanager.localizer.cache.target-size-mb is incorrect
 

 Key: YARN-2034
 URL: https://issues.apache.org/jira/browse/YARN-2034
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 0.23.10, 2.4.0
Reporter: Jason Lowe
Assignee: Chen He
Priority: Minor
  Labels: documentation
 Attachments: YARN-2034.patch, YARN-2034.patch


 The description in yarn-default.xml for 
 yarn.nodemanager.localizer.cache.target-size-mb says that it is a setting per 
 local directory, but according to the code it's a setting for the entire node.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-896) Roll up for long-lived services in YARN

2014-06-27 Thread john lilley (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14046065#comment-14046065
 ] 

john lilley commented on YARN-896:
--

Greetings!  Arun pointed me to this JIRA to see if this could potentially meet 
our needs.  We are an ISV that currently ships a data-quality/integration suite 
running as a native YARN application.  We are finding several use cases that 
would benefit from being able to manage a per-node persistent service.  
MapReduce has its “shuffle auxiliary service”, but it isn’t straightforward to 
add auxiliary services because they cannot be loaded from HDFS, so we’d have to 
manage the distribution of JARs across nodes (please tell me if I’m wrong 
here…).

This seems to be addressing a lot of the issues around persistent services, and 
frankly I'm out of my depth in this discussion.  But if you all can help me 
understand if this might help our situation, I'd be happy to have our team put 
shoulder to the wheel and help advance the development.  Please comment on our 
contemplated use case and help me understand if this is the right place to be.

Our software doesn't use MapReduce.  It is a pure YARN application that is 
basically a peer to MapReduce.  There are a lot of reasons for this decision, 
but the main one is that we have a large code base that already executes data 
transformations in a single-server environment, and we wanted to produce a 
product without rewriting huge swaths of code.  Given that, our software takes 
care of many things usually delegated to MapReduce, including distributed 
sort/partition (i.e. the shuffle).  However, MapReduce has a special place in 
the ecosystem, in that it creates an auxiliary service to handle the 
distribution of shuffle data to reducers.  It doesn't look like third-party 
apps have an easy time installing aux services.  The JARs for any such service 
must be in Hadoop's classpath on all nodes at startup, creating both a 
management issue and a trust/security issue.  Currently our software places 
temporary data into HDFS for this purpose, but we've found that HDFS has a huge 
overhead in terms of performance and file handles, even at low replication.  We 
desire to replace the use of HDFS with a lighter-weight service to manage temp 
files and distribute their data.


 Roll up for long-lived services in YARN
 ---

 Key: YARN-896
 URL: https://issues.apache.org/jira/browse/YARN-896
 Project: Hadoop YARN
  Issue Type: New Feature
Reporter: Robert Joseph Evans

 YARN is intended to be general purpose, but it is missing some features to be 
 able to truly support long lived applications and long lived containers.
 This ticket is intended to
  # discuss what is needed to support long lived processes
  # track the resulting JIRA.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2201) TestRMWebServicesAppsModification dependent on yarn-default.xml

2014-06-27 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14046109#comment-14046109
 ] 

Ray Chiang commented on YARN-2201:
--

+1 for the latest patch.  The tests are now independent of changes in 
yarn-default.xml.

 TestRMWebServicesAppsModification dependent on yarn-default.xml
 ---

 Key: YARN-2201
 URL: https://issues.apache.org/jira/browse/YARN-2201
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Ray Chiang
Assignee: Varun Vasudev
  Labels: test
 Attachments: apache-yarn-2201.0.patch, apache-yarn-2201.1.patch, 
 apache-yarn-2201.2.patch, apache-yarn-2201.3.patch


 TestRMWebServicesAppsModification.java has some errors that are 
 yarn-default.xml dependent.  By changing yarn-default.xml properties, I'm 
 seeing the following errors:
 1) Changing yarn.resourcemanager.scheduler.class from 
 capacity.CapacityScheduler to fair.FairScheduler gives the error:
 Running 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
 Tests run: 10, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 79.047 sec 
  FAILURE! - in 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
 testSingleAppKillUnauthorized[1](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification)
   Time elapsed: 3.22 sec   FAILURE!
 java.lang.AssertionError: expected:Forbidden but was:Accepted
 at org.junit.Assert.fail(Assert.java:88)
 at org.junit.Assert.failNotEquals(Assert.java:743)
 at org.junit.Assert.assertEquals(Assert.java:118)
 at org.junit.Assert.assertEquals(Assert.java:144)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKillUnauthorized(TestRMWebServicesAppsModification.java:458)
 2) Changing yarn.acl.enable from false to true results in the following 
 errors:
 Running 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
 Tests run: 10, Failures: 4, Errors: 0, Skipped: 0, Time elapsed: 49.044 sec 
  FAILURE! - in 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
 testSingleAppKill[0](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification)
   Time elapsed: 2.986 sec   FAILURE!
 java.lang.AssertionError: expected:Accepted but was:Unauthorized
 at org.junit.Assert.fail(Assert.java:88)
 at org.junit.Assert.failNotEquals(Assert.java:743)
 at org.junit.Assert.assertEquals(Assert.java:118)
 at org.junit.Assert.assertEquals(Assert.java:144)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKill(TestRMWebServicesAppsModification.java:287)
 testSingleAppKillInvalidState[0](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification)
   Time elapsed: 2.258 sec   FAILURE!
 java.lang.AssertionError: expected:Bad Request but was:Unauthorized
 at org.junit.Assert.fail(Assert.java:88)
 at org.junit.Assert.failNotEquals(Assert.java:743)
 at org.junit.Assert.assertEquals(Assert.java:118)
 at org.junit.Assert.assertEquals(Assert.java:144)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKillInvalidState(TestRMWebServicesAppsModification.java:369)
 testSingleAppKillUnauthorized[0](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification)
   Time elapsed: 2.263 sec   FAILURE!
 java.lang.AssertionError: expected:Forbidden but was:Unauthorized
 at org.junit.Assert.fail(Assert.java:88)
 at org.junit.Assert.failNotEquals(Assert.java:743)
 at org.junit.Assert.assertEquals(Assert.java:118)
 at org.junit.Assert.assertEquals(Assert.java:144)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKillUnauthorized(TestRMWebServicesAppsModification.java:458)
 testSingleAppKillInvalidId[0](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification)
   Time elapsed: 0.214 sec   FAILURE!
 java.lang.AssertionError: expected:Not Found but was:Unauthorized
 at org.junit.Assert.fail(Assert.java:88)
 at org.junit.Assert.failNotEquals(Assert.java:743)
 at org.junit.Assert.assertEquals(Assert.java:118)
 at org.junit.Assert.assertEquals(Assert.java:144)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKillInvalidId(TestRMWebServicesAppsModification.java:482)
 I'm opening this JIRA as a discussion for the best way to fix this.  I've got 
 a few 

[jira] [Commented] (YARN-415) Capture memory utilization at the app-level for chargeback

2014-06-27 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14046113#comment-14046113
 ] 

Eric Payne commented on YARN-415:
-

Test failures for TestRMApplicationHistoryWriter predate this patch.

 Capture memory utilization at the app-level for chargeback
 --

 Key: YARN-415
 URL: https://issues.apache.org/jira/browse/YARN-415
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: resourcemanager
Affects Versions: 0.23.6
Reporter: Kendall Thrapp
Assignee: Andrey Klochkov
 Attachments: YARN-415--n10.patch, YARN-415--n2.patch, 
 YARN-415--n3.patch, YARN-415--n4.patch, YARN-415--n5.patch, 
 YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, 
 YARN-415--n9.patch, YARN-415.201405311749.txt, YARN-415.201406031616.txt, 
 YARN-415.201406262136.txt, YARN-415.patch


 For the purpose of chargeback, I'd like to be able to compute the cost of an
 application in terms of cluster resource usage.  To start out, I'd like to 
 get the memory utilization of an application.  The unit should be MB-seconds 
 or something similar and, from a chargeback perspective, the memory amount 
 should be the memory reserved for the application, as even if the app didn't 
 use all that memory, no one else was able to use it.
 (reserved ram for container 1 * lifetime of container 1) + (reserved ram for
 container 2 * lifetime of container 2) + ... + (reserved ram for container n 
 * lifetime of container n)
 It'd be nice to have this at the app level instead of the job level because:
 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't 
 appear on the job history server).
 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm).
 This new metric should be available both through the RM UI and RM Web 
 Services REST API.
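 For illustration only (container sizes and lifetimes below are made-up inputs, 
 not an RM API), the proposed metric is a simple sum:
 {code}
 public class MemorySecondsExample {
   public static void main(String[] args) {
     // memory MB-seconds = sum over containers of (reserved MB * lifetime in seconds)
     long[][] containers = {
         {2048, 600},   // container 1: 2048 MB reserved for 600 seconds
         {1024, 1200},  // container 2: 1024 MB reserved for 1200 seconds
     };
     long mbSeconds = 0;
     for (long[] c : containers) {
       mbSeconds += c[0] * c[1];
     }
     System.out.println(mbSeconds + " MB-seconds"); // prints 2457600
   }
 }
 {code}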



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1366) AM should implement Resync with the ApplicationMasterService instead of shutting down

2014-06-27 Thread Rohith (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith updated YARN-1366:
-

Attachment: YARN-1366.5.patch

I updated the patch with the following incremental changes:
1. AMRMClient re-registers if unregister throws 
ApplicationMasterNotRegisteredException (see the sketch below).
2. Unregister is called only if the AM is registered.

Please review the updated patch.
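
For reference, a minimal sketch of change 1 (the helper method and its 
arguments are made up for illustration; this is not the patch itself):

{code}
import org.apache.hadoop.yarn.api.records.FinalApplicationStatus;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.exceptions.ApplicationMasterNotRegisteredException;

public class ResyncingUnregister {
  // If the RM no longer knows this attempt (e.g. it restarted and asked for a
  // resync), re-register once and retry the unregister instead of giving up.
  static void unregisterWithResync(AMRMClient<AMRMClient.ContainerRequest> client,
      String host, int port, String trackingUrl) throws Exception {
    try {
      client.unregisterApplicationMaster(
          FinalApplicationStatus.SUCCEEDED, "", trackingUrl);
    } catch (ApplicationMasterNotRegisteredException e) {
      client.registerApplicationMaster(host, port, trackingUrl);
      client.unregisterApplicationMaster(
          FinalApplicationStatus.SUCCEEDED, "", trackingUrl);
    }
  }
}
{code}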

 AM should implement Resync with the ApplicationMasterService instead of 
 shutting down
 -

 Key: YARN-1366
 URL: https://issues.apache.org/jira/browse/YARN-1366
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Rohith
 Attachments: YARN-1366.1.patch, YARN-1366.2.patch, YARN-1366.3.patch, 
 YARN-1366.4.patch, YARN-1366.5.patch, YARN-1366.patch, 
 YARN-1366.prototype.patch, YARN-1366.prototype.patch


 The ApplicationMasterService currently sends a resync response to which the 
 AM responds by shutting down. The AM behavior is expected to change to 
 calling resyncing with the RM. Resync means resetting the allocate RPC 
 sequence number to 0 and the AM should send its entire outstanding request to 
 the RM. Note that if the AM is making its first allocate call to the RM then 
 things should proceed like normal without needing a resync. The RM will 
 return all containers that have completed since the RM last synced with the 
 AM. Some container completions may be reported more than once.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1408) Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins

2014-06-27 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-1408:
--

Attachment: (was: Yarn-1408.5.patch)

 Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task 
 timeout for 30mins
 --

 Key: YARN-1408
 URL: https://issues.apache.org/jira/browse/YARN-1408
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.2.0
Reporter: Sunil G
Assignee: Sunil G
 Attachments: Yarn-1408.1.patch, Yarn-1408.2.patch, Yarn-1408.3.patch, 
 Yarn-1408.4.patch, Yarn-1408.patch


 Capacity preemption is enabled as follows.
  *  yarn.resourcemanager.scheduler.monitor.enable= true ,
  *  
 yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy
 Queue = a,b
 Capacity of Queue A = 80%
 Capacity of Queue B = 20%
 Step 1: Submit a big jobA to queue a, which uses the full cluster capacity.
 Step 2: Submit a jobB to queue b, which would use less than 20% of the 
 cluster capacity.
 A jobA task that is using queue b's capacity is preempted and killed.
 This caused the following problem:
 1. A new container was allocated for jobA in queue a, based on a node update 
 from an NM.
 2. This container was preempted immediately by the preemption policy.
 An Invalid State exception (ACQUIRED at KILLED) was raised when the next AM 
 heartbeat reached the RM:
 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
 Can't handle this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 ACQUIRED at KILLED
 This also caused the task to time out for 30 minutes, as this container 
 was already killed by preemption.
 attempt_1380289782418_0003_m_00_0 Timed out after 1800 secs



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1408) Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins

2014-06-27 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-1408:
--

Attachment: Yarn-1408.5.patch

 Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task 
 timeout for 30mins
 --

 Key: YARN-1408
 URL: https://issues.apache.org/jira/browse/YARN-1408
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.2.0
Reporter: Sunil G
Assignee: Sunil G
 Attachments: Yarn-1408.1.patch, Yarn-1408.2.patch, Yarn-1408.3.patch, 
 Yarn-1408.4.patch, Yarn-1408.5.patch, Yarn-1408.patch


 Capacity preemption is enabled as follows.
  *  yarn.resourcemanager.scheduler.monitor.enable= true ,
  *  
 yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy
 Queue = a,b
 Capacity of Queue A = 80%
 Capacity of Queue B = 20%
 Step 1: Submit a big jobA to queue a, which uses the full cluster capacity.
 Step 2: Submit a jobB to queue b, which would use less than 20% of the 
 cluster capacity.
 A jobA task that is using queue b's capacity is preempted and killed.
 This caused the following problem:
 1. A new container was allocated for jobA in queue a, based on a node update 
 from an NM.
 2. This container was preempted immediately by the preemption policy.
 An Invalid State exception (ACQUIRED at KILLED) was raised when the next AM 
 heartbeat reached the RM:
 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
 Can't handle this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 ACQUIRED at KILLED
 This also caused the task to time out for 30 minutes, as this container 
 was already killed by preemption.
 attempt_1380289782418_0003_m_00_0 Timed out after 1800 secs



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-614) Separate AM failures from hardware failure or YARN error and do not count them to AM retry count

2014-06-27 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-614:
---

Attachment: YARN-614.11.patch

Added more testcases

 Separate AM failures from hardware failure or YARN error and do not count 
 them to AM retry count
 

 Key: YARN-614
 URL: https://issues.apache.org/jira/browse/YARN-614
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Bikas Saha
Assignee: Xuan Gong
 Fix For: 2.5.0

 Attachments: YARN-614-0.patch, YARN-614-1.patch, YARN-614-2.patch, 
 YARN-614-3.patch, YARN-614-4.patch, YARN-614-5.patch, YARN-614-6.patch, 
 YARN-614.10.patch, YARN-614.11.patch, YARN-614.7.patch, YARN-614.8.patch, 
 YARN-614.9.patch


 Attempts can fail due to a large number of user errors and they should not be 
 retried unnecessarily. The only reason YARN should retry an attempt is when 
 the hardware fails or YARN has an error. NM failing, lost NM and NM disk 
 errors are the hardware errors that come to mind.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1366) AM should implement Resync with the ApplicationMasterService instead of shutting down

2014-06-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14046176#comment-14046176
 ] 

Hadoop QA commented on YARN-1366:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12652857/YARN-1366.5.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The following test timeouts occurred in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client:

org.apache.hadoop.yarn.client.api.impl.TestAMRMClient

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4121//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/4121//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-client.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4121//console

This message is automatically generated.

 AM should implement Resync with the ApplicationMasterService instead of 
 shutting down
 -

 Key: YARN-1366
 URL: https://issues.apache.org/jira/browse/YARN-1366
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Rohith
 Attachments: YARN-1366.1.patch, YARN-1366.2.patch, YARN-1366.3.patch, 
 YARN-1366.4.patch, YARN-1366.5.patch, YARN-1366.patch, 
 YARN-1366.prototype.patch, YARN-1366.prototype.patch


 The ApplicationMasterService currently sends a resync response to which the 
 AM responds by shutting down. The AM behavior is expected to change to 
 calling resyncing with the RM. Resync means resetting the allocate RPC 
 sequence number to 0 and the AM should send its entire outstanding request to 
 the RM. Note that if the AM is making its first allocate call to the RM then 
 things should proceed like normal without needing a resync. The RM will 
 return all containers that have completed since the RM last synced with the 
 AM. Some container completions may be reported more than once.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1408) Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins

2014-06-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14046199#comment-14046199
 ] 

Hadoop QA commented on YARN-1408:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12652860/Yarn-1408.5.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4122//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4122//console

This message is automatically generated.

 Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task 
 timeout for 30mins
 --

 Key: YARN-1408
 URL: https://issues.apache.org/jira/browse/YARN-1408
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.2.0
Reporter: Sunil G
Assignee: Sunil G
 Attachments: Yarn-1408.1.patch, Yarn-1408.2.patch, Yarn-1408.3.patch, 
 Yarn-1408.4.patch, Yarn-1408.5.patch, Yarn-1408.patch


 Capacity preemption is enabled as follows.
  *  yarn.resourcemanager.scheduler.monitor.enable= true ,
  *  
 yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy
 Queue = a,b
 Capacity of Queue A = 80%
 Capacity of Queue B = 20%
 Step 1: Submit a big jobA to queue a, which uses the full cluster capacity.
 Step 2: Submit a jobB to queue b, which would use less than 20% of the 
 cluster capacity.
 A jobA task that is using queue b's capacity is preempted and killed.
 This caused the following problem:
 1. A new container was allocated for jobA in queue a, based on a node update 
 from an NM.
 2. This container was preempted immediately by the preemption policy.
 An Invalid State exception (ACQUIRED at KILLED) was raised when the next AM 
 heartbeat reached the RM:
 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
 Can't handle this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 ACQUIRED at KILLED
 This also caused the task to time out for 30 minutes, as this container 
 was already killed by preemption.
 attempt_1380289782418_0003_m_00_0 Timed out after 1800 secs



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-614) Separate AM failures from hardware failure or YARN error and do not count them to AM retry count

2014-06-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14046217#comment-14046217
 ] 

Hadoop QA commented on YARN-614:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12652862/YARN-614.11.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions
  
org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter
  
org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4123//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4123//console

This message is automatically generated.

 Separate AM failures from hardware failure or YARN error and do not count 
 them to AM retry count
 

 Key: YARN-614
 URL: https://issues.apache.org/jira/browse/YARN-614
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Bikas Saha
Assignee: Xuan Gong
 Fix For: 2.5.0

 Attachments: YARN-614-0.patch, YARN-614-1.patch, YARN-614-2.patch, 
 YARN-614-3.patch, YARN-614-4.patch, YARN-614-5.patch, YARN-614-6.patch, 
 YARN-614.10.patch, YARN-614.11.patch, YARN-614.7.patch, YARN-614.8.patch, 
 YARN-614.9.patch


 Attempts can fail due to a large number of user errors and they should not be 
 retried unnecessarily. The only reason YARN should retry an attempt is when 
 the hardware fails or YARN has an error. NM failing, lost NM and NM disk 
 errors are the hardware errors that come to mind.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2204) TestAMRestart#testAMRestartWithExistingContainers assumes CapacityScheduler

2014-06-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14046219#comment-14046219
 ] 

Hudson commented on YARN-2204:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5790 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5790/])
YARN-2204. Addendum patch. TestAMRestart#testAMRestartWithExistingContainers 
assumes CapacityScheduler. (Robert Kanter via kasha) (kasha: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1606168)
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRestart.java


 TestAMRestart#testAMRestartWithExistingContainers assumes CapacityScheduler
 ---

 Key: YARN-2204
 URL: https://issues.apache.org/jira/browse/YARN-2204
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.5.0
Reporter: Robert Kanter
Assignee: Robert Kanter
Priority: Trivial
 Fix For: 2.5.0

 Attachments: YARN-2204.patch, YARN-2204_addendum.patch, 
 YARN-2204_addendum.patch


 TestAMRestart#testAMRestartWithExistingContainers assumes CapacityScheduler



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-614) Separate AM failures from hardware failure or YARN error and do not count them to AM retry count

2014-06-27 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14046300#comment-14046300
 ] 

Xuan Gong commented on YARN-614:


Not sure why 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions fails; 
it passed on my local machine.
org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter 
is not related.
org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart 
fails because of a timeout: I added more logic to the test case, so I need to 
increase the timeout.

Submitted a new patch to kick Jenkins again.

 Separate AM failures from hardware failure or YARN error and do not count 
 them to AM retry count
 

 Key: YARN-614
 URL: https://issues.apache.org/jira/browse/YARN-614
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Bikas Saha
Assignee: Xuan Gong
 Fix For: 2.5.0

 Attachments: YARN-614-0.patch, YARN-614-1.patch, YARN-614-2.patch, 
 YARN-614-3.patch, YARN-614-4.patch, YARN-614-5.patch, YARN-614-6.patch, 
 YARN-614.10.patch, YARN-614.11.patch, YARN-614.7.patch, YARN-614.8.patch, 
 YARN-614.9.patch


 Attempts can fail due to a large number of user errors and they should not be 
 retried unnecessarily. The only reason YARN should retry an attempt is when 
 the hardware fails or YARN has an error. NM failing, lost NM and NM disk 
 errors are the hardware errors that come to mind.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-614) Separate AM failures from hardware failure or YARN error and do not count them to AM retry count

2014-06-27 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-614:
---

Attachment: YARN-614.12.patch

 Separate AM failures from hardware failure or YARN error and do not count 
 them to AM retry count
 

 Key: YARN-614
 URL: https://issues.apache.org/jira/browse/YARN-614
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Bikas Saha
Assignee: Xuan Gong
 Fix For: 2.5.0

 Attachments: YARN-614-0.patch, YARN-614-1.patch, YARN-614-2.patch, 
 YARN-614-3.patch, YARN-614-4.patch, YARN-614-5.patch, YARN-614-6.patch, 
 YARN-614.10.patch, YARN-614.11.patch, YARN-614.12.patch, YARN-614.7.patch, 
 YARN-614.8.patch, YARN-614.9.patch


 Attempts can fail due to a large number of user errors and they should not be 
 retried unnecessarily. The only reason YARN should retry an attempt is when 
 the hardware fails or YARN has an error. NM failing, lost NM and NM disk 
 errors are the hardware errors that come to mind.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1373) Transition RMApp and RMAppAttempt state to RUNNING after restart for recovered running apps

2014-06-27 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14046328#comment-14046328
 ] 

Vinod Kumar Vavilapalli commented on YARN-1373:
---

Since YARN-1210, the app and app-attempt have always moved to the RUNNING 
state after an RM restart. That's why it is a dup.

 Transition RMApp and RMAppAttempt state to RUNNING after restart for 
 recovered running apps
 ---

 Key: YARN-1373
 URL: https://issues.apache.org/jira/browse/YARN-1373
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Omkar Vinit Joshi

 Currently the RM moves recovered app attempts to a terminal recovered 
 state and starts a new attempt. Instead, it will have to transition the last 
 attempt to a running state such that it can proceed as normal once the 
 running attempt has resynced with the ApplicationMasterService (YARN-1365 and 
 YARN-1366). If the RM had started the application container before dying then 
 the AM would be up and trying to contact the RM. The RM may have died 
 before launching the container. For this case, the RM should wait for AM 
 liveliness period and issue a kill container for the stored master container. 
 It should transition this attempt to some RECOVER_ERROR state and proceed to 
 start a new attempt.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1695) Implement the rest (writable APIs) of RM web-services

2014-06-27 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-1695:
---

Priority: Blocker  (was: Major)

 Implement the rest (writable APIs) of RM web-services
 -

 Key: YARN-1695
 URL: https://issues.apache.org/jira/browse/YARN-1695
 Project: Hadoop YARN
  Issue Type: New Feature
Reporter: Vinod Kumar Vavilapalli
Assignee: Varun Vasudev
Priority: Blocker

 MAPREDUCE-2863 added the REST web-services to RM and NM. But all the APIs 
 added there were only focused on obtaining information from the cluster. We 
 need to have the following REST APIs to finish the feature
  - Application submission/termination (Priority): This unblocks easy client 
 interaction with a YARN cluster
  - Application Client protocol: For resource scheduling by apps written in an 
 arbitrary language. Will have to think about throughput concerns
  - ContainerManagement Protocol: Again for arbitrary language apps.
 One important thing to note here is that we already have client libraries for 
 all three protocols that do some heavy lifting. One part of the 
 effort is to figure out if they can be made any thinner and/or how 
 web-services will implement the same functionality.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1713) Implement getnewapplication and submitapp as part of RM web service

2014-06-27 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-1713:
---

Priority: Blocker  (was: Major)
Target Version/s: 2.5.0

 Implement getnewapplication and submitapp as part of RM web service
 ---

 Key: YARN-1713
 URL: https://issues.apache.org/jira/browse/YARN-1713
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Varun Vasudev
Assignee: Varun Vasudev
Priority: Blocker
 Attachments: apache-yarn-1713.3.patch, apache-yarn-1713.4.patch, 
 apache-yarn-1713.5.patch, apache-yarn-1713.6.patch, apache-yarn-1713.7.patch, 
 apache-yarn-1713.8.patch, apache-yarn-1713.cumulative.2.patch, 
 apache-yarn-1713.cumulative.3.patch, apache-yarn-1713.cumulative.4.patch, 
 apache-yarn-1713.cumulative.patch, apache-yarn-1713.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-614) Separate AM failures from hardware failure or YARN error and do not count them to AM retry count

2014-06-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14046350#comment-14046350
 ] 

Hadoop QA commented on YARN-614:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12652876/YARN-614.12.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4124//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4124//console

This message is automatically generated.

 Separate AM failures from hardware failure or YARN error and do not count 
 them to AM retry count
 

 Key: YARN-614
 URL: https://issues.apache.org/jira/browse/YARN-614
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Bikas Saha
Assignee: Xuan Gong
 Fix For: 2.5.0

 Attachments: YARN-614-0.patch, YARN-614-1.patch, YARN-614-2.patch, 
 YARN-614-3.patch, YARN-614-4.patch, YARN-614-5.patch, YARN-614-6.patch, 
 YARN-614.10.patch, YARN-614.11.patch, YARN-614.12.patch, YARN-614.7.patch, 
 YARN-614.8.patch, YARN-614.9.patch


 Attempts can fail due to a large number of user errors and they should not be 
 retried unnecessarily. The only reason YARN should retry an attempt is when 
 the hardware fails or YARN has an error. NM failing, lost NM and NM disk 
 errors are the hardware errors that come to mind.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Anyone know how to mock a secured hdfs for unit test?

2014-06-27 Thread Chris Nauroth
Hi David and Kai,

There are a couple of challenges with this, but I just figured out a pretty
decent setup while working on HDFS-2856.  That code isn't committed yet,
but if you open patch version 5 attached to that issue and look for the
TestSaslDataTransfer class, then you'll see how it works.  Most of the
logic for bootstrapping a MiniKDC and setting up the right HDFS
configuration properties is in an abstract base class named
SaslDataTransferTestCase.
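
In outline, the MiniKDC bootstrap step looks roughly like this (a sketch with
placeholder paths and principals, not the HDFS-2856 code itself; wiring a
secured MiniDFSCluster on top needs the extra SASL/principal properties
discussed in that patch):

{code}
import java.io.File;
import java.util.Properties;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.minikdc.MiniKdc;
import org.apache.hadoop.security.UserGroupInformation;

public class MiniKdcBootstrap {
  public static void main(String[] args) throws Exception {
    File workDir = new File("target/minikdc-work");  // placeholder directory
    workDir.mkdirs();

    Properties kdcConf = MiniKdc.createConf();
    MiniKdc kdc = new MiniKdc(kdcConf, workDir);
    kdc.start();

    File keytab = new File(workDir, "test.keytab");
    kdc.createPrincipal(keytab, "hdfs/localhost");   // placeholder principal

    Configuration conf = new Configuration();
    conf.set("hadoop.security.authentication", "kerberos");
    UserGroupInformation.setConfiguration(conf);
    UserGroupInformation.loginUserFromKeytab(
        "hdfs/localhost@" + kdc.getRealm(), keytab.getAbsolutePath());

    // ... start a MiniDFSCluster here with the kerberos/SASL properties set ...
    kdc.stop();
  }
}
{code}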

I hope this helps.

There are a few other open issues out there related to tests in secure
mode.  I know of HDFS-4312 and HDFS-5410.  It would be great to get more
regular test coverage with something that more closely approximates a
secured deployment.

Chris Nauroth
Hortonworks
http://hortonworks.com/



On Thu, Jun 26, 2014 at 7:27 AM, Zheng, Kai kai.zh...@intel.com wrote:

 Hi David,

 Quite some time ago I opened HADOOP-9952 and planned to create secured
 MiniClusters by making use of MiniKDC. Unfortunately, I haven't had the chance
 to work on it since then. If you need something like that and would like to
 contribute, please let me know and we can see if there is anything I can help with. Thanks.

 Regards,
 Kai

 -Original Message-
 From: Liu, David [mailto:liujion...@gmail.com]
 Sent: Thursday, June 26, 2014 10:12 PM
 To: hdfs-...@hadoop.apache.org; hdfs-iss...@hadoop.apache.org;
 yarn-...@hadoop.apache.org; yarn-issues@hadoop.apache.org;
 mapreduce-...@hadoop.apache.org; secur...@hadoop.apache.org
 Subject: Anyone know how to mock a secured hdfs for unit test?

 Hi all,

 I need to test my code, which reads data from secured HDFS. Is there any
 library to mock secured HDFS? Can MiniDFSCluster do the work?
 Any suggestion is appreciated.


 Thanks




[jira] [Created] (YARN-2223) NPE on ResourceManager recover

2014-06-27 Thread Jon Bringhurst (JIRA)
Jon Bringhurst created YARN-2223:


 Summary: NPE on ResourceManager recover
 Key: YARN-2223
 URL: https://issues.apache.org/jira/browse/YARN-2223
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.1
Reporter: Jon Bringhurst


I upgraded two clusters from tag 2.2.0 to branch-2.4.1 (latest commit is 
https://github.com/apache/hadoop-common/commit/c96c8e45a60651b677a1de338b7856a444dc0461).

Both clusters have the same config (other than hostnames). Both are running on 
JDK8u5 (I'm not sure if this is a factor here).

One cluster started up without any errors. The other started up with the 
following error on the RM:

{noformat}
18:33:45,463  WARN RMAppImpl:331 - The specific max attempts: 0 for 
application: 1 is invalid, because it is out of the range [1, 50]. Use the 
global max attempts instead.
18:33:45,465  INFO RMAppImpl:651 - Recovering app: 
application_1398450350082_0001 with 8 attempts and final state = KILLED
18:33:45,468  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0001_01 with final state: KILLED
18:33:45,478  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0001_02 with final state: FAILED
18:33:45,478  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0001_03 with final state: FAILED
18:33:45,479  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0001_04 with final state: FAILED
18:33:45,479  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0001_05 with final state: FAILED
18:33:45,480  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0001_06 with final state: FAILED
18:33:45,480  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0001_07 with final state: FAILED
18:33:45,481  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0001_08 with final state: FAILED
18:33:45,482  INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0001_01 
State change from NEW to KILLED
18:33:45,482  INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0001_02 
State change from NEW to FAILED
18:33:45,482  INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0001_03 
State change from NEW to FAILED
18:33:45,482  INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0001_04 
State change from NEW to FAILED
18:33:45,483  INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0001_05 
State change from NEW to FAILED
18:33:45,483  INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0001_06 
State change from NEW to FAILED
18:33:45,483  INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0001_07 
State change from NEW to FAILED
18:33:45,483  INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0001_08 
State change from NEW to FAILED
18:33:45,485  INFO RMAppImpl:639 - application_1398450350082_0001 State change 
from NEW to KILLED
18:33:45,485  WARN RMAppImpl:331 - The specific max attempts: 0 for 
application: 2 is invalid, because it is out of the range [1, 50]. Use the 
global max attempts instead.
18:33:45,485  INFO RMAppImpl:651 - Recovering app: 
application_1398450350082_0002 with 8 attempts and final state = KILLED
18:33:45,486  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0002_01 with final state: KILLED
18:33:45,486  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0002_02 with final state: FAILED
18:33:45,487  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0002_03 with final state: FAILED
18:33:45,487  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0002_04 with final state: FAILED
18:33:45,488  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0002_05 with final state: FAILED
18:33:45,488  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0002_06 with final state: FAILED
18:33:45,489  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0002_07 with final state: FAILED
18:33:45,489  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0002_08 with final state: FAILED
18:33:45,490  INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0002_01 
State change from NEW to KILLED
18:33:45,490  INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0002_02 
State change from NEW to FAILED
18:33:45,490  INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0002_03 
State change from NEW to FAILED
18:33:45,490  INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0002_04 
State change from NEW to FAILED
18:33:45,491  INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0002_05 
State change from NEW to FAILED
18:33:45,491  INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0002_06 
State 

[jira] [Updated] (YARN-2223) NPE on ResourceManager recover

2014-06-27 Thread Jon Bringhurst (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Bringhurst updated YARN-2223:
-

Description: 
I upgraded two clusters from tag 2.2.0 to branch-2.4.1 (latest commit is 
https://github.com/apache/hadoop-common/commit/c96c8e45a60651b677a1de338b7856a444dc0461).

Both clusters have the same config (other than hostnames). Both are running on 
JDK8u5 (I'm not sure if this is a factor here).

One cluster started up without any errors. The other started up with the 
following error on the RM:

{noformat}
18:33:45,463  WARN RMAppImpl:331 - The specific max attempts: 0 for 
application: 1 is invalid, because it is out of the range [1, 50]. Use the 
global max attempts instead.
18:33:45,465  INFO RMAppImpl:651 - Recovering app: 
application_1398450350082_0001 with 8 attempts and final state = KILLED
18:33:45,468  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0001_01 with final state: KILLED
18:33:45,478  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0001_02 with final state: FAILED
18:33:45,478  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0001_03 with final state: FAILED
18:33:45,479  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0001_04 with final state: FAILED
18:33:45,479  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0001_05 with final state: FAILED
18:33:45,480  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0001_06 with final state: FAILED
18:33:45,480  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0001_07 with final state: FAILED
18:33:45,481  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0001_08 with final state: FAILED
18:33:45,482  INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0001_01 
State change from NEW to KILLED
18:33:45,482  INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0001_02 
State change from NEW to FAILED
18:33:45,482  INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0001_03 
State change from NEW to FAILED
18:33:45,482  INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0001_04 
State change from NEW to FAILED
18:33:45,483  INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0001_05 
State change from NEW to FAILED
18:33:45,483  INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0001_06 
State change from NEW to FAILED
18:33:45,483  INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0001_07 
State change from NEW to FAILED
18:33:45,483  INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0001_08 
State change from NEW to FAILED
18:33:45,485  INFO RMAppImpl:639 - application_1398450350082_0001 State change 
from NEW to KILLED
18:33:45,485  WARN RMAppImpl:331 - The specific max attempts: 0 for 
application: 2 is invalid, because it is out of the range [1, 50]. Use the 
global max attempts instead.
18:33:45,485  INFO RMAppImpl:651 - Recovering app: 
application_1398450350082_0002 with 8 attempts and final state = KILLED
18:33:45,486  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0002_01 with final state: KILLED
18:33:45,486  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0002_02 with final state: FAILED
18:33:45,487  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0002_03 with final state: FAILED
18:33:45,487  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0002_04 with final state: FAILED
18:33:45,488  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0002_05 with final state: FAILED
18:33:45,488  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0002_06 with final state: FAILED
18:33:45,489  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0002_07 with final state: FAILED
18:33:45,489  INFO RMAppAttemptImpl:691 - Recovering attempt: 
appattempt_1398450350082_0002_08 with final state: FAILED
18:33:45,490  INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0002_01 
State change from NEW to KILLED
18:33:45,490  INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0002_02 
State change from NEW to FAILED
18:33:45,490  INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0002_03 
State change from NEW to FAILED
18:33:45,490  INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0002_04 
State change from NEW to FAILED
18:33:45,491  INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0002_05 
State change from NEW to FAILED
18:33:45,491  INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0002_06 
State change from NEW to FAILED
18:33:45,491  INFO RMAppAttemptImpl:659 - appattempt_1398450350082_0002_07 
State change from NEW to FAILED

[jira] [Commented] (YARN-2052) ContainerId creation after work preserving restart is broken

2014-06-27 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14046419#comment-14046419
 ] 

Jian He commented on YARN-2052:
---

Patch looks good overall. Can you also update MemoryStateStore so that we can 
verify that the containerId issued by the new RM is correct? Thanks.
{code}
-assertEquals(4, schedulerAttempt.getNewContainerId());
+assertEquals(1, schedulerAttempt.getNewContainerId());
{code}

 ContainerId creation after work preserving restart is broken
 

 Key: YARN-2052
 URL: https://issues.apache.org/jira/browse/YARN-2052
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-2052.1.patch, YARN-2052.2.patch, YARN-2052.3.patch, 
 YARN-2052.4.patch, YARN-2052.5.patch, YARN-2052.6.patch, YARN-2052.7.patch, 
 YARN-2052.8.patch, YARN-2052.9.patch, YARN-2052.9.patch


 Container ids are made unique by using the app identifier and appending a 
 monotonically increasing sequence number to it. Since container creation is a 
 high churn activity the RM does not store the sequence number per app. So 
 after restart it does not know what the new sequence number should be for new 
 allocations.
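To make the failure mode concrete, here is a small editorial sketch (not RM code; the id format is only approximate) of how a counter that restarts at zero reissues ids that were already handed out:

{code}
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: per-attempt container ids are the attempt id plus a monotonically
// increasing sequence number, so a counter that restarts at zero after an RM
// restart can reissue ids that were already handed out before the restart.
public class ContainerIdCollisionSketch {
  private final AtomicInteger sequence = new AtomicInteger(0);

  String newContainerId(String appAttemptId) {
    return String.format("container_%s_%06d", appAttemptId, sequence.incrementAndGet());
  }

  public static void main(String[] args) {
    ContainerIdCollisionSketch beforeRestart = new ContainerIdCollisionSketch();
    String issued = beforeRestart.newContainerId("1398450350082_0001_01"); // ..._000001

    // After a restart the in-memory counter is gone, so the "new" RM would
    // hand out the very same id again for the same attempt.
    ContainerIdCollisionSketch afterRestart = new ContainerIdCollisionSketch();
    String reissued = afterRestart.newContainerId("1398450350082_0001_01");
    System.out.println(issued.equals(reissued) ? "collision: " + reissued : "ok");
  }
}
{code}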



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2104) Scheduler queue filter failed to work because index of queue column changed

2014-06-27 Thread Maysam Yabandeh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14046425#comment-14046425
 ] 

Maysam Yabandeh commented on YARN-2104:
---

+1
Worked for us, and the failed unit test seems unrelated.

 Scheduler queue filter failed to work because index of queue column changed
 ---

 Key: YARN-2104
 URL: https://issues.apache.org/jira/browse/YARN-2104
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, webapp
Affects Versions: 2.4.0
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-2104.patch


 YARN-563 added,
 {code}
 + th(".type", "Application Type").
 {code}
 to the application table, which moves the queue column's index from 3 to 4. But on 
 the scheduler page the queue column index is hard-coded to 3 when filtering 
 applications by queue name,
 {code}
   if (q == 'root') q = '';,
   else q = '^' + q.substr(q.lastIndexOf('.') + 1) + '$';,
   $('#apps').dataTable().fnFilter(q, 3, true);,
 {code}
 So the queue filter will not work on the applications page.
 Reproduce steps: (thanks to Bo Yang for pointing this out)
 {code}
 1) In default setup, there’s a default queue under root queue
 2) Run an arbitrary application, you can find it in “Applications” page
 3) Click “Default” queue in scheduler page
 4) Click “Applications”, no application will show here
 5) Click “Root” queue in scheduler page
 6) Click “Applications”, application will show again
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2224) Let TestContainersMonitor#testContainerKillOnMemoryOverflow work irrespective of the default settings

2014-06-27 Thread Anubhav Dhoot (JIRA)
Anubhav Dhoot created YARN-2224:
---

 Summary: Let 
TestContainersMonitor#testContainerKillOnMemoryOverflow work irrespective of 
the default settings
 Key: YARN-2224
 URL: https://issues.apache.org/jira/browse/YARN-2224
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Anubhav Dhoot


If the default setting DEFAULT_NM_VMEM_CHECK_ENABLED is set to false, the test 
will fail. Make the test independent of the default settings: have it turn the 
setting on explicitly and then verify that the memory check actually happens. 
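A minimal sketch of that idea, assuming only the standard YarnConfiguration constant (this is not the attached YARN-2224.patch):

{code}
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Sketch: the test pins the setting itself instead of trusting yarn-default.xml.
public class VmemCheckTestConfSketch {
  public static void main(String[] args) {
    YarnConfiguration conf = new YarnConfiguration();
    conf.setBoolean(YarnConfiguration.NM_VMEM_CHECK_ENABLED, true);
    // The ContainersMonitor under test would be started with this conf, so the
    // assertion about the over-limit container being killed holds regardless of
    // what the shipped default happens to be.
    System.out.println("vmem check enabled for this test: "
        + conf.getBoolean(YarnConfiguration.NM_VMEM_CHECK_ENABLED, false));
  }
}
{code}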



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2224) Let TestContainersMonitor#testContainerKillOnMemoryOverflow work irrespective of the default settings

2014-06-27 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-2224:


Attachment: YARN-2224.patch

Sets the flag to true so that the test does not fail if the default is set 
to false.

 Let TestContainersMonitor#testContainerKillOnMemoryOverflow work irrespective 
 of the default settings
 -

 Key: YARN-2224
 URL: https://issues.apache.org/jira/browse/YARN-2224
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: YARN-2224.patch


 If the default setting DEFAULT_NM_VMEM_CHECK_ENABLED is set to false, the test 
 will fail. Make the test independent of the default settings: have it turn the 
 setting on explicitly and then verify that the memory check actually happens. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2224) Let TestContainersMonitor#testContainerKillOnMemoryOverflow work irrespective of the default settings

2014-06-27 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-2224:


Description: If the default setting DEFAULT_NM_VMEM_CHECK_ENABLED is set to 
false the test will fail. Make the test pass not rely on the default settings 
but just let it verify that once the setting is turned on it actually does the 
memory check. See YARN-2225 which suggests we turn the default off.  (was: If 
the default setting DEFAULT_NM_VMEM_CHECK_ENABLED is set to false the test will 
fail. Make the test pass not rely on the default settings but just let it 
verify that once the setting is turned on it actually does the memory check. )

 Let TestContainersMonitor#testContainerKillOnMemoryOverflow work irrespective 
 of the default settings
 -

 Key: YARN-2224
 URL: https://issues.apache.org/jira/browse/YARN-2224
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: YARN-2224.patch


 If the default setting DEFAULT_NM_VMEM_CHECK_ENABLED is set to false, the test 
 will fail. Make the test independent of the default settings: have it turn the 
 setting on explicitly and then verify that the memory check actually happens. 
 See YARN-2225, which suggests we turn the default off.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2225) Turn the virtual memory check to be off by default

2014-06-27 Thread Anubhav Dhoot (JIRA)
Anubhav Dhoot created YARN-2225:
---

 Summary: Turn the virtual memory check to be off by default
 Key: YARN-2225
 URL: https://issues.apache.org/jira/browse/YARN-2225
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Anubhav Dhoot


The virtual memory check may not be the best way to isolate applications. 
Virtual memory is not the constrained resource. It would be better if we limit 
the swapping of the task using swappiness instead. This patch will turn this off 
by default and let users turn it on if they need to.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2224) Let TestContainersMonitor#testContainerKillOnMemoryOverflow work irrespective of the default settings

2014-06-27 Thread Anubhav Dhoot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14046446#comment-14046446
 ] 

Anubhav Dhoot commented on YARN-2224:
-

Once the test is made resilient, we can decide in YARN-2225 whether to turn the 
default off.

 Let TestContainersMonitor#testContainerKillOnMemoryOverflow work irrespective 
 of the default settings
 -

 Key: YARN-2224
 URL: https://issues.apache.org/jira/browse/YARN-2224
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: YARN-2224.patch


 If the default setting DEFAULT_NM_VMEM_CHECK_ENABLED is set to false, the test 
 will fail. Make the test independent of the default settings: have it turn the 
 setting on explicitly and then verify that the memory check actually happens. 
 See YARN-2225, which suggests we turn the default off.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2225) Turn the virtual memory check to be off by default

2014-06-27 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-2225:


Description: The virtual memory check may not be the best way to isolate 
applications. Virtual memory is not the constrained resource. It would be 
better if we limit the swapping of the task using swappiness instead. This patch 
will turn this DEFAULT_NM_VMEM_CHECK_ENABLED off by default and let users turn 
it on if they need to.  (was: The virtual memory check may not be the best way 
to isolate applications. Virtual memory is not the constrained resource. It 
would be better if we limit the swapping of the task using swappiness instead. 
This patch will turn this off by default and let users turn it on if they need 
to.)

 Turn the virtual memory check to be off by default
 --

 Key: YARN-2225
 URL: https://issues.apache.org/jira/browse/YARN-2225
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: YARN-2225.patch


 The virtual memory check may not be the best way to isolate applications. 
 Virtual memory is not the constrained resource. It would be better if we 
 limit the swapping of the task using swappiness instead. This patch will turn 
 this DEFAULT_NM_VMEM_CHECK_ENABLED off by default and let users turn it on if 
 they need to.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2225) Turn the virtual memory check to be off by default

2014-06-27 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-2225:


Attachment: YARN-2225.patch

 Turn the virtual memory check to be off by default
 --

 Key: YARN-2225
 URL: https://issues.apache.org/jira/browse/YARN-2225
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: YARN-2225.patch


 The virtual memory check may not be the best way to isolate applications. 
 Virtual memory is not the constrained resource. It would be better if we 
 limit the swapping of the task using swappiness instead. This patch will turn 
 this off by default and let users turn it on if they need to.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (YARN-2225) Turn the virtual memory check to be off by default

2014-06-27 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot reassigned YARN-2225:
---

Assignee: Anubhav Dhoot

 Turn the virtual memory check to be off by default
 --

 Key: YARN-2225
 URL: https://issues.apache.org/jira/browse/YARN-2225
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: YARN-2225.patch


 The virtual memory check may not be the best way to isolate applications. 
 Virtual memory is not the constrained resource. It would be better if we 
 limit the swapping of the task using swappiness instead. This patch will turn 
 this off by default and let users turn it on if they need to.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2224) Let TestContainersMonitor#testContainerKillOnMemoryOverflow work irrespective of the default settings

2014-06-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14046469#comment-14046469
 ] 

Hadoop QA commented on YARN-2224:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12652903/YARN-2224.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4125//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4125//console

This message is automatically generated.

 Let TestContainersMonitor#testContainerKillOnMemoryOverflow work irrespective 
 of the default settings
 -

 Key: YARN-2224
 URL: https://issues.apache.org/jira/browse/YARN-2224
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: YARN-2224.patch


 If the default setting DEFAULT_NM_VMEM_CHECK_ENABLED is set to false, the test 
 will fail. Make the test independent of the default settings: have it turn the 
 setting on explicitly and then verify that the memory check actually happens. 
 See YARN-2225, which suggests we turn the default off.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2225) Turn the virtual memory check to be off by default

2014-06-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14046480#comment-14046480
 ] 

Hadoop QA commented on YARN-2225:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12652908/YARN-2225.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4126//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4126//console

This message is automatically generated.

 Turn the virtual memory check to be off by default
 --

 Key: YARN-2225
 URL: https://issues.apache.org/jira/browse/YARN-2225
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: YARN-2225.patch


 The virtual memory check may not be the best way to isolate applications. 
 Virtual memory is not the constrained resource. It would be better if we 
 limit the swapping of the task using swappiness instead. This patch will turn 
 this DEFAULT_NM_VMEM_CHECK_ENABLED off by default and let users turn it on if 
 they need to.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2225) Turn the virtual memory check to be off by default

2014-06-27 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14046489#comment-14046489
 ] 

Vinod Kumar Vavilapalli commented on YARN-2225:
---

-1 for changing the default. This breaks compatibility.

bq. The virtual memory check may not be the best way to isolate applications. 
Virtual memory is not the constrained resource.
I still see a lot of apps that need isolation w.r.t. vmem.

It's not about which resource is constrained, it is about isolation. We already 
identify physical memory as constrained and use that as the main scheduling 
dimension. 

 Turn the virtual memory check to be off by default
 --

 Key: YARN-2225
 URL: https://issues.apache.org/jira/browse/YARN-2225
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: YARN-2225.patch


 The virtual memory check may not be the best way to isolate applications. 
 Virtual memory is not the constrained resource. It would be better if we 
 limit the swapping of the task using swappiness instead. This patch will turn 
 this DEFAULT_NM_VMEM_CHECK_ENABLED off by default and let users turn it on if 
 they need to.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2225) Turn the virtual memory check to be off by default

2014-06-27 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14046494#comment-14046494
 ] 

Karthik Kambatla commented on YARN-2225:


According to your compatibility guide, The default values of Hadoop-defined 
properties can be changed across minor/major releases, but will remain the same 
across point releases within a minor release.

So, in letter, we can't target 2.4.1 or 2.5.1, but can target 2.5 or 2.6. In 
spirit, virtual memory check has been a pain and we end up recommending users 
to turn it off. 

 Turn the virtual memory check to be off by default
 --

 Key: YARN-2225
 URL: https://issues.apache.org/jira/browse/YARN-2225
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: YARN-2225.patch


 The virtual memory check may not be the best way to isolate applications. 
 Virtual memory is not the constrained resource. It would be better if we 
 limit the swapping of the task using swappiness instead. This patch will turn 
 this DEFAULT_NM_VMEM_CHECK_ENABLED off by default and let users turn it on if 
 they need to.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (YARN-2225) Turn the virtual memory check to be off by default

2014-06-27 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14046494#comment-14046494
 ] 

Karthik Kambatla edited comment on YARN-2225 at 6/27/14 10:48 PM:
--

According to our compatibility guide, The default values of Hadoop-defined 
properties can be changed across minor/major releases, but will remain the same 
across point releases within a minor release.

So, in letter, we can't target 2.4.1 or 2.5.1, but can target 2.5 or 2.6. In 
spirit, virtual memory check has been a pain and we end up recommending users 
to turn it off. 


was (Author: kkambatl):
According to your compatibility guide, The default values of Hadoop-defined 
properties can be changed across minor/major releases, but will remain the same 
across point releases within a minor release.

So, in letter, we can't target 2.4.1 or 2.5.1, but can target 2.5 or 2.6. In 
spirit, virtual memory check has been a pain and we end up recommending users 
to turn it off. 

 Turn the virtual memory check to be off by default
 --

 Key: YARN-2225
 URL: https://issues.apache.org/jira/browse/YARN-2225
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: YARN-2225.patch


 The virtual memory check may not be the best way to isolate applications. 
 Virtual memory is not the constrained resource. It would be better if we 
 limit the swapping of the task using swappiness instead. This patch will turn 
 this DEFAULT_NM_VMEM_CHECK_ENABLED off by default and let users turn it on if 
 they need to.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2226) RMStateStore versioning (CURRENT_VERSION_INFO) should apply to all stores

2014-06-27 Thread Vinod Kumar Vavilapalli (JIRA)
Vinod Kumar Vavilapalli created YARN-2226:
-

 Summary: RMStateStore versioning (CURRENT_VERSION_INFO) should 
apply to all stores
 Key: YARN-2226
 URL: https://issues.apache.org/jira/browse/YARN-2226
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli


We need all state store impls to be versioned. Should move 
ZKRMStateStore.CURRENT_VERSION_INFO to RMStateStore so that versioning applies 
to all stores.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2052) ContainerId creation after work preserving restart is broken

2014-06-27 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14046504#comment-14046504
 ] 

Vinod Kumar Vavilapalli commented on YARN-2052:
---

Not related to this patch, but I think CURRENT_VERSION_INFO shouldn't be in 
ZKRMStateStore. Filed YARN-2226.

 ContainerId creation after work preserving restart is broken
 

 Key: YARN-2052
 URL: https://issues.apache.org/jira/browse/YARN-2052
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-2052.1.patch, YARN-2052.2.patch, YARN-2052.3.patch, 
 YARN-2052.4.patch, YARN-2052.5.patch, YARN-2052.6.patch, YARN-2052.7.patch, 
 YARN-2052.8.patch, YARN-2052.9.patch, YARN-2052.9.patch


 Container ids are made unique by using the app identifier and appending a 
 monotonically increasing sequence number to it. Since container creation is a 
 high churn activity the RM does not store the sequence number per app. So 
 after restart it does not know what the new sequence number should be for new 
 allocations.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2226) RMStateStore versioning (CURRENT_VERSION_INFO) should apply to all stores

2014-06-27 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-2226:
--

Assignee: (was: Vinod Kumar Vavilapalli)
  Labels: newbie  (was: )

 RMStateStore versioning (CURRENT_VERSION_INFO) should apply to all stores
 -

 Key: YARN-2226
 URL: https://issues.apache.org/jira/browse/YARN-2226
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
  Labels: newbie

 We need all state store impls to be versioned. Should move 
 ZKRMStateStore.CURRENT_VERSION_INFO to RMStateStore so that versioning 
 applies to all stores.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2052) ContainerId creation after work preserving restart is broken

2014-06-27 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-2052:
-

Attachment: YARN-2052.10.patch

[~jianhe], good catch. Updated MemoryRMStateStore and its tests.
[~vinodkv], yes, let's do this on YARN-2226.

 ContainerId creation after work preserving restart is broken
 

 Key: YARN-2052
 URL: https://issues.apache.org/jira/browse/YARN-2052
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-2052.1.patch, YARN-2052.10.patch, 
 YARN-2052.2.patch, YARN-2052.3.patch, YARN-2052.4.patch, YARN-2052.5.patch, 
 YARN-2052.6.patch, YARN-2052.7.patch, YARN-2052.8.patch, YARN-2052.9.patch, 
 YARN-2052.9.patch


 Container ids are made unique by using the app identifier and appending a 
 monotonically increasing sequence number to it. Since container creation is a 
 high churn activity the RM does not store the sequence number per app. So 
 after restart it does not know what the new sequence number should be for new 
 allocations.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2104) Scheduler queue filter failed to work because index of queue column changed

2014-06-27 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14046512#comment-14046512
 ] 

Jason Lowe commented on YARN-2104:
--

+1 lgtm.  The test failure is unrelated.  Committing this.

 Scheduler queue filter failed to work because index of queue column changed
 ---

 Key: YARN-2104
 URL: https://issues.apache.org/jira/browse/YARN-2104
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, webapp
Affects Versions: 2.4.0
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-2104.patch


 YARN-563 added,
 {code}
 + th(".type", "Application Type").
 {code}
 to the application table, which moves the queue column's index from 3 to 4. But on 
 the scheduler page the queue column index is hard-coded to 3 when filtering 
 applications by queue name,
 {code}
   if (q == 'root') q = '';,
   else q = '^' + q.substr(q.lastIndexOf('.') + 1) + '$';,
   $('#apps').dataTable().fnFilter(q, 3, true);,
 {code}
 So the queue filter will not work on the applications page.
 Reproduce steps: (thanks to Bo Yang for pointing this out)
 {code}
 1) In default setup, there’s a default queue under root queue
 2) Run an arbitrary application, you can find it in “Applications” page
 3) Click “Default” queue in scheduler page
 4) Click “Applications”, no application will show here
 5) Click “Root” queue in scheduler page
 6) Click “Applications”, application will show again
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2104) Scheduler queue filter failed to work because index of queue column changed

2014-06-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14046528#comment-14046528
 ] 

Hudson commented on YARN-2104:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5792 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5792/])
YARN-2104. Scheduler queue filter failed to work because index of queue column 
changed. Contributed by Wangda Tan (jlowe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1606265)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/CapacitySchedulerPage.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/DefaultSchedulerPage.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/FairSchedulerPage.java


 Scheduler queue filter failed to work because index of queue column changed
 ---

 Key: YARN-2104
 URL: https://issues.apache.org/jira/browse/YARN-2104
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, webapp
Affects Versions: 2.4.0
Reporter: Wangda Tan
Assignee: Wangda Tan
 Fix For: 3.0.0, 2.5.0

 Attachments: YARN-2104.patch


 YARN-563 added,
 {code}
 + th(".type", "Application Type").
 {code}
 to the application table, which moves the queue column's index from 3 to 4. But on 
 the scheduler page the queue column index is hard-coded to 3 when filtering 
 applications by queue name,
 {code}
   if (q == 'root') q = '';,
   else q = '^' + q.substr(q.lastIndexOf('.') + 1) + '$';,
   $('#apps').dataTable().fnFilter(q, 3, true);,
 {code}
 So the queue filter will not work on the applications page.
 Reproduce steps: (thanks to Bo Yang for pointing this out)
 {code}
 1) In default setup, there’s a default queue under root queue
 2) Run an arbitrary application, you can find it in “Applications” page
 3) Click “Default” queue in scheduler page
 4) Click “Applications”, no application will show here
 5) Click “Root” queue in scheduler page
 6) Click “Applications”, application will show again
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2052) ContainerId creation after work preserving restart is broken

2014-06-27 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14046530#comment-14046530
 ] 

Jian He commented on YARN-2052:
---

- Actually, the FileSystem and ZK state stores have separate versions because they 
might diverge at some point, so we should bump up the FileSystem store version in 
this patch too.
- These two calls are duplicated in getAndIncrement of 
FileSystemStateStore/ZKRMStateStore; we can consolidate them into one: 
"fs.exists(epochNodePath)" / "existsWithRetries(epochNodePath, true) != null" 
(see the sketch below).
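To make the shape of that consolidation concrete, here is an editorial, file-backed sketch; it is not the FileSystemRMStateStore or ZKRMStateStore code, just the read-if-exists-then-persist-plus-one pattern with a single existence check:

{code}
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Sketch of a get-and-increment epoch: read the stored epoch if it exists,
// then persist epoch + 1 and return the value that was read.
public class EpochSketch {
  static long getAndIncrementEpoch(Path epochNodePath) throws Exception {
    long epoch = 0;
    if (Files.exists(epochNodePath)) {   // the one shared existence check
      epoch = Long.parseLong(
          new String(Files.readAllBytes(epochNodePath), StandardCharsets.UTF_8).trim());
    }
    Files.write(epochNodePath, Long.toString(epoch + 1).getBytes(StandardCharsets.UTF_8));
    return epoch;
  }

  public static void main(String[] args) throws Exception {
    Path epochFile = Paths.get("target", "epoch");
    Files.createDirectories(epochFile.getParent());
    System.out.println("first RM start:  epoch " + getAndIncrementEpoch(epochFile));
    System.out.println("after a restart: epoch " + getAndIncrementEpoch(epochFile));
  }
}
{code}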

 ContainerId creation after work preserving restart is broken
 

 Key: YARN-2052
 URL: https://issues.apache.org/jira/browse/YARN-2052
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-2052.1.patch, YARN-2052.10.patch, 
 YARN-2052.2.patch, YARN-2052.3.patch, YARN-2052.4.patch, YARN-2052.5.patch, 
 YARN-2052.6.patch, YARN-2052.7.patch, YARN-2052.8.patch, YARN-2052.9.patch, 
 YARN-2052.9.patch


 Container ids are made unique by using the app identifier and appending a 
 monotonically increasing sequence number to it. Since container creation is a 
 high churn activity the RM does not store the sequence number per app. So 
 after restart it does not know what the new sequence number should be for new 
 allocations.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (YARN-2226) RMStateStore versioning (CURRENT_VERSION_INFO) should apply to all stores

2014-06-27 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He resolved YARN-2226.
---

Resolution: Invalid

 RMStateStore versioning (CURRENT_VERSION_INFO) should apply to all stores
 -

 Key: YARN-2226
 URL: https://issues.apache.org/jira/browse/YARN-2226
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
  Labels: newbie

 We need all state store impls to be versioned. Should move 
 ZKRMStateStore.CURRENT_VERSION_INFO to RMStateStore so that versioning 
 applies to all stores.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2226) RMStateStore versioning (CURRENT_VERSION_INFO) should apply to all stores

2014-06-27 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14046531#comment-14046531
 ] 

Jian He commented on YARN-2226:
---

Actually, the FileSystem and ZK state stores have separate versions because they 
might diverge at some point. Closing this as invalid.

 RMStateStore versioning (CURRENT_VERSION_INFO) should apply to all stores
 -

 Key: YARN-2226
 URL: https://issues.apache.org/jira/browse/YARN-2226
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
  Labels: newbie

 We need all state store impls to be versioned. Should move 
 ZKRMStateStore.CURRENT_VERSION_INFO to RMStateStore so that versioning 
 applies to all stores.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2052) ContainerId creation after work preserving restart is broken

2014-06-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14046544#comment-14046544
 ] 

Hadoop QA commented on YARN-2052:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12652917/YARN-2052.10.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 7 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4127//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4127//console

This message is automatically generated.

 ContainerId creation after work preserving restart is broken
 

 Key: YARN-2052
 URL: https://issues.apache.org/jira/browse/YARN-2052
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-2052.1.patch, YARN-2052.10.patch, 
 YARN-2052.2.patch, YARN-2052.3.patch, YARN-2052.4.patch, YARN-2052.5.patch, 
 YARN-2052.6.patch, YARN-2052.7.patch, YARN-2052.8.patch, YARN-2052.9.patch, 
 YARN-2052.9.patch


 Container ids are made unique by using the app identifier and appending a 
 monotonically increasing sequence number to it. Since container creation is a 
 high churn activity the RM does not store the sequence number per app. So 
 after restart it does not know what the new sequence number should be for new 
 allocations.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2052) ContainerId creation after work preserving restart is broken

2014-06-27 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-2052:
-

Attachment: YARN-2052.11.patch

 ContainerId creation after work preserving restart is broken
 

 Key: YARN-2052
 URL: https://issues.apache.org/jira/browse/YARN-2052
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-2052.1.patch, YARN-2052.10.patch, 
 YARN-2052.11.patch, YARN-2052.2.patch, YARN-2052.3.patch, YARN-2052.4.patch, 
 YARN-2052.5.patch, YARN-2052.6.patch, YARN-2052.7.patch, YARN-2052.8.patch, 
 YARN-2052.9.patch, YARN-2052.9.patch


 Container ids are made unique by using the app identifier and appending a 
 monotonically increasing sequence number to it. Since container creation is a 
 high churn activity the RM does not store the sequence number per app. So 
 after restart it does not know what the new sequence number should be for new 
 allocations.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2052) ContainerId creation after work preserving restart is broken

2014-06-27 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14046557#comment-14046557
 ] 

Jian He commented on YARN-2052:
---

Can you rename RMEpoch.java to Epoch, and similarly RMEpochPBImpl too?

 ContainerId creation after work preserving restart is broken
 

 Key: YARN-2052
 URL: https://issues.apache.org/jira/browse/YARN-2052
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-2052.1.patch, YARN-2052.10.patch, 
 YARN-2052.11.patch, YARN-2052.2.patch, YARN-2052.3.patch, YARN-2052.4.patch, 
 YARN-2052.5.patch, YARN-2052.6.patch, YARN-2052.7.patch, YARN-2052.8.patch, 
 YARN-2052.9.patch, YARN-2052.9.patch


 Container ids are made unique by using the app identifier and appending a 
 monotonically increasing sequence number to it. Since container creation is a 
 high churn activity the RM does not store the sequence number per app. So 
 after restart it does not know what the new sequence number should be for new 
 allocations.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2052) ContainerId creation after work preserving restart is broken

2014-06-27 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-2052:
-

Attachment: (was: YARN-2052.11.patch)

 ContainerId creation after work preserving restart is broken
 

 Key: YARN-2052
 URL: https://issues.apache.org/jira/browse/YARN-2052
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-2052.1.patch, YARN-2052.10.patch, 
 YARN-2052.2.patch, YARN-2052.3.patch, YARN-2052.4.patch, YARN-2052.5.patch, 
 YARN-2052.6.patch, YARN-2052.7.patch, YARN-2052.8.patch, YARN-2052.9.patch, 
 YARN-2052.9.patch


 Container ids are made unique by using the app identifier and appending a 
 monotonically increasing sequence number to it. Since container creation is a 
 high churn activity the RM does not store the sequence number per app. So 
 after restart it does not know what the new sequence number should be for new 
 allocations.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2104) Scheduler queue filter failed to work because index of queue column changed

2014-06-27 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14046560#comment-14046560
 ] 

Wangda Tan commented on YARN-2104:
--

Thanks [~maysamyabandeh] and [~jlowe] for review and commit!

 Scheduler queue filter failed to work because index of queue column changed
 ---

 Key: YARN-2104
 URL: https://issues.apache.org/jira/browse/YARN-2104
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, webapp
Affects Versions: 2.4.0
Reporter: Wangda Tan
Assignee: Wangda Tan
 Fix For: 3.0.0, 2.5.0

 Attachments: YARN-2104.patch


 YARN-563 added,
 {code}
 + th(".type", "Application Type").
 {code}
 to the application table, which moves the queue column's index from 3 to 4. But on 
 the scheduler page the queue column index is hard-coded to 3 when filtering 
 applications by queue name,
 {code}
   if (q == 'root') q = '';,
   else q = '^' + q.substr(q.lastIndexOf('.') + 1) + '$';,
   $('#apps').dataTable().fnFilter(q, 3, true);,
 {code}
 So the queue filter will not work on the applications page.
 Reproduce steps: (thanks to Bo Yang for pointing this out)
 {code}
 1) In default setup, there’s a default queue under root queue
 2) Run an arbitrary application, you can find it in “Applications” page
 3) Click “Default” queue in scheduler page
 4) Click “Applications”, no application will show here
 5) Click “Root” queue in scheduler page
 6) Click “Applications”, application will show again
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2227) Move containerMgrProxy from RM's AMLaunch to get rid of issues that new client talking with old server

2014-06-27 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-2227:
-

Issue Type: Sub-task  (was: Improvement)
Parent: YARN-666

 Move containerMgrProxy from RM's AMLaunch to get rid of issues that new 
 client talking with old server
 --

 Key: YARN-2227
 URL: https://issues.apache.org/jira/browse/YARN-2227
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Junping Du
Assignee: Junping Du

 In rolling-upgrade semantics, we should handle the case where an old client talks 
 to new servers, as long as only compatible changes happen in the RPC protocol. 
 Under these semantics there is no guarantee that a new client can talk to an old 
 server, which forces us to pay special attention to the upgrade sequence. Even 
 so, NM-RM communication is still hard to deal with because each side acts as 
 both client and server: in the regular heartbeat the NM is the client and the RM 
 is the server, but when the RM launches the AM it goes through containerMgrProxy, 
 so the RM is the client and the NM is the server. We should get rid of this 
 situation, e.g. by removing containerMgrProxy from the RM and using another way 
 to launch the container.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2227) Move containerMgrProxy from RM's AMLaunch to get rid of issues that new client talking with old server

2014-06-27 Thread Junping Du (JIRA)
Junping Du created YARN-2227:


 Summary: Move containerMgrProxy from RM's AMLaunch to get rid of 
issues that new client talking with old server
 Key: YARN-2227
 URL: https://issues.apache.org/jira/browse/YARN-2227
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Reporter: Junping Du
Assignee: Junping Du


In rolling-upgrade semantics, we should handle the case where an old client talks 
to new servers, as long as only compatible changes happen in the RPC protocol. 
Under these semantics there is no guarantee that a new client can talk to an old 
server, which forces us to pay special attention to the upgrade sequence. Even so, 
NM-RM communication is still hard to deal with because each side acts as both 
client and server: in the regular heartbeat the NM is the client and the RM is the 
server, but when the RM launches the AM it goes through containerMgrProxy, so the 
RM is the client and the NM is the server. We should get rid of this situation, 
e.g. by removing containerMgrProxy from the RM and using another way to launch the 
container.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2225) Turn the virtual memory check to be off by default

2014-06-27 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14046566#comment-14046566
 ] 

Vinod Kumar Vavilapalli commented on YARN-2225:
---

It breaks compatibility w.r.t. behavior: existing users who care about this check 
would have to turn it on explicitly.

bq. In spirit, virtual memory check has been a pain and we end up recommending 
users to turn it off.
I have had a different experience. It is indeed a pain for testing, both in 
Hadoop and in higher-level frameworks, but it has been invaluable in real-life 
clusters for stopping runaway jobs - specifically the non-Java ones - from 
affecting the cluster.

 Turn the virtual memory check to be off by default
 --

 Key: YARN-2225
 URL: https://issues.apache.org/jira/browse/YARN-2225
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: YARN-2225.patch


 The virtual memory check may not be the best way to isolate applications. 
 Virtual memory is not the constrained resource. It would be better if we 
 limit the swapping of the task using swappiness instead. This patch will turn 
 this DEFAULT_NM_VMEM_CHECK_ENABLED off by default and let users turn it on if 
 they need to.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-614) Separate AM failures from hardware failure or YARN error and do not count them to AM retry count

2014-06-27 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14046572#comment-14046572
 ] 

Jian He commented on YARN-614:
--

+1

 Separate AM failures from hardware failure or YARN error and do not count 
 them to AM retry count
 

 Key: YARN-614
 URL: https://issues.apache.org/jira/browse/YARN-614
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Bikas Saha
Assignee: Xuan Gong
 Fix For: 2.5.0

 Attachments: YARN-614-0.patch, YARN-614-1.patch, YARN-614-2.patch, 
 YARN-614-3.patch, YARN-614-4.patch, YARN-614-5.patch, YARN-614-6.patch, 
 YARN-614.10.patch, YARN-614.11.patch, YARN-614.12.patch, YARN-614.7.patch, 
 YARN-614.8.patch, YARN-614.9.patch


 Attempts can fail due to a large number of user errors and they should not be 
 retried unnecessarily. The only reason YARN should retry an attempt is when 
 the hardware fails or YARN has an error. NM failing, lost NM and NM disk 
 errors are the hardware errors that come to mind.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-614) Separate AM failures from hardware failure or YARN error and do not count them to AM retry count

2014-06-27 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-614:
-

Attachment: YARN-614.13.patch

Renamed a unit test.

 Separate AM failures from hardware failure or YARN error and do not count 
 them to AM retry count
 

 Key: YARN-614
 URL: https://issues.apache.org/jira/browse/YARN-614
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Bikas Saha
Assignee: Xuan Gong
 Fix For: 2.5.0

 Attachments: YARN-614-0.patch, YARN-614-1.patch, YARN-614-2.patch, 
 YARN-614-3.patch, YARN-614-4.patch, YARN-614-5.patch, YARN-614-6.patch, 
 YARN-614.10.patch, YARN-614.11.patch, YARN-614.12.patch, YARN-614.13.patch, 
 YARN-614.7.patch, YARN-614.8.patch, YARN-614.9.patch


 Attempts can fail due to a large number of user errors and they should not be 
 retried unnecessarily. The only reason YARN should retry an attempt is when 
 the hardware fails or YARN has an error. NM failing, lost NM and NM disk 
 errors are the hardware errors that come to mind.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1408) Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task timeout for 30mins

2014-06-27 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14046590#comment-14046590
 ] 

Wangda Tan commented on YARN-1408:
--

Hi [~sunilg],
Thanks for working out this patch so fast!

*A major problem I've seen:*
The ResourceRequest stored in RMContainerImpl should include the rack/any RRs.
Currently, only one ResourceRequest is stored in RMContainerImpl, which may 
not be enough for recovery in the following cases:
Case 1: An RR may contain other fields like relaxLocality, etc. Assume an RR is 
node-local with relaxLocality=true (the default), and its rack-local/any RRs have 
relaxLocality=false. In your current implementation, you cannot fully recover the 
original RRs.
Case 2: The rack-local RR will be missing. Assume an RR is node-local: when 
resource allocation is done, the outstanding rack-local/any numContainers are 
decreased. You can check AppSchedulingInfo#allocateNodeLocal for the logic of 
how the outstanding rack/any #containers are decreased.

*My thoughts on how to implement this:*
In FiCaScheduler#allocate, appSchedulingInfo.allocate will be invoked. You can 
edit appSchedulingInfo.allocate to return a list of RRs, including node/rack/any 
if possible.
Pass such RRs to RMContainerImpl.
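
To make this concrete, a rough, self-contained toy sketch of the idea; the 
class and field names below are stand-ins for ResourceRequest/RMContainerImpl, 
not the actual trunk code:

{code:java}
// Toy sketch (hypothetical names, not the trunk API): when a node-local ask
// is satisfied, keep the node, rack and ANY requests that were decremented,
// so a preempted container can be fully re-requested, including fields such
// as relaxLocality.
import java.util.ArrayList;
import java.util.List;

public class ContainerRequestRecoverySketch {

  /** Stand-in for ResourceRequest: resource name plus relaxLocality flag. */
  static class Ask {
    final String resourceName;
    final boolean relaxLocality;
    Ask(String resourceName, boolean relaxLocality) {
      this.resourceName = resourceName;
      this.relaxLocality = relaxLocality;
    }
  }

  /** Stand-in for RMContainerImpl: stores the whole list, not a single ask. */
  static class RecordedContainer {
    final String containerId;
    final List<Ask> asksAtAllocation;
    RecordedContainer(String containerId, List<Ask> asksAtAllocation) {
      this.containerId = containerId;
      this.asksAtAllocation = new ArrayList<Ask>(asksAtAllocation);
    }
  }

  public static void main(String[] args) {
    // What appSchedulingInfo.allocate would hand back under the proposal:
    List<Ask> decremented = new ArrayList<Ask>();
    decremented.add(new Ask("host1", true));   // node-local, default relaxLocality
    decremented.add(new Ask("/rack1", false)); // rack-local, relaxLocality=false
    decremented.add(new Ask("*", false));      // ANY, relaxLocality=false

    RecordedContainer c =
        new RecordedContainer("container_01_000002", decremented);

    // On preemption every original ask, not just one, is recoverable.
    for (Ask a : c.asksAtAllocation) {
      System.out.println("re-request " + a.resourceName
          + " relaxLocality=" + a.relaxLocality);
    }
  }
}
{code}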

And could you please elaborate on this?
bq. AM would have asked for NodeLocal in another Hosts, which may not be able 
to recover.

Does it make sense to you? I'll review minor issues and test cases in the next 
cycle.

Thanks,
Wangda

 Preemption caused Invalid State Event: ACQUIRED at KILLED and caused a task 
 timeout for 30mins
 --

 Key: YARN-1408
 URL: https://issues.apache.org/jira/browse/YARN-1408
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.2.0
Reporter: Sunil G
Assignee: Sunil G
 Attachments: Yarn-1408.1.patch, Yarn-1408.2.patch, Yarn-1408.3.patch, 
 Yarn-1408.4.patch, Yarn-1408.5.patch, Yarn-1408.patch


 Capacity preemption is enabled as follows.
  *  yarn.resourcemanager.scheduler.monitor.enable= true ,
  *  
 yarn.resourcemanager.scheduler.monitor.policies=org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy
 Queue = a,b
 Capacity of Queue A = 80%
 Capacity of Queue B = 20%
 Step 1: Assign a big jobA to queue a which uses the full cluster capacity.
 Step 2: Submit a jobB to queue b which would use less than 20% of the cluster 
 capacity.
 A jobA task which uses queue b capacity is then preempted and killed.
 This caused the following problem:
 1. A new container got allocated for jobA in Queue A as per a node update 
 from an NM.
 2. This container was immediately preempted.
 An ACQUIRED at KILLED invalid state exception occurred when the next AM 
 heartbeat reached the RM:
 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
 Can't handle this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 ACQUIRED at KILLED
 This also caused the task to time out for 30 minutes, as this container 
 was already killed by preemption.
 attempt_1380289782418_0003_m_00_0 Timed out after 1800 secs
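
For reference, a minimal sketch of the same preemption setup expressed 
programmatically; the property names are copied from the report above, and 
YarnConfiguration on the classpath is assumed:

{code:java}
// Sketch of the preemption configuration described above, set in code rather
// than in yarn-site.xml; values mirror the reported setup.
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class PreemptionConfigSketch {
  public static void main(String[] args) {
    YarnConfiguration conf = new YarnConfiguration();
    conf.setBoolean("yarn.resourcemanager.scheduler.monitor.enable", true);
    conf.set("yarn.resourcemanager.scheduler.monitor.policies",
        "org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity."
            + "ProportionalCapacityPreemptionPolicy");
    System.out.println(
        conf.get("yarn.resourcemanager.scheduler.monitor.policies"));
  }
}
{code}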



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2052) ContainerId creation after work preserving restart is broken

2014-06-27 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-2052:
-

Attachment: YARN-2052.11.patch

Updated the patch to address the comments:
* Bumped up the version of FileSystemRMStateStore.
* Refactored {{getAndIncrement}} of FileSystemRMStateStore/ZKRMStateStore to 
remove the duplicated check of the epoch znode/file.
* Renamed RMEpoch.java to Epoch.java and RMEpochPBImpl.java to 
EpochPBImpl.java. For consistency, updated the file/znode name of 
EPOCH_NODE from RMEpochNode to EpochNode.
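
For illustration, a rough, self-contained sketch of the get-and-increment-epoch 
idea the patch refactors; the file layout and helper name below are assumptions, 
not the actual FileSystemRMStateStore/ZKRMStateStore code:

{code:java}
// Toy sketch: read the stored epoch (creating it at 0 if absent), persist
// epoch + 1, and return the value read. The real state stores do the same
// against an epoch file/znode, with store-specific error handling.
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class EpochSketch {

  static long getAndIncrementEpoch(Path epochFile) throws IOException {
    long epoch = 0L;
    if (Files.exists(epochFile)) {
      epoch = Long.parseLong(new String(
          Files.readAllBytes(epochFile), StandardCharsets.UTF_8).trim());
    }
    Files.write(epochFile,
        Long.toString(epoch + 1).getBytes(StandardCharsets.UTF_8));
    return epoch;
  }

  public static void main(String[] args) throws IOException {
    Path epochFile = Paths.get("EpochNode");  // name mirrors the renamed node
    System.out.println("epoch = " + getAndIncrementEpoch(epochFile));
    System.out.println("epoch = " + getAndIncrementEpoch(epochFile)); // one larger
  }
}
{code}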

 ContainerId creation after work preserving restart is broken
 

 Key: YARN-2052
 URL: https://issues.apache.org/jira/browse/YARN-2052
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-2052.1.patch, YARN-2052.10.patch, 
 YARN-2052.11.patch, YARN-2052.2.patch, YARN-2052.3.patch, YARN-2052.4.patch, 
 YARN-2052.5.patch, YARN-2052.6.patch, YARN-2052.7.patch, YARN-2052.8.patch, 
 YARN-2052.9.patch, YARN-2052.9.patch


 Container ids are made unique by using the app identifier and appending a 
 monotonically increasing sequence number to it. Since container creation is a 
 high churn activity the RM does not store the sequence number per app. So 
 after restart it does not know what the new sequence number should be for new 
 allocations.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-614) Separate AM failures from hardware failure or YARN error and do not count them to AM retry count

2014-06-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14046647#comment-14046647
 ] 

Hadoop QA commented on YARN-614:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12652934/YARN-614.13.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4128//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4128//console

This message is automatically generated.

 Separate AM failures from hardware failure or YARN error and do not count 
 them to AM retry count
 

 Key: YARN-614
 URL: https://issues.apache.org/jira/browse/YARN-614
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Bikas Saha
Assignee: Xuan Gong
 Fix For: 2.5.0

 Attachments: YARN-614-0.patch, YARN-614-1.patch, YARN-614-2.patch, 
 YARN-614-3.patch, YARN-614-4.patch, YARN-614-5.patch, YARN-614-6.patch, 
 YARN-614.10.patch, YARN-614.11.patch, YARN-614.12.patch, YARN-614.13.patch, 
 YARN-614.7.patch, YARN-614.8.patch, YARN-614.9.patch


 Attempts can fail due to a large number of user errors and they should not be 
 retried unnecessarily. The only reason YARN should retry an attempt is when 
 the hardware fails or YARN has an error. NM failing, lost NM and NM disk 
 errors are the hardware errors that come to mind.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2052) ContainerId creation after work preserving restart is broken

2014-06-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14046703#comment-14046703
 ] 

Hadoop QA commented on YARN-2052:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12652938/YARN-2052.11.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 7 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4129//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4129//console

This message is automatically generated.

 ContainerId creation after work preserving restart is broken
 

 Key: YARN-2052
 URL: https://issues.apache.org/jira/browse/YARN-2052
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-2052.1.patch, YARN-2052.10.patch, 
 YARN-2052.11.patch, YARN-2052.2.patch, YARN-2052.3.patch, YARN-2052.4.patch, 
 YARN-2052.5.patch, YARN-2052.6.patch, YARN-2052.7.patch, YARN-2052.8.patch, 
 YARN-2052.9.patch, YARN-2052.9.patch


 Container ids are made unique by using the app identifier and appending a 
 monotonically increasing sequence number to it. Since container creation is a 
 high churn activity the RM does not store the sequence number per app. So 
 after restart it does not know what the new sequence number should be for new 
 allocations.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2052) ContainerId creation after work preserving restart is broken

2014-06-27 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14046707#comment-14046707
 ] 

Tsuyoshi OZAWA commented on YARN-2052:
--

The test failure is not related.

 ContainerId creation after work preserving restart is broken
 

 Key: YARN-2052
 URL: https://issues.apache.org/jira/browse/YARN-2052
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-2052.1.patch, YARN-2052.10.patch, 
 YARN-2052.11.patch, YARN-2052.2.patch, YARN-2052.3.patch, YARN-2052.4.patch, 
 YARN-2052.5.patch, YARN-2052.6.patch, YARN-2052.7.patch, YARN-2052.8.patch, 
 YARN-2052.9.patch, YARN-2052.9.patch


 Container ids are made unique by using the app identifier and appending a 
 monotonically increasing sequence number to it. Since container creation is a 
 high churn activity the RM does not store the sequence number per app. So 
 after restart it does not know what the new sequence number should be for new 
 allocations.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2228) TimelineServer should load pseudo authentication filter when authentication = simple

2014-06-27 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-2228:
-

 Summary: TimelineServer should load pseudo authentication filter 
when authentication = simple
 Key: YARN-2228
 URL: https://issues.apache.org/jira/browse/YARN-2228
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen


When Kerberos authentication is not enabled, we should let the timeline server 
work with the pseudo authentication filter. In this way, the server is able to 
detect the request user by checking user.name.

On the other hand, the timeline client should append user.name in the non-secure 
case as well, such that ACLs can keep working in this case.
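
A rough sketch of the client-side part, assuming the convention that the pseudo 
authentication filter reads the user.name query parameter; the URL, port and 
user below are purely illustrative:

{code:java}
// Sketch: in the non-secure case, append user.name to a timeline REST URL so
// the pseudo authentication filter can identify the caller.
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class PseudoAuthUrlSketch {

  static String withUserName(String baseUrl, String user)
      throws UnsupportedEncodingException {
    String sep = baseUrl.contains("?") ? "&" : "?";
    return baseUrl + sep + "user.name=" + URLEncoder.encode(user, "UTF-8");
  }

  public static void main(String[] args) throws Exception {
    String url = withUserName(
        "http://timelineserver:8188/ws/v1/timeline", "alice");
    System.out.println(url);  // ...?user.name=alice
  }
}
{code}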



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2201) TestRMWebServicesAppsModification dependent on yarn-default.xml

2014-06-27 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14046721#comment-14046721
 ] 

Zhijie Shen commented on YARN-2201:
---

Committed to trunk and branch-2. Thanks Varun for the patch, and Ray for the review!

 TestRMWebServicesAppsModification dependent on yarn-default.xml
 ---

 Key: YARN-2201
 URL: https://issues.apache.org/jira/browse/YARN-2201
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Ray Chiang
Assignee: Varun Vasudev
  Labels: test
 Attachments: apache-yarn-2201.0.patch, apache-yarn-2201.1.patch, 
 apache-yarn-2201.2.patch, apache-yarn-2201.3.patch


 TestRMWebServicesAppsModification.java has some errors that are 
 yarn-default.xml dependent.  By changing yarn-default.xml properties, I'm 
 seeing the following errors:
 1) Changing yarn.resourcemanager.scheduler.class from 
 capacity.CapacityScheduler to fair.FairScheduler gives the error:
 Running 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
 Tests run: 10, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 79.047 sec 
  FAILURE! - in 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
 testSingleAppKillUnauthorized[1](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification)
   Time elapsed: 3.22 sec   FAILURE!
 java.lang.AssertionError: expected:Forbidden but was:Accepted
 at org.junit.Assert.fail(Assert.java:88)
 at org.junit.Assert.failNotEquals(Assert.java:743)
 at org.junit.Assert.assertEquals(Assert.java:118)
 at org.junit.Assert.assertEquals(Assert.java:144)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKillUnauthorized(TestRMWebServicesAppsModification.java:458)
 2) Changing yarn.acl.enable from false to true results in the following 
 errors:
 Running 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
 Tests run: 10, Failures: 4, Errors: 0, Skipped: 0, Time elapsed: 49.044 sec 
  FAILURE! - in 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
 testSingleAppKill[0](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification)
   Time elapsed: 2.986 sec   FAILURE!
 java.lang.AssertionError: expected:Accepted but was:Unauthorized
 at org.junit.Assert.fail(Assert.java:88)
 at org.junit.Assert.failNotEquals(Assert.java:743)
 at org.junit.Assert.assertEquals(Assert.java:118)
 at org.junit.Assert.assertEquals(Assert.java:144)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKill(TestRMWebServicesAppsModification.java:287)
 testSingleAppKillInvalidState[0](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification)
   Time elapsed: 2.258 sec   FAILURE!
 java.lang.AssertionError: expected:Bad Request but was:Unauthorized
 at org.junit.Assert.fail(Assert.java:88)
 at org.junit.Assert.failNotEquals(Assert.java:743)
 at org.junit.Assert.assertEquals(Assert.java:118)
 at org.junit.Assert.assertEquals(Assert.java:144)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKillInvalidState(TestRMWebServicesAppsModification.java:369)
 testSingleAppKillUnauthorized[0](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification)
   Time elapsed: 2.263 sec   FAILURE!
 java.lang.AssertionError: expected:Forbidden but was:Unauthorized
 at org.junit.Assert.fail(Assert.java:88)
 at org.junit.Assert.failNotEquals(Assert.java:743)
 at org.junit.Assert.assertEquals(Assert.java:118)
 at org.junit.Assert.assertEquals(Assert.java:144)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKillUnauthorized(TestRMWebServicesAppsModification.java:458)
 testSingleAppKillInvalidId[0](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification)
   Time elapsed: 0.214 sec   FAILURE!
 java.lang.AssertionError: expected:Not Found but was:Unauthorized
 at org.junit.Assert.fail(Assert.java:88)
 at org.junit.Assert.failNotEquals(Assert.java:743)
 at org.junit.Assert.assertEquals(Assert.java:118)
 at org.junit.Assert.assertEquals(Assert.java:144)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testSingleAppKillInvalidId(TestRMWebServicesAppsModification.java:482)
 I'm opening this JIRA as a discussion for the best way to fix this.  I've got 
 a few ideas, but I 
