[jira] [Updated] (YARN-1954) Add waitFor to AMRMClient(Async)
[ https://issues.apache.org/jira/browse/YARN-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsuyoshi OZAWA updated YARN-1954:
    Attachment: YARN-1954.1.patch

Added a waitFor() API to AMRMClientAsync.

Add waitFor to AMRMClient(Async)
                Key: YARN-1954
                URL: https://issues.apache.org/jira/browse/YARN-1954
            Project: Hadoop YARN
         Issue Type: New Feature
         Components: client
   Affects Versions: 3.0.0, 2.4.0
           Reporter: Zhijie Shen
           Assignee: Tsuyoshi OZAWA
        Attachments: YARN-1954.1.patch

Recently, I saw some use cases of AMRMClient(Async). The painful thing is that the main non-daemon thread has to sit in a dummy loop to prevent the AM process from exiting before all the tasks are done, while unregistration is triggered on a separate daemon thread by callback methods (in particular when using AMRMClientAsync). IMHO, it would be beneficial to add a waitFor method to AMRMClient(Async) that blocks the AM until unregistration or a user-supplied checkpoint, so that users don't need to write the loop themselves.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Updated] (YARN-1954) Add waitFor to AMRMClient(Async)
[ https://issues.apache.org/jira/browse/YARN-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsuyoshi OZAWA updated YARN-1954:
    Attachment: YARN-1954.2.patch

Deleted a needless test from v1.

        Attachments: YARN-1954.1.patch, YARN-1954.2.patch
[jira] [Commented] (YARN-1954) Add waitFor to AMRMClient(Async)
[ https://issues.apache.org/jira/browse/YARN-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973839#comment-13973839 ]

Tsuyoshi OZAWA commented on YARN-1954:

[~zjshen], I added a waitFor() method which takes a Supplier<Boolean>, based on Zhijie's idea. I'd appreciate it if you could take a look.
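To illustrate the idea being discussed (not the actual patch): a waitFor that polls a user-supplied check frees the AM's main thread from hand-writing the dummy loop, while a callback thread flips a flag on unregistration. The `Check` interface below is a stand-in for the Supplier<Boolean> mentioned above; all names are hypothetical.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class WaitForSketch {

  // Stand-in for the Supplier<Boolean>-style check in the proposal.
  interface Check {
    boolean isDone();
  }

  // Block the calling (main) thread until the check passes.
  static void waitFor(Check check, long pollMillis) throws InterruptedException {
    while (!check.isDone()) {
      Thread.sleep(pollMillis);
    }
  }

  public static void main(String[] args) throws Exception {
    final AtomicBoolean unregistered = new AtomicBoolean(false);
    // Simulates the AMRMClientAsync callback thread flipping the flag
    // when unregistration completes.
    new Thread(new Runnable() {
      public void run() {
        try { Thread.sleep(50); } catch (InterruptedException ignored) { }
        unregistered.set(true);
      }
    }).start();

    // The main thread blocks here instead of in a hand-written loop.
    waitFor(new Check() {
      public boolean isDone() { return unregistered.get(); }
    }, 10);
    System.out.println("AM may now exit: " + unregistered.get());
  }
}
```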
[jira] [Commented] (YARN-1954) Add waitFor to AMRMClient(Async)
[ https://issues.apache.org/jira/browse/YARN-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973850#comment-13973850 ]

Hadoop QA commented on YARN-1954:

+1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12640781/YARN-1954.1.patch
against trunk revision.

    +1 @author. The patch does not contain any @author tags.
    +1 tests included. The patch appears to include 1 new or modified test files.
    +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    +1 javadoc. There were no new javadoc warning messages.
    +1 eclipse:eclipse. The patch built with eclipse:eclipse.
    +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
    +1 release audit. The applied patch does not increase the total number of release audit warnings.
    +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client.
    +1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3594//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3594//console

This message is automatically generated.
[jira] [Created] (YARN-1959) Fix headroom calculation in Fair Scheduler
Sandy Ryza created YARN-1959:

Summary: Fix headroom calculation in Fair Scheduler
    Key: YARN-1959
    URL: https://issues.apache.org/jira/browse/YARN-1959
Project: Hadoop YARN
Issue Type: Bug
Reporter: Sandy Ryza
[jira] [Commented] (YARN-1954) Add waitFor to AMRMClient(Async)
[ https://issues.apache.org/jira/browse/YARN-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973857#comment-13973857 ]

Hadoop QA commented on YARN-1954:

+1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12640783/YARN-1954.2.patch
against trunk revision.

    +1 @author. The patch does not contain any @author tags.
    +1 tests included. The patch appears to include 1 new or modified test files.
    +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    +1 javadoc. There were no new javadoc warning messages.
    +1 eclipse:eclipse. The patch built with eclipse:eclipse.
    +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
    +1 release audit. The applied patch does not increase the total number of release audit warnings.
    +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client.
    +1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3595//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3595//console

This message is automatically generated.
[jira] [Updated] (YARN-1959) Fix headroom calculation in Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sandy Ryza updated YARN-1959:
    Description: The Fair Scheduler currently always sets the headroom to 0.

Fix headroom calculation in Fair Scheduler
    Key: YARN-1959
    URL: https://issues.apache.org/jira/browse/YARN-1959
Project: Hadoop YARN
Issue Type: Bug
Reporter: Sandy Ryza

The Fair Scheduler currently always sets the headroom to 0.
[jira] [Commented] (YARN-1959) Fix headroom calculation in Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973869#comment-13973869 ]

Sandy Ryza commented on YARN-1959:

The headroom for an app should be set to min(app's queue's max share, cluster capacity) - the app's queue's resources consumed.
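The rule in the comment above can be sketched numerically. The real Fair Scheduler code would operate on Resource objects (memory and vcores); plain ints stand in here, and the method name and negative-clamp are illustrative assumptions, not the eventual patch.

```java
public class HeadroomSketch {

  // headroom = min(queue max share, cluster capacity) - queue's consumed
  // resources, clamped so we never report negative headroom.
  static int headroom(int queueMaxShare, int clusterCapacity, int queueConsumed) {
    int limit = Math.min(queueMaxShare, clusterCapacity);
    return Math.max(0, limit - queueConsumed);
  }

  public static void main(String[] args) {
    // Queue capped at 40 GB, cluster has 100 GB, queue already uses 25 GB.
    System.out.println(headroom(40, 100, 25)); // prints 15
  }
}
```

Today's behavior corresponds to always returning 0, which starves AMs that gate work on available headroom.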
[jira] [Commented] (YARN-1778) TestFSRMStateStore fails on trunk
[ https://issues.apache.org/jira/browse/YARN-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973879#comment-13973879 ]

Tsuyoshi OZAWA commented on YARN-1778:

I tried, but currently I cannot reproduce this problem. [~xgong], should we close this issue for now and reopen it when we find a way to reproduce it on trunk?

TestFSRMStateStore fails on trunk
    Key: YARN-1778
    URL: https://issues.apache.org/jira/browse/YARN-1778
Project: Hadoop YARN
Issue Type: Test
Reporter: Xuan Gong
[jira] [Resolved] (YARN-687) TestNMAuditLogger hang
[ https://issues.apache.org/jira/browse/YARN-687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran resolved YARN-687.
    Resolution: Cannot Reproduce

TestNMAuditLogger hang
    Key: YARN-687
    URL: https://issues.apache.org/jira/browse/YARN-687
Project: Hadoop YARN
Issue Type: Test
Components: nodemanager
Affects Versions: 3.0.0
Environment: Linux stevel-dev 3.2.0-24-virtual #39-Ubuntu SMP Mon May 21 18:44:18 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
             java version 1.6.0_27, OpenJDK Runtime Environment (IcedTea6 1.12.3) (6b27-1.12.3-0ubuntu1~12.04.1), OpenJDK 64-Bit Server VM (build 20.0-b12, mixed mode)
Reporter: Steve Loughran
Priority: Minor

TestNMAuditLogger hanging repeatedly on a test VM
[jira] [Commented] (YARN-687) TestNMAuditLogger hang
[ https://issues.apache.org/jira/browse/YARN-687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973898#comment-13973898 ]

Steve Loughran commented on YARN-687:

It's been a long time, and I don't have that VM around. I'll close this as cannot-reproduce unless/until it surfaces again.
[jira] [Commented] (YARN-1946) need Public interface for WebAppUtils.getProxyHostAndPort
[ https://issues.apache.org/jira/browse/YARN-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973901#comment-13973901 ]

Steve Loughran commented on YARN-1946:

Thomas, we proxy the GUI, but not the REST API. The proxy doesn't forward any operation other than GET, and there's no guarantee clients will handle 307 redirects with the same HTTP verb anyway. If/when the proxy supports more operations, we can try with it. For now the filter just says: ws/* - no proxy; everything else - proxied; and ws/ doesn't serve the GUI.

need Public interface for WebAppUtils.getProxyHostAndPort
    Key: YARN-1946
    URL: https://issues.apache.org/jira/browse/YARN-1946
Project: Hadoop YARN
Issue Type: Sub-task
Components: api, webapp
Affects Versions: 2.4.0
Reporter: Thomas Graves
Priority: Critical

ApplicationMasters are supposed to go through the ResourceManager web app proxy if they have web UIs so they are properly secured. There is currently no public interface for ApplicationMasters to conveniently get the proxy host and port. There is a function in WebAppUtils, but that class is private. We should provide this as a utility since any properly written AM will need to do this.
[jira] [Commented] (YARN-1281) TestZKRMStateStoreZKClientConnections fails intermittently
[ https://issues.apache.org/jira/browse/YARN-1281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973973#comment-13973973 ]

Hudson commented on YARN-1281:

FAILURE: Integrated in Hadoop-Yarn-trunk #544 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/544/])
YARN-1281. Fixed TestZKRMStateStoreZKClientConnections to not fail intermittently due to ZK-client timeouts. Contributed by Tsuyoshi Ozawa. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1588369)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStoreZKClientConnections.java

TestZKRMStateStoreZKClientConnections fails intermittently
    Key: YARN-1281
    URL: https://issues.apache.org/jira/browse/YARN-1281
Project: Hadoop YARN
Issue Type: Test
Components: resourcemanager
Reporter: Karthik Kambatla
Assignee: Tsuyoshi OZAWA
Fix For: 2.4.1
Attachments: YARN-1281.1.patch, YARN-1281.2.patch, output.txt

The test fails intermittently - haven't been able to reproduce the failure deterministically.
[jira] [Commented] (YARN-1931) Private API change in YARN-1824 in 2.4 broke compatibility with previous releases
[ https://issues.apache.org/jira/browse/YARN-1931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973979#comment-13973979 ]

Hudson commented on YARN-1931:

FAILURE: Integrated in Hadoop-Yarn-trunk #544 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/544/])
YARN-1931. Private API change in YARN-1824 in 2.4 broke compatibility with previous releases (Sandy Ryza via tgraves) (tgraves: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1588281)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/Apps.java

Private API change in YARN-1824 in 2.4 broke compatibility with previous releases
    Key: YARN-1931
    URL: https://issues.apache.org/jira/browse/YARN-1931
Project: Hadoop YARN
Issue Type: Bug
Components: applications
Affects Versions: 2.4.0
Reporter: Thomas Graves
Assignee: Sandy Ryza
Priority: Blocker
Fix For: 3.0.0, 2.5.0, 2.4.1
Attachments: YARN-1931-1.patch, YARN-1931-2.patch, YARN-1931.patch

YARN-1824 broke compatibility with previous 2.x releases by changing the APIs in org.apache.hadoop.yarn.util.Apps.{setEnvFromInputString,addToEnvironment}. The old API should be added back. This affects any ApplicationMasters that were using this API. It also breaks previously built MapReduce libraries from working with the new YARN release, as MR uses this API.
[jira] [Commented] (YARN-1824) Make Windows client work with Linux/Unix cluster
[ https://issues.apache.org/jira/browse/YARN-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973974#comment-13973974 ]

Hudson commented on YARN-1824:

FAILURE: Integrated in Hadoop-Yarn-trunk #544 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/544/])
YARN-1931. Private API change in YARN-1824 in 2.4 broke compatibility with previous releases (Sandy Ryza via tgraves) (tgraves: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1588281)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/Apps.java

Make Windows client work with Linux/Unix cluster
    Key: YARN-1824
    URL: https://issues.apache.org/jira/browse/YARN-1824
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Jian He
Assignee: Jian He
Fix For: 2.4.0
Attachments: YARN-1824.1.patch, YARN-1824.1.patch
[jira] [Commented] (YARN-1750) TestNodeStatusUpdater#testNMRegistration is incorrect in test case
[ https://issues.apache.org/jira/browse/YARN-1750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973978#comment-13973978 ]

Hudson commented on YARN-1750:

FAILURE: Integrated in Hadoop-Yarn-trunk #544 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/544/])
YARN-1750. TestNodeStatusUpdater#testNMRegistration is incorrect in test case. (Wangda Tan via junping_du) (junping_du: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1588343)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java

TestNodeStatusUpdater#testNMRegistration is incorrect in test case
    Key: YARN-1750
    URL: https://issues.apache.org/jira/browse/YARN-1750
Project: Hadoop YARN
Issue Type: Test
Components: nodemanager
Reporter: Ming Ma
Assignee: Wangda Tan
Fix For: 2.4.1
Attachments: YARN-1750.patch

This test case passes. However, the test output log has:

java.lang.AssertionError: Number of applications should only be one! expected:<1> but was:<2>
    at org.junit.Assert.fail(Assert.java:93)
    at org.junit.Assert.failNotEquals(Assert.java:647)
    at org.junit.Assert.assertEquals(Assert.java:128)
    at org.junit.Assert.assertEquals(Assert.java:472)
    at org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater$MyResourceTracker.nodeHeartbeat(TestNodeStatusUpdater.java:267)
    at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl$1.run(NodeStatusUpdaterImpl.java:469)
    at java.lang.Thread.run(Thread.java:695)

TestNodeStatusUpdater.java has invalid asserts:

{code}
} else if (heartBeatID == 3) {
  // Checks on the RM end
  Assert.assertEquals("Number of applications should only be one!", 1,
      appToContainers.size());
  Assert.assertEquals("Number of container for the app should be two!", 2,
      appToContainers.get(appId2).size());
{code}

We should fix the asserts and add more checks to the test.
[jira] [Commented] (YARN-1870) FileInputStream is not closed in ProcfsBasedProcessTree#constructProcessSMAPInfo()
[ https://issues.apache.org/jira/browse/YARN-1870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973977#comment-13973977 ]

Hudson commented on YARN-1870:

FAILURE: Integrated in Hadoop-Yarn-trunk #544 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/544/])
YARN-1870. FileInputStream is not closed in ProcfsBasedProcessTree#constructProcessSMAPInfo. (Fengdong Yu via junping_du) (junping_du: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1588324)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ProcfsBasedProcessTree.java

FileInputStream is not closed in ProcfsBasedProcessTree#constructProcessSMAPInfo()
    Key: YARN-1870
    URL: https://issues.apache.org/jira/browse/YARN-1870
Project: Hadoop YARN
Issue Type: Improvement
Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Ted Yu
Assignee: Fengdong Yu
Priority: Minor
Fix For: 2.5.0
Attachments: YARN-1870.patch

{code}
List<String> lines = IOUtils.readLines(new FileInputStream(file));
{code}

The FileInputStream is not closed.
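The general fix pattern for the leak above is to hold the stream in a local and close it in a finally block, so the descriptor is released even if reading throws. This is a JDK-only sketch of that pattern (the actual patch touches ProcfsBasedProcessTree and may differ); class and method names here are illustrative.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class ReadLinesClosing {

  // Reads all lines from a file, guaranteeing the FileInputStream is
  // closed on both the success and the failure path.
  static List<String> readLines(File file) throws IOException {
    FileInputStream in = new FileInputStream(file);
    try {
      BufferedReader reader =
          new BufferedReader(new InputStreamReader(in, "UTF-8"));
      List<String> lines = new ArrayList<String>();
      String line;
      while ((line = reader.readLine()) != null) {
        lines.add(line);
      }
      return lines;
    } finally {
      in.close(); // runs even if readLine() throws
    }
  }
}
```

On Java 7+ the same guarantee comes from try-with-resources, but Hadoop 2.x still targeted Java 6 at the time, hence the explicit finally.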
[jira] [Commented] (YARN-1947) TestRMDelegationTokens#testRMDTMasterKeyStateOnRollingMasterKey is failing intermittently
[ https://issues.apache.org/jira/browse/YARN-1947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973975#comment-13973975 ]

Hudson commented on YARN-1947:

FAILURE: Integrated in Hadoop-Yarn-trunk #544 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/544/])
YARN-1947. TestRMDelegationTokens#testRMDTMasterKeyStateOnRollingMasterKey is failing intermittently. (Jian He via junping_du) (junping_du: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1588365)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/MemoryRMStateStore.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestRMDelegationTokens.java

TestRMDelegationTokens#testRMDTMasterKeyStateOnRollingMasterKey is failing intermittently
    Key: YARN-1947
    URL: https://issues.apache.org/jira/browse/YARN-1947
Project: Hadoop YARN
Issue Type: Test
Reporter: Jian He
Assignee: Jian He
Fix For: 2.4.1
Attachments: YARN-1947.1.patch, YARN-1947.2.patch

java.lang.AssertionError: null
    at org.junit.Assert.fail(Assert.java:92)
    at org.junit.Assert.assertTrue(Assert.java:43)
    at org.junit.Assert.assertTrue(Assert.java:54)
    at org.apache.hadoop.yarn.server.resourcemanager.security.TestRMDelegationTokens.testRMDTMasterKeyStateOnRollingMasterKey(TestRMDelegationTokens.java:117)
[jira] [Created] (YARN-1960) LocalFSFileInputStream should support mark()
Daniel Darabos created YARN-1960:

Summary: LocalFSFileInputStream should support mark()
    Key: YARN-1960
    URL: https://issues.apache.org/jira/browse/YARN-1960
Project: Hadoop YARN
Issue Type: New Feature
Components: api
Reporter: Daniel Darabos
Priority: Minor

This is easily done by wrapping the FileInputStream in a BufferedInputStream. I wish for this feature because Apache Commons Compress's CompressorStreamFactory relies on it. There is benefit to being able to open local compressed files during testing. I'll send a patch for this if it's okay.
[jira] [Resolved] (YARN-1960) LocalFSFileInputStream should support mark()
[ https://issues.apache.org/jira/browse/YARN-1960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Darabos resolved YARN-1960.
    Resolution: Not a Problem

Duh, I should just use BufferedFSInputStream. Sorry.
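For context on why the wrapping approach works: BufferedInputStream supplies mark()/reset() on top of any stream that lacks it, which is exactly what CompressorStreamFactory needs to peek at magic bytes and then rewind. A minimal JDK-only sketch, with a ByteArrayInputStream standing in for the local file stream (the 0x1f 0x8b bytes are just a gzip-magic example):

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class MarkResetSketch {
  public static void main(String[] args) throws IOException {
    InputStream raw = new ByteArrayInputStream(new byte[] {0x1f, (byte) 0x8b, 1, 2});
    InputStream in = new BufferedInputStream(raw);

    in.mark(2);         // remember this position; allow a 2-byte peek
    int b0 = in.read(); // peek at the magic bytes,
    int b1 = in.read(); // e.g. to detect the compression format
    in.reset();         // rewind so a decompressor sees the full stream

    System.out.println(in.markSupported() + " " + (in.read() == b0));
  }
}
```

Hadoop's own BufferedFSInputStream (as noted in the resolution) plays the same role for FSInputStream implementations.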
[jira] [Reopened] (YARN-1943) Multitenant LinuxContainerExecutor is incompatible with Simple Security mode.
[ https://issues.apache.org/jira/browse/YARN-1943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

jay vyas reopened YARN-1943:

Reopening, based on the following use case:
1) Alice and Tom trust each other.
2) They run their jobs on the same cluster.
3) Neither would ever knowingly do anything to harm the other (i.e. impersonate a user and then write code in an M/R job to scrape ssh keys from the local fs).
4) But Tom is a novice developer and MIGHT do something funny, like accidentally overwrite files in /user/alice/ in some of his jobs, so SOME process isolation would be nice to have.
5) And also: Alice and Tom are using a posix-style HCFS where uid is important in order to do operations like chown.

So in the above scenario there really is not much need for kerberization: it's a simple and lightweight cluster with trusted users, but there is a lot of value in having some basic process isolation nevertheless, i.e. from linux containers.

SUGGESTION: Rather than add an extra parameter, we can just allow a wildcard in the nonsecure local-user parameter value:
{noformat}
<name>yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user</name>
<value>*</value>
{noformat}
so that whoever submits a job is the user the LCE will run under. Essentially, this provides administrators the option of disabling/enabling the feature added in YARN-1253.

Multitenant LinuxContainerExecutor is incompatible with Simple Security mode.
    Key: YARN-1943
    URL: https://issues.apache.org/jira/browse/YARN-1943
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Affects Versions: 2.3.0
Reporter: jay vyas
Priority: Critical
Labels: linux
Fix For: 2.3.0

As of hadoop 2.3.0, commit cc74a18c makes it so that nonsecureLocalUser replaces the user who submits a job if security is disabled:
{noformat}
return UserGroupInformation.isSecurityEnabled() ? user : nonsecureLocalUser;
{noformat}
However, the only way to enable security is to NOT use SIMPLE authentication mode:
{noformat}
public static boolean isSecurityEnabled() {
  return !isAuthenticationMethodEnabled(AuthenticationMethod.SIMPLE);
}
{noformat}
Thus, under SIMPLE login security, the framework ENFORCES the nonsecure local user for submission to LinuxContainerExecutor. This results in a confusing issue, wherein we submit a job as sally and then get an exception that user nobody is not whitelisted and has UID MAX_ID.

My proposed solution is that we should be able to leverage LinuxContainerExecutor regardless of hadoop's view of the security settings on the cluster, i.e. decouple the LinuxContainerExecutor logic from the isSecurityEnabled return value.
[jira] [Commented] (YARN-1824) Make Windows client work with Linux/Unix cluster
[ https://issues.apache.org/jira/browse/YARN-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13974071#comment-13974071 ]

Hudson commented on YARN-1824:

FAILURE: Integrated in Hadoop-Hdfs-trunk #1736 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1736/])
YARN-1931. Private API change in YARN-1824 in 2.4 broke compatibility with previous releases (Sandy Ryza via tgraves) (tgraves: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1588281)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/Apps.java
[jira] [Commented] (YARN-1947) TestRMDelegationTokens#testRMDTMasterKeyStateOnRollingMasterKey is failing intermittently
[ https://issues.apache.org/jira/browse/YARN-1947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13974072#comment-13974072 ]

Hudson commented on YARN-1947:

FAILURE: Integrated in Hadoop-Hdfs-trunk #1736 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1736/])
YARN-1947. TestRMDelegationTokens#testRMDTMasterKeyStateOnRollingMasterKey is failing intermittently. (Jian He via junping_du) (junping_du: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1588365)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/MemoryRMStateStore.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestRMDelegationTokens.java
[jira] [Commented] (YARN-1947) TestRMDelegationTokens#testRMDTMasterKeyStateOnRollingMasterKey is failing intermittently
[ https://issues.apache.org/jira/browse/YARN-1947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13974146#comment-13974146 ]

Hudson commented on YARN-1947:

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1761 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1761/])
YARN-1947. TestRMDelegationTokens#testRMDTMasterKeyStateOnRollingMasterKey is failing intermittently. (Jian He via junping_du) (junping_du: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1588365)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/MemoryRMStateStore.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestRMDelegationTokens.java
[jira] [Commented] (YARN-1281) TestZKRMStateStoreZKClientConnections fails intermittently
[ https://issues.apache.org/jira/browse/YARN-1281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13974144#comment-13974144 ] Hudson commented on YARN-1281: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1761 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1761/]) YARN-1281. Fixed TestZKRMStateStoreZKClientConnections to not fail intermittently due to ZK-client timeouts. Contributed by Tsuyoshi Ozawa. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1588369) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStoreZKClientConnections.java TestZKRMStateStoreZKClientConnections fails intermittently -- Key: YARN-1281 URL: https://issues.apache.org/jira/browse/YARN-1281 Project: Hadoop YARN Issue Type: Test Components: resourcemanager Reporter: Karthik Kambatla Assignee: Tsuyoshi OZAWA Fix For: 2.4.1 Attachments: YARN-1281.1.patch, YARN-1281.2.patch, output.txt The test fails intermittently - haven't been able to reproduce the failure deterministically. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1750) TestNodeStatusUpdater#testNMRegistration is incorrect in test case
[ https://issues.apache.org/jira/browse/YARN-1750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13974149#comment-13974149 ] Hudson commented on YARN-1750: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1761 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1761/]) YARN-1750. TestNodeStatusUpdater#testNMRegistration is incorrect in test case. (Wangda Tan via junping_du) (junping_du: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1588343) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java TestNodeStatusUpdater#testNMRegistration is incorrect in test case -- Key: YARN-1750 URL: https://issues.apache.org/jira/browse/YARN-1750 Project: Hadoop YARN Issue Type: Test Components: nodemanager Reporter: Ming Ma Assignee: Wangda Tan Fix For: 2.4.1 Attachments: YARN-1750.patch This test case passes. However, the test output log has java.lang.AssertionError: Number of applications should only be one! expected:1 but was:2 at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater$MyResourceTracker.nodeHeartbeat(TestNodeStatusUpdater.java:267) at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl$1.run(NodeStatusUpdaterImpl.java:469) at java.lang.Thread.run(Thread.java:695) TestNodeStatusUpdater.java has invalid asserts. 
{code}
} else if (heartBeatID == 3) {
  // Checks on the RM end
  Assert.assertEquals("Number of applications should only be one!", 1,
      appToContainers.size());
  Assert.assertEquals("Number of container for the app should be two!", 2,
      appToContainers.get(appId2).size());
{code}
We should fix the asserts and add more checks to the test. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1917) Add waitForApplicationState interface to YarnClient
[ https://issues.apache.org/jira/browse/YARN-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13974158#comment-13974158 ] Tsuyoshi OZAWA commented on YARN-1917: -- Compilation error is in ResourceMgrDelegate.java. [~leftnoteasy], can you check it? Add waitForApplicationState interface to YarnClient - Key: YARN-1917 URL: https://issues.apache.org/jira/browse/YARN-1917 Project: Hadoop YARN Issue Type: New Feature Components: client Affects Versions: 2.4.0 Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-1917.patch Currently, YARN doesn't have this method. Users need to write implementations like UnmanagedAMLauncher.monitorApplication or mapreduce.Job.monitorAndPrintJob on their own. This feature should be helpful to end users. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1917) Add waitForApplicationState interface to YarnClient
[ https://issues.apache.org/jira/browse/YARN-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13974163#comment-13974163 ] Wangda Tan commented on YARN-1917: -- Thanks [~ozawa] for the reminder :), I'll check it later. Add waitForApplicationState interface to YarnClient - Key: YARN-1917 URL: https://issues.apache.org/jira/browse/YARN-1917 Project: Hadoop YARN Issue Type: New Feature Components: client Affects Versions: 2.4.0 Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-1917.patch Currently, YARN doesn't have this method. Users need to write implementations like UnmanagedAMLauncher.monitorApplication or mapreduce.Job.monitorAndPrintJob on their own. This feature should be helpful to end users. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1798) TestContainerLaunch, TestContainersMonitor, TestNodeManagerShutdown, TestNodeStatusUpdater fails on Linux
[ https://issues.apache.org/jira/browse/YARN-1798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13974194#comment-13974194 ] Tsuyoshi OZAWA commented on YARN-1798: -- I could reproduce the failure of TestContainerLaunch: {quote} Running org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch Tests run: 10, Failures: 3, Errors: 0, Skipped: 0, Time elapsed: 57.103 sec FAILURE! - in org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch testDelayedKill(org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch) Time elapsed: 25.302 sec FAILURE! java.lang.AssertionError: ContainerState is not correct (timedout) expected:COMPLETE but was:RUNNING at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.apache.hadoop.yarn.server.nodemanager.containermanager.BaseContainerManagerTest.waitForContainerState(BaseContainerManagerTest.java:276) at org.apache.hadoop.yarn.server.nodemanager.containermanager.BaseContainerManagerTest.waitForContainerState(BaseContainerManagerTest.java:254) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.internalKillTest(TestContainerLaunch.java:704) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testDelayedKill(TestContainerLaunch.java:748) testImmediateKill(org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch) Time elapsed: 25.087 sec FAILURE! 
java.lang.AssertionError: ContainerState is not correct (timedout) expected:COMPLETE but was:RUNNING at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.apache.hadoop.yarn.server.nodemanager.containermanager.BaseContainerManagerTest.waitForContainerState(BaseContainerManagerTest.java:276) at org.apache.hadoop.yarn.server.nodemanager.containermanager.BaseContainerManagerTest.waitForContainerState(BaseContainerManagerTest.java:254) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.internalKillTest(TestContainerLaunch.java:704) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testImmediateKill(TestContainerLaunch.java:753) testContainerEnvVariables(org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch) Time elapsed: 5.058 sec FAILURE! java.lang.AssertionError: Process is not alive! at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.assertTrue(Assert.java:43) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testContainerEnvVariables(TestContainerLaunch.java:582) Running org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.TestContainersMonitor Tests run: 3, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 19.69 sec FAILURE! - in org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.TestContainersMonitor testContainerKillOnMemoryOverflow(org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.TestContainersMonitor) Time elapsed: 18.261 sec FAILURE! 
java.lang.AssertionError: expected:143 but was:0 at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.junit.Assert.assertEquals(Assert.java:456) at org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.TestContainersMonitor.testContainerKillOnMemoryOverflow(TestContainersMonitor.java:273) Running org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerShutdown Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 6.731 sec FAILURE! - in org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerShutdown testKillContainersOnShutdown(org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerShutdown) Time elapsed: 6.404 sec FAILURE! java.lang.AssertionError: Did not find sigterm message at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.assertTrue(Assert.java:43) at org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerShutdown.testKillContainersOnShutdown(TestNodeManagerShutdown.java:153) {quote} TestContainerLaunch, TestContainersMonitor, TestNodeManagerShutdown, TestNodeStatusUpdater fails on Linux - Key: YARN-1798 URL:
[jira] [Commented] (YARN-1798) TestContainerLaunch, TestContainersMonitor, TestNodeManagerShutdown, TestNodeStatusUpdater fails on Linux
[ https://issues.apache.org/jira/browse/YARN-1798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13974197#comment-13974197 ] Tsuyoshi OZAWA commented on YARN-1798: -- output log is as follows: {quote} org.apache.hadoop.yarn.exceptions.YarnException: Unable to get local resources when Container container_1397835546010_0001_01_01 is at null at org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:45) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:173) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testCallFailureWithNullLocalizedResources(TestContainerLaunch.java:779) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:30) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229) at 
org.junit.runners.ParentRunner.access$000(ParentRunner.java:50) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222) at org.junit.runners.ParentRunner.run(ParentRunner.java:300) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124) at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200) at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103) {quote} TestContainerLaunch, TestContainersMonitor, TestNodeManagerShutdown, TestNodeStatusUpdater fails on Linux - Key: YARN-1798 URL: https://issues.apache.org/jira/browse/YARN-1798 Project: Hadoop YARN Issue Type: Test Reporter: Tsuyoshi OZAWA -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1798) TestContainerLaunch, TestContainersMonitor, TestNodeManagerShutdown, TestNodeStatusUpdater fails on Linux
[ https://issues.apache.org/jira/browse/YARN-1798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-1798: - Attachment: TestContainerLaunch-output.txt Attached the output log from the test failure. TestContainerLaunch, TestContainersMonitor, TestNodeManagerShutdown, TestNodeStatusUpdater fails on Linux - Key: YARN-1798 URL: https://issues.apache.org/jira/browse/YARN-1798 Project: Hadoop YARN Issue Type: Test Reporter: Tsuyoshi OZAWA Attachments: TestContainerLaunch-output.txt -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1798) TestContainerLaunch, TestContainersMonitor, TestNodeManagerShutdown, TestNodeStatusUpdater fails on Linux
[ https://issues.apache.org/jira/browse/YARN-1798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13974207#comment-13974207 ] Tsuyoshi OZAWA commented on YARN-1798: -- Sorry, the output log I mentioned in the comment doesn't appear to be related to the test failure - that code path works correctly in the test. The failure looks like a plain assertion failure caused by a timing issue. TestContainerLaunch, TestContainersMonitor, TestNodeManagerShutdown, TestNodeStatusUpdater fails on Linux - Key: YARN-1798 URL: https://issues.apache.org/jira/browse/YARN-1798 Project: Hadoop YARN Issue Type: Test Reporter: Tsuyoshi OZAWA Attachments: TestContainerLaunch-output.txt -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1798) TestContainerLaunch, TestContainersMonitor, TestNodeManagerShutdown, TestNodeStatusUpdater fails on Linux
[ https://issues.apache.org/jira/browse/YARN-1798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-1798: - Attachment: TestContainerLaunch.txt TestContainerLaunch, TestContainersMonitor, TestNodeManagerShutdown, TestNodeStatusUpdater fails on Linux - Key: YARN-1798 URL: https://issues.apache.org/jira/browse/YARN-1798 Project: Hadoop YARN Issue Type: Test Reporter: Tsuyoshi OZAWA Attachments: TestContainerLaunch-output.txt, TestContainerLaunch.txt -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (YARN-1798) TestContainerLaunch, TestContainersMonitor, TestNodeManagerShutdown, TestNodeStatusUpdater fails on Linux
[ https://issues.apache.org/jira/browse/YARN-1798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA reassigned YARN-1798: Assignee: Tsuyoshi OZAWA TestContainerLaunch, TestContainersMonitor, TestNodeManagerShutdown, TestNodeStatusUpdater fails on Linux - Key: YARN-1798 URL: https://issues.apache.org/jira/browse/YARN-1798 Project: Hadoop YARN Issue Type: Test Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: TestContainerLaunch-output.txt, TestContainerLaunch.txt -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1798) TestContainerLaunch, TestContainersMonitor, TestNodeManagerShutdown, TestNodeStatusUpdater fails on Linux
[ https://issues.apache.org/jira/browse/YARN-1798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-1798: - Attachment: YARN-1798.1.patch The test failure in TestContainerLaunch is caused by the timeout of waitForContainerState(). This patch extends the timeout value. TestContainerLaunch, TestContainersMonitor, TestNodeManagerShutdown, TestNodeStatusUpdater fails on Linux - Key: YARN-1798 URL: https://issues.apache.org/jira/browse/YARN-1798 Project: Hadoop YARN Issue Type: Test Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: TestContainerLaunch-output.txt, TestContainerLaunch.txt, YARN-1798.1.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
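The patch above simply raises the timeout used by waitForContainerState(). A minimal sketch of that kind of polling helper, with an explicit deadline instead of a fixed iteration count, might look like this (the class and method names here are illustrative, not the actual BaseContainerManagerTest API):

```java
import java.util.function.BooleanSupplier;

public class WaitForSketch {
  // Polls `condition` every pollMillis until it holds or timeoutMillis
  // elapses. Returns true if the condition became true in time.
  public static boolean waitFor(BooleanSupplier condition,
      long pollMillis, long timeoutMillis) {
    long deadline = System.currentTimeMillis() + timeoutMillis;
    while (!condition.getAsBoolean()) {
      if (System.currentTimeMillis() >= deadline) {
        return false;  // timed out, caller decides whether to fail the test
      }
      try {
        Thread.sleep(pollMillis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();  // preserve interrupt status
        return false;
      }
    }
    return true;
  }
}
```

Making the deadline a parameter keeps the "extend the timeout" fix a one-line change at each call site rather than an edit to the loop itself.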
[jira] [Commented] (YARN-1932) Javascript injection on the job status page
[ https://issues.apache.org/jira/browse/YARN-1932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13974227#comment-13974227 ] Jason Lowe commented on YARN-1932: -- +1 lgtm. I'll commit this later today unless there are any objections. Javascript injection on the job status page --- Key: YARN-1932 URL: https://issues.apache.org/jira/browse/YARN-1932 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 0.23.9, 2.5.0 Reporter: Mit Desai Assignee: Mit Desai Priority: Blocker Attachments: YARN-1932.patch, YARN-1932.patch Scripts can be injected into the job status page as the diagnostics field is not sanitized. Whatever string you set there will show up to the jobs page as it is ... ie. if you put any script commands, they will be executed in the browser of the user who is opening the page. We need escaping the diagnostic string in order to not run the scripts. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1798) TestContainerLaunch, TestContainersMonitor, TestNodeManagerShutdown, TestNodeStatusUpdater fails on Linux
[ https://issues.apache.org/jira/browse/YARN-1798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13974255#comment-13974255 ] Tsuyoshi OZAWA commented on YARN-1798: -- Confirmed the attached patch is not enough. I'll look into it more deeply. TestContainerLaunch, TestContainersMonitor, TestNodeManagerShutdown, TestNodeStatusUpdater fails on Linux - Key: YARN-1798 URL: https://issues.apache.org/jira/browse/YARN-1798 Project: Hadoop YARN Issue Type: Test Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: TestContainerLaunch-output.txt, TestContainerLaunch.txt, YARN-1798.1.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1798) TestContainerLaunch, TestContainersMonitor, TestNodeManagerShutdown, TestNodeStatusUpdater fails on Linux
[ https://issues.apache.org/jira/browse/YARN-1798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13974264#comment-13974264 ] Hadoop QA commented on YARN-1798: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12640845/YARN-1798.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3596//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3596//console This message is automatically generated. TestContainerLaunch, TestContainersMonitor, TestNodeManagerShutdown, TestNodeStatusUpdater fails on Linux - Key: YARN-1798 URL: https://issues.apache.org/jira/browse/YARN-1798 Project: Hadoop YARN Issue Type: Test Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: TestContainerLaunch-output.txt, TestContainerLaunch.txt, YARN-1798.1.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1940) deleteAsUser() terminates early without deleting more files on error
[ https://issues.apache.org/jira/browse/YARN-1940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13974336#comment-13974336 ] Jason Lowe commented on YARN-1940: -- +1 lgtm, committing this deleteAsUser() terminates early without deleting more files on error Key: YARN-1940 URL: https://issues.apache.org/jira/browse/YARN-1940 Project: Hadoop YARN Issue Type: Bug Reporter: Kihwal Lee Assignee: Rushabh S Shah Attachments: YARN-1940-v2.patch, YARN-1940.patch In container-executor.c, delete_path() returns early when unlink() against a file or a symlink fails. We have seen many cases of the error being ENOENT, which can safely be ignored during delete. This is what we saw recently: an app mistakenly created a large number of files in the local directory and the deletion service failed to delete a significant portion of them due to this bug. Repeatedly hitting this on the same node led to exhaustion of inodes in one of the partitions. Besides ignoring ENOENT, delete_path() can simply skip the failed one and continue in some cases, rather than aborting and leaving files behind. -- This message was sent by Atlassian JIRA (v6.2#6252)
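The actual fix lives in C (container-executor.c's delete_path()), but the idea generalizes: treat "file already gone" (the ENOENT case) as success, and on any other per-entry error skip it and keep going instead of aborting the whole traversal. An analogous sketch in Java, purely for illustration:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class DeleteSketch {
  // Deletes `root` recursively; returns the number of entries that could not
  // be deleted for reasons other than already being absent.
  public static int deleteBestEffort(Path root) {
    List<Path> entries;
    try (Stream<Path> walk = Files.walk(root)) {
      // reverse order so children are deleted before their parent directories
      entries = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
    } catch (IOException e) {
      return 0;  // root is already gone: the ENOENT case, nothing to do
    }
    int failures = 0;
    for (Path p : entries) {
      try {
        Files.deleteIfExists(p);  // an absent entry counts as done, not an error
      } catch (IOException e) {
        failures++;  // skip and continue, as the patched delete_path() does
      }
    }
    return failures;
  }
}
```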
[jira] [Created] (YARN-1961) Fair scheduler preemption doesn't work for non-leaf queues
Ashwin Shankar created YARN-1961: Summary: Fair scheduler preemption doesn't work for non-leaf queues Key: YARN-1961 URL: https://issues.apache.org/jira/browse/YARN-1961 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.4.0 Reporter: Ashwin Shankar Setting minResources and minSharePreemptionTimeout on a non-leaf queue doesn't cause preemption to happen when that non-leaf queue is below minResources and there are outstanding demands in that non-leaf queue. Here is an example fair scheduler allocation config (partial):
{code:xml}
<queue name="abc">
  <minResources>3072 mb,0 vcores</minResources>
  <minSharePreemptionTimeout>30</minSharePreemptionTimeout>
  <queue name="childabc1">
  </queue>
  <queue name="childabc2">
  </queue>
</queue>
{code}
With the above config, preemption doesn't seem to happen if queue abc is below its minShare and it has outstanding unsatisfied demands from apps in its child queues. Ideally in such cases we would like preemption to kick in and reclaim resources from other queues (not under queue abc). Looking at the code, it seems like preemption checks for starvation only at the leaf queue level and not at the parent level.
{code:title=FairScheduler.java|borderStyle=solid}
boolean isStarvedForMinShare(FSLeafQueue sched)
boolean isStarvedForFairShare(FSLeafQueue sched)
{code}
This affects our use case where we have a parent queue with probably a 100 unconfigured leaf queues under it. We want to give a minShare to the parent queue to protect all the leaf queues under it, but we cannot do it due to this bug. -- This message was sent by Atlassian JIRA (v6.2#6252)
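The fix the report asks for is to run the starvation check over the whole queue hierarchy rather than only the leaves. A self-contained sketch of that idea, with a hypothetical simplified queue model (not the actual FSQueue/FairScheduler API):

```java
import java.util.ArrayList;
import java.util.List;

public class StarvationSketch {
  static class FSQueue {
    final String name;
    final long minShare;  // configured minResources, in MB
    final long usage;     // current usage, in MB
    final long demand;    // outstanding demand, in MB
    final List<FSQueue> children = new ArrayList<>();
    FSQueue(String name, long minShare, long usage, long demand) {
      this.name = name; this.minShare = minShare;
      this.usage = usage; this.demand = demand;
    }
  }

  // A queue is starved for min share if its usage is below both its
  // configured min share and its demand.
  static boolean isStarvedForMinShare(FSQueue q) {
    long desired = Math.min(q.minShare, q.demand);
    return q.usage < desired;
  }

  // Walk the whole hierarchy, not just the leaves, so a parent queue like
  // root.abc above can itself be recognized as starved.
  static boolean anyStarved(FSQueue root) {
    if (isStarvedForMinShare(root)) {
      return true;
    }
    for (FSQueue child : root.children) {
      if (anyStarved(child)) {
        return true;
      }
    }
    return false;
  }
}
```

With minResources of 3072 MB, usage of 1024 MB, and 4096 MB of pending demand, queue abc would be flagged as starved under this check even though its children carry the demand.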
[jira] [Commented] (YARN-1940) deleteAsUser() terminates early without deleting more files on error
[ https://issues.apache.org/jira/browse/YARN-1940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13974423#comment-13974423 ] Hudson commented on YARN-1940: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5537 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5537/]) YARN-1940. deleteAsUser() terminates early without deleting more files on error. Contributed by Rushabh S Shah (jlowe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1588546) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c deleteAsUser() terminates early without deleting more files on error Key: YARN-1940 URL: https://issues.apache.org/jira/browse/YARN-1940 Project: Hadoop YARN Issue Type: Bug Reporter: Kihwal Lee Assignee: Rushabh S Shah Fix For: 3.0.0, 2.5.0 Attachments: YARN-1940-v2.patch, YARN-1940.patch In container-executor.c, delete_path() returns early when unlink() against a file or a symlink fails. We have seen many cases of the error being ENOENT, which can safely be ignored during delete. This is what we saw recently: An app mistakenly created a large number of files in the local directory and the deletion service failed to delete a significant portion of them due to this bug. Repeatedly hitting this on the same node led to exhaustion of inodes in one of the partitions. Beside ignoring ENOENT, delete_path() can simply skip the failed one and continue in some cases, rather than aborting and leaving files behind. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-766) TestNodeManagerShutdown should use Shell to form the output path
[ https://issues.apache.org/jira/browse/YARN-766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13974450#comment-13974450 ] Siddharth Seth commented on YARN-766: - [~djp], The 2.x patch is only required to fix a difference in formatting between trunk and branch-2. Up to you on whether to fix the trunk formatting in this jira or whenever the code is touched next. TestNodeManagerShutdown should use Shell to form the output path Key: YARN-766 URL: https://issues.apache.org/jira/browse/YARN-766 Project: Hadoop YARN Issue Type: Test Affects Versions: 2.1.0-beta Reporter: Siddharth Seth Assignee: Siddharth Seth Priority: Minor Attachments: YARN-766.branch-2.txt, YARN-766.trunk.txt, YARN-766.txt File scriptFile = new File(tmpDir, scriptFile.sh); should be replaced with File scriptFile = Shell.appendScriptExtension(tmpDir, scriptFile); to match trunk. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1959) Fix headroom calculation in Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13974456#comment-13974456 ] Sandy Ryza commented on YARN-1959: -- One thing I don't understand from reading the Capacity Scheduler headroom calculation is how it prevents apps from starving when a max capacity isn't set. It's defined as min(userLimit, queue-max-cap) - consumed. If no max capacities are set and two users are running in a queue, each taking up half the queue's capacity, the headroom for each user will be half the queue's capacity. If the cluster is saturated to the extent that the queue's usage can't go above its capacity, the headroom is being vastly overreported. [~jlowe], any insight on this? Fix headroom calculation in Fair Scheduler -- Key: YARN-1959 URL: https://issues.apache.org/jira/browse/YARN-1959 Project: Hadoop YARN Issue Type: Bug Reporter: Sandy Ryza The Fair Scheduler currently always sets the headroom to 0. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1959) Fix headroom calculation in Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13974467#comment-13974467 ] Jason Lowe commented on YARN-1959: -- Yes, over-reporting of the headroom in the CapacityScheduler is a known issue. See YARN-1857. I think the calculation for the CapacityScheduler should be more like min((userLimit-userConsumed), (queueMax-queueConsumed)). The idea being that one can't go over the user limit but also can't go over what the queue has free either. Fix headroom calculation in Fair Scheduler -- Key: YARN-1959 URL: https://issues.apache.org/jira/browse/YARN-1959 Project: Hadoop YARN Issue Type: Bug Reporter: Sandy Ryza The Fair Scheduler currently always sets the headroom to 0. -- This message was sent by Atlassian JIRA (v6.2#6252)
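The formula Jason proposes, headroom = min(userLimit - userConsumed, queueMax - queueConsumed), can be written as a small sketch; this is illustrative pseudocode for the discussion, not the actual CapacityScheduler implementation:

```java
public class HeadroomSketch {
  // All quantities in the same unit (e.g. MB). Headroom is bounded by both
  // what the user may still take and what the queue has left, and is never
  // reported as negative.
  public static long headroom(long userLimit, long userConsumed,
      long queueMax, long queueConsumed) {
    long byUser = userLimit - userConsumed;
    long byQueue = queueMax - queueConsumed;
    return Math.max(0, Math.min(byUser, byQueue));
  }
}
```

For example, with a user limit of 50 of which 40 is consumed, and a queue max of 100 of which 95 is consumed, this reports a headroom of 5 rather than 10, because the queue bound is the tighter one.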
[jira] [Commented] (YARN-1864) Fair Scheduler Dynamic Hierarchical User Queues
[ https://issues.apache.org/jira/browse/YARN-1864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13974497#comment-13974497 ] Sandy Ryza commented on YARN-1864: -- Thanks for working on this Ashwin. I like the approach.
{code}
+if ("hierarchicalUserQueue".equals(element.getAttribute("name"))) {
+  throw new AllocationConfigurationException(
+      "hierarchicalUserQueue cannot be a nested rule");
+}
{code}
Any reason why it can't?
{code}
+// Verify if the queue returned by the nested rule is an existing leaf queue,
+// if yes then skip to next rule in the queue placement policy
+if (queueMgr.exists(queueName)
+    && (queueMgr.getQueue(queueName) instanceof FSLeafQueue)) {
+  return;
+}
{code}
The QueuePlacementPolicy isn't responsible for verifying this. We expect to hit an error later on if the queue returned by the QueuePlacementPolicy isn't a leaf queue. This can happen with other rules, for example with the specified rule if someone tries to explicitly submit to a parent queue. Given this, the hierarchical queue rule should be terminal if its nested rule is terminal, right?
{code}
+if (create
+    || (!isNestedRule
+        && configuredQueues.get(FSQueueType.LEAF).contains(queue))
+    || (isNestedRule
+        && configuredQueues.get(FSQueueType.PARENT).contains(queue))) {
{code}
Along similar lines to the previous comment, I don't think we need this logic, or isNestedQueue and FSQueueType, in the queue placement policy. If the rule places an app in root.engineering.ashwin and root.engineering happens to be a leaf or root.ashwin happens to be a parent, we'll end up throwing an error later, which is about the best we can do. Let me know if there are cases I'm missing or if I'm not thinking this through deeply enough.
{code}
+ * Places all apps in the specified default queue.
+ * If no default queue is specified or if the specified default queue
+ * doesn't exist, the app is placed in the root.default queue
{code}
This has gotten to be a fairly big patch - mind making these changes to the default rule in a separate JIRA? The name HierarchicalUserQueue seems kind of weird to me, as the meaning of hierarchical in it is a little vague or maybe already overloaded. Maybe UserQueueUnderneath, UserQueueInside, or UserQueueBelow? Any other ideas? I've got a few stylistic comments in addition to these, but wanted to get this stuff worked out first. Fair Scheduler Dynamic Hierarchical User Queues --- Key: YARN-1864 URL: https://issues.apache.org/jira/browse/YARN-1864 Project: Hadoop YARN Issue Type: New Feature Components: scheduler Reporter: Ashwin Shankar Labels: scheduler Attachments: YARN-1864-v1.txt, YARN-1864-v2.txt In Fair Scheduler, we want to be able to create user queues under any parent queue in the hierarchy. For eg. say user1 submits a job to a parent queue called root.allUserQueues; we want to be able to create a new queue called root.allUserQueues.user1 and run user1's job in it. Any further jobs submitted by this user to root.allUserQueues will be run in this newly created root.allUserQueues.user1. This is very similar to the 'user-as-default' feature in Fair Scheduler which creates user queues under the root queue. But we want the ability to create user queues under ANY parent queue. Why do we want this? 1. Preemption: these dynamically created user queues can preempt each other if their fair share is not met. So there is fairness among users. User queues can also preempt other non-user leaf queues as well if below fair share. 2. Allocation to user queues: we want all the user queries (adhoc) to consume only a fraction of resources in the shared cluster. By creating this feature, we could do that by giving a fair share to the parent user queue which is then redistributed to all the dynamically created user queues.
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1932) Javascript injection on the job status page
[ https://issues.apache.org/jira/browse/YARN-1932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13974528#comment-13974528 ] Hudson commented on YARN-1932: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5539 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5539/]) YARN-1932. Javascript injection on the job status page. Contributed by Mit Desai (jlowe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1588572) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/view/InfoBlock.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/webapp/view/TestInfoBlock.java Javascript injection on the job status page --- Key: YARN-1932 URL: https://issues.apache.org/jira/browse/YARN-1932 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 0.23.9, 2.5.0 Reporter: Mit Desai Assignee: Mit Desai Priority: Blocker Attachments: YARN-1932.patch, YARN-1932.patch Scripts can be injected into the job status page because the diagnostics field is not sanitized. Whatever string you set there shows up on the jobs page as-is, i.e. if you put in any script commands, they will be executed in the browser of the user who opens the page. We need to escape the diagnostics string so that the scripts are not run. -- This message was sent by Atlassian JIRA (v6.2#6252)
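The underlying fix is standard HTML escaping of the diagnostics string before it is rendered. A minimal sketch of the technique (the real change lives in InfoBlock; this helper is illustrative, not the patch's code):

```java
// Escape the HTML metacharacters so an injected <script> tag is
// displayed as text instead of being executed by the browser.
public class Escaper {
    static String escapeHtml(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }
}
```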
[jira] [Commented] (YARN-1959) Fix headroom calculation in Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13974541#comment-13974541 ] Sandy Ryza commented on YARN-1959: -- Ah, ok, thanks Jason. With your formula, assuming no user limits, what happens if queueMax is 100%? All queue maxes are 100% by default in the Capacity Scheduler, right? If there are two queues both with max 100%, and both using 50% of resources, they would both end up with 50% headroom. Fix headroom calculation in Fair Scheduler -- Key: YARN-1959 URL: https://issues.apache.org/jira/browse/YARN-1959 Project: Hadoop YARN Issue Type: Bug Reporter: Sandy Ryza The Fair Scheduler currently always sets the headroom to 0. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1959) Fix headroom calculation in Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13974550#comment-13974550 ] Jason Lowe commented on YARN-1959: -- Good point, it would also need to min against the available cluster resources to cover the case of cross-queue contention. Fix headroom calculation in Fair Scheduler -- Key: YARN-1959 URL: https://issues.apache.org/jira/browse/YARN-1959 Project: Hadoop YARN Issue Type: Bug Reporter: Sandy Ryza The Fair Scheduler currently always sets the headroom to 0. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (YARN-1959) Fix headroom calculation in Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza reassigned YARN-1959: Assignee: Sandy Ryza Fix headroom calculation in Fair Scheduler -- Key: YARN-1959 URL: https://issues.apache.org/jira/browse/YARN-1959 Project: Hadoop YARN Issue Type: Bug Reporter: Sandy Ryza Assignee: Sandy Ryza The Fair Scheduler currently always sets the headroom to 0. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1959) Fix headroom calculation in Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13974557#comment-13974557 ] Sandy Ryza commented on YARN-1959: -- Cool, wanted to make sure I understood how it worked. In that case, I think the best choice for the Fair Scheduler would probably be min(cluster capacity - cluster consumed, queue max share - queue consumed). Fix headroom calculation in Fair Scheduler -- Key: YARN-1959 URL: https://issues.apache.org/jira/browse/YARN-1959 Project: Hadoop YARN Issue Type: Bug Reporter: Sandy Ryza Assignee: Sandy Ryza The Fair Scheduler currently always sets the headroom to 0. -- This message was sent by Atlassian JIRA (v6.2#6252)
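The formula proposed above can be sketched as follows. This is a minimal illustration that collapses Resource arithmetic to a single integer; the real scheduler works with memory/vcore Resource objects, so the method below is a stand-in, not the eventual patch:

```java
// headroom = min(cluster capacity - cluster consumed,
//                queue max share  - queue consumed)
public class Headroom {
    static int headroom(int clusterCapacity, int clusterConsumed,
                        int queueMaxShare, int queueConsumed) {
        int clusterAvailable = clusterCapacity - clusterConsumed;
        int queueAvailable = queueMaxShare - queueConsumed;
        // Never report negative headroom.
        return Math.max(0, Math.min(clusterAvailable, queueAvailable));
    }
}
```

In the earlier example of two queues each with max 100% and each consuming 50%, the cluster is fully consumed, so the cluster-availability term drives both headrooms to 0 rather than over-promising 50% to each.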
[jira] [Updated] (YARN-1865) ShellScriptBuilder does not check for some error conditions
[ https://issues.apache.org/jira/browse/YARN-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Mitic updated YARN-1865: - Attachment: YARN-1865.4.patch Addressing my latest comment to add a test case for the non-zero exit code. Will commit this patch assuming it receives +1 from Jenkins. ShellScriptBuilder does not check for some error conditions --- Key: YARN-1865 URL: https://issues.apache.org/jira/browse/YARN-1865 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 3.0.0, 2.2.0, 2.3.0 Reporter: Remus Rusanu Assignee: Remus Rusanu Priority: Minor Attachments: YARN-1865.1.patch, YARN-1865.2.patch, YARN-1865.3.patch, YARN-1865.4.patch The WindowsShellScriptBuilder does not check for commands exceeding the Windows maximum shell command line length (8191 chars). Neither the Unix nor the Windows script builder checks for error conditions after mkdir or link. WindowsShellScriptBuilder mkdir is not safe with regard to paths containing spaces. -- This message was sent by Atlassian JIRA (v6.2#6252)
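The missing checks described above could look roughly like this. All identifier names here are illustrative, not the patch's actual ones, and the real builder would throw rather than return a boolean:

```java
// Hypothetical sketch of the validations YARN-1865 adds.
public class ScriptChecks {
    // Windows cmd.exe maximum command line length (8191 chars).
    static final int MAX_WINDOWS_CMD_LEN = 8191;

    static boolean exceedsMaxLength(String command) {
        return command.length() > MAX_WINDOWS_CMD_LEN;
    }

    // Quote a path so a generated mkdir line tolerates embedded spaces.
    static String quotePath(String path) {
        return "\"" + path + "\"";
    }

    // Emit mkdir followed by an exit-code check so a failure aborts
    // the generated batch script instead of being silently ignored.
    static String mkdirWithCheck(String path) {
        return "mkdir " + quotePath(path)
            + "\nif %errorlevel% neq 0 exit /b %errorlevel%";
    }
}
```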
[jira] [Updated] (YARN-1917) Add waitForApplicationState interface to YarnClient
[ https://issues.apache.org/jira/browse/YARN-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-1917: - Attachment: YARN-1917.patch Uploaded a patch to resolve the build failure. Add waitForApplicationState interface to YarnClient - Key: YARN-1917 URL: https://issues.apache.org/jira/browse/YARN-1917 Project: Hadoop YARN Issue Type: New Feature Components: client Affects Versions: 2.4.0 Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-1917.patch, YARN-1917.patch Currently, YARN doesn't have this method. Users need to write implementations like UnmanagedAMLauncher.monitorApplication or mapreduce.Job.monitorAndPrintJob on their own. This feature should be helpful to end users. -- This message was sent by Atlassian JIRA (v6.2#6252)
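The kind of loop users currently hand-write, and that a waitForApplicationState method would encapsulate, might look like this. The AppState enum and the state supplier are stand-ins for polling YarnClient#getApplicationReport, so this is a sketch of the idea rather than the patch's API:

```java
import java.util.EnumSet;
import java.util.function.Supplier;

// Poll an application-state source until it reaches one of the desired
// states or a timeout expires; returns the matched state, or null on
// timeout. In real code the supplier would fetch the state from an
// ApplicationReport.
public class WaitFor {
    enum AppState { NEW, RUNNING, FINISHED, FAILED, KILLED }

    static AppState waitForState(Supplier<AppState> poll,
                                 EnumSet<AppState> desired,
                                 long timeoutMs, long intervalMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (true) {
            AppState s = poll.get();
            if (desired.contains(s)) {
                return s;
            }
            if (System.currentTimeMillis() >= deadline) {
                return null; // timed out
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return null;
            }
        }
    }
}
```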
[jira] [Commented] (YARN-1865) ShellScriptBuilder does not check for some error conditions
[ https://issues.apache.org/jira/browse/YARN-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13974661#comment-13974661 ] Hadoop QA commented on YARN-1865: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12640911/YARN-1865.4.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3597//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3597//console This message is automatically generated. 
ShellScriptBuilder does not check for some error conditions --- Key: YARN-1865 URL: https://issues.apache.org/jira/browse/YARN-1865 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 3.0.0, 2.2.0, 2.3.0 Reporter: Remus Rusanu Assignee: Remus Rusanu Priority: Minor Attachments: YARN-1865.1.patch, YARN-1865.2.patch, YARN-1865.3.patch, YARN-1865.4.patch The WindowsShellScriptBuilder does not check for commands exceeding the Windows maximum shell command line length (8191 chars). Neither the Unix nor the Windows script builder checks for error conditions after mkdir or link. WindowsShellScriptBuilder mkdir is not safe with regard to paths containing spaces. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (YARN-1959) Fix headroom calculation in Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot reassigned YARN-1959: --- Assignee: Anubhav Dhoot (was: Sandy Ryza) Fix headroom calculation in Fair Scheduler -- Key: YARN-1959 URL: https://issues.apache.org/jira/browse/YARN-1959 Project: Hadoop YARN Issue Type: Bug Reporter: Sandy Ryza Assignee: Anubhav Dhoot The Fair Scheduler currently always sets the headroom to 0. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-556) RM Restart phase 2 - Work preserving restart
[ https://issues.apache.org/jira/browse/YARN-556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-556: --- Attachment: WorkPreservingRestartPrototype.001.patch This prototype is a way to understand the overall design, the major issues that need to be addressed, and the minor details that crop up. It is not a substitute for actual code and unit tests for each sub-task. Hopefully it will help drive a discussion on the overall approach and on each sub-task. In this prototype, the following changes are demonstrated:
1. Containers that were running when the RM restarted continue running.
2. The NM on resync sends the list of running containers as ContainerReports so they provide container capability (sizes).
3. The AM on resync reregisters instead of shutting down. The AM can make further requests after RM restart and they are accepted.
4. A sample of the scheduler changes in FairScheduler. It reregisters the application attempt on recovery. On NM addNode it adds the containers to that application attempt and charges them correctly to the attempt for tracking usage.
5. Applications and containers resume their lifecycle with additional transitions to support continuation after recovery.
6. A cluster timestamp is added to the containerId so that containerIds minted after an RM restart do not clash with those from before (the containerId counter resets to zero in memory).
7. The changes are controlled by a flag.
Topics not addressed:
1. Key and token changes.
2. The AM does not yet resend requests sent before the restart. So if the RM restarts after the AM has made its request and before the RM returns a container, the AM is left waiting for allocation; only new asks made after the RM restart work.
3. Completed container status as per the design is not handled yet.
Readme for running through the prototype:
a) Set up with RM recovery turned on and the scheduler set to FairScheduler.
b) Start a sleep job with a map and a reduce, such as bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.0.0-SNAPSHOT.jar sleep -mt 12000 -rt 12000
c) Restart the RM (yarn-daemon.sh stop/start resourcemanager) and see that containers are not restarted.
The following two scenarios work:
1. Restart the RM while the reduce is running: the reduce continues and then the application completes successfully. Demonstrates continuation of running containers without restart.
2. Restart the RM while the map is running: the map continues, then the reduce executes, and then the application completes successfully. Demonstrates that requesting more resources after restart works, in addition to the previous scenario.
RM Restart phase 2 - Work preserving restart Key: YARN-556 URL: https://issues.apache.org/jira/browse/YARN-556 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager Reporter: Bikas Saha Assignee: Bikas Saha Attachments: Work Preserving RM Restart.pdf, WorkPreservingRestartPrototype.001.patch YARN-128 covered storing the state needed for the RM to recover critical information. This umbrella jira will track changes needed to recover the running state of the cluster so that work can be preserved across RM restarts. -- This message was sent by Atlassian JIRA (v6.2#6252)
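Item 6 of the prototype notes (the cluster timestamp embedded in the containerId) can be illustrated with a small sketch; the field layout below is illustrative, not the prototype's code:

```java
// Because the cluster (RM start) timestamp is part of the container id,
// ids minted after a restart cannot collide with pre-restart ids even
// though the in-memory counter resets to zero.
public class ContainerIds {
    static String containerId(long clusterTimestamp, int appId,
                              int attemptId, long containerNum) {
        return String.format("container_%d_%04d_%02d_%06d",
            clusterTimestamp, appId, attemptId, containerNum);
    }
}
```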
[jira] [Commented] (YARN-1917) Add waitForApplicationState interface to YarnClient
[ https://issues.apache.org/jira/browse/YARN-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13974678#comment-13974678 ] Hadoop QA commented on YARN-1917: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12640914/YARN-1917.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3598//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3598//console This message is automatically generated. Add waitForApplicationState interface to YarnClient - Key: YARN-1917 URL: https://issues.apache.org/jira/browse/YARN-1917 Project: Hadoop YARN Issue Type: New Feature Components: client Affects Versions: 2.4.0 Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-1917.patch, YARN-1917.patch Currently, YARN doesn't have this method. Users need to write implementations like UnmanagedAMLauncher.monitorApplication or mapreduce.Job.monitorAndPrintJob on their own. 
This feature should be helpful to end users. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-1962) Timeline server is enabled by default
Mohammad Kamrul Islam created YARN-1962: --- Summary: Timeline server is enabled by default Key: YARN-1962 URL: https://issues.apache.org/jira/browse/YARN-1962 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Mohammad Kamrul Islam Assignee: Mohammad Kamrul Islam Since the Timeline server is not yet mature or secured, enabling it by default might create some confusion. We were playing with 2.4.0 and found a lot of exceptions in the distributed shell example related to connection-refused errors. Btw, we didn't run the Timeline server because it is not secured yet. Although it is possible to explicitly turn it off through the yarn-site config, in my opinion this extra change for this new service is not worthwhile at this point. This JIRA is to turn it off by default. If there is an agreement, I can put up a simple patch for this.
{noformat}
14/04/17 23:24:33 ERROR impl.TimelineClientImpl: Failed to get the response from the timeline server.
com.sun.jersey.api.client.ClientHandlerException: java.net.ConnectException: Connection refused
	at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:149)
	at com.sun.jersey.api.client.Client.handle(Client.java:648)
	at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670)
	at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
	at com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563)
	at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPostingEntities(TimelineClientImpl.java:131)
	at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:104)
	at org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.publishApplicationAttemptEvent(ApplicationMaster.java:1072)
	at org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.run(ApplicationMaster.java:515)
	at org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.main(ApplicationMaster.java:281)
Caused by: java.net.ConnectException: Connection refused
	at java.net.PlainSocketImpl.socketConnect(Native Method)
	at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
	at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:198)
	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
	at java.net.Socket.connect(Socket.java:579)
	at java.net.Socket.connect(Socket.java:528)
	at sun.net.NetworkClient.doConnect(NetworkClient.java:180)
	at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
	at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
	at sun.net.www.http.HttpClient.in
{noformat}
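Until the default changes, the service can be disabled explicitly in yarn-site.xml; this sketch assumes the standard yarn.timeline-service.enabled property:

```xml
<!-- yarn-site.xml: turn the timeline service off explicitly -->
<property>
  <name>yarn.timeline-service.enabled</name>
  <value>false</value>
</property>
```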
[jira] [Commented] (YARN-1929) DeadLock in RM when automatic failover is enabled.
[ https://issues.apache.org/jira/browse/YARN-1929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13974690#comment-13974690 ] Karthik Kambatla commented on YARN-1929: Ping... DeadLock in RM when automatic failover is enabled. -- Key: YARN-1929 URL: https://issues.apache.org/jira/browse/YARN-1929 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Environment: Yarn HA cluster Reporter: Rohith Assignee: Karthik Kambatla Priority: Blocker Attachments: yarn-1929-1.patch, yarn-1929-2.patch Dead lock detected in RM when automatic failover is enabled.
{noformat}
Found one Java-level deadlock:
==============================
Thread-2:
  waiting to lock monitor 0x7fb514303cf0 (object 0xef153fd0, a org.apache.hadoop.ha.ActiveStandbyElector),
  which is held by main-EventThread
main-EventThread:
  waiting to lock monitor 0x7fb514750a48 (object 0xef154020, a org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService),
  which is held by Thread-2
{noformat}
-- This message was sent by Atlassian JIRA (v6.2#6252)
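The reported cycle is a classic lock-ordering deadlock: each thread holds one of the two monitors while waiting for the other. A minimal sketch of the usual remedy, taking both monitors in a single global order, follows; the classes here are stand-ins for the RM's, not the actual fix in the attached patches:

```java
// Thread-2 held EmbeddedElectorService and waited for ActiveStandbyElector,
// while main-EventThread held them in the opposite order. Acquiring both
// in one fixed order (elector first) makes the cycle impossible.
public class LockOrdering {
    static final Object elector = new Object();   // ActiveStandbyElector stand-in
    static final Object embedded = new Object();  // EmbeddedElectorService stand-in

    static void transition() {
        synchronized (elector) {      // always elector first
            synchronized (embedded) {
                // state transition work would happen here
            }
        }
    }

    // Run two concurrent transitions; returns true if both complete
    // (i.e. no deadlock) within the join timeout.
    static boolean runBoth() {
        Thread a = new Thread(LockOrdering::transition);
        Thread b = new Thread(LockOrdering::transition);
        a.start();
        b.start();
        try {
            a.join(2000);
            b.join(2000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !a.isAlive() && !b.isAlive();
    }
}
```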
[jira] [Commented] (YARN-556) RM Restart phase 2 - Work preserving restart
[ https://issues.apache.org/jira/browse/YARN-556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13974724#comment-13974724 ] Tsuyoshi OZAWA commented on YARN-556: - Anubhav, Thank you for sharing the prototype. I will try it this weekend. RM Restart phase 2 - Work preserving restart Key: YARN-556 URL: https://issues.apache.org/jira/browse/YARN-556 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager Reporter: Bikas Saha Assignee: Bikas Saha Attachments: Work Preserving RM Restart.pdf, WorkPreservingRestartPrototype.001.patch YARN-128 covered storing the state needed for the RM to recover critical information. This umbrella jira will track changes needed to recover the running state of the cluster so that work can be preserved across RM restarts. -- This message was sent by Atlassian JIRA (v6.2#6252)