[jira] [Updated] (YARN-1954) Add waitFor to AMRMClient(Async)
[ https://issues.apache.org/jira/browse/YARN-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsuyoshi OZAWA updated YARN-1954:
    Attachment: YARN-1954.1.patch

Added a waitFor() API to AMRMClientAsync.

Add waitFor to AMRMClient(Async)
                Key: YARN-1954
                URL: https://issues.apache.org/jira/browse/YARN-1954
            Project: Hadoop YARN
         Issue Type: New Feature
         Components: client
   Affects Versions: 3.0.0, 2.4.0
           Reporter: Zhijie Shen
           Assignee: Tsuyoshi OZAWA
        Attachments: YARN-1954.1.patch

Recently, I saw some use cases of AMRMClient(Async). The painful thing is that the main non-daemon thread has to sit in a dummy loop to prevent the AM process from exiting before all the tasks are done, while unregistration is triggered on a separate daemon thread by callback methods (in particular when using AMRMClientAsync). IMHO, it would be beneficial to add a waitFor method to AMRMClient(Async) that blocks the AM until unregistration or a user-supplied checkpoint, so that users don't need to write the loop themselves.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Updated] (YARN-1954) Add waitFor to AMRMClient(Async)
[ https://issues.apache.org/jira/browse/YARN-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsuyoshi OZAWA updated YARN-1954:
    Attachment: YARN-1954.2.patch

Deleted a needless test from v1.

        Attachments: YARN-1954.1.patch, YARN-1954.2.patch
[jira] [Commented] (YARN-1954) Add waitFor to AMRMClient(Async)
[ https://issues.apache.org/jira/browse/YARN-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973839#comment-13973839 ]

Tsuyoshi OZAWA commented on YARN-1954:

[~zjshen], I added a waitFor() method which takes a Supplier<Boolean>, based on Zhijie's idea. I'd appreciate it if you could take a look.
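To illustrate the idea being discussed (not the actual patch): a waitFor that polls a user-supplied check frees the AM's main thread from hand-writing the dummy loop, while a callback thread flips a flag on unregistration. The `Check` interface below is a stand-in for the Supplier<Boolean> mentioned above; all names are hypothetical.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class WaitForSketch {

  // Stand-in for the Supplier<Boolean>-style check in the proposal.
  interface Check {
    boolean isDone();
  }

  // Block the calling (main) thread until the check passes.
  static void waitFor(Check check, long pollMillis) throws InterruptedException {
    while (!check.isDone()) {
      Thread.sleep(pollMillis);
    }
  }

  public static void main(String[] args) throws Exception {
    final AtomicBoolean unregistered = new AtomicBoolean(false);
    // Simulates the AMRMClientAsync callback thread flipping the flag
    // when unregistration completes.
    new Thread(new Runnable() {
      public void run() {
        try { Thread.sleep(50); } catch (InterruptedException ignored) { }
        unregistered.set(true);
      }
    }).start();

    // The main thread blocks here instead of in a hand-written loop.
    waitFor(new Check() {
      public boolean isDone() { return unregistered.get(); }
    }, 10);
    System.out.println("AM may now exit: " + unregistered.get());
  }
}
```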
[jira] [Commented] (YARN-1954) Add waitFor to AMRMClient(Async)
[ https://issues.apache.org/jira/browse/YARN-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973850#comment-13973850 ]

Hadoop QA commented on YARN-1954:

+1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12640781/YARN-1954.1.patch
against trunk revision.

    +1 @author. The patch does not contain any @author tags.
    +1 tests included. The patch appears to include 1 new or modified test files.
    +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    +1 javadoc. There were no new javadoc warning messages.
    +1 eclipse:eclipse. The patch built with eclipse:eclipse.
    +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
    +1 release audit. The applied patch does not increase the total number of release audit warnings.
    +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client.
    +1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3594//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3594//console

This message is automatically generated.
[jira] [Created] (YARN-1959) Fix headroom calculation in Fair Scheduler
Sandy Ryza created YARN-1959:

Summary: Fix headroom calculation in Fair Scheduler
    Key: YARN-1959
    URL: https://issues.apache.org/jira/browse/YARN-1959
Project: Hadoop YARN
Issue Type: Bug
Reporter: Sandy Ryza
[jira] [Commented] (YARN-1954) Add waitFor to AMRMClient(Async)
[ https://issues.apache.org/jira/browse/YARN-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973857#comment-13973857 ]

Hadoop QA commented on YARN-1954:

+1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12640783/YARN-1954.2.patch
against trunk revision.

    +1 @author. The patch does not contain any @author tags.
    +1 tests included. The patch appears to include 1 new or modified test files.
    +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    +1 javadoc. There were no new javadoc warning messages.
    +1 eclipse:eclipse. The patch built with eclipse:eclipse.
    +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
    +1 release audit. The applied patch does not increase the total number of release audit warnings.
    +1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client.
    +1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3595//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3595//console

This message is automatically generated.
[jira] [Updated] (YARN-1959) Fix headroom calculation in Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sandy Ryza updated YARN-1959:
    Description: The Fair Scheduler currently always sets the headroom to 0.

Fix headroom calculation in Fair Scheduler
    Key: YARN-1959
    URL: https://issues.apache.org/jira/browse/YARN-1959
Project: Hadoop YARN
Issue Type: Bug
Reporter: Sandy Ryza

The Fair Scheduler currently always sets the headroom to 0.
[jira] [Commented] (YARN-1959) Fix headroom calculation in Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973869#comment-13973869 ]

Sandy Ryza commented on YARN-1959:

The headroom for an app should be set to min(app's queue's max share, cluster capacity) - the app's queue's resources consumed.
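The rule in the comment above can be sketched numerically. The real Fair Scheduler code would operate on Resource objects (memory and vcores); plain ints stand in here, and the method name and negative-clamp are illustrative assumptions, not the eventual patch.

```java
public class HeadroomSketch {

  // headroom = min(queue max share, cluster capacity) - queue's consumed
  // resources, clamped so we never report negative headroom.
  static int headroom(int queueMaxShare, int clusterCapacity, int queueConsumed) {
    int limit = Math.min(queueMaxShare, clusterCapacity);
    return Math.max(0, limit - queueConsumed);
  }

  public static void main(String[] args) {
    // Queue capped at 40 GB, cluster has 100 GB, queue already uses 25 GB.
    System.out.println(headroom(40, 100, 25)); // prints 15
  }
}
```

Today's behavior corresponds to always returning 0, which starves AMs that gate work on available headroom.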
[jira] [Commented] (YARN-1778) TestFSRMStateStore fails on trunk
[ https://issues.apache.org/jira/browse/YARN-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973879#comment-13973879 ]

Tsuyoshi OZAWA commented on YARN-1778:

I tried, but currently I cannot reproduce this problem. [~xgong], should we close this issue for now and reopen it when we find a way to reproduce it on trunk?

TestFSRMStateStore fails on trunk
    Key: YARN-1778
    URL: https://issues.apache.org/jira/browse/YARN-1778
Project: Hadoop YARN
Issue Type: Test
Reporter: Xuan Gong
[jira] [Resolved] (YARN-687) TestNMAuditLogger hang
[ https://issues.apache.org/jira/browse/YARN-687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran resolved YARN-687.
    Resolution: Cannot Reproduce

TestNMAuditLogger hang
    Key: YARN-687
    URL: https://issues.apache.org/jira/browse/YARN-687
Project: Hadoop YARN
Issue Type: Test
Components: nodemanager
Affects Versions: 3.0.0
Environment: Linux stevel-dev 3.2.0-24-virtual #39-Ubuntu SMP Mon May 21 18:44:18 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
             java version 1.6.0_27, OpenJDK Runtime Environment (IcedTea6 1.12.3) (6b27-1.12.3-0ubuntu1~12.04.1), OpenJDK 64-Bit Server VM (build 20.0-b12, mixed mode)
Reporter: Steve Loughran
Priority: Minor

TestNMAuditLogger hanging repeatedly on a test VM
[jira] [Commented] (YARN-687) TestNMAuditLogger hang
[ https://issues.apache.org/jira/browse/YARN-687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973898#comment-13973898 ]

Steve Loughran commented on YARN-687:

It's been a long time, and I don't have that VM around. I'll close this as cannot-reproduce unless/until it surfaces again.
[jira] [Commented] (YARN-1946) need Public interface for WebAppUtils.getProxyHostAndPort
[ https://issues.apache.org/jira/browse/YARN-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973901#comment-13973901 ]

Steve Loughran commented on YARN-1946:

Thomas, we proxy the GUI, but not the REST API. The proxy doesn't forward any operation other than GET, and there's no guarantee clients will handle 307 redirects with the same HTTP verb anyway. If/when the proxy supports more operations, we can try with it. For now the filter just says: ws/* - no proxy; everything else - proxied; and ws/ doesn't serve the GUI.

need Public interface for WebAppUtils.getProxyHostAndPort
    Key: YARN-1946
    URL: https://issues.apache.org/jira/browse/YARN-1946
Project: Hadoop YARN
Issue Type: Sub-task
Components: api, webapp
Affects Versions: 2.4.0
Reporter: Thomas Graves
Priority: Critical

ApplicationMasters are supposed to go through the ResourceManager web app proxy if they have web UIs so they are properly secured. There is currently no public interface for ApplicationMasters to conveniently get the proxy host and port. There is a function in WebAppUtils, but that class is private. We should provide this as a utility since any properly written AM will need to do this.
[jira] [Commented] (YARN-1281) TestZKRMStateStoreZKClientConnections fails intermittently
[ https://issues.apache.org/jira/browse/YARN-1281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973973#comment-13973973 ]

Hudson commented on YARN-1281:

FAILURE: Integrated in Hadoop-Yarn-trunk #544 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/544/])
YARN-1281. Fixed TestZKRMStateStoreZKClientConnections to not fail intermittently due to ZK-client timeouts. Contributed by Tsuyoshi Ozawa. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1588369)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStoreZKClientConnections.java

TestZKRMStateStoreZKClientConnections fails intermittently
    Key: YARN-1281
    URL: https://issues.apache.org/jira/browse/YARN-1281
Project: Hadoop YARN
Issue Type: Test
Components: resourcemanager
Reporter: Karthik Kambatla
Assignee: Tsuyoshi OZAWA
Fix For: 2.4.1
Attachments: YARN-1281.1.patch, YARN-1281.2.patch, output.txt

The test fails intermittently - haven't been able to reproduce the failure deterministically.
[jira] [Commented] (YARN-1931) Private API change in YARN-1824 in 2.4 broke compatibility with previous releases
[ https://issues.apache.org/jira/browse/YARN-1931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973979#comment-13973979 ]

Hudson commented on YARN-1931:

FAILURE: Integrated in Hadoop-Yarn-trunk #544 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/544/])
YARN-1931. Private API change in YARN-1824 in 2.4 broke compatibility with previous releases (Sandy Ryza via tgraves) (tgraves: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1588281)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/Apps.java

Private API change in YARN-1824 in 2.4 broke compatibility with previous releases
    Key: YARN-1931
    URL: https://issues.apache.org/jira/browse/YARN-1931
Project: Hadoop YARN
Issue Type: Bug
Components: applications
Affects Versions: 2.4.0
Reporter: Thomas Graves
Assignee: Sandy Ryza
Priority: Blocker
Fix For: 3.0.0, 2.5.0, 2.4.1
Attachments: YARN-1931-1.patch, YARN-1931-2.patch, YARN-1931.patch

YARN-1824 broke compatibility with previous 2.x releases by changing the APIs in org.apache.hadoop.yarn.util.Apps.{setEnvFromInputString,addToEnvironment}. The old API should be added back. This affects any ApplicationMasters that were using this API. It also breaks previously built MapReduce libraries from working with the new YARN release, as MR uses this API.
[jira] [Commented] (YARN-1824) Make Windows client work with Linux/Unix cluster
[ https://issues.apache.org/jira/browse/YARN-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973974#comment-13973974 ]

Hudson commented on YARN-1824:

FAILURE: Integrated in Hadoop-Yarn-trunk #544 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/544/])
YARN-1931. Private API change in YARN-1824 in 2.4 broke compatibility with previous releases (Sandy Ryza via tgraves) (tgraves: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1588281)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/Apps.java

Make Windows client work with Linux/Unix cluster
    Key: YARN-1824
    URL: https://issues.apache.org/jira/browse/YARN-1824
Project: Hadoop YARN
Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Jian He
Assignee: Jian He
Fix For: 2.4.0
Attachments: YARN-1824.1.patch, YARN-1824.1.patch
[jira] [Commented] (YARN-1750) TestNodeStatusUpdater#testNMRegistration is incorrect in test case
[ https://issues.apache.org/jira/browse/YARN-1750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973978#comment-13973978 ]

Hudson commented on YARN-1750:

FAILURE: Integrated in Hadoop-Yarn-trunk #544 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/544/])
YARN-1750. TestNodeStatusUpdater#testNMRegistration is incorrect in test case. (Wangda Tan via junping_du) (junping_du: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1588343)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java

TestNodeStatusUpdater#testNMRegistration is incorrect in test case
    Key: YARN-1750
    URL: https://issues.apache.org/jira/browse/YARN-1750
Project: Hadoop YARN
Issue Type: Test
Components: nodemanager
Reporter: Ming Ma
Assignee: Wangda Tan
Fix For: 2.4.1
Attachments: YARN-1750.patch

This test case passes. However, the test output log has:

java.lang.AssertionError: Number of applications should only be one! expected:<1> but was:<2>
    at org.junit.Assert.fail(Assert.java:93)
    at org.junit.Assert.failNotEquals(Assert.java:647)
    at org.junit.Assert.assertEquals(Assert.java:128)
    at org.junit.Assert.assertEquals(Assert.java:472)
    at org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater$MyResourceTracker.nodeHeartbeat(TestNodeStatusUpdater.java:267)
    at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl$1.run(NodeStatusUpdaterImpl.java:469)
    at java.lang.Thread.run(Thread.java:695)

TestNodeStatusUpdater.java has invalid asserts:

{code}
} else if (heartBeatID == 3) {
  // Checks on the RM end
  Assert.assertEquals("Number of applications should only be one!", 1,
      appToContainers.size());
  Assert.assertEquals("Number of container for the app should be two!", 2,
      appToContainers.get(appId2).size());
{code}

We should fix the asserts and add more checks to the test.
[jira] [Commented] (YARN-1870) FileInputStream is not closed in ProcfsBasedProcessTree#constructProcessSMAPInfo()
[ https://issues.apache.org/jira/browse/YARN-1870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973977#comment-13973977 ]

Hudson commented on YARN-1870:

FAILURE: Integrated in Hadoop-Yarn-trunk #544 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/544/])
YARN-1870. FileInputStream is not closed in ProcfsBasedProcessTree#constructProcessSMAPInfo. (Fengdong Yu via junping_du) (junping_du: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1588324)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ProcfsBasedProcessTree.java

FileInputStream is not closed in ProcfsBasedProcessTree#constructProcessSMAPInfo()
    Key: YARN-1870
    URL: https://issues.apache.org/jira/browse/YARN-1870
Project: Hadoop YARN
Issue Type: Improvement
Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Ted Yu
Assignee: Fengdong Yu
Priority: Minor
Fix For: 2.5.0
Attachments: YARN-1870.patch

{code}
List<String> lines = IOUtils.readLines(new FileInputStream(file));
{code}

The FileInputStream is not closed.
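The general fix pattern for the leak above is to hold the stream in a local and close it in a finally block, so the descriptor is released even if reading throws. This is a JDK-only sketch of that pattern (the actual patch touches ProcfsBasedProcessTree and may differ); class and method names here are illustrative.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class ReadLinesClosing {

  // Reads all lines from a file, guaranteeing the FileInputStream is
  // closed on both the success and the failure path.
  static List<String> readLines(File file) throws IOException {
    FileInputStream in = new FileInputStream(file);
    try {
      BufferedReader reader =
          new BufferedReader(new InputStreamReader(in, "UTF-8"));
      List<String> lines = new ArrayList<String>();
      String line;
      while ((line = reader.readLine()) != null) {
        lines.add(line);
      }
      return lines;
    } finally {
      in.close(); // runs even if readLine() throws
    }
  }
}
```

On Java 7+ the same guarantee comes from try-with-resources, but Hadoop 2.x still targeted Java 6 at the time, hence the explicit finally.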
[jira] [Commented] (YARN-1947) TestRMDelegationTokens#testRMDTMasterKeyStateOnRollingMasterKey is failing intermittently
[ https://issues.apache.org/jira/browse/YARN-1947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973975#comment-13973975 ]

Hudson commented on YARN-1947:

FAILURE: Integrated in Hadoop-Yarn-trunk #544 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/544/])
YARN-1947. TestRMDelegationTokens#testRMDTMasterKeyStateOnRollingMasterKey is failing intermittently. (Jian He via junping_du) (junping_du: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1588365)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/MemoryRMStateStore.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestRMDelegationTokens.java

TestRMDelegationTokens#testRMDTMasterKeyStateOnRollingMasterKey is failing intermittently
    Key: YARN-1947
    URL: https://issues.apache.org/jira/browse/YARN-1947
Project: Hadoop YARN
Issue Type: Test
Reporter: Jian He
Assignee: Jian He
Fix For: 2.4.1
Attachments: YARN-1947.1.patch, YARN-1947.2.patch

java.lang.AssertionError: null
    at org.junit.Assert.fail(Assert.java:92)
    at org.junit.Assert.assertTrue(Assert.java:43)
    at org.junit.Assert.assertTrue(Assert.java:54)
    at org.apache.hadoop.yarn.server.resourcemanager.security.TestRMDelegationTokens.testRMDTMasterKeyStateOnRollingMasterKey(TestRMDelegationTokens.java:117)
[jira] [Created] (YARN-1960) LocalFSFileInputStream should support mark()
Daniel Darabos created YARN-1960:

Summary: LocalFSFileInputStream should support mark()
    Key: YARN-1960
    URL: https://issues.apache.org/jira/browse/YARN-1960
Project: Hadoop YARN
Issue Type: New Feature
Components: api
Reporter: Daniel Darabos
Priority: Minor

This is easily done by wrapping the FileInputStream in a BufferedInputStream. I wish for this feature because Apache Commons Compress's CompressorStreamFactory relies on it. There is benefit to being able to open local compressed files during testing. I'll send a patch for this if it's okay.
[jira] [Resolved] (YARN-1960) LocalFSFileInputStream should support mark()
[ https://issues.apache.org/jira/browse/YARN-1960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Darabos resolved YARN-1960.
    Resolution: Not a Problem

Duh, I should just use BufferedFSInputStream. Sorry.
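For context on why the wrapping approach works: BufferedInputStream supplies mark()/reset() on top of any stream that lacks it, which is exactly what CompressorStreamFactory needs to peek at magic bytes and then rewind. A minimal JDK-only sketch, with a ByteArrayInputStream standing in for the local file stream (the 0x1f 0x8b bytes are just a gzip-magic example):

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class MarkResetSketch {
  public static void main(String[] args) throws IOException {
    InputStream raw = new ByteArrayInputStream(new byte[] {0x1f, (byte) 0x8b, 1, 2});
    InputStream in = new BufferedInputStream(raw);

    in.mark(2);         // remember this position; allow a 2-byte peek
    int b0 = in.read(); // peek at the magic bytes,
    int b1 = in.read(); // e.g. to detect the compression format
    in.reset();         // rewind so a decompressor sees the full stream

    System.out.println(in.markSupported() + " " + (in.read() == b0));
  }
}
```

Hadoop's own BufferedFSInputStream (as noted in the resolution) plays the same role for FSInputStream implementations.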
[jira] [Reopened] (YARN-1943) Multitenant LinuxContainerExecutor is incompatible with Simple Security mode.
[ https://issues.apache.org/jira/browse/YARN-1943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

jay vyas reopened YARN-1943:

Reopening, based on the following use case:
1) Alice and Tom trust each other.
2) They run their jobs on the same cluster.
3) Neither would ever knowingly do anything to harm the other (i.e. impersonate a user and then write code in an M/R job to scrape ssh keys from the local fs).
4) But Tom is a novice developer and MIGHT do something funny, like accidentally overwrite files in /user/alice/ in some of his jobs, so SOME process isolation would be nice to have.
5) And also: Alice and Tom are using a posix-style HCFS where uid is important in order to do operations like chown.

So in the above scenario there really is not much need for kerberization: it's a simple and lightweight cluster with trusted users, but there is a lot of value in having some basic process isolation nevertheless, i.e. from linux containers.

SUGGESTION: Rather than add an extra parameter, we can just allow a wildcard in the nonsecure local-user parameter value:
{noformat}
<name>yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user</name>
<value>*</value>
{noformat}
so that whoever submits a job is the user the LCE will run under. Essentially, this provides administrators the option of disabling/enabling the feature added in YARN-1253.

Multitenant LinuxContainerExecutor is incompatible with Simple Security mode.
    Key: YARN-1943
    URL: https://issues.apache.org/jira/browse/YARN-1943
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Affects Versions: 2.3.0
Reporter: jay vyas
Priority: Critical
Labels: linux
Fix For: 2.3.0

As of hadoop 2.3.0, commit cc74a18c makes it so that nonsecureLocalUser replaces the user who submits a job if security is disabled:
{noformat}
return UserGroupInformation.isSecurityEnabled() ? user : nonsecureLocalUser;
{noformat}
However, the only way to enable security is to NOT use SIMPLE authentication mode:
{noformat}
public static boolean isSecurityEnabled() {
  return !isAuthenticationMethodEnabled(AuthenticationMethod.SIMPLE);
}
{noformat}
Thus, under SIMPLE login security, the framework ENFORCES the nonsecure local user for submission to LinuxContainerExecutor. This results in a confusing issue, wherein we submit a job as sally and then get an exception that user nobody is not whitelisted and has UID MAX_ID.

My proposed solution is that we should be able to leverage LinuxContainerExecutor regardless of hadoop's view of the security settings on the cluster, i.e. decouple the LinuxContainerExecutor logic from the isSecurityEnabled return value.
[jira] [Commented] (YARN-1824) Make Windows client work with Linux/Unix cluster
[ https://issues.apache.org/jira/browse/YARN-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13974071#comment-13974071 ]

Hudson commented on YARN-1824:

FAILURE: Integrated in Hadoop-Hdfs-trunk #1736 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1736/])
YARN-1931. Private API change in YARN-1824 in 2.4 broke compatibility with previous releases (Sandy Ryza via tgraves) (tgraves: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1588281)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/Apps.java
[jira] [Commented] (YARN-1947) TestRMDelegationTokens#testRMDTMasterKeyStateOnRollingMasterKey is failing intermittently
[ https://issues.apache.org/jira/browse/YARN-1947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13974072#comment-13974072 ]

Hudson commented on YARN-1947:

FAILURE: Integrated in Hadoop-Hdfs-trunk #1736 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1736/])
YARN-1947. TestRMDelegationTokens#testRMDTMasterKeyStateOnRollingMasterKey is failing intermittently. (Jian He via junping_du) (junping_du: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1588365)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/MemoryRMStateStore.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestRMDelegationTokens.java
[jira] [Commented] (YARN-1947) TestRMDelegationTokens#testRMDTMasterKeyStateOnRollingMasterKey is failing intermittently
[ https://issues.apache.org/jira/browse/YARN-1947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13974146#comment-13974146 ]

Hudson commented on YARN-1947:

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1761 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1761/])
YARN-1947. TestRMDelegationTokens#testRMDTMasterKeyStateOnRollingMasterKey is failing intermittently. (Jian He via junping_du) (junping_du: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1588365)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/MemoryRMStateStore.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestRMDelegationTokens.java
[jira] [Commented] (YARN-1281) TestZKRMStateStoreZKClientConnections fails intermittently
[ https://issues.apache.org/jira/browse/YARN-1281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13974144#comment-13974144 ] Hudson commented on YARN-1281: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1761 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1761/]) YARN-1281. Fixed TestZKRMStateStoreZKClientConnections to not fail intermittently due to ZK-client timeouts. Contributed by Tsuyoshi Ozawa. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1588369) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStoreZKClientConnections.java TestZKRMStateStoreZKClientConnections fails intermittently -- Key: YARN-1281 URL: https://issues.apache.org/jira/browse/YARN-1281 Project: Hadoop YARN Issue Type: Test Components: resourcemanager Reporter: Karthik Kambatla Assignee: Tsuyoshi OZAWA Fix For: 2.4.1 Attachments: YARN-1281.1.patch, YARN-1281.2.patch, output.txt The test fails intermittently - haven't been able to reproduce the failure deterministically. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1750) TestNodeStatusUpdater#testNMRegistration is incorrect in test case
[ https://issues.apache.org/jira/browse/YARN-1750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13974149#comment-13974149 ] Hudson commented on YARN-1750: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1761 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1761/]) YARN-1750. TestNodeStatusUpdater#testNMRegistration is incorrect in test case. (Wangda Tan via junping_du) (junping_du: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1588343) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java TestNodeStatusUpdater#testNMRegistration is incorrect in test case -- Key: YARN-1750 URL: https://issues.apache.org/jira/browse/YARN-1750 Project: Hadoop YARN Issue Type: Test Components: nodemanager Reporter: Ming Ma Assignee: Wangda Tan Fix For: 2.4.1 Attachments: YARN-1750.patch This test case passes. However, the test output log has java.lang.AssertionError: Number of applications should only be one! expected:1 but was:2 at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater$MyResourceTracker.nodeHeartbeat(TestNodeStatusUpdater.java:267) at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl$1.run(NodeStatusUpdaterImpl.java:469) at java.lang.Thread.run(Thread.java:695) TestNodeStatusUpdater.java has invalid asserts. 
{code}
} else if (heartBeatID == 3) {
  // Checks on the RM end
  Assert.assertEquals("Number of applications should only be one!", 1,
      appToContainers.size());
  Assert.assertEquals("Number of container for the app should be two!", 2,
      appToContainers.get(appId2).size());
{code}
We should fix the asserts and add more checks to the test. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1917) Add waitForApplicationState interface to YarnClient
[ https://issues.apache.org/jira/browse/YARN-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13974158#comment-13974158 ] Tsuyoshi OZAWA commented on YARN-1917: -- Compilation error is in ResourceMgrDelegate.java. [~leftnoteasy], can you check it? Add waitForApplicationState interface to YarnClient - Key: YARN-1917 URL: https://issues.apache.org/jira/browse/YARN-1917 Project: Hadoop YARN Issue Type: New Feature Components: client Affects Versions: 2.4.0 Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-1917.patch Currently, YARN doesn't have this method. Users need to write implementations like UnmanagedAMLauncher.monitorApplication or mapreduce.Job.monitorAndPrintJob on their own. This feature should be helpful to end users. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1917) Add waitForApplicationState interface to YarnClient
[ https://issues.apache.org/jira/browse/YARN-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13974163#comment-13974163 ] Wangda Tan commented on YARN-1917: -- Thanks [~ozawa] for the reminder :), I'll check it later. Add waitForApplicationState interface to YarnClient - Key: YARN-1917 URL: https://issues.apache.org/jira/browse/YARN-1917 Project: Hadoop YARN Issue Type: New Feature Components: client Affects Versions: 2.4.0 Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-1917.patch Currently, YARN doesn't have this method. Users need to write implementations like UnmanagedAMLauncher.monitorApplication or mapreduce.Job.monitorAndPrintJob on their own. This feature should be helpful to end users. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1798) TestContainerLaunch, TestContainersMonitor, TestNodeManagerShutdown, TestNodeStatusUpdater fails on Linux
[ https://issues.apache.org/jira/browse/YARN-1798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13974194#comment-13974194 ] Tsuyoshi OZAWA commented on YARN-1798: -- I could reproduce the failure of TestContainerLaunch: {quote} Running org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch Tests run: 10, Failures: 3, Errors: 0, Skipped: 0, Time elapsed: 57.103 sec FAILURE! - in org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch testDelayedKill(org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch) Time elapsed: 25.302 sec FAILURE! java.lang.AssertionError: ContainerState is not correct (timedout) expected:COMPLETE but was:RUNNING at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.apache.hadoop.yarn.server.nodemanager.containermanager.BaseContainerManagerTest.waitForContainerState(BaseContainerManagerTest.java:276) at org.apache.hadoop.yarn.server.nodemanager.containermanager.BaseContainerManagerTest.waitForContainerState(BaseContainerManagerTest.java:254) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.internalKillTest(TestContainerLaunch.java:704) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testDelayedKill(TestContainerLaunch.java:748) testImmediateKill(org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch) Time elapsed: 25.087 sec FAILURE! 
java.lang.AssertionError: ContainerState is not correct (timedout) expected:COMPLETE but was:RUNNING at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.apache.hadoop.yarn.server.nodemanager.containermanager.BaseContainerManagerTest.waitForContainerState(BaseContainerManagerTest.java:276) at org.apache.hadoop.yarn.server.nodemanager.containermanager.BaseContainerManagerTest.waitForContainerState(BaseContainerManagerTest.java:254) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.internalKillTest(TestContainerLaunch.java:704) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testImmediateKill(TestContainerLaunch.java:753) testContainerEnvVariables(org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch) Time elapsed: 5.058 sec FAILURE! java.lang.AssertionError: Process is not alive! at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.assertTrue(Assert.java:43) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testContainerEnvVariables(TestContainerLaunch.java:582) Running org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.TestContainersMonitor Tests run: 3, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 19.69 sec FAILURE! - in org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.TestContainersMonitor testContainerKillOnMemoryOverflow(org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.TestContainersMonitor) Time elapsed: 18.261 sec FAILURE! 
java.lang.AssertionError: expected:143 but was:0 at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.junit.Assert.assertEquals(Assert.java:456) at org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.TestContainersMonitor.testContainerKillOnMemoryOverflow(TestContainersMonitor.java:273) Running org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerShutdown Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 6.731 sec FAILURE! - in org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerShutdown testKillContainersOnShutdown(org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerShutdown) Time elapsed: 6.404 sec FAILURE! java.lang.AssertionError: Did not find sigterm message at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.assertTrue(Assert.java:43) at org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerShutdown.testKillContainersOnShutdown(TestNodeManagerShutdown.java:153) {quote} TestContainerLaunch, TestContainersMonitor, TestNodeManagerShutdown, TestNodeStatusUpdater fails on Linux - Key: YARN-1798 URL:
[jira] [Commented] (YARN-1798) TestContainerLaunch, TestContainersMonitor, TestNodeManagerShutdown, TestNodeStatusUpdater fails on Linux
[ https://issues.apache.org/jira/browse/YARN-1798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13974197#comment-13974197 ] Tsuyoshi OZAWA commented on YARN-1798: -- output log is as follows: {quote} org.apache.hadoop.yarn.exceptions.YarnException: Unable to get local resources when Container container_1397835546010_0001_01_01 is at null at org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:45) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:173) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testCallFailureWithNullLocalizedResources(TestContainerLaunch.java:779) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:30) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229) at 
org.junit.runners.ParentRunner.access$000(ParentRunner.java:50) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222) at org.junit.runners.ParentRunner.run(ParentRunner.java:300) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124) at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200) at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103) {quote} TestContainerLaunch, TestContainersMonitor, TestNodeManagerShutdown, TestNodeStatusUpdater fails on Linux - Key: YARN-1798 URL: https://issues.apache.org/jira/browse/YARN-1798 Project: Hadoop YARN Issue Type: Test Reporter: Tsuyoshi OZAWA -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1798) TestContainerLaunch, TestContainersMonitor, TestNodeManagerShutdown, TestNodeStatusUpdater fails on Linux
[ https://issues.apache.org/jira/browse/YARN-1798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-1798: - Attachment: TestContainerLaunch-output.txt Attached the output log from the test failure. TestContainerLaunch, TestContainersMonitor, TestNodeManagerShutdown, TestNodeStatusUpdater fails on Linux - Key: YARN-1798 URL: https://issues.apache.org/jira/browse/YARN-1798 Project: Hadoop YARN Issue Type: Test Reporter: Tsuyoshi OZAWA Attachments: TestContainerLaunch-output.txt -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1798) TestContainerLaunch, TestContainersMonitor, TestNodeManagerShutdown, TestNodeStatusUpdater fails on Linux
[ https://issues.apache.org/jira/browse/YARN-1798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13974207#comment-13974207 ] Tsuyoshi OZAWA commented on YARN-1798: -- Sorry, the output log I mentioned in the comment doesn't appear to be related to the test failure - that code path works correctly in the test. The failure looks like a plain assertion failure caused by a timing issue. TestContainerLaunch, TestContainersMonitor, TestNodeManagerShutdown, TestNodeStatusUpdater fails on Linux - Key: YARN-1798 URL: https://issues.apache.org/jira/browse/YARN-1798 Project: Hadoop YARN Issue Type: Test Reporter: Tsuyoshi OZAWA Attachments: TestContainerLaunch-output.txt -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1798) TestContainerLaunch, TestContainersMonitor, TestNodeManagerShutdown, TestNodeStatusUpdater fails on Linux
[ https://issues.apache.org/jira/browse/YARN-1798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-1798: - Attachment: TestContainerLaunch.txt TestContainerLaunch, TestContainersMonitor, TestNodeManagerShutdown, TestNodeStatusUpdater fails on Linux - Key: YARN-1798 URL: https://issues.apache.org/jira/browse/YARN-1798 Project: Hadoop YARN Issue Type: Test Reporter: Tsuyoshi OZAWA Attachments: TestContainerLaunch-output.txt, TestContainerLaunch.txt -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (YARN-1798) TestContainerLaunch, TestContainersMonitor, TestNodeManagerShutdown, TestNodeStatusUpdater fails on Linux
[ https://issues.apache.org/jira/browse/YARN-1798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA reassigned YARN-1798: Assignee: Tsuyoshi OZAWA TestContainerLaunch, TestContainersMonitor, TestNodeManagerShutdown, TestNodeStatusUpdater fails on Linux - Key: YARN-1798 URL: https://issues.apache.org/jira/browse/YARN-1798 Project: Hadoop YARN Issue Type: Test Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: TestContainerLaunch-output.txt, TestContainerLaunch.txt -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1798) TestContainerLaunch, TestContainersMonitor, TestNodeManagerShutdown, TestNodeStatusUpdater fails on Linux
[ https://issues.apache.org/jira/browse/YARN-1798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-1798: - Attachment: YARN-1798.1.patch The test failure in TestContainerLaunch is caused by the timeout of waitForContainerState(). This patch extends the timeout value. TestContainerLaunch, TestContainersMonitor, TestNodeManagerShutdown, TestNodeStatusUpdater fails on Linux - Key: YARN-1798 URL: https://issues.apache.org/jira/browse/YARN-1798 Project: Hadoop YARN Issue Type: Test Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: TestContainerLaunch-output.txt, TestContainerLaunch.txt, YARN-1798.1.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
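The patch above simply raises the timeout used by waitForContainerState(). A minimal sketch of that kind of polling helper, with an explicit deadline instead of a fixed iteration count, might look like this (the class and method names here are illustrative, not the actual BaseContainerManagerTest API):

```java
import java.util.function.BooleanSupplier;

public class WaitForSketch {
  // Polls `condition` every pollMillis until it holds or timeoutMillis
  // elapses. Returns true if the condition became true in time.
  public static boolean waitFor(BooleanSupplier condition,
      long pollMillis, long timeoutMillis) {
    long deadline = System.currentTimeMillis() + timeoutMillis;
    while (!condition.getAsBoolean()) {
      if (System.currentTimeMillis() >= deadline) {
        return false;  // timed out, caller decides whether to fail the test
      }
      try {
        Thread.sleep(pollMillis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();  // preserve interrupt status
        return false;
      }
    }
    return true;
  }
}
```

Making the deadline a parameter keeps the "extend the timeout" fix a one-line change at each call site rather than an edit to the loop itself.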
[jira] [Commented] (YARN-1932) Javascript injection on the job status page
[ https://issues.apache.org/jira/browse/YARN-1932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13974227#comment-13974227 ] Jason Lowe commented on YARN-1932: -- +1 lgtm. I'll commit this later today unless there are any objections. Javascript injection on the job status page --- Key: YARN-1932 URL: https://issues.apache.org/jira/browse/YARN-1932 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 0.23.9, 2.5.0 Reporter: Mit Desai Assignee: Mit Desai Priority: Blocker Attachments: YARN-1932.patch, YARN-1932.patch Scripts can be injected into the job status page as the diagnostics field is not sanitized. Whatever string you set there will show up to the jobs page as it is ... ie. if you put any script commands, they will be executed in the browser of the user who is opening the page. We need escaping the diagnostic string in order to not run the scripts. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1798) TestContainerLaunch, TestContainersMonitor, TestNodeManagerShutdown, TestNodeStatusUpdater fails on Linux
[ https://issues.apache.org/jira/browse/YARN-1798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13974255#comment-13974255 ] Tsuyoshi OZAWA commented on YARN-1798: -- Confirmed the attached patch is not enough. I'll look into it more deeply. TestContainerLaunch, TestContainersMonitor, TestNodeManagerShutdown, TestNodeStatusUpdater fails on Linux - Key: YARN-1798 URL: https://issues.apache.org/jira/browse/YARN-1798 Project: Hadoop YARN Issue Type: Test Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: TestContainerLaunch-output.txt, TestContainerLaunch.txt, YARN-1798.1.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1798) TestContainerLaunch, TestContainersMonitor, TestNodeManagerShutdown, TestNodeStatusUpdater fails on Linux
[ https://issues.apache.org/jira/browse/YARN-1798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13974264#comment-13974264 ] Hadoop QA commented on YARN-1798: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12640845/YARN-1798.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3596//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3596//console This message is automatically generated. TestContainerLaunch, TestContainersMonitor, TestNodeManagerShutdown, TestNodeStatusUpdater fails on Linux - Key: YARN-1798 URL: https://issues.apache.org/jira/browse/YARN-1798 Project: Hadoop YARN Issue Type: Test Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: TestContainerLaunch-output.txt, TestContainerLaunch.txt, YARN-1798.1.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1940) deleteAsUser() terminates early without deleting more files on error
[ https://issues.apache.org/jira/browse/YARN-1940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13974336#comment-13974336 ] Jason Lowe commented on YARN-1940: -- +1 lgtm, committing this deleteAsUser() terminates early without deleting more files on error Key: YARN-1940 URL: https://issues.apache.org/jira/browse/YARN-1940 Project: Hadoop YARN Issue Type: Bug Reporter: Kihwal Lee Assignee: Rushabh S Shah Attachments: YARN-1940-v2.patch, YARN-1940.patch In container-executor.c, delete_path() returns early when unlink() against a file or a symlink fails. We have seen many cases of the error being ENOENT, which can safely be ignored during delete. This is what we saw recently: an app mistakenly created a large number of files in the local directory and the deletion service failed to delete a significant portion of them due to this bug. Repeatedly hitting this on the same node led to exhaustion of inodes in one of the partitions. Besides ignoring ENOENT, delete_path() can simply skip the failed one and continue in some cases, rather than aborting and leaving files behind. -- This message was sent by Atlassian JIRA (v6.2#6252)
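The actual fix lives in C (container-executor.c's delete_path()), but the idea generalizes: treat "file already gone" (the ENOENT case) as success, and on any other per-entry error skip it and keep going instead of aborting the whole traversal. An analogous sketch in Java, purely for illustration:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class DeleteSketch {
  // Deletes `root` recursively; returns the number of entries that could not
  // be deleted for reasons other than already being absent.
  public static int deleteBestEffort(Path root) {
    List<Path> entries;
    try (Stream<Path> walk = Files.walk(root)) {
      // reverse order so children are deleted before their parent directories
      entries = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
    } catch (IOException e) {
      return 0;  // root is already gone: the ENOENT case, nothing to do
    }
    int failures = 0;
    for (Path p : entries) {
      try {
        Files.deleteIfExists(p);  // an absent entry counts as done, not an error
      } catch (IOException e) {
        failures++;  // skip and continue, as the patched delete_path() does
      }
    }
    return failures;
  }
}
```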
[jira] [Created] (YARN-1961) Fair scheduler preemption doesn't work for non-leaf queues
Ashwin Shankar created YARN-1961: Summary: Fair scheduler preemption doesn't work for non-leaf queues Key: YARN-1961 URL: https://issues.apache.org/jira/browse/YARN-1961 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.4.0 Reporter: Ashwin Shankar Setting minResources and minSharePreemptionTimeout on a non-leaf queue doesn't cause preemption to happen when that non-leaf queue is below minResources and there are outstanding demands in that non-leaf queue. Here is an example fair scheduler allocation config (partial):
{code:xml}
<queue name="abc">
  <minResources>3072 mb,0 vcores</minResources>
  <minSharePreemptionTimeout>30</minSharePreemptionTimeout>
  <queue name="childabc1">
  </queue>
  <queue name="childabc2">
  </queue>
</queue>
{code}
With the above config, preemption doesn't seem to happen if queue abc is below its minShare and it has outstanding unsatisfied demands from apps in its child queues. Ideally in such cases we would like preemption to kick in and reclaim resources from other queues (not under queue abc). Looking at the code, it seems like preemption checks for starvation only at the leaf queue level and not at the parent level.
{code:title=FairScheduler.java|borderStyle=solid}
boolean isStarvedForMinShare(FSLeafQueue sched)
boolean isStarvedForFairShare(FSLeafQueue sched)
{code}
This affects our use case where we have a parent queue with probably a 100 unconfigured leaf queues under it. We want to give a minShare to the parent queue to protect all the leaf queues under it, but we cannot do it due to this bug. -- This message was sent by Atlassian JIRA (v6.2#6252)
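The fix the report asks for is to run the starvation check over the whole queue hierarchy rather than only the leaves. A self-contained sketch of that idea, with a hypothetical simplified queue model (not the actual FSQueue/FairScheduler API):

```java
import java.util.ArrayList;
import java.util.List;

public class StarvationSketch {
  static class FSQueue {
    final String name;
    final long minShare;  // configured minResources, in MB
    final long usage;     // current usage, in MB
    final long demand;    // outstanding demand, in MB
    final List<FSQueue> children = new ArrayList<>();
    FSQueue(String name, long minShare, long usage, long demand) {
      this.name = name; this.minShare = minShare;
      this.usage = usage; this.demand = demand;
    }
  }

  // A queue is starved for min share if its usage is below both its
  // configured min share and its demand.
  static boolean isStarvedForMinShare(FSQueue q) {
    long desired = Math.min(q.minShare, q.demand);
    return q.usage < desired;
  }

  // Walk the whole hierarchy, not just the leaves, so a parent queue like
  // root.abc above can itself be recognized as starved.
  static boolean anyStarved(FSQueue root) {
    if (isStarvedForMinShare(root)) {
      return true;
    }
    for (FSQueue child : root.children) {
      if (anyStarved(child)) {
        return true;
      }
    }
    return false;
  }
}
```

With minResources of 3072 MB, usage of 1024 MB, and 4096 MB of pending demand, queue abc would be flagged as starved under this check even though its children carry the demand.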
[jira] [Commented] (YARN-1940) deleteAsUser() terminates early without deleting more files on error
[ https://issues.apache.org/jira/browse/YARN-1940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13974423#comment-13974423 ] Hudson commented on YARN-1940: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5537 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5537/]) YARN-1940. deleteAsUser() terminates early without deleting more files on error. Contributed by Rushabh S Shah (jlowe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1588546) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c deleteAsUser() terminates early without deleting more files on error Key: YARN-1940 URL: https://issues.apache.org/jira/browse/YARN-1940 Project: Hadoop YARN Issue Type: Bug Reporter: Kihwal Lee Assignee: Rushabh S Shah Fix For: 3.0.0, 2.5.0 Attachments: YARN-1940-v2.patch, YARN-1940.patch In container-executor.c, delete_path() returns early when unlink() against a file or a symlink fails. We have seen many cases of the error being ENOENT, which can safely be ignored during delete. This is what we saw recently: An app mistakenly created a large number of files in the local directory and the deletion service failed to delete a significant portion of them due to this bug. Repeatedly hitting this on the same node led to exhaustion of inodes in one of the partitions. Beside ignoring ENOENT, delete_path() can simply skip the failed one and continue in some cases, rather than aborting and leaving files behind. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-766) TestNodeManagerShutdown should use Shell to form the output path
[ https://issues.apache.org/jira/browse/YARN-766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13974450#comment-13974450 ] Siddharth Seth commented on YARN-766: - [~djp], The 2.x patch is only required to fix a difference in formatting between trunk and branch-2. Up to you on whether to fix the trunk formatting in this jira or whenever the code is touched next. TestNodeManagerShutdown should use Shell to form the output path Key: YARN-766 URL: https://issues.apache.org/jira/browse/YARN-766 Project: Hadoop YARN Issue Type: Test Affects Versions: 2.1.0-beta Reporter: Siddharth Seth Assignee: Siddharth Seth Priority: Minor Attachments: YARN-766.branch-2.txt, YARN-766.trunk.txt, YARN-766.txt File scriptFile = new File(tmpDir, scriptFile.sh); should be replaced with File scriptFile = Shell.appendScriptExtension(tmpDir, scriptFile); to match trunk. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1959) Fix headroom calculation in Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13974456#comment-13974456 ] Sandy Ryza commented on YARN-1959: -- One thing I don't understand from reading the Capacity Scheduler headroom calculation is how it prevents apps from starving when a max capacity isn't set. It's defined as min(userLimit, queue-max-cap) - consumed. If no max capacities are set and two users are running in a queue, each taking up half the queue's capacity, the headroom for each user will be half the queue's capacity. If the cluster is saturated to the extent that the queue's usage can't go above its capacity, the headroom is being vastly overreported. [~jlowe], any insight on this? Fix headroom calculation in Fair Scheduler -- Key: YARN-1959 URL: https://issues.apache.org/jira/browse/YARN-1959 Project: Hadoop YARN Issue Type: Bug Reporter: Sandy Ryza The Fair Scheduler currently always sets the headroom to 0. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1959) Fix headroom calculation in Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13974467#comment-13974467 ] Jason Lowe commented on YARN-1959: -- Yes, over-reporting of the headroom in the CapacityScheduler is a known issue. See YARN-1857. I think the calculation for the CapacityScheduler should be more like min((userLimit-userConsumed), (queueMax-queueConsumed)). The idea being that one can't go over the user limit but also can't go over what the queue has free either. Fix headroom calculation in Fair Scheduler -- Key: YARN-1959 URL: https://issues.apache.org/jira/browse/YARN-1959 Project: Hadoop YARN Issue Type: Bug Reporter: Sandy Ryza The Fair Scheduler currently always sets the headroom to 0. -- This message was sent by Atlassian JIRA (v6.2#6252)
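The formula Jason proposes, headroom = min(userLimit - userConsumed, queueMax - queueConsumed), can be written as a small sketch; this is illustrative pseudocode for the discussion, not the actual CapacityScheduler implementation:

```java
public class HeadroomSketch {
  // All quantities in the same unit (e.g. MB). Headroom is bounded by both
  // what the user may still take and what the queue has left, and is never
  // reported as negative.
  public static long headroom(long userLimit, long userConsumed,
      long queueMax, long queueConsumed) {
    long byUser = userLimit - userConsumed;
    long byQueue = queueMax - queueConsumed;
    return Math.max(0, Math.min(byUser, byQueue));
  }
}
```

For example, with a user limit of 50 of which 40 is consumed, and a queue max of 100 of which 95 is consumed, this reports a headroom of 5 rather than 10, because the queue bound is the tighter one.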
[jira] [Commented] (YARN-1864) Fair Scheduler Dynamic Hierarchical User Queues
[ https://issues.apache.org/jira/browse/YARN-1864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13974497#comment-13974497 ] Sandy Ryza commented on YARN-1864: -- Thanks for working on this Ashwin. I like the approach.
{code}
+if ("hierarchicalUserQueue".equals(element.getAttribute("name"))) {
+  throw new AllocationConfigurationException(
+      "hierarchicalUserQueue cannot be a nested rule");
+}
{code}
Any reason why it can't?
{code}
+// Verify if the queue returned by the nested rule is an existing leaf queue,
+// if yes then skip to next rule in the queue placement policy
+if (queueMgr.exists(queueName)
+    && (queueMgr.getQueue(queueName) instanceof FSLeafQueue)) {
+  return;
+}
{code}
The QueuePlacementPolicy isn't responsible for verifying this. We expect to hit an error later on if the queue returned by the QueuePlacementPolicy isn't a leaf queue. This can happen with other rules, for example with the specified rule if someone tries to explicitly submit to a parent queue. Given this, the hierarchical queue rule should be terminal if its nested rule is terminal, right?
{code}
+if (create
+    || (!isNestedRule
+        && configuredQueues.get(FSQueueType.LEAF).contains(queue))
+    || (isNestedRule
+        && configuredQueues.get(FSQueueType.PARENT).contains(queue))) {
{code}
Along similar lines to the previous comment, I don't think we need this logic, or isNestedQueue and FSQueueType, in the queue placement policy. If the rule places an app in root.engineering.ashwin and root.engineering happens to be a leaf or root.ashwin happens to be a parent, we'll end up throwing an error later, which is about the best we can do. Let me know if there are cases I'm missing or if I'm not thinking this through deeply enough.
{code}
+ * Places all apps in the specified default queue.
+ * If no default queue is specified or if the specified default queue
+ * doesn't exist, the app is placed in the root.default queue
{code}
This has gotten to be a fairly big patch - mind making these changes to the default rule in a separate JIRA? The name HierarchicalUserQueue seems kind of weird to me, as the meaning of hierarchical in it is a little vague or maybe already overloaded. Maybe UserQueueUnderneath, UserQueueInside, or UserQueueBelow? Any other ideas? I've got a few stylistic comments in addition to these, but wanted to get this stuff worked out first. Fair Scheduler Dynamic Hierarchical User Queues --- Key: YARN-1864 URL: https://issues.apache.org/jira/browse/YARN-1864 Project: Hadoop YARN Issue Type: New Feature Components: scheduler Reporter: Ashwin Shankar Labels: scheduler Attachments: YARN-1864-v1.txt, YARN-1864-v2.txt In Fair Scheduler, we want to be able to create user queues under any parent queue in the hierarchy. For eg. say user1 submits a job to a parent queue called root.allUserQueues; we want to be able to create a new queue called root.allUserQueues.user1 and run user1's job in it. Any further jobs submitted by this user to root.allUserQueues will be run in this newly created root.allUserQueues.user1. This is very similar to the 'user-as-default' feature in Fair Scheduler which creates user queues under the root queue. But we want the ability to create user queues under ANY parent queue. Why do we want this? 1. Preemption: these dynamically created user queues can preempt each other if their fair share is not met. So there is fairness among users. User queues can also preempt other non-user leaf queues as well if below fair share. 2. Allocation to user queues: we want all the user queries (adhoc) to consume only a fraction of resources in the shared cluster. By creating this feature, we could do that by giving a fair share to the parent user queue which is then redistributed to all the dynamically created user queues.
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1932) Javascript injection on the job status page
[ https://issues.apache.org/jira/browse/YARN-1932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13974528#comment-13974528 ] Hudson commented on YARN-1932: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5539 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5539/]) YARN-1932. Javascript injection on the job status page. Contributed by Mit Desai (jlowe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1588572) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/view/InfoBlock.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/webapp/view/TestInfoBlock.java Javascript injection on the job status page --- Key: YARN-1932 URL: https://issues.apache.org/jira/browse/YARN-1932 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 0.23.9, 2.5.0 Reporter: Mit Desai Assignee: Mit Desai Priority: Blocker Attachments: YARN-1932.patch, YARN-1932.patch Scripts can be injected into the job status page because the diagnostics field is not sanitized. Whatever string you set there shows up on the jobs page as-is, i.e. if you put in any script commands, they will be executed in the browser of the user who opens the page. We need to escape the diagnostics string so that the scripts are not run. -- This message was sent by Atlassian JIRA (v6.2#6252)
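The underlying fix is standard HTML escaping of the diagnostics string before it is rendered. A minimal sketch of the technique (the real change lives in InfoBlock; this helper is illustrative, not the patch's code):

```java
// Escape the HTML metacharacters so an injected <script> tag is
// displayed as text instead of being executed by the browser.
public class Escaper {
    static String escapeHtml(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }
}
```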
[jira] [Commented] (YARN-1959) Fix headroom calculation in Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13974541#comment-13974541 ] Sandy Ryza commented on YARN-1959: -- Ah, ok, thanks Jason. With your formula, assuming no user limits, what happens if queueMax is 100%? All queue maxes are 100% by default in the Capacity Scheduler, right? If there are two queues both with max 100%, and both using 50% of resources, they would both end up with 50% headroom. Fix headroom calculation in Fair Scheduler -- Key: YARN-1959 URL: https://issues.apache.org/jira/browse/YARN-1959 Project: Hadoop YARN Issue Type: Bug Reporter: Sandy Ryza The Fair Scheduler currently always sets the headroom to 0. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1959) Fix headroom calculation in Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13974550#comment-13974550 ] Jason Lowe commented on YARN-1959: -- Good point, it would also need to min against the available cluster resources to cover the case of cross-queue contention. Fix headroom calculation in Fair Scheduler -- Key: YARN-1959 URL: https://issues.apache.org/jira/browse/YARN-1959 Project: Hadoop YARN Issue Type: Bug Reporter: Sandy Ryza The Fair Scheduler currently always sets the headroom to 0. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (YARN-1959) Fix headroom calculation in Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza reassigned YARN-1959: Assignee: Sandy Ryza Fix headroom calculation in Fair Scheduler -- Key: YARN-1959 URL: https://issues.apache.org/jira/browse/YARN-1959 Project: Hadoop YARN Issue Type: Bug Reporter: Sandy Ryza Assignee: Sandy Ryza The Fair Scheduler currently always sets the headroom to 0. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1959) Fix headroom calculation in Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13974557#comment-13974557 ] Sandy Ryza commented on YARN-1959: -- Cool, wanted to make sure I understood how it worked. In that case, I think the best choice for the Fair Scheduler would probably be min(cluster capacity - cluster consumed, queue max share - queue consumed). Fix headroom calculation in Fair Scheduler -- Key: YARN-1959 URL: https://issues.apache.org/jira/browse/YARN-1959 Project: Hadoop YARN Issue Type: Bug Reporter: Sandy Ryza Assignee: Sandy Ryza The Fair Scheduler currently always sets the headroom to 0. -- This message was sent by Atlassian JIRA (v6.2#6252)
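The formula proposed above can be sketched as follows. This is a minimal illustration that collapses Resource arithmetic to a single integer; the real scheduler works with memory/vcore Resource objects, so the method below is a stand-in, not the eventual patch:

```java
// headroom = min(cluster capacity - cluster consumed,
//                queue max share  - queue consumed)
public class Headroom {
    static int headroom(int clusterCapacity, int clusterConsumed,
                        int queueMaxShare, int queueConsumed) {
        int clusterAvailable = clusterCapacity - clusterConsumed;
        int queueAvailable = queueMaxShare - queueConsumed;
        // Never report negative headroom.
        return Math.max(0, Math.min(clusterAvailable, queueAvailable));
    }
}
```

In the earlier example of two queues each with max 100% and each consuming 50%, the cluster is fully consumed, so the cluster-availability term drives both headrooms to 0 rather than over-promising 50% to each.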
[jira] [Updated] (YARN-1865) ShellScriptBuilder does not check for some error conditions
[ https://issues.apache.org/jira/browse/YARN-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Mitic updated YARN-1865: - Attachment: YARN-1865.4.patch Addressing my latest comment to add a test case for the non-zero exit code. Will commit this patch assuming it receives +1 from Jenkins. ShellScriptBuilder does not check for some error conditions --- Key: YARN-1865 URL: https://issues.apache.org/jira/browse/YARN-1865 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 3.0.0, 2.2.0, 2.3.0 Reporter: Remus Rusanu Assignee: Remus Rusanu Priority: Minor Attachments: YARN-1865.1.patch, YARN-1865.2.patch, YARN-1865.3.patch, YARN-1865.4.patch The WindowsShellScriptBuilder does not check for commands exceeding the Windows maximum shell command line length (8191 chars). Neither the Unix nor the Windows script builder checks for error conditions after mkdir or link. WindowsShellScriptBuilder mkdir is not safe with regard to paths containing spaces. -- This message was sent by Atlassian JIRA (v6.2#6252)
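The missing checks described above could look roughly like this. All identifier names here are illustrative, not the patch's actual ones, and the real builder would throw rather than return a boolean:

```java
// Hypothetical sketch of the validations YARN-1865 adds.
public class ScriptChecks {
    // Windows cmd.exe maximum command line length (8191 chars).
    static final int MAX_WINDOWS_CMD_LEN = 8191;

    static boolean exceedsMaxLength(String command) {
        return command.length() > MAX_WINDOWS_CMD_LEN;
    }

    // Quote a path so a generated mkdir line tolerates embedded spaces.
    static String quotePath(String path) {
        return "\"" + path + "\"";
    }

    // Emit mkdir followed by an exit-code check so a failure aborts
    // the generated batch script instead of being silently ignored.
    static String mkdirWithCheck(String path) {
        return "mkdir " + quotePath(path)
            + "\nif %errorlevel% neq 0 exit /b %errorlevel%";
    }
}
```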
[jira] [Updated] (YARN-1917) Add waitForApplicationState interface to YarnClient
[ https://issues.apache.org/jira/browse/YARN-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-1917: - Attachment: YARN-1917.patch Uploaded a patch to resolve the build failure. Add waitForApplicationState interface to YarnClient - Key: YARN-1917 URL: https://issues.apache.org/jira/browse/YARN-1917 Project: Hadoop YARN Issue Type: New Feature Components: client Affects Versions: 2.4.0 Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-1917.patch, YARN-1917.patch Currently, YARN doesn't have this method. Users need to write implementations like UnmanagedAMLauncher.monitorApplication or mapreduce.Job.monitorAndPrintJob on their own. This feature should be helpful to end users. -- This message was sent by Atlassian JIRA (v6.2#6252)
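The kind of loop users currently hand-write, and that a waitForApplicationState method would encapsulate, might look like this. The AppState enum and the state supplier are stand-ins for polling YarnClient#getApplicationReport, so this is a sketch of the idea rather than the patch's API:

```java
import java.util.EnumSet;
import java.util.function.Supplier;

// Poll an application-state source until it reaches one of the desired
// states or a timeout expires; returns the matched state, or null on
// timeout. In real code the supplier would fetch the state from an
// ApplicationReport.
public class WaitFor {
    enum AppState { NEW, RUNNING, FINISHED, FAILED, KILLED }

    static AppState waitForState(Supplier<AppState> poll,
                                 EnumSet<AppState> desired,
                                 long timeoutMs, long intervalMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (true) {
            AppState s = poll.get();
            if (desired.contains(s)) {
                return s;
            }
            if (System.currentTimeMillis() >= deadline) {
                return null; // timed out
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return null;
            }
        }
    }
}
```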
[jira] [Commented] (YARN-1865) ShellScriptBuilder does not check for some error conditions
[ https://issues.apache.org/jira/browse/YARN-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13974661#comment-13974661 ] Hadoop QA commented on YARN-1865: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12640911/YARN-1865.4.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3597//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3597//console This message is automatically generated. 
ShellScriptBuilder does not check for some error conditions --- Key: YARN-1865 URL: https://issues.apache.org/jira/browse/YARN-1865 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 3.0.0, 2.2.0, 2.3.0 Reporter: Remus Rusanu Assignee: Remus Rusanu Priority: Minor Attachments: YARN-1865.1.patch, YARN-1865.2.patch, YARN-1865.3.patch, YARN-1865.4.patch The WindowsShellScriptBuilder does not check for commands exceeding the Windows maximum shell command line length (8191 chars). Neither the Unix nor the Windows script builder checks for error conditions after mkdir or link. WindowsShellScriptBuilder mkdir is not safe with regard to paths containing spaces. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (YARN-1959) Fix headroom calculation in Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot reassigned YARN-1959: --- Assignee: Anubhav Dhoot (was: Sandy Ryza) Fix headroom calculation in Fair Scheduler -- Key: YARN-1959 URL: https://issues.apache.org/jira/browse/YARN-1959 Project: Hadoop YARN Issue Type: Bug Reporter: Sandy Ryza Assignee: Anubhav Dhoot The Fair Scheduler currently always sets the headroom to 0. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-556) RM Restart phase 2 - Work preserving restart
[ https://issues.apache.org/jira/browse/YARN-556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Dhoot updated YARN-556: --- Attachment: WorkPreservingRestartPrototype.001.patch This prototype is a way to understand the overall design, the major issues that need to be addressed, and the minor details that crop up. It is not a substitute for actual code and unit tests for each sub-task. Hopefully it will help drive a discussion on the overall approach and on each sub-task. In this prototype, the following changes are demonstrated:
1. Containers that were running when the RM restarted continue running.
2. The NM on resync sends the list of running containers as ContainerReports so they provide container capability (sizes).
3. The AM on resync reregisters instead of shutting down. The AM can make further requests after RM restart and they are accepted.
4. A sample of the scheduler changes in FairScheduler. It reregisters the application attempt on recovery. On NM addNode it adds the containers to that application attempt and charges them correctly to the attempt for tracking usage.
5. Applications and containers resume their lifecycle with additional transitions to support continuation after recovery.
6. A cluster timestamp is added to the containerId so that containerIds minted after an RM restart do not clash with those from before (the containerId counter resets to zero in memory).
7. The changes are controlled by a flag.
Topics not addressed:
1. Key and token changes.
2. The AM does not yet resend requests sent before the restart. So if the RM restarts after the AM has made its request and before the RM returns a container, the AM is left waiting for allocation; only new asks made after the RM restart work.
3. Completed container status as per the design is not handled yet.
Readme for running through the prototype:
a) Set up with RM recovery turned on and the scheduler set to FairScheduler.
b) Start a sleep job with a map and a reduce, such as bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.0.0-SNAPSHOT.jar sleep -mt 12000 -rt 12000
c) Restart the RM (yarn-daemon.sh stop/start resourcemanager) and see that containers are not restarted.
The following two scenarios work:
1. Restart the RM while the reduce is running: the reduce continues and then the application completes successfully. Demonstrates continuation of running containers without restart.
2. Restart the RM while the map is running: the map continues, then the reduce executes, and then the application completes successfully. Demonstrates that requesting more resources after restart works, in addition to the previous scenario.
RM Restart phase 2 - Work preserving restart Key: YARN-556 URL: https://issues.apache.org/jira/browse/YARN-556 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager Reporter: Bikas Saha Assignee: Bikas Saha Attachments: Work Preserving RM Restart.pdf, WorkPreservingRestartPrototype.001.patch YARN-128 covered storing the state needed for the RM to recover critical information. This umbrella jira will track changes needed to recover the running state of the cluster so that work can be preserved across RM restarts. -- This message was sent by Atlassian JIRA (v6.2#6252)
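Item 6 of the prototype notes (the cluster timestamp embedded in the containerId) can be illustrated with a small sketch; the field layout below is illustrative, not the prototype's code:

```java
// Because the cluster (RM start) timestamp is part of the container id,
// ids minted after a restart cannot collide with pre-restart ids even
// though the in-memory counter resets to zero.
public class ContainerIds {
    static String containerId(long clusterTimestamp, int appId,
                              int attemptId, long containerNum) {
        return String.format("container_%d_%04d_%02d_%06d",
            clusterTimestamp, appId, attemptId, containerNum);
    }
}
```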
[jira] [Commented] (YARN-1917) Add waitForApplicationState interface to YarnClient
[ https://issues.apache.org/jira/browse/YARN-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13974678#comment-13974678 ] Hadoop QA commented on YARN-1917: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12640914/YARN-1917.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3598//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3598//console This message is automatically generated. Add waitForApplicationState interface to YarnClient - Key: YARN-1917 URL: https://issues.apache.org/jira/browse/YARN-1917 Project: Hadoop YARN Issue Type: New Feature Components: client Affects Versions: 2.4.0 Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-1917.patch, YARN-1917.patch Currently, YARN doesn't have this method. Users need to write implementations like UnmanagedAMLauncher.monitorApplication or mapreduce.Job.monitorAndPrintJob on their own. 
This feature should be helpful to end users. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-1962) Timeline server is enabled by default
Mohammad Kamrul Islam created YARN-1962: --- Summary: Timeline server is enabled by default Key: YARN-1962 URL: https://issues.apache.org/jira/browse/YARN-1962 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Mohammad Kamrul Islam Assignee: Mohammad Kamrul Islam Since the Timeline server is not yet mature or secured, enabling it by default might create some confusion. We were playing with 2.4.0 and found a lot of exceptions in the distributed shell example related to connection-refused errors. Btw, we didn't run the Timeline server because it is not secured yet. Although it is possible to explicitly turn it off through the yarn-site config, in my opinion this extra change for this new service is not worthwhile at this point. This JIRA is to turn it off by default. If there is an agreement, I can put up a simple patch for this.
{noformat}
14/04/17 23:24:33 ERROR impl.TimelineClientImpl: Failed to get the response from the timeline server.
com.sun.jersey.api.client.ClientHandlerException: java.net.ConnectException: Connection refused
	at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:149)
	at com.sun.jersey.api.client.Client.handle(Client.java:648)
	at com.sun.jersey.api.client.WebResource.handle(WebResource.java:670)
	at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
	at com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:563)
	at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.doPostingEntities(TimelineClientImpl.java:131)
	at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:104)
	at org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.publishApplicationAttemptEvent(ApplicationMaster.java:1072)
	at org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.run(ApplicationMaster.java:515)
	at org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.main(ApplicationMaster.java:281)
Caused by: java.net.ConnectException: Connection refused
	at java.net.PlainSocketImpl.socketConnect(Native Method)
	at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
	at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:198)
	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
	at java.net.Socket.connect(Socket.java:579)
	at java.net.Socket.connect(Socket.java:528)
	at sun.net.NetworkClient.doConnect(NetworkClient.java:180)
	at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
	at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
	at sun.net.www.http.HttpClient.in
{noformat}
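Until the default changes, the service can be disabled explicitly in yarn-site.xml; this sketch assumes the standard yarn.timeline-service.enabled property:

```xml
<!-- yarn-site.xml: turn the timeline service off explicitly -->
<property>
  <name>yarn.timeline-service.enabled</name>
  <value>false</value>
</property>
```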
[jira] [Commented] (YARN-1929) DeadLock in RM when automatic failover is enabled.
[ https://issues.apache.org/jira/browse/YARN-1929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13974690#comment-13974690 ] Karthik Kambatla commented on YARN-1929: Ping... DeadLock in RM when automatic failover is enabled. -- Key: YARN-1929 URL: https://issues.apache.org/jira/browse/YARN-1929 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Environment: Yarn HA cluster Reporter: Rohith Assignee: Karthik Kambatla Priority: Blocker Attachments: yarn-1929-1.patch, yarn-1929-2.patch Dead lock detected in RM when automatic failover is enabled.
{noformat}
Found one Java-level deadlock:
==============================
Thread-2:
  waiting to lock monitor 0x7fb514303cf0 (object 0xef153fd0, a org.apache.hadoop.ha.ActiveStandbyElector),
  which is held by main-EventThread
main-EventThread:
  waiting to lock monitor 0x7fb514750a48 (object 0xef154020, a org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService),
  which is held by Thread-2
{noformat}
-- This message was sent by Atlassian JIRA (v6.2#6252)
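The reported cycle is a classic lock-ordering deadlock: each thread holds one of the two monitors while waiting for the other. A minimal sketch of the usual remedy, taking both monitors in a single global order, follows; the classes here are stand-ins for the RM's, not the actual fix in the attached patches:

```java
// Thread-2 held EmbeddedElectorService and waited for ActiveStandbyElector,
// while main-EventThread held them in the opposite order. Acquiring both
// in one fixed order (elector first) makes the cycle impossible.
public class LockOrdering {
    static final Object elector = new Object();   // ActiveStandbyElector stand-in
    static final Object embedded = new Object();  // EmbeddedElectorService stand-in

    static void transition() {
        synchronized (elector) {      // always elector first
            synchronized (embedded) {
                // state transition work would happen here
            }
        }
    }

    // Run two concurrent transitions; returns true if both complete
    // (i.e. no deadlock) within the join timeout.
    static boolean runBoth() {
        Thread a = new Thread(LockOrdering::transition);
        Thread b = new Thread(LockOrdering::transition);
        a.start();
        b.start();
        try {
            a.join(2000);
            b.join(2000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !a.isAlive() && !b.isAlive();
    }
}
```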
[jira] [Commented] (YARN-556) RM Restart phase 2 - Work preserving restart
[ https://issues.apache.org/jira/browse/YARN-556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13974724#comment-13974724 ] Tsuyoshi OZAWA commented on YARN-556: - Anubhav, Thank you for sharing the prototype. I will try it this weekend. RM Restart phase 2 - Work preserving restart Key: YARN-556 URL: https://issues.apache.org/jira/browse/YARN-556 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager Reporter: Bikas Saha Assignee: Bikas Saha Attachments: Work Preserving RM Restart.pdf, WorkPreservingRestartPrototype.001.patch YARN-128 covered storing the state needed for the RM to recover critical information. This umbrella jira will track changes needed to recover the running state of the cluster so that work can be preserved across RM restarts. -- This message was sent by Atlassian JIRA (v6.2#6252)