[jira] [Commented] (YARN-2138) Cleanup notifyDone* methods in RMStateStore
[ https://issues.apache.org/jira/browse/YARN-2138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092524#comment-14092524 ] Varun Saxena commented on YARN-2138: Kindly refer to this link: http://stackoverflow.com/questions/7268732/get-an-applicable-patch-after-doing-svn-remove-and-svn-rename

Cleanup notifyDone* methods in RMStateStore
Key: YARN-2138 URL: https://issues.apache.org/jira/browse/YARN-2138 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Varun Saxena Attachments: YARN-2138.002.patch, YARN-2138.003.patch, YARN-2138.004.patch, YARN-2138.patch
The storedException passed into notifyDoneStoringApplication is always null, and similarly for the other notifyDone* methods. We can clean up these methods, as this control-flow path is not used anymore.
[jira] [Commented] (YARN-2138) Cleanup notifyDone* methods in RMStateStore
[ https://issues.apache.org/jira/browse/YARN-2138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092538#comment-14092538 ] Hadoop QA commented on YARN-2138:
{color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12660936/YARN-2138.004.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4584//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4584//console
This message is automatically generated.
[jira] [Commented] (YARN-2361) RMAppAttempt state machine entries for KILLED state has duplicate event entries
[ https://issues.apache.org/jira/browse/YARN-2361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092602#comment-14092602 ] Hudson commented on YARN-2361: FAILURE: Integrated in Hadoop-Yarn-trunk #641 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/641/]) YARN-2361. RMAppAttempt state machine entries for KILLED state has duplicate event entries. (Zhihai Xu via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1617190)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java

RMAppAttempt state machine entries for KILLED state has duplicate event entries
Key: YARN-2361 URL: https://issues.apache.org/jira/browse/YARN-2361 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.5.0 Reporter: zhihai xu Assignee: zhihai xu Priority: Trivial Fix For: 2.6.0 Attachments: YARN-2361.000.patch
Remove the duplicate entries in the EnumSet of event types in the RMAppAttempt state machine. The event RMAppAttemptEventType.EXPIRE is duplicated in the following code:
{code}
EnumSet.of(RMAppAttemptEventType.ATTEMPT_ADDED,
    RMAppAttemptEventType.EXPIRE,
    RMAppAttemptEventType.LAUNCHED,
    RMAppAttemptEventType.LAUNCH_FAILED,
    RMAppAttemptEventType.EXPIRE,
    RMAppAttemptEventType.REGISTERED,
    RMAppAttemptEventType.CONTAINER_ALLOCATED,
    RMAppAttemptEventType.UNREGISTERED,
    RMAppAttemptEventType.KILL,
    RMAppAttemptEventType.STATUS_UPDATE))
{code}
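For reference, a minimal sketch of what the deduplicated set would look like after the fix; the surrounding state-machine transition registration is abbreviated here:
{code}
// Each event type now appears exactly once; the second
// RMAppAttemptEventType.EXPIRE has been dropped.
EnumSet.of(RMAppAttemptEventType.ATTEMPT_ADDED,
    RMAppAttemptEventType.EXPIRE,
    RMAppAttemptEventType.LAUNCHED,
    RMAppAttemptEventType.LAUNCH_FAILED,
    RMAppAttemptEventType.REGISTERED,
    RMAppAttemptEventType.CONTAINER_ALLOCATED,
    RMAppAttemptEventType.UNREGISTERED,
    RMAppAttemptEventType.KILL,
    RMAppAttemptEventType.STATUS_UPDATE)
{code}
The duplicate is harmless at runtime (EnumSet.of simply ignores repeated elements) but misleading to readers, hence the Trivial priority.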
[jira] [Commented] (YARN-2337) ResourceManager sets ClientRMService in RMContext multiple times
[ https://issues.apache.org/jira/browse/YARN-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092601#comment-14092601 ] Hudson commented on YARN-2337: FAILURE: Integrated in Hadoop-Yarn-trunk #641 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/641/]) YARN-2337. ResourceManager sets ClientRMService in RMContext multiple times. (Zhihai Xu via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1617183)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java

ResourceManager sets ClientRMService in RMContext multiple times
Key: YARN-2337 URL: https://issues.apache.org/jira/browse/YARN-2337 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.5.0 Reporter: zhihai xu Assignee: zhihai xu Priority: Trivial Labels: newbie Fix For: 2.6.0 Attachments: YARN-2337.000.patch
Remove the duplicate function call (setClientRMService) in the ResourceManager class. rmContext.setClientRMService(clientRM); is duplicated in serviceInit of ResourceManager.
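A sketch of the cleanup described (a fragment, not the full serviceInit; createClientRMService() is the factory method ResourceManager uses for this service):
{code}
// serviceInit (abbreviated): create and register ClientRMService once.
clientRM = createClientRMService();
rmContext.setClientRMService(clientRM);  // single registration
addService(clientRM);
// ...no second rmContext.setClientRMService(clientRM) call later on.
{code}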
[jira] [Updated] (YARN-2403) TestNodeManagerResync fails occasionally in trunk
[ https://issues.apache.org/jira/browse/YARN-2403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-2403: Attachment: YARN-2403.patch

TestNodeManagerResync fails occasionally in trunk
Key: YARN-2403 URL: https://issues.apache.org/jira/browse/YARN-2403 Project: Hadoop YARN Issue Type: Test Reporter: Ted Yu Priority: Minor Attachments: YARN-2403.patch
From https://builds.apache.org/job/Hadoop-Yarn-trunk/640/ :
{code}
TestNodeManagerResync.testKillContainersOnResync:112->testContainerPreservationOnResyncImpl:146 expected:<2> but was:<1>
{code}
[jira] [Commented] (YARN-2403) TestNodeManagerResync fails occasionally in trunk
[ https://issues.apache.org/jira/browse/YARN-2403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092615#comment-14092615 ] Junping Du commented on YARN-2403: The variable registrationCount should be protected by volatile in a concurrent environment. Will deliver a quick patch to fix it.
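A minimal sketch of the point being made, assuming the counter lives in the test's NM stub and is written by one thread and read by another (the surrounding class is abbreviated):
{code}
// volatile guarantees that the assert-side thread sees the most recent
// write from the registration-side thread.
private volatile int registrationCount = 0;

// If the counter were also incremented concurrently from several threads,
// an AtomicInteger would be the safer choice:
// private final AtomicInteger registrationCount = new AtomicInteger(0);
{code}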
[jira] [Commented] (YARN-2403) TestNodeManagerResync fails occasionally in trunk
[ https://issues.apache.org/jira/browse/YARN-2403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092639#comment-14092639 ] Hadoop QA commented on YARN-2403:
{color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12660964/YARN-2403.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4585//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4585//console
This message is automatically generated.
[jira] [Commented] (YARN-2337) ResourceManager sets ClientRMService in RMContext multiple times
[ https://issues.apache.org/jira/browse/YARN-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092699#comment-14092699 ] Hudson commented on YARN-2337: ABORTED: Integrated in Hadoop-Hdfs-trunk #1834 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1834/]) YARN-2337. ResourceManager sets ClientRMService in RMContext multiple times. (Zhihai Xu via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1617183)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
[jira] [Commented] (YARN-2361) RMAppAttempt state machine entries for KILLED state has duplicate event entries
[ https://issues.apache.org/jira/browse/YARN-2361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092700#comment-14092700 ] Hudson commented on YARN-2361: ABORTED: Integrated in Hadoop-Hdfs-trunk #1834 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1834/]) YARN-2361. RMAppAttempt state machine entries for KILLED state has duplicate event entries. (Zhihai Xu via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1617190)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
[jira] [Commented] (YARN-2373) WebAppUtils Should Use configuration.getPassword for Accessing SSL Passwords
[ https://issues.apache.org/jira/browse/YARN-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092714#comment-14092714 ] Varun Vasudev commented on YARN-2373: [~lmccay] none of those should be a blocker. With regards to documentation, I was referring to pages like [this|http://hadoop.apache.org/docs/r2.4.1/hadoop-mapreduce-client/hadoop-mapreduce-client-core/EncryptedShuffle.html]. [~jianhe] can you please commit the patch?

WebAppUtils Should Use configuration.getPassword for Accessing SSL Passwords
Key: YARN-2373 URL: https://issues.apache.org/jira/browse/YARN-2373 Project: Hadoop YARN Issue Type: Bug Reporter: Larry McCay Attachments: YARN-2373.patch, YARN-2373.patch, YARN-2373.patch
As part of HADOOP-10904, this jira represents a change to WebAppUtils to uptake the use of the credential provider API through the new method on Configuration called getPassword. This provides an alternative to storing the passwords in clear text within the ssl-server.xml file while maintaining backward compatibility with that behavior.
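For context, a minimal, self-contained sketch of the Configuration.getPassword pattern this jira adopts (this is not the jira's actual patch; the fallback to the clear-text value in the Configuration is what preserves backward compatibility):
{code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;

public class GetPasswordExample {
  static String getPassword(Configuration conf, String key) {
    try {
      // Consults any configured credential providers first, then falls
      // back to the clear-text value stored in the Configuration itself.
      char[] pw = conf.getPassword(key);
      return pw != null ? new String(pw) : null;
    } catch (IOException ioe) {
      return null;  // provider failure: behave as if no password is set
    }
  }

  public static void main(String[] args) {
    Configuration sslConf = new Configuration(false);
    sslConf.addResource("ssl-server.xml");
    System.out.println(getPassword(sslConf, "ssl.server.truststore.password"));
  }
}
{code}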
[jira] [Commented] (YARN-2361) RMAppAttempt state machine entries for KILLED state has duplicate event entries
[ https://issues.apache.org/jira/browse/YARN-2361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092843#comment-14092843 ] Hudson commented on YARN-2361: SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1860 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1860/]) YARN-2361. RMAppAttempt state machine entries for KILLED state has duplicate event entries. (Zhihai Xu via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1617190)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
[jira] [Commented] (YARN-2337) ResourceManager sets ClientRMService in RMContext multiple times
[ https://issues.apache.org/jira/browse/YARN-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092842#comment-14092842 ] Hudson commented on YARN-2337: SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1860 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1860/]) YARN-2337. ResourceManager sets ClientRMService in RMContext multiple times. (Zhihai Xu via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1617183)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
[jira] [Updated] (YARN-1915) ClientToAMTokenMasterKey should be provided to AM at launch time
[ https://issues.apache.org/jira/browse/YARN-1915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-1915: Attachment: YARN-1915v2.patch Fixed findbugs warning.

ClientToAMTokenMasterKey should be provided to AM at launch time
Key: YARN-1915 URL: https://issues.apache.org/jira/browse/YARN-1915 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.2.0 Reporter: Hitesh Shah Assignee: Jason Lowe Priority: Critical Attachments: YARN-1915.patch, YARN-1915v2.patch
Currently, the AM receives the key as part of registration. This introduces a race where a client can connect to the AM before the AM has received the key. Current flow:
1) AM needs to start the client listening service in order to get host:port and send it to the RM as part of registration.
2) RM gets the port info in register() and transitions the app to RUNNING. Responds back with the client secret to the AM.
3) User asks the RM for a client token, gets it, and pings the AM. The AM hasn't received the client secret from the RM, so the RPC layer itself rejects the request.
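A sketch of the launch-time alternative the summary suggests: ship the master key in the AM container's launch credentials rather than in the register() response. This is only an illustration of the idea, not the attached patch; amContainer (the AM's ContainerLaunchContext), masterKeyBytes, and the secret-key name are all assumed here.
{code}
// RM side (illustrative): place the key in the AM's launch credentials so
// it is available before the client listening service starts.
Credentials credentials = new Credentials();
credentials.addSecretKey(new Text("ClientToAMMasterKey"), masterKeyBytes);
DataOutputBuffer dob = new DataOutputBuffer();
credentials.writeTokenStorageToStream(dob);
amContainer.setTokens(ByteBuffer.wrap(dob.getData(), 0, dob.getLength()));
{code}
With the key present at launch, step 3's race disappears: by the time the AM opens its client port, it can already validate client tokens.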
[jira] [Updated] (YARN-2277) Add Cross-Origin support to the ATS REST API
[ https://issues.apache.org/jira/browse/YARN-2277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles updated YARN-2277: Attachment: YARN-2277-v6.patch Addressed test failure with v6 of the patch.

Add Cross-Origin support to the ATS REST API
Key: YARN-2277 URL: https://issues.apache.org/jira/browse/YARN-2277 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jonathan Eagles Assignee: Jonathan Eagles Attachments: YARN-2277-CORS.patch, YARN-2277-JSONP.patch, YARN-2277-v2.patch, YARN-2277-v3.patch, YARN-2277-v3.patch, YARN-2277-v4.patch, YARN-2277-v5.patch, YARN-2277-v6.patch
As the Application Timeline Server is not provided with a built-in UI, it may make sense to enable JSONP or CORS REST API capabilities to allow a remote UI to access the data directly via javascript without cross-site browser blocks coming into play. An example client may be like http://api.jquery.com/jQuery.getJSON/. This can alleviate the need to create a local proxy cache.
[jira] [Commented] (YARN-1337) Recover containers upon nodemanager restart
[ https://issues.apache.org/jira/browse/YARN-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092916#comment-14092916 ] Junping Du commented on YARN-1337: Thanks [~jlowe] for updating the patch. Some trivial issues to fix below; otherwise it looks good to me:
{code}
+  public void addCompletedContainer(ContainerId containerId);
+
{code}
Better to add javadoc for the newly added (or moved from private) public method.
{code}
-  private volatile AtomicBoolean shouldLaunchContainer = new AtomicBoolean(false);
-  private volatile AtomicBoolean completed = new AtomicBoolean(false);
+  protected volatile AtomicBoolean shouldLaunchContainer =
+      new AtomicBoolean(false);
+  protected volatile AtomicBoolean completed = new AtomicBoolean(false);
{code}
volatile is unnecessary, as it was using AtomicBoolean already.

Recover containers upon nodemanager restart
Key: YARN-1337 URL: https://issues.apache.org/jira/browse/YARN-1337 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.3.0 Reporter: Jason Lowe Assignee: Jason Lowe Attachments: YARN-1337-v1.patch, YARN-1337-v2.patch
To support work-preserving NM restart we need to recover the state of the containers when the nodemanager went down. This includes informing the RM of containers that have exited in the interim and a strategy for dealing with the exit codes from those containers, along with how to reacquire the active containers and determine their exit codes when they terminate. The state of finished containers also needs to be recovered.
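A sketch of the reviewer's second point: the AtomicBoolean object itself provides the cross-thread visibility guarantees, so the field reference does not need volatile; marking it final is the more idiomatic choice.
{code}
// The reference never changes; all coordination goes through the
// AtomicBoolean's own atomic operations (get/set/compareAndSet).
protected final AtomicBoolean shouldLaunchContainer = new AtomicBoolean(false);
protected final AtomicBoolean completed = new AtomicBoolean(false);
{code}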
[jira] [Updated] (YARN-1813) Better error message for yarn logs when permission denied
[ https://issues.apache.org/jira/browse/YARN-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-1813: Attachment: YARN-1813.4.patch Refreshed the latest patch.

Better error message for yarn logs when permission denied
Key: YARN-1813 URL: https://issues.apache.org/jira/browse/YARN-1813 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.3.0 Reporter: Andrew Wang Assignee: Tsuyoshi OZAWA Priority: Minor Attachments: YARN-1813.1.patch, YARN-1813.2.patch, YARN-1813.2.patch, YARN-1813.3.patch, YARN-1813.4.patch
I ran some MR jobs as the hdfs user, and then forgot to sudo -u when grabbing the logs. yarn logs prints an error message like the following:
{noformat}
[andrew.wang@a2402 ~]$ yarn logs -applicationId application_1394482121761_0010
14/03/10 16:05:10 INFO client.RMProxy: Connecting to ResourceManager at a2402.halxg.cloudera.com/10.20.212.10:8032
Logs not available at /tmp/logs/andrew.wang/logs/application_1394482121761_0010
Log aggregation has not completed or is not enabled.
{noformat}
It'd be nicer if it said Permission denied or AccessControlException or something like that instead, since that's the real issue.
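A sketch of the kind of improvement being asked for, assuming the log reader lists the aggregated log directory via a FileSystem (this is an illustration, not the attached patch; remoteAppLogDir and the exact messages are placeholders):
{code}
// Distinguish permission problems from genuinely missing aggregated logs.
try {
  logDirFiles = fs.listStatus(remoteAppLogDir);
} catch (org.apache.hadoop.security.AccessControlException ace) {
  System.err.println("Permission denied reading " + remoteAppLogDir
      + ": " + ace.getMessage());
  return -1;
} catch (java.io.FileNotFoundException fnfe) {
  System.err.println("Logs not available at " + remoteAppLogDir
      + ". Log aggregation has not completed or is not enabled.");
  return -1;
}
{code}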
[jira] [Commented] (YARN-1915) ClientToAMTokenMasterKey should be provided to AM at launch time
[ https://issues.apache.org/jira/browse/YARN-1915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092949#comment-14092949 ] Hadoop QA commented on YARN-1915:
{color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12660996/YARN-1915v2.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4586//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4586//console
This message is automatically generated.
[jira] [Commented] (YARN-2277) Add Cross-Origin support to the ATS REST API
[ https://issues.apache.org/jira/browse/YARN-2277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092959#comment-14092959 ] Hadoop QA commented on YARN-2277:
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12661000/YARN-2277-v6.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common:
org.apache.hadoop.metrics2.impl.TestMetricsSystemImpl
org.apache.hadoop.ha.TestZKFailoverControllerStress
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4587//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4587//console
This message is automatically generated.
[jira] [Commented] (YARN-1813) Better error message for yarn logs when permission denied
[ https://issues.apache.org/jira/browse/YARN-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092989#comment-14092989 ] Hadoop QA commented on YARN-1813:
{color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12661008/YARN-1813.4.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4588//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4588//console
This message is automatically generated.
[jira] [Commented] (YARN-2308) NPE happened when RM restart after CapacityScheduler queue configuration changed
[ https://issues.apache.org/jira/browse/YARN-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093001#comment-14093001 ] chang li commented on YARN-2308: [~wangda] I have updated my patch according to your suggestion. The patch is uploaded. Thanks

NPE happened when RM restart after CapacityScheduler queue configuration changed
Key: YARN-2308 URL: https://issues.apache.org/jira/browse/YARN-2308 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Affects Versions: 2.6.0 Reporter: Wangda Tan Assignee: chang li Priority: Critical Attachments: jira2308.patch, jira2308.patch
I encountered an NPE when the RM restarted:
{code}
2014-07-16 07:22:46,957 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type APP_ATTEMPT_ADDED to the scheduler
java.lang.NullPointerException
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:566)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:922)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:98)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:594)
    at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
    at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:654)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:85)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:698)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:682)
    at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
    at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
    at java.lang.Thread.run(Thread.java:744)
{code}
And the RM then fails to restart. This is caused by a queue configuration change: I removed some queues and added new queues. So when the RM restarts, it tries to recover history applications, and when any of the queues of these applications has been removed, an NPE is raised.
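A sketch of the defensive check implied by the stack trace; the actual patch may instead reject or fail the recovered application, and the surrounding method body is abbreviated:
{code}
// CapacityScheduler.addApplicationAttempt (sketch): guard against a queue
// that no longer exists after capacity-scheduler.xml was changed.
CSQueue queue = getQueue(application.getQueue().getQueueName());
if (queue == null) {
  // Queue was removed from the configuration; surface a clean failure
  // instead of an NPE that aborts RM recovery entirely.
  LOG.error("Queue not found when recovering application " + applicationId);
  return;
}
{code}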
[jira] [Updated] (YARN-2308) NPE happened when RM restart after CapacityScheduler queue configuration changed
[ https://issues.apache.org/jira/browse/YARN-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chang li updated YARN-2308: Attachment: jira2308.patch Updated patch according to Wangda's advice.
[jira] [Commented] (YARN-2308) NPE happened when RM restart after CapacityScheduler queue configuration changed
[ https://issues.apache.org/jira/browse/YARN-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093009#comment-14093009 ] Hadoop QA commented on YARN-2308:
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12661016/jira2308.patch against trunk revision .
{color:red}-1 patch{color}. The patch command could not apply the patch.
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4590//console
This message is automatically generated.
[jira] [Commented] (YARN-2373) WebAppUtils Should Use configuration.getPassword for Accessing SSL Passwords
[ https://issues.apache.org/jira/browse/YARN-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093027#comment-14093027 ] Jian He commented on YARN-2373: First look, this seems to be a bug? Earlier the first parameter was sslConf.get(ssl.server.truststore.location), but now it's changed to the password:
{code}
.trustStore(getPassword(sslConf, WEB_APP_TRUSTSTORE_PASSWORD_KEY),
    sslConf.get(WEB_APP_TRUSTSTORE_PASSWORD_KEY),
{code}
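In other words, the suspected fix keeps the store location as the first argument and moves the getPassword lookup to the second. A sketch of that ordering (the surrounding builder call and any further arguments are abbreviated; the property name is the standard ssl-server.xml key):
{code}
// Location first, password (via the credential-provider-aware
// getPassword helper) second.
.trustStore(sslConf.get("ssl.server.truststore.location"),
    getPassword(sslConf, WEB_APP_TRUSTSTORE_PASSWORD_KEY),
{code}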
[jira] [Commented] (YARN-2277) Add Cross-Origin support to the ATS REST API
[ https://issues.apache.org/jira/browse/YARN-2277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093035#comment-14093035 ] Hadoop QA commented on YARN-2277:
{color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12661000/YARN-2277-v6.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4589//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4589//console
This message is automatically generated.
[jira] [Commented] (YARN-2400) TestAMRestart fails intermittently
[ https://issues.apache.org/jira/browse/YARN-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093040#comment-14093040 ] Xuan Gong commented on YARN-2400: Committed this addendum patch to trunk and branch-2. Thanks, Jian.

TestAMRestart fails intermittently
Key: YARN-2400 URL: https://issues.apache.org/jira/browse/YARN-2400 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Fix For: 2.6.0 Attachments: YARN-2240.2.patch, YARN-2400.1.patch
{noformat}
java.lang.AssertionError: AppAttempt state is not correct (timedout) expected:<ALLOCATED> but was:<SCHEDULED>
    at org.junit.Assert.fail(Assert.java:88)
    at org.junit.Assert.failNotEquals(Assert.java:743)
    at org.junit.Assert.assertEquals(Assert.java:118)
    at org.apache.hadoop.yarn.server.resourcemanager.MockAM.waitForState(MockAM.java:82)
    at org.apache.hadoop.yarn.server.resourcemanager.MockRM.sendAMLaunched(MockRM.java:417)
    at org.apache.hadoop.yarn.server.resourcemanager.MockRM.launchAM(MockRM.java:579)
    at org.apache.hadoop.yarn.server.resourcemanager.MockRM.launchAndRegisterAM(MockRM.java:586)
    at org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart.testShouldNotCountFailureToMaxAttemptRetry(TestAMRestart.java:389)
{noformat}
[jira] [Commented] (YARN-2373) WebAppUtils Should Use configuration.getPassword for Accessing SSL Passwords
[ https://issues.apache.org/jira/browse/YARN-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093094#comment-14093094 ] Larry McCay commented on YARN-2373: Thanks for the review, Jian. I will take a look today!
[jira] [Commented] (YARN-2138) Cleanup notifyDone* methods in RMStateStore
[ https://issues.apache.org/jira/browse/YARN-2138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093099#comment-14093099 ] Jian He commented on YARN-2138: [~varun_saxena], thanks for the input. committing this..
[jira] [Commented] (YARN-2405) NPE in FairSchedulerAppsBlock (scheduler page)
[ https://issues.apache.org/jira/browse/YARN-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093111#comment-14093111 ] Maysam Yabandeh commented on YARN-2405: The problem seems to be that the two separate lists that maintain the apps are not in sync. The list of apps is taken from
{code}
Map<ApplicationId, RMApp> rmContext.getRMApps()
{code}
and then looked up in the second list in AbstractYarnScheduler
{code}
Map<ApplicationId, SchedulerApplication> applications
{code}
via the following code:
{code}
public FSSchedulerApp getSchedulerApp(ApplicationAttemptId appAttemptId) {
  return (FSSchedulerApp) super.getApplicationAttempt(appAttemptId);
}

public T getApplicationAttempt(ApplicationAttemptId applicationAttemptId) {
  SchedulerApplication<T> app =
      applications.get(applicationAttemptId.getApplicationId());
  return app == null ? null : app.getCurrentAppAttempt();
}
{code}
which returns null if it does not find the app attempt. FairSchedulerAppsBlock does not check for the null return value, hence the NPE. By code inspection we found one case in which this could happen; not sure if it is the same case that we hit, though. Anyhow, checking for null return values from getSchedulerApp seems to be a broader fix that also covers the cases we have not yet discovered by code inspection. One scenario that could potentially result in a null return value is the following, in FairScheduler#addApplication:
{code}
RMApp rmApp = rmContext.getRMApps().get(applicationId);
FSLeafQueue queue = assignToQueue(rmApp, queueName, user);
if (queue == null) {
  return;
}
// Enforce ACLs
UserGroupInformation userUgi = UserGroupInformation.createRemoteUser(user);
if (...) {
  return;
}
SchedulerApplication application = new SchedulerApplication(queue, user);
applications.put(applicationId, application);
{code}
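A sketch of the null-safe lookup suggested above, assuming the fix lands in the web-layer accessor (the -1 "not available" sentinel and the exact rendering behavior are illustrative, not the committed change):
{code}
// Tolerate the rmContext and scheduler application maps being
// momentarily out of sync instead of throwing an NPE mid-render.
public int getAppFairShare(ApplicationAttemptId appAttemptId) {
  FSSchedulerApp app = scheduler.getSchedulerApp(appAttemptId);
  return app == null ? -1 : app.getFairShare().getMemory();
}
{code}
The render loop can then skip (or show a placeholder for) attempts whose scheduler state is missing, so one transient inconsistency no longer breaks the whole apps table and the appsTableData definition.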
[jira] [Commented] (YARN-2229) ContainerId can overflow with RM restart
[ https://issues.apache.org/jira/browse/YARN-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093109#comment-14093109 ] Jian He commented on YARN-2229: patch looks good to me. will commit in a day or two if no further comments.

ContainerId can overflow with RM restart
Key: YARN-2229 URL: https://issues.apache.org/jira/browse/YARN-2229 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: YARN-2229.1.patch, YARN-2229.10.patch, YARN-2229.10.patch, YARN-2229.11.patch, YARN-2229.2.patch, YARN-2229.2.patch, YARN-2229.3.patch, YARN-2229.4.patch, YARN-2229.5.patch, YARN-2229.6.patch, YARN-2229.7.patch, YARN-2229.8.patch, YARN-2229.9.patch
On YARN-2052, we changed the containerId format: the upper 10 bits are for the epoch, the lower 22 bits are for the sequence number of ids. This is for preserving the semantics of {{ContainerId#getId()}}, {{ContainerId#toString()}}, {{ContainerId#compareTo()}}, {{ContainerId#equals}}, and {{ConverterUtils#toContainerId}}. One concern is that the epoch can overflow after the RM restarts 1024 times. To avoid the problem, it's better to make containerId a long. We need to define the new container id format while preserving backward compatibility on this JIRA.
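The 10/22 bit split described above can be illustrated with plain bit arithmetic; this is a sketch of the format, not the actual ContainerId code:
{code}
// 32-bit id: upper 10 bits = epoch (RM restart generation),
// lower 22 bits = per-epoch container sequence number.
static int toId(int epoch, int sequence) {
  return (epoch << 22) | (sequence & 0x3fffff);
}

static int epochOf(int id)    { return id >>> 22; }
static int sequenceOf(int id) { return id & 0x3fffff; }
{code}
Since the epoch field saturates at 2^10 - 1 = 1023, the 1025th RM restart would wrap it; widening the id to a long removes that ceiling, but the wire and string formats then need a compatibility story.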
[jira] [Created] (YARN-2405) NPE in FairSchedulerAppsBlock (scheduler page)
Maysam Yabandeh created YARN-2405: Summary: NPE in FairSchedulerAppsBlock (scheduler page) Key: YARN-2405 URL: https://issues.apache.org/jira/browse/YARN-2405 Project: Hadoop YARN Issue Type: Bug Reporter: Maysam Yabandeh
FairSchedulerAppsBlock#render throws an NPE at this line:
{code}
int fairShare = fsinfo.getAppFairShare(attemptId);
{code}
This causes the scheduler page to not show the app, since the page lacks the definition of appsTableData:
{code}
Uncaught ReferenceError: appsTableData is not defined
{code}
The problem is temporary, meaning that it usually resolves by itself, either after a retry or after a few hours.
[jira] [Commented] (YARN-2397) RM web interface sometimes returns request is a replay error in secure mode
[ https://issues.apache.org/jira/browse/YARN-2397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093120#comment-14093120 ] Zhijie Shen commented on YARN-2397: [~vvasudev], thanks for the new patch! The logic of loading the simple auth filter seems to be still problematic:
{code}
// if security is not enabled and the default filter initializer has not
// been set, set the initializer to include the
// RMAuthenticationFilterInitializer which in turn will set up the simple
// auth filter.

String initializers = conf.get(filterInitializerConfKey);
if (!UserGroupInformation.isSecurityEnabled()) {
  if (initializersClasses == null || initializersClasses.length == 0) {
    conf.set(filterInitializerConfKey,
        RMAuthenticationFilterInitializer.class.getName());
    conf.set(authTypeKey, "simple");
  } else if (initializers.equals(StaticUserWebFilter.class.getName())) {
    conf.set(filterInitializerConfKey,
        RMAuthenticationFilterInitializer.class.getName() + ","
            + initializers);
    conf.set(authTypeKey, "simple");
  }
}
{code}
4 conditions need to be satisfied to load the kerberos+DT auth filter. Then, in the remaining cases, the simple auth filter should be loaded, right? Or do there intentionally exist cases in which neither the Kerberos+DT nor the simple auth filter is used? If it is the former scenario,
{code}
if (!UserGroupInformation.isSecurityEnabled()) {
{code}
the above code will cause any break except that of condition 1 to result in no auth filter at all. And it still makes the assumption that the filter initializers can only be those of auth and static user. However, initializersClasses can contain more than that (see YARN-2277). For the simple auth filter case, it's good to always use RMAuthenticationFilterInitializer or the standard AuthenticationFilterInitializer. The current code will cause AuthenticationFilterInitializer to be used under some configuration setups while RMAuthenticationFilterInitializer is used under the others.

RM web interface sometimes returns request is a replay error in secure mode
Key: YARN-2397 URL: https://issues.apache.org/jira/browse/YARN-2397 Project: Hadoop YARN Issue Type: Bug Reporter: Varun Vasudev Assignee: Varun Vasudev Priority: Critical Attachments: apache-yarn-2397.0.patch, apache-yarn-2397.1.patch
The RM web interface sometimes returns a request is a replay error if the default kerberos http filter is enabled. This is because it uses the new RMAuthenticationFilter in addition to the AuthenticationFilter. There is a workaround: set yarn.resourcemanager.webapp.delegation-token-auth-filter.enabled to false. This bug is to fix the code to use only the RMAuthenticationFilter and not both.
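A sketch of the restructuring the review seems to suggest: when security is off, always ensure the RM initializer is present, rather than special-casing only the empty and StaticUserWebFilter configurations. This is an illustration built from the variables in the quoted code, not the eventual patch:
{code}
if (!UserGroupInformation.isSecurityEnabled()) {
  String rmInit = RMAuthenticationFilterInitializer.class.getName();
  if (initializers == null || initializers.isEmpty()) {
    conf.set(filterInitializerConfKey, rmInit);
  } else if (!initializers.contains(rmInit)) {
    // Prepend instead of replace, so other configured initializers
    // (e.g. a CORS filter, see YARN-2277) keep working.
    conf.set(filterInitializerConfKey, rmInit + "," + initializers);
  }
  conf.set(authTypeKey, "simple");
}
{code}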
[jira] [Updated] (YARN-2308) NPE happened when RM restart after CapacityScheduler queue configuration changed
[ https://issues.apache.org/jira/browse/YARN-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chang li updated YARN-2308: Attachment: jira2308.patch Patch updated according to Wangda's suggestion.
[jira] [Commented] (YARN-2277) Add Cross-Origin support to the ATS REST API
[ https://issues.apache.org/jira/browse/YARN-2277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093199#comment-14093199 ] Zhijie Shen commented on YARN-2277: --- bq. I have provided a minimal CORS filter that will give us an idea if this is the direction to go. Based on the direction of this patch, the scope has widened to create a general CrossOriginFilter for use within all Hadoop REST APIs. Probably, we will want to split the different pieces us across JIRAs, umbrella, Filter and FilterInitializer, additional configuration, and individual REST servers. This way we can focus on the end goal of getting Tez UI done in a timely manner without forgetting completeness of CORS support. [~jeagles], thanks for your contribution! +1 for making the minimal CORS filter. Another concern is that if we upgrade jetty sometime in the future, we can reuse the cross-origin filter provided by it, rebase this on top of it. One additional suggestion is that we can start the CORS filter even in smaller scope: the timeline server only, which means we should move the filter/filter initializer to this sub module. Once it is made robust enough and proved to be reliable, we can promote it to hadoop-yarn-common or even hadoop-common. How do you think? Bellow are some detailed comments for the patch: 1. The prefix changes to yarn.timeline-service.http.cross-origin? {code} + public static final String PREFIX = hadoop.http.filter.cross.origin.; {code} 2. ALLOWED_ORIGINS - allowed-origins? Not to make the config name too long. {code} + // Filter configuration + public static final String ALLOWED_ORIGINS = access.control.allowed.origins; {code} 3. Should most of the methods in CrossOriginFilter be private? 4. Is it better to make it configurable as well? Why allowedMethods doesn't have PUT? In the doc: https://developer.mozilla.org/en-US/docs/Web/HTTP/Access_control_CORS, it seems that the headers can go beyond the following set. {code} + void initializeAllowedMethods(FilterConfig filterConfig) { +allowedMethods.add(GET); +allowedMethods.add(POST); +allowedMethods.add(HEAD); +LOG.info(Allowed Methods: + getAllowedMethodsHeader()); + } + + void initializeAllowedHeaders(FilterConfig filterConfig) { +allowedHeaders.add(X-Requested-With); +allowedHeaders.add(Content-Type); +allowedHeaders.add(Accept); +allowedHeaders.add(Origin); +LOG.info(Allowed Headers: + getAllowedHeadersHeader()); + } {code} 5. Should we include Access-Control-Max-Age? 6. Is it better to invoke doCrossFilter after chain.doFilter, in case devs are going to do something special in servlet with the res object directly? {code} + @Override + public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain) +throws IOException, ServletException { +doCrossFilter((HttpServletRequest) req, (HttpServletResponse) res); +chain.doFilter(req, res); + } {code} Add Cross-Origin support to the ATS REST API Key: YARN-2277 URL: https://issues.apache.org/jira/browse/YARN-2277 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jonathan Eagles Assignee: Jonathan Eagles Attachments: YARN-2277-CORS.patch, YARN-2277-JSONP.patch, YARN-2277-v2.patch, YARN-2277-v3.patch, YARN-2277-v3.patch, YARN-2277-v4.patch, YARN-2277-v5.patch, YARN-2277-v6.patch As the Application Timeline Server is not provided with built-in UI, it may make sense to enable JSONP or CORS Rest API capabilities to allow for remote UI to access the data directly via javascript without cross side server browser blocks coming into play. 
An example client might use jQuery.getJSON (http://api.jquery.com/jQuery.getJSON/). This can alleviate the need to create a local proxy cache. -- This message was sent by Atlassian JIRA (v6.2#6252)
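For orientation, here is a minimal sketch of the kind of servlet CORS filter being discussed. The class name, init parameter, and the specific header choices are illustrative assumptions, not the YARN-2277 patch:
{code}
import java.io.IOException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import javax.servlet.*;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class SimpleCrossOriginFilter implements Filter {
  private final Set<String> allowedOrigins = new HashSet<String>();

  @Override
  public void init(FilterConfig config) {
    // e.g. allowed-origins = "http://ui.example.com,http://other.example.com"
    String origins = config.getInitParameter("allowed-origins");
    if (origins != null) {
      allowedOrigins.addAll(Arrays.asList(origins.split(",")));
    }
  }

  @Override
  public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
      throws IOException, ServletException {
    HttpServletRequest httpReq = (HttpServletRequest) req;
    HttpServletResponse httpRes = (HttpServletResponse) res;
    String origin = httpReq.getHeader("Origin");
    // Echo the origin back only if it is on the allow list.
    if (origin != null && allowedOrigins.contains(origin)) {
      httpRes.setHeader("Access-Control-Allow-Origin", origin);
      httpRes.setHeader("Access-Control-Allow-Methods", "GET, POST, HEAD");
      httpRes.setHeader("Access-Control-Allow-Headers",
          "X-Requested-With, Content-Type, Accept, Origin");
    }
    chain.doFilter(req, res);
  }

  @Override
  public void destroy() {
  }
}
{code}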
[jira] [Commented] (YARN-2373) WebAppUtils Should Use configuration.getPassword for Accessing SSL Passwords
[ https://issues.apache.org/jira/browse/YARN-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093209#comment-14093209 ] Larry McCay commented on YARN-2373: --- You are absolutely right. I will have a new patch shortly. Thanks, again! WebAppUtils Should Use configuration.getPassword for Accessing SSL Passwords Key: YARN-2373 URL: https://issues.apache.org/jira/browse/YARN-2373 Project: Hadoop YARN Issue Type: Bug Reporter: Larry McCay Attachments: YARN-2373.patch, YARN-2373.patch, YARN-2373.patch As part of HADOOP-10904, this jira represents a change to WebAppUtils to uptake the use of the credential provider API through the new method on Configuration called getPassword. This provides an alternative to storing the passwords in clear text within the ssl-server.xml file while maintaining backward compatibility with that behavior. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2393) Fair Scheduler : Implement static fair share
[ https://issues.apache.org/jira/browse/YARN-2393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093217#comment-14093217 ] Ashwin Shankar commented on YARN-2393: -- [~ywskycn], on skimming through the patch at a high level, I had a quick comment. Apart from the configuration in the alloc XML, queues can get created dynamically when one uses QueuePlacementRules like primary group, nested user queue, etc. with create=true. Shouldn't we recompute static shares in these cases? Fair Scheduler : Implement static fair share Key: YARN-2393 URL: https://issues.apache.org/jira/browse/YARN-2393 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Reporter: Ashwin Shankar Assignee: Wei Yan Attachments: YARN-2393-1.patch Static fair share is a fair share allocation considering all (active/inactive) queues. It would be shown on the UI for better predictability of the finish time of applications. We would compute static fair share only when needed, like on queue creation or node addition/removal. Please see YARN-2026 for discussions on this. -- This message was sent by Atlassian JIRA (v6.2#6252)
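A hedged sketch of the recompute-on-dynamic-creation point raised above; the hook and method names are assumptions based on the discussion, not the committed patch:
{code}
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager;

class SteadyShareHook {
  private final QueueManager queueMgr;

  SteadyShareHook(QueueManager queueMgr) {
    this.queueMgr = queueMgr;
  }

  /** Called wherever a placement rule may create a queue on the fly. */
  void onQueuePlaced(String queueName) {
    // create=true mirrors QueuePlacementRule behavior: the leaf queue is
    // materialized on demand rather than read from the allocation file.
    FSLeafQueue queue = queueMgr.getLeafQueue(queueName, true);
    if (queue != null) {
      // Recompute the static/steady shares so the new queue is included,
      // just as on an allocation-file reload or a node change.
      queueMgr.getRootQueue().recomputeSteadyShares();
    }
  }
}
{code}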
[jira] [Commented] (YARN-2393) Fair Scheduler : Implement static fair share
[ https://issues.apache.org/jira/browse/YARN-2393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093222#comment-14093222 ] Wei Yan commented on YARN-2393: --- [~ashwinshankar77], thanks for the comment. Will check that. Fair Scheduler : Implement static fair share Key: YARN-2393 URL: https://issues.apache.org/jira/browse/YARN-2393 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Reporter: Ashwin Shankar Assignee: Wei Yan Attachments: YARN-2393-1.patch Static fair share is a fair share allocation considering all (active/inactive) queues. It would be shown on the UI for better predictability of the finish time of applications. We would compute static fair share only when needed, like on queue creation or node addition/removal. Please see YARN-2026 for discussions on this. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1198) Capacity Scheduler headroom calculation does not work as expected
[ https://issues.apache.org/jira/browse/YARN-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093216#comment-14093216 ] Craig Welch commented on YARN-1198: --- So, I'm in the process of putting together a patch to calculate the headroom in more cases, as described in this jira. It strikes me that one of the changes called for is to make headroom apply to the user+queue combination instead of to the application, as it does today. Today, headroom is per application; as I understand the jira, the suggestion is to establish the same headroom value for a given user+queue combination and to change the headroom simultaneously for all applications of that user+queue any time the headroom would change for any of them. This suggests that a reasonable approach might be to use the same resource instance for a given user+queue combination, instead of having one per application. Thoughts? Capacity Scheduler headroom calculation does not work as expected - Key: YARN-1198 URL: https://issues.apache.org/jira/browse/YARN-1198 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi Attachments: YARN-1198.1.patch Today headroom calculation (for the app) takes place only when * New node is added/removed from the cluster * New container is getting assigned to the application. However there are potentially lot of situations which are not considered for this calculation * If a container finishes then headroom for that application will change and should be notified to the AM accordingly. * If a single user has submitted multiple applications (app1 and app2) to the same queue then ** If app1's container finishes then not only app1's but also app2's AM should be notified about the change in headroom. ** Similarly if a container is assigned to any applications app1/app2 then both AM should be notified about their headroom. ** To simplify the whole communication process it is ideal to keep headroom per User per LeafQueue so that everyone gets the same picture (apps belonging to same user and submitted in same queue). * If a new user submits an application to the queue then all applications submitted by all users in that queue should be notified of the headroom change. * Also today headroom is an absolute number ( I think it should be normalized but then this is going to be not backward compatible..) * Also when admin user refreshes queue headroom has to be updated. These all are the potential bugs in headroom calculations -- This message was sent by Atlassian JIRA (v6.2#6252)
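A hedged sketch of the shared-instance idea proposed above; all names here are illustrative, not the actual YARN-1198 patch:
{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class HeadroomProvider {
  /** Mutable holder shared by every application of one user in one queue. */
  public static final class Headroom {
    private volatile long memoryMb;
    public long getMemoryMb() { return memoryMb; }
    void setMemoryMb(long mb) { memoryMb = mb; }
  }

  private final Map<String, Headroom> byUserQueue =
      new ConcurrentHashMap<String, Headroom>();

  /** All apps for (user, queue) receive the same Headroom instance. */
  public Headroom headroomFor(String user, String queue) {
    return byUserQueue.computeIfAbsent(user + "@" + queue,
        k -> new Headroom());
  }

  /** One in-place update becomes visible to every app sharing the holder. */
  public void update(String user, String queue, long memoryMb) {
    headroomFor(user, queue).setMemoryMb(memoryMb);
  }
}
{code}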
[jira] [Updated] (YARN-2373) WebAppUtils Should Use configuration.getPassword for Accessing SSL Passwords
[ https://issues.apache.org/jira/browse/YARN-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Larry McCay updated YARN-2373: -- Attachment: YARN-2373.patch Attaching new patch to address the issue identified through [~jianhe]'s review. Thanks again, Jian! WebAppUtils Should Use configuration.getPassword for Accessing SSL Passwords Key: YARN-2373 URL: https://issues.apache.org/jira/browse/YARN-2373 Project: Hadoop YARN Issue Type: Bug Reporter: Larry McCay Attachments: YARN-2373.patch, YARN-2373.patch, YARN-2373.patch, YARN-2373.patch As part of HADOOP-10904, this jira represents a change to WebAppUtils to uptake the use of the credential provider API through the new method on Configuration called getPassword. This provides an alternative to storing the passwords in clear text within the ssl-server.xml file while maintaining backward compatibility with that behavior. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2393) Fair Scheduler : Implement static fair share
[ https://issues.apache.org/jira/browse/YARN-2393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093258#comment-14093258 ] Hadoop QA commented on YARN-2393: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12661048/YARN-2393-1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4592//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4592//console This message is automatically generated. Fair Scheduler : Implement static fair share Key: YARN-2393 URL: https://issues.apache.org/jira/browse/YARN-2393 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Reporter: Ashwin Shankar Assignee: Wei Yan Attachments: YARN-2393-1.patch Static fair share is a fair share allocation considering all(active/inactive) queues.It would be shown on the UI for better predictability of finish time of applications. We would compute static fair share only when needed, like on queue creation, node added/removed. Please see YARN-2026 for discussions on this. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2308) NPE happened when RM restart after CapacityScheduler queue configuration changed
[ https://issues.apache.org/jira/browse/YARN-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093259#comment-14093259 ] Hadoop QA commented on YARN-2308: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12661044/jira2308.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4591//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4591//console This message is automatically generated. NPE happened when RM restart after CapacityScheduler queue configuration changed - Key: YARN-2308 URL: https://issues.apache.org/jira/browse/YARN-2308 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Affects Versions: 2.6.0 Reporter: Wangda Tan Assignee: chang li Priority: Critical Attachments: jira2308.patch, jira2308.patch I encountered a NPE when RM restart {code} 2014-07-16 07:22:46,957 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type APP_ATTEMPT_ADDED to the scheduler java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:566) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:922) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:98) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:594) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:654) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:85) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:698) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:682) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:744) {code} And RM will be failed to restart. 
This is caused by a queue configuration change: I removed some queues and added new ones. So when the RM restarts, it tries to recover past applications, and when any of these applications' queues has been removed, an NPE is raised. -- This message was sent by Atlassian JIRA (v6.2#6252)
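The shape of the guard this calls for, as a hedged sketch; the rejection path and message are assumptions, not the committed YARN-2308 fix:
{code}
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.server.resourcemanager.RMContext;
import org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppRejectedEvent;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueue;

class QueueRecoveryGuard {
  private final RMContext rmContext;

  QueueRecoveryGuard(RMContext rmContext) {
    this.rmContext = rmContext;
  }

  /** Returns true if the app can proceed; rejects it otherwise. */
  boolean validateQueue(CSQueue queue, String queueName, ApplicationId appId) {
    if (queue != null) {
      return true;
    }
    // The queue was removed from capacity-scheduler.xml before the restart;
    // reject the recovered application instead of dereferencing null later
    // in addApplicationAttempt.
    rmContext.getDispatcher().getEventHandler().handle(
        new RMAppRejectedEvent(appId,
            "Application " + appId + " submitted to nonexistent queue: "
                + queueName));
    return false;
  }
}
{code}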
[jira] [Updated] (YARN-2406) Move RM recovery related proto to yarn_server_resourcemanager_recovery.proto
[ https://issues.apache.org/jira/browse/YARN-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-2406: -- Issue Type: Sub-task (was: Improvement) Parent: YARN-128 Move RM recovery related proto to yarn_server_resourcemanager_recovery.proto Key: YARN-2406 URL: https://issues.apache.org/jira/browse/YARN-2406 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He Today most recovery related proto records are defined in yarn_server_resourcemanager_service_protos.proto which is inside YARN-API module. Since these records are internally used by RM only, we can move them to the yarn_server_resourcemanager_recovery.proto file inside RM-server module -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2406) Move RM recovery related proto to yarn_server_resourcemanager_recovery.proto
Jian He created YARN-2406: - Summary: Move RM recovery related proto to yarn_server_resourcemanager_recovery.proto Key: YARN-2406 URL: https://issues.apache.org/jira/browse/YARN-2406 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Today most recovery related proto records are defined in yarn_server_resourcemanager_service_protos.proto which is inside YARN-API module. Since these records are internally used by RM only, we can move them to the yarn_server_resourcemanager_recovery.proto file inside RM-server module -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2317) Update documentation about how to write YARN applications
[ https://issues.apache.org/jira/browse/YARN-2317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-2317: Attachment: YARN-2317-081114.patch Thanks [~zjshen]! I've addressed the points in your review. In general, this patch performs the following work on the "How to write YARN applications" webpage:
1. Update the document to use the latest clients rather than the old protocols, since the protocol-based approach is no longer encouraged. (Major change)
2. Replace sample code with the code in the latest version of the distributed shell. (Major change)
3. Update the FAQ and useful links section with the latest information. (Minor change)
With regard to your comments, I've fixed all of them following your suggestions. Specifically, to avoid confusion, I fixed issue 8 by directly removing the commented-out code. Update documentation about how to write YARN applications - Key: YARN-2317 URL: https://issues.apache.org/jira/browse/YARN-2317 Project: Hadoop YARN Issue Type: Sub-task Components: documentation Reporter: Li Lu Assignee: Li Lu Fix For: 2.6.0 Attachments: YARN-2317-071714.patch, YARN-2317-073014-1.patch, YARN-2317-073014.patch, YARN-2317-081114.patch Some information in WritingYarnApplications webpage is out-dated. Need some refresh work on this document to reflect the most recent changes in YARN APIs. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2308) NPE happened when RM restart after CapacityScheduler queue configuration changed
[ https://issues.apache.org/jira/browse/YARN-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chang li updated YARN-2308: --- Attachment: jira2308.patch NPE happened when RM restart after CapacityScheduler queue configuration changed - Key: YARN-2308 URL: https://issues.apache.org/jira/browse/YARN-2308 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Affects Versions: 2.6.0 Reporter: Wangda Tan Assignee: chang li Priority: Critical Attachments: jira2308.patch, jira2308.patch, jira2308.patch I encountered a NPE when RM restart {code} 2014-07-16 07:22:46,957 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type APP_ATTEMPT_ADDED to the scheduler java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:566) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:922) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:98) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:594) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:654) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:85) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:698) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:682) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:744) {code} And RM will be failed to restart. This is caused by queue configuration changed, I removed some queues and added new queues. So when RM restarts, it tries to recover history applications, and when any of queues of these applications removed, NPE will be raised. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2138) Cleanup notifyDone* methods in RMStateStore
[ https://issues.apache.org/jira/browse/YARN-2138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093351#comment-14093351 ] Hudson commented on YARN-2138: -- FAILURE: Integrated in Hadoop-trunk-Commit #6048 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6048/]) YARN-2138. Cleaned up notifyDone* APIs in RMStateStore. Contributed by Varun Saxena (jianhe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1617341) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppNewSavedEvent.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppUpdateSavedEvent.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/event/RMAppAttemptNewSavedEvent.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/event/RMAppAttemptUpdateSavedEvent.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreTestBase.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java Cleanup notifyDone* methods in RMStateStore --- Key: YARN-2138 URL: https://issues.apache.org/jira/browse/YARN-2138 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Varun Saxena Fix For: 2.6.0 Attachments: YARN-2138.002.patch, YARN-2138.003.patch, YARN-2138.004.patch, YARN-2138.patch The storedException passed into notifyDoneStoringApplication is always null. Similarly for other notifyDone* methods. We can clean up these methods as this control flow path is not used anymore. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2400) TestAMRestart fails intermittently
[ https://issues.apache.org/jira/browse/YARN-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093349#comment-14093349 ] Hudson commented on YARN-2400: -- FAILURE: Integrated in Hadoop-trunk-Commit #6048 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6048/]) YARN-2400: Addendum fix for TestAMRestart failure. Contributed by Jian He (xgong: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1617333) * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRestart.java TestAMRestart fails intermittently -- Key: YARN-2400 URL: https://issues.apache.org/jira/browse/YARN-2400 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Fix For: 2.6.0 Attachments: YARN-2240.2.patch, YARN-2400.1.patch java.lang.AssertionError: AppAttempt state is not correct (timedout) expected:ALLOCATED but was:SCHEDULED at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.apache.hadoop.yarn.server.resourcemanager.MockAM.waitForState(MockAM.java:82) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.sendAMLaunched(MockRM.java:417) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.launchAM(MockRM.java:579) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.launchAndRegisterAM(MockRM.java:586) at org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart.testShouldNotCountFailureToMaxAttemptRetry(TestAMRestart.java:389) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2373) WebAppUtils Should Use configuration.getPassword for Accessing SSL Passwords
[ https://issues.apache.org/jira/browse/YARN-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093459#comment-14093459 ] Jian He commented on YARN-2373: --- Larry, thanks for the update. But this seems to be a bug again: should it be getPassword(sslConf, WEB_APP_KEYSTORE_PASSWORD_KEY)?
{code}
sslConf.get(getPassword(sslConf, WEB_APP_KEYSTORE_PASSWORD_KEY))
{code}
The test covers the CredentialProvider and the newly added helper API. Can you add a test for the loadSslConfiguration method? WebAppUtils Should Use configuration.getPassword for Accessing SSL Passwords Key: YARN-2373 URL: https://issues.apache.org/jira/browse/YARN-2373 Project: Hadoop YARN Issue Type: Bug Reporter: Larry McCay Attachments: YARN-2373.patch, YARN-2373.patch, YARN-2373.patch, YARN-2373.patch As part of HADOOP-10904, this jira represents a change to WebAppUtils to uptake the use of the credential provider API through the new method on Configuration called getPassword. This provides an alternative to storing the passwords in clear text within the ssl-server.xml file while maintaining backward compatibility with that behavior. -- This message was sent by Atlassian JIRA (v6.2#6252)
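To make the distinction being pointed out concrete, a hedged sketch; the getPassword helper shape is inferred from the discussion and HADOOP-10904, not verified against the patch:
{code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;

class SslPasswordLookup {
  // Buggy form from the snippet above: the resolved password is used as a
  // *key* for a second config lookup, so the result is null or garbage.
  static String buggy(Configuration sslConf, String key) {
    return sslConf.get(getPassword(sslConf, key));
  }

  // Intended form: resolve the password itself via the credential provider,
  // falling back to the clear-text value in ssl-server.xml.
  static String fixed(Configuration sslConf, String key) {
    return getPassword(sslConf, key);
  }

  // Helper built on Configuration#getPassword (HADOOP-10904); the
  // null/error handling here is an assumption.
  static String getPassword(Configuration conf, String alias) {
    try {
      char[] pass = conf.getPassword(alias);
      return pass != null ? new String(pass) : null;
    } catch (IOException ioe) {
      return null;
    }
  }
}
{code}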
[jira] [Commented] (YARN-2308) NPE happened when RM restart after CapacityScheduler queue configuration changed
[ https://issues.apache.org/jira/browse/YARN-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093463#comment-14093463 ] Hadoop QA commented on YARN-2308: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12661066/jira2308.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4593//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4593//console This message is automatically generated. NPE happened when RM restart after CapacityScheduler queue configuration changed - Key: YARN-2308 URL: https://issues.apache.org/jira/browse/YARN-2308 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Affects Versions: 2.6.0 Reporter: Wangda Tan Assignee: chang li Priority: Critical Attachments: jira2308.patch, jira2308.patch, jira2308.patch I encountered a NPE when RM restart {code} 2014-07-16 07:22:46,957 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type APP_ATTEMPT_ADDED to the scheduler java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:566) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:922) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:98) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:594) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:654) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:85) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:698) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:682) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:744) {code} And RM will be failed to restart. 
This is caused by a queue configuration change: I removed some queues and added new ones. So when the RM restarts, it tries to recover past applications, and when any of these applications' queues has been removed, an NPE is raised. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2407) Users are not allowed to view their own jobs, denied by JobACLsManager
Yu Gao created YARN-2407: Summary: Users are not allowed to view their own jobs, denied by JobACLsManager Key: YARN-2407 URL: https://issues.apache.org/jira/browse/YARN-2407 Project: Hadoop YARN Issue Type: Bug Components: applications Affects Versions: 2.4.1 Reporter: Yu Gao Have a Hadoop 2.4.1 cluster with Yarn ACL enabled, and try to submit jobs as a non-admin user user1. The job could be finished successfully, but the running progress was not displayed correctly on the commad-line, and I got following in the corresponding ApplicationMaster log: INFO [IPC Server handler 0 on 56717] org.apache.hadoop.ipc.Server: IPC Server handler 0 on 56717, call org.apache.hadoop.mapreduce.v2.api.MRClientProtocolPB.getJobReport from 9.30.95.26:61024 Call#59 Retry#0 org.apache.hadoop.security.AccessControlException: User user1 cannot perform operation VIEW_JOB on job_1407456690588_0003 at org.apache.hadoop.mapreduce.v2.app.client.MRClientService$MRClientProtocolHandler.verifyAndGetJob(MRClientService.java:191) at org.apache.hadoop.mapreduce.v2.app.client.MRClientService$MRClientProtocolHandler.getJobReport(MRClientService.java:233) at org.apache.hadoop.mapreduce.v2.api.impl.pb.service.MRClientProtocolPBServiceImpl.getJobReport(MRClientProtocolPBServiceImpl.java:122) at org.apache.hadoop.yarn.proto.MRClientProtocol$MRClientProtocolService$2.callBlockingMethod(MRClientProtocol.java:275) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) at java.security.AccessController.doPrivileged(AccessController.java:366) at javax.security.auth.Subject.doAs(Subject.java:572) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1567) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2373) WebAppUtils Should Use configuration.getPassword for Accessing SSL Passwords
[ https://issues.apache.org/jira/browse/YARN-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093469#comment-14093469 ] Larry McCay commented on YARN-2373: --- You are right again. I am fixing it now and enhancing the test as you suggest. Apologies. WebAppUtils Should Use configuration.getPassword for Accessing SSL Passwords Key: YARN-2373 URL: https://issues.apache.org/jira/browse/YARN-2373 Project: Hadoop YARN Issue Type: Bug Reporter: Larry McCay Attachments: YARN-2373.patch, YARN-2373.patch, YARN-2373.patch, YARN-2373.patch As part of HADOOP-10904, this jira represents a change to WebAppUtils to uptake the use of the credential provider API through the new method on Configuration called getPassword. This provides an alternative to storing the passwords in clear text within the ssl-server.xml file while maintaining backward compatibility with that behavior. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2407) Users are not allowed to view their own jobs, denied by JobACLsManager
[ https://issues.apache.org/jira/browse/YARN-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093484#comment-14093484 ] Yu Gao commented on YARN-2407: -- After turning on debug, I got this in the ApplicationMaster log: DEBUG [IPC Server handler 0 on 36796] org.apache.hadoop.mapred.JobACLsManager: checkAccess job acls, jobOwner: yarn jobacl: VIEW_JOB user: user1 The jobOwner above is incorrect. It should be user1, since it was user1 who submitted the job. This error is caused by an incorrect implementation in JobImpl, which defines two user name fields: username, the user taken from the system property user.name, i.e. the container process owner; and userName, the value passed in via the JobImpl constructor, i.e. the end user who submitted the job. The JobImpl#checkAccess method should have used userName as the job owner, instead of username. Users are not allowed to view their own jobs, denied by JobACLsManager -- Key: YARN-2407 URL: https://issues.apache.org/jira/browse/YARN-2407 Project: Hadoop YARN Issue Type: Bug Components: applications Affects Versions: 2.4.1 Reporter: Yu Gao Have a Hadoop 2.4.1 cluster with Yarn ACL enabled, and try to submit jobs as a non-admin user user1. The job could be finished successfully, but the running progress was not displayed correctly on the commad-line, and I got following in the corresponding ApplicationMaster log: INFO [IPC Server handler 0 on 56717] org.apache.hadoop.ipc.Server: IPC Server handler 0 on 56717, call org.apache.hadoop.mapreduce.v2.api.MRClientProtocolPB.getJobReport from 9.30.95.26:61024 Call#59 Retry#0 org.apache.hadoop.security.AccessControlException: User user1 cannot perform operation VIEW_JOB on job_1407456690588_0003 at org.apache.hadoop.mapreduce.v2.app.client.MRClientService$MRClientProtocolHandler.verifyAndGetJob(MRClientService.java:191) at org.apache.hadoop.mapreduce.v2.app.client.MRClientService$MRClientProtocolHandler.getJobReport(MRClientService.java:233) at org.apache.hadoop.mapreduce.v2.api.impl.pb.service.MRClientProtocolPBServiceImpl.getJobReport(MRClientProtocolPBServiceImpl.java:122) at org.apache.hadoop.yarn.proto.MRClientProtocol$MRClientProtocolService$2.callBlockingMethod(MRClientProtocol.java:275) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) at java.security.AccessController.doPrivileged(AccessController.java:366) at javax.security.auth.Subject.doAs(Subject.java:572) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1567) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) -- This message was sent by Atlassian JIRA (v6.2#6252)
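The shape of the fix described above, as a hedged sketch; the class is simplified from the comment, not the literal YARN-2407 patch:
{code}
import java.util.Map;
import org.apache.hadoop.mapred.JobACLsManager;
import org.apache.hadoop.mapreduce.JobACL;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.security.authorize.AccessControlList;

class JobAccessCheck {
  private final JobACLsManager aclsManager;
  private final Map<JobACL, AccessControlList> jobACLs;
  private final String userName;  // end user from the JobImpl constructor

  JobAccessCheck(JobACLsManager aclsManager,
      Map<JobACL, AccessControlList> jobACLs, String userName) {
    this.aclsManager = aclsManager;
    this.jobACLs = jobACLs;
    this.userName = userName;
  }

  boolean checkAccess(UserGroupInformation callerUGI, JobACL jobOperation) {
    AccessControlList jobACL = jobACLs.get(jobOperation);
    if (jobACL == null) {
      return true;  // no ACL configured for this operation
    }
    // Key point from the comment above: pass userName (the submitting user)
    // as the job owner, not the "user.name" of the AM container process.
    return aclsManager.checkAccess(callerUGI, jobOperation, userName, jobACL);
  }
}
{code}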
[jira] [Updated] (YARN-2407) Users are not allowed to view their own jobs, denied by JobACLsManager
[ https://issues.apache.org/jira/browse/YARN-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu Gao updated YARN-2407: - Attachment: YARN-2407.patch Users are not allowed to view their own jobs, denied by JobACLsManager -- Key: YARN-2407 URL: https://issues.apache.org/jira/browse/YARN-2407 Project: Hadoop YARN Issue Type: Bug Components: applications Affects Versions: 2.4.1 Reporter: Yu Gao Attachments: YARN-2407.patch Have a Hadoop 2.4.1 cluster with Yarn ACL enabled, and try to submit jobs as a non-admin user user1. The job could be finished successfully, but the running progress was not displayed correctly on the commad-line, and I got following in the corresponding ApplicationMaster log: INFO [IPC Server handler 0 on 56717] org.apache.hadoop.ipc.Server: IPC Server handler 0 on 56717, call org.apache.hadoop.mapreduce.v2.api.MRClientProtocolPB.getJobReport from 9.30.95.26:61024 Call#59 Retry#0 org.apache.hadoop.security.AccessControlException: User user1 cannot perform operation VIEW_JOB on job_1407456690588_0003 at org.apache.hadoop.mapreduce.v2.app.client.MRClientService$MRClientProtocolHandler.verifyAndGetJob(MRClientService.java:191) at org.apache.hadoop.mapreduce.v2.app.client.MRClientService$MRClientProtocolHandler.getJobReport(MRClientService.java:233) at org.apache.hadoop.mapreduce.v2.api.impl.pb.service.MRClientProtocolPBServiceImpl.getJobReport(MRClientProtocolPBServiceImpl.java:122) at org.apache.hadoop.yarn.proto.MRClientProtocol$MRClientProtocolService$2.callBlockingMethod(MRClientProtocol.java:275) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) at java.security.AccessController.doPrivileged(AccessController.java:366) at javax.security.auth.Subject.doAs(Subject.java:572) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1567) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2407) Users are not allowed to view their own jobs, denied by JobACLsManager
[ https://issues.apache.org/jira/browse/YARN-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu Gao updated YARN-2407: - Description: Have a Hadoop 2.4.1 cluster with Yarn ACL enabled, and try to submit jobs as a non-admin user user1. The job could be finished successfully, but the running progress was not displayed correctly on the command-line, and I got following in the corresponding ApplicationMaster log: INFO [IPC Server handler 0 on 56717] org.apache.hadoop.ipc.Server: IPC Server handler 0 on 56717, call org.apache.hadoop.mapreduce.v2.api.MRClientProtocolPB.getJobReport from 9.30.95.26:61024 Call#59 Retry#0 org.apache.hadoop.security.AccessControlException: User user1 cannot perform operation VIEW_JOB on job_1407456690588_0003 at org.apache.hadoop.mapreduce.v2.app.client.MRClientService$MRClientProtocolHandler.verifyAndGetJob(MRClientService.java:191) at org.apache.hadoop.mapreduce.v2.app.client.MRClientService$MRClientProtocolHandler.getJobReport(MRClientService.java:233) at org.apache.hadoop.mapreduce.v2.api.impl.pb.service.MRClientProtocolPBServiceImpl.getJobReport(MRClientProtocolPBServiceImpl.java:122) at org.apache.hadoop.yarn.proto.MRClientProtocol$MRClientProtocolService$2.callBlockingMethod(MRClientProtocol.java:275) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) at java.security.AccessController.doPrivileged(AccessController.java:366) at javax.security.auth.Subject.doAs(Subject.java:572) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1567) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) was: Have a Hadoop 2.4.1 cluster with Yarn ACL enabled, and try to submit jobs as a non-admin user user1. 
The job could be finished successfully, but the running progress was not displayed correctly on the commad-line, and I got following in the corresponding ApplicationMaster log: INFO [IPC Server handler 0 on 56717] org.apache.hadoop.ipc.Server: IPC Server handler 0 on 56717, call org.apache.hadoop.mapreduce.v2.api.MRClientProtocolPB.getJobReport from 9.30.95.26:61024 Call#59 Retry#0 org.apache.hadoop.security.AccessControlException: User user1 cannot perform operation VIEW_JOB on job_1407456690588_0003 at org.apache.hadoop.mapreduce.v2.app.client.MRClientService$MRClientProtocolHandler.verifyAndGetJob(MRClientService.java:191) at org.apache.hadoop.mapreduce.v2.app.client.MRClientService$MRClientProtocolHandler.getJobReport(MRClientService.java:233) at org.apache.hadoop.mapreduce.v2.api.impl.pb.service.MRClientProtocolPBServiceImpl.getJobReport(MRClientProtocolPBServiceImpl.java:122) at org.apache.hadoop.yarn.proto.MRClientProtocol$MRClientProtocolService$2.callBlockingMethod(MRClientProtocol.java:275) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) at java.security.AccessController.doPrivileged(AccessController.java:366) at javax.security.auth.Subject.doAs(Subject.java:572) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1567) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) Users are not allowed to view their own jobs, denied by JobACLsManager -- Key: YARN-2407 URL: https://issues.apache.org/jira/browse/YARN-2407 Project: Hadoop YARN Issue Type: Bug Components: applications Affects Versions: 2.4.1 Reporter: Yu Gao Attachments: YARN-2407.patch Have a Hadoop 2.4.1 cluster with Yarn ACL enabled, and try to submit jobs as a non-admin user user1. The job could be finished successfully, but the running progress was not displayed correctly on the command-line, and I got following in the corresponding ApplicationMaster log: INFO [IPC Server handler 0 on 56717] org.apache.hadoop.ipc.Server: IPC Server handler 0 on 56717, call org.apache.hadoop.mapreduce.v2.api.MRClientProtocolPB.getJobReport from 9.30.95.26:61024 Call#59 Retry#0 org.apache.hadoop.security.AccessControlException: User user1 cannot perform operation VIEW_JOB on job_1407456690588_0003 at org.apache.hadoop.mapreduce.v2.app.client.MRClientService$MRClientProtocolHandler.verifyAndGetJob(MRClientService.java:191)
[jira] [Updated] (YARN-2373) WebAppUtils Should Use configuration.getPassword for Accessing SSL Passwords
[ https://issues.apache.org/jira/browse/YARN-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Larry McCay updated YARN-2373: -- Attachment: YARN-2373.patch Fixed the issue found by [~jianhe] and added a direct test of loadSslConfiguration. In order to test it, I had to add a new signature for loadSslConfiguration that accepts a Configuration instance providing the provider.path configuration for the CredentialProvider API. I made this new signature public static as well; I figured it may make sense for some consumers to provide their own configuration. Let me know if you would rather it not be made public. WebAppUtils Should Use configuration.getPassword for Accessing SSL Passwords Key: YARN-2373 URL: https://issues.apache.org/jira/browse/YARN-2373 Project: Hadoop YARN Issue Type: Bug Reporter: Larry McCay Attachments: YARN-2373.patch, YARN-2373.patch, YARN-2373.patch, YARN-2373.patch, YARN-2373.patch As part of HADOOP-10904, this jira represents a change to WebAppUtils to uptake the use of the credential provider API through the new method on Configuration called getPassword. This provides an alternative to storing the passwords in clear text within the ssl-server.xml file while maintaining backward compatibility with that behavior. -- This message was sent by Atlassian JIRA (v6.2#6252)
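A hedged sketch of what such an overload might look like; the configuration keys are standard Hadoop keys, but the method body is an assumption, not the attached patch:
{code}
import org.apache.hadoop.conf.Configuration;

class WebAppUtilsSketch {
  // Overload that accepts a caller-supplied Configuration so tests (and
  // other consumers) can inject the credential-provider path themselves.
  public static Configuration loadSslConfiguration(Configuration conf) {
    Configuration sslConf = new Configuration(false);
    sslConf.addResource(conf.get("hadoop.ssl.server.conf", "ssl-server.xml"));
    // Propagate the provider path so Configuration#getPassword can consult
    // the configured credential providers instead of clear text.
    String providers = conf.get("hadoop.security.credential.provider.path");
    if (providers != null) {
      sslConf.set("hadoop.security.credential.provider.path", providers);
    }
    return sslConf;
  }
}
{code}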
[jira] [Commented] (YARN-2407) Users are not allowed to view their own jobs, denied by JobACLsManager
[ https://issues.apache.org/jira/browse/YARN-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093543#comment-14093543 ] Hadoop QA commented on YARN-2407: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12661090/YARN-2407.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app: org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4594//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4594//console This message is automatically generated. Users are not allowed to view their own jobs, denied by JobACLsManager -- Key: YARN-2407 URL: https://issues.apache.org/jira/browse/YARN-2407 Project: Hadoop YARN Issue Type: Bug Components: applications Affects Versions: 2.4.1 Reporter: Yu Gao Attachments: YARN-2407.patch Have a Hadoop 2.4.1 cluster with Yarn ACL enabled, and try to submit jobs as a non-admin user user1. 
The job could be finished successfully, but the running progress was not displayed correctly on the command-line, and I got following in the corresponding ApplicationMaster log: INFO [IPC Server handler 0 on 56717] org.apache.hadoop.ipc.Server: IPC Server handler 0 on 56717, call org.apache.hadoop.mapreduce.v2.api.MRClientProtocolPB.getJobReport from 9.30.95.26:61024 Call#59 Retry#0 org.apache.hadoop.security.AccessControlException: User user1 cannot perform operation VIEW_JOB on job_1407456690588_0003 at org.apache.hadoop.mapreduce.v2.app.client.MRClientService$MRClientProtocolHandler.verifyAndGetJob(MRClientService.java:191) at org.apache.hadoop.mapreduce.v2.app.client.MRClientService$MRClientProtocolHandler.getJobReport(MRClientService.java:233) at org.apache.hadoop.mapreduce.v2.api.impl.pb.service.MRClientProtocolPBServiceImpl.getJobReport(MRClientProtocolPBServiceImpl.java:122) at org.apache.hadoop.yarn.proto.MRClientProtocol$MRClientProtocolService$2.callBlockingMethod(MRClientProtocol.java:275) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) at java.security.AccessController.doPrivileged(AccessController.java:366) at javax.security.auth.Subject.doAs(Subject.java:572) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1567) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1198) Capacity Scheduler headroom calculation does not work as expected
[ https://issues.apache.org/jira/browse/YARN-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093563#comment-14093563 ] Jason Lowe commented on YARN-1198: -- I think having a per-user-per-queue headroom computation and reusing it between applications for that user in that queue makes sense. I don't know of a case where the headroom of one app for a user in a queue should be computed differently than another app for the same user in the same queue. Capacity Scheduler headroom calculation does not work as expected - Key: YARN-1198 URL: https://issues.apache.org/jira/browse/YARN-1198 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi Attachments: YARN-1198.1.patch Today headroom calculation (for the app) takes place only when * New node is added/removed from the cluster * New container is getting assigned to the application. However there are potentially lot of situations which are not considered for this calculation * If a container finishes then headroom for that application will change and should be notified to the AM accordingly. * If a single user has submitted multiple applications (app1 and app2) to the same queue then ** If app1's container finishes then not only app1's but also app2's AM should be notified about the change in headroom. ** Similarly if a container is assigned to any applications app1/app2 then both AM should be notified about their headroom. ** To simplify the whole communication process it is ideal to keep headroom per User per LeafQueue so that everyone gets the same picture (apps belonging to same user and submitted in same queue). * If a new user submits an application to the queue then all applications submitted by all users in that queue should be notified of the headroom change. * Also today headroom is an absolute number ( I think it should be normalized but then this is going to be not backward compatible..) * Also when admin user refreshes queue headroom has to be updated. These all are the potential bugs in headroom calculations -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2399) FairScheduler: Merge AppSchedulable and FSSchedulerApp into FSAppAttempt
[ https://issues.apache.org/jira/browse/YARN-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093592#comment-14093592 ] Sandy Ryza commented on YARN-2399: -- I noticed in FSAppAttempt there are some instance variables mixed in with the functions. Not sure if it was like this already, but can we move them up to the top? FairScheduler: Merge AppSchedulable and FSSchedulerApp into FSAppAttempt Key: YARN-2399 URL: https://issues.apache.org/jira/browse/YARN-2399 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Affects Versions: 2.5.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: yarn-2399-1.patch, yarn-2399-2.patch FairScheduler has two data structures for an application, making the code hard to track. We should merge these for better maintainability in the long-term. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2317) Update documentation about how to write YARN applications
[ https://issues.apache.org/jira/browse/YARN-2317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093598#comment-14093598 ] Hadoop QA commented on YARN-2317: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12661059/YARN-2317-081114.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+0 tests included{color}. The patch appears to be a documentation patch that doesn't require tests. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4595//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4595//console This message is automatically generated. Update documentation about how to write YARN applications - Key: YARN-2317 URL: https://issues.apache.org/jira/browse/YARN-2317 Project: Hadoop YARN Issue Type: Sub-task Components: documentation Reporter: Li Lu Assignee: Li Lu Fix For: 2.6.0 Attachments: YARN-2317-071714.patch, YARN-2317-073014-1.patch, YARN-2317-073014.patch, YARN-2317-081114.patch Some information in WritingYarnApplications webpage is out-dated. Need some refresh work on this document to reflect the most recent changes in YARN APIs. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2399) FairScheduler: Merge AppSchedulable and FSSchedulerApp into FSAppAttempt
[ https://issues.apache.org/jira/browse/YARN-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14093597#comment-14093597 ] Sandy Ryza commented on YARN-2399: -- Also, can we move all the methods that implement methods in Schedulable together?
{code}
+  // TODO (KK): Rename these
{code}
Rename these?
{code}
-new ConcurrentHashMap<ApplicationId,SchedulerApplication<FSSchedulerApp>>();
+new ConcurrentHashMap<ApplicationId,SchedulerApplication<FSAppAttempt>>();
{code}
Mind adding a space here after ApplicationId, because you're fixing this line anyway?
{code}
+  private FSAppAttempt mockAppSched(long startTime) {
+    FSAppAttempt schedApp = mock(FSAppAttempt.class);
+    when(schedApp.getStartTime()).thenReturn(startTime);
+    return schedApp;
+  }
{code}
Call this mockAppAttempt? Otherwise, LGTM. FairScheduler: Merge AppSchedulable and FSSchedulerApp into FSAppAttempt Key: YARN-2399 URL: https://issues.apache.org/jira/browse/YARN-2399 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Affects Versions: 2.5.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: yarn-2399-1.patch, yarn-2399-2.patch FairScheduler has two data structures for an application, making the code hard to track. We should merge these for better maintainability in the long-term. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2373) WebAppUtils Should Use configuration.getPassword for Accessing SSL Passwords
[ https://issues.apache.org/jira/browse/YARN-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093626#comment-14093626 ] Hadoop QA commented on YARN-2373: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12661096/YARN-2373.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4596//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4596//console This message is automatically generated. WebAppUtils Should Use configuration.getPassword for Accessing SSL Passwords Key: YARN-2373 URL: https://issues.apache.org/jira/browse/YARN-2373 Project: Hadoop YARN Issue Type: Bug Reporter: Larry McCay Attachments: YARN-2373.patch, YARN-2373.patch, YARN-2373.patch, YARN-2373.patch, YARN-2373.patch As part of HADOOP-10904, this jira represents a change to WebAppUtils to uptake the use of the credential provider API through the new method on Configuration called getPassword. This provides an alternative to storing the passwords in clear text within the ssl-server.xml file while maintaining backward compatibility with that behavior. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Comment Edited] (YARN-1337) Recover containers upon nodemanager restart
[ https://issues.apache.org/jira/browse/YARN-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093640#comment-14093640 ] Jason Lowe edited comment on YARN-1337 at 8/12/14 1:54 AM: --- Thanks for taking another look, Junping. bq. Better to add javadoc for new added (or move from private) public method. I documented all of the NodeStatusUpdater methods and also the NMStateStoreService public methods that didn't already have javadocs. bq. volatile is unncessary as it was using AtomicBoolean already. Fixed. was (Author: jlowe): Thanks for taking another look, Junping. .bq Better to add javadoc for new added (or move from private) public method. I documented all of the NodeStatusUpdater methods and also the NMStateStoreService public methods that didn't already have javadocs. .bq volatile is unncessary as it was using AtomicBoolean already. Fixed. Recover containers upon nodemanager restart --- Key: YARN-1337 URL: https://issues.apache.org/jira/browse/YARN-1337 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.3.0 Reporter: Jason Lowe Assignee: Jason Lowe Attachments: YARN-1337-v1.patch, YARN-1337-v2.patch, YARN-1337-v3.patch To support work-preserving NM restart we need to recover the state of the containers when the nodemanager went down. This includes informing the RM of containers that have exited in the interim and a strategy for dealing with the exit codes from those containers along with how to reacquire the active containers and determine their exit codes when they terminate. The state of finished containers also needs to be recovered. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1337) Recover containers upon nodemanager restart
[ https://issues.apache.org/jira/browse/YARN-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-1337: - Attachment: YARN-1337-v3.patch Thanks for taking another look, Junping. bq. Better to add javadoc for newly added (or moved from private) public methods. I documented all of the NodeStatusUpdater methods and also the NMStateStoreService public methods that didn't already have javadocs. bq. volatile is unnecessary as it was using AtomicBoolean already. Fixed. Recover containers upon nodemanager restart --- Key: YARN-1337 URL: https://issues.apache.org/jira/browse/YARN-1337 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.3.0 Reporter: Jason Lowe Assignee: Jason Lowe Attachments: YARN-1337-v1.patch, YARN-1337-v2.patch, YARN-1337-v3.patch To support work-preserving NM restart we need to recover the state of the containers when the nodemanager went down. This includes informing the RM of containers that have exited in the interim and a strategy for dealing with the exit codes from those containers along with how to reacquire the active containers and determine their exit codes when they terminate. The state of finished containers also needs to be recovered. -- This message was sent by Atlassian JIRA (v6.2#6252)
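To make the recovery flow in the YARN-1337 description concrete, here is a hedged sketch, not the actual patch: the record fields and helper methods below are invented names, but the flow matches what is described, namely reacquiring live containers and reporting interim exits to the RM.
{code:java}
import java.util.List;

// Hypothetical illustration of work-preserving recovery; field and method
// names are assumptions, not the real NM classes.
public class ContainerRecoverySketch {
  static final int EXIT_CODE_LOST = -1; // placeholder, not a real YARN constant

  static class ContainerRecord {
    String containerId;
    long pid;           // process id persisted before the NM went down
    boolean completed;  // true if the container finished before the restart
    int exitCode;
  }

  void recover(List<ContainerRecord> records) {
    for (ContainerRecord r : records) {
      if (r.completed) {
        // Finished-container state was recovered; surface it to the RM.
        reportToRM(r.containerId, r.exitCode);
      } else if (isProcessAlive(r.pid)) {
        // Still running: reacquire and resume monitoring for its exit code.
        reacquire(r.containerId, r.pid);
      } else {
        // Exited while the NM was down; the true exit code is unknowable.
        reportToRM(r.containerId, EXIT_CODE_LOST);
      }
    }
  }

  boolean isProcessAlive(long pid) { return false; /* stub */ }
  void reacquire(String containerId, long pid) { /* stub */ }
  void reportToRM(String containerId, int exitCode) { /* stub */ }
}
{code}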
[jira] [Updated] (YARN-2378) Adding support for moving apps between queues in Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subramaniam Venkatraman Krishnan updated YARN-2378: --- Attachment: YARN-2378.patch Adding support for moving apps between queues in Capacity Scheduler --- Key: YARN-2378 URL: https://issues.apache.org/jira/browse/YARN-2378 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Subramaniam Venkatraman Krishnan Assignee: Subramaniam Venkatraman Krishnan Labels: capacity-scheduler Attachments: YARN-2378.patch, YARN-2378.patch, YARN-2378.patch As discussed with [~leftnoteasy] and [~jianhe], we are breaking up YARN-1707 to smaller patches for manageability. This JIRA will address adding support for moving apps between queues in Capacity Scheduler. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2378) Adding support for moving apps between queues in Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093648#comment-14093648 ] Subramaniam Venkatraman Krishnan commented on YARN-2378: Thanks [~vvasudev] for resolving the host issue. The only test case that failed, TestAMRestart, passes consistently for me. Thanks for your feedback [~leftnoteasy]. I am uploading a new patch that addresses all your comments. Additionally, based on our offline discussion and comments in YARN-807, I have also added pending apps to CapacityScheduler#getAppsInQueue() and refactored moveAllApps into AbstractYarnScheduler. I ran all the relevant test cases and things look good. Adding support for moving apps between queues in Capacity Scheduler --- Key: YARN-2378 URL: https://issues.apache.org/jira/browse/YARN-2378 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Subramaniam Venkatraman Krishnan Assignee: Subramaniam Venkatraman Krishnan Labels: capacity-scheduler Attachments: YARN-2378.patch, YARN-2378.patch, YARN-2378.patch As discussed with [~leftnoteasy] and [~jianhe], we are breaking up YARN-1707 to smaller patches for manageability. This JIRA will address adding support for moving apps between queues in Capacity Scheduler. -- This message was sent by Atlassian JIRA (v6.2#6252)
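For readers following the comment above, a hedged sketch of what a scheduler-agnostic moveAllApps could look like; the types are simplified stand-ins, not the actual AbstractYarnScheduler code:
{code:java}
import java.util.List;

// Simplified stand-in types: the real scheduler uses ApplicationAttemptId and
// YarnException, and per the comment above getAppsInQueue() now includes
// pending apps as well as running ones.
abstract class AbstractSchedulerSketch {
  abstract List<String> getAppsInQueue(String queueName);
  abstract void moveApplication(String appId, String targetQueue);

  /** Move every app (running and pending) from one queue to another. */
  public void moveAllApps(String sourceQueue, String destQueue) {
    List<String> apps = getAppsInQueue(sourceQueue);
    if (apps == null) {
      throw new IllegalArgumentException("Unknown queue: " + sourceQueue);
    }
    for (String appId : apps) {
      moveApplication(appId, destQueue);
    }
  }
}
{code}
Keeping the loop in a shared base class is what lets each concrete scheduler supply only the queue lookup and single-app move.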
[jira] [Commented] (YARN-2331) Distinguish shutdown during supervision vs. shutdown for rolling upgrade
[ https://issues.apache.org/jira/browse/YARN-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093655#comment-14093655 ] Junping Du commented on YARN-2331: -- [~jlowe], for a rolling upgrade when the NM is not supervised, I think another way is to add an RM Admin command line that brings down a specific NM without killing its containers (by notifying the RMNode and heartbeating back), given there is no admin port on the NM so far. An NM shutdown without supervision (whether a decommission or an occasional failure) wouldn't trigger this CLI, and so wouldn't preserve running containers. Thoughts? Distinguish shutdown during supervision vs. shutdown for rolling upgrade Key: YARN-2331 URL: https://issues.apache.org/jira/browse/YARN-2331 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.6.0 Reporter: Jason Lowe When the NM is shutting down with restart support enabled there are scenarios we'd like to distinguish and behave accordingly: # The NM is running under supervision. In that case containers should be preserved so the automatic restart can recover them. # The NM is not running under supervision and a rolling upgrade is not being performed. In that case the shutdown should kill all containers since it is unlikely the NM will be restarted in a timely manner to recover them. # The NM is not running under supervision and a rolling upgrade is being performed. In that case the shutdown should not kill all containers since a restart is imminent due to the rolling upgrade and the containers will be recovered. -- This message was sent by Atlassian JIRA (v6.2#6252)
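The three scenarios in the YARN-2331 description boil down to a small decision; a sketch under the assumption of two invented flags (the real patch may wire this differently):
{code:java}
// Flag names are invented for illustration only.
public final class ShutdownPolicySketch {
  /**
   * Scenarios 1 and 3 preserve containers (a supervised restart or a rolling
   * upgrade will recover them); scenario 2 kills them, since no timely
   * restart is expected.
   */
  public static boolean preserveContainersOnShutdown(
      boolean underSupervision, boolean rollingUpgradeInProgress) {
    return underSupervision || rollingUpgradeInProgress;
  }
}
{code}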
[jira] [Updated] (YARN-2033) Investigate merging generic-history into the Timeline Store
[ https://issues.apache.org/jira/browse/YARN-2033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-2033: -- Attachment: YARN-2033.5.patch Rebased the patch against the latest trunk and made ApplicationHistoryManagerOnTimelineStore throw NotFoundException instead of returning null, to be consistent with the existing behavior. The patch also includes some minor improvements. Investigate merging generic-history into the Timeline Store --- Key: YARN-2033 URL: https://issues.apache.org/jira/browse/YARN-2033 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Zhijie Shen Attachments: ProposalofStoringYARNMetricsintotheTimelineStore.pdf, YARN-2033.1.patch, YARN-2033.2.patch, YARN-2033.3.patch, YARN-2033.4.patch, YARN-2033.5.patch, YARN-2033.Prototype.patch, YARN-2033_ALL.1.patch, YARN-2033_ALL.2.patch, YARN-2033_ALL.3.patch, YARN-2033_ALL.4.patch Having two different stores isn't amenable to generic insights on what's happening with applications. This is to investigate porting generic-history into the Timeline Store. One goal is to try to keep most of the client-side interfaces as close as possible to what we have today. -- This message was sent by Atlassian JIRA (v6.2#6252)
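A hedged sketch of the null-to-exception change described in the update above; the method shape and the lookup helper are assumptions, not the actual ApplicationHistoryManagerOnTimelineStore code:
{code:java}
import org.apache.hadoop.yarn.webapp.NotFoundException;

// Illustrative only: raise NotFoundException rather than returning null so
// callers observe the same behavior as the existing generic-history service.
class HistoryLookupSketch {
  Object getApplication(String appId) {
    Object entity = lookupTimelineEntity(appId); // placeholder lookup
    if (entity == null) {
      throw new NotFoundException(
          "The entity for application " + appId + " doesn't exist in the timeline store");
    }
    return entity;
  }

  Object lookupTimelineEntity(String appId) { return null; /* stub */ }
}
{code}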
[jira] [Commented] (YARN-1337) Recover containers upon nodemanager restart
[ https://issues.apache.org/jira/browse/YARN-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093684#comment-14093684 ] Junping Du commented on YARN-1337: -- Latest patch looks good to me. +1 pending on Jenkins' test. Recover containers upon nodemanager restart --- Key: YARN-1337 URL: https://issues.apache.org/jira/browse/YARN-1337 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.3.0 Reporter: Jason Lowe Assignee: Jason Lowe Attachments: YARN-1337-v1.patch, YARN-1337-v2.patch, YARN-1337-v3.patch To support work-preserving NM restart we need to recover the state of the containers when the nodemanager went down. This includes informing the RM of containers that have exited in the interim and a strategy for dealing with the exit codes from those containers along with how to reacquire the active containers and determine their exit codes when they terminate. The state of finished containers also needs to be recovered. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2378) Adding support for moving apps between queues in Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093693#comment-14093693 ] Hadoop QA commented on YARN-2378: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12661117/YARN-2378.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4597//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4597//console This message is automatically generated. Adding support for moving apps between queues in Capacity Scheduler --- Key: YARN-2378 URL: https://issues.apache.org/jira/browse/YARN-2378 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Subramaniam Venkatraman Krishnan Assignee: Subramaniam Venkatraman Krishnan Labels: capacity-scheduler Attachments: YARN-2378.patch, YARN-2378.patch, YARN-2378.patch As discussed with [~leftnoteasy] and [~jianhe], we are breaking up YARN-1707 to smaller patches for manageability. This JIRA will address adding support for moving apps between queues in Capacity Scheduler. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Synchronization among Mappers in map-reduce task
Hi Folks,

I have been writing a map-reduce application where the input file contains records, and every field in a record is separated by a delimiter. In addition, the user provides a list of columns that he wants to look up in a master properties file (stored in HDFS). If such a column (let's say it is a key) is present in the master properties file, the code gets the corresponding value and updates the key with this value in the record. If the key is not present in the master properties file, the code creates a new value for this key, writes it to the properties file, and also updates the record. I have written this application and tested it, and everything has worked fine so far.

*e.g.:* *I/P Record:* This | is | the | test | record *Columns:* 2,4 (that means the code will look up only the fields *is* and *test* in the master properties file.)

Here, I have a question. *Q 1:* In the case where my input file is huge and is split across multiple mappers, I was getting the exception below, where all the other mapper tasks were failing. *Also, initially when I started the job, my master properties file was empty.* In my code I have a check: if this file (master properties) doesn't exist, create a new empty file before submitting the job itself. e.g.: if I have 4 splits of data, then 3 map tasks fail. But after this, all the failed map tasks restart and the job eventually succeeds.

So, *here is the question: is it possible to make sure that when one of the mapper tasks is writing to a file, the others wait until the first one is finished?* I have read that mapper tasks don't interact with each other. Also, what will happen in the scenario where I start multiple parallel map-reduce jobs and all of them work on the same properties file? *Is there any way to have synchronization between two independent map-reduce jobs?* I have also read that ZooKeeper can be used in such scenarios. Is that correct?
Error: com.techidiocy.hadoop.filesystem.api.exceptions.HDFSFileSystemException: IOException - failed while appending data to the file - Failed to create file [/user/cloudera/lob/master/bank.properties] for [DFSClient_attempt_1407778869492_0032_m_02_0_1618418105_1] on client [10.X.X.17], because this file is already being created by [DFSClient_attempt_1407778869492_0032_m_05_0_-949968337_1] on [10.X.X.17]
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2548)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInternal(FSNamesystem.java:2377)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInt(FSNamesystem.java:2612)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:2575)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.append(NameNodeRpcServer.java:522)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.append(ClientNamenodeProtocolServerSideTranslatorPB.java:373)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)
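On the synchronization question above: mappers indeed cannot coordinate through MapReduce itself, and the exception is HDFS's single-writer lease check rejecting a second concurrent append to the same file. If the shared-file design is kept, one common approach is a distributed lock in ZooKeeper, e.g. via Apache Curator; a minimal sketch, where the connection string and lock path are placeholders:
{code:java}
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.locks.InterProcessMutex;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class PropertiesFileLock {
  public static void main(String[] args) throws Exception {
    // Connect to the ZooKeeper ensemble (placeholder address).
    CuratorFramework client = CuratorFrameworkFactory.newClient(
        "zk1:2181", new ExponentialBackoffRetry(1000, 3));
    client.start();

    // All mappers (and independent jobs) locking the same path serialize here,
    // even across JVMs and nodes.
    InterProcessMutex lock = new InterProcessMutex(client, "/locks/master-properties");
    lock.acquire();
    try {
      // Critical section: read and append the shared properties file in HDFS.
    } finally {
      lock.release();
      client.close();
    }
  }
}
{code}
Note that this serializes all writers and will throttle the job. An often simpler design is to avoid the shared mutable file entirely, for example by having mappers emit the unresolved keys and merging them into the properties file in a single reducer or a post-processing step.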
[jira] [Commented] (YARN-2378) Adding support for moving apps between queues in Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093695#comment-14093695 ] Wangda Tan commented on YARN-2378: -- [~subru], I've run the previously failed test locally and it passed, matching the latest Jenkins result. LGTM, +1. [~jianhe], would you like to take a look at this? Thanks, Wangda Adding support for moving apps between queues in Capacity Scheduler --- Key: YARN-2378 URL: https://issues.apache.org/jira/browse/YARN-2378 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Subramaniam Venkatraman Krishnan Assignee: Subramaniam Venkatraman Krishnan Labels: capacity-scheduler Attachments: YARN-2378.patch, YARN-2378.patch, YARN-2378.patch As discussed with [~leftnoteasy] and [~jianhe], we are breaking up YARN-1707 to smaller patches for manageability. This JIRA will address adding support for moving apps between queues in Capacity Scheduler. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-415) Capture memory utilization at the app-level for chargeback
[ https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093696#comment-14093696 ] Wangda Tan commented on YARN-415: - [~jianhe], would you like to take a look at it? Capture memory utilization at the app-level for chargeback -- Key: YARN-415 URL: https://issues.apache.org/jira/browse/YARN-415 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager Affects Versions: 0.23.6 Reporter: Kendall Thrapp Assignee: Andrey Klochkov Attachments: YARN-415--n10.patch, YARN-415--n2.patch, YARN-415--n3.patch, YARN-415--n4.patch, YARN-415--n5.patch, YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, YARN-415--n9.patch, YARN-415.201405311749.txt, YARN-415.201406031616.txt, YARN-415.201406262136.txt, YARN-415.201407042037.txt, YARN-415.201407071542.txt, YARN-415.201407171553.txt, YARN-415.201407172144.txt, YARN-415.201407232237.txt, YARN-415.201407242148.txt, YARN-415.201407281816.txt, YARN-415.201408062232.txt, YARN-415.201408080204.txt, YARN-415.201408092006.txt, YARN-415.patch For the purpose of chargeback, I'd like to be able to compute the cost of an application in terms of cluster resource usage. To start out, I'd like to get the memory utilization of an application. The unit should be MB-seconds or something similar and, from a chargeback perspective, the memory amount should be the memory reserved for the application, as even if the app didn't use all that memory, no one else was able to use it. (reserved ram for container 1 * lifetime of container 1) + (reserved ram for container 2 * lifetime of container 2) + ... + (reserved ram for container n * lifetime of container n) It'd be nice to have this at the app level instead of the job level because: 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't appear on the job history server). 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm). This new metric should be available both through the RM UI and RM Web Services REST API. -- This message was sent by Atlassian JIRA (v6.2#6252)
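A worked example of the chargeback formula in the YARN-415 description above; ContainerUsage is a hypothetical record for this sketch, not a YARN type:
{code:java}
import java.util.Arrays;
import java.util.List;

public class ChargebackSketch {
  // Hypothetical per-container usage record.
  static class ContainerUsage {
    final long reservedMb;
    final long lifetimeSeconds;
    ContainerUsage(long reservedMb, long lifetimeSeconds) {
      this.reservedMb = reservedMb;
      this.lifetimeSeconds = lifetimeSeconds;
    }
  }

  /** Sum of (reserved memory * lifetime) over all containers, in MB-seconds. */
  static long memoryMbSeconds(List<ContainerUsage> containers) {
    long total = 0;
    for (ContainerUsage c : containers) {
      total += c.reservedMb * c.lifetimeSeconds; // reserved, not actually used
    }
    return total;
  }

  public static void main(String[] args) {
    // Two containers: 2048 MB for 600 s and 1024 MB for 300 s
    // => 2048*600 + 1024*300 = 1,536,000 MB-seconds.
    System.out.println(memoryMbSeconds(Arrays.asList(
        new ContainerUsage(2048, 600), new ContainerUsage(1024, 300))));
  }
}
{code}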
[jira] [Commented] (YARN-2308) NPE happened when RM restart after CapacityScheduler queue configuration changed
[ https://issues.apache.org/jira/browse/YARN-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093714#comment-14093714 ] Wangda Tan commented on YARN-2308: -- [~lichangleo], thanks for updating. I think the following line is not necessary: bq. +conf.setBoolean(YarnConfiguration.RM_WORK_PRESERVING_RECOVERY_ENABLED, true); I just tried it locally; removing it should be fine. Besides this, LGTM, +1. [~zjshen], could you take a look at this? Thanks, Wangda NPE happened when RM restart after CapacityScheduler queue configuration changed - Key: YARN-2308 URL: https://issues.apache.org/jira/browse/YARN-2308 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Affects Versions: 2.6.0 Reporter: Wangda Tan Assignee: chang li Priority: Critical Attachments: jira2308.patch, jira2308.patch, jira2308.patch I encountered an NPE when the RM restarted {code}
2014-07-16 07:22:46,957 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type APP_ATTEMPT_ADDED to the scheduler
java.lang.NullPointerException
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:566)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:922)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:98)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:594)
	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:654)
	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:85)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:698)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:682)
	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
	at java.lang.Thread.run(Thread.java:744)
{code} and the RM then fails to restart. This is caused by a queue configuration change: I removed some queues and added new ones. So when the RM restarts, it tries to recover past applications, and when the queue of any of these applications has been removed, an NPE is raised. -- This message was sent by Atlassian JIRA (v6.2#6252)
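The failure mode above reduces to an unchecked queue lookup during recovery; a hedged sketch of the shape of the bug and one possible guard (names only approximate the CapacityScheduler, and this is not the actual code or fix):
{code:java}
import java.util.Map;

// Approximate shape only; the real code lives in
// CapacityScheduler#addApplicationAttempt.
class QueueLookupSketch {
  Map<String, Object> queues; // queue name -> queue object

  void addApplicationAttempt(String queueName, String attemptId) {
    Object queue = queues.get(queueName); // null if the queue was removed from config
    if (queue == null) {
      // Without a guard like this, dereferencing 'queue' below throws the
      // NPE seen in the stack trace and recovery aborts the whole RM. One
      // possible fix is to fail just the recovered app instead of crashing.
      throw new IllegalStateException("Queue " + queueName
          + " no longer exists; cannot recover attempt " + attemptId);
    }
    // queue.submitApplicationAttempt(...);
  }
}
{code}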
[jira] [Created] (YARN-2408) Resource Request REST API for YARN
Renan DelValle created YARN-2408: Summary: Resource Request REST API for YARN Key: YARN-2408 URL: https://issues.apache.org/jira/browse/YARN-2408 Project: Hadoop YARN Issue Type: New Feature Components: webapp Reporter: Renan DelValle Priority: Minor I’m proposing a new REST API for YARN which exposes a snapshot of the Resource Requests that exist inside of the Scheduler. My motivation behind this new feature is to allow external software to monitor the amount of resources being requested, to gain more insight into cluster usage than is already provided. The API can also be used by external software to detect a starved application and alert the appropriate users and/or sys admin so that the problem may be remedied. Here is the proposed API: {code:xml}
<resourceRequests>
  <MB>96256</MB>
  <VCores>94</VCores>
  <appMaster>
    <applicationId>application_</applicationId>
    <applicationAttemptId>appattempt_</applicationAttemptId>
    <queueName>default</queueName>
    <totalPendingMB>96256</totalPendingMB>
    <totalPendingVCores>94</totalPendingVCores>
    <numResourceRequests>3</numResourceRequests>
    <resourceRequests>
      <request>
        <MB>1024</MB>
        <VCores>1</VCores>
        <resourceName>/default-rack</resourceName>
        <numContainers>94</numContainers>
        <relaxLocality>true</relaxLocality>
        <priority>20</priority>
      </request>
      <request>
        <MB>1024</MB>
        <VCores>1</VCores>
        <resourceName>*</resourceName>
        <numContainers>94</numContainers>
        <relaxLocality>true</relaxLocality>
        <priority>20</priority>
      </request>
      <request>
        <MB>1024</MB>
        <VCores>1</VCores>
        <resourceName>master</resourceName>
        <numContainers>94</numContainers>
        <relaxLocality>true</relaxLocality>
        <priority>20</priority>
      </request>
    </resourceRequests>
  </appMaster>
</resourceRequests>
{code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2408) Resource Request REST API for YARN
[ https://issues.apache.org/jira/browse/YARN-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Renan DelValle updated YARN-2408: - Attachment: YARN-2408.patch Resource Request REST API for YARN -- Key: YARN-2408 URL: https://issues.apache.org/jira/browse/YARN-2408 Project: Hadoop YARN Issue Type: New Feature Components: webapp Reporter: Renan DelValle Priority: Minor Labels: features Attachments: YARN-2408.patch I’m proposing a new REST API for YARN which exposes a snapshot of the Resource Requests that exist inside of the Scheduler. My motivation behind this new feature is to allow external software to monitor the amount of resources being requested, to gain more insight into cluster usage than is already provided. The API can also be used by external software to detect a starved application and alert the appropriate users and/or sys admin so that the problem may be remedied. Here is the proposed API: {code:xml}
<resourceRequests>
  <MB>96256</MB>
  <VCores>94</VCores>
  <appMaster>
    <applicationId>application_</applicationId>
    <applicationAttemptId>appattempt_</applicationAttemptId>
    <queueName>default</queueName>
    <totalPendingMB>96256</totalPendingMB>
    <totalPendingVCores>94</totalPendingVCores>
    <numResourceRequests>3</numResourceRequests>
    <resourceRequests>
      <request>
        <MB>1024</MB>
        <VCores>1</VCores>
        <resourceName>/default-rack</resourceName>
        <numContainers>94</numContainers>
        <relaxLocality>true</relaxLocality>
        <priority>20</priority>
      </request>
      <request>
        <MB>1024</MB>
        <VCores>1</VCores>
        <resourceName>*</resourceName>
        <numContainers>94</numContainers>
        <relaxLocality>true</relaxLocality>
        <priority>20</priority>
      </request>
      <request>
        <MB>1024</MB>
        <VCores>1</VCores>
        <resourceName>master</resourceName>
        <numContainers>94</numContainers>
        <relaxLocality>true</relaxLocality>
        <priority>20</priority>
      </request>
    </resourceRequests>
  </appMaster>
</resourceRequests>
{code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1337) Recover containers upon nodemanager restart
[ https://issues.apache.org/jira/browse/YARN-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093733#comment-14093733 ] Hadoop QA commented on YARN-1337: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12661113/YARN-1337-v3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 9 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4598//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4598//console This message is automatically generated. Recover containers upon nodemanager restart --- Key: YARN-1337 URL: https://issues.apache.org/jira/browse/YARN-1337 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.3.0 Reporter: Jason Lowe Assignee: Jason Lowe Attachments: YARN-1337-v1.patch, YARN-1337-v2.patch, YARN-1337-v3.patch To support work-preserving NM restart we need to recover the state of the containers when the nodemanager went down. This includes informing the RM of containers that have exited in the interim and a strategy for dealing with the exit codes from those containers along with how to reacquire the active containers and determine their exit codes when they terminate. The state of finished containers also needs to be recovered. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2033) Investigate merging generic-history into the Timeline Store
[ https://issues.apache.org/jira/browse/YARN-2033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14093742#comment-14093742 ] Hadoop QA commented on YARN-2033: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12661126/YARN-2033.5.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 16 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4599//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4599//console This message is automatically generated. Investigate merging generic-history into the Timeline Store --- Key: YARN-2033 URL: https://issues.apache.org/jira/browse/YARN-2033 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Zhijie Shen Attachments: ProposalofStoringYARNMetricsintotheTimelineStore.pdf, YARN-2033.1.patch, YARN-2033.2.patch, YARN-2033.3.patch, YARN-2033.4.patch, YARN-2033.5.patch, YARN-2033.Prototype.patch, YARN-2033_ALL.1.patch, YARN-2033_ALL.2.patch, YARN-2033_ALL.3.patch, YARN-2033_ALL.4.patch Having two different stores isn't amenable to generic insights on what's happening with applications. This is to investigate porting generic-history into the Timeline Store. One goal is to try to keep most of the client-side interfaces as close as possible to what we have today. -- This message was sent by Atlassian JIRA (v6.2#6252)