[jira] [Commented] (YARN-1813) Better error message for yarn logs when permission denied
[ https://issues.apache.org/jira/browse/YARN-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186428#comment-14186428 ] Hadoop QA commented on YARN-1813: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12661008/YARN-1813.4.patch against trunk revision 971e91c. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5600//console This message is automatically generated. Better error message for yarn logs when permission denied --- Key: YARN-1813 URL: https://issues.apache.org/jira/browse/YARN-1813 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.3.0 Reporter: Andrew Wang Assignee: Tsuyoshi OZAWA Priority: Minor Attachments: YARN-1813.1.patch, YARN-1813.2.patch, YARN-1813.2.patch, YARN-1813.3.patch, YARN-1813.4.patch I ran some MR jobs as the hdfs user, and then forgot to sudo -u when grabbing the logs. yarn logs prints an error message like the following: {noformat} [andrew.wang@a2402 ~]$ yarn logs -applicationId application_1394482121761_0010 14/03/10 16:05:10 INFO client.RMProxy: Connecting to ResourceManager at a2402.halxg.cloudera.com/10.20.212.10:8032 Logs not available at /tmp/logs/andrew.wang/logs/application_1394482121761_0010 Log aggregation has not completed or is not enabled. {noformat} It'd be nicer if it said Permission denied or AccessControlException or something like that instead, since that's the real issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2753) Shouldn't change the value in labelCollections if the key already exists and potential NPE at CommonNodeLabelsManager.
[ https://issues.apache.org/jira/browse/YARN-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-2753: Description: CommonNodeLabelsManager#addToCluserNodeLabels should not change the value in labelCollections if the key already exists; otherwise Label.resource will be changed (reset). There is also a potential NPE (NullPointerException) in checkRemoveLabelsFromNode of CommonNodeLabelsManager, because when a Node is created, Node.labels can be null; in this case nm.labels may be null, so we need to check that originalLabels is not null before using it (originalLabels.containsAll). was: Potential NPE (NullPointerException) in checkRemoveLabelsFromNode of CommonNodeLabelsManager, because when a Node is created, Node.labels can be null; in this case nm.labels may be null, so we need to check that originalLabels is not null before using it (originalLabels.containsAll). Summary: Shouldn't change the value in labelCollections if the key already exists and potential NPE at CommonNodeLabelsManager. (was: potential NPE in checkRemoveLabelsFromNode of CommonNodeLabelsManager) Shouldn't change the value in labelCollections if the key already exists and potential NPE at CommonNodeLabelsManager. -- Key: YARN-2753 URL: https://issues.apache.org/jira/browse/YARN-2753 Project: Hadoop YARN Issue Type: Sub-task Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-2753.000.patch, YARN-2753.001.patch, YARN-2753.002.patch, YARN-2753.003.patch CommonNodeLabelsManager#addToCluserNodeLabels should not change the value in labelCollections if the key already exists; otherwise Label.resource will be changed (reset). There is also a potential NPE (NullPointerException) in checkRemoveLabelsFromNode of CommonNodeLabelsManager, because when a Node is created, Node.labels can be null; in this case nm.labels may be null, so we need to check that originalLabels is not null before using it (originalLabels.containsAll). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2753) Shouldn't change the value in labelCollections if the key already exists and potential NPE at CommonNodeLabelsManager.
[ https://issues.apache.org/jira/browse/YARN-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-2753: Description: CommonNodeLabelsManager#addToCluserNodeLabels should not change the value in labelCollections if the key already exists; otherwise Label.resource will be changed (reset). There is also a potential NPE (NullPointerException) in checkRemoveLabelsFromNode of CommonNodeLabelsManager, because when a Node is created, Node.labels can be null; in this case nm.labels may be null, so we need to check that originalLabels is not null before using it (originalLabels.containsAll). was: CommonNodeLabelsManager#addToCluserNodeLabels should not change the value in labelCollections if the key already exists; otherwise Label.resource will be changed (reset). There is also a potential NPE (NullPointerException) in checkRemoveLabelsFromNode of CommonNodeLabelsManager, because when a Node is created, Node.labels can be null; in this case nm.labels may be null, so we need to check that originalLabels is not null before using it (originalLabels.containsAll). Shouldn't change the value in labelCollections if the key already exists and potential NPE at CommonNodeLabelsManager. -- Key: YARN-2753 URL: https://issues.apache.org/jira/browse/YARN-2753 Project: Hadoop YARN Issue Type: Sub-task Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-2753.000.patch, YARN-2753.001.patch, YARN-2753.002.patch, YARN-2753.003.patch CommonNodeLabelsManager#addToCluserNodeLabels should not change the value in labelCollections if the key already exists; otherwise Label.resource will be changed (reset). There is also a potential NPE (NullPointerException) in checkRemoveLabelsFromNode of CommonNodeLabelsManager, because when a Node is created, Node.labels can be null; in this case nm.labels may be null, so we need to check that originalLabels is not null before using it (originalLabels.containsAll). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
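A minimal sketch of the null guard described above, with illustrative names rather than the exact code from the attached patches: checkRemoveLabelsFromNode has to tolerate a freshly created Node whose label set is still null before calling containsAll.

{code}
// Illustrative only (hypothetical signature): guard against a null label set so a
// node created without labels does not trigger a NullPointerException here.
private void checkRemoveLabelsFromNode(NodeId nodeId, Set<String> originalLabels,
    Set<String> labelsToRemove) throws IOException {
  if (originalLabels == null || !originalLabels.containsAll(labelsToRemove)) {
    throw new IOException("Cannot remove labels " + labelsToRemove + " from node "
        + nodeId + ", because the node does not carry all of them");
  }
}
{code}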
[jira] [Updated] (YARN-2759) addToCluserNodeLabels should not change the value in labelCollections if the key already exists to avoid the Label.resource being reset.
[ https://issues.apache.org/jira/browse/YARN-2759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-2759: Description: addToCluserNodeLabels should not change the value in labelCollections if the key already exists to avoid the Label.resource being reset. (was: addToCluserNodeLabels should not change the value in labelCollections if the key already exists to avoid the Label.resource is reset.) Summary: addToCluserNodeLabels should not change the value in labelCollections if the key already exists to avoid the Label.resource being reset. (was: addToCluserNodeLabels should not change the value in labelCollections if the key already exists to avoid the Label.resource is reset.) addToCluserNodeLabels should not change the value in labelCollections if the key already exists to avoid the Label.resource being reset. Key: YARN-2759 URL: https://issues.apache.org/jira/browse/YARN-2759 Project: Hadoop YARN Issue Type: Bug Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-2759.000.patch addToCluserNodeLabels should not change the value in labelCollections if the key already exists to avoid the Label.resource being reset. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
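A minimal sketch of the guard this issue asks for, assuming labelCollections maps a label name to a Label object that accumulates a Resource; the names are illustrative, not the attached patch:

{code}
// Illustrative only: create an entry only for labels that are not already known,
// so re-adding an existing label does not reset the resource tracked for it.
for (String label : labelsToAdd) {
  if (!labelCollections.containsKey(label)) {
    labelCollections.put(label, new Label(label));
  }
  // An existing entry (and its Label.resource) is left untouched.
}
{code}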
[jira] [Commented] (YARN-1964) Create Docker analog of the LinuxContainerExecutor in YARN
[ https://issues.apache.org/jira/browse/YARN-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14186446#comment-14186446 ] Janos Matyas commented on YARN-1964: Hi Abin, I applied the changes to the image the next day, regarding your request - sorry for being late; Docker.io does not send notifications about comments. Should you need anything in the future please drop me a direct email - janos.mat...@sequenceiq.com - or open a GitHub issue. Janos Create Docker analog of the LinuxContainerExecutor in YARN -- Key: YARN-1964 URL: https://issues.apache.org/jira/browse/YARN-1964 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 2.2.0 Reporter: Arun C Murthy Assignee: Abin Shahab Attachments: YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, yarn-1964-branch-2.2.0-docker.patch, yarn-1964-branch-2.2.0-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch Docker (https://www.docker.io/) is, increasingly, a very popular container technology. In the context of YARN, support for Docker will provide a very elegant solution to allow applications to *package* their software into a Docker container (entire Linux file system incl. custom versions of perl, python etc.) and use it as a blueprint to launch all their YARN containers with the requisite software environment. This provides both consistency (all YARN containers will have the same software environment) and isolation (no interference with whatever is installed on the physical machine). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2669) FairScheduler: queueName shouldn't allow periods in the allocation.xml
[ https://issues.apache.org/jira/browse/YARN-2669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14186448#comment-14186448 ] bc Wong commented on YARN-2669: --- Replacing . with \_dot\_ sounds fine here. While it doesn't eliminate collision, it makes it unlikely. Again, I'd leave it for another patch to do the real fix, which is more involved. FairScheduler: queueName shouldn't allow periods in the allocation.xml --- Key: YARN-2669 URL: https://issues.apache.org/jira/browse/YARN-2669 Project: Hadoop YARN Issue Type: Improvement Reporter: Wei Yan Assignee: Wei Yan Priority: Minor Attachments: YARN-2669-1.patch, YARN-2669-2.patch, YARN-2669-3.patch For an allocation file like: {noformat} <allocations> <queue name="root.q1"> <minResources>4096mb,4vcores</minResources> </queue> </allocations> {noformat} Users may wish to configure minResources for a queue with the full path root.q1. However, right now, the fair scheduler will treat this configuration as belonging to the queue with the full name root.root.q1. We need to print out a warning message to notify users about this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
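A minimal sketch of the period-escaping idea discussed in the comment above; the helper name is hypothetical:

{code}
// Illustrative only: a user name such as "andrew.wang" would otherwise be parsed as
// nested queues "andrew" -> "wang"; escaping the period keeps it a single component.
static String escapeQueueComponent(String name) {
  return name.replaceAll("\\.", "_dot_");
}
// escapeQueueComponent("andrew.wang") returns "andrew_dot_wang"
{code}

As noted, this does not strictly eliminate collisions (a user literally named andrew_dot_wang would still clash), it only makes them unlikely.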
[jira] [Updated] (YARN-2762) Provide RMAdminCLI args validation for NodeLabelManager operations
[ https://issues.apache.org/jira/browse/YARN-2762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-2762: - Attachment: YARN-2762.patch Provide RMAdminCLI args validation for NodeLabelManager operations -- Key: YARN-2762 URL: https://issues.apache.org/jira/browse/YARN-2762 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Rohith Assignee: Rohith Priority: Minor Attachments: YARN-2762.patch All NodeLabel args validation's are done at server side. The same can be done at RMAdminCLI so that unnecessary RPC calls can be avoided. And for the input such as x,y,,z,, no need to add empty string instead can be skipped. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2741) Windows: Node manager cannot serve up log files via the web user interface when yarn.nodemanager.log-dirs to any drive letter other than C: (or, the drive that nodemanager is running on)
[ https://issues.apache.org/jira/browse/YARN-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186494#comment-14186494 ] Varun Vasudev commented on YARN-2741: - +1, patch looks good. Windows: Node manager cannot serve up log files via the web user interface when yarn.nodemanager.log-dirs to any drive letter other than C: (or, the drive that nodemanager is running on) -- Key: YARN-2741 URL: https://issues.apache.org/jira/browse/YARN-2741 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Environment: Windows Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-2741.1.patch, YARN-2741.6.patch PROBLEM: User is getting No Logs available for Container Container_number when setting the yarn.nodemanager.log-dirs to any drive letter other than C: STEPS TO REPRODUCE: On Windows 1) Run NodeManager on C: 2) Create two local drive partitions D: and E: 3) Put yarn.nodemanager.log-dirs = D:\nmlogs or E:\nmlogs 4) Run a MR job that will last at least 5 minutes 5) While the job is in flight, log into the Yarn web ui , resource_manager_server:8088/cluster 6) Click on the application_idnumber 7) Click on the logs link, you will get No Logs available for Container Container_number ACTUAL BEHAVIOR: Getting an error message when viewing the container logs EXPECTED BEHAVIOR: Able to use different drive letters in yarn.nodemanager.log-dirs and not get error NOTE: If we use the drive letter C: in yarn.nodemanager.log-dirs, we are able to see the container logs while the MR job is in flight. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1813) Better error message for yarn logs when permission denied
[ https://issues.apache.org/jira/browse/YARN-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-1813: - Attachment: YARN-1813.5.patch Refreshed a patch. Better error message for yarn logs when permission denied --- Key: YARN-1813 URL: https://issues.apache.org/jira/browse/YARN-1813 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.3.0 Reporter: Andrew Wang Assignee: Tsuyoshi OZAWA Priority: Minor Attachments: YARN-1813.1.patch, YARN-1813.2.patch, YARN-1813.2.patch, YARN-1813.3.patch, YARN-1813.4.patch, YARN-1813.5.patch I ran some MR jobs as the hdfs user, and then forgot to sudo -u when grabbing the logs. yarn logs prints an error message like the following: {noformat} [andrew.wang@a2402 ~]$ yarn logs -applicationId application_1394482121761_0010 14/03/10 16:05:10 INFO client.RMProxy: Connecting to ResourceManager at a2402.halxg.cloudera.com/10.20.212.10:8032 Logs not available at /tmp/logs/andrew.wang/logs/application_1394482121761_0010 Log aggregation has not completed or is not enabled. {noformat} It'd be nicer if it said Permission denied or AccessControlException or something like that instead, since that's the real issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1813) Better error message for yarn logs when permission denied
[ https://issues.apache.org/jira/browse/YARN-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-1813: - Affects Version/s: 2.4.1 2.5.1 Better error message for yarn logs when permission denied --- Key: YARN-1813 URL: https://issues.apache.org/jira/browse/YARN-1813 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.3.0, 2.4.1, 2.5.1 Reporter: Andrew Wang Assignee: Tsuyoshi OZAWA Priority: Minor Attachments: YARN-1813.1.patch, YARN-1813.2.patch, YARN-1813.2.patch, YARN-1813.3.patch, YARN-1813.4.patch, YARN-1813.5.patch I ran some MR jobs as the hdfs user, and then forgot to sudo -u when grabbing the logs. yarn logs prints an error message like the following: {noformat} [andrew.wang@a2402 ~]$ yarn logs -applicationId application_1394482121761_0010 14/03/10 16:05:10 INFO client.RMProxy: Connecting to ResourceManager at a2402.halxg.cloudera.com/10.20.212.10:8032 Logs not available at /tmp/logs/andrew.wang/logs/application_1394482121761_0010 Log aggregation has not completed or is not enabled. {noformat} It'd be nicer if it said Permission denied or AccessControlException or something like that instead, since that's the real issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1813) Better error message for yarn logs when permission denied
[ https://issues.apache.org/jira/browse/YARN-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186523#comment-14186523 ] Hadoop QA commented on YARN-1813: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12677550/YARN-1813.5.patch against trunk revision 0398db1. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5601//console This message is automatically generated. Better error message for yarn logs when permission denied --- Key: YARN-1813 URL: https://issues.apache.org/jira/browse/YARN-1813 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.3.0, 2.4.1, 2.5.1 Reporter: Andrew Wang Assignee: Tsuyoshi OZAWA Priority: Minor Attachments: YARN-1813.1.patch, YARN-1813.2.patch, YARN-1813.2.patch, YARN-1813.3.patch, YARN-1813.4.patch, YARN-1813.5.patch I ran some MR jobs as the hdfs user, and then forgot to sudo -u when grabbing the logs. yarn logs prints an error message like the following: {noformat} [andrew.wang@a2402 ~]$ yarn logs -applicationId application_1394482121761_0010 14/03/10 16:05:10 INFO client.RMProxy: Connecting to ResourceManager at a2402.halxg.cloudera.com/10.20.212.10:8032 Logs not available at /tmp/logs/andrew.wang/logs/application_1394482121761_0010 Log aggregation has not completed or is not enabled. {noformat} It'd be nicer if it said Permission denied or AccessControlException or something like that instead, since that's the real issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2737) Misleading msg in LogCLI when app is not successfully submitted
[ https://issues.apache.org/jira/browse/YARN-2737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14186548#comment-14186548 ] Tsuyoshi OZAWA commented on YARN-2737: -- YARN-1813 is addressing an issue of handling AccessControlException correctly. Misleading msg in LogCLI when app is not successfully submitted Key: YARN-2737 URL: https://issues.apache.org/jira/browse/YARN-2737 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Jian He Assignee: Rohith {{LogCLiHelpers#logDirNotExist}} prints the msg {{Log aggregation has not completed or is not enabled.}} if the app log file doesn't exist. This is misleading when the application was not submitted successfully - clearly, we won't have logs for such an application. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2763) TestNMSimulator fails sometimes due to timing issue
Varun Vasudev created YARN-2763: --- Summary: TestNMSimulator fails sometimes due to timing issue Key: YARN-2763 URL: https://issues.apache.org/jira/browse/YARN-2763 Project: Hadoop YARN Issue Type: Bug Reporter: Varun Vasudev Assignee: Varun Vasudev TestNMSimulator fails sometimes due to timing issues. From a failure - {noformat} 2014-10-16 23:21:42,343 INFO resourcemanager.ResourceTrackerService (ResourceTrackerService.java:registerNodeManager(337)) - NodeManager from node node1(cmPort: 0 httpPort: 80) registered with capability: memory:10240, vCores:10, assigned nodeId node1:0 2014-10-16 23:21:42,397 ERROR delegation.AbstractDelegationTokenSecretManager (AbstractDelegationTokenSecretManager.java:run(642)) - ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted 2014-10-16 23:21:42,400 INFO rmnode.RMNodeImpl (RMNodeImpl.java:handle(423)) - node1:0 Node Transitioned from NEW to RUNNING 2014-10-16 23:21:42,404 INFO fair.FairScheduler (FairScheduler.java:addNode(825)) - Added node node1:0 cluster capacity: memory:10240, vCores:10 2014-10-16 23:21:42,407 INFO mortbay.log (Slf4jLog.java:info(67)) - Stopped HttpServer2$SelectChannelConnectorWithSafeStartup@localhost:18088 2014-10-16 23:21:42,409 ERROR delegation.AbstractDelegationTokenSecretManager (AbstractDelegationTokenSecretManager.java:run(642)) - ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted 2014-10-16 23:21:42,410 INFO ipc.Server (Server.java:stop(2437)) - Stopping server on 18032 2014-10-16 23:21:42,412 INFO ipc.Server (Server.java:run(706)) - Stopping IPC Server listener on 18032 2014-10-16 23:21:42,412 INFO ipc.Server (Server.java:run(832)) - Stopping IPC Server Responder {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2763) TestNMSimulator fails sometimes due to timing issue
[ https://issues.apache.org/jira/browse/YARN-2763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-2763: Attachment: apache-yarn-2763.0.patch Attached patch with fix. TestNMSimulator fails sometimes due to timing issue --- Key: YARN-2763 URL: https://issues.apache.org/jira/browse/YARN-2763 Project: Hadoop YARN Issue Type: Bug Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-2763.0.patch TestNMSimulator fails sometimes due to timing issues. From a failure - {noformat} 2014-10-16 23:21:42,343 INFO resourcemanager.ResourceTrackerService (ResourceTrackerService.java:registerNodeManager(337)) - NodeManager from node node1(cmPort: 0 httpPort: 80) registered with capability: memory:10240, vCores:10, assigned nodeId node1:0 2014-10-16 23:21:42,397 ERROR delegation.AbstractDelegationTokenSecretManager (AbstractDelegationTokenSecretManager.java:run(642)) - ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted 2014-10-16 23:21:42,400 INFO rmnode.RMNodeImpl (RMNodeImpl.java:handle(423)) - node1:0 Node Transitioned from NEW to RUNNING 2014-10-16 23:21:42,404 INFO fair.FairScheduler (FairScheduler.java:addNode(825)) - Added node node1:0 cluster capacity: memory:10240, vCores:10 2014-10-16 23:21:42,407 INFO mortbay.log (Slf4jLog.java:info(67)) - Stopped HttpServer2$SelectChannelConnectorWithSafeStartup@localhost:18088 2014-10-16 23:21:42,409 ERROR delegation.AbstractDelegationTokenSecretManager (AbstractDelegationTokenSecretManager.java:run(642)) - ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted 2014-10-16 23:21:42,410 INFO ipc.Server (Server.java:stop(2437)) - Stopping server on 18032 2014-10-16 23:21:42,412 INFO ipc.Server (Server.java:run(706)) - Stopping IPC Server listener on 18032 2014-10-16 23:21:42,412 INFO ipc.Server (Server.java:run(832)) - Stopping IPC Server Responder {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
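The attached patch is not shown here, but the usual way to remove this kind of flakiness is to poll for the expected node state instead of asserting immediately after registration. A hedged sketch (the helper and the timeout are assumptions, not necessarily what the patch does):

{code}
// Illustrative only: wait up to 10 seconds for the registered node to reach RUNNING
// before asserting on cluster capacity, rather than assuming the node-added event
// has already been dispatched.
private void waitForNodeRunning(ResourceManager rm, NodeId nodeId)
    throws InterruptedException {
  long deadline = System.currentTimeMillis() + 10000;
  while (System.currentTimeMillis() < deadline) {
    RMNode node = rm.getRMContext().getRMNodes().get(nodeId);
    if (node != null && node.getState() == NodeState.RUNNING) {
      return;
    }
    Thread.sleep(100);
  }
  Assert.fail("Node " + nodeId + " did not reach RUNNING in time");
}
{code}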
[jira] [Commented] (YARN-2763) TestNMSimulator fails sometimes due to timing issue
[ https://issues.apache.org/jira/browse/YARN-2763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186574#comment-14186574 ] Hadoop QA commented on YARN-2763: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12677561/apache-yarn-2763.0.patch against trunk revision 0398db1. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5602//console This message is automatically generated. TestNMSimulator fails sometimes due to timing issue --- Key: YARN-2763 URL: https://issues.apache.org/jira/browse/YARN-2763 Project: Hadoop YARN Issue Type: Bug Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-2763.0.patch TestNMSimulator fails sometimes due to timing issues. From a failure - {noformat} 2014-10-16 23:21:42,343 INFO resourcemanager.ResourceTrackerService (ResourceTrackerService.java:registerNodeManager(337)) - NodeManager from node node1(cmPort: 0 httpPort: 80) registered with capability: memory:10240, vCores:10, assigned nodeId node1:0 2014-10-16 23:21:42,397 ERROR delegation.AbstractDelegationTokenSecretManager (AbstractDelegationTokenSecretManager.java:run(642)) - ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted 2014-10-16 23:21:42,400 INFO rmnode.RMNodeImpl (RMNodeImpl.java:handle(423)) - node1:0 Node Transitioned from NEW to RUNNING 2014-10-16 23:21:42,404 INFO fair.FairScheduler (FairScheduler.java:addNode(825)) - Added node node1:0 cluster capacity: memory:10240, vCores:10 2014-10-16 23:21:42,407 INFO mortbay.log (Slf4jLog.java:info(67)) - Stopped HttpServer2$SelectChannelConnectorWithSafeStartup@localhost:18088 2014-10-16 23:21:42,409 ERROR delegation.AbstractDelegationTokenSecretManager (AbstractDelegationTokenSecretManager.java:run(642)) - ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted 2014-10-16 23:21:42,410 INFO ipc.Server (Server.java:stop(2437)) - Stopping server on 18032 2014-10-16 23:21:42,412 INFO ipc.Server (Server.java:run(706)) - Stopping IPC Server listener on 18032 2014-10-16 23:21:42,412 INFO ipc.Server (Server.java:run(832)) - Stopping IPC Server Responder {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2762) Provide RMAdminCLI args validation for NodeLabelManager operations
[ https://issues.apache.org/jira/browse/YARN-2762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14186617#comment-14186617 ] Rohith commented on YARN-2762: -- Added a sanity check at the client that validates the arguments. Provide RMAdminCLI args validation for NodeLabelManager operations -- Key: YARN-2762 URL: https://issues.apache.org/jira/browse/YARN-2762 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Rohith Assignee: Rohith Priority: Minor Attachments: YARN-2762.patch All NodeLabel args validations are done at the server side. The same can be done at RMAdminCLI so that unnecessary RPC calls can be avoided. And for input such as x,y,,z,, there is no need to add an empty string; it can be skipped instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
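A minimal sketch of the kind of client-side check being described, under the assumption that the labels arrive as a single comma-separated argument (the helper name is hypothetical): empty components such as those in x,y,,z, are skipped, and an argument with no labels at all fails before any RPC is made.

{code}
// Illustrative only: "x,y,,z," parses to [x, y, z]; an all-empty argument is
// rejected locally instead of being sent to the ResourceManager.
static Set<String> parseNodeLabels(String arg) {
  Set<String> labels = new LinkedHashSet<String>();
  for (String s : arg.split(",")) {
    if (!s.trim().isEmpty()) {
      labels.add(s.trim());
    }
  }
  if (labels.isEmpty()) {
    throw new IllegalArgumentException("No node label specified in \"" + arg + "\"");
  }
  return labels;
}
{code}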
[jira] [Commented] (YARN-2762) Provide RMAdminCLI args validation for NodeLabelManager operations
[ https://issues.apache.org/jira/browse/YARN-2762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186655#comment-14186655 ] Hadoop QA commented on YARN-2762: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12677545/YARN-2762.patch against trunk revision 0398db1. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client: org.apache.hadoop.yarn.client.TestResourceTrackerOnHA org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5603//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5603//console This message is automatically generated. Provide RMAdminCLI args validation for NodeLabelManager operations -- Key: YARN-2762 URL: https://issues.apache.org/jira/browse/YARN-2762 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Rohith Assignee: Rohith Priority: Minor Attachments: YARN-2762.patch All NodeLabel args validation's are done at server side. The same can be done at RMAdminCLI so that unnecessary RPC calls can be avoided. And for the input such as x,y,,z,, no need to add empty string instead can be skipped. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1813) Better error message for yarn logs when permission denied
[ https://issues.apache.org/jira/browse/YARN-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14186688#comment-14186688 ] Rohith commented on YARN-1813: -- Thanks Tsuyoshi for rebasing the patch!! Some minor comments: 1. I think logging both messages, Logs not available at and Permission denied, is contradictory. Only Permission denied is sufficient. 2. AggregatedLogsBlock.java is not changed, but it imports AccessControlException. Is that required? 3. What if the app is submitted but not yet started? Here also the log will be displayed as Log aggregation has not completed or is not enabled. I think we should log all the possible reasons why logs are not available. Better error message for yarn logs when permission denied --- Key: YARN-1813 URL: https://issues.apache.org/jira/browse/YARN-1813 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.3.0, 2.4.1, 2.5.1 Reporter: Andrew Wang Assignee: Tsuyoshi OZAWA Priority: Minor Attachments: YARN-1813.1.patch, YARN-1813.2.patch, YARN-1813.2.patch, YARN-1813.3.patch, YARN-1813.4.patch, YARN-1813.5.patch I ran some MR jobs as the hdfs user, and then forgot to sudo -u when grabbing the logs. yarn logs prints an error message like the following: {noformat} [andrew.wang@a2402 ~]$ yarn logs -applicationId application_1394482121761_0010 14/03/10 16:05:10 INFO client.RMProxy: Connecting to ResourceManager at a2402.halxg.cloudera.com/10.20.212.10:8032 Logs not available at /tmp/logs/andrew.wang/logs/application_1394482121761_0010 Log aggregation has not completed or is not enabled. {noformat} It'd be nicer if it said Permission denied or AccessControlException or something like that instead, since that's the real issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
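A hedged sketch of the distinction under review (illustrative, not the attached patch): when the CLI lists the aggregated-log directory, an AccessControlException is reported as a permission problem, while the existing message is kept for the case where the directory genuinely is not there.

{code}
// Illustrative only: separate "permission denied" from "logs not aggregated".
try {
  RemoteIterator<FileStatus> nodeFiles = fs.listStatusIterator(remoteAppLogDir);
  // ... iterate node files and dump container logs ...
} catch (AccessControlException e) {
  System.err.println("Permission denied for " + remoteAppLogDir
      + "; run the command as the user who submitted the application. " + e.getMessage());
} catch (FileNotFoundException e) {
  System.err.println("Logs not available at " + remoteAppLogDir);
  System.err.println("Log aggregation has not completed or is not enabled.");
}
{code}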
[jira] [Commented] (YARN-2601) RMs(HA RMS) can't enter active state
[ https://issues.apache.org/jira/browse/YARN-2601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186695#comment-14186695 ] Rohith commented on YARN-2601: -- [~cindy2012] This will be fixing in YARN-2010. Please follow this jira. RMs(HA RMS) can't enter active state Key: YARN-2601 URL: https://issues.apache.org/jira/browse/YARN-2601 Project: Hadoop YARN Issue Type: Bug Reporter: Cindy Li 2014-09-24 15:04:04,527 DEBUG org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Processing event for application_1409048687352_0552 of type APP_REJECTED 2014-09-24 15:04:04,528 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1409048687352_0552 State change from NEW to FAILED 2014-09-24 15:04:04,528 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher: Dispatching the event org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.AppRemovedSchedulerEvent.EventType: APP_REMOVED 2014-09-24 15:04:04,528 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher: Dispatching the event org.apache.hadoop.yarn.server.resourcemanager.RMAppManagerEvent.EventType: APP_COMPLETED 2014-09-24 15:04:04,528 DEBUG org.apache.hadoop.yarn.server.resourcemanager.RMAppManager: RMAppManager processing event for application_1409048687352_0552 of type APP_COMPLETED 2014-09-24 15:04:04,528 WARN org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=b_hiveperf0 OPERATION=Application Finished - Failed TARGET=RMAppManager RESULT=FAILURE DESCRIPTION=App failed with state: FAILED PERMISSIONS=hadoop tried to renew an expired token at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:366) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:6279) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:488) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.renewDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:923) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2020) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2016) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1650) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2014) APPID=application_1409048687352_0552 2014-09-24 15:04:04,529 DEBUG org.apache.hadoop.service.AbstractService: Service: RMActiveServices entered state STOPPED 2014-09-24 15:04:04,538 WARN org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=hadoop OPERATION=transitionToActiveTARGET=RMHAProtocolService RESULT=FAILURE DESCRIPTION=Exception transitioning to active PERMISSIONS=Users [hadoop] are allowed 2014-09-24 15:04:04,539 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:118) at 
org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:804) at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:596) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495) Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when transitioning to Active mode at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:292) at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:116) ... 4 more Caused by: org.apache.hadoop.service.ServiceStateException: org.apache.hadoop.security.token.SecretManager$InvalidToken: hadoop tried to renew an expired token at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:366) at
[jira] [Commented] (YARN-2762) Provide RMAdminCLI args validation for NodeLabelManager operations
[ https://issues.apache.org/jira/browse/YARN-2762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14186703#comment-14186703 ] Rohith commented on YARN-2762: -- Test case failures are not related to this fix. Provide RMAdminCLI args validation for NodeLabelManager operations -- Key: YARN-2762 URL: https://issues.apache.org/jira/browse/YARN-2762 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Rohith Assignee: Rohith Priority: Minor Attachments: YARN-2762.patch All NodeLabel args validations are done at the server side. The same can be done at RMAdminCLI so that unnecessary RPC calls can be avoided. And for input such as x,y,,z,, there is no need to add an empty string; it can be skipped instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2726) CapacityScheduler should explicitly log when an accessible label has no capacity
[ https://issues.apache.org/jira/browse/YARN-2726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14186725#comment-14186725 ] Hudson commented on YARN-2726: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #726 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/726/]) YARN-2726. CapacityScheduler should explicitly log when an accessible label has no capacity. Contributed by Wangda Tan (xgong: rev ce1a4419a6c938447a675c416567db56bf9cb29e) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java CapacityScheduler should explicitly log when an accessible label has no capacity Key: YARN-2726 URL: https://issues.apache.org/jira/browse/YARN-2726 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Phil D'Amore Assignee: Wangda Tan Priority: Minor Fix For: 2.6.0 Attachments: YARN-2726-20141023-1.patch, YARN-2726-20141023-2.patch Given: - Node label defined: test-label - Two queues defined: a, b - label accessibility and capacity defined as follows (properties abbreviated for readability): root.a.accessible-node-labels = test-label root.a.accessible-node-labels.test-label.capacity = 100 If you restart the RM or do a 'rmadmin -refreshQueues' you will get a stack trace with the following error buried within: Illegal capacity of -1.0 for label=test-label in queue=root.b This of course occurs because test-label is accessible to b due to inheritance from the root, and -1 is the UNDEFINED value. To my mind this might not be obvious to the admin, and the error message which results does not help guide someone to the source of the issue. I propose that this situation be updated so that when the capacity on an accessible label is undefined, it is explicitly called out instead of falling through to the illegal capacity check. Something like: {code} if (capacity == UNDEFINED) { throw new IllegalArgumentException("Configuration issue: " + "label=" + label + " is accessible from queue=" + queue + " but has no capacity set."); } {code} I'll leave it to better judgement than mine as to whether I'm throwing the appropriate exception there. I think this check should be added to both getNodeLabelCapacities and getMaximumNodeLabelCapacities in CapacitySchedulerConfiguration.java. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2591) AHSWebServices should return FORBIDDEN(403) if the request user doesn't have access to the history data
[ https://issues.apache.org/jira/browse/YARN-2591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186730#comment-14186730 ] Hudson commented on YARN-2591: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #726 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/726/]) YARN-2591. Fixed AHSWebServices to return FORBIDDEN(403) if the request user doesn't have access to the history data. Contributed by Zhijie Shen (jianhe: rev c05b581a5522eed499d3ba16af9fa6dc694563f6) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryManagerOnTimelineStore.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerOnTimelineStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/WebServices.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/authorize/AuthorizationException.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/TestAHSWebServices.java AHSWebServices should return FORBIDDEN(403) if the request user doesn't have access to the history data --- Key: YARN-2591 URL: https://issues.apache.org/jira/browse/YARN-2591 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: 3.0.0, 2.6.0 Reporter: Zhijie Shen Assignee: Zhijie Shen Fix For: 2.6.0 Attachments: YARN-2591.1.patch, YARN-2591.2.patch AHSWebServices should return FORBIDDEN(403) if the request user doesn't have access to the history data. Currently, it is going to return INTERNAL_SERVER_ERROR(500). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
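For context, a hedged sketch of the general pattern (class names follow the files listed in the commit, but the snippet itself is illustrative, not the committed change): an authorization failure from the history store is translated into a 403 at the web layer instead of escaping as a 500.

{code}
// Illustrative only: map AuthorizationException to FORBIDDEN rather than letting it
// surface as INTERNAL_SERVER_ERROR.
try {
  appReport = historyManager.getApplication(appId);
} catch (AuthorizationException e) {
  throw new ForbiddenException(e);
} catch (IOException e) {
  throw new WebApplicationException(e);
}
{code}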
[jira] [Commented] (YARN-2502) Changes in distributed shell to support specifying labels
[ https://issues.apache.org/jira/browse/YARN-2502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14186720#comment-14186720 ] Hudson commented on YARN-2502: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #726 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/726/]) YARN-2502. Changed DistributedShell to support node labels. Contributed by Wangda Tan (jianhe: rev f6b963fdfc517429149165e4bb6fb947be6e3c99) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/Client.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDistributedShell.java Changes in distributed shell to support specifying labels -- Key: YARN-2502 URL: https://issues.apache.org/jira/browse/YARN-2502 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2502-20141009.1.patch, YARN-2502-20141009.2.patch, YARN-2502-20141013.1.patch, YARN-2502-20141017-1.patch, YARN-2502-20141017-2.patch, YARN-2502-20141027-2.patch, YARN-2502.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
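The patch content is not reproduced above; conceptually the distributed shell client gains a way to pass a node label expression when it submits the application. A hedged sketch against the public 2.6 API (the CLI wiring and variable names are assumptions):

{code}
// Illustrative only: place the application on nodes carrying the requested label.
YarnClientApplication app = yarnClient.createApplication();
ApplicationSubmissionContext appContext = app.getApplicationSubmissionContext();
appContext.setResource(Resource.newInstance(amMemory, amVCores));
appContext.setNodeLabelExpression(nodeLabelExpression); // e.g. "test-label"
yarnClient.submitApplication(appContext);
{code}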
[jira] [Commented] (YARN-2704) Localization and log-aggregation will fail if hdfs delegation token expired after token-max-life-time
[ https://issues.apache.org/jira/browse/YARN-2704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186726#comment-14186726 ] Hudson commented on YARN-2704: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #726 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/726/]) YARN-2704. Changed ResourceManager to optionally obtain tokens itself for the sake of localization and log-aggregation for long-running services. Contributed by Jian He. (vinodkv: rev a16d022ca4313a41425c8e97841c841a2d6f2f54) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/NodeHeartbeatResponse.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestLocalCacheDirectoryManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/LogAggregationService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/DummyContainerManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NodeHeartbeatResponsePBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/server/api/protocolrecords/TestProtocolRecords.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestContainerManagerRecovery.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/Context.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestDelegationTokenRenewer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java Localization and log-aggregation will fail if hdfs delegation token expired after token-max-life-time -- Key: YARN-2704 URL: https://issues.apache.org/jira/browse/YARN-2704 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He Assignee: Jian He Priority: Critical
[jira] [Updated] (YARN-2760) Completely remove word 'experimental' from FairScheduler docs
[ https://issues.apache.org/jira/browse/YARN-2760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated YARN-2760: -- Attachment: YARN-2760.patch Re-uploading patch to retry after the patching issue was fixed in buildbot. Completely remove word 'experimental' from FairScheduler docs - Key: YARN-2760 URL: https://issues.apache.org/jira/browse/YARN-2760 Project: Hadoop YARN Issue Type: Bug Components: documentation Affects Versions: 2.1.0-beta Reporter: Harsh J Assignee: Harsh J Priority: Trivial Attachments: YARN-2760.patch, YARN-2760.patch After YARN-1034, FairScheduler has not been 'experimental' in any aspect of use, but the doc change done in that did not entirely cover removal of that word, leaving a remnant in the preemption sub-point. This needs to be removed as well, as the feature has been good to use for a long time now, and is not experimental. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2726) CapacityScheduler should explicitly log when an accessible label has no capacity
[ https://issues.apache.org/jira/browse/YARN-2726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14186824#comment-14186824 ] Hudson commented on YARN-2726: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1940 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1940/]) YARN-2726. CapacityScheduler should explicitly log when an accessible label has no capacity. Contributed by Wangda Tan (xgong: rev ce1a4419a6c938447a675c416567db56bf9cb29e) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java CapacityScheduler should explicitly log when an accessible label has no capacity Key: YARN-2726 URL: https://issues.apache.org/jira/browse/YARN-2726 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Phil D'Amore Assignee: Wangda Tan Priority: Minor Fix For: 2.6.0 Attachments: YARN-2726-20141023-1.patch, YARN-2726-20141023-2.patch Given: - Node label defined: test-label - Two queues defined: a, b - label accessibility and capacity defined as follows (properties abbreviated for readability): root.a.accessible-node-labels = test-label root.a.accessible-node-labels.test-label.capacity = 100 If you restart the RM or do a 'rmadmin -refreshQueues' you will get a stack trace with the following error buried within: Illegal capacity of -1.0 for label=test-label in queue=root.b This of course occurs because test-label is accessible to b due to inheritance from the root, and -1 is the UNDEFINED value. To my mind this might not be obvious to the admin, and the error message which results does not help guide someone to the source of the issue. I propose that this situation be updated so that when the capacity on an accessible label is undefined, it is explicitly called out instead of falling through to the illegal capacity check. Something like: {code} if (capacity == UNDEFINED) { throw new IllegalArgumentException("Configuration issue: " + "label=" + label + " is accessible from queue=" + queue + " but has no capacity set."); } {code} I'll leave it to better judgement than mine as to whether I'm throwing the appropriate exception there. I think this check should be added to both getNodeLabelCapacities and getMaximumNodeLabelCapacities in CapacitySchedulerConfiguration.java. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2502) Changes in distributed shell to support specifying labels
[ https://issues.apache.org/jira/browse/YARN-2502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14186819#comment-14186819 ] Hudson commented on YARN-2502: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1940 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1940/]) YARN-2502. Changed DistributedShell to support node labels. Contributed by Wangda Tan (jianhe: rev f6b963fdfc517429149165e4bb6fb947be6e3c99) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/Client.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDistributedShell.java Changes in distributed shell to support specifying labels -- Key: YARN-2502 URL: https://issues.apache.org/jira/browse/YARN-2502 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2502-20141009.1.patch, YARN-2502-20141009.2.patch, YARN-2502-20141013.1.patch, YARN-2502-20141017-1.patch, YARN-2502-20141017-2.patch, YARN-2502-20141027-2.patch, YARN-2502.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2591) AHSWebServices should return FORBIDDEN(403) if the request user doesn't have access to the history data
[ https://issues.apache.org/jira/browse/YARN-2591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186829#comment-14186829 ] Hudson commented on YARN-2591: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1940 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1940/]) YARN-2591. Fixed AHSWebServices to return FORBIDDEN(403) if the request user doesn't have access to the history data. Contributed by Zhijie Shen (jianhe: rev c05b581a5522eed499d3ba16af9fa6dc694563f6) * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/authorize/AuthorizationException.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/WebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/TestAHSWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryManagerOnTimelineStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerOnTimelineStore.java AHSWebServices should return FORBIDDEN(403) if the request user doesn't have access to the history data --- Key: YARN-2591 URL: https://issues.apache.org/jira/browse/YARN-2591 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: 3.0.0, 2.6.0 Reporter: Zhijie Shen Assignee: Zhijie Shen Fix For: 2.6.0 Attachments: YARN-2591.1.patch, YARN-2591.2.patch AHSWebServices should return FORBIDDEN(403) if the request user doesn't have access to the history data. Currently, it is going to return INTERNAL_SERVER_ERROR(500). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2760) Completely remove word 'experimental' from FairScheduler docs
[ https://issues.apache.org/jira/browse/YARN-2760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186841#comment-14186841 ] Hadoop QA commented on YARN-2760: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12677597/YARN-2760.patch against trunk revision c9bec46. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+0 tests included{color}. The patch appears to be a documentation patch that doesn't require tests. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5604//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5604//console This message is automatically generated. Completely remove word 'experimental' from FairScheduler docs - Key: YARN-2760 URL: https://issues.apache.org/jira/browse/YARN-2760 Project: Hadoop YARN Issue Type: Bug Components: documentation Affects Versions: 2.1.0-beta Reporter: Harsh J Assignee: Harsh J Priority: Trivial Attachments: YARN-2760.patch, YARN-2760.patch After YARN-1034, FairScheduler has not been 'experimental' in any aspect of use, but the doc change done in that did not entirely cover removal of that word, leaving a remnant in the preemption sub-point. This needs to be removed as well, as the feature has been good to use for a long time now, and is not experimental. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2704) Localization and log-aggregation will fail if hdfs delegation token expired after token-max-life-time
[ https://issues.apache.org/jira/browse/YARN-2704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186825#comment-14186825 ] Hudson commented on YARN-2704: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1940 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1940/]) YARN-2704. Changed ResourceManager to optionally obtain tokens itself for the sake of localization and log-aggregation for long-running services. Contributed by Jian He. (vinodkv: rev a16d022ca4313a41425c8e97841c841a2d6f2f54) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/server/api/protocolrecords/TestProtocolRecords.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestLocalCacheDirectoryManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/Context.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestDelegationTokenRenewer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NodeHeartbeatResponsePBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestContainerManagerRecovery.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/DummyContainerManager.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/LogAggregationService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/NodeHeartbeatResponse.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java Localization and log-aggregation will fail if hdfs delegation token expired after token-max-life-time -- Key: YARN-2704 URL: https://issues.apache.org/jira/browse/YARN-2704 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He Assignee: Jian He
[jira] [Commented] (YARN-2704) Localization and log-aggregation will fail if hdfs delegation token expired after token-max-life-time
[ https://issues.apache.org/jira/browse/YARN-2704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186887#comment-14186887 ] Hudson commented on YARN-2704: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1915 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1915/]) YARN-2704. Changed ResourceManager to optionally obtain tokens itself for the sake of localization and log-aggregation for long-running services. Contributed by Jian He. (vinodkv: rev a16d022ca4313a41425c8e97841c841a2d6f2f54) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestContainerManagerRecovery.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NodeHeartbeatResponsePBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/Context.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/DummyContainerManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestDelegationTokenRenewer.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/LogAggregationService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestLocalCacheDirectoryManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/NodeHeartbeatResponse.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/server/api/protocolrecords/TestProtocolRecords.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java Localization and log-aggregation will fail if hdfs delegation token expired after token-max-life-time -- Key: YARN-2704 URL: https://issues.apache.org/jira/browse/YARN-2704 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He Assignee: Jian He Priority: Critical
[jira] [Commented] (YARN-2726) CapacityScheduler should explicitly log when an accessible label has no capacity
[ https://issues.apache.org/jira/browse/YARN-2726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186886#comment-14186886 ] Hudson commented on YARN-2726: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1915 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1915/]) YARN-2726. CapacityScheduler should explicitly log when an accessible label has no capacity. Contributed by Wangda Tan (xgong: rev ce1a4419a6c938447a675c416567db56bf9cb29e) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java * hadoop-yarn-project/CHANGES.txt CapacityScheduler should explicitly log when an accessible label has no capacity Key: YARN-2726 URL: https://issues.apache.org/jira/browse/YARN-2726 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Phil D'Amore Assignee: Wangda Tan Priority: Minor Fix For: 2.6.0 Attachments: YARN-2726-20141023-1.patch, YARN-2726-20141023-2.patch Given: - Node label defined: test-label - Two queues defined: a, b - label accessibility and capacity defined as follows (properties abbreviated for readability): root.a.accessible-node-labels = test-label root.a.accessible-node-labels.test-label.capacity = 100 If you restart the RM or do a 'rmadmin -refreshQueues' you will get a stack trace with the following error buried within: Illegal capacity of -1.0 for label=test-label in queue=root.b This of course occurs because test-label is accessible to b due to inheritance from the root, and -1 is the UNDEFINED value. To my mind this might not be obvious to the admin, and the error message which results does not help guide someone to the source of the issue. I propose that this situation be updated so that when the capacity on an accessible label is undefined, it is explicitly called out instead of falling through to the illegal capacity check. Something like: {code} if (capacity == UNDEFINED) { throw new IllegalArgumentException("Configuration issue: label=" + label + " is accessible from queue=" + queue + " but has no capacity set."); } {code} I'll leave it to better judgement than mine as to whether I'm throwing the appropriate exception there. I think this check should be added to both getNodeLabelCapacities and getMaximumNodeLabelCapacities in CapacitySchedulerConfiguration.java. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2591) AHSWebServices should return FORBIDDEN(403) if the request user doesn't have access to the history data
[ https://issues.apache.org/jira/browse/YARN-2591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186892#comment-14186892 ] Hudson commented on YARN-2591: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1915 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1915/]) YARN-2591. Fixed AHSWebServices to return FORBIDDEN(403) if the request user doesn't have access to the history data. Contributed by Zhijie Shen (jianhe: rev c05b581a5522eed499d3ba16af9fa6dc694563f6) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/TestAHSWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/WebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryManagerOnTimelineStore.java * hadoop-yarn-project/CHANGES.txt * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/authorize/AuthorizationException.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerOnTimelineStore.java AHSWebServices should return FORBIDDEN(403) if the request user doesn't have access to the history data --- Key: YARN-2591 URL: https://issues.apache.org/jira/browse/YARN-2591 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: 3.0.0, 2.6.0 Reporter: Zhijie Shen Assignee: Zhijie Shen Fix For: 2.6.0 Attachments: YARN-2591.1.patch, YARN-2591.2.patch AHSWebServices should return FORBIDDEN(403) if the request user doesn't have access to the history data. Currently, it is going to return INTERNAL_SERVER_ERROR(500). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2502) Changes in distributed shell to support specify labels
[ https://issues.apache.org/jira/browse/YARN-2502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186881#comment-14186881 ] Hudson commented on YARN-2502: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1915 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1915/]) YARN-2502. Changed DistributedShell to support node labels. Contributed by Wangda Tan (jianhe: rev f6b963fdfc517429149165e4bb6fb947be6e3c99) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/test/java/org/apache/hadoop/yarn/applications/distributedshell/TestDistributedShell.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/Client.java * hadoop-yarn-project/CHANGES.txt Changes in distributed shell to support specify labels -- Key: YARN-2502 URL: https://issues.apache.org/jira/browse/YARN-2502 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2502-20141009.1.patch, YARN-2502-20141009.2.patch, YARN-2502-20141013.1.patch, YARN-2502-20141017-1.patch, YARN-2502-20141017-2.patch, YARN-2502-20141027-2.patch, YARN-2502.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2742) FairSchedulerConfiguration fails to parse if there is extra space between value and unit
[ https://issues.apache.org/jira/browse/YARN-2742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-2742: -- Assignee: Wei Yan FairSchedulerConfiguration fails to parse if there is extra space between value and unit Key: YARN-2742 URL: https://issues.apache.org/jira/browse/YARN-2742 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.4.0 Reporter: Sangjin Lee Assignee: Wei Yan Priority: Minor Attachments: YARN-2742-1.patch FairSchedulerConfiguration is very strict about the number of space characters between the value and the unit: 0 or 1 space. For example, for values like the following: {noformat} <maxResources>4096  mb, 2 vcores</maxResources> {noformat} (note 2 spaces) The above line fails to parse: {noformat} 2014-10-24 22:56:40,802 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService: Failed to reload fair scheduler config file - will use existing allocations. org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationConfigurationException: Missing resource: mb at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairSchedulerConfiguration.findResource(FairSchedulerConfiguration.java:247) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairSchedulerConfiguration.parseResourceConfigValue(FairSchedulerConfiguration.java:231) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.loadQueue(AllocationFileLoaderService.java:347) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.loadQueue(AllocationFileLoaderService.java:381) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.reloadAllocations(AllocationFileLoaderService.java:293) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService$1.run(AllocationFileLoaderService.java:117) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2742) FairSchedulerConfiguration fails to parse if there is extra space between value and unit
[ https://issues.apache.org/jira/browse/YARN-2742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186983#comment-14186983 ] Sangjin Lee commented on YARN-2742: --- Thanks for the patch [~ywskycn]! +1 in terms of using \s* to cover these cases. I'm comfortable with also making the unit match case-insensitive, but I'd like to hear others' thoughts on that. FairSchedulerConfiguration fails to parse if there is extra space between value and unit Key: YARN-2742 URL: https://issues.apache.org/jira/browse/YARN-2742 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.4.0 Reporter: Sangjin Lee Assignee: Wei Yan Priority: Minor Attachments: YARN-2742-1.patch FairSchedulerConfiguration is very strict about the number of space characters between the value and the unit: 0 or 1 space. For example, for values like the following: {noformat} <maxResources>4096  mb, 2 vcores</maxResources> {noformat} (note 2 spaces) The above line fails to parse: {noformat} 2014-10-24 22:56:40,802 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService: Failed to reload fair scheduler config file - will use existing allocations. org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationConfigurationException: Missing resource: mb at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairSchedulerConfiguration.findResource(FairSchedulerConfiguration.java:247) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairSchedulerConfiguration.parseResourceConfigValue(FairSchedulerConfiguration.java:231) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.loadQueue(AllocationFileLoaderService.java:347) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.loadQueue(AllocationFileLoaderService.java:381) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.reloadAllocations(AllocationFileLoaderService.java:293) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService$1.run(AllocationFileLoaderService.java:117) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
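To make the \s* suggestion concrete, here is a minimal, self-contained sketch of whitespace-tolerant value/unit parsing; the class name, the regex, and the findResource signature below are illustrative assumptions and not the code from YARN-2742-1.patch:

{code}
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ResourceValueParser {
  // \s* lets any amount of whitespace (including none) separate the numeric
  // value from its unit, so "4096 mb", "4096  mb" and "4096mb" all parse.
  private static final Pattern RESOURCE_PATTERN =
      Pattern.compile("(\\d+)\\s*(mb|vcores)");

  /** Returns the value declared for the given unit, or -1 if it is absent. */
  public static int findResource(String configValue, String unit) {
    Matcher m = RESOURCE_PATTERN.matcher(configValue.toLowerCase());
    while (m.find()) {
      if (m.group(2).equals(unit)) {
        return Integer.parseInt(m.group(1));
      }
    }
    return -1;
  }

  public static void main(String[] args) {
    String value = "4096  mb, 2 vcores"; // two spaces before "mb"
    System.out.println(findResource(value, "mb"));     // 4096
    System.out.println(findResource(value, "vcores")); // 2
  }
}
{code}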
[jira] [Updated] (YARN-2760) Completely remove word 'experimental' from FairScheduler docs
[ https://issues.apache.org/jira/browse/YARN-2760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-2760: --- Target Version/s: 2.6.0 Completely remove word 'experimental' from FairScheduler docs - Key: YARN-2760 URL: https://issues.apache.org/jira/browse/YARN-2760 Project: Hadoop YARN Issue Type: Bug Components: documentation Affects Versions: 2.1.0-beta Reporter: Harsh J Assignee: Harsh J Priority: Trivial Attachments: YARN-2760.patch, YARN-2760.patch After YARN-1034, FairScheduler has not been 'experimental' in any aspect of use, but the doc change done in that did not entirely cover removal of that word, leaving a remnant in the preemption sub-point. This needs to be removed as well, as the feature has been good to use for a long time now, and is not experimental. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1964) Create Docker analog of the LinuxContainerExecutor in YARN
[ https://issues.apache.org/jira/browse/YARN-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abin Shahab updated YARN-1964: -- Attachment: YARN-1964.patch Patch with merge conflicts fixed. Docs pointing back to sequenceiq/hadoop-docker:2.4.1 Create Docker analog of the LinuxContainerExecutor in YARN -- Key: YARN-1964 URL: https://issues.apache.org/jira/browse/YARN-1964 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 2.2.0 Reporter: Arun C Murthy Assignee: Abin Shahab Attachments: YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, yarn-1964-branch-2.2.0-docker.patch, yarn-1964-branch-2.2.0-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch Docker (https://www.docker.io/) is, increasingly, a very popular container technology. In context of YARN, the support for Docker will provide a very elegant solution to allow applications to *package* their software into a Docker container (entire Linux file system incl. custom versions of perl, python etc.) and use it as a blueprint to launch all their YARN containers with requisite software environment. This provides both consistency (all YARN containers will have the same software environment) and isolation (no interference with whatever is installed on the physical machine). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2755) NM fails to clean up usercache_DEL_timestamp dirs after YARN-661
[ https://issues.apache.org/jira/browse/YARN-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186991#comment-14186991 ] Sangjin Lee commented on YARN-2755: --- [~l201514], the patch looks good. It might be good to add a small unit test to demonstrate the bug/fix. NM fails to clean up usercache_DEL_timestamp dirs after YARN-661 -- Key: YARN-2755 URL: https://issues.apache.org/jira/browse/YARN-2755 Project: Hadoop YARN Issue Type: Bug Reporter: Siqi Li Assignee: Siqi Li Priority: Critical Attachments: YARN-2755.v1.patch When the NM restarts frequently for some reason, a large number of directories like these are left in /data/disk$num/yarn/local/: /data/disk1/yarn/local/usercache_DEL_1414372756105 /data/disk1/yarn/local/usercache_DEL_1413557901696 /data/disk1/yarn/local/usercache_DEL_1413657004894 /data/disk1/yarn/local/usercache_DEL_1413675321860 /data/disk1/yarn/local/usercache_DEL_1414093167936 /data/disk1/yarn/local/usercache_DEL_1413565841271 These directories are empty, but take up 100M+ due to the sheer number of them. There were 38714 per data disk on the machine I looked at. It appears to be a regression introduced by YARN-661. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
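As a rough illustration of what the cleanup (and a unit test for it) has to cover, the sketch below deletes leftover renamed user-cache directories matching the pattern from the report; the class, the plain java.io.File usage, and the cleanup entry point are assumptions for illustration and not the contents of YARN-2755.v1.patch:

{code}
import java.io.File;

public class LeftoverUsercacheCleaner {
  // Directories such as usercache_DEL_1414372756105 are produced when the NM
  // renames the user cache for deferred deletion; they should not survive
  // repeated restarts.
  private static final String DEL_PREFIX = "usercache_DEL_";

  public static void cleanLocalDir(File localDir) {
    File[] children = localDir.listFiles();
    if (children == null) {
      return; // directory missing or unreadable
    }
    for (File child : children) {
      if (child.isDirectory() && child.getName().startsWith(DEL_PREFIX)) {
        // The leftover directories reported here are empty, so a plain delete
        // works; a real fix would hand non-empty ones to the deletion service.
        if (!child.delete()) {
          System.err.println("Could not delete " + child.getAbsolutePath());
        }
      }
    }
  }

  public static void main(String[] args) {
    cleanLocalDir(new File("/data/disk1/yarn/local"));
  }
}
{code}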
[jira] [Commented] (YARN-1964) Create Docker analog of the LinuxContainerExecutor in YARN
[ https://issues.apache.org/jira/browse/YARN-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14187014#comment-14187014 ] Hadoop QA commented on YARN-1964: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12677620/YARN-1964.patch against trunk revision 58c0bb9. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5605//console This message is automatically generated. Create Docker analog of the LinuxContainerExecutor in YARN -- Key: YARN-1964 URL: https://issues.apache.org/jira/browse/YARN-1964 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 2.2.0 Reporter: Arun C Murthy Assignee: Abin Shahab Attachments: YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, yarn-1964-branch-2.2.0-docker.patch, yarn-1964-branch-2.2.0-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch Docker (https://www.docker.io/) is, increasingly, a very popular container technology. In context of YARN, the support for Docker will provide a very elegant solution to allow applications to *package* their software into a Docker container (entire Linux file system incl. custom versions of perl, python etc.) and use it as a blueprint to launch all their YARN containers with requisite software environment. This provides both consistency (all YARN containers will have the same software environment) and isolation (no interference with whatever is installed on the physical machine). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2764) counters.LimitExceededException shouldn't abort AsyncDispatcher
Ted Yu created YARN-2764: Summary: counters.LimitExceededException shouldn't abort AsyncDispatcher Key: YARN-2764 URL: https://issues.apache.org/jira/browse/YARN-2764 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.5.1 Reporter: Ted Yu I saw the following in container log: {code} 2014-10-25 10:28:55,052 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: Task succeeded with attemptattempt_1414221548789_0023_r_03_0 2014-10-25 10:28:55,052 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1414221548789_0023_r_03 Task Transitioned from RUNNING to SUCCEEDED 2014-10-25 10:28:55,052 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Num completed Tasks: 24 2014-10-25 10:28:55,053 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1414221548789_0023Job Transitioned from RUNNING to COMMITTING 2014-10-25 10:28:55,054 INFO [CommitterEvent Processor #1] org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Processing the event EventType: JOB_COMMIT 2014-10-25 10:28:55,177 FATAL [AsyncDispatcher event handler] org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread org.apache.hadoop.mapreduce.counters.LimitExceededException: Too many counters: 121 max=120 at org.apache.hadoop.mapreduce.counters.Limits.checkCounters(Limits.java:101) at org.apache.hadoop.mapreduce.counters.Limits.incrCounters(Limits.java:108) at org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.addCounter(AbstractCounterGroup.java:78) at org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.addCounterImpl(AbstractCounterGroup.java:95) at org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.findCounter(AbstractCounterGroup.java:106) at org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.incrAllCounters(AbstractCounterGroup.java:203) at org.apache.hadoop.mapreduce.counters.AbstractCounters.incrAllCounters(AbstractCounters.java:348) at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.constructFinalFullcounters(JobImpl.java:1754) at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.mayBeConstructFinalFullCounters(JobImpl.java:1737) at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.createJobFinishedEvent(JobImpl.java:1718) at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.logJobHistoryFinishedEvent(JobImpl.java:1089) at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$CommitSucceededTransition.transition(JobImpl.java:2049) at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$CommitSucceededTransition.transition(JobImpl.java:2045) at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:996) at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:138) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1289) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1285) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:745) 2014-10-25 10:28:55,185 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye.. {code} Counter limit was exceeded when JobFinishedEvent was created. Better handling of LimitExceededException should be provided so that AsyncDispatcher can continue functioning. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
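One possible shape of that handling, sketched with stand-in types rather than the real org.apache.hadoop.mapreduce.counters classes (nothing below is a committed fix): catch the limit violation while the final job counters are being aggregated, so the failure never reaches the AsyncDispatcher thread.

{code}
public class SafeCounterAggregation {

  // Stand-in for org.apache.hadoop.mapreduce.counters.LimitExceededException.
  static class LimitExceededException extends RuntimeException {
    LimitExceededException(String msg) { super(msg); }
  }

  // Stand-in for the MapReduce Counters type.
  interface Counters {
    void incrAllCounters(Counters other);
  }

  /**
   * Aggregates per-task counters into the job-level counters without letting
   * a counter-limit violation propagate to the event dispatcher.
   */
  static void constructFinalFullCounters(Counters jobCounters,
                                         Iterable<Counters> taskCounters) {
    try {
      for (Counters c : taskCounters) {
        jobCounters.incrAllCounters(c);
      }
    } catch (LimitExceededException e) {
      // The job has already committed; log and continue with whatever was
      // aggregated so far instead of tearing down the AsyncDispatcher.
      System.err.println("Counter limit exceeded while finalizing job counters: "
          + e.getMessage());
    }
  }
}
{code}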
[jira] [Commented] (YARN-2742) FairSchedulerConfiguration fails to parse if there is extra space between value and unit
[ https://issues.apache.org/jira/browse/YARN-2742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14187042#comment-14187042 ] Wei Yan commented on YARN-2742: --- Thanks for the comments, [~sjlee0]. Any comments, [~kasha]? FairSchedulerConfiguration fails to parse if there is extra space between value and unit Key: YARN-2742 URL: https://issues.apache.org/jira/browse/YARN-2742 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.4.0 Reporter: Sangjin Lee Assignee: Wei Yan Priority: Minor Attachments: YARN-2742-1.patch FairSchedulerConfiguration is very strict about the number of space characters between the value and the unit: 0 or 1 space. For example, for values like the following: {noformat} <maxResources>4096  mb, 2 vcores</maxResources> {noformat} (note 2 spaces) The above line fails to parse: {noformat} 2014-10-24 22:56:40,802 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService: Failed to reload fair scheduler config file - will use existing allocations. org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationConfigurationException: Missing resource: mb at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairSchedulerConfiguration.findResource(FairSchedulerConfiguration.java:247) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairSchedulerConfiguration.parseResourceConfigValue(FairSchedulerConfiguration.java:231) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.loadQueue(AllocationFileLoaderService.java:347) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.loadQueue(AllocationFileLoaderService.java:381) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.reloadAllocations(AllocationFileLoaderService.java:293) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService$1.run(AllocationFileLoaderService.java:117) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2761) potential race condition in SchedulingPolicy
[ https://issues.apache.org/jira/browse/YARN-2761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14187058#comment-14187058 ] Karthik Kambatla commented on YARN-2761: I am all for fixing this, but the race itself shouldn't be a big deal. The worst thing that can happen is that multiple threads create multiple instances of the policy; only one of them eventually goes into the map, and the rest get GCed eventually. [~zhiguohong] - do you agree or am I missing something? potential race condition in SchedulingPolicy Key: YARN-2761 URL: https://issues.apache.org/jira/browse/YARN-2761 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Minor Reported by findbugs. In SchedulingPolicy.getInstance, ConcurrentHashMap.get and ConcurrentHashMap.put are called. These two operations together should be atomic, but using a ConcurrentHashMap doesn't guarantee this. {code} public static SchedulingPolicy getInstance(Class<? extends SchedulingPolicy> clazz) { SchedulingPolicy policy = instances.get(clazz); if (policy == null) { policy = ReflectionUtils.newInstance(clazz, null); instances.put(clazz, policy); } return policy; } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
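For reference, a minimal sketch of how the check-then-put could be made atomic with putIfAbsent; the class below is a simplified stand-in, and whether the eventual patch takes this exact approach is not decided in this thread:

{code}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import org.apache.hadoop.util.ReflectionUtils;

public abstract class SchedulingPolicySketch {
  private static final ConcurrentMap<Class<? extends SchedulingPolicySketch>,
      SchedulingPolicySketch> instances = new ConcurrentHashMap<>();

  public static SchedulingPolicySketch getInstance(
      Class<? extends SchedulingPolicySketch> clazz) {
    SchedulingPolicySketch policy = instances.get(clazz);
    if (policy == null) {
      policy = ReflectionUtils.newInstance(clazz, null);
      // putIfAbsent makes the get/put pair atomic: if another thread won the
      // race, reuse its instance so every caller shares the same object.
      SchedulingPolicySketch previous = instances.putIfAbsent(clazz, policy);
      if (previous != null) {
        policy = previous;
      }
    }
    return policy;
  }
}
{code}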
[jira] [Commented] (YARN-2760) Completely remove word 'experimental' from FairScheduler docs
[ https://issues.apache.org/jira/browse/YARN-2760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14187061#comment-14187061 ] Hudson commented on YARN-2760: -- FAILURE: Integrated in Hadoop-trunk-Commit #6368 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6368/]) YARN-2760. Remove 'experimental' from FairScheduler docs. (Harsh J via kasha) (kasha: rev ade3727ecb092935dcc0f1291c1e6cf43d764a03) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/FairScheduler.apt.vm * hadoop-yarn-project/CHANGES.txt Completely remove word 'experimental' from FairScheduler docs - Key: YARN-2760 URL: https://issues.apache.org/jira/browse/YARN-2760 Project: Hadoop YARN Issue Type: Bug Components: documentation Affects Versions: 2.1.0-beta Reporter: Harsh J Assignee: Harsh J Priority: Trivial Fix For: 2.6.0 Attachments: YARN-2760.patch, YARN-2760.patch After YARN-1034, FairScheduler has not been 'experimental' in any aspect of use, but the doc change done in that did not entirely cover removal of that word, leaving a remnant in the preemption sub-point. This needs to be removed as well, as the feature has been good to use for a long time now, and is not experimental. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2761) potential race condition in SchedulingPolicy
[ https://issues.apache.org/jira/browse/YARN-2761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14187073#comment-14187073 ] Wei Yan commented on YARN-2761: --- Good catch, [~zhiguohong]. potential race condition in SchedulingPolicy Key: YARN-2761 URL: https://issues.apache.org/jira/browse/YARN-2761 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Minor Reported by findbugs. In SchedulingPolicy.getInstance, ConcurrentHashMap.get and ConcurrentHashMap.put are called. These two operations together should be atomic, but using a ConcurrentHashMap doesn't guarantee this. {code} public static SchedulingPolicy getInstance(Class<? extends SchedulingPolicy> clazz) { SchedulingPolicy policy = instances.get(clazz); if (policy == null) { policy = ReflectionUtils.newInstance(clazz, null); instances.put(clazz, policy); } return policy; } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1964) Create Docker analog of the LinuxContainerExecutor in YARN
[ https://issues.apache.org/jira/browse/YARN-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14187081#comment-14187081 ] Abin Shahab commented on YARN-1964: --- This says it passed: https://builds.apache.org/job/PreCommit-YARN-Build/5605/artifact/patchprocess/trunkJavacWarnings.txt/*view*/ Is there another patch that's making it fail? https://issues.apache.org/jira/browse/HADOOP-10926 ? Create Docker analog of the LinuxContainerExecutor in YARN -- Key: YARN-1964 URL: https://issues.apache.org/jira/browse/YARN-1964 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 2.2.0 Reporter: Arun C Murthy Assignee: Abin Shahab Attachments: YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, yarn-1964-branch-2.2.0-docker.patch, yarn-1964-branch-2.2.0-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch Docker (https://www.docker.io/) is, increasingly, a very popular container technology. In context of YARN, the support for Docker will provide a very elegant solution to allow applications to *package* their software into a Docker container (entire Linux file system incl. custom versions of perl, python etc.) and use it as a blueprint to launch all their YARN containers with requisite software environment. This provides both consistency (all YARN containers will have the same software environment) and isolation (no interference with whatever is installed on the physical machine). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2753) Fix potential issues and code clean up for *NodeLabelsManager
[ https://issues.apache.org/jira/browse/YARN-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2753: - Summary: Fix potential issues and code clean up for *NodeLabelsManager (was: Shouldn't change the value in labelCollections if the key already exists and potential NPE at CommonNodeLabelsManager.) Fix potential issues and code clean up for *NodeLabelsManager - Key: YARN-2753 URL: https://issues.apache.org/jira/browse/YARN-2753 Project: Hadoop YARN Issue Type: Sub-task Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-2753.000.patch, YARN-2753.001.patch, YARN-2753.002.patch, YARN-2753.003.patch CommonNodeLabelsManager#addToCluserNodeLabels should not change the value in labelCollections if the key already exists otherwise the Label.resource will be changed(reset). potential NPE(NullPointerException) in checkRemoveLabelsFromNode of CommonNodeLabelsManager. because when a Node is created, Node.labels can be null. In this case, nm.labels; may be null. So we need check originalLabels not null before use it(originalLabels.containsAll). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2753) Fix potential issues and code clean up for *NodeLabelsManager
[ https://issues.apache.org/jira/browse/YARN-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2753: - Description: Issues include: * CommonNodeLabelsManager#addToCluserNodeLabels should not change the value in labelCollections if the key already exists otherwise the Label.resource will be changed(reset). * potential NPE(NullPointerException) in checkRemoveLabelsFromNode of CommonNodeLabelsManager. ** because when a Node is created, Node.labels can be null. ** In this case, nm.labels; may be null. So we need check originalLabels not null before use it(originalLabels.containsAll). was: CommonNodeLabelsManager#addToCluserNodeLabels should not change the value in labelCollections if the key already exists otherwise the Label.resource will be changed(reset). potential NPE(NullPointerException) in checkRemoveLabelsFromNode of CommonNodeLabelsManager. because when a Node is created, Node.labels can be null. In this case, nm.labels; may be null. So we need check originalLabels not null before use it(originalLabels.containsAll). Fix potential issues and code clean up for *NodeLabelsManager - Key: YARN-2753 URL: https://issues.apache.org/jira/browse/YARN-2753 Project: Hadoop YARN Issue Type: Sub-task Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-2753.000.patch, YARN-2753.001.patch, YARN-2753.002.patch, YARN-2753.003.patch Issues include: * CommonNodeLabelsManager#addToCluserNodeLabels should not change the value in labelCollections if the key already exists otherwise the Label.resource will be changed(reset). * potential NPE(NullPointerException) in checkRemoveLabelsFromNode of CommonNodeLabelsManager. ** because when a Node is created, Node.labels can be null. ** In this case, nm.labels; may be null. So we need check originalLabels not null before use it(originalLabels.containsAll). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
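To make the two issues above concrete, a simplified sketch of the guards being asked for; the class, the map value type, and the method names are stand-ins for illustration, not the CommonNodeLabelsManager code:

{code}
import java.util.Collection;
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class NodeLabelsGuardsSketch {
  static class NodeLabel {
    final String name;
    NodeLabel(String name) { this.name = name; }
  }

  private final Map<String, NodeLabel> labelCollections = new ConcurrentHashMap<>();

  // Guard 1: only create an entry for labels that are not known yet, so an
  // existing label (and any resource already tracked on it) is never reset.
  void addLabels(Collection<String> labels) {
    for (String label : labels) {
      labelCollections.putIfAbsent(label, new NodeLabel(label));
    }
  }

  // Guard 2: a freshly created node may carry no labels at all, so treat a
  // null label set as empty before calling containsAll on it.
  boolean nodeHasAllLabels(Set<String> nodeLabels, Set<String> labelsToRemove) {
    Set<String> originalLabels =
        (nodeLabels == null) ? Collections.<String>emptySet() : nodeLabels;
    return originalLabels.containsAll(labelsToRemove);
  }
}
{code}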
[jira] [Updated] (YARN-2754) addToCluserNodeLabels should be protected by writeLock in RMNodeLabelsManager.java.
[ https://issues.apache.org/jira/browse/YARN-2754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2754: - Issue Type: Bug (was: Sub-task) Parent: (was: YARN-2492) addToCluserNodeLabels should be protected by writeLock in RMNodeLabelsManager.java. --- Key: YARN-2754 URL: https://issues.apache.org/jira/browse/YARN-2754 Project: Hadoop YARN Issue Type: Bug Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-2754.000.patch addToCluserNodeLabels should be protected by writeLock in RMNodeLabelsManager.java, because we need to protect labelCollections in RMNodeLabelsManager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
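The change being requested is the usual try/finally write-lock idiom; a compressed sketch with simplified names (not the actual RMNodeLabelsManager code):

{code}
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LabelsManagerLockingSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final Map<String, Object> labelCollections = new HashMap<>();

  public void addLabels(Collection<String> labels) {
    lock.writeLock().lock();
    try {
      // Mutations of labelCollections happen only under the write lock, so
      // readers holding the read lock never observe partial updates.
      for (String label : labels) {
        labelCollections.put(label, new Object());
      }
    } finally {
      lock.writeLock().unlock();
    }
  }
}
{code}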
[jira] [Commented] (YARN-2753) Fix potential issues and code clean up for *NodeLabelsManager
[ https://issues.apache.org/jira/browse/YARN-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14187114#comment-14187114 ] Wangda Tan commented on YARN-2753: -- [~zxu], I've changed the title and description. I suggest merging the other RMNodeLabelsManager fixes here as well, since they're in the same module, and closing the others as duplicates. Please note such issues here so that we can review and commit them more easily. Thanks, Wangda Fix potential issues and code clean up for *NodeLabelsManager - Key: YARN-2753 URL: https://issues.apache.org/jira/browse/YARN-2753 Project: Hadoop YARN Issue Type: Sub-task Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-2753.000.patch, YARN-2753.001.patch, YARN-2753.002.patch, YARN-2753.003.patch Issues include: * CommonNodeLabelsManager#addToCluserNodeLabels should not change the value in labelCollections if the key already exists otherwise the Label.resource will be changed(reset). * potential NPE(NullPointerException) in checkRemoveLabelsFromNode of CommonNodeLabelsManager. ** because when a Node is created, Node.labels can be null. ** In this case, nm.labels; may be null. So we need check originalLabels not null before use it(originalLabels.containsAll). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2757) potential NPE in checkNodeLabelExpression of SchedulerUtils for nodeLabels.
[ https://issues.apache.org/jira/browse/YARN-2757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2757: - Priority: Minor (was: Major) potential NPE in checkNodeLabelExpression of SchedulerUtils for nodeLabels. --- Key: YARN-2757 URL: https://issues.apache.org/jira/browse/YARN-2757 Project: Hadoop YARN Issue Type: Bug Reporter: zhihai xu Assignee: zhihai xu Priority: Minor Attachments: YARN-2757.000.patch Potential NPE in checkNodeLabelExpression of SchedulerUtils for nodeLabels. Since we already check whether nodeLabels is null at {code} if (!str.trim().isEmpty() && (nodeLabels == null || !nodeLabels.contains(str.trim()))) { return false; } {code} we should also check that nodeLabels is not null at {code} if (!nodeLabels.isEmpty()) { return false; } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2757) potential NPE in checkNodeLabelExpression of SchedulerUtils for nodeLabels.
[ https://issues.apache.org/jira/browse/YARN-2757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14187131#comment-14187131 ] Wangda Tan commented on YARN-2757: -- [~zxu], This method is invoked by RMNodeLabelsManager#getLabelsOnNode, and that method will never return null, so I changed the priority of this issue to Minor; please boost it if you don't agree. potential NPE in checkNodeLabelExpression of SchedulerUtils for nodeLabels. --- Key: YARN-2757 URL: https://issues.apache.org/jira/browse/YARN-2757 Project: Hadoop YARN Issue Type: Bug Reporter: zhihai xu Assignee: zhihai xu Priority: Minor Attachments: YARN-2757.000.patch Potential NPE in checkNodeLabelExpression of SchedulerUtils for nodeLabels. Since we already check whether nodeLabels is null at {code} if (!str.trim().isEmpty() && (nodeLabels == null || !nodeLabels.contains(str.trim()))) { return false; } {code} we should also check that nodeLabels is not null at {code} if (!nodeLabels.isEmpty()) { return false; } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
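Although the current caller never passes a null label set, the defensive form discussed in the description would look roughly like the sketch below; the signature and the && split are simplified assumptions, not the actual SchedulerUtils code:

{code}
import java.util.Set;

public class NodeLabelExpressionCheckSketch {
  /**
   * Returns true only when every label named in the expression is present on
   * the node; a null label set is treated the same as an empty one.
   */
  static boolean checkNodeLabelExpression(Set<String> nodeLabels, String expression) {
    if (expression == null || expression.trim().isEmpty()) {
      // An empty expression only matches nodes that carry no labels at all.
      return nodeLabels == null || nodeLabels.isEmpty();
    }
    for (String str : expression.split("&&")) {
      String label = str.trim();
      if (!label.isEmpty()
          && (nodeLabels == null || !nodeLabels.contains(label))) {
        return false;
      }
    }
    return true;
  }
}
{code}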
[jira] [Commented] (YARN-2279) Add UTs to cover timeline server authentication
[ https://issues.apache.org/jira/browse/YARN-2279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14187139#comment-14187139 ] Xuan Gong commented on YARN-2279: - +1 Looks good. Will commit it after Jenkins give +1 Add UTs to cover timeline server authentication --- Key: YARN-2279 URL: https://issues.apache.org/jira/browse/YARN-2279 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Labels: test Attachments: YARN-2279.1.patch Currently, timeline server authentication is lacking unit tests. We have to verify each incremental patch manually. It's good to add some unit tests here. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2279) Add UTs to cover timeline server authentication
[ https://issues.apache.org/jira/browse/YARN-2279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14187190#comment-14187190 ] Hadoop QA commented on YARN-2279: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12676042/YARN-2279.1.patch against trunk revision ade3727. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5606//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5606//console This message is automatically generated. Add UTs to cover timeline server authentication --- Key: YARN-2279 URL: https://issues.apache.org/jira/browse/YARN-2279 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Labels: test Attachments: YARN-2279.1.patch Currently, timeline server authentication is lacking unit tests. We have to verify each incremental patch manually. It's good to add some unit tests here. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1813) Better error message for yarn logs when permission denied
[ https://issues.apache.org/jira/browse/YARN-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-1813: - Attachment: YARN-1813.6.patch Thanks for your review, Rohith! Updated: 1. Changed a log message to Permission denied. : /path/to/dir 2. Removed needless change in AggregatedLogsBlock. 3. Updated log message in {{logDirNotExist}}. Better error message for yarn logs when permission denied --- Key: YARN-1813 URL: https://issues.apache.org/jira/browse/YARN-1813 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.3.0, 2.4.1, 2.5.1 Reporter: Andrew Wang Assignee: Tsuyoshi OZAWA Priority: Minor Attachments: YARN-1813.1.patch, YARN-1813.2.patch, YARN-1813.2.patch, YARN-1813.3.patch, YARN-1813.4.patch, YARN-1813.5.patch, YARN-1813.6.patch I ran some MR jobs as the hdfs user, and then forgot to sudo -u when grabbing the logs. yarn logs prints an error message like the following: {noformat} [andrew.wang@a2402 ~]$ yarn logs -applicationId application_1394482121761_0010 14/03/10 16:05:10 INFO client.RMProxy: Connecting to ResourceManager at a2402.halxg.cloudera.com/10.20.212.10:8032 Logs not available at /tmp/logs/andrew.wang/logs/application_1394482121761_0010 Log aggregation has not completed or is not enabled. {noformat} It'd be nicer if it said Permission denied or AccessControlException or something like that instead, since that's the real issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
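The overall shape of the change (the exact patch is only available as an attachment) is to surface the access error from the remote log directory instead of falling through to the generic log-aggregation message; a sketch with an assumed helper name:

{code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.AccessControlException;

public class LogDirAccessSketch {
  /** Prints a targeted message when the aggregated log dir is unreadable. */
  static int checkRemoteAppLogDir(Configuration conf, Path remoteAppLogDir) {
    try {
      FileSystem fs = remoteAppLogDir.getFileSystem(conf);
      fs.listStatus(remoteAppLogDir);
      return 0;
    } catch (AccessControlException e) {
      // The real cause: the caller has no permission on the owner's log dir.
      System.err.println("Permission denied. : " + remoteAppLogDir);
      return -1;
    } catch (IOException e) {
      System.err.println("Logs not available at " + remoteAppLogDir);
      System.err.println("Log aggregation has not completed or is not enabled.");
      return -1;
    }
  }
}
{code}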
[jira] [Commented] (YARN-1964) Create Docker analog of the LinuxContainerExecutor in YARN
[ https://issues.apache.org/jira/browse/YARN-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14187219#comment-14187219 ] sidharta seethana commented on YARN-1964: - This commit in branch-2.6, https://github.com/apache/hadoop/commit/29d0164e, changed the signature of an abstract function in ContainerExecutor. It looks like your latest patch fixes this, though. We'll take a look, thanks. Create Docker analog of the LinuxContainerExecutor in YARN -- Key: YARN-1964 URL: https://issues.apache.org/jira/browse/YARN-1964 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 2.2.0 Reporter: Arun C Murthy Assignee: Abin Shahab Attachments: YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, yarn-1964-branch-2.2.0-docker.patch, yarn-1964-branch-2.2.0-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch Docker (https://www.docker.io/) is, increasingly, a very popular container technology. In context of YARN, the support for Docker will provide a very elegant solution to allow applications to *package* their software into a Docker container (entire Linux file system incl. custom versions of perl, python etc.) and use it as a blueprint to launch all their YARN containers with requisite software environment. This provides both consistency (all YARN containers will have the same software environment) and isolation (no interference with whatever is installed on the physical machine). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2279) Add UTs to cover timeline server authentication
[ https://issues.apache.org/jira/browse/YARN-2279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14187228#comment-14187228 ] Xuan Gong commented on YARN-2279: - committed into trunk/branch-2/branch-2.6. Thanks, zhijie ! Add UTs to cover timeline server authentication --- Key: YARN-2279 URL: https://issues.apache.org/jira/browse/YARN-2279 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Labels: test Attachments: YARN-2279.1.patch Currently, timeline server authentication is lacking unit tests. We have to verify each incremental patch manually. It's good to add some unit tests here. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2279) Add UTs to cover timeline server authentication
[ https://issues.apache.org/jira/browse/YARN-2279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-2279: Fix Version/s: 2.6.0 Add UTs to cover timeline server authentication --- Key: YARN-2279 URL: https://issues.apache.org/jira/browse/YARN-2279 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Labels: test Fix For: 2.6.0 Attachments: YARN-2279.1.patch Currently, timeline server authentication is lacking unit tests. We have to verify each incremental patch manually. It's good to add some unit tests here. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2737) Misleading msg in LogCLI when app is not successfully submitted
[ https://issues.apache.org/jira/browse/YARN-2737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14187253#comment-14187253 ] Tsuyoshi OZAWA commented on YARN-2737: -- [~jianhe], [~rohithsharma] reviewed a patch on YARN-1813. It includes a comment about this issue. Could you take a look? Misleading msg in LogCLI when app is not successfully submitted Key: YARN-2737 URL: https://issues.apache.org/jira/browse/YARN-2737 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Jian He Assignee: Rohith {{LogCLIHelpers#logDirNotExist}} prints the msg {{Log aggregation has not completed or is not enabled.}} if the app log file doesn't exist. This is misleading if the application was not submitted successfully; clearly, we won't have logs for such an application. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2742) FairSchedulerConfiguration fails to parse if there is extra space between value and unit
[ https://issues.apache.org/jira/browse/YARN-2742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14187264#comment-14187264 ] Tsuyoshi OZAWA commented on YARN-2742: -- [~ywskycn], thanks for the contribution. How about adding a test case to include trailing space, like 1024 mb, 4 core ? FairSchedulerConfiguration fails to parse if there is extra space between value and unit Key: YARN-2742 URL: https://issues.apache.org/jira/browse/YARN-2742 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.4.0 Reporter: Sangjin Lee Assignee: Wei Yan Priority: Minor Attachments: YARN-2742-1.patch FairSchedulerConfiguration is very strict about the number of space characters between the value and the unit: 0 or 1 space. For example, for values like the following: {noformat} <maxResources>4096  mb, 2 vcores</maxResources> {noformat} (note 2 spaces) The above line fails to parse: {noformat} 2014-10-24 22:56:40,802 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService: Failed to reload fair scheduler config file - will use existing allocations. org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationConfigurationException: Missing resource: mb at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairSchedulerConfiguration.findResource(FairSchedulerConfiguration.java:247) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairSchedulerConfiguration.parseResourceConfigValue(FairSchedulerConfiguration.java:231) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.loadQueue(AllocationFileLoaderService.java:347) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.loadQueue(AllocationFileLoaderService.java:381) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.reloadAllocations(AllocationFileLoaderService.java:293) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService$1.run(AllocationFileLoaderService.java:117) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1813) Better error message for yarn logs when permission denied
[ https://issues.apache.org/jira/browse/YARN-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14187293#comment-14187293 ] Hadoop QA commented on YARN-1813: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12677648/YARN-1813.6.patch against trunk revision 0d3e7e2. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5607//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5607//console This message is automatically generated. Better error message for yarn logs when permission denied --- Key: YARN-1813 URL: https://issues.apache.org/jira/browse/YARN-1813 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.3.0, 2.4.1, 2.5.1 Reporter: Andrew Wang Assignee: Tsuyoshi OZAWA Priority: Minor Attachments: YARN-1813.1.patch, YARN-1813.2.patch, YARN-1813.2.patch, YARN-1813.3.patch, YARN-1813.4.patch, YARN-1813.5.patch, YARN-1813.6.patch I ran some MR jobs as the hdfs user, and then forgot to sudo -u when grabbing the logs. yarn logs prints an error message like the following: {noformat} [andrew.wang@a2402 ~]$ yarn logs -applicationId application_1394482121761_0010 14/03/10 16:05:10 INFO client.RMProxy: Connecting to ResourceManager at a2402.halxg.cloudera.com/10.20.212.10:8032 Logs not available at /tmp/logs/andrew.wang/logs/application_1394482121761_0010 Log aggregation has not completed or is not enabled. {noformat} It'd be nicer if it said Permission denied or AccessControlException or something like that instead, since that's the real issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2758) Update TestApplicationHistoryClientService to use the new generic history store
[ https://issues.apache.org/jira/browse/YARN-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14187295#comment-14187295 ] Hadoop QA commented on YARN-2758: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12677493/YARN-2758.1.patch against trunk revision 0d3e7e2. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5608//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5608//console This message is automatically generated. Update TestApplicationHistoryClientService to use the new generic history store --- Key: YARN-2758 URL: https://issues.apache.org/jira/browse/YARN-2758 Project: Hadoop YARN Issue Type: Test Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-2758.1.patch TestApplicationHistoryClientService is still testing against the mock data in the old MemoryApplicationHistoryStore. hence it needs to be updated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2742) FairSchedulerConfiguration fails to parse if there is extra space between value and unit
[ https://issues.apache.org/jira/browse/YARN-2742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan updated YARN-2742: -- Attachment: YARN-2742-2.patch Thanks, [~ozawa]. Updated the patch to add that test case. FairSchedulerConfiguration fails to parse if there is extra space between value and unit Key: YARN-2742 URL: https://issues.apache.org/jira/browse/YARN-2742 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.4.0 Reporter: Sangjin Lee Assignee: Wei Yan Priority: Minor Attachments: YARN-2742-1.patch, YARN-2742-2.patch FairSchedulerConfiguration is very strict about the number of space characters between the value and the unit: 0 or 1 space. For example, for values like the following: {noformat} <maxResources>4096  mb, 2 vcores</maxResources> {noformat} (note 2 spaces) This above line fails to parse: {noformat} 2014-10-24 22:56:40,802 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService: Failed to reload fair scheduler config file - will use existing allocations. org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationConfigurationException: Missing resource: mb at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairSchedulerConfiguration.findResource(FairSchedulerConfiguration.java:247) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairSchedulerConfiguration.parseResourceConfigValue(FairSchedulerConfiguration.java:231) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.loadQueue(AllocationFileLoaderService.java:347) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.loadQueue(AllocationFileLoaderService.java:381) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.reloadAllocations(AllocationFileLoaderService.java:293) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService$1.run(AllocationFileLoaderService.java:117) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
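For illustration, a whitespace-tolerant parse of the "<value> <unit>" form could look like the sketch below; the class and method names are hypothetical and this is not the committed FairSchedulerConfiguration change.
{code}
// Hedged sketch: accept any amount of whitespace between value and unit,
// e.g. "4096mb", "4096 mb", or "4096  mb".
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ResourceValueParser {
  private static final Pattern VALUE_UNIT = Pattern.compile("(\\d+)\\s*([a-z]+)");

  static int parse(String value, String unit) {
    Matcher m = VALUE_UNIT.matcher(value.toLowerCase());
    while (m.find()) {
      if (m.group(2).equals(unit)) {
        return Integer.parseInt(m.group(1));
      }
    }
    throw new IllegalArgumentException("Missing resource: " + unit);
  }

  public static void main(String[] args) {
    System.out.println(parse("4096  mb, 2 vcores", "mb"));     // 4096
    System.out.println(parse("4096  mb, 2 vcores", "vcores")); // 2
  }
}
{code}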
[jira] [Updated] (YARN-2755) NM fails to clean up usercache_DEL_timestamp dirs after YARN-661
[ https://issues.apache.org/jira/browse/YARN-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated YARN-2755: -- Attachment: YARN-2755.v2.patch NM fails to clean up usercache_DEL_timestamp dirs after YARN-661 -- Key: YARN-2755 URL: https://issues.apache.org/jira/browse/YARN-2755 Project: Hadoop YARN Issue Type: Bug Reporter: Siqi Li Assignee: Siqi Li Priority: Critical Attachments: YARN-2755.v1.patch, YARN-2755.v2.patch When the NM restarts frequently for some reason, a large number of directories like these are left in /data/disk$num/yarn/local/: /data/disk1/yarn/local/usercache_DEL_1414372756105 /data/disk1/yarn/local/usercache_DEL_1413557901696 /data/disk1/yarn/local/usercache_DEL_1413657004894 /data/disk1/yarn/local/usercache_DEL_1413675321860 /data/disk1/yarn/local/usercache_DEL_1414093167936 /data/disk1/yarn/local/usercache_DEL_1413565841271 These directories are empty, but take up 100M+ due to the number of them. There were 38714 per data disk on the machine I looked at. It appears to be a regression introduced by YARN-661 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2755) NM fails to clean up usercache_DEL_timestamp dirs after YARN-661
[ https://issues.apache.org/jira/browse/YARN-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated YARN-2755: -- Attachment: YARN-2755.v3.patch NM fails to clean up usercache_DEL_timestamp dirs after YARN-661 -- Key: YARN-2755 URL: https://issues.apache.org/jira/browse/YARN-2755 Project: Hadoop YARN Issue Type: Bug Reporter: Siqi Li Assignee: Siqi Li Priority: Critical Attachments: YARN-2755.v1.patch, YARN-2755.v2.patch, YARN-2755.v3.patch When the NM restarts frequently for some reason, a large number of directories like these are left in /data/disk$num/yarn/local/: /data/disk1/yarn/local/usercache_DEL_1414372756105 /data/disk1/yarn/local/usercache_DEL_1413557901696 /data/disk1/yarn/local/usercache_DEL_1413657004894 /data/disk1/yarn/local/usercache_DEL_1413675321860 /data/disk1/yarn/local/usercache_DEL_1414093167936 /data/disk1/yarn/local/usercache_DEL_1413565841271 These directories are empty, but take up 100M+ due to the number of them. There were 38714 per data disk on the machine I looked at. It appears to be a regression introduced by YARN-661 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
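A rough sketch of the cleanup behavior being discussed, removing leftover usercache_DEL_<timestamp> directories from a local dir; the helper class is illustrative and this is not the attached patch.
{code}
// Hedged sketch: delete stale "usercache_DEL_<timestamp>" directories that a
// previous NM instance left behind under a local dir.
import java.io.File;
import java.io.IOException;

import org.apache.hadoop.fs.FileContext;
import org.apache.hadoop.fs.Path;

public class StaleUsercacheCleaner {
  static void cleanLocalDir(FileContext lfs, String localDir) throws IOException {
    File[] entries = new File(localDir).listFiles();
    if (entries == null) {
      return;
    }
    for (File entry : entries) {
      // e.g. /data/disk1/yarn/local/usercache_DEL_1414372756105
      if (entry.isDirectory() && entry.getName().startsWith("usercache_DEL_")) {
        lfs.delete(new Path(entry.getAbsolutePath()), true);
      }
    }
  }

  public static void main(String[] args) throws IOException {
    cleanLocalDir(FileContext.getLocalFSFileContext(), "/data/disk1/yarn/local");
  }
}
{code}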
[jira] [Commented] (YARN-2758) Update TestApplicationHistoryClientService to use the new generic history store
[ https://issues.apache.org/jira/browse/YARN-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14187387#comment-14187387 ] Xuan Gong commented on YARN-2758: - +1 Looks good to me. Will commit Update TestApplicationHistoryClientService to use the new generic history store --- Key: YARN-2758 URL: https://issues.apache.org/jira/browse/YARN-2758 Project: Hadoop YARN Issue Type: Test Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-2758.1.patch TestApplicationHistoryClientService is still testing against the mock data in the old MemoryApplicationHistoryStore. hence it needs to be updated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2758) Update TestApplicationHistoryClientService to use the new generic history store
[ https://issues.apache.org/jira/browse/YARN-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14187401#comment-14187401 ] Hudson commented on YARN-2758: -- FAILURE: Integrated in Hadoop-trunk-Commit #6372 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6372/]) YARN-2758. Update TestApplicationHistoryClientService to use the new generic history store. Contributed by Zhijie Shen (xgong: rev 69f79bee8b3da07bf42e22e35e58c7719782e31f) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryClientService.java * hadoop-yarn-project/CHANGES.txt Update TestApplicationHistoryClientService to use the new generic history store --- Key: YARN-2758 URL: https://issues.apache.org/jira/browse/YARN-2758 Project: Hadoop YARN Issue Type: Test Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-2758.1.patch TestApplicationHistoryClientService is still testing against the mock data in the old MemoryApplicationHistoryStore. hence it needs to be updated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2758) Update TestApplicationHistoryClientService to use the new generic history store
[ https://issues.apache.org/jira/browse/YARN-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14187416#comment-14187416 ] Xuan Gong commented on YARN-2758: - Committed to trunk/branch-2/branch-2.6. Thanks, zhijie! Update TestApplicationHistoryClientService to use the new generic history store --- Key: YARN-2758 URL: https://issues.apache.org/jira/browse/YARN-2758 Project: Hadoop YARN Issue Type: Test Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Fix For: 2.6.0 Attachments: YARN-2758.1.patch TestApplicationHistoryClientService is still testing against the mock data in the old MemoryApplicationHistoryStore. hence it needs to be updated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2742) FairSchedulerConfiguration fails to parse if there is extra space between value and unit
[ https://issues.apache.org/jira/browse/YARN-2742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14187426#comment-14187426 ] Hadoop QA commented on YARN-2742: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12677668/YARN-2742-2.patch against trunk revision 371a3b8. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5609//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5609//console This message is automatically generated. FairSchedulerConfiguration fails to parse if there is extra space between value and unit Key: YARN-2742 URL: https://issues.apache.org/jira/browse/YARN-2742 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.4.0 Reporter: Sangjin Lee Assignee: Wei Yan Priority: Minor Attachments: YARN-2742-1.patch, YARN-2742-2.patch FairSchedulerConfiguration is very strict about the number of space characters between the value and the unit: 0 or 1 space. For example, for values like the following: {noformat} <maxResources>4096  mb, 2 vcores</maxResources> {noformat} (note 2 spaces) This above line fails to parse: {noformat} 2014-10-24 22:56:40,802 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService: Failed to reload fair scheduler config file - will use existing allocations. org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationConfigurationException: Missing resource: mb at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairSchedulerConfiguration.findResource(FairSchedulerConfiguration.java:247) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairSchedulerConfiguration.parseResourceConfigValue(FairSchedulerConfiguration.java:231) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.loadQueue(AllocationFileLoaderService.java:347) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.loadQueue(AllocationFileLoaderService.java:381) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.reloadAllocations(AllocationFileLoaderService.java:293) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService$1.run(AllocationFileLoaderService.java:117) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2755) NM fails to clean up usercache_DEL_timestamp dirs after YARN-661
[ https://issues.apache.org/jira/browse/YARN-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14187429#comment-14187429 ] Hadoop QA commented on YARN-2755: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12677674/YARN-2755.v2.patch against trunk revision e226b5b. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5610//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5610//console This message is automatically generated. NM fails to clean up usercache_DEL_timestamp dirs after YARN-661 -- Key: YARN-2755 URL: https://issues.apache.org/jira/browse/YARN-2755 Project: Hadoop YARN Issue Type: Bug Reporter: Siqi Li Assignee: Siqi Li Priority: Critical Attachments: YARN-2755.v1.patch, YARN-2755.v2.patch, YARN-2755.v3.patch When NM restarts frequently due to some reason, a large number of directories like these left in /data/disk$num/yarn/local/: /data/disk1/yarn/local/usercache_DEL_1414372756105 /data/disk1/yarn/local/usercache_DEL_1413557901696 /data/disk1/yarn/local/usercache_DEL_1413657004894 /data/disk1/yarn/local/usercache_DEL_1413675321860 /data/disk1/yarn/local/usercache_DEL_1414093167936 /data/disk1/yarn/local/usercache_DEL_1413565841271 These directories are empty, but take up 100M+ due to the number of them. There were 38714 on the machine I looked at per data disk. It appears to be a regression introduced by YARN-661 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2742) FairSchedulerConfiguration fails to parse if there is extra space between value and unit
[ https://issues.apache.org/jira/browse/YARN-2742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14187428#comment-14187428 ] Tsuyoshi OZAWA commented on YARN-2742: -- LGTM. FairSchedulerConfiguration fails to parse if there is extra space between value and unit Key: YARN-2742 URL: https://issues.apache.org/jira/browse/YARN-2742 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.4.0 Reporter: Sangjin Lee Assignee: Wei Yan Priority: Minor Attachments: YARN-2742-1.patch, YARN-2742-2.patch FairSchedulerConfiguration is very strict about the number of space characters between the value and the unit: 0 or 1 space. For example, for values like the following: {noformat} <maxResources>4096  mb, 2 vcores</maxResources> {noformat} (note 2 spaces) This above line fails to parse: {noformat} 2014-10-24 22:56:40,802 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService: Failed to reload fair scheduler config file - will use existing allocations. org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationConfigurationException: Missing resource: mb at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairSchedulerConfiguration.findResource(FairSchedulerConfiguration.java:247) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairSchedulerConfiguration.parseResourceConfigValue(FairSchedulerConfiguration.java:231) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.loadQueue(AllocationFileLoaderService.java:347) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.loadQueue(AllocationFileLoaderService.java:381) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.reloadAllocations(AllocationFileLoaderService.java:293) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService$1.run(AllocationFileLoaderService.java:117) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2755) NM fails to clean up usercache_DEL_timestamp dirs after YARN-661
[ https://issues.apache.org/jira/browse/YARN-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14187432#comment-14187432 ] Hadoop QA commented on YARN-2755: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12677678/YARN-2755.v3.patch against trunk revision e226b5b. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:red}-1 eclipse:eclipse{color}. The patch failed to build with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5611//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5611//console This message is automatically generated. NM fails to clean up usercache_DEL_timestamp dirs after YARN-661 -- Key: YARN-2755 URL: https://issues.apache.org/jira/browse/YARN-2755 Project: Hadoop YARN Issue Type: Bug Reporter: Siqi Li Assignee: Siqi Li Priority: Critical Attachments: YARN-2755.v1.patch, YARN-2755.v2.patch, YARN-2755.v3.patch When NM restarts frequently due to some reason, a large number of directories like these left in /data/disk$num/yarn/local/: /data/disk1/yarn/local/usercache_DEL_1414372756105 /data/disk1/yarn/local/usercache_DEL_1413557901696 /data/disk1/yarn/local/usercache_DEL_1413657004894 /data/disk1/yarn/local/usercache_DEL_1413675321860 /data/disk1/yarn/local/usercache_DEL_1414093167936 /data/disk1/yarn/local/usercache_DEL_1413565841271 These directories are empty, but take up 100M+ due to the number of them. There were 38714 on the machine I looked at per data disk. It appears to be a regression introduced by YARN-661 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2742) FairSchedulerConfiguration fails to parse if there is extra space between value and unit
[ https://issues.apache.org/jira/browse/YARN-2742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14187433#comment-14187433 ] Sangjin Lee commented on YARN-2742: --- +1 FairSchedulerConfiguration fails to parse if there is extra space between value and unit Key: YARN-2742 URL: https://issues.apache.org/jira/browse/YARN-2742 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.4.0 Reporter: Sangjin Lee Assignee: Wei Yan Priority: Minor Attachments: YARN-2742-1.patch, YARN-2742-2.patch FairSchedulerConfiguration is very strict about the number of space characters between the value and the unit: 0 or 1 space. For example, for values like the following: {noformat} <maxResources>4096  mb, 2 vcores</maxResources> {noformat} (note 2 spaces) This above line fails to parse: {noformat} 2014-10-24 22:56:40,802 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService: Failed to reload fair scheduler config file - will use existing allocations. org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationConfigurationException: Missing resource: mb at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairSchedulerConfiguration.findResource(FairSchedulerConfiguration.java:247) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairSchedulerConfiguration.parseResourceConfigValue(FairSchedulerConfiguration.java:231) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.loadQueue(AllocationFileLoaderService.java:347) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.loadQueue(AllocationFileLoaderService.java:381) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.reloadAllocations(AllocationFileLoaderService.java:293) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService$1.run(AllocationFileLoaderService.java:117) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2741) Windows: Node manager cannot serve up log files via the web user interface when yarn.nodemanager.log-dirs to any drive letter other than C: (or, the drive that nodemanag
[ https://issues.apache.org/jira/browse/YARN-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14187455#comment-14187455 ] Zhijie Shen commented on YARN-2741: --- +1 LGTM. The test case will verify the drive letter will not be skipped on both Linux and Windows. Will commit the patch. Windows: Node manager cannot serve up log files via the web user interface when yarn.nodemanager.log-dirs to any drive letter other than C: (or, the drive that nodemanager is running on) -- Key: YARN-2741 URL: https://issues.apache.org/jira/browse/YARN-2741 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Environment: Windows Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-2741.1.patch, YARN-2741.6.patch PROBLEM: User is getting No Logs available for Container Container_number when setting the yarn.nodemanager.log-dirs to any drive letter other than C: STEPS TO REPRODUCE: On Windows 1) Run NodeManager on C: 2) Create two local drive partitions D: and E: 3) Put yarn.nodemanager.log-dirs = D:\nmlogs or E:\nmlogs 4) Run a MR job that will last at least 5 minutes 5) While the job is in flight, log into the Yarn web ui , resource_manager_server:8088/cluster 6) Click on the application_idnumber 7) Click on the logs link, you will get No Logs available for Container Container_number ACTUAL BEHAVIOR: Getting an error message when viewing the container logs EXPECTED BEHAVIOR: Able to use different drive letters in yarn.nodemanager.log-dirs and not get error NOTE: If we use the drive letter C: in yarn.nodemanager.log-dirs, we are able to see the container logs while the MR job is in flight. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2741) Windows: Node manager cannot serve up log files via the web user interface when yarn.nodemanager.log-dirs to any drive letter other than C: (or, the drive that nodemanag
[ https://issues.apache.org/jira/browse/YARN-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14187497#comment-14187497 ] Hudson commented on YARN-2741: -- FAILURE: Integrated in Hadoop-trunk-Commit #6374 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6374/]) YARN-2741. Made NM web UI serve logs on the drive other than C: on Windows. Contributed by Craig Welch. (zjshen: rev 8984e9b1774033e379b57da1bd30a5c81888c7a3) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestContainerLogsPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/ContainerLogsUtils.java Windows: Node manager cannot serve up log files via the web user interface when yarn.nodemanager.log-dirs to any drive letter other than C: (or, the drive that nodemanager is running on) -- Key: YARN-2741 URL: https://issues.apache.org/jira/browse/YARN-2741 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Environment: Windows Reporter: Craig Welch Assignee: Craig Welch Fix For: 2.6.0 Attachments: YARN-2741.1.patch, YARN-2741.6.patch PROBLEM: User is getting No Logs available for Container Container_number when setting the yarn.nodemanager.log-dirs to any drive letter other than C: STEPS TO REPRODUCE: On Windows 1) Run NodeManager on C: 2) Create two local drive partitions D: and E: 3) Put yarn.nodemanager.log-dirs = D:\nmlogs or E:\nmlogs 4) Run a MR job that will last at least 5 minutes 5) While the job is in flight, log into the Yarn web ui , resource_manager_server:8088/cluster 6) Click on the application_idnumber 7) Click on the logs link, you will get No Logs available for Container Container_number ACTUAL BEHAVIOR: Getting an error message when viewing the container logs EXPECTED BEHAVIOR: Able to use different drive letters in yarn.nodemanager.log-dirs and not get error NOTE: If we use the drive letter C: in yarn.nodemanager.log-dirs, we are able to see the container logs while the MR job is in flight. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
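The committed change lives in ContainerLogsUtils and its test; as a loose illustration only (not the actual fix), the sketch below shows a java.io.File based containment check that keeps a Windows drive letter such as D: intact, which string- or URI-based path handling can mangle.
{code}
// Hedged illustration: canonical-path comparison that preserves drive letters.
import java.io.File;
import java.io.IOException;

public class DriveLetterCheck {
  // Returns true if 'child' lives under 'dir', comparing canonical paths so
  // "D:\nmlogs" is not reduced to "\nmlogs".
  static boolean isUnder(String dir, String child) throws IOException {
    String dirCanon = new File(dir).getCanonicalPath();
    String childCanon = new File(child).getCanonicalPath();
    return childCanon.startsWith(dirCanon + File.separator);
  }

  public static void main(String[] args) throws IOException {
    System.out.println(isUnder("D:\\nmlogs", "D:\\nmlogs\\app_01\\container_01"));
  }
}
{code}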
[jira] [Commented] (YARN-2647) Add yarn queue CLI to get queue infos
[ https://issues.apache.org/jira/browse/YARN-2647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14187521#comment-14187521 ] Xuan Gong commented on YARN-2647: - Thanks for the patch. [~sunilg]. The patch looks good overall. I only have one comment. This check seems unnecessary {code} if (args.length > 0 && args[0].equalsIgnoreCase(QUEUE)) { } else { syserr.println("Invalid Command usage. Command must start with 'queue'"); return exitCode; } {code} If we did not call command: yarn queue, this class will not be used. So, this check is not necessary. We do have such check in ApplicationCLI. Because command: yarn application and command: yarn applicationattempt will use the same ApplicationCLI class. So, they need to do this. Also, could you create a patch for branch-2, please ? The latest patch can not apply to branch-2. Add yarn queue CLI to get queue infos - Key: YARN-2647 URL: https://issues.apache.org/jira/browse/YARN-2647 Project: Hadoop YARN Issue Type: Sub-task Components: client Reporter: Wangda Tan Assignee: Sunil G Attachments: 0001-YARN-2647.patch, 0002-YARN-2647.patch, 0003-YARN-2647.patch, 0004-YARN-2647.patch, 0005-YARN-2647.patch, 0006-YARN-2647.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2753) Fix potential issues and code clean up for *NodeLabelsManager
[ https://issues.apache.org/jira/browse/YARN-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-2753: Attachment: YARN-2753.004.patch Fix potential issues and code clean up for *NodeLabelsManager - Key: YARN-2753 URL: https://issues.apache.org/jira/browse/YARN-2753 Project: Hadoop YARN Issue Type: Sub-task Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-2753.000.patch, YARN-2753.001.patch, YARN-2753.002.patch, YARN-2753.003.patch, YARN-2753.004.patch Issues include: * CommonNodeLabelsManager#addToCluserNodeLabels should not change the value in labelCollections if the key already exists otherwise the Label.resource will be changed(reset). * potential NPE(NullPointerException) in checkRemoveLabelsFromNode of CommonNodeLabelsManager. ** because when a Node is created, Node.labels can be null. ** In this case, nm.labels; may be null. So we need check originalLabels not null before use it(originalLabels.containsAll). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2753) Fix potential issues and code clean up for *NodeLabelsManager
[ https://issues.apache.org/jira/browse/YARN-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-2753: Description: Issues include: * CommonNodeLabelsManager#addToCluserNodeLabels should not change the value in labelCollections if the key already exists otherwise the Label.resource will be changed(reset). * potential NPE(NullPointerException) in checkRemoveLabelsFromNode of CommonNodeLabelsManager. ** because when a Node is created, Node.labels can be null. ** In this case, nm.labels; may be null. So we need check originalLabels not null before use it(originalLabels.containsAll). * addToCluserNodeLabels should be protected by writeLock in RMNodeLabelsManager.java. because we should protect labelCollections in RMNodeLabelsManager. * use static variable (Resources.none()) for not-running Node.resource in CommonNodeLabelsManager to save memory. ** When a Node is not activated, the resource is never used, When a Node is activated, a new resource will be assigned to it in RMNodeLabelsManager#activateNode (nm.resource = resource) So it would be better to use static variable Resources.none() instead of allocating a new variable(Resource.newInstance(0, 0)) for each node deactivation. was: Issues include: * CommonNodeLabelsManager#addToCluserNodeLabels should not change the value in labelCollections if the key already exists otherwise the Label.resource will be changed(reset). * potential NPE(NullPointerException) in checkRemoveLabelsFromNode of CommonNodeLabelsManager. ** because when a Node is created, Node.labels can be null. ** In this case, nm.labels; may be null. So we need check originalLabels not null before use it(originalLabels.containsAll). Fix potential issues and code clean up for *NodeLabelsManager - Key: YARN-2753 URL: https://issues.apache.org/jira/browse/YARN-2753 Project: Hadoop YARN Issue Type: Sub-task Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-2753.000.patch, YARN-2753.001.patch, YARN-2753.002.patch, YARN-2753.003.patch, YARN-2753.004.patch Issues include: * CommonNodeLabelsManager#addToCluserNodeLabels should not change the value in labelCollections if the key already exists otherwise the Label.resource will be changed(reset). * potential NPE(NullPointerException) in checkRemoveLabelsFromNode of CommonNodeLabelsManager. ** because when a Node is created, Node.labels can be null. ** In this case, nm.labels; may be null. So we need check originalLabels not null before use it(originalLabels.containsAll). * addToCluserNodeLabels should be protected by writeLock in RMNodeLabelsManager.java. because we should protect labelCollections in RMNodeLabelsManager. * use static variable (Resources.none()) for not-running Node.resource in CommonNodeLabelsManager to save memory. ** When a Node is not activated, the resource is never used, When a Node is activated, a new resource will be assigned to it in RMNodeLabelsManager#activateNode (nm.resource = resource) So it would be better to use static variable Resources.none() instead of allocating a new variable(Resource.newInstance(0, 0)) for each node deactivation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
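The two main fixes in the description can be pictured with the simplified stand-in types below; the field and class names are placeholders, not the actual CommonNodeLabelsManager code.
{code}
// Hedged sketch of the two issues: (1) don't overwrite an existing key so its
// value (e.g. Label.resource) is not reset, (2) null-check before containsAll.
import java.io.IOException;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class NodeLabelsFixSketch {
  private final Map<String, Set<String>> labelCollections =
      new HashMap<String, Set<String>>();

  void addToClusterNodeLabels(Set<String> labels) {
    for (String label : labels) {
      // Only create a new entry when the label is not present yet.
      if (!labelCollections.containsKey(label)) {
        labelCollections.put(label, new HashSet<String>());
      }
    }
  }

  void checkRemoveLabelsFromNode(Set<String> nodeLabels, Set<String> toRemove)
      throws IOException {
    // nodeLabels may be null for a freshly created Node.
    if (nodeLabels == null || !nodeLabels.containsAll(toRemove)) {
      throw new IOException("Cannot remove labels that are not on the node");
    }
  }
}
{code}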
[jira] [Updated] (YARN-2556) Tool to measure the performance of the timeline server
[ https://issues.apache.org/jira/browse/YARN-2556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chang li updated YARN-2556: --- Attachment: yarn2556_wip.patch Current work-in-progress patch; it implements measurement of the I/O rate and transaction rate. Tool to measure the performance of the timeline server -- Key: YARN-2556 URL: https://issues.apache.org/jira/browse/YARN-2556 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Jonathan Eagles Assignee: chang li Attachments: YARN-2556-WIP.patch, yarn2556_wip.patch, yarn2556_wip.patch We need to be able to understand the capacity model for the timeline server to give users the tools they need to deploy a timeline server with the correct capacity. I propose we create a mapreduce job that can measure timeline server write and read performance. Transactions per second, I/O for both read and write would be a good start. This could be done as an example or test job that could be tied into gridmix. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
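As a rough picture of what the transaction-rate measurement could look like, the sketch below writes a batch of entities through the Hadoop 2.x TimelineClient API and reports puts per second; the entity type, sample size, and class name are made up, and the WIP patch drives this from a MapReduce job instead.
{code}
// Hedged sketch: measure timeline write transactions per second.
import org.apache.hadoop.yarn.api.records.timeline.TimelineEntity;
import org.apache.hadoop.yarn.client.api.TimelineClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class TimelinePutRate {
  public static void main(String[] args) throws Exception {
    YarnConfiguration conf = new YarnConfiguration();
    TimelineClient client = TimelineClient.createTimelineClient();
    client.init(conf);
    client.start();
    try {
      int numEntities = 1000; // hypothetical sample size
      long start = System.currentTimeMillis();
      for (int i = 0; i < numEntities; i++) {
        TimelineEntity entity = new TimelineEntity();
        entity.setEntityType("PERF_TEST");
        entity.setEntityId("entity_" + i);
        entity.setStartTime(System.currentTimeMillis());
        client.putEntities(entity);
      }
      long elapsedMs = System.currentTimeMillis() - start;
      System.out.println("puts/sec = " + (numEntities * 1000.0 / elapsedMs));
    } finally {
      client.stop();
    }
  }
}
{code}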
[jira] [Commented] (YARN-2753) Fix potential issues and code clean up for *NodeLabelsManager
[ https://issues.apache.org/jira/browse/YARN-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14187575#comment-14187575 ] zhihai xu commented on YARN-2753: - Hi [~leftnoteasy], thanks for your suggestion. I also merged YARN-2754 and YARN-2756 to this Jira. zhihai Fix potential issues and code clean up for *NodeLabelsManager - Key: YARN-2753 URL: https://issues.apache.org/jira/browse/YARN-2753 Project: Hadoop YARN Issue Type: Sub-task Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-2753.000.patch, YARN-2753.001.patch, YARN-2753.002.patch, YARN-2753.003.patch, YARN-2753.004.patch Issues include: * CommonNodeLabelsManager#addToCluserNodeLabels should not change the value in labelCollections if the key already exists otherwise the Label.resource will be changed(reset). * potential NPE(NullPointerException) in checkRemoveLabelsFromNode of CommonNodeLabelsManager. ** because when a Node is created, Node.labels can be null. ** In this case, nm.labels; may be null. So we need check originalLabels not null before use it(originalLabels.containsAll). * addToCluserNodeLabels should be protected by writeLock in RMNodeLabelsManager.java. because we should protect labelCollections in RMNodeLabelsManager. * use static variable (Resources.none()) for not-running Node.resource in CommonNodeLabelsManager to save memory. ** When a Node is not activated, the resource is never used, When a Node is activated, a new resource will be assigned to it in RMNodeLabelsManager#activateNode (nm.resource = resource) So it would be better to use static variable Resources.none() instead of allocating a new variable(Resource.newInstance(0, 0)) for each node deactivation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1964) Create Docker analog of the LinuxContainerExecutor in YARN
[ https://issues.apache.org/jira/browse/YARN-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abin Shahab updated YARN-1964: -- Attachment: YARN-1964.patch Fixed rebase issue. Create Docker analog of the LinuxContainerExecutor in YARN -- Key: YARN-1964 URL: https://issues.apache.org/jira/browse/YARN-1964 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 2.2.0 Reporter: Arun C Murthy Assignee: Abin Shahab Attachments: YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, yarn-1964-branch-2.2.0-docker.patch, yarn-1964-branch-2.2.0-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch Docker (https://www.docker.io/) is, increasingly, a very popular container technology. In context of YARN, the support for Docker will provide a very elegant solution to allow applications to *package* their software into a Docker container (entire Linux file system incl. custom versions of perl, python etc.) and use it as a blueprint to launch all their YARN containers with requisite software environment. This provides both consistency (all YARN containers will have the same software environment) and isolation (no interference with whatever is installed on the physical machine). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2753) Fix potential issues and code clean up for *NodeLabelsManager
[ https://issues.apache.org/jira/browse/YARN-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14187579#comment-14187579 ] zhihai xu commented on YARN-2753: - The new patch YARN-2753.004.patch will include 4 patches:YARN-2753, YARN-2759, YARN-2754 and YARN-2756. Fix potential issues and code clean up for *NodeLabelsManager - Key: YARN-2753 URL: https://issues.apache.org/jira/browse/YARN-2753 Project: Hadoop YARN Issue Type: Sub-task Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-2753.000.patch, YARN-2753.001.patch, YARN-2753.002.patch, YARN-2753.003.patch, YARN-2753.004.patch Issues include: * CommonNodeLabelsManager#addToCluserNodeLabels should not change the value in labelCollections if the key already exists otherwise the Label.resource will be changed(reset). * potential NPE(NullPointerException) in checkRemoveLabelsFromNode of CommonNodeLabelsManager. ** because when a Node is created, Node.labels can be null. ** In this case, nm.labels; may be null. So we need check originalLabels not null before use it(originalLabels.containsAll). * addToCluserNodeLabels should be protected by writeLock in RMNodeLabelsManager.java. because we should protect labelCollections in RMNodeLabelsManager. * use static variable (Resources.none()) for not-running Node.resource in CommonNodeLabelsManager to save memory. ** When a Node is not activated, the resource is never used, When a Node is activated, a new resource will be assigned to it in RMNodeLabelsManager#activateNode (nm.resource = resource) So it would be better to use static variable Resources.none() instead of allocating a new variable(Resource.newInstance(0, 0)) for each node deactivation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2753) Fix potential issues and code clean up for *NodeLabelsManager
[ https://issues.apache.org/jira/browse/YARN-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14187584#comment-14187584 ] Wangda Tan commented on YARN-2753: -- [~zxu], I agree with other fixes but this one: bq. use static variable (Resources.none()) for not-running Node.resource in CommonNodeLabelsManager to save memory. The Resources.none() is used to do checking such as if (Resources.greaterThan(resource, Resources.none())) { .. do something }. Even if in nowadays, it will be replaced, but you cannot say in the future, some guys may write like node.resource.setMemory(...), basically I think it's a bad style. That will throw runtime exception and destroy YARN daemons, comparing to memory it can save, the risk is much more serious, do you agree? Thanks, Wangda Fix potential issues and code clean up for *NodeLabelsManager - Key: YARN-2753 URL: https://issues.apache.org/jira/browse/YARN-2753 Project: Hadoop YARN Issue Type: Sub-task Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-2753.000.patch, YARN-2753.001.patch, YARN-2753.002.patch, YARN-2753.003.patch, YARN-2753.004.patch Issues include: * CommonNodeLabelsManager#addToCluserNodeLabels should not change the value in labelCollections if the key already exists otherwise the Label.resource will be changed(reset). * potential NPE(NullPointerException) in checkRemoveLabelsFromNode of CommonNodeLabelsManager. ** because when a Node is created, Node.labels can be null. ** In this case, nm.labels; may be null. So we need check originalLabels not null before use it(originalLabels.containsAll). * addToCluserNodeLabels should be protected by writeLock in RMNodeLabelsManager.java. because we should protect labelCollections in RMNodeLabelsManager. * use static variable (Resources.none()) for not-running Node.resource in CommonNodeLabelsManager to save memory. ** When a Node is not activated, the resource is never used, When a Node is activated, a new resource will be assigned to it in RMNodeLabelsManager#activateNode (nm.resource = resource) So it would be better to use static variable Resources.none() instead of allocating a new variable(Resource.newInstance(0, 0)) for each node deactivation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1964) Create Docker analog of the LinuxContainerExecutor in YARN
[ https://issues.apache.org/jira/browse/YARN-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14187587#comment-14187587 ] Abin Shahab commented on YARN-1964: --- Fixed. I had tested the patch on my box, and that passed. Not sure how it passed when there was such an obvious error. Create Docker analog of the LinuxContainerExecutor in YARN -- Key: YARN-1964 URL: https://issues.apache.org/jira/browse/YARN-1964 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 2.2.0 Reporter: Arun C Murthy Assignee: Abin Shahab Attachments: YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, YARN-1964.patch, yarn-1964-branch-2.2.0-docker.patch, yarn-1964-branch-2.2.0-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch Docker (https://www.docker.io/) is, increasingly, a very popular container technology. In context of YARN, the support for Docker will provide a very elegant solution to allow applications to *package* their software into a Docker container (entire Linux file system incl. custom versions of perl, python etc.) and use it as a blueprint to launch all their YARN containers with requisite software environment. This provides both consistency (all YARN containers will have the same software environment) and isolation (no interference with whatever is installed on the physical machine). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (YARN-2756) use static variable (Resources.none()) for not-running Node.resource in CommonNodeLabelsManager to save memory.
[ https://issues.apache.org/jira/browse/YARN-2756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu reopened YARN-2756: - use static variable (Resources.none()) for not-running Node.resource in CommonNodeLabelsManager to save memory. --- Key: YARN-2756 URL: https://issues.apache.org/jira/browse/YARN-2756 Project: Hadoop YARN Issue Type: Improvement Reporter: zhihai xu Assignee: zhihai xu Priority: Minor Attachments: YARN-2756.000.patch use static variable (Resources.none()) for not-running Node.resource in CommonNodeLabelsManager to save memory. When a Node is not activated, the resource is never used, When a Node is activated, a new resource will be assigned to it in RMNodeLabelsManager#activateNode (nm.resource = resource) So it would be better to use static variable Resources.none() instead of allocating a new variable(Resource.newInstance(0, 0)) for each node deactivation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2756) use static variable (Resources.none()) for not-running Node.resource in CommonNodeLabelsManager to save memory.
[ https://issues.apache.org/jira/browse/YARN-2756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14187595#comment-14187595 ] zhihai xu commented on YARN-2756: - Separate this JIRA from YARN-2753 for discussion. use static variable (Resources.none()) for not-running Node.resource in CommonNodeLabelsManager to save memory. --- Key: YARN-2756 URL: https://issues.apache.org/jira/browse/YARN-2756 Project: Hadoop YARN Issue Type: Improvement Reporter: zhihai xu Assignee: zhihai xu Priority: Minor Attachments: YARN-2756.000.patch use static variable (Resources.none()) for not-running Node.resource in CommonNodeLabelsManager to save memory. When a Node is not activated, the resource is never used, When a Node is activated, a new resource will be assigned to it in RMNodeLabelsManager#activateNode (nm.resource = resource) So it would be better to use static variable Resources.none() instead of allocating a new variable(Resource.newInstance(0, 0)) for each node deactivation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2753) Fix potential issues and code clean up for *NodeLabelsManager
[ https://issues.apache.org/jira/browse/YARN-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-2753: Description: Issues include: * CommonNodeLabelsManager#addToCluserNodeLabels should not change the value in labelCollections if the key already exists otherwise the Label.resource will be changed(reset). * potential NPE(NullPointerException) in checkRemoveLabelsFromNode of CommonNodeLabelsManager. ** because when a Node is created, Node.labels can be null. ** In this case, nm.labels; may be null. So we need check originalLabels not null before use it(originalLabels.containsAll). * addToCluserNodeLabels should be protected by writeLock in RMNodeLabelsManager.java. because we should protect labelCollections in RMNodeLabelsManager. was: Issues include: * CommonNodeLabelsManager#addToCluserNodeLabels should not change the value in labelCollections if the key already exists otherwise the Label.resource will be changed(reset). * potential NPE(NullPointerException) in checkRemoveLabelsFromNode of CommonNodeLabelsManager. ** because when a Node is created, Node.labels can be null. ** In this case, nm.labels; may be null. So we need check originalLabels not null before use it(originalLabels.containsAll). * addToCluserNodeLabels should be protected by writeLock in RMNodeLabelsManager.java. because we should protect labelCollections in RMNodeLabelsManager. * use static variable (Resources.none()) for not-running Node.resource in CommonNodeLabelsManager to save memory. ** When a Node is not activated, the resource is never used, When a Node is activated, a new resource will be assigned to it in RMNodeLabelsManager#activateNode (nm.resource = resource) So it would be better to use static variable Resources.none() instead of allocating a new variable(Resource.newInstance(0, 0)) for each node deactivation. Fix potential issues and code clean up for *NodeLabelsManager - Key: YARN-2753 URL: https://issues.apache.org/jira/browse/YARN-2753 Project: Hadoop YARN Issue Type: Sub-task Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-2753.000.patch, YARN-2753.001.patch, YARN-2753.002.patch, YARN-2753.003.patch, YARN-2753.004.patch Issues include: * CommonNodeLabelsManager#addToCluserNodeLabels should not change the value in labelCollections if the key already exists otherwise the Label.resource will be changed(reset). * potential NPE(NullPointerException) in checkRemoveLabelsFromNode of CommonNodeLabelsManager. ** because when a Node is created, Node.labels can be null. ** In this case, nm.labels; may be null. So we need check originalLabels not null before use it(originalLabels.containsAll). * addToCluserNodeLabels should be protected by writeLock in RMNodeLabelsManager.java. because we should protect labelCollections in RMNodeLabelsManager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2753) Fix potential issues and code clean up for *NodeLabelsManager
[ https://issues.apache.org/jira/browse/YARN-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-2753: Attachment: YARN-2753.005.patch Fix potential issues and code clean up for *NodeLabelsManager - Key: YARN-2753 URL: https://issues.apache.org/jira/browse/YARN-2753 Project: Hadoop YARN Issue Type: Sub-task Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-2753.000.patch, YARN-2753.001.patch, YARN-2753.002.patch, YARN-2753.003.patch, YARN-2753.004.patch, YARN-2753.005.patch Issues include: * CommonNodeLabelsManager#addToCluserNodeLabels should not change the value in labelCollections if the key already exists otherwise the Label.resource will be changed(reset). * potential NPE(NullPointerException) in checkRemoveLabelsFromNode of CommonNodeLabelsManager. ** because when a Node is created, Node.labels can be null. ** In this case, nm.labels; may be null. So we need check originalLabels not null before use it(originalLabels.containsAll). * addToCluserNodeLabels should be protected by writeLock in RMNodeLabelsManager.java. because we should protect labelCollections in RMNodeLabelsManager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2753) Fix potential issues and code clean up for *NodeLabelsManager
[ https://issues.apache.org/jira/browse/YARN-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14187619#comment-14187619 ] zhihai xu commented on YARN-2753: - I remove YARN-2756 from this JIRA YARN-2753, so we can discuss issue YARN-2756 separately. The new patch YARN-2753.005.patch will include 3 patches:YARN-2753, YARN-2759 and YARN-2754. Hi [~leftnoteasy] thanks to review the patch. I will move the discussion to YARN-2756. zhihai Fix potential issues and code clean up for *NodeLabelsManager - Key: YARN-2753 URL: https://issues.apache.org/jira/browse/YARN-2753 Project: Hadoop YARN Issue Type: Sub-task Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-2753.000.patch, YARN-2753.001.patch, YARN-2753.002.patch, YARN-2753.003.patch, YARN-2753.004.patch, YARN-2753.005.patch Issues include: * CommonNodeLabelsManager#addToCluserNodeLabels should not change the value in labelCollections if the key already exists otherwise the Label.resource will be changed(reset). * potential NPE(NullPointerException) in checkRemoveLabelsFromNode of CommonNodeLabelsManager. ** because when a Node is created, Node.labels can be null. ** In this case, nm.labels; may be null. So we need check originalLabels not null before use it(originalLabels.containsAll). * addToCluserNodeLabels should be protected by writeLock in RMNodeLabelsManager.java. because we should protect labelCollections in RMNodeLabelsManager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2503) Changes in RM Web UI to better show labels to end users
[ https://issues.apache.org/jira/browse/YARN-2503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2503: - Attachment: YARN-2503-20141028-1.patch Thanks [~jianhe] for the comments; uploaded a new patch addressing them. Changes in RM Web UI to better show labels to end users --- Key: YARN-2503 URL: https://issues.apache.org/jira/browse/YARN-2503 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2503-20141022-1.patch, YARN-2503-20141028-1.patch, YARN-2503.patch Includes but is not limited to: - Show labels of nodes in RM/nodes page - Show labels of queue in RM/scheduler page -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2647) Add yarn queue CLI to get queue infos
[ https://issues.apache.org/jira/browse/YARN-2647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14187648#comment-14187648 ] Craig Welch commented on YARN-2647: --- [~sunilg] in the yarn file you can drop the (s) from information(s) (information is both singular and plural :-)). In QueueCLI.listQueues I think it is safe not to check for a null queue list (not entirely sure), but in printQueueInfo I do think you need to check that the nodeLabels list is not null. Otherwise, +1 from me. Add yarn queue CLI to get queue infos - Key: YARN-2647 URL: https://issues.apache.org/jira/browse/YARN-2647 Project: Hadoop YARN Issue Type: Sub-task Components: client Reporter: Wangda Tan Assignee: Sunil G Attachments: 0001-YARN-2647.patch, 0002-YARN-2647.patch, 0003-YARN-2647.patch, 0004-YARN-2647.patch, 0005-YARN-2647.patch, 0006-YARN-2647.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
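The null guard suggested for printQueueInfo could look roughly like the sketch below; the helper class, method, and parameter names are hypothetical, not the actual QueueCLI code.
{code}
// Hedged sketch: print a queue's node labels even when the set is null.
import java.io.PrintWriter;
import java.util.Set;

public class QueueInfoPrinter {
  static void printNodeLabels(PrintWriter out, Set<String> nodeLabels) {
    StringBuilder sb = new StringBuilder();
    // A queue with no accessible labels may report null rather than an empty set.
    if (nodeLabels != null) {
      for (String label : nodeLabels) {
        if (sb.length() > 0) {
          sb.append(",");
        }
        sb.append(label);
      }
    }
    out.println("Accessible Node Labels : " + sb);
  }
}
{code}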
[jira] [Commented] (YARN-2756) use static variable (Resources.none()) for not-running Node.resource in CommonNodeLabelsManager to save memory.
[ https://issues.apache.org/jira/browse/YARN-2756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14187660#comment-14187660 ] zhihai xu commented on YARN-2756: - Hi [~leftnoteasy] bq. but you cannot say in the future, some guys may write like node.resource.setMemory(...), basically I think it's a bad style. That will throw runtime exception and destroy YARN daemons, comparing to memory it can save, the risk is much more serious, do you agree? IMO we should not permit people to call node.resource.setMemory(...) to change the node memory when the node is not running. Currently the only way to change the node memory from the scheduler is through activateNode/deactivateNode. The patch will enforce this constraint: when the node is not running, the resource in the node can't be changed. We can only change the resource in the node when the node is running. In the future, if we really want to change the rule/constraint, we can change the implementation/architecture. But I don't see that we need to change the rule/constraint now or in the near future. Saving memory is the second benefit of this patch. Thanks, zhihai use static variable (Resources.none()) for not-running Node.resource in CommonNodeLabelsManager to save memory. --- Key: YARN-2756 URL: https://issues.apache.org/jira/browse/YARN-2756 Project: Hadoop YARN Issue Type: Improvement Reporter: zhihai xu Assignee: zhihai xu Priority: Minor Attachments: YARN-2756.000.patch use static variable (Resources.none()) for not-running Node.resource in CommonNodeLabelsManager to save memory. When a Node is not activated, the resource is never used, When a Node is activated, a new resource will be assigned to it in RMNodeLabelsManager#activateNode (nm.resource = resource) So it would be better to use static variable Resources.none() instead of allocating a new variable(Resource.newInstance(0, 0)) for each node deactivation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
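The proposal amounts to the pattern sketched below, shown with a simplified Node stand-in rather than the real *NodeLabelsManager classes.
{code}
// Hedged sketch: share the immutable Resources.none() instance for nodes that
// are not running, and only assign a real Resource when the node is activated.
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

public class NodeResourceSketch {
  static class Node {
    Resource resource = Resources.none();
    boolean running = false;
  }

  static void deactivateNode(Node nm) {
    // Instead of nm.resource = Resource.newInstance(0, 0), reuse the shared
    // zero resource; the value is never read while the node is inactive.
    nm.resource = Resources.none();
    nm.running = false;
  }

  static void activateNode(Node nm, Resource resource) {
    nm.resource = resource;
    nm.running = true;
  }
}
{code}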
[jira] [Commented] (YARN-2757) potential NPE in checkNodeLabelExpression of SchedulerUtils for nodeLabels.
[ https://issues.apache.org/jira/browse/YARN-2757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14187672#comment-14187672 ] zhihai xu commented on YARN-2757: - Hi [~leftnoteasy], thanks for reviewing the patch. I agree to change the priority to minor. I just want to make sure the code is consistent: either both places check for a null pointer or neither does. zhihai potential NPE in checkNodeLabelExpression of SchedulerUtils for nodeLabels. --- Key: YARN-2757 URL: https://issues.apache.org/jira/browse/YARN-2757 Project: Hadoop YARN Issue Type: Bug Reporter: zhihai xu Assignee: zhihai xu Priority: Minor Attachments: YARN-2757.000.patch potential NPE in checkNodeLabelExpression of SchedulerUtils for nodeLabels. Since we check nodeLabels for null at {code} if (!str.trim().isEmpty() && (nodeLabels == null || !nodeLabels.contains(str.trim()))) { return false; } {code} we should also check nodeLabels for null at {code} if (!nodeLabels.isEmpty()) { return false; } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
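The suggested guard, mirrored as a small self-contained method (not the actual SchedulerUtils code):
{code}
// Hedged sketch: null-safe version of "if (!nodeLabels.isEmpty()) return false;"
import java.util.Set;

public class NodeLabelExpressionCheck {
  // An empty label expression should only match nodes that carry no labels.
  static boolean matchesEmptyExpression(Set<String> nodeLabels) {
    if (nodeLabels != null && !nodeLabels.isEmpty()) {
      return false;
    }
    return true;
  }
}
{code}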
[jira] [Updated] (YARN-2556) Tool to measure the performance of the timeline server
[ https://issues.apache.org/jira/browse/YARN-2556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chang li updated YARN-2556: --- Attachment: (was: yarn2556_wip.patch) Tool to measure the performance of the timeline server -- Key: YARN-2556 URL: https://issues.apache.org/jira/browse/YARN-2556 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Jonathan Eagles Assignee: chang li Attachments: YARN-2556-WIP.patch, YARN-2556-WIP.patch, yarn2556_wip.patch We need to be able to understand the capacity model for the timeline server to give users the tools they need to deploy a timeline server with the correct capacity. I propose we create a mapreduce job that can measure timeline server write and read performance. Transactions per second, I/O for both read and write would be a good start. This could be done as an example or test job that could be tied into gridmix. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2556) Tool to measure the performance of the timeline server
[ https://issues.apache.org/jira/browse/YARN-2556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chang li updated YARN-2556: --- Attachment: YARN-2556-WIP.patch Tool to measure the performance of the timeline server -- Key: YARN-2556 URL: https://issues.apache.org/jira/browse/YARN-2556 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Jonathan Eagles Assignee: chang li Attachments: YARN-2556-WIP.patch, YARN-2556-WIP.patch, yarn2556_wip.patch We need to be able to understand the capacity model for the timeline server to give users the tools they need to deploy a timeline server with the correct capacity. I propose we create a mapreduce job that can measure timeline server write and read performance. Transactions per second, I/O for both read and write would be a good start. This could be done as an example or test job that could be tied into gridmix. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2757) potential NPE in checkNodeLabelExpression of SchedulerUtils for nodeLabels.
[ https://issues.apache.org/jira/browse/YARN-2757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-2757: Issue Type: Sub-task (was: Bug) Parent: YARN-2492 potential NPE in checkNodeLabelExpression of SchedulerUtils for nodeLabels. --- Key: YARN-2757 URL: https://issues.apache.org/jira/browse/YARN-2757 Project: Hadoop YARN Issue Type: Sub-task Reporter: zhihai xu Assignee: zhihai xu Priority: Minor Attachments: YARN-2757.000.patch potential NPE in checkNodeLabelExpression of SchedulerUtils for nodeLabels. Since we check nodeLabels for null at {code} if (!str.trim().isEmpty() && (nodeLabels == null || !nodeLabels.contains(str.trim()))) { return false; } {code} we should also check nodeLabels for null at {code} if (!nodeLabels.isEmpty()) { return false; } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2756) use static variable (Resources.none()) for not-running Node.resource in CommonNodeLabelsManager to save memory.
[ https://issues.apache.org/jira/browse/YARN-2756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-2756: Issue Type: Sub-task (was: Improvement) Parent: YARN-2492 use static variable (Resources.none()) for not-running Node.resource in CommonNodeLabelsManager to save memory. --- Key: YARN-2756 URL: https://issues.apache.org/jira/browse/YARN-2756 Project: Hadoop YARN Issue Type: Sub-task Reporter: zhihai xu Assignee: zhihai xu Priority: Minor Attachments: YARN-2756.000.patch Use the static variable (Resources.none()) for not-running Node.resource in CommonNodeLabelsManager to save memory. When a Node is not activated, its resource is never used. When a Node is activated, a new resource is assigned to it in RMNodeLabelsManager#activateNode (nm.resource = resource). So it would be better to use the static variable Resources.none() instead of allocating a new object (Resource.newInstance(0, 0)) for each node deactivation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
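A minimal sketch of the suggested change, assuming a simplified stand-in for the CommonNodeLabelsManager internals; only Resources.none() and Resource.newInstance(0, 0) come from the description above.
{code}
// Minimal sketch of the suggested change; the Node class below is a stand-in
// for the CommonNodeLabelsManager internals, not the actual patch.
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

class Node {
  Resource resource;

  void deactivate() {
    // Before: every deactivation allocates a fresh zero-sized Resource.
    // resource = Resource.newInstance(0, 0);

    // After: all deactivated nodes share the same singleton, so nothing new
    // is allocated per deactivation.
    resource = Resources.none();
  }

  void activate(Resource nodeResource) {
    // On activation a real Resource is assigned, as RMNodeLabelsManager#
    // activateNode does with nm.resource = resource.
    resource = nodeResource;
  }
}
{code}
This stays safe only because, as the description notes, the resource of a deactivated node is never used before activation assigns a real value.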
[jira] [Assigned] (YARN-2698) Move getClusterNodeLabels and getNodeToLabels to YARN CLI instead of RMAdminCLI
[ https://issues.apache.org/jira/browse/YARN-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan reassigned YARN-2698: Assignee: Wangda Tan (was: Mayank Bansal) Move getClusterNodeLabels and getNodeToLabels to YARN CLI instead of RMAdminCLI --- Key: YARN-2698 URL: https://issues.apache.org/jira/browse/YARN-2698 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Priority: Critical YARN RMAdminCLI and AdminService should expose write APIs only; the read APIs should be located in the YARN CLI and RMClientService. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
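To illustrate the intended read/write split, a hypothetical client-side query is sketched below; getClusterNodeLabels on YarnClient is the API this issue proposes to add, so its exact name and return type are assumptions rather than an existing signature.
{code}
// Hypothetical read-side usage once getClusterNodeLabels lives on the client
// API rather than the admin API; the method name and return type are
// assumptions based on this issue's proposal, not an existing signature.
import java.util.Set;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ListClusterNodeLabels {
  public static void main(String[] args) throws Exception {
    YarnClient client = YarnClient.createYarnClient();
    client.init(new YarnConfiguration());
    client.start();
    try {
      // Read-only query answered by the RM's client-facing service instead of
      // the admin service, which keeps AdminService write-only.
      Set<String> labels = client.getClusterNodeLabels();
      System.out.println("Cluster node labels: " + labels);
    } finally {
      client.stop();
    }
  }
}
{code}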
[jira] [Commented] (YARN-2698) Move getClusterNodeLabels and getNodeToLabels to YARN CLI instead of RMAdminCLI
[ https://issues.apache.org/jira/browse/YARN-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14187691#comment-14187691 ] Wangda Tan commented on YARN-2698: -- [~mayank_bansal], taking this over; we need to get this done today and will upload a patch soon. Thanks, Move getClusterNodeLabels and getNodeToLabels to YARN CLI instead of RMAdminCLI --- Key: YARN-2698 URL: https://issues.apache.org/jira/browse/YARN-2698 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Priority: Critical YARN RMAdminCLI and AdminService should expose write APIs only; the read APIs should be located in the YARN CLI and RMClientService. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2753) Fix potential issues and code clean up for *NodeLabelsManager
[ https://issues.apache.org/jira/browse/YARN-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14187703#comment-14187703 ] Hadoop QA commented on YARN-2753: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12677715/YARN-2753.004.patch against trunk revision 8984e9b. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5612//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5612//console This message is automatically generated. Fix potential issues and code clean up for *NodeLabelsManager - Key: YARN-2753 URL: https://issues.apache.org/jira/browse/YARN-2753 Project: Hadoop YARN Issue Type: Sub-task Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-2753.000.patch, YARN-2753.001.patch, YARN-2753.002.patch, YARN-2753.003.patch, YARN-2753.004.patch, YARN-2753.005.patch Issues include: * CommonNodeLabelsManager#addToCluserNodeLabels should not change the value in labelCollections if the key already exists, otherwise the Label.resource will be changed (reset). * potential NPE (NullPointerException) in checkRemoveLabelsFromNode of CommonNodeLabelsManager. ** This is because when a Node is created, Node.labels can be null. ** In this case, nm.labels may be null, so we need to check that originalLabels is not null before using it (originalLabels.containsAll). * addToCluserNodeLabels should be protected by writeLock in RMNodeLabelsManager.java, because we should protect labelCollections in RMNodeLabelsManager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
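For reference, a simplified sketch of the null guard described in the second bullet; the class and field names below are stand-ins for the CommonNodeLabelsManager internals and do not reproduce the actual patch.
{code}
// Simplified sketch of the null guard described in the second bullet; the
// class and field names are stand-ins for CommonNodeLabelsManager internals.
import java.io.IOException;
import java.util.Set;

class NodeLabelStore {
  Set<String> labels; // may still be null right after the Node is created

  void checkRemoveLabelsFromNode(Set<String> labelsToRemove)
      throws IOException {
    Set<String> originalLabels = labels;
    // Check for null before calling containsAll, which is the NPE described
    // above; a node with no labels cannot have labels removed from it.
    if (originalLabels == null || !originalLabels.containsAll(labelsToRemove)) {
      throw new IOException("Node doesn't contain the labels to remove: "
          + labelsToRemove);
    }
  }
}
{code}
The third bullet (taking the writeLock around addToCluserNodeLabels in RMNodeLabelsManager) is a separate locking change and is not shown in this sketch.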