[jira] [Commented] (YARN-2703) Add logUploadedTime into LogValue for better display
[ https://issues.apache.org/jira/browse/YARN-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14182495#comment-14182495 ]

Hadoop QA commented on YARN-2703:
---------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12676733/YARN-2703.3.patch
against trunk revision 071c925.

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 javadoc{color}. There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:

  org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA

{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5537//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5537//console

This message is automatically generated.
Add logUploadedTime into LogValue for better display
----------------------------------------------------

Key: YARN-2703
URL: https://issues.apache.org/jira/browse/YARN-2703
Project: Hadoop YARN
Issue Type: Sub-task
Components: nodemanager, resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
Attachments: YARN-2703.1.patch, YARN-2703.2.patch, YARN-2703.3.patch

Right now, a container can upload its logs multiple times, and containers sometimes write different logs into the same log file. After log aggregation, querying those logs shows:

LogType: stderr
LogContext:
LogType: stdout
LogContext:
LogType: stderr
LogContext:
LogType: stdout
LogContext:

The same files can be displayed multiple times, but we cannot tell which logs came first. Adding an extra logUploadedTime would give users a better understanding of the logs.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
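The interleaving problem above can be illustrated with a small sketch. This is a hedged mock-up, not the actual LogValue API: `format_log_header` and the `LogUploadedTime` field name are illustrative assumptions about what a timestamped per-file header could look like.

```python
# Hedged sketch: the same LogType can appear several times in one
# aggregated file, and without a timestamp the reader cannot order them.
# format_log_header and the "LogUploadedTime" field are illustrative,
# not the actual LogValue implementation.
from datetime import datetime, timezone

def format_log_header(log_type, uploaded_time):
    """Render a per-file header that carries an upload timestamp."""
    return (f"LogType: {log_type}\n"
            f"LogUploadedTime: {uploaded_time.isoformat()}\n"
            f"LogContents:")

ts = datetime(2014, 10, 24, 12, 0, tzinfo=timezone.utc)
print(format_log_header("stderr", ts))
```

With such a header, two `stderr` sections in the same aggregated file become distinguishable by their upload times.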
[jira] [Commented] (YARN-2293) Scoring for NMs to identify a better candidate to launch AMs
[ https://issues.apache.org/jira/browse/YARN-2293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14182514#comment-14182514 ]

sna commented on YARN-2293:
---------------------------

Have you realized the target?

Scoring for NMs to identify a better candidate to launch AMs
------------------------------------------------------------

Key: YARN-2293
URL: https://issues.apache.org/jira/browse/YARN-2293
Project: Hadoop YARN
Issue Type: Improvement
Components: nodemanager, resourcemanager
Reporter: Sunil G
Assignee: Sunil G

The container exit status from the NM gives indications of the reasons for its failure. Sometimes this is due to container launch problems in the NM. In a heterogeneous cluster, machines with weak hardware may cause more failures, and it would be better not to launch AMs there as often. To be clear, container failures caused by a buggy job should not decrease the score. As mentioned earlier, if a scoring mechanism based on exit status is added for NMs in the RM, then NMs with better scores can be preferred for launching AMs. Thoughts?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (YARN-2664) Improve RM webapp to expose info about reservations.
[ https://issues.apache.org/jira/browse/YARN-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14182526#comment-14182526 ]

Matteo Mazzucchelli commented on YARN-2664:
-------------------------------------------

Thanks Carlo for the hints. I'm currently coding the solution for the data variation; I should be able to show you some code in a couple of days. I propose [nvd3|http://nvd3.org/] as the js library.

Improve RM webapp to expose info about reservations.
----------------------------------------------------

Key: YARN-2664
URL: https://issues.apache.org/jira/browse/YARN-2664
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Reporter: Carlo Curino
Attachments: PlannerPage_screenshot.pdf, YARN-2664.patch

YARN-1051 adds new functionality to the RM for requesting reservations on resources. Exposing this through the webapp GUI is important.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (YARN-2678) Recommended improvements to Yarn Registry
[ https://issues.apache.org/jira/browse/YARN-2678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran updated YARN-2678:
---------------------------------

Attachment: YARN-2678-003.patch

Patch against trunk; no obvious differences, so the reason for the previous failure is unknown (unless it tried to apply the .pdf).

Recommended improvements to Yarn Registry
-----------------------------------------

Key: YARN-2678
URL: https://issues.apache.org/jira/browse/YARN-2678
Project: Hadoop YARN
Issue Type: Sub-task
Components: api, resourcemanager
Affects Versions: 2.6.0
Reporter: Gour Saha
Assignee: Steve Loughran
Attachments: HADOOP-2678-002.patch, YARN-2678-001.patch, YARN-2678-003.patch, yarnregistry.pdf

In the process of binding to the Slider AM from the Slider agent Python code, here are some of the items I stumbled upon and would recommend as improvements. This is how Slider's registry looks today:

{noformat}
jsonservicerec{
  "description" : "Slider Application Master",
  "external" : [ {
    "api" : "org.apache.slider.appmaster",
    "addressType" : "host/port",
    "protocolType" : "hadoop/protobuf",
    "addresses" : [ [ "c6408.ambari.apache.org", "34837" ] ]
  }, {
    "api" : "org.apache.http.UI",
    "addressType" : "uri",
    "protocolType" : "webui",
    "addresses" : [ [ "http://c6408.ambari.apache.org:43314" ] ]
  }, {
    "api" : "org.apache.slider.management",
    "addressType" : "uri",
    "protocolType" : "REST",
    "addresses" : [ [ "http://c6408.ambari.apache.org:43314/ws/v1/slider/mgmt" ] ]
  }, {
    "api" : "org.apache.slider.publisher",
    "addressType" : "uri",
    "protocolType" : "REST",
    "addresses" : [ [ "http://c6408.ambari.apache.org:43314/ws/v1/slider/publisher" ] ]
  }, {
    "api" : "org.apache.slider.registry",
    "addressType" : "uri",
    "protocolType" : "REST",
    "addresses" : [ [ "http://c6408.ambari.apache.org:43314/ws/v1/slider/registry" ] ]
  }, {
    "api" : "org.apache.slider.publisher.configurations",
    "addressType" : "uri",
    "protocolType" : "REST",
    "addresses" : [ [ "http://c6408.ambari.apache.org:43314/ws/v1/slider/publisher/slider" ] ]
  } ],
  "internal" : [ {
    "api" : "org.apache.slider.agents.secure",
    "addressType" : "uri",
    "protocolType" : "REST",
    "addresses" : [ [ "https://c6408.ambari.apache.org:46958/ws/v1/slider/agents" ] ]
  }, {
    "api" : "org.apache.slider.agents.oneway",
    "addressType" : "uri",
    "protocolType" : "REST",
    "addresses" : [ [ "https://c6408.ambari.apache.org:57513/ws/v1/slider/agents" ] ]
  } ],
  "yarn:persistence" : "application",
  "yarn:id" : "application_1412974695267_0015"
}
{noformat}

Recommendations:

1. I would suggest either removing the string {color:red}jsonservicerec{color}, or, if non-null data is desirable at all times, folding the string into the JSON structure as a top-level attribute so that the registry data is always a valid JSON document.

2. The {color:red}addresses{color} attribute is currently a list of lists. I would recommend converting it to a list of dictionary objects. In the dictionary object it would be nice to have the host and port portions of addressType uri objects as separate key-value pairs, to avoid parsing on the client side. The URI should also be retained under a key, say uri, so that clients do not have to regenerate it by concatenating host, port, resource-path, etc. Here is a proposed structure:

{noformat}
{
  ...
  "internal" : [ {
    "api" : "org.apache.slider.agents.secure",
    "addressType" : "uri",
    "protocolType" : "REST",
    "addresses" : [ {
      "uri" : "https://c6408.ambari.apache.org:46958/ws/v1/slider/agents",
      "host" : "c6408.ambari.apache.org",
      "port" : 46958
    } ]
  } ],
  ...
}
{noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
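The list-of-dictionary shape proposed in recommendation 2 can be sketched in a few lines. This is a minimal illustration, not part of any patch: `restructure_addresses` and the sample endpoint record are assumptions for demonstration.

```python
import json
from urllib.parse import urlparse

# Hedged sketch: rewrite a registry endpoint's "addresses" from a
# list-of-lists ([[uri]]) into the proposed list-of-dicts with
# "uri", "host", and "port" keys. The function name and sample
# record are illustrative, not from the actual registry code.
def restructure_addresses(endpoint):
    """Return a copy of the endpoint with addresses as dicts."""
    if endpoint.get("addressType") != "uri":
        return endpoint
    new_addresses = []
    for addr in endpoint["addresses"]:
        uri = addr[0]
        parsed = urlparse(uri)
        new_addresses.append({"uri": uri,
                              "host": parsed.hostname,
                              "port": parsed.port})
    return {**endpoint, "addresses": new_addresses}

endpoint = {
    "api": "org.apache.slider.agents.secure",
    "addressType": "uri",
    "protocolType": "REST",
    "addresses": [["https://c6408.ambari.apache.org:46958/ws/v1/slider/agents"]],
}
print(json.dumps(restructure_addresses(endpoint), indent=2))
```

A client reading the restructured record can use `host` and `port` directly instead of parsing the URI itself, which is exactly the point of the recommendation.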
[jira] [Assigned] (YARN-2712) Adding tests about FSQueue and headroom of FairScheduler to TestWorkPreservingRMRestart
[ https://issues.apache.org/jira/browse/YARN-2712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsuyoshi OZAWA reassigned YARN-2712:
------------------------------------

Assignee: Tsuyoshi OZAWA

Adding tests about FSQueue and headroom of FairScheduler to TestWorkPreservingRMRestart
---------------------------------------------------------------------------------------

Key: YARN-2712
URL: https://issues.apache.org/jira/browse/YARN-2712
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA

TestWorkPreservingRMRestart#testSchedulerRecovery partially lacks test cases for FairScheduler. We should add them.

{code}
// Until YARN-1959 is resolved
if (scheduler.getClass() != FairScheduler.class) {
  assertEquals(availableResources, schedulerAttempt.getHeadroom());
}
{code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Created] (YARN-2738) Add FairReservationSystem for FairScheduler
Anubhav Dhoot created YARN-2738:
--------------------------------

Summary: Add FairReservationSystem for FairScheduler
Key: YARN-2738
URL: https://issues.apache.org/jira/browse/YARN-2738
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Anubhav Dhoot

We need to create a FairReservationSystem that implements ReservationSystem for the FairScheduler.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Assigned] (YARN-2738) Add FairReservationSystem for FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anubhav Dhoot reassigned YARN-2738:
-----------------------------------

Assignee: Anubhav Dhoot

Add FairReservationSystem for FairScheduler
-------------------------------------------

Key: YARN-2738
URL: https://issues.apache.org/jira/browse/YARN-2738
Project: Hadoop YARN
Issue Type: Sub-task
Components: fairscheduler
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot

We need to create a FairReservationSystem that implements ReservationSystem for the FairScheduler.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (YARN-2690) Make ReservationSystem and its dependent classes independent of Scheduler type
[ https://issues.apache.org/jira/browse/YARN-2690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anubhav Dhoot updated YARN-2690:
--------------------------------

Attachment: YARN-2690.003.patch

Added comments to ReservationSchedulerConfiguration. I prefer a getter method to a protected variable, to make it easy to decouple the two classes; let me know if you feel strongly about it.

Make ReservationSystem and its dependent classes independent of Scheduler type
------------------------------------------------------------------------------

Key: YARN-2690
URL: https://issues.apache.org/jira/browse/YARN-2690
Project: Hadoop YARN
Issue Type: Sub-task
Components: fairscheduler
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
Attachments: YARN-2690.001.patch, YARN-2690.002.patch, YARN-2690.002.patch, YARN-2690.003.patch

A lot of common reservation classes depend on CapacityScheduler and specifically its configuration. This JIRA is to make them ready for other schedulers by abstracting out the configuration.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (YARN-2571) RM to support YARN registry
[ https://issues.apache.org/jira/browse/YARN-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran updated YARN-2571:
---------------------------------

Attachment: YARN-2571-005.patch

Patch 005.
# in sync with trunk rev 0942c9
# moved the path creation logic in the {{RegistryAdminService.start()}} operation to being async
# tests enhanced to await completion of async setup

The reason for moving to async is to eliminate the impact of ZK connection problems on RM startup. With this patch, all RM-registry operations are executed in a single executor thread, which implicitly queues the requests in an ordered sequence. There is one side effect: if registry startup fails due to security problems, this is not propagated to the RM, i.e. it does not cause RM startup to fail. It will, however, be visible to users of the registry, who themselves are likely to have auth problems.

RM to support YARN registry
---------------------------

Key: YARN-2571
URL: https://issues.apache.org/jira/browse/YARN-2571
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Reporter: Steve Loughran
Assignee: Steve Loughran
Attachments: YARN-2571-001.patch, YARN-2571-002.patch, YARN-2571-003.patch, YARN-2571-005.patch

The RM needs to (optionally) integrate with the YARN registry:
# startup: create the /services and /users paths with system ACLs (yarn, hdfs principals)
# app-launch: create the user directory /users/$username with the relevant permissions (CRD) for them to create subnodes
# attempt, container, app completion: remove service records with the matching persistence and ID

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
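The single-executor ordering described in that patch comment can be illustrated generically. This is a Python sketch of the pattern only, not the actual RegistryAdminService code; the task names are made up for the example.

```python
from concurrent.futures import ThreadPoolExecutor

# Hedged sketch of the design choice: a single-worker executor runs
# submitted tasks strictly in submission order, so callers get an
# implicitly ordered queue of async operations without blocking the
# caller's own startup on any one task (e.g. a slow ZK connection).
registry_ops = ThreadPoolExecutor(max_workers=1)

results = []
for name in ("create /services", "create /users", "set ACLs"):
    # Each submit returns immediately; the tasks run one at a time.
    registry_ops.submit(results.append, name)

registry_ops.shutdown(wait=True)
print(results)  # → ['create /services', 'create /users', 'set ACLs']
```

Because there is only one worker thread, no explicit locking or queueing logic is needed to keep the registry operations ordered, which matches the "implicitly queues the requests" observation above.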
[jira] [Commented] (YARN-2738) Add FairReservationSystem for FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14182685#comment-14182685 ]

Anubhav Dhoot commented on YARN-2738:
-------------------------------------

Depends on the refactoring done in YARN-2690.

Add FairReservationSystem for FairScheduler
-------------------------------------------

Key: YARN-2738
URL: https://issues.apache.org/jira/browse/YARN-2738
Project: Hadoop YARN
Issue Type: Sub-task
Components: fairscheduler
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot

We need to create a FairReservationSystem that implements ReservationSystem for the FairScheduler.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (YARN-2738) Add FairReservationSystem for FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anubhav Dhoot updated YARN-2738:
--------------------------------

Attachment: YARN-2738.001.patch

* Adds support in the FairScheduler XML for reading queue reservation configuration
* Updates AllocationFile changes and unit tests
* Adds FairReservationSystem and its unit tests

Add FairReservationSystem for FairScheduler
-------------------------------------------

Key: YARN-2738
URL: https://issues.apache.org/jira/browse/YARN-2738
Project: Hadoop YARN
Issue Type: Sub-task
Components: fairscheduler
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
Attachments: YARN-2738.001.patch

We need to create a FairReservationSystem that implements ReservationSystem for the FairScheduler.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (YARN-2712) Adding tests about FSQueue and headroom of FairScheduler to TestWorkPreservingRMRestart
[ https://issues.apache.org/jira/browse/YARN-2712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsuyoshi OZAWA updated YARN-2712:
---------------------------------

Attachment: YARN-2712.1.patch

Attaching a first patch with the following changes:
1. Adding tests about FSQueue ({{checkFSQueue}}).
2. Moving headroom tests into check*Queue.
3. Renaming asserteMetrics to assertMetrics.
4. Calling {{updateRootQueueMetrics}} explicitly in {{FairScheduler#update}}, because I found an unexpected behavior of rootQueue while writing the code: {{updateRootQueueMetrics}} isn't called until an RMNode is registered, updated, removed, or added.

Adding tests about FSQueue and headroom of FairScheduler to TestWorkPreservingRMRestart
---------------------------------------------------------------------------------------

Key: YARN-2712
URL: https://issues.apache.org/jira/browse/YARN-2712
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
Attachments: YARN-2712.1.patch

TestWorkPreservingRMRestart#testSchedulerRecovery partially lacks test cases for FairScheduler. We should add them.

{code}
// Until YARN-1959 is resolved
if (scheduler.getClass() != FairScheduler.class) {
  assertEquals(availableResources, schedulerAttempt.getHeadroom());
}
{code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (YARN-2209) Replace AM resync/shutdown command with corresponding exceptions
[ https://issues.apache.org/jira/browse/YARN-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14182700#comment-14182700 ]

Hudson commented on YARN-2209:
------------------------------

SUCCESS: Integrated in Hadoop-Yarn-trunk #722 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/722/])
YARN-2209. Replaced AM resync/shutdown command with corresponding exceptions and made related MR changes. Contributed by Jian He. (zjshen: rev 0f3b6900be1a3b2e4624f31f84656f4a32dadce9)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRMRPCResponseId.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestApplicationMasterLauncher.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/TestAllocateResponse.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAMRMClient.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestAMRMTokens.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/exceptions/ApplicationAttemptNotFoundException.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/utils/BuilderUtils.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/resource/ResourceCalculator.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/rm/TestRMContainerAllocator.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerRequestor.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/AllocateResponse.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/AMCommand.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/async/impl/TestAMRMClientAsync.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/local/LocalContainerAllocator.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocator.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ProtoUtils.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/async/impl/AMRMClientAsyncImpl.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAuditLogger.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/AMRMClientImpl.java

Replace AM resync/shutdown command with corresponding exceptions
----------------------------------------------------------------

Key: YARN-2209
URL: https://issues.apache.org/jira/browse/YARN-2209
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Jian He
Assignee: Jian He
Fix For: 2.6.0
Attachments: YARN-2209.1.patch, YARN-2209.2.patch, YARN-2209.3.patch, YARN-2209.4.patch, YARN-2209.5.patch, YARN-2209.6.patch, YARN-2209.6.patch, YARN-2209.7.patch

YARN-1365 introduced an ApplicationMasterNotRegisteredException to tell an application to re-register on RM restart. We should do the same for the AMS#allocate call as well.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (YARN-2682) WindowsSecureContainerExecutor should not depend on DefaultContainerExecutor#getFirstApplicationDir.
[ https://issues.apache.org/jira/browse/YARN-2682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14182706#comment-14182706 ]

Hudson commented on YARN-2682:
------------------------------

SUCCESS: Integrated in Hadoop-Yarn-trunk #722 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/722/])
Updated CHANGES.txt to move YARN-2682 to branch-2.6 (jianhe: rev 071c925c7dffbb825884a5df5a0104e7793b30fc)
* hadoop-yarn-project/CHANGES.txt

WindowsSecureContainerExecutor should not depend on DefaultContainerExecutor#getFirstApplicationDir.
----------------------------------------------------------------------------------------------------

Key: YARN-2682
URL: https://issues.apache.org/jira/browse/YARN-2682
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager
Affects Versions: 2.5.0
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Minor
Fix For: 2.6.0
Attachments: YARN-2682.000.patch, YARN-2682.001.patch

DefaultContainerExecutor no longer uses getFirstApplicationDir, but we can't delete getFirstApplicationDir from DefaultContainerExecutor because WindowsSecureContainerExecutor uses it. We should move the getFirstApplicationDir function from DefaultContainerExecutor to WindowsSecureContainerExecutor.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (YARN-2725) Adding test cases of retrying requests about ZKRMStateStore
[ https://issues.apache.org/jira/browse/YARN-2725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsuyoshi OZAWA updated YARN-2725:
---------------------------------

Attachment: YARN-2725.1.patch

Attaching a first patch.

Adding test cases of retrying requests about ZKRMStateStore
-----------------------------------------------------------

Key: YARN-2725
URL: https://issues.apache.org/jira/browse/YARN-2725
Project: Hadoop YARN
Issue Type: Bug
Reporter: Tsuyoshi OZAWA
Attachments: YARN-2725.1.patch

YARN-2721 found a race condition in the ZK-specific retry semantics. We should add tests for the case of retrying requests to ZK.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (YARN-2690) Make ReservationSystem and its dependent classes independent of Scheduler type
[ https://issues.apache.org/jira/browse/YARN-2690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14182730#comment-14182730 ]

Hadoop QA commented on YARN-2690:
---------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12676881/YARN-2690.003.patch
against trunk revision 0942c99.

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:red}-1 javadoc{color}. The javadoc tool appears to have generated 7 warning messages. See https://builds.apache.org/job/PreCommit-YARN-Build/5538//artifact/patchprocess/diffJavadocWarnings.txt for details.

{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5538//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5538//console

This message is automatically generated.

Make ReservationSystem and its dependent classes independent of Scheduler type
------------------------------------------------------------------------------

Key: YARN-2690
URL: https://issues.apache.org/jira/browse/YARN-2690
Project: Hadoop YARN
Issue Type: Sub-task
Components: fairscheduler
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
Attachments: YARN-2690.001.patch, YARN-2690.002.patch, YARN-2690.002.patch, YARN-2690.003.patch

A lot of common reservation classes depend on CapacityScheduler and specifically its configuration. This JIRA is to make them ready for other schedulers by abstracting out the configuration.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (YARN-2647) Add yarn queue CLI to get queue info including labels of such queue
[ https://issues.apache.org/jira/browse/YARN-2647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sunil G updated YARN-2647:
--------------------------

Attachment: 0001-YARN-2647.patch

Hi [~gp.leftnoteasy]
Please find an initial patch. I used the commands below for now; please share your thoughts.

a. {code}yarn queue -listall{code} Lists the complete info from all queues.
b. {code}yarn queue -list QueueName{code} Lists the information for a given queue name.
c. {code}yarn queue -list QueueName -showNodeLabels{code} Lists the node labels of the given queue.
d. {code}yarn queue -showAcls{code} Lists the ACLs from all queues for the current user.

Finally, I will print the information as given below:
{code}
Queue Name : queueA
Node Labels : JDK_7,GPU
State : RUNNING
Capacity : 40.0%
Current Capacity : 50.0%
{code}

Kindly share your thoughts.

Add yarn queue CLI to get queue info including labels of such queue
-------------------------------------------------------------------

Key: YARN-2647
URL: https://issues.apache.org/jira/browse/YARN-2647
Project: Hadoop YARN
Issue Type: Sub-task
Components: client
Reporter: Wangda Tan
Assignee: Sunil G
Attachments: 0001-YARN-2647.patch

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (YARN-2712) Adding tests about FSQueue and headroom of FairScheduler to TestWorkPreservingRMRestart
[ https://issues.apache.org/jira/browse/YARN-2712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14182745#comment-14182745 ]

Hadoop QA commented on YARN-2712:
---------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12676888/YARN-2712.1.patch
against trunk revision 0942c99.

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 javadoc{color}. There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart

{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5539//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5539//console

This message is automatically generated.

Adding tests about FSQueue and headroom of FairScheduler to TestWorkPreservingRMRestart
---------------------------------------------------------------------------------------

Key: YARN-2712
URL: https://issues.apache.org/jira/browse/YARN-2712
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
Attachments: YARN-2712.1.patch

TestWorkPreservingRMRestart#testSchedulerRecovery partially lacks test cases for FairScheduler. We should add them.

{code}
// Until YARN-1959 is resolved
if (scheduler.getClass() != FairScheduler.class) {
  assertEquals(availableResources, schedulerAttempt.getHeadroom());
}
{code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (YARN-2712) Adding tests about FSQueue and headroom of FairScheduler to TestWorkPreservingRMRestart
[ https://issues.apache.org/jira/browse/YARN-2712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14182760#comment-14182760 ]

Tsuyoshi OZAWA commented on YARN-2712:
--------------------------------------

The test failure looks unrelated to the patch. [~kkambatl], [~jianhe], do you mind taking a look, please?

Adding tests about FSQueue and headroom of FairScheduler to TestWorkPreservingRMRestart
---------------------------------------------------------------------------------------

Key: YARN-2712
URL: https://issues.apache.org/jira/browse/YARN-2712
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
Attachments: YARN-2712.1.patch

TestWorkPreservingRMRestart#testSchedulerRecovery partially lacks test cases for FairScheduler. We should add them.

{code}
// Until YARN-1959 is resolved
if (scheduler.getClass() != FairScheduler.class) {
  assertEquals(availableResources, schedulerAttempt.getHeadroom());
}
{code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (YARN-2725) Adding test cases of retrying requests about ZKRMStateStore
[ https://issues.apache.org/jira/browse/YARN-2725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14182761#comment-14182761 ]

Hadoop QA commented on YARN-2725:
---------------------------------

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12676890/YARN-2725.1.patch
against trunk revision 0942c99.

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 javadoc{color}. There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5540//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5540//console

This message is automatically generated.

Adding test cases of retrying requests about ZKRMStateStore
-----------------------------------------------------------

Key: YARN-2725
URL: https://issues.apache.org/jira/browse/YARN-2725
Project: Hadoop YARN
Issue Type: Bug
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
Attachments: YARN-2725.1.patch

YARN-2721 found a race condition in the ZK-specific retry semantics. We should add tests for the case of retrying requests to ZK.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (YARN-2724) If an unreadable file is encountered during log aggregation then aggregated file in HDFS badly formed
[ https://issues.apache.org/jira/browse/YARN-2724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14182795#comment-14182795 ] Mit Desai commented on YARN-2724: - Sorry for taking too long. Got stuck in something else. YARN-2724.5.patch looks good to me. +1 (non-binding) If an unreadable file is encountered during log aggregation then aggregated file in HDFS badly formed - Key: YARN-2724 URL: https://issues.apache.org/jira/browse/YARN-2724 Project: Hadoop YARN Issue Type: Bug Components: log-aggregation Affects Versions: 2.5.1 Reporter: Sumit Mohanty Assignee: Xuan Gong Attachments: YARN-2724.1.patch, YARN-2724.2.patch, YARN-2724.3.patch, YARN-2724.4.patch, YARN-2724.5.patch Look into the log output snippet. It looks like there is an issue during aggregation when an unreadable file is encountered. Likely, this results in bad encoding. {noformat} LogType: command-13.json LogLength: 13934 Log Contents: Error aggregating log file. Log file : /grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_04/command-13.json/grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_04/command-13.json (Permission denied)command-3.json13983Error aggregating log file. 
Log file : /grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_04/command-3.json/grid/0/yarn/log/application_1413865041660_0002/contaierrors-13.txt0660_0002_01_04/command-3.json (Permission denied) errors-3.txt0gc.log-20141021044514484052014-10-21T04:45:12.046+: 5.134: [GC2014-10-21T04:45:12.046+: 5.134: [ParNew: 163840K-15575K(184320K), 0.0488700 secs] 163840K-15575K(1028096K), 0.0492510 secs] [Times: user=0.06 sys=0.01, real=0.05 secs] 2014-10-21T04:45:14.939+: 8.027: [GC2014-10-21T04:45:14.939+: 8.027: [ParNew: 179415K-11865K(184320K), 0.0941310 secs] 179415K-17228K(1028096K), 0.0943140 secs] [Times: user=0.13 sys=0.04, real=0.09 secs] 2014-10-21T04:46:42.099+: 95.187: [GC2014-10-21T04:46:42.099+: 95.187: [ParNew: 175705K-12802K(184320K), 0.0466420 secs] 181068K-18164K(1028096K), 0.0468490 secs] [Times: user=0.06 sys=0.00, real=0.04 secs] {noformat} Specifically, look at the text after the exception text. There should be two more entries for log files, but none exist. This is likely due to the fact that command-13.json is expected to be of length 13934 but it is not, as the file was never read. I think it should have been {noformat} LogType: command-13.json LogLength: Length of the exception text Log Contents: Error aggregating log file. Log file : /grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_04/command-13.json/grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_04/command-13.json (Permission denied)command-3.json13983Error aggregating log file. Log file : /grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_04/command-3.json/grid/0/yarn/log/application_1413865041660_0002/contaierrors-13.txt0660_0002_01_04/command-3.json (Permission denied) {noformat} {noformat} LogType: errors-3.txt LogLength:0 Log Contents: {noformat} {noformat} LogType:gc.log LogLength:??? 
Log Contents: ..-20141021044514484052014-10-21T04:45:12.046+: 5.134: [GC2014-10-21T04:45:12.046+: 5.134: [ParNew: 163840K- ... {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
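The suggested fix above boils down to keeping LogLength honest: when a file cannot be read, write the error text itself as the entry body, with LogLength equal to the bytes actually written. A minimal, self-contained sketch of that idea (hypothetical names, not the actual AggregatedLogFormat code):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Sketch only: an aggregated-log entry whose declared LogLength always
// matches the body actually written, so a reader never runs off the entry.
public class LogEntrySketch {

    // Write one entry: LogType, declared LogLength, then exactly that many bytes.
    static void writeEntry(DataOutputStream out, String type, String contents)
            throws IOException {
        byte[] body = contents.getBytes(StandardCharsets.UTF_8);
        out.writeUTF(type);                        // LogType
        out.writeUTF(String.valueOf(body.length)); // LogLength matches the body
        out.write(body);                           // exactly LogLength bytes
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        // Instead of the unreadable file's advertised 13934 bytes, record
        // the aggregation error message itself as the contents.
        String err = "Error aggregating log file. (Permission denied)";
        writeEntry(out, "command-13.json", err);

        // A reader can now trust LogLength and stays aligned on the stream.
        DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(buf.toByteArray()));
        String type = in.readUTF();
        int len = Integer.parseInt(in.readUTF());
        byte[] body = new byte[len];
        in.readFully(body);
        System.out.println(type);
        System.out.println(new String(body, StandardCharsets.UTF_8).equals(err));
    }
}
```

The point is the invariant, not the exact encoding: whatever text replaces the unreadable file must be the same text whose length is declared.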
[jira] [Updated] (YARN-2495) Allow admin specify labels in each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-2495: Attachment: YARN-2495.20141024-1.patch Hi [~wangda] bq. Add a reject node labels list in NodeHeartbeatRequest – we may not have to handle this list for now. But we can keep it on the interface ??rmContext.getNodeLabelManager().replaceLabelsOnNode(labelUpdate);?? currently throws an exception, so I felt that this message should be sent back to the NM instead of sending the invalid list (which requires interface changes in NodeLabelsManager). So I was thinking of propagating the Exception's errorMsg to the NM by making use of NodeHeartbeatResponse.DiagnosticsMessage, and will log this in the NM. I have handled it in this way; please check the code in ResourceTrackerService and NodeStatusUpdaterImpl (it also contains the other changes which you mentioned yesterday). Allow admin specify labels in each NM (Distributed configuration) - Key: YARN-2495 URL: https://issues.apache.org/jira/browse/YARN-2495 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, YARN-2495_20141022.1.patch Target of this JIRA is to allow admin specify labels in each NM, this covers - User can set labels in each NM (by setting yarn-site.xml or using script suggested by [~aw]) - NM will send labels to RM via ResourceTracker API - RM will set labels in NodeLabelManager when NM register/update labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2729) Support script based NodeLabelsProvider Interface in Distributed Node Label Configuration Setup
[ https://issues.apache.org/jira/browse/YARN-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-2729: Attachment: YARN-2729.20141024-1.patch Hi [~wangda] As per our discussion, I have implemented the scriptNodeLabelsProvider similar to the NodeHealthCheckerService, and moved the YARN configuration changes related to the script into this JIRA itself. Please review. Support script based NodeLabelsProvider Interface in Distributed Node Label Configuration Setup --- Key: YARN-2729 URL: https://issues.apache.org/jira/browse/YARN-2729 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Naganarasimha G R Assignee: Naganarasimha G R Attachments: YARN-2729.20141023-1.patch, YARN-2729.20141024-1.patch Support script based NodeLabelsProvider Interface in Distributed Node Label Configuration Setup . -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2209) Replace AM resync/shutdown command with corresponding exceptions
[ https://issues.apache.org/jira/browse/YARN-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14182814#comment-14182814 ] Hudson commented on YARN-2209: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1911 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1911/]) YARN-2209. Replaced AM resync/shutdown command with corresponding exceptions and made related MR changes. Contributed by Jian He. (zjshen: rev 0f3b6900be1a3b2e4624f31f84656f4a32dadce9) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/AMCommand.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestApplicationMasterLauncher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/async/impl/TestAMRMClientAsync.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/exceptions/ApplicationAttemptNotFoundException.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/async/impl/AMRMClientAsyncImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerRequestor.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/rm/TestRMContainerAllocator.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ProtoUtils.java * 
hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAuditLogger.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRMRPCResponseId.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/utils/BuilderUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/TestAllocateResponse.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/AllocateResponse.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAMRMClient.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/resource/ResourceCalculator.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocator.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/AMRMClientImpl.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/local/LocalContainerAllocator.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestAMRMTokens.java Replace AM resync/shutdown command with corresponding exceptions Key: YARN-2209 URL: https://issues.apache.org/jira/browse/YARN-2209 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Assignee: Jian 
He Fix For: 2.6.0 Attachments: YARN-2209.1.patch, YARN-2209.2.patch, YARN-2209.3.patch, YARN-2209.4.patch, YARN-2209.5.patch, YARN-2209.6.patch, YARN-2209.6.patch, YARN-2209.7.patch YARN-1365 introduced an ApplicationMasterNotRegisteredException to indicate to the application that it should re-register on RM restart. We should do the same for the AMS#allocate call as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2682) WindowsSecureContainerExecutor should not depend on DefaultContainerExecutor#getFirstApplicationDir.
[ https://issues.apache.org/jira/browse/YARN-2682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14182820#comment-14182820 ] Hudson commented on YARN-2682: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1911 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1911/]) Updated CHANGES.txt to move YARN-2682 to branch-2.6 (jianhe: rev 071c925c7dffbb825884a5df5a0104e7793b30fc) * hadoop-yarn-project/CHANGES.txt WindowsSecureContainerExecutor should not depend on DefaultContainerExecutor#getFirstApplicationDir. - Key: YARN-2682 URL: https://issues.apache.org/jira/browse/YARN-2682 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.5.0 Reporter: zhihai xu Assignee: zhihai xu Priority: Minor Fix For: 2.6.0 Attachments: YARN-2682.000.patch, YARN-2682.001.patch DefaultContainerExecutor won't use getFirstApplicationDir any more. But we can't delete getFirstApplicationDir in DefaultContainerExecutor because WindowsSecureContainerExecutor uses it. We should move getFirstApplicationDir function from DefaultContainerExecutor to WindowsSecureContainerExecutor. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2646) distributed shell tests to use registry
[ https://issues.apache.org/jira/browse/YARN-2646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-2646: - Attachment: YARN-2646-003.patch Patch in sync with trunk. This needs YARN-2657 to compile and YARN-2571 for the tested behaviour (that is, this is the test for YARN-2571). distributed shell tests to use registry - Key: YARN-2646 URL: https://issues.apache.org/jira/browse/YARN-2646 Project: Hadoop YARN Issue Type: Sub-task Components: api, resourcemanager Reporter: Steve Loughran Assignee: Steve Loughran Attachments: YARN-2646-001.patch, YARN-2646-003.patch for testing and for an example, the Distributed Shell should create a record for itself in the service registry ... the tests can look for this. This will act as a test for the RM integration. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2739) winutils task: unsecure path should not call AddNodeManagerAndUserACEsToObject
Remus Rusanu created YARN-2739: -- Summary: winutils task: unsecure path should not call AddNodeManagerAndUserACEsToObject Key: YARN-2739 URL: https://issues.apache.org/jira/browse/YARN-2739 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Remus Rusanu Assignee: Remus Rusanu winutils task create path is broken after YARN-2198 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2739) winutils task: unsecure path should not call AddNodeManagerAndUserACEsToObject
[ https://issues.apache.org/jira/browse/YARN-2739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Remus Rusanu updated YARN-2739: --- Attachment: YARN-2739.000.patch winutils task: unsecure path should not call AddNodeManagerAndUserACEsToObject -- Key: YARN-2739 URL: https://issues.apache.org/jira/browse/YARN-2739 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Remus Rusanu Assignee: Remus Rusanu Labels: security, windows Attachments: YARN-2739.000.patch winutils task create path is broken after YARN-2198 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2739) winutils task: unsecure path should not call AddNodeManagerAndUserACEsToObject
[ https://issues.apache.org/jira/browse/YARN-2739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Remus Rusanu updated YARN-2739: --- Target Version/s: 2.6.0 Affects Version/s: 2.6.0 winutils task: unsecure path should not call AddNodeManagerAndUserACEsToObject -- Key: YARN-2739 URL: https://issues.apache.org/jira/browse/YARN-2739 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Reporter: Remus Rusanu Assignee: Remus Rusanu Labels: security, windows Attachments: YARN-2739.000.patch winutils task create path is broken after YARN-2198 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2678) Recommended improvements to Yarn Registry
[ https://issues.apache.org/jira/browse/YARN-2678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14182961#comment-14182961 ] Hadoop QA commented on YARN-2678: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12676873/YARN-2678-003.patch against trunk revision 0942c99. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:red}-1 javac{color}. The applied patch generated 1268 javac compiler warnings (more than the trunk's current 1266 warnings). {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 5 warning messages. See https://builds.apache.org/job/PreCommit-YARN-Build/5542//artifact/patchprocess/diffJavadocWarnings.txt for details. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-registry. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5542//testReport/ Javac warnings: https://builds.apache.org/job/PreCommit-YARN-Build/5542//artifact/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5542//console This message is automatically generated. 
Recommended improvements to Yarn Registry - Key: YARN-2678 URL: https://issues.apache.org/jira/browse/YARN-2678 Project: Hadoop YARN Issue Type: Sub-task Components: api, resourcemanager Affects Versions: 2.6.0 Reporter: Gour Saha Assignee: Steve Loughran Attachments: HADOOP-2678-002.patch, YARN-2678-001.patch, YARN-2678-003.patch, yarnregistry.pdf In the process of binding to the Slider AM from the Slider agent Python code, here are some of the items I stumbled upon and would recommend as improvements. This is how Slider's registry looks today -
{noformat}
jsonservicerec{
  description : Slider Application Master,
  external : [
    { api : org.apache.slider.appmaster, addressType : host/port, protocolType : hadoop/protobuf,
      addresses : [ [ c6408.ambari.apache.org, 34837 ] ] },
    { api : org.apache.http.UI, addressType : uri, protocolType : webui,
      addresses : [ [ http://c6408.ambari.apache.org:43314 ] ] },
    { api : org.apache.slider.management, addressType : uri, protocolType : REST,
      addresses : [ [ http://c6408.ambari.apache.org:43314/ws/v1/slider/mgmt ] ] },
    { api : org.apache.slider.publisher, addressType : uri, protocolType : REST,
      addresses : [ [ http://c6408.ambari.apache.org:43314/ws/v1/slider/publisher ] ] },
    { api : org.apache.slider.registry, addressType : uri, protocolType : REST,
      addresses : [ [ http://c6408.ambari.apache.org:43314/ws/v1/slider/registry ] ] },
    { api : org.apache.slider.publisher.configurations, addressType : uri, protocolType : REST,
      addresses : [ [ http://c6408.ambari.apache.org:43314/ws/v1/slider/publisher/slider ] ] }
  ],
  internal : [
    { api : org.apache.slider.agents.secure, addressType : uri, protocolType : REST,
      addresses : [ [ https://c6408.ambari.apache.org:46958/ws/v1/slider/agents ] ] },
    { api : org.apache.slider.agents.oneway, addressType : uri, protocolType : REST,
      addresses : [ [ https://c6408.ambari.apache.org:57513/ws/v1/slider/agents ] ] }
  ],
  yarn:persistence : application,
  yarn:id : application_1412974695267_0015
}
{noformat}
Recommendations: 1. I would suggest either removing the string {color:red}jsonservicerec{color} or, if it is desirable to have non-null data at all times, folding the string into the json structure as a top-level attribute, to ensure that the registry data is always a valid json document. 2. The {color:red}addresses{color} attribute is currently a list of lists. I would recommend converting it to a list of dictionary objects. In the dictionary object it would be nice to have the host and port portions of objects of addressType uri as separate key-value pairs, to avoid parsing on the client side. The URI should also be retained as a key, say uri, to avoid clients trying to generate it by concatenating host, port, resource-path, etc. Here is a
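Recommendation 2 can be sketched concretely (the map-based shape below is the suggestion being made, not the current registry format; values are taken from the record above):

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of the recommended addresses shape: a list of maps with explicit
// host/port keys plus the full uri retained, instead of a bare inner list
// that every client must parse itself.
public class AddressShapeSketch {
    public static void main(String[] args) {
        Map<String, Object> addr = new LinkedHashMap<>();
        addr.put("host", "c6408.ambari.apache.org");
        addr.put("port", 43314);
        addr.put("uri", "http://c6408.ambari.apache.org:43314/ws/v1/slider/registry");
        List<Map<String, Object>> addresses = Collections.singletonList(addr);

        // Clients read fields directly instead of splitting a URI string.
        Map<String, Object> first = addresses.get(0);
        System.out.println(first.get("host"));
        System.out.println(first.get("port"));
    }
}
```

With this shape a Python or Java client never has to reassemble (or re-split) host, port, and resource path.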
[jira] [Commented] (YARN-2739) winutils task: unsecure path should not call AddNodeManagerAndUserACEsToObject
[ https://issues.apache.org/jira/browse/YARN-2739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14182977#comment-14182977 ] Vinod Kumar Vavilapalli commented on YARN-2739: --- The impact of this is that Windows set up is broken for non-secure mode? winutils task: unsecure path should not call AddNodeManagerAndUserACEsToObject -- Key: YARN-2739 URL: https://issues.apache.org/jira/browse/YARN-2739 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Reporter: Remus Rusanu Assignee: Remus Rusanu Labels: security, windows Attachments: YARN-2739.000.patch winutils task create path is broken after YARN-2198 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2739) winutils task: unsecure path should not call AddNodeManagerAndUserACEsToObject
[ https://issues.apache.org/jira/browse/YARN-2739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183014#comment-14183014 ] Hadoop QA commented on YARN-2739: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12676931/YARN-2739.000.patch against trunk revision 0942c99. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5543//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5543//console This message is automatically generated. 
winutils task: unsecure path should not call AddNodeManagerAndUserACEsToObject -- Key: YARN-2739 URL: https://issues.apache.org/jira/browse/YARN-2739 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Reporter: Remus Rusanu Assignee: Remus Rusanu Labels: security, windows Attachments: YARN-2739.000.patch winutils task create path is broken after YARN-2198 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2571) RM to support YARN registry
[ https://issues.apache.org/jira/browse/YARN-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183020#comment-14183020 ] Hadoop QA commented on YARN-2571: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12676886/YARN-2571-005.patch against trunk revision 0942c99. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 16 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-registry hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5541//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5541//console This message is automatically generated. 
RM to support YARN registry Key: YARN-2571 URL: https://issues.apache.org/jira/browse/YARN-2571 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Steve Loughran Assignee: Steve Loughran Attachments: YARN-2571-001.patch, YARN-2571-002.patch, YARN-2571-003.patch, YARN-2571-005.patch The RM needs to (optionally) integrate with the YARN registry: # startup: create the /services and /users paths with system ACLs (yarn, hdfs principals) # app-launch: create the user directory /users/$username with the relevant permissions (CRD) for them to create subnodes. # attempt, container, app completion: remove service records with the matching persistence and ID -- This message was sent by Atlassian JIRA (v6.3.4#6332)
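The three lifecycle steps listed above can be modeled as plain path bookkeeping. A toy sketch under that assumption (hypothetical class; the real RM would use the YARN registry/ZooKeeper APIs with the system and user ACLs described):

```java
import java.util.Set;
import java.util.TreeSet;

// Toy model of the RM-side registry lifecycle: system paths at startup,
// a per-user directory at app launch, record removal at app completion.
public class RegistryLifecycleSketch {
    final Set<String> paths = new TreeSet<>();

    void startup() {                                 // step 1: system paths (ACLs elided)
        paths.add("/services");
        paths.add("/users");
    }

    void appLaunch(String user, String appId) {      // step 2: user dir + service record
        paths.add("/users/" + user);
        paths.add("/users/" + user + "/" + appId);
    }

    void appCompletion(String user, String appId) {  // step 3: purge matching record
        paths.remove("/users/" + user + "/" + appId);
    }

    public static void main(String[] args) {
        RegistryLifecycleSketch r = new RegistryLifecycleSketch();
        r.startup();
        r.appLaunch("hbase", "application_1412974695267_0015");
        r.appCompletion("hbase", "application_1412974695267_0015");
        // The user directory survives app completion; the app record does not.
        System.out.println(r.paths.contains("/users/hbase"));
        System.out.println(r.paths.contains("/users/hbase/application_1412974695267_0015"));
    }
}
```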
[jira] [Commented] (YARN-2703) Add logUploadedTime into LogValue for better display
[ https://issues.apache.org/jira/browse/YARN-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183026#comment-14183026 ] Xuan Gong commented on YARN-2703: - Testcase failure is not related Add logUploadedTime into LogValue for better display Key: YARN-2703 URL: https://issues.apache.org/jira/browse/YARN-2703 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-2703.1.patch, YARN-2703.2.patch, YARN-2703.3.patch Right now, the container can upload its logs multiple times. Sometimes, containers write different logs into the same log file. After the log aggregation, when we query those logs, it will show: LogType: stderr LogContext: LogType: stdout LogContext: LogType: stderr LogContext: LogType: stdout LogContext: The same files could be displayed multiple times. But we can not figure out which logs come first. We could add extra loguploadedTime to let users have better understanding on the logs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2739) winutils task: unsecure path should not call AddNodeManagerAndUserACEsToObject
[ https://issues.apache.org/jira/browse/YARN-2739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183040#comment-14183040 ] Remus Rusanu commented on YARN-2739: Yes, non-secure Windows clusters are broken as the code will unnecessarily check for the wsce-site.xml file and throw an error. winutils task: unsecure path should not call AddNodeManagerAndUserACEsToObject -- Key: YARN-2739 URL: https://issues.apache.org/jira/browse/YARN-2739 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Reporter: Remus Rusanu Assignee: Remus Rusanu Labels: security, windows Attachments: YARN-2739.000.patch winutils task create path is broken after YARN-2198 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2739) winutils task: unsecure path should not call AddNodeManagerAndUserACEsToObject
[ https://issues.apache.org/jira/browse/YARN-2739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183096#comment-14183096 ] Craig Welch commented on YARN-2739: --- Ok, on unsecure Windows the Tez orderedwordcount fails without the fix and succeeds with it, so I believe this fix is good to go. Remus, I don't have a quick way to test this on secure Windows; I assume you have tested it there? In any case, given the breadth of the impact, I think the fix is good to go. winutils task: unsecure path should not call AddNodeManagerAndUserACEsToObject -- Key: YARN-2739 URL: https://issues.apache.org/jira/browse/YARN-2739 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Reporter: Remus Rusanu Assignee: Remus Rusanu Labels: security, windows Attachments: YARN-2739.000.patch winutils task create path is broken after YARN-2198 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2704) Localization and log-aggregation will fail if hdfs delegation token expired after token-max-life-time
[ https://issues.apache.org/jira/browse/YARN-2704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-2704: -- Attachment: YARN-2704.2.patch Uploaded a new patch, addressed all other comments Localization and log-aggregation will fail if hdfs delegation token expired after token-max-life-time -- Key: YARN-2704 URL: https://issues.apache.org/jira/browse/YARN-2704 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He Assignee: Jian He Attachments: YARN-2704.1.patch, YARN-2704.2.patch In secure mode, YARN requires the hdfs-delegation token to do localization and log aggregation on behalf of the user. But the hdfs delegation token will eventually expire after max-token-life-time. So, localization and log aggregation will fail after the token expires. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2495) Allow admin specify labels in each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183135#comment-14183135 ] Wangda Tan commented on YARN-2495: -- Hi Naga, Thanks for working on this patch. Comments round #1: 1) YarnConfiguration: I think we should add a DEFAULT_DECENTRALIZED_NODELABEL_CONFIGURATION_ENABLED = false to avoid hardcoding false in implementations. 2) NodeHeartbeatRequestPBImpl: I just found that the current PB cannot tell the difference between null and empty for repeated fields, and in your implementation an empty set will always be returned whether the field is unset or set to an empty set. So the convention we defined (null for unchanged, empty for no labels) does not hold any more. What we can do is: # Add a new field in NodeHeartbeatRequest, like boolean nodeLabelUpdated. # Use the add/removeLabelsOnNodes API provided by RMNodeLabelsManager, passing only the changed labels each time. # Set the up-to-date labels in NodeHeartbeatRequest on every heartbeat. #1 and #2 both need more fields in NodeHeartbeatRequest. I suggest doing #3; it is simpler, and we can improve it in a further JIRA. 3) NodeManager: {code} +if (conf.getBoolean( +YarnConfiguration.ENABLE_DECENTRALIZED_NODELABEL_CONFIGURATION, false)) { {code} Instead of hardcoding false here, we should use DEFAULT_DECENTRALIZED_NODELABEL_CONFIGURATION_ENABLED. bq. + addService((Service) provider); Why do this type conversion? I think we don't need it. bq. createNodeStatusUpdater I suggest creating an overloaded method without the nodeLabelsProviderService to avoid lots of changes in test/mock classes. 4) NodeLabelsProviderService: It should extend AbstractService; there are default implementations in AbstractService, so we don't need to implement all of them. 5) NodeStatusUpdaterImpl: {{isDecentralizedNodeLabelsConf}} may not be needed here; if the nodeLabelsProvider passed in is null, that means {{isDecentralizedNodeLabelsConf}} is false. 
{code} +nodeLabelsForHeartBeat = null; +if (isDecentralizedNodeLabelsConf) { ... {code} According to my comment 2), I suggest keeping it simple: if the provider is not null, set NodeHeartbeatRequest.nodeLabels to the labels obtained from the provider. {code} +if (nodeLabelsForHeartBeat != null && + response.getDiagnosticsMessage() != null) { + LOG.info("Node Labels were rejected from RM " + + response.getDiagnosticsMessage()); +} {code} We cannot assume that whenever the diagnosticsMessage is not null, the node labels were rejected. I suggest adding a rejected-node-labels field to RegisterNMResponse and NodeHeartbeatResponse. The existing behavior in RMNodeLabelsManager is that if any of the labels is invalid, all labels will be rejected. What you should do is: # In the RM ResourceTrackerService, if an exception is raised when replacing labels on a node, put the new labels into the rejected-node-labels list in the response. # In the NM NodeStatusUpdater, if the rejected node labels are not null, LOG.error the rejected node labels and print the diagnostic message. As suggested in 3), create an overloaded constructor to avoid lots of changes in tests. 6) yarn_server_common_service_protos.proto: I think you added nodeLabels to {{RegisterNodeManagerResponseProto}} by mistake; it should be in {{RegisterNodeManagerRequestProto}}? :) 7) ConfigurationNodeLabelsProvider: {code} +String[] nodeLabelsFromScript = + StringUtils.getStrings(conf.get(YarnConfiguration.NM_NODE_LABELS_PREFIX, "")); {code} # nodeLabelsFromScript -> nodeLabelsFromConfiguration # YarnConfiguration.NM_NODE_LABELS_PREFIX -> add an option like YarnConfiguration.NM_NODE_LABELS_FROM_CONFIG (NM_NODE_LABELS_PREFIX + "from-config") or some name you prefer; at least it shouldn't be the bare prefix. 8) TestEventFlow: Does just passing null for the nodeLabelsProvider not work? 9) ResourceTrackerService: {code} +isDecentralizedNodeLabelsConf = conf.getBoolean( +YarnConfiguration.ENABLE_DECENTRALIZED_NODELABEL_CONFIGURATION, false); {code} Avoid hardcoding the config default here, as suggested above. 
There is no need to send a shutdown message when any of the labels are not accepted by RMNodeLabelsManager; just adding them to a rejected-node-labels list with a diagnostic message should be enough. {code} ++ ", assigned nodeId " + nodeId + ", node labels {" ++ nodeLabels.toString() + "}"; {code} You should use StringUtils.join when you want to render a set of labels as a String; the format of Set.toString() is not well defined. More comments will be added once you have addressed the above comments and added tests for them. Thanks, Wangda Allow admin specify labels in each NM (Distributed configuration) - Key: YARN-2495 URL: https://issues.apache.org/jira/browse/YARN-2495 Project: Hadoop YARN Issue Type: Sub-task
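The point about rendering label sets can be shown with a tiny self-contained sketch (plain String.join here for brevity; the patch itself would use Hadoop's StringUtils.join):

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Set.toString() gives an unspecified bracketed form like [GPU, LARGE_MEM];
// an explicit join produces a stable, log-friendly rendering of the labels.
public class LabelJoinSketch {
    public static void main(String[] args) {
        Set<String> labels = new TreeSet<>(Arrays.asList("GPU", "LARGE_MEM"));
        System.out.println(String.join(",", labels));
    }
}
```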
[jira] [Commented] (YARN-2739) winutils task: unsecure path should not call AddNodeManagerAndUserACEsToObject
[ https://issues.apache.org/jira/browse/YARN-2739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183160#comment-14183160 ] Remus Rusanu commented on YARN-2739: [~cwelch] just tested on secure cluster, passed fine winutils task: unsecure path should not call AddNodeManagerAndUserACEsToObject -- Key: YARN-2739 URL: https://issues.apache.org/jira/browse/YARN-2739 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Reporter: Remus Rusanu Assignee: Remus Rusanu Labels: security, windows Attachments: YARN-2739.000.patch winutils task create path is broken after YARN-2198 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1915) ClientToAMTokenMasterKey should be provided to AM at launch time
[ https://issues.apache.org/jira/browse/YARN-1915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183163#comment-14183163 ] Jian He commented on YARN-1915: --- committing this ClientToAMTokenMasterKey should be provided to AM at launch time Key: YARN-1915 URL: https://issues.apache.org/jira/browse/YARN-1915 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.2.0 Reporter: Hitesh Shah Assignee: Jason Lowe Priority: Blocker Attachments: YARN-1915.patch, YARN-1915v2.patch, YARN-1915v3.patch Currently, the AM receives the key as part of registration. This introduces a race where a client can connect to the AM when the AM has not received the key. Current Flow: 1) AM needs to start the client listening service in order to get host:port and send it to the RM as part of registration 2) RM gets the port info in register() and transitions the app to RUNNING. Responds back with client secret to AM. 3) User asks RM for client token. Gets it and pings the AM. AM hasn't received client secret from RM and so RPC itself rejects the request. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2724) If an unreadable file is encountered during log aggregation then aggregated file in HDFS badly formed
[ https://issues.apache.org/jira/browse/YARN-2724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183167#comment-14183167 ] Zhijie Shen commented on YARN-2724: --- No worry. Thanks for review, Mit! I'll go ahead to commit the patch. If an unreadable file is encountered during log aggregation then aggregated file in HDFS badly formed - Key: YARN-2724 URL: https://issues.apache.org/jira/browse/YARN-2724 Project: Hadoop YARN Issue Type: Bug Components: log-aggregation Affects Versions: 2.5.1 Reporter: Sumit Mohanty Assignee: Xuan Gong Attachments: YARN-2724.1.patch, YARN-2724.2.patch, YARN-2724.3.patch, YARN-2724.4.patch, YARN-2724.5.patch Look into the log output snippet. It looks like there is an issue during aggregation when an unreadable file is encountered. Likely, this results in bad encoding. {noformat} LogType: command-13.json LogLength: 13934 Log Contents: Error aggregating log file. Log file : /grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_04/command-13.json/grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_04/command-13.json (Permission denied)command-3.json13983Error aggregating log file. 
Log file : /grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_04/command-3.json/grid/0/yarn/log/application_1413865041660_0002/contaierrors-13.txt0660_0002_01_04/command-3.json (Permission denied) errors-3.txt0gc.log-20141021044514484052014-10-21T04:45:12.046+: 5.134: [GC2014-10-21T04:45:12.046+: 5.134: [ParNew: 163840K-15575K(184320K), 0.0488700 secs] 163840K-15575K(1028096K), 0.0492510 secs] [Times: user=0.06 sys=0.01, real=0.05 secs] 2014-10-21T04:45:14.939+: 8.027: [GC2014-10-21T04:45:14.939+: 8.027: [ParNew: 179415K-11865K(184320K), 0.0941310 secs] 179415K-17228K(1028096K), 0.0943140 secs] [Times: user=0.13 sys=0.04, real=0.09 secs] 2014-10-21T04:46:42.099+: 95.187: [GC2014-10-21T04:46:42.099+: 95.187: [ParNew: 175705K-12802K(184320K), 0.0466420 secs] 181068K-18164K(1028096K), 0.0468490 secs] [Times: user=0.06 sys=0.00, real=0.04 secs] {noformat} Specifically, look at the text after the exception text. There should be two more entries for log files but none exist. This is likely due to the fact that command-13.json is expected to be of length 13934 but its is not as the file was never read. I think, it should have been {noformat} LogType: command-13.json LogLength: Length of the exception text Log Contents: Error aggregating log file. Log file : /grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_04/command-13.json/grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_04/command-13.json (Permission denied)command-3.json13983Error aggregating log file. Log file : /grid/0/yarn/log/application_1413865041660_0002/container_1413865041660_0002_01_04/command-3.json/grid/0/yarn/log/application_1413865041660_0002/contaierrors-13.txt0660_0002_01_04/command-3.json (Permission denied) {noformat} {noformat} LogType: errors-3.txt LogLength:0 Log Contents: {noformat} {noformat} LogType:gc.log LogLength:??? 
Log Contents: ..-20141021044514484052014-10-21T04:45:12.046+: 5.134: [GC2014-10-21T04:45:12.046+: 5.134: [ParNew: 163840K- ... {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2739) winutils task: unsecure path should not call AddNodeManagerAndUserACEsToObject
[ https://issues.apache.org/jira/browse/YARN-2739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183184#comment-14183184 ] Jian He commented on YARN-2739: --- thanks [~rusanu] and [~cwelch], committing this. winutils task: unsecure path should not call AddNodeManagerAndUserACEsToObject -- Key: YARN-2739 URL: https://issues.apache.org/jira/browse/YARN-2739 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Reporter: Remus Rusanu Assignee: Remus Rusanu Labels: security, windows Attachments: YARN-2739.000.patch winutils task create path is broken after YARN-2198 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2703) Add logUploadedTime into LogValue for better display
[ https://issues.apache.org/jira/browse/YARN-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183207#comment-14183207 ] Zhijie Shen commented on YARN-2703: --- The latest patch needs to be rebased after YARN-2724. Add logUploadedTime into LogValue for better display Key: YARN-2703 URL: https://issues.apache.org/jira/browse/YARN-2703 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-2703.1.patch, YARN-2703.2.patch, YARN-2703.3.patch
[jira] [Commented] (YARN-2647) Add yarn queue CLI to get queue info including labels of such queue
[ https://issues.apache.org/jira/browse/YARN-2647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183205#comment-14183205 ] Wangda Tan commented on YARN-2647: -- Hi [~sunilg], Thanks for the patch. Some comments:
1) I think we should create a QueueCLI instead of putting this into ApplicationCLI -- it makes sense to put attempts/containers into ApplicationCLI since they all belong to an Application, but a queue does not. I expect there will be some duplicated code between ApplicationCLI and QueueCLI, but it shouldn't be too much.
2) I think we don't need -listall. Looking at other commands, like application/container, -list lists all, and -status is followed by an application-id or container-id. I think we can follow the same pattern: queue -list will list all queues, and queue -status with a name or path will print the specified queue only.
3) After rethinking your comment at: https://issues.apache.org/jira/browse/YARN-2647?focusedCommentId=14181701page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14181701 I actually think filters like -show-acls or -show-nodelabels make more sense when we need to list all queues. For a single queue, printing the complete status should be acceptable. I would suggest removing the two options for now, until people ask for them, for two reasons:
# The name -show-acls itself is ambiguous: since we show the complete message by default, why show only ACLs after adding -show-acls? It should be -show-acls-only. But if so, how do we deal with a user putting -show-acls-only and -show-nodelabels-only together?
# I expect more filters will be needed, such as "give me all information for queues under a parent queue" or "give me information for leaf queues only". We can decide how to arrange filters better when such real requirements come.
What do you think?
Wangda Add yarn queue CLI to get queue info including labels of such queue --- Key: YARN-2647 URL: https://issues.apache.org/jira/browse/YARN-2647 Project: Hadoop YARN Issue Type: Sub-task Components: client Reporter: Wangda Tan Assignee: Sunil G Attachments: 0001-YARN-2647.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2724) If an unreadable file is encountered during log aggregation then aggregated file in HDFS badly formed
[ https://issues.apache.org/jira/browse/YARN-2724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183212#comment-14183212 ] Hudson commented on YARN-2724: -- FAILURE: Integrated in Hadoop-trunk-Commit #6334 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6334/]) YARN-2724. Skipped uploading a local log file to HDFS if exception is raised when opening it. Contributed by Xuan Gong. (zjshen: rev e31f0a6558b106662c83e1f797216e412b6689a9) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/logaggregation/TestAggregatedLogFormat.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/AggregatedLogFormat.java * hadoop-yarn-project/CHANGES.txt If an unreadable file is encountered during log aggregation then aggregated file in HDFS badly formed - Key: YARN-2724 URL: https://issues.apache.org/jira/browse/YARN-2724 Project: Hadoop YARN Issue Type: Bug Components: log-aggregation Affects Versions: 2.5.1 Reporter: Sumit Mohanty Assignee: Xuan Gong Fix For: 2.6.0 Attachments: YARN-2724.1.patch, YARN-2724.2.patch, YARN-2724.3.patch, YARN-2724.4.patch, YARN-2724.5.patch
[jira] [Updated] (YARN-2703) Add logUploadedTime into LogValue for better display
[ https://issues.apache.org/jira/browse/YARN-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-2703: Attachment: YARN-2703.4.patch rebase the patch Add logUploadedTime into LogValue for better display Key: YARN-2703 URL: https://issues.apache.org/jira/browse/YARN-2703 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-2703.1.patch, YARN-2703.2.patch, YARN-2703.3.patch, YARN-2703.4.patch
[jira] [Updated] (YARN-2010) If RM fails to recover an app, it can never transition to active again
[ https://issues.apache.org/jira/browse/YARN-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-2010: --- Priority: Blocker (was: Critical) Marking it a blocker as the RM crashes when trying to recover a regular job with an expired token. If RM fails to recover an app, it can never transition to active again -- Key: YARN-2010 URL: https://issues.apache.org/jira/browse/YARN-2010 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.3.0 Reporter: bc Wong Assignee: Karthik Kambatla Priority: Blocker Attachments: YARN-2010.1.patch, YARN-2010.patch, issue-stacktrace.rtf, yarn-2010-2.patch, yarn-2010-3.patch, yarn-2010-3.patch, yarn-2010-4.patch, yarn-2010-5.patch, yarn-2010-6.patch Sometimes, the RM fails to recover an application. It could be because of turning security on, token expiry, or issues connecting to HDFS etc. The causes could be classified into (1) transient, (2) specific to one application, and (3) permanent and apply to multiple (all) applications. Today, the RM fails to transition to Active and ends up in STOPPED state and can never be transitioned to Active again. The initial stacktrace reported is at https://issues.apache.org/jira/secure/attachment/12676476/issue-stacktrace.rtf -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2703) Add logUploadedTime into LogValue for better display
[ https://issues.apache.org/jira/browse/YARN-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183262#comment-14183262 ] Hadoop QA commented on YARN-2703: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12676964/YARN-2703.4.patch against trunk revision e2be333. {color:red}-1 patch{color}. Trunk compilation may be broken. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5546//console This message is automatically generated. Add logUploadedTime into LogValue for better display Key: YARN-2703 URL: https://issues.apache.org/jira/browse/YARN-2703 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-2703.1.patch, YARN-2703.2.patch, YARN-2703.3.patch, YARN-2703.4.patch
[jira] [Commented] (YARN-2704) Localization and log-aggregation will fail if hdfs delegation token expired after token-max-life-time
[ https://issues.apache.org/jira/browse/YARN-2704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183326#comment-14183326 ] Hadoop QA commented on YARN-2704: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12676949/YARN-2704.2.patch against trunk revision b3d8a64. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.nodemanager.containermanager.TestContainerManagerRecovery org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestLocalCacheDirectoryManager org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService org.apache.hadoop.yarn.server.nodemanager.TestEventFlow {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. 
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5544//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/5544//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5544//console This message is automatically generated. Localization and log-aggregation will fail if hdfs delegation token expired after token-max-life-time -- Key: YARN-2704 URL: https://issues.apache.org/jira/browse/YARN-2704 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He Assignee: Jian He Attachments: YARN-2704.1.patch, YARN-2704.2.patch In secure mode, YARN requires the hdfs-delegation token to do localization and log aggregation on behalf of the user. But the hdfs delegation token will eventually expire after max-token-life-time. So, localization and log aggregation will fail after the token expires. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2183) Cleaner service for cache manager
[ https://issues.apache.org/jira/browse/YARN-2183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-2183: --- Attachment: on-demand-cleaner.patch Here is a patch that applies on top of v6 and relies on CleanerTask#run to ensure multiple cleaner tasks don't run concurrently. [~sjlee0], can you check if this makes any sense? Cleaner service for cache manager - Key: YARN-2183 URL: https://issues.apache.org/jira/browse/YARN-2183 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Attachments: YARN-2183-trunk-v1.patch, YARN-2183-trunk-v2.patch, YARN-2183-trunk-v3.patch, YARN-2183-trunk-v4.patch, YARN-2183-trunk-v5.patch, YARN-2183-trunk-v6.patch, on-demand-cleaner.patch Implement the cleaner service for the cache manager along with metrics for the service. This service is responsible for cleaning up old resource references in the manager and removing stale entries from the cache. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2690) Make ReservationSystem and its dependent classes independent of Scheduler type
[ https://issues.apache.org/jira/browse/YARN-2690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183335#comment-14183335 ] Subru Krishnan commented on YARN-2690: -- Thanks [~adhoot] for addressing my feedback. I am fine with leaving the getter method. Make ReservationSystem and its dependent classes independent of Scheduler type Key: YARN-2690 URL: https://issues.apache.org/jira/browse/YARN-2690 Project: Hadoop YARN Issue Type: Sub-task Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-2690.001.patch, YARN-2690.002.patch, YARN-2690.002.patch, YARN-2690.003.patch A lot of common reservation classes depend on CapacityScheduler and specifically its configuration. This jira is to make them ready for other Schedulers by abstracting out the configuration. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2729) Support script based NodeLabelsProvider Interface in Distributed Node Label Configuration Setup
[ https://issues.apache.org/jira/browse/YARN-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183348#comment-14183348 ] Wangda Tan commented on YARN-2729: -- Hi [~Naganarasimha], After looking at the patch and the implementation of NodeHealthMonitorExecutor, I think it may not be proper to put a common class for them into yarn.utils. Reasons:
# We already have a ShellCommandExecutor, which is enough to run a general script, set a timeout, fetch output, etc.
# The purposes of NodeHealthScriptRunner and NodeLabelScriptRunner (the name in my mind :)) are different: NodeHealthScriptRunner needs to do a lot of exception checks because it needs to tell the RM what actually happened, but NodeLabelScriptRunner doesn't -- it just produces a list of successfully fetched labels or nothing; if the script fails, logging the errors should be enough.
So I suggest doing this like NodeHealthScriptRunner and removing what we don't need. It should be able to:
# Detect whether the script failed, timed out, or threw an exception, and log that properly.
# Upon successfully executing the script, parse the output for labels. I think we can define that lines starting with "NODE_LABELS:" are node labels, split by ",". Like:
NODE_LABELS:java, windows\n ...(other messages) NODE_LABELS:gpu,x86
The result should be {java, windows, gpu, x86}.
And you need to make checkAndThrowLabelName in CommonsNodeLabelsManager public, and check that the labels are valid before sending them to the RM. Does this make sense to you?
And in addition, NM_SCRIPT_LABELS_PROVIDER_PREFIX -> NM_NODE_LABELS_SCRIPT_PROVIDER_PREFIX is clearer to me, similar to the others. Thanks, Wangda Support script based NodeLabelsProvider Interface in Distributed Node Label Configuration Setup --- Key: YARN-2729 URL: https://issues.apache.org/jira/browse/YARN-2729 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Naganarasimha G R Assignee: Naganarasimha G R Attachments: YARN-2729.20141023-1.patch, YARN-2729.20141024-1.patch Support script based NodeLabelsProvider Interface in Distributed Node Label Configuration Setup. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
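The output convention proposed in the comment above (lines starting with "NODE_LABELS:" carry comma-separated labels; all other script output is ignored) could be parsed roughly as in this sketch. It is an illustration of the proposed convention, not the actual patch; class and method names are made up.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class NodeLabelScriptOutput {
    static final String PREFIX = "NODE_LABELS:";

    // Collect labels from every output line that starts with the prefix;
    // other lines are ignored, and labels are trimmed and de-duplicated.
    static Set<String> parseLabels(String scriptOutput) {
        Set<String> labels = new LinkedHashSet<>();
        for (String line : scriptOutput.split("\n")) {
            if (line.startsWith(PREFIX)) {
                for (String label : line.substring(PREFIX.length()).split(",")) {
                    String trimmed = label.trim();
                    if (!trimmed.isEmpty()) {
                        labels.add(trimmed);
                    }
                }
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        String out = "NODE_LABELS:java, windows\n"
            + "some other message\n"
            + "NODE_LABELS:gpu,x86\n";
        System.out.println(parseLabels(out));  // [java, windows, gpu, x86]
    }
}
```

After parsing, each label would still need to be validated (e.g. via a public checkAndThrowLabelName, as the comment suggests) before being sent to the RM.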
[jira] [Commented] (YARN-2738) Add FairReservationSystem for FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183353#comment-14183353 ] Subru Krishnan commented on YARN-2738: -- Thanks [~adhoot] for working on this. I went through the patch and structurally it looks sound. Since I am not very familiar with FS, I would suggest [~kasha] take a look at it. You need to plug in _FairReservationSystem_ in _AbstractReservationSystem::getDefaultReservationSystem_. Other than that, just a minor comment: there seem to be a couple of spurious whitespace-only diffs in _AllocationFileLoaderService::loadQueue_. Add FairReservationSystem for FairScheduler --- Key: YARN-2738 URL: https://issues.apache.org/jira/browse/YARN-2738 Project: Hadoop YARN Issue Type: Sub-task Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-2738.001.patch Need to create a FairReservationSystem that will implement ReservationSystem for FairScheduler -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2703) Add logUploadedTime into LogValue for better display
[ https://issues.apache.org/jira/browse/YARN-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183368#comment-14183368 ] Hadoop QA commented on YARN-2703: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12676964/YARN-2703.4.patch against trunk revision 86ac0d4. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: org.apache.hadoop.yarn.client.TestResourceTrackerOnHA {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5547//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5547//console This message is automatically generated. 
Add logUploadedTime into LogValue for better display Key: YARN-2703 URL: https://issues.apache.org/jira/browse/YARN-2703 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-2703.1.patch, YARN-2703.2.patch, YARN-2703.3.patch, YARN-2703.4.patch
[jira] [Commented] (YARN-2703) Add logUploadedTime into LogValue for better display
[ https://issues.apache.org/jira/browse/YARN-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183379#comment-14183379 ] Xuan Gong commented on YARN-2703: - This testcase failure is tracked by https://issues.apache.org/jira/browse/YARN-2398 Add logUploadedTime into LogValue for better display Key: YARN-2703 URL: https://issues.apache.org/jira/browse/YARN-2703 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Attachments: YARN-2703.1.patch, YARN-2703.2.patch, YARN-2703.3.patch, YARN-2703.4.patch
[jira] [Commented] (YARN-2505) Support get/add/remove/change labels in RM REST API
[ https://issues.apache.org/jira/browse/YARN-2505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183416#comment-14183416 ] Wangda Tan commented on YARN-2505: -- Hi [~cwelch], Thanks for the patch, it generally looks good to me. Some minor comments:
1) ApplicationSubmissionContextInfo: According to the names in ApplicationSubmissionContext:
appLabelExpression -> appNodeLabelExpression
amContainerLabelExpression -> amContainerNodeLabelExpression
And also the names of the getters/setters.
2) RMWebServices:
2.1 It's better to save the reference to RMNodeLabelsManager instead of getting it from RMContext every time.
2.2 Get methods like getClusterNodeLabels don't need to throw AuthorizationException.
2.3 I would suggest adding the user name and the method being used when callUGI is null or checkAccess returns false, like "User=john not authorized for action=removeFromClusterNodeLabels", for easier debugging.
At first I thought it was wrong to put a colon (:) in the URL, but I found it should be safe to do that, according to http://stackoverflow.com/questions/2053132/is-a-colon-safe-for-friendly-url-use. So that shouldn't be a problem.
Thanks, Wangda
Support get/add/remove/change labels in RM REST API --- Key: YARN-2505 URL: https://issues.apache.org/jira/browse/YARN-2505 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Craig Welch Attachments: YARN-2505.1.patch, YARN-2505.3.patch, YARN-2505.4.patch, YARN-2505.5.patch, YARN-2505.6.patch, YARN-2505.7.patch, YARN-2505.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2734) If a sub-folder is encountered by log aggregator it results in invalid aggregated file
[ https://issues.apache.org/jira/browse/YARN-2734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-2734: Attachment: YARN-2734.1.patch If a sub-folder is encountered by log aggregator it results in invalid aggregated file -- Key: YARN-2734 URL: https://issues.apache.org/jira/browse/YARN-2734 Project: Hadoop YARN Issue Type: Bug Components: log-aggregation Affects Versions: 2.5.1 Reporter: Sumit Mohanty Assignee: Xuan Gong Fix For: 2.6.0 Attachments: YARN-2734.1.patch See YARN-2724 for some more context on how the error surfaces during the yarn logs call. If the aggregator encounters a sub-folder today, it results in the following error when reading the logs: {noformat} Container: container_1413512973198_0019_01_02 on c6401.ambari.apache.org_45454 LogType: cmd_data LogLength: 4096 Log Contents: Error aggregating log file. Log file : /hadoop/yarn/log/application_1413512973198_0019/container_1413512973198_0019_01_02/cmd_data/hadoop/yarn/log/application_1413512973198_0019/container_1413512973198_0019_01_02/cmd_data (Is a directory) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2740) RM AdminService should prevent admin change labels on nodes when distributed node label configuration enabled
Wangda Tan created YARN-2740: Summary: RM AdminService should prevent admin change labels on nodes when distributed node label configuration enabled Key: YARN-2740 URL: https://issues.apache.org/jira/browse/YARN-2740 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan According to YARN-2495, labels of nodes will be specified when NMs heartbeat. We shouldn't allow admins to modify labels on nodes when distributed node label configuration is enabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2740) RM AdminService should prevent admin change labels on nodes when distributed node label configuration enabled
[ https://issues.apache.org/jira/browse/YARN-2740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183430#comment-14183430 ] Wangda Tan commented on YARN-2740: -- This issue depends on the configuration option defined in YARN-2495. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2740) RM NodeLabelsManager should prevent admin change labels on nodes when distributed node label configuration enabled
[ https://issues.apache.org/jira/browse/YARN-2740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2740: - Summary: RM NodeLabelsManager should prevent admin change labels on nodes when distributed node label configuration enabled (was: RM AdminService should prevent admin change labels on nodes when distributed node label configuration enabled) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2740) RM AdminService should prevent admin change labels on nodes when distributed node label configuration enabled
[ https://issues.apache.org/jira/browse/YARN-2740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2740: - Summary: RM AdminService should prevent admin change labels on nodes when distributed node label configuration enabled (was: RM NodeLabelsManager should prevent admin change labels on nodes when distributed node label configuration enabled) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2713) Broken RM Home link in NM Web UI when RM HA is enabled
[ https://issues.apache.org/jira/browse/YARN-2713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183470#comment-14183470 ] Xuan Gong commented on YARN-2713: - +1 LGTM Broken RM Home link in NM Web UI when RM HA is enabled Key: YARN-2713 URL: https://issues.apache.org/jira/browse/YARN-2713 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.5.1 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: yarn-2713-1.patch When RM HA is enabled, the 'RM Home' link in the NM WebUI is broken. It points to the NM-host:RM-port instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2734) If a sub-folder is encountered by log aggregator it results in invalid aggregated file
[ https://issues.apache.org/jira/browse/YARN-2734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183484#comment-14183484 ] Hadoop QA commented on YARN-2734: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12676996/YARN-2734.1.patch against trunk revision a52eb4b. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5548//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5548//console This message is automatically generated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2183) Cleaner service for cache manager
[ https://issues.apache.org/jira/browse/YARN-2183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183486#comment-14183486 ] Sangjin Lee commented on YARN-2183: --- Yes, I would arrive at essentially the same result if we removed this race condition check. This would still ensure that only one cleaner task runs at a given time (i.e. serial execution). What it does not prevent, however, is a *back-to-back* execution if an on-demand cleaner run is submitted close to a scheduled cleaner run. The key here is that we're using a *single-threaded* scheduled executor. Here is the sequence with your patch:
# a scheduled run gets under way and sets cleanerTaskIsRunning to true
# an on-demand run gets submitted, but it will not run until the scheduled run has finished, because it runs on a single-threaded scheduled executor
# the scheduled run finishes and flips cleanerTaskIsRunning to false
# the on-demand run gets under way, finds cleanerTaskIsRunning is false, and runs again (i.e. a back-to-back run)
With v.6:
# a scheduled run gets under way and sets cleanerTaskIsRunning to true
# an on-demand run is requested but does *not* get scheduled, because we check whether a run is already in progress before submitting it (i.e. no back-to-back run)
# the scheduled run finishes and flips cleanerTaskIsRunning to false
Another way of accomplishing this is to put the on-demand cleaner task on a different executor, or on a different thread within the same scheduled executor (e.g. giving the scheduled executor 2 threads instead of 1). Then we could use the simpler code, as the check can be made concurrently and it will still do the right thing. What do you think?
Cleaner service for cache manager - Key: YARN-2183 URL: https://issues.apache.org/jira/browse/YARN-2183 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Attachments: YARN-2183-trunk-v1.patch, YARN-2183-trunk-v2.patch, YARN-2183-trunk-v3.patch, YARN-2183-trunk-v4.patch, YARN-2183-trunk-v5.patch, YARN-2183-trunk-v6.patch, on-demand-cleaner.patch Implement the cleaner service for the cache manager along with metrics for the service. This service is responsible for cleaning up old resource references in the manager and removing stale entries from the cache. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
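The two sequences above can be sketched in Java. This is a hypothetical illustration of the v.6 behavior only (class and member names are invented, not the actual YARN-2183 code): the in-progress flag is checked *before* an on-demand task is submitted, so a request that arrives while another run is in progress is rejected instead of being queued behind it as a back-to-back run.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the v.6 approach described above.
public class CleanerSketch {
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();
  private final AtomicBoolean cleanerTaskIsRunning = new AtomicBoolean(false);
  private final AtomicInteger completed = new AtomicInteger();

  private void runCleanerTask() {
    try {
      // ... scan the cache and clean stale entries (placeholder) ...
      completed.incrementAndGet();
    } finally {
      cleanerTaskIsRunning.set(false);
    }
  }

  /** Periodic runs: the flag guards against overlap with on-demand runs. */
  public void start(long periodSeconds) {
    scheduler.scheduleAtFixedRate(() -> {
      if (cleanerTaskIsRunning.compareAndSet(false, true)) {
        runCleanerTask();
      }
    }, periodSeconds, periodSeconds, TimeUnit.SECONDS);
  }

  /** Returns true only if no run was in progress when requested. */
  public boolean runCleanerTaskOnDemand() {
    // Check BEFORE submitting: if a run is already in progress the
    // request is dropped, avoiding the back-to-back execution.
    if (!cleanerTaskIsRunning.compareAndSet(false, true)) {
      return false;
    }
    scheduler.execute(this::runCleanerTask);
    return true;
  }

  public int completedRuns() {
    return completed.get();
  }

  public void stop() {
    scheduler.shutdown();
    try {
      scheduler.awaitTermination(5, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }
}
```

The alternative mentioned at the end of the comment (2 threads in the executor) would let the check and a running task proceed concurrently, at the cost of allowing two cleaner tasks to overlap unless the same flag is still consulted.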
[jira] [Commented] (YARN-2703) Add logUploadedTime into LogValue for better display
[ https://issues.apache.org/jira/browse/YARN-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183489#comment-14183489 ] Zhijie Shen commented on YARN-2703: --- Will commit the patch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2722) Disable SSLv3 (POODLEbleed vulnerability) in YARN shuffle
[ https://issues.apache.org/jira/browse/YARN-2722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183505#comment-14183505 ] Wing Yew Poon commented on YARN-2722: - Instead of having a hardcoded {{String[] enabledProtocols}}, would it be possible to read the enabled protocols from configuration, and supply a safe default value in the configuration file? Disable SSLv3 (POODLEbleed vulnerability) in YARN shuffle - Key: YARN-2722 URL: https://issues.apache.org/jira/browse/YARN-2722 Project: Hadoop YARN Issue Type: Bug Reporter: Wei Yan Assignee: Wei Yan Attachments: YARN-2722-1.patch, YARN-2722-2.patch We should disable SSLv3 in HttpFS to protect against the POODLE vulnerability. See [CVE-2014-3566|http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2014-3566] We have {{context = SSLContext.getInstance(TLS);}} in SSLFactory, but when I checked, I could still connect with SSLv3. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
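A minimal sketch of this suggestion, assuming a hypothetical property name (the real Hadoop configuration key and the SSLFactory wiring may differ): the protocol list is read from configuration with a safe default that omits SSLv3, and then applied to the engine via {{setEnabledProtocols}}.

```java
import java.util.Properties;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;

// Sketch only: read enabled TLS protocols from configuration instead
// of hardcoding them. The property name below is an assumption, not
// the actual Hadoop key.
public class ShuffleSslProtocols {
  static final String PROTOCOLS_KEY = "mapreduce.shuffle.ssl.enabled-protocols";
  static final String DEFAULT_PROTOCOLS = "TLSv1,TLSv1.1,TLSv1.2";

  static String[] getEnabledProtocols(Properties conf) {
    // Fall back to a safe default that excludes SSLv3.
    return conf.getProperty(PROTOCOLS_KEY, DEFAULT_PROTOCOLS).split(",");
  }

  static SSLEngine createEngine(Properties conf) throws Exception {
    SSLContext context = SSLContext.getInstance("TLS");
    context.init(null, null, null);
    SSLEngine engine = context.createSSLEngine();
    // Restrict the engine to the configured protocols; since SSLv3 is
    // not in the default list, POODLE-style downgrades are refused.
    engine.setEnabledProtocols(getEnabledProtocols(conf));
    return engine;
  }
}
```

An operator could then tighten the list further (e.g. TLSv1.2 only) purely through configuration, with no code change.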
[jira] [Commented] (YARN-2703) Add logUploadedTime into LogValue for better display
[ https://issues.apache.org/jira/browse/YARN-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183534#comment-14183534 ] Hudson commented on YARN-2703: -- FAILURE: Integrated in Hadoop-trunk-Commit #6339 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6339/]) YARN-2703. Added logUploadedTime into LogValue for better display. Contributed by Xuan Gong. (zjshen: rev f81dc3f995579c1b94b11d60e9fc6da56c8a9496) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/AggregatedLogFormat.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/logaggregation/TestAggregatedLogFormat.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestLogsCLI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/log/AggregatedLogsBlock.java Add logUploadedTime into LogValue for better display Key: YARN-2703 URL: https://issues.apache.org/jira/browse/YARN-2703 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Xuan Gong Assignee: Xuan Gong Fix For: 2.6.0 Attachments: YARN-2703.1.patch, YARN-2703.2.patch, YARN-2703.3.patch, YARN-2703.4.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2183) Cleaner service for cache manager
[ https://issues.apache.org/jira/browse/YARN-2183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183571#comment-14183571 ] Karthik Kambatla commented on YARN-2183: Ah, now I get it. I had missed that the executor already serializes the tasks. I think it is okay to use 2 threads in the executor; otherwise, we would have to differentiate between scheduled and on-demand cleaner tasks, which is probably overkill. We should probably add a comment to explain the choice of 2 threads, which is to reduce the chances of running cleaner tasks back-to-back. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2183) Cleaner service for cache manager
[ https://issues.apache.org/jira/browse/YARN-2183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183574#comment-14183574 ] Sangjin Lee commented on YARN-2183: --- OK, let me update the patch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2712) Adding tests about FSQueue and headroom of FairScheduler to TestWorkPreservingRMRestart
[ https://issues.apache.org/jira/browse/YARN-2712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183598#comment-14183598 ] Anubhav Dhoot commented on YARN-2712: - Minor comments below; LGTM otherwise. These lines can be removed, as rm start should take care of them: {noformat} FairScheduler scheduler = (FairScheduler) rm.getResourceScheduler(); scheduler.init(conf); scheduler.start(); scheduler.reinitialize(conf, rm.getRMContext()); {noformat} Is this relevant to the test case? {noformat} if (schedulerClass.equals(FairScheduler.class)) { Assert.assertEquals( Resource.newInstance(8192, 8), ((FairScheduler)rm1.getResourceScheduler()) .getQueueManager().getRootQueue().getFairShare()); {noformat} Adding tests about FSQueue and headroom of FairScheduler to TestWorkPreservingRMRestart --- Key: YARN-2712 URL: https://issues.apache.org/jira/browse/YARN-2712 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: YARN-2712.1.patch TestWorkPreservingRMRestart#testSchedulerRecovery is partially missing test cases for FairScheduler. We should add them. {code} // Until YARN-1959 is resolved if (scheduler.getClass() != FairScheduler.class) { assertEquals(availableResources, schedulerAttempt.getHeadroom()); } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2713) RM Home link in NM should point to one of the RMs in an HA setup
[ https://issues.apache.org/jira/browse/YARN-2713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-2713: --- Summary: RM Home link in NM should point to one of the RMs in an HA setup (was: Broken RM Home link in NM Web UI when RM HA is enabled) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2722) Disable SSLv3 (POODLEbleed vulnerability) in YARN shuffle
[ https://issues.apache.org/jira/browse/YARN-2722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan updated YARN-2722: -- Attachment: YARN-2722-3.patch Thanks, [~wypoon]. Updated the patch to make enabledProtocols configurable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2740) RM AdminService should prevent admin change labels on nodes when distributed node label configuration enabled
[ https://issues.apache.org/jira/browse/YARN-2740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2740: - Attachment: YARN-2740-20141024-1.patch Attached a patch for this. Reading the distributed-configuration-enabled option from the conf is not implemented yet; I will add that part after YARN-2495 is committed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2713) RM Home link in NM should point to one of the RMs in an HA setup
[ https://issues.apache.org/jira/browse/YARN-2713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183613#comment-14183613 ] Hudson commented on YARN-2713: -- FAILURE: Integrated in Hadoop-trunk-Commit #6341 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6341/]) YARN-2713. RM Home link in NM should point to one of the RMs in an HA setup. (kasha) (kasha: rev 683897fd028dcc2185383f73b52d15245a69e0cb) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/util/WebAppUtils.java * hadoop-yarn-project/CHANGES.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2701) Potential race condition in startLocalizer when using LinuxContainerExecutor
[ https://issues.apache.org/jira/browse/YARN-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183638#comment-14183638 ] Allen Wittenauer commented on YARN-2701: I'm hitting an error: {code} container-executor.c:458:12: warning: implicit declaration of function 'check_dir' is invalid in C99 [-Wimplicit-function-declaration] [exec] return check_dir(path, sb.st_mode, perm, 1); {code} I'll see if I can track down where this is coming from. Potential race condition in startLocalizer when using LinuxContainerExecutor -- Key: YARN-2701 URL: https://issues.apache.org/jira/browse/YARN-2701 Project: Hadoop YARN Issue Type: Bug Reporter: Xuan Gong Assignee: Xuan Gong Priority: Blocker Fix For: 2.6.0 Attachments: YARN-2701.1.patch, YARN-2701.2.patch, YARN-2701.3.patch, YARN-2701.4.patch, YARN-2701.5.patch, YARN-2701.6.patch, YARN-2701.addendum.1.patch, YARN-2701.addendum.2.patch, YARN-2701.addendum.3.patch When LinuxContainerExecutor performs startLocalizer, we use the native code in container-executor.c: {code} if (stat(npath, sb) != 0) { if (mkdir(npath, perm) != 0) { {code} We use this check-and-create method to create the appDir under /usercache, but if two containers try to do this at the same time, a race condition may occur. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
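The check-then-create pattern above is racy because another process can create the directory between the stat and the mkdir. For comparison only (the actual fix lives in the C code of container-executor.c, and the names below are invented), the standard way to avoid this race is to attempt the create unconditionally and treat "already exists" as success:

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

// Java illustration of the race-free pattern: no separate existence
// check, so two concurrent localizers cannot race between the check
// and the create. Returns true if this caller created the directory.
public class SafeMkdir {
  static boolean ensureDir(Path dir) throws IOException {
    try {
      Files.createDirectory(dir);
      return true; // we won the race and created it
    } catch (FileAlreadyExistsException e) {
      // Another container won the race; that is fine as long as the
      // existing entry really is a directory.
      if (!Files.isDirectory(dir)) {
        throw e;
      }
      return false; // it already existed
    }
  }
}
```

The C equivalent is to call mkdir() directly and accept an EEXIST errno (after verifying the existing path is a directory) instead of checking with stat() first.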
[jira] [Commented] (YARN-2701) Potential race condition in startLocalizer when using LinuxContainerExecutor
[ https://issues.apache.org/jira/browse/YARN-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183656#comment-14183656 ] zhihai xu commented on YARN-2701: - I think the warning appears because we should add the function prototypes at the top of the file or in a header file: int check_dir(char* npath, mode_t st_mode, mode_t desired, int finalComponent); int create_validate_dir(char* npath, mode_t perm, char* path, int finalComponent); That way, the prototypes can be found when these functions are called. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2701) Potential race condition in startLocalizer when using LinuxContainerExecutor
[ https://issues.apache.org/jira/browse/YARN-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183663#comment-14183663 ] Allen Wittenauer commented on YARN-2701: Sorry, I copied the wrong message, but yes, I think the warning I'm actually hitting has the same cause and effect. We should probably just move the functions up near the top rather than add prototypes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2701) Potential race condition in startLocalizer when using LinuxContainerExecutor
[ https://issues.apache.org/jira/browse/YARN-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183678#comment-14183678 ] Xuan Gong commented on YARN-2701: - So, how about moving the function declarations into the container-executor.h file? It looks like most functions in container-executor.c use this pattern. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2701) Potential race condition in startLocalizer when using LinuxContainerExecutor
[ https://issues.apache.org/jira/browse/YARN-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183681#comment-14183681 ] Allen Wittenauer commented on YARN-2701: OK, in a simple test I moved these functions near the top, and the compile errors went away. The new compiler shipped this week as part of the Yosemite launch--Apple LLVM version 6.0 (clang-600.0.54) (based on LLVM 3.5svn)--points out some other potential issues in the form of warnings, but we can deal with those separately. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2722) Disable SSLv3 (POODLEbleed vulnerability) in YARN shuffle
[ https://issues.apache.org/jira/browse/YARN-2722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183688#comment-14183688 ] Hadoop QA commented on YARN-2722: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12677027/YARN-2722-3.patch against trunk revision 683897f. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The following test timeouts occurred in hadoop-common-project/hadoop-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core: org.apache.hadoop.ha.TestZKFailoverController {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5549//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5549//console This message is automatically generated. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2722) Disable SSLv3 (POODLEbleed vulnerability) in YARN shuffle
[ https://issues.apache.org/jira/browse/YARN-2722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183691#comment-14183691 ] Wei Yan commented on YARN-2722: --- The test case failure is not related to this jira; the test passed locally. Disable SSLv3 (POODLEbleed vulnerability) in YARN shuffle - Key: YARN-2722 URL: https://issues.apache.org/jira/browse/YARN-2722 Project: Hadoop YARN Issue Type: Bug Reporter: Wei Yan Assignee: Wei Yan Attachments: YARN-2722-1.patch, YARN-2722-2.patch, YARN-2722-3.patch We should disable SSLv3 in HttpFS to protect against the POODLEbleed vulnerability. See [CVE-2014-3566 |http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2014-3566] We have {{context = SSLContext.getInstance("TLS");}} in SSLFactory, but when I checked, I could still connect with SSLv3. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
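As the YARN-2722 description notes, requesting a TLS context via {{SSLContext.getInstance("TLS")}} does not by itself prevent SSLv3 connections, because SSLv3 can remain among the engine's enabled protocols. A minimal, hypothetical Java sketch of the usual remedy (an illustration, not the attached patch) filters SSLv3 out of the enabled-protocol list:

```java
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;
import java.util.ArrayList;
import java.util.List;

public class DisableSslv3 {
    // Remove SSLv3 (and the SSLv2Hello pseudo-protocol) from an
    // enabled-protocol list; everything else is kept as-is.
    static String[] filterProtocols(String[] enabled) {
        List<String> kept = new ArrayList<>();
        for (String p : enabled) {
            if (!p.equals("SSLv3") && !p.equals("SSLv2Hello")) {
                kept.add(p);
            }
        }
        return kept.toArray(new String[0]);
    }

    public static void main(String[] args) throws Exception {
        // Getting a "TLS" context alone is not enough: the engine it
        // creates may still list SSLv3 among its enabled protocols.
        SSLContext ctx = SSLContext.getInstance("TLS");
        ctx.init(null, null, null);
        SSLEngine engine = ctx.createSSLEngine();
        engine.setEnabledProtocols(filterProtocols(engine.getEnabledProtocols()));
        for (String p : engine.getEnabledProtocols()) {
            assert !p.equals("SSLv3");
        }
        System.out.println("SSLv3 disabled");
    }
}
```

The class and method names here are illustrative; the real patch applies the equivalent filtering inside Hadoop's SSLFactory.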
[jira] [Updated] (YARN-2505) Support get/add/remove/change labels in RM REST API
[ https://issues.apache.org/jira/browse/YARN-2505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-2505: -- Attachment: YARN-2505.8.patch Attached is a patch implementing all of [~leftnoteasy]'s recommendations except one: 2.1 "It's better to save the reference to RMNodeLabelsManager instead of getting it from RMContext every time." I actually don't think so - doing that requires adding a bunch of code (a member field, null checks, etc.) for an operation whose cost would be completely lost in the overall cost of the action (IO, etc). More importantly, it would tie the lifetime of the node label manager to the lifetime of the resource manager web app, which might be OK at the moment, but it could change sometime down the line and then there would be a confusing bug to figure out and resolve. Support get/add/remove/change labels in RM REST API --- Key: YARN-2505 URL: https://issues.apache.org/jira/browse/YARN-2505 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Craig Welch Attachments: YARN-2505.1.patch, YARN-2505.3.patch, YARN-2505.4.patch, YARN-2505.5.patch, YARN-2505.6.patch, YARN-2505.7.patch, YARN-2505.8.patch, YARN-2505.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2505) Support get/add/remove/change labels in RM REST API
[ https://issues.apache.org/jira/browse/YARN-2505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-2505: -- Attachment: YARN-2505.9.patch And this time, including new files :-) Support get/add/remove/change labels in RM REST API --- Key: YARN-2505 URL: https://issues.apache.org/jira/browse/YARN-2505 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Craig Welch Attachments: YARN-2505.1.patch, YARN-2505.3.patch, YARN-2505.4.patch, YARN-2505.5.patch, YARN-2505.6.patch, YARN-2505.7.patch, YARN-2505.8.patch, YARN-2505.9.patch, YARN-2505.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2712) Adding tests about FSQueue and headroom of FairScheduler to TestWorkPreservingRMRestart
[ https://issues.apache.org/jira/browse/YARN-2712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-2712: - Attachment: YARN-2712.2.patch Thanks for your review, Karthik. All the lines you pointed out can be removed. Updated. Adding tests about FSQueue and headroom of FairScheduler to TestWorkPreservingRMRestart --- Key: YARN-2712 URL: https://issues.apache.org/jira/browse/YARN-2712 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: YARN-2712.1.patch, YARN-2712.2.patch TestWorkPreservingRMRestart#testSchedulerRecovery is partially missing test cases for FairScheduler. We should add them. {code} // Until YARN-1959 is resolved if (scheduler.getClass() != FairScheduler.class) { assertEquals(availableResources, schedulerAttempt.getHeadroom()); } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2591) AHSWebServices should return FORBIDDEN(403) if the request user doesn't have access to the history data
[ https://issues.apache.org/jira/browse/YARN-2591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-2591: -- Attachment: YARN-2591.2.patch Throw authorization exception instead. And change the test cases accordingly to verify 403 code. AHSWebServices should return FORBIDDEN(403) if the request user doesn't have access to the history data --- Key: YARN-2591 URL: https://issues.apache.org/jira/browse/YARN-2591 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: 3.0.0, 2.6.0 Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-2591.1.patch, YARN-2591.2.patch AHSWebServices should return FORBIDDEN(403) if the request user doesn't have access to the history data. Currently, it is going to return INTERNAL_SERVER_ERROR(500). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2741) Windows: Node manager cannot serve up log files via the web user interface when yarn.nodemanager.log-dirs to any drive letter other than C: (or, the drive that nodemanager
Craig Welch created YARN-2741: - Summary: Windows: Node manager cannot serve up log files via the web user interface when yarn.nodemanager.log-dirs to any drive letter other than C: (or, the drive that nodemanager is running on) Key: YARN-2741 URL: https://issues.apache.org/jira/browse/YARN-2741 Project: Hadoop YARN Issue Type: Bug Reporter: Craig Welch PROBLEM: User is getting No Logs available for Container Container_number when setting the yarn.nodemanager.log-dirs to any drive letter other than C: STEPS TO REPRODUCE: On Windows 1) Run NodeManager on C: 2) Create two local drive partitions D: and E: 3) Put yarn.nodemanager.log-dirs = D:\nmlogs or E:\nmlogs 4) Run a MR job that will last at least 5 minutes 5) While the job is in flight, log into the Yarn web ui , resource_manager_server:8088/cluster 6) Click on the application_idnumber 7) Click on the logs link, you will get No Logs available for Container Container_number ACTUAL BEHAVIOR: Getting an error message when viewing the container logs EXPECTED BEHAVIOR: Able to use different drive letters in yarn.nodemanager.log-dirs and not get error NOTE: If we use the drive letter C: in yarn.nodemanager.log-dirs, we are able to see the container logs while the MR job is in flight. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2741) Windows: Node manager cannot serve up log files via the web user interface when yarn.nodemanager.log-dirs to any drive letter other than C: (or, the drive that nodemanager
[ https://issues.apache.org/jira/browse/YARN-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-2741: -- Component/s: nodemanager Environment: Windows Affects Version/s: 2.6.0 Assignee: Craig Welch Windows: Node manager cannot serve up log files via the web user interface when yarn.nodemanager.log-dirs to any drive letter other than C: (or, the drive that nodemanager is running on) -- Key: YARN-2741 URL: https://issues.apache.org/jira/browse/YARN-2741 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Environment: Windows Reporter: Craig Welch Assignee: Craig Welch PROBLEM: User is getting No Logs available for Container Container_number when setting the yarn.nodemanager.log-dirs to any drive letter other than C: STEPS TO REPRODUCE: On Windows 1) Run NodeManager on C: 2) Create two local drive partitions D: and E: 3) Put yarn.nodemanager.log-dirs = D:\nmlogs or E:\nmlogs 4) Run a MR job that will last at least 5 minutes 5) While the job is in flight, log into the Yarn web ui , resource_manager_server:8088/cluster 6) Click on the application_idnumber 7) Click on the logs link, you will get No Logs available for Container Container_number ACTUAL BEHAVIOR: Getting an error message when viewing the container logs EXPECTED BEHAVIOR: Able to use different drive letters in yarn.nodemanager.log-dirs and not get error NOTE: If we use the drive letter C: in yarn.nodemanager.log-dirs, we are able to see the container logs while the MR job is in flight. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2505) Support get/add/remove/change labels in RM REST API
[ https://issues.apache.org/jira/browse/YARN-2505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183731#comment-14183731 ] Wangda Tan commented on YARN-2505: -- [~cwelch], Thanks for the update. I still prefer to keep a reference to RMNodeLabelsManager in RMWebService rather than fetching it every time, even though it has little impact on performance, because the reference to NodeLabelsManager is not supposed to change during the lifetime of the RM. The same holds beyond this one case: like ApplicationMasterService or YarnScheduler, none of the services obtained from RMContext should change either. We shouldn't add defensive coding for a situation that is not supposed to exist. Wangda Support get/add/remove/change labels in RM REST API --- Key: YARN-2505 URL: https://issues.apache.org/jira/browse/YARN-2505 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Craig Welch Attachments: YARN-2505.1.patch, YARN-2505.3.patch, YARN-2505.4.patch, YARN-2505.5.patch, YARN-2505.6.patch, YARN-2505.7.patch, YARN-2505.8.patch, YARN-2505.9.patch, YARN-2505.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2741) Windows: Node manager cannot serve up log files via the web user interface when yarn.nodemanager.log-dirs to any drive letter other than C: (or, the drive that nodemanager
[ https://issues.apache.org/jira/browse/YARN-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-2741: -- Attachment: YARN-2741.1.patch It's a known issue that new java.net.URI(String) loses the drive specification on Windows. If you instead use File.toURI() this does not happen. This patch switches to that approach and is able to successfully serve up log files with a configuration like this {code} <property> <name>yarn.nodemanager.log-dirs</name> <value>F:/nmlogs</value> </property> {code} Windows: Node manager cannot serve up log files via the web user interface when yarn.nodemanager.log-dirs to any drive letter other than C: (or, the drive that nodemanager is running on) -- Key: YARN-2741 URL: https://issues.apache.org/jira/browse/YARN-2741 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Environment: Windows Reporter: Craig Welch Assignee: Craig Welch Attachments: YARN-2741.1.patch PROBLEM: User is getting No Logs available for Container Container_number when setting the yarn.nodemanager.log-dirs to any drive letter other than C: STEPS TO REPRODUCE: On Windows 1) Run NodeManager on C: 2) Create two local drive partitions D: and E: 3) Put yarn.nodemanager.log-dirs = D:\nmlogs or E:\nmlogs 4) Run a MR job that will last at least 5 minutes 5) While the job is in flight, log into the Yarn web ui, resource_manager_server:8088/cluster 6) Click on the application_idnumber 7) Click on the logs link, you will get No Logs available for Container Container_number ACTUAL BEHAVIOR: Getting an error message when viewing the container logs EXPECTED BEHAVIOR: Able to use different drive letters in yarn.nodemanager.log-dirs and not get error NOTE: If we use the drive letter C: in yarn.nodemanager.log-dirs, we are able to see the container logs while the MR job is in flight. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
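The behavior behind the YARN-2741 fix can be shown with a small standalone Java sketch (an illustration, not the patch itself): {{new URI(String)}} parses a leading Windows drive letter such as {{F:}} as a URI *scheme*, so the drive is dropped from the path component, while {{File.toURI()}} produces a proper {{file:}} URI that keeps the whole path:

```java
import java.io.File;
import java.net.URI;

public class DriveLetterUri {
    public static void main(String[] args) throws Exception {
        // new URI(String) treats "F:" as a URI scheme, so the drive
        // letter never makes it into the path component.
        URI parsed = new URI("F:/nmlogs");
        System.out.println(parsed.getScheme()); // "F" -- drive letter misread as scheme
        System.out.println(parsed.getPath());   // "/nmlogs" -- drive letter gone

        // File.toURI() instead builds a file: URI from the full path
        // (on Windows this keeps the drive letter, e.g. file:/F:/nmlogs/).
        URI fromFile = new File("F:/nmlogs").toURI();
        System.out.println(fromFile);
    }
}
```

The exact {{File.toURI()}} output is platform-dependent (on non-Windows systems {{F:/nmlogs}} is just a relative path), but the URI-scheme misparse is the same everywhere.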
[jira] [Commented] (YARN-2505) Support get/add/remove/change labels in RM REST API
[ https://issues.apache.org/jira/browse/YARN-2505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183742#comment-14183742 ] Craig Welch commented on YARN-2505: --- [~leftnoteasy] I appreciate your reviewing this change and I understand your perspective on this particular aspect of it - but I think that caching the node label manager here is premature optimization that adds unnecessary complexity and future risk where there isn't a good reason to do so. You'll notice that other implementations in the service are taking the same approach and retrieving references as needed without caching. I'm planning to leave it as/is in this respect. Support get/add/remove/change labels in RM REST API --- Key: YARN-2505 URL: https://issues.apache.org/jira/browse/YARN-2505 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Craig Welch Attachments: YARN-2505.1.patch, YARN-2505.3.patch, YARN-2505.4.patch, YARN-2505.5.patch, YARN-2505.6.patch, YARN-2505.7.patch, YARN-2505.8.patch, YARN-2505.9.patch, YARN-2505.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2701) Potential race condition in startLocalizer when using LinuxContainerExecutor
[ https://issues.apache.org/jira/browse/YARN-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-2701: Attachment: YARN-2701.addendum.4.patch Move function declarations into container-executor.h. Potential race condition in startLocalizer when using LinuxContainerExecutor -- Key: YARN-2701 URL: https://issues.apache.org/jira/browse/YARN-2701 Project: Hadoop YARN Issue Type: Bug Reporter: Xuan Gong Assignee: Xuan Gong Priority: Blocker Fix For: 2.6.0 Attachments: YARN-2701.1.patch, YARN-2701.2.patch, YARN-2701.3.patch, YARN-2701.4.patch, YARN-2701.5.patch, YARN-2701.6.patch, YARN-2701.addendum.1.patch, YARN-2701.addendum.2.patch, YARN-2701.addendum.3.patch, YARN-2701.addendum.4.patch When LinuxContainerExecutor performs startLocalizer, it uses the native code in container-executor.c: {code} if (stat(npath, &sb) != 0) { if (mkdir(npath, perm) != 0) { {code} This check-then-create pattern is used to create the appDir under /usercache, but if two containers attempt it at the same time, a race condition can occur. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
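The YARN-2701 fix lives in the native container-executor code, but the race-free pattern it needs can be sketched in Java (a hypothetical illustration, not the patch): attempt the creation directly and tolerate "already exists", instead of checking with stat() first and creating afterwards:

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

public class SafeMkdir {
    // Race-free creation: try the mkdir directly and treat "already
    // exists" as success -- the Java analogue of calling mkdir() in C
    // and accepting errno == EEXIST, rather than the racy
    // stat()-then-mkdir() check-and-create pattern from the report.
    static void ensureDir(Path dir) throws IOException {
        try {
            Files.createDirectory(dir);
        } catch (FileAlreadyExistsException e) {
            // another container/thread won the race; nothing to do
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("nm").resolve("usercache");
        ensureDir(dir);
        ensureDir(dir); // second attempt must not fail
        System.out.println(Files.isDirectory(dir)); // true
    }
}
```

With check-then-create, two containers can both pass the stat() check before either calls mkdir(), and one of them then fails; with create-and-tolerate, the loser of the race simply observes EEXIST and proceeds.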
[jira] [Commented] (YARN-2505) Support get/add/remove/change labels in RM REST API
[ https://issues.apache.org/jira/browse/YARN-2505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183749#comment-14183749 ] Wangda Tan commented on YARN-2505: -- [~cwelch], I think this is not a big deal regarding both performance and code style, so I don't have a very strong opinion here. So generally, +1 for this patch. Thanks, Wangda Support get/add/remove/change labels in RM REST API --- Key: YARN-2505 URL: https://issues.apache.org/jira/browse/YARN-2505 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Craig Welch Attachments: YARN-2505.1.patch, YARN-2505.3.patch, YARN-2505.4.patch, YARN-2505.5.patch, YARN-2505.6.patch, YARN-2505.7.patch, YARN-2505.8.patch, YARN-2505.9.patch, YARN-2505.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2183) Cleaner service for cache manager
[ https://issues.apache.org/jira/browse/YARN-2183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-2183: -- Attachment: YARN-2183-trunk-v7.patch v.7 patch posted. I switched to using 2 threads for the executor service, and simplified the race condition handling. Also, switched from AtomicBoolean to Lock as it's conceptually clearer and behavior-wise identical. For the overall diff, see https://github.com/ctrezzo/hadoop/compare/ctrezzo:trunk...sharedcache-3-YARN-2183-cleaner For v.6 - v.7, see https://github.com/ctrezzo/hadoop/commit/0066e2e525395f25aa685cc99838ffe98750c400 Cleaner service for cache manager - Key: YARN-2183 URL: https://issues.apache.org/jira/browse/YARN-2183 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Attachments: YARN-2183-trunk-v1.patch, YARN-2183-trunk-v2.patch, YARN-2183-trunk-v3.patch, YARN-2183-trunk-v4.patch, YARN-2183-trunk-v5.patch, YARN-2183-trunk-v6.patch, YARN-2183-trunk-v7.patch, on-demand-cleaner.patch Implement the cleaner service for the cache manager along with metrics for the service. This service is responsible for cleaning up old resource references in the manager and removing stale entries from the cache. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
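The AtomicBoolean-to-Lock switch described in the YARN-2183 comment can be sketched with a hypothetical class (names are illustrative, not from the patch): {{tryLock()}} makes the "is a clean already running?" check and the acquisition one atomic step, just as {{compareAndSet}} did on the AtomicBoolean, but with conventional lock semantics:

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public class CleanerGate {
    private final Lock cleanerLock = new ReentrantLock();

    // Runs the task only if no other clean is in progress.
    // Returns false when the run was skipped because the lock was held.
    public boolean runCleanerTask(Runnable task) {
        if (!cleanerLock.tryLock()) {
            return false; // another clean is in progress; skip this run
        }
        try {
            task.run();
            return true;
        } finally {
            cleanerLock.unlock(); // always release, even if the task throws
        }
    }
}
```

The behavior is identical to guarding the task with {{AtomicBoolean.compareAndSet(false, true)}} and resetting in a finally block; the Lock version simply names the intent.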
[jira] [Commented] (YARN-2712) Adding tests about FSQueue and headroom of FairScheduler to TestWorkPreservingRMRestart
[ https://issues.apache.org/jira/browse/YARN-2712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183789#comment-14183789 ] Hadoop QA commented on YARN-2712: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12677049/YARN-2712.2.patch against trunk revision 683897f. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 12 warning messages. See https://builds.apache.org/job/PreCommit-YARN-Build/5551//artifact/patchprocess/diffJavadocWarnings.txt for details. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common: org.apache.hadoop.metrics2.impl.TestMetricsSystemImpl org.apache.hadoop.security.token.delegation.web.TestWebDelegationToken {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5551//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5551//console This message is automatically generated. 
Adding tests about FSQueue and headroom of FairScheduler to TestWorkPreservingRMRestart --- Key: YARN-2712 URL: https://issues.apache.org/jira/browse/YARN-2712 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Attachments: YARN-2712.1.patch, YARN-2712.2.patch TestWorkPreservingRMRestart#testSchedulerRecovery is partially missing test cases for FairScheduler. We should add them. {code} // Until YARN-1959 is resolved if (scheduler.getClass() != FairScheduler.class) { assertEquals(availableResources, schedulerAttempt.getHeadroom()); } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2591) AHSWebServices should return FORBIDDEN(403) if the request user doesn't have access to the history data
[ https://issues.apache.org/jira/browse/YARN-2591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183792#comment-14183792 ] Hadoop QA commented on YARN-2591: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12677050/YARN-2591.2.patch against trunk revision 683897f. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common: org.apache.hadoop.yarn.server.applicationhistoryservice.webapp.TestAHSWebServices The test build failed in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-common-project/hadoop-common {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5552//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5552//console This message is automatically generated. 
AHSWebServices should return FORBIDDEN(403) if the request user doesn't have access to the history data --- Key: YARN-2591 URL: https://issues.apache.org/jira/browse/YARN-2591 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: 3.0.0, 2.6.0 Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-2591.1.patch, YARN-2591.2.patch AHSWebServices should return FORBIDDEN(403) if the request user doesn't have access to the history data. Currently, it is going to return INTERNAL_SERVER_ERROR(500). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2505) Support get/add/remove/change labels in RM REST API
[ https://issues.apache.org/jira/browse/YARN-2505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183797#comment-14183797 ] Hadoop QA commented on YARN-2505: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12677048/YARN-2505.9.patch against trunk revision 683897f. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5550//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5550//console This message is automatically generated. Support get/add/remove/change labels in RM REST API --- Key: YARN-2505 URL: https://issues.apache.org/jira/browse/YARN-2505 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Craig Welch Attachments: YARN-2505.1.patch, YARN-2505.3.patch, YARN-2505.4.patch, YARN-2505.5.patch, YARN-2505.6.patch, YARN-2505.7.patch, YARN-2505.8.patch, YARN-2505.9.patch, YARN-2505.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2183) Cleaner service for cache manager
[ https://issues.apache.org/jira/browse/YARN-2183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183800#comment-14183800 ] Hadoop QA commented on YARN-2183: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12677057/YARN-2183-trunk-v7.patch against trunk revision 683897f. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:red}-1 javac{color}. The applied patch generated 1294 javac compiler warnings (more than the trunk's current 1271 warnings). {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The test build failed in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5554//testReport/ Javac warnings: https://builds.apache.org/job/PreCommit-YARN-Build/5554//artifact/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5554//console This message is automatically generated. 
Cleaner service for cache manager - Key: YARN-2183 URL: https://issues.apache.org/jira/browse/YARN-2183 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Attachments: YARN-2183-trunk-v1.patch, YARN-2183-trunk-v2.patch, YARN-2183-trunk-v3.patch, YARN-2183-trunk-v4.patch, YARN-2183-trunk-v5.patch, YARN-2183-trunk-v6.patch, YARN-2183-trunk-v7.patch, on-demand-cleaner.patch Implement the cleaner service for the cache manager along with metrics for the service. This service is responsible for cleaning up old resource references in the manager and removing stale entries from the cache. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2505) Support get/add/remove/change labels in RM REST API
[ https://issues.apache.org/jira/browse/YARN-2505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183799#comment-14183799 ] Hadoop QA commented on YARN-2505: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12677048/YARN-2505.9.patch against trunk revision 683897f. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:red}-1 javac{color}. The applied patch generated 1294 javac compiler warnings (more than the trunk's current 1271 warnings). {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5553//testReport/ Javac warnings: https://builds.apache.org/job/PreCommit-YARN-Build/5553//artifact/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5553//console This message is automatically generated. 
Support get/add/remove/change labels in RM REST API --- Key: YARN-2505 URL: https://issues.apache.org/jira/browse/YARN-2505 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Craig Welch Attachments: YARN-2505.1.patch, YARN-2505.3.patch, YARN-2505.4.patch, YARN-2505.5.patch, YARN-2505.6.patch, YARN-2505.7.patch, YARN-2505.8.patch, YARN-2505.9.patch, YARN-2505.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2701) Potential race condition in startLocalizer when using LinuxContainerExecutor
[ https://issues.apache.org/jira/browse/YARN-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183808#comment-14183808 ] Hadoop QA commented on YARN-2701: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12677055/YARN-2701.addendum.4.patch against trunk revision 683897f. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build///testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build///console This message is automatically generated. 
Potential race condition in startLocalizer when using LinuxContainerExecutor -- Key: YARN-2701 URL: https://issues.apache.org/jira/browse/YARN-2701 Project: Hadoop YARN Issue Type: Bug Reporter: Xuan Gong Assignee: Xuan Gong Priority: Blocker Fix For: 2.6.0 Attachments: YARN-2701.1.patch, YARN-2701.2.patch, YARN-2701.3.patch, YARN-2701.4.patch, YARN-2701.5.patch, YARN-2701.6.patch, YARN-2701.addendum.1.patch, YARN-2701.addendum.2.patch, YARN-2701.addendum.3.patch, YARN-2701.addendum.4.patch When LinuxContainerExecutor performs startLocalizer, it uses the native code in container-executor.c: {code} if (stat(npath, &sb) != 0) { if (mkdir(npath, perm) != 0) { {code} This check-then-create pattern is used to create the appDir under /usercache, but if two containers attempt it at the same time, a race condition can occur. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2723) rmadmin -replaceLabelsOnNode does not correctly parse port
[ https://issues.apache.org/jira/browse/YARN-2723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183812#comment-14183812 ] Xuan Gong commented on YARN-2723: - +1 LGTM. Will commit. rmadmin -replaceLabelsOnNode does not correctly parse port -- Key: YARN-2723 URL: https://issues.apache.org/jira/browse/YARN-2723 Project: Hadoop YARN Issue Type: Sub-task Components: client Reporter: Phil D'Amore Assignee: Naganarasimha G R Attachments: YARN-2723.20141023.1.patch, yarn-2723.20141023.2.patch There is an off-by-one issue in RMAdminCLI.java (line 457): port = Integer.valueOf(nodeIdStr.substring(nodeIdStr.indexOf(":"))); should probably be: port = Integer.valueOf(nodeIdStr.substring(nodeIdStr.indexOf(":")+1)); Currently attempting to add a label to a node with a port specified looks like this: [yarn@ip-10-0-0-66 ~]$ yarn rmadmin -replaceLabelsOnNode "node.example.com:45454,test-label" replaceLabelsOnNode: For input string: ":45454" Usage: yarn rmadmin [-replaceLabelsOnNode [node1:port,label1,label2 node2:port,label1,label2]] It appears to be trying to parse the ':' as part of the integer because the substring index is off. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
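The YARN-2723 off-by-one can be demonstrated with a small standalone sketch (hypothetical class and method names): without the {{+1}}, the substring starts at the colon itself and {{Integer.valueOf}} fails on {{":45454"}}, which is exactly the error message in the report:

```java
public class PortParse {
    // The +1 skips past the ':' delimiter; without it the substring
    // is ":45454" and Integer.valueOf throws NumberFormatException.
    static int parsePort(String nodeIdStr) {
        return Integer.valueOf(nodeIdStr.substring(nodeIdStr.indexOf(":") + 1));
    }

    public static void main(String[] args) {
        System.out.println(parsePort("node.example.com:45454")); // prints 45454
    }
}
```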