[jira] [Commented] (YARN-3381) A typographical error in InvalidStateTransitonException
[ https://issues.apache.org/jira/browse/YARN-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14624370#comment-14624370 ] Hadoop QA commented on YARN-3381: -
\\ \\
| (x) *{color:red}-1 overall{color}* |
\\ \\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 22m 4s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| {color:green}+1{color} | javac | 8m 16s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 10m 8s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 21s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 3m 4s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 30s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 6m 42s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | mapreduce tests | 9m 8s | Tests failed in hadoop-mapreduce-client-app. |
| {color:red}-1{color} | yarn tests | 6m 53s | Tests failed in hadoop-yarn-client. |
| {color:green}+1{color} | yarn tests | 1m 59s | Tests passed in hadoop-yarn-common. |
| {color:red}-1{color} | yarn tests | 6m 4s | Tests failed in hadoop-yarn-server-nodemanager. |
| {color:red}-1{color} | yarn tests | 51m 22s | Tests failed in hadoop-yarn-server-resourcemanager. |
| | | | 128m 10s | |
\\ \\
|| Reason || Tests ||
| Failed unit tests | hadoop.mapreduce.v2.app.rm.TestRMContainerAllocator |
| | hadoop.yarn.client.api.impl.TestAMRMClientOnRMRestart |
| | hadoop.yarn.server.nodemanager.TestDeletionService |
| | hadoop.yarn.server.nodemanager.containermanager.container.TestContainer |
| | hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService |
| | hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService |
| | hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates |
| | hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions |
| | hadoop.yarn.server.resourcemanager.TestApplicationCleanup |
| | hadoop.yarn.server.resourcemanager.TestResourceTrackerService |
\\ \\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12744987/YARN-3381-011.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / e04faf8 |
| hadoop-mapreduce-client-app test log | https://builds.apache.org/job/PreCommit-YARN-Build/8520/artifact/patchprocess/testrun_hadoop-mapreduce-client-app.txt |
| hadoop-yarn-client test log | https://builds.apache.org/job/PreCommit-YARN-Build/8520/artifact/patchprocess/testrun_hadoop-yarn-client.txt |
| hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8520/artifact/patchprocess/testrun_hadoop-yarn-common.txt |
| hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8520/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt |
| hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8520/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8520/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8520/console |
This message was automatically generated.
A typographical error in InvalidStateTransitonException - Key: YARN-3381 URL: https://issues.apache.org/jira/browse/YARN-3381 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.6.0 Reporter: Xiaoshuang LU Assignee: Brahma Reddy Battula Labels: BB2015-05-TBR Attachments: YARN-3381-002.patch, YARN-3381-003.patch, YARN-3381-004-branch-2.patch, YARN-3381-004.patch, YARN-3381-005.patch, YARN-3381-006.patch, YARN-3381-007.patch, YARN-3381-008.patch, YARN-3381-010.patch,
[jira] [Updated] (YARN-3381) Fix typo InvalidStateTransitonException
[ https://issues.apache.org/jira/browse/YARN-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated YARN-3381: Labels: (was: BB2015-05-TBR) Fix typo InvalidStateTransitonException --- Key: YARN-3381 URL: https://issues.apache.org/jira/browse/YARN-3381 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.6.0 Reporter: Xiaoshuang LU Assignee: Brahma Reddy Battula Priority: Minor Fix For: 2.8.0 Attachments: YARN-3381-002.patch, YARN-3381-003.patch, YARN-3381-004-branch-2.patch, YARN-3381-004.patch, YARN-3381-005.patch, YARN-3381-006.patch, YARN-3381-007.patch, YARN-3381-008.patch, YARN-3381-010.patch, YARN-3381-011.patch, YARN-3381.patch Appears that InvalidStateTransitonException should be InvalidStateTransitionException. Transition was misspelled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2003) Support for Application priority : Changes in RM and Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1462#comment-1462 ] Rohith Sharma K S commented on YARN-2003: - All the above test failures are related to YARN-3916 Support for Application priority : Changes in RM and Capacity Scheduler --- Key: YARN-2003 URL: https://issues.apache.org/jira/browse/YARN-2003 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Sunil G Assignee: Sunil G Labels: BB2015-05-TBR Attachments: 0001-YARN-2003.patch, 00010-YARN-2003.patch, 0002-YARN-2003.patch, 0003-YARN-2003.patch, 0004-YARN-2003.patch, 0005-YARN-2003.patch, 0006-YARN-2003.patch, 0007-YARN-2003.patch, 0008-YARN-2003.patch, 0009-YARN-2003.patch, 0011-YARN-2003.patch, 0012-YARN-2003.patch, 0013-YARN-2003.patch, 0014-YARN-2003.patch, 0015-YARN-2003.patch, 0016-YARN-2003.patch, 0017-YARN-2003.patch, 0018-YARN-2003.patch, 0019-YARN-2003.patch, 0020-YARN-2003.patch AppAttemptAddedSchedulerEvent should be able to receive the Job Priority from Submission Context and store. Later this can be used by Scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
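The description above proposes carrying the priority from the ApplicationSubmissionContext into the scheduler via AppAttemptAddedSchedulerEvent. A rough sketch of that flow is below, with hypothetical class and field names; it is an illustration of the idea, not the YARN-2003 patch.
{code}
import java.util.HashMap;
import java.util.Map;

public class AppPrioritySketch {
  /** Hypothetical event carrying the priority taken from the application submission context. */
  static class AppAttemptAddedEvent {
    final String appAttemptId;
    final int priority;   // e.g. copied from ApplicationSubmissionContext#getPriority()

    AppAttemptAddedEvent(String appAttemptId, int priority) {
      this.appAttemptId = appAttemptId;
      this.priority = priority;
    }
  }

  /** Simplified scheduler: stores the priority so later scheduling decisions can use it. */
  static class PriorityAwareScheduler {
    private final Map<String, Integer> appPriorities = new HashMap<>();

    void handle(AppAttemptAddedEvent event) {
      appPriorities.put(event.appAttemptId, event.priority);
    }

    int getPriority(String appAttemptId) {
      return appPriorities.getOrDefault(appAttemptId, 0);  // default priority when none was set
    }
  }

  public static void main(String[] args) {
    PriorityAwareScheduler scheduler = new PriorityAwareScheduler();
    scheduler.handle(new AppAttemptAddedEvent("appattempt_1_0001_000001", 10));
    System.out.println("Priority seen by scheduler: " + scheduler.getPriority("appattempt_1_0001_000001"));
  }
}
{code}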
[jira] [Commented] (YARN-3045) [Event producers] Implement NM writing container lifecycle events to ATS
[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14624472#comment-14624472 ] Naganarasimha G R commented on YARN-3045: - Hi [~djp], sorry for the delayed response! A few points to discuss for your queries: bq. why we hook the track of container start event in ContainerManagerImpl, but for container finished event, we do it inside of ContainerImpl? We should try to keep NMTimelinePublisher get referenced in one place if no necessary for other places. This was done intentionally, to avoid resending timeline events during recovery. The same was happening in the RM's case (which is being handled in YARN-3127), hence to avoid duplicate events I have kept it there. If there is a better way to avoid this, I am open to it. I will take care of the other comments; some of them are due to forgetting to revert code changes made while testing ... [Event producers] Implement NM writing container lifecycle events to ATS Key: YARN-3045 URL: https://issues.apache.org/jira/browse/YARN-3045 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Naganarasimha G R Attachments: YARN-3045-YARN-2928.002.patch, YARN-3045-YARN-2928.003.patch, YARN-3045-YARN-2928.004.patch, YARN-3045-YARN-2928.005.patch, YARN-3045.20150420-1.patch Per design in YARN-2928, implement NM writing container lifecycle events and container system metrics to ATS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3381) Fix typo InvalidStateTransitonException
[ https://issues.apache.org/jira/browse/YARN-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14624506#comment-14624506 ] Hudson commented on YARN-3381: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #255 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/255/]) YARN-3381. Fix typo InvalidStateTransitonException. Contributed by Brahma Reddy Battula. (aajisaka: rev 19295b36d90e26616accee73b1f7743aab5df692) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/async/impl/NMClientAsyncImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/state/StateMachine.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainerImpl.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/application/ApplicationImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/state/StateMachineFactory.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/state/InvalidStateTransitionException.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/state/InvalidStateTransitonException.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalizedResource.java Fix typo InvalidStateTransitonException --- Key: YARN-3381 URL: https://issues.apache.org/jira/browse/YARN-3381 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.6.0 Reporter: Xiaoshuang LU Assignee: Brahma Reddy Battula Priority: Minor Fix For: 2.8.0 Attachments: YARN-3381-002.patch, YARN-3381-003.patch, YARN-3381-004-branch-2.patch, YARN-3381-004.patch, YARN-3381-005.patch, YARN-3381-006.patch, YARN-3381-007.patch, YARN-3381-008.patch, YARN-3381-010.patch, YARN-3381-011.patch, 
YARN-3381.patch Appears that InvalidStateTransitonException should be InvalidStateTransitionException. Transition was misspelled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
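Note that the commit above keeps both the misspelled InvalidStateTransitonException.java and the new InvalidStateTransitionException.java, which suggests the old name was retained as a deprecated alias rather than deleted outright. Below is a minimal sketch of that rename-with-alias pattern; the simplified constructors are assumptions for illustration (the real YARN classes carry the current state and event type rather than a plain message), and the exact inheritance direction should be checked against the commit itself.
{code}
/** New, correctly spelled exception thrown when a state machine gets an event
 *  that is not valid for its current state. (Sketch of the pattern only.) */
class InvalidStateTransitionException extends RuntimeException {
  InvalidStateTransitionException(String message) {
    super(message);
  }
}

/** Old, misspelled name kept as a deprecated alias so existing references still compile. */
@Deprecated
class InvalidStateTransitonException extends InvalidStateTransitionException {
  InvalidStateTransitonException(String message) {
    super(message);
  }
}
{code}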
[jira] [Commented] (YARN-3069) Document missing properties in yarn-default.xml
[ https://issues.apache.org/jira/browse/YARN-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14624503#comment-14624503 ] Hudson commented on YARN-3069: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #255 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/255/]) YARN-3069. Document missing properties in yarn-default.xml. Contributed by Ray Chiang. (aajisaka: rev d6675606dc5f141c9af4f76a37128f8de4cfedad) * hadoop-common-project/hadoop-common/src/site/markdown/DeprecatedProperties.md * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/conf/TestYarnConfigurationFields.java Document missing properties in yarn-default.xml --- Key: YARN-3069 URL: https://issues.apache.org/jira/browse/YARN-3069 Project: Hadoop YARN Issue Type: Improvement Components: documentation Reporter: Ray Chiang Assignee: Ray Chiang Labels: supportability Fix For: 2.8.0 Attachments: YARN-3069.001.patch, YARN-3069.002.patch, YARN-3069.003.patch, YARN-3069.004.patch, YARN-3069.005.patch, YARN-3069.006.patch, YARN-3069.007.patch, YARN-3069.008.patch, YARN-3069.009.patch, YARN-3069.010.patch, YARN-3069.011.patch, YARN-3069.012.patch, YARN-3069.013.patch The following properties are currently not defined in yarn-default.xml. These properties should either be A) documented in yarn-default.xml OR B) listed as an exception (with comments, e.g. for internal use) in the TestYarnConfigurationFields unit test Any comments for any of the properties below are welcome. org.apache.hadoop.yarn.server.sharedcachemanager.RemoteAppChecker org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore security.applicationhistory.protocol.acl yarn.app.container.log.backups yarn.app.container.log.dir yarn.app.container.log.filesize yarn.client.app-submission.poll-interval yarn.client.application-client-protocol.poll-timeout-ms yarn.is.minicluster yarn.log.server.url yarn.minicluster.control-resource-monitoring yarn.minicluster.fixed.ports yarn.minicluster.use-rpc yarn.node-labels.fs-store.retry-policy-spec yarn.node-labels.fs-store.root-dir yarn.node-labels.manager-class yarn.nodemanager.container-executor.os.sched.priority.adjustment yarn.nodemanager.container-monitor.process-tree.class yarn.nodemanager.disk-health-checker.enable yarn.nodemanager.docker-container-executor.image-name yarn.nodemanager.linux-container-executor.cgroups.delete-timeout-ms yarn.nodemanager.linux-container-executor.group yarn.nodemanager.log.deletion-threads-count yarn.nodemanager.user-home-dir yarn.nodemanager.webapp.https.address yarn.nodemanager.webapp.spnego-keytab-file yarn.nodemanager.webapp.spnego-principal yarn.nodemanager.windows-secure-container-executor.group yarn.resourcemanager.configuration.file-system-based-store yarn.resourcemanager.delegation-token-renewer.thread-count yarn.resourcemanager.delegation.key.update-interval yarn.resourcemanager.delegation.token.max-lifetime yarn.resourcemanager.delegation.token.renew-interval yarn.resourcemanager.history-writer.multi-threaded-dispatcher.pool-size yarn.resourcemanager.metrics.runtime.buckets yarn.resourcemanager.nm-tokens.master-key-rolling-interval-secs yarn.resourcemanager.reservation-system.class yarn.resourcemanager.reservation-system.enable yarn.resourcemanager.reservation-system.plan.follower yarn.resourcemanager.reservation-system.planfollower.time-step 
yarn.resourcemanager.rm.container-allocation.expiry-interval-ms yarn.resourcemanager.webapp.spnego-keytab-file yarn.resourcemanager.webapp.spnego-principal yarn.scheduler.include-port-in-node-name yarn.timeline-service.delegation.key.update-interval yarn.timeline-service.delegation.token.max-lifetime yarn.timeline-service.delegation.token.renew-interval yarn.timeline-service.generic-application-history.enabled yarn.timeline-service.generic-application-history.fs-history-store.compression-type yarn.timeline-service.generic-application-history.fs-history-store.uri yarn.timeline-service.generic-application-history.store-class yarn.timeline-service.http-cross-origin.enabled yarn.tracking.url.generator -- This message was sent by Atlassian JIRA (v6.3.4#6332)
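As context for why undocumented properties hurt supportability: when a key is missing from yarn-default.xml, its effective default is whatever fallback the reading code supplies, which operators cannot see by inspecting the bundled defaults. A small example using {{org.apache.hadoop.conf.Configuration}} and one property from the list above; the empty-string fallback here is made up for illustration.
{code}
import org.apache.hadoop.conf.Configuration;

public class UndocumentedPropertyExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // yarn.log.server.url appears in the list above; if it is absent from
    // yarn-default.xml, its effective default is whatever the reading code
    // passes here -- invisible to anyone browsing the shipped defaults file.
    String logServerUrl = conf.get("yarn.log.server.url", "" /* hard-coded fallback */);
    System.out.println("Effective yarn.log.server.url = '" + logServerUrl + "'");
  }
}
{code}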
[jira] [Commented] (YARN-3894) RM startup should fail for wrong CS xml NodeLabel capacity configuration
[ https://issues.apache.org/jira/browse/YARN-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14624505#comment-14624505 ] Hudson commented on YARN-3894: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #255 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/255/]) YARN-3894. RM startup should fail for wrong CS xml NodeLabel capacity configuration. (Bibin A Chundatt via wangda) (wangda: rev 5ed1fead6b5ec24bb0ce1a3db033c2ee1ede4bb4) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestQueueParsing.java * hadoop-yarn-project/CHANGES.txt RM startup should fail for wrong CS xml NodeLabel capacity configuration - Key: YARN-3894 URL: https://issues.apache.org/jira/browse/YARN-3894 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Critical Fix For: 2.8.0 Attachments: 0001-YARN-3894.patch, 0002-YARN-3894.patch, capacity-scheduler.xml
Currently in the Capacity Scheduler, when the capacity configuration is wrong the RM will shut down, but not in case of a NodeLabel capacity mismatch. In {{CapacityScheduler#initializeQueues}}:
{code}
private void initializeQueues(CapacitySchedulerConfiguration conf)
    throws IOException {
  root = parseQueue(this, conf, null, CapacitySchedulerConfiguration.ROOT,
      queues, queues, noop);
  labelManager.reinitializeQueueLabels(getQueueToLabels());
  root = parseQueue(this, conf, null, CapacitySchedulerConfiguration.ROOT,
      queues, queues, noop);
  LOG.info("Initialized root queue " + root);
  initializeQueueMappings();
  setQueueAcls(authorizer, queues);
}
{code}
{{labelManager}} is initialized from the queues, and the calculation for label-level capacity mismatch happens in {{parseQueue}}. So during the initial {{parseQueue}} call the labels will be empty.
*Steps to reproduce*
# Configure the RM with the Capacity Scheduler
# Add one or two node labels via rmadmin
# Configure the capacity XML with the node label, but with a wrong capacity configuration for the already added label
# Restart both RMs
# Check that on service init of the Capacity Scheduler the node label list is populated
*Expected*
RM should not start.
*Current exception on reinitialize check*
{code}
2015-07-07 19:18:25,655 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Initialized queue: default: capacity=0.5, absoluteCapacity=0.5, usedResources=<memory:0, vCores:0>, usedCapacity=0.0, absoluteUsedCapacity=0.0, numApps=0, numContainers=0
2015-07-07 19:18:25,656 WARN org.apache.hadoop.yarn.server.resourcemanager.AdminService: Exception refresh queues.
java.io.IOException: Failed to re-init queues
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:383)
    at org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshQueues(AdminService.java:376)
    at org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:605)
    at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:314)
    at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126)
    at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:824)
    at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:420)
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
Caused by: java.lang.IllegalArgumentException: Illegal capacity of 0.5 for children of queue root for label=node2
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.setChildQueues(ParentQueue.java:159)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:639)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitializeQueues(CapacityScheduler.java:503)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:379)
    ... 8 more
2015-07-07 19:18:25,656 WARN
{code}
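The failure above ultimately comes from a per-label capacity sum check in {{ParentQueue#setChildQueues}}: for each node label, the children of a parent queue must add up to 100%, and the check can only fire once the label is actually known to the scheduler. The following self-contained sketch (written for illustration; it is not the Hadoop implementation, and the {{ChildQueue}} class and method names are invented) shows the kind of per-label sum validation that produces the "Illegal capacity ... for label=node2" error.
{code}
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class LabelCapacityCheckSketch {

  /** One child queue with its per-label capacities, as fractions of the parent (1.0 == 100%). */
  static class ChildQueue {
    final String name;
    final Map<String, Float> capacityByLabel;

    ChildQueue(String name, Map<String, Float> capacityByLabel) {
      this.name = name;
      this.capacityByLabel = capacityByLabel;
    }
  }

  /** For every known label, the children of a parent queue must sum to exactly 100%. */
  static void checkChildCapacities(String parent, Set<String> labels, List<ChildQueue> children) {
    final float epsilon = 1e-5f;
    for (String label : labels) {
      float sum = 0f;
      for (ChildQueue child : children) {
        sum += child.capacityByLabel.getOrDefault(label, 0f);
      }
      if (Math.abs(sum - 1.0f) > epsilon) {
        throw new IllegalArgumentException("Illegal capacity of " + sum
            + " for children of queue " + parent + " for label=" + label);
      }
    }
  }

  public static void main(String[] args) {
    // Only 50% is configured for label "node2" across the children, so validation fails,
    // mirroring the IllegalArgumentException in the log above.
    List<ChildQueue> children = Arrays.asList(
        new ChildQueue("default", Collections.singletonMap("node2", 0.5f)));
    try {
      checkChildCapacities("root", Collections.singleton("node2"), children);
    } catch (IllegalArgumentException e) {
      System.out.println("Validation failed as expected: " + e.getMessage());
    }
  }
}
{code}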
[jira] [Commented] (YARN-3381) Fix typo InvalidStateTransitonException
[ https://issues.apache.org/jira/browse/YARN-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14624469#comment-14624469 ] Brahma Reddy Battula commented on YARN-3381: [~ajisakaa] thanks a lot for the review and commit, and thanks to all who discussed the problem. Fix typo InvalidStateTransitonException --- Key: YARN-3381 URL: https://issues.apache.org/jira/browse/YARN-3381 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.6.0 Reporter: Xiaoshuang LU Assignee: Brahma Reddy Battula Priority: Minor Fix For: 2.8.0 Attachments: YARN-3381-002.patch, YARN-3381-003.patch, YARN-3381-004-branch-2.patch, YARN-3381-004.patch, YARN-3381-005.patch, YARN-3381-006.patch, YARN-3381-007.patch, YARN-3381-008.patch, YARN-3381-010.patch, YARN-3381-011.patch, YARN-3381.patch Appears that InvalidStateTransitonException should be InvalidStateTransitionException. Transition was misspelled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3381) Fix typo InvalidStateTransitonException
[ https://issues.apache.org/jira/browse/YARN-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated YARN-3381: Priority: Minor (was: Major) Summary: Fix typo InvalidStateTransitonException (was: A typographical error in InvalidStateTransitonException) Fix typo InvalidStateTransitonException --- Key: YARN-3381 URL: https://issues.apache.org/jira/browse/YARN-3381 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.6.0 Reporter: Xiaoshuang LU Assignee: Brahma Reddy Battula Priority: Minor Labels: BB2015-05-TBR Attachments: YARN-3381-002.patch, YARN-3381-003.patch, YARN-3381-004-branch-2.patch, YARN-3381-004.patch, YARN-3381-005.patch, YARN-3381-006.patch, YARN-3381-007.patch, YARN-3381-008.patch, YARN-3381-010.patch, YARN-3381-011.patch, YARN-3381.patch Appears that InvalidStateTransitonException should be InvalidStateTransitionException. Transition was misspelled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3381) A typographical error in InvalidStateTransitonException
[ https://issues.apache.org/jira/browse/YARN-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14624380#comment-14624380 ] Akira AJISAKA commented on YARN-3381: - +1, the test failures look unrelated to the patch. Thanks Brahma. A typographical error in InvalidStateTransitonException - Key: YARN-3381 URL: https://issues.apache.org/jira/browse/YARN-3381 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.6.0 Reporter: Xiaoshuang LU Assignee: Brahma Reddy Battula Labels: BB2015-05-TBR Attachments: YARN-3381-002.patch, YARN-3381-003.patch, YARN-3381-004-branch-2.patch, YARN-3381-004.patch, YARN-3381-005.patch, YARN-3381-006.patch, YARN-3381-007.patch, YARN-3381-008.patch, YARN-3381-010.patch, YARN-3381-011.patch, YARN-3381.patch Appears that InvalidStateTransitonException should be InvalidStateTransitionException. Transition was misspelled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3381) Fix typo InvalidStateTransitonException
[ https://issues.apache.org/jira/browse/YARN-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14624394#comment-14624394 ] Hudson commented on YARN-3381: -- FAILURE: Integrated in Hadoop-trunk-Commit #8156 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8156/]) YARN-3381. Fix typo InvalidStateTransitonException. Contributed by Brahma Reddy Battula. (aajisaka: rev 19295b36d90e26616accee73b1f7743aab5df692) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/async/impl/NMClientAsyncImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/state/StateMachineFactory.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/application/ApplicationImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/state/StateMachine.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalizedResource.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/state/InvalidStateTransitionException.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskImpl.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainerImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/state/InvalidStateTransitonException.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java Fix typo InvalidStateTransitonException --- Key: YARN-3381 URL: https://issues.apache.org/jira/browse/YARN-3381 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.6.0 Reporter: Xiaoshuang LU Assignee: Brahma Reddy Battula Priority: Minor Labels: BB2015-05-TBR Fix For: 2.8.0 Attachments: YARN-3381-002.patch, YARN-3381-003.patch, YARN-3381-004-branch-2.patch, YARN-3381-004.patch, YARN-3381-005.patch, YARN-3381-006.patch, YARN-3381-007.patch, YARN-3381-008.patch, YARN-3381-010.patch, 
YARN-3381-011.patch, YARN-3381.patch Appears that InvalidStateTransitonException should be InvalidStateTransitionException. Transition was misspelled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2003) Support for Application priority : Changes in RM and Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14624465#comment-14624465 ] Rohith Sharma K S commented on YARN-2003: - Oho, did not see Sunil's comment earlier!! Support for Application priority : Changes in RM and Capacity Scheduler --- Key: YARN-2003 URL: https://issues.apache.org/jira/browse/YARN-2003 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Sunil G Assignee: Sunil G Labels: BB2015-05-TBR Attachments: 0001-YARN-2003.patch, 00010-YARN-2003.patch, 0002-YARN-2003.patch, 0003-YARN-2003.patch, 0004-YARN-2003.patch, 0005-YARN-2003.patch, 0006-YARN-2003.patch, 0007-YARN-2003.patch, 0008-YARN-2003.patch, 0009-YARN-2003.patch, 0011-YARN-2003.patch, 0012-YARN-2003.patch, 0013-YARN-2003.patch, 0014-YARN-2003.patch, 0015-YARN-2003.patch, 0016-YARN-2003.patch, 0017-YARN-2003.patch, 0018-YARN-2003.patch, 0019-YARN-2003.patch, 0020-YARN-2003.patch AppAttemptAddedSchedulerEvent should be able to receive the Job Priority from Submission Context and store. Later this can be used by Scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2934) Improve handling of container's stderr
[ https://issues.apache.org/jira/browse/YARN-2934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14624477#comment-14624477 ] Naganarasimha G R commented on YARN-2934: - Hi [~jira.shegalov], sorry for the long gap in handling this, but I had a query related to it: {{tail}} only works on Linux systems, so I was wondering how to keep the implementation platform-neutral. Maybe RandomAccessFile? Thoughts? Improve handling of container's stderr --- Key: YARN-2934 URL: https://issues.apache.org/jira/browse/YARN-2934 Project: Hadoop YARN Issue Type: Improvement Reporter: Gera Shegalov Assignee: Naganarasimha G R Priority: Critical Most YARN applications redirect stderr to some file. That's why when a container launch fails with {{ExitCodeException}} the message is empty. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3911) Add tail of stderr to diagnostics if container fails to launch or if container logs are empty
[ https://issues.apache.org/jira/browse/YARN-3911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14624476#comment-14624476 ] Naganarasimha G R commented on YARN-3911: - Hi [~bikassaha], YARN-2688 and YARN-2934 have similar intentions; if required, I can finish YARN-2934... but {{tail}} only works on Linux systems, so I was wondering how to keep the implementation platform-neutral. Maybe RandomAccessFile? Thoughts? Add tail of stderr to diagnostics if container fails to launch or if container logs are empty - Key: YARN-3911 URL: https://issues.apache.org/jira/browse/YARN-3911 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha The stderr may have useful info in those cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
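Both comments above ask whether {{RandomAccessFile}} could replace a platform-specific {{tail}} command when capturing the last part of a container's stderr. A minimal sketch of that idea follows; the file path, byte budget, and class name are arbitrary, and this is not the implementation that eventually went into YARN-2934/YARN-3911.
{code}
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

public class StderrTail {
  /** Read at most the last {@code maxBytes} bytes of a file, platform-independently. */
  static String tail(String path, int maxBytes) throws IOException {
    try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
      long length = file.length();
      long start = Math.max(0, length - maxBytes);
      byte[] buffer = new byte[(int) (length - start)];
      file.seek(start);              // jump near the end instead of streaming the whole file
      file.readFully(buffer);
      return new String(buffer, StandardCharsets.UTF_8);
    }
  }

  public static void main(String[] args) throws IOException {
    // Hypothetical container log path; in the NM this would come from the container's log dir.
    System.out.println(tail(args.length > 0 ? args[0] : "stderr", 4096));
  }
}
{code}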
[jira] [Commented] (YARN-3381) Fix typo InvalidStateTransitonException
[ https://issues.apache.org/jira/browse/YARN-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14624513#comment-14624513 ] Hudson commented on YARN-3381: -- FAILURE: Integrated in Hadoop-Yarn-trunk #985 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/985/]) YARN-3381. Fix typo InvalidStateTransitonException. Contributed by Brahma Reddy Battula. (aajisaka: rev 19295b36d90e26616accee73b1f7743aab5df692) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/state/StateMachineFactory.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalizedResource.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/state/InvalidStateTransitonException.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/async/impl/NMClientAsyncImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/state/StateMachine.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/application/ApplicationImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainerImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/state/InvalidStateTransitionException.java Fix typo InvalidStateTransitonException --- Key: YARN-3381 URL: https://issues.apache.org/jira/browse/YARN-3381 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.6.0 Reporter: Xiaoshuang LU Assignee: Brahma Reddy Battula Priority: Minor Fix For: 2.8.0 Attachments: YARN-3381-002.patch, YARN-3381-003.patch, YARN-3381-004-branch-2.patch, YARN-3381-004.patch, YARN-3381-005.patch, YARN-3381-006.patch, YARN-3381-007.patch, YARN-3381-008.patch, YARN-3381-010.patch, YARN-3381-011.patch, 
YARN-3381.patch Appears that InvalidStateTransitonException should be InvalidStateTransitionException. Transition was misspelled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3894) RM startup should fail for wrong CS xml NodeLabel capacity configuration
[ https://issues.apache.org/jira/browse/YARN-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14624512#comment-14624512 ] Hudson commented on YARN-3894: -- FAILURE: Integrated in Hadoop-Yarn-trunk #985 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/985/]) YARN-3894. RM startup should fail for wrong CS xml NodeLabel capacity configuration. (Bibin A Chundatt via wangda) (wangda: rev 5ed1fead6b5ec24bb0ce1a3db033c2ee1ede4bb4) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestQueueParsing.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java RM startup should fail for wrong CS xml NodeLabel capacity configuration - Key: YARN-3894 URL: https://issues.apache.org/jira/browse/YARN-3894 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Critical Fix For: 2.8.0 Attachments: 0001-YARN-3894.patch, 0002-YARN-3894.patch, capacity-scheduler.xml
Currently in the Capacity Scheduler, when the capacity configuration is wrong the RM will shut down, but not in case of a NodeLabel capacity mismatch. In {{CapacityScheduler#initializeQueues}}:
{code}
private void initializeQueues(CapacitySchedulerConfiguration conf)
    throws IOException {
  root = parseQueue(this, conf, null, CapacitySchedulerConfiguration.ROOT,
      queues, queues, noop);
  labelManager.reinitializeQueueLabels(getQueueToLabels());
  root = parseQueue(this, conf, null, CapacitySchedulerConfiguration.ROOT,
      queues, queues, noop);
  LOG.info("Initialized root queue " + root);
  initializeQueueMappings();
  setQueueAcls(authorizer, queues);
}
{code}
{{labelManager}} is initialized from the queues, and the calculation for label-level capacity mismatch happens in {{parseQueue}}. So during the initial {{parseQueue}} call the labels will be empty.
*Steps to reproduce*
# Configure the RM with the Capacity Scheduler
# Add one or two node labels via rmadmin
# Configure the capacity XML with the node label, but with a wrong capacity configuration for the already added label
# Restart both RMs
# Check that on service init of the Capacity Scheduler the node label list is populated
*Expected*
RM should not start.
*Current exception on reinitialize check*
{code}
2015-07-07 19:18:25,655 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Initialized queue: default: capacity=0.5, absoluteCapacity=0.5, usedResources=<memory:0, vCores:0>, usedCapacity=0.0, absoluteUsedCapacity=0.0, numApps=0, numContainers=0
2015-07-07 19:18:25,656 WARN org.apache.hadoop.yarn.server.resourcemanager.AdminService: Exception refresh queues.
java.io.IOException: Failed to re-init queues
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:383)
    at org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshQueues(AdminService.java:376)
    at org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:605)
    at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:314)
    at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126)
    at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:824)
    at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:420)
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
Caused by: java.lang.IllegalArgumentException: Illegal capacity of 0.5 for children of queue root for label=node2
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.setChildQueues(ParentQueue.java:159)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:639)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitializeQueues(CapacityScheduler.java:503)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:379)
    ... 8 more
2015-07-07 19:18:25,656 WARN
{code}
[jira] [Commented] (YARN-3069) Document missing properties in yarn-default.xml
[ https://issues.apache.org/jira/browse/YARN-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14624510#comment-14624510 ] Hudson commented on YARN-3069: -- FAILURE: Integrated in Hadoop-Yarn-trunk #985 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/985/]) YARN-3069. Document missing properties in yarn-default.xml. Contributed by Ray Chiang. (aajisaka: rev d6675606dc5f141c9af4f76a37128f8de4cfedad) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/conf/TestYarnConfigurationFields.java * hadoop-yarn-project/CHANGES.txt * hadoop-common-project/hadoop-common/src/site/markdown/DeprecatedProperties.md Document missing properties in yarn-default.xml --- Key: YARN-3069 URL: https://issues.apache.org/jira/browse/YARN-3069 Project: Hadoop YARN Issue Type: Improvement Components: documentation Reporter: Ray Chiang Assignee: Ray Chiang Labels: supportability Fix For: 2.8.0 Attachments: YARN-3069.001.patch, YARN-3069.002.patch, YARN-3069.003.patch, YARN-3069.004.patch, YARN-3069.005.patch, YARN-3069.006.patch, YARN-3069.007.patch, YARN-3069.008.patch, YARN-3069.009.patch, YARN-3069.010.patch, YARN-3069.011.patch, YARN-3069.012.patch, YARN-3069.013.patch The following properties are currently not defined in yarn-default.xml. These properties should either be A) documented in yarn-default.xml OR B) listed as an exception (with comments, e.g. for internal use) in the TestYarnConfigurationFields unit test Any comments for any of the properties below are welcome. org.apache.hadoop.yarn.server.sharedcachemanager.RemoteAppChecker org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore security.applicationhistory.protocol.acl yarn.app.container.log.backups yarn.app.container.log.dir yarn.app.container.log.filesize yarn.client.app-submission.poll-interval yarn.client.application-client-protocol.poll-timeout-ms yarn.is.minicluster yarn.log.server.url yarn.minicluster.control-resource-monitoring yarn.minicluster.fixed.ports yarn.minicluster.use-rpc yarn.node-labels.fs-store.retry-policy-spec yarn.node-labels.fs-store.root-dir yarn.node-labels.manager-class yarn.nodemanager.container-executor.os.sched.priority.adjustment yarn.nodemanager.container-monitor.process-tree.class yarn.nodemanager.disk-health-checker.enable yarn.nodemanager.docker-container-executor.image-name yarn.nodemanager.linux-container-executor.cgroups.delete-timeout-ms yarn.nodemanager.linux-container-executor.group yarn.nodemanager.log.deletion-threads-count yarn.nodemanager.user-home-dir yarn.nodemanager.webapp.https.address yarn.nodemanager.webapp.spnego-keytab-file yarn.nodemanager.webapp.spnego-principal yarn.nodemanager.windows-secure-container-executor.group yarn.resourcemanager.configuration.file-system-based-store yarn.resourcemanager.delegation-token-renewer.thread-count yarn.resourcemanager.delegation.key.update-interval yarn.resourcemanager.delegation.token.max-lifetime yarn.resourcemanager.delegation.token.renew-interval yarn.resourcemanager.history-writer.multi-threaded-dispatcher.pool-size yarn.resourcemanager.metrics.runtime.buckets yarn.resourcemanager.nm-tokens.master-key-rolling-interval-secs yarn.resourcemanager.reservation-system.class yarn.resourcemanager.reservation-system.enable yarn.resourcemanager.reservation-system.plan.follower yarn.resourcemanager.reservation-system.planfollower.time-step 
yarn.resourcemanager.rm.container-allocation.expiry-interval-ms yarn.resourcemanager.webapp.spnego-keytab-file yarn.resourcemanager.webapp.spnego-principal yarn.scheduler.include-port-in-node-name yarn.timeline-service.delegation.key.update-interval yarn.timeline-service.delegation.token.max-lifetime yarn.timeline-service.delegation.token.renew-interval yarn.timeline-service.generic-application-history.enabled yarn.timeline-service.generic-application-history.fs-history-store.compression-type yarn.timeline-service.generic-application-history.fs-history-store.uri yarn.timeline-service.generic-application-history.store-class yarn.timeline-service.http-cross-origin.enabled yarn.tracking.url.generator -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3635) Get-queue-mapping should be a common interface of YarnScheduler
[ https://issues.apache.org/jira/browse/YARN-3635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625732#comment-14625732 ] Hadoop QA commented on YARN-3635: -
\\ \\
| (x) *{color:red}-1 overall{color}* |
\\ \\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 16m 15s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 3 new or modified test files. |
| {color:green}+1{color} | javac | 7m 40s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 40s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle | 0m 49s | The applied patch generated 14 new checkstyle issues (total was 234, now 241). |
| {color:red}-1{color} | whitespace | 0m 4s | The patch has 15 line(s) that end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install | 1m 21s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 1m 26s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests | 51m 7s | Tests passed in hadoop-yarn-server-resourcemanager. |
| | | | 89m 23s | |
\\ \\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12745158/YARN-3635.6.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / a431ed9 |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8526/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt |
| whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/8526/artifact/patchprocess/whitespace.txt |
| hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8526/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8526/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8526/console |
This message was automatically generated.
Get-queue-mapping should be a common interface of YarnScheduler --- Key: YARN-3635 URL: https://issues.apache.org/jira/browse/YARN-3635 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-3635.1.patch, YARN-3635.2.patch, YARN-3635.3.patch, YARN-3635.4.patch, YARN-3635.5.patch, YARN-3635.6.patch
Currently, both the fair and capacity schedulers support queue mapping, which means the scheduler can change the queue of an application after it has been submitted. One issue with doing this inside a specific scheduler is: if the queue after mapping has a different maximum_allocation/default-node-label-expression from the original queue, {{validateAndCreateResourceRequest}} in RMAppManager checks the wrong queue. I propose to make queue mapping a common interface of the scheduler, and have RMAppManager set the mapped queue before doing validations.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
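The proposal above is to move queue mapping out of the individual schedulers so that RMAppManager can resolve the target queue before running its validations. A rough sketch of how such a common hook could be shaped is below; the interface and method names are hypothetical and are not taken from the YARN-3635 patch.
{code}
public class QueueMappingSketch {
  /** Hypothetical common scheduler hook: resolve the queue an app will actually run in. */
  interface QueuePlacementAware {
    String getMappedQueueForApp(String user, String requestedQueue);
  }

  /** Simplified stand-in for RMAppManager: validate against the mapped queue, not the requested one. */
  static class AppManager {
    private final QueuePlacementAware scheduler;

    AppManager(QueuePlacementAware scheduler) {
      this.scheduler = scheduler;
    }

    void submitApplication(String user, String requestedQueue) {
      String effectiveQueue = scheduler.getMappedQueueForApp(user, requestedQueue);
      // validateAndCreateResourceRequest-style checks (max allocation, node-label expression)
      // should now consult effectiveQueue instead of requestedQueue.
      System.out.println("Validating submission of user " + user + " against queue " + effectiveQueue);
    }
  }

  public static void main(String[] args) {
    // Example mapping: every user is placed into a per-user queue under root.users.
    AppManager manager = new AppManager((user, queue) -> "root.users." + user);
    manager.submitApplication("alice", "default");
  }
}
{code}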
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625826#comment-14625826 ] Varun Saxena commented on YARN-3878: [~jianhe], will update a patch soon. AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Fix For: 2.7.2 Attachments: YARN-3878-addendum.patch, YARN-3878.01.patch, YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, YARN-3878.06.patch, YARN-3878.07.patch, YARN-3878.08.patch
The sequence of events is as follows:
# The RM is stopped while putting an RMStateStore event onto the RMStateStore's AsyncDispatcher. This leads to an InterruptedException being thrown.
# As the RM is being stopped, the RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we check whether all events have been drained and wait for the event queue to drain (as the RM State Store dispatcher is configured to drain its queue on stop).
# This condition never becomes true, and the AsyncDispatcher keeps waiting for the dispatcher event queue to drain until the JVM exits.
*Initial exception while posting RM State store event to queue*
{noformat}
2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED
2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted
java.lang.InterruptedException
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)
    at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
    at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338)
    at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619)
    at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
    at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
    at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
    at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838)
{noformat}
*JStack of AsyncDispatcher hanging on stop*
{noformat}
AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e waiting on condition [0x7fb9654e9000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for 0x000700b79250 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
    at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
    at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:113)
    at java.lang.Thread.run(Thread.java:744)
main prio=10 tid=0x7fb98000a800 nid=0x49c3 in Object.wait() [0x7fb989851000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
    at
{noformat}
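The hang described above boils down to a serviceStop that waits for a "drained" condition which can no longer become true once events are lost or the dispatcher thread stops making progress. The stripped-down sketch below illustrates the drain-on-stop pattern and where such a wait can block forever; it is an illustration only, not the actual org.apache.hadoop.yarn.event.AsyncDispatcher code, and the field names are chosen for the example.
{code}
import java.util.concurrent.LinkedBlockingQueue;

public class DrainOnStopSketch {
  private final LinkedBlockingQueue<Object> eventQueue = new LinkedBlockingQueue<>();
  private final Object waitForDrained = new Object();
  private volatile boolean drained = true;
  private volatile boolean stopped = false;

  /** Dispatcher thread: updates 'drained' as it empties the queue. */
  void runDispatcher() throws InterruptedException {
    while (!stopped) {
      drained = eventQueue.isEmpty();
      synchronized (waitForDrained) {
        if (drained) {
          waitForDrained.notifyAll();      // wake a stopper waiting for the queue to drain
        }
      }
      Object event = eventQueue.take();    // blocks; an interrupt here ends this loop
      drained = false;
      handle(event);
    }
  }

  /** Producer side: an interrupt during put() means the event never reaches the queue. */
  void post(Object event) {
    try {
      eventQueue.put(event);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  /** Stop path with drain-on-stop semantics: waits until 'drained' becomes true.
   *  If the dispatcher thread has already been interrupted or has exited while the
   *  queue is non-empty, nothing ever sets 'drained' back to true, so this loop
   *  never terminates -- the hang reported in this issue. */
  void serviceStop() throws InterruptedException {
    synchronized (waitForDrained) {
      while (!drained) {
        waitForDrained.wait(1000);
      }
    }
    stopped = true;
  }

  private void handle(Object event) {
    System.out.println("handled " + event);
  }
}
{code}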
[jira] [Updated] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-3878: --- Attachment: (was: YARN-3878-addendum.patch) AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Fix For: 2.7.2 Attachments: YARN-3878.01.patch, YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, YARN-3878.06.patch, YARN-3878.07.patch, YARN-3878.08.patch
The sequence of events is as follows:
# The RM is stopped while putting an RMStateStore event onto the RMStateStore's AsyncDispatcher. This leads to an InterruptedException being thrown.
# As the RM is being stopped, the RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we check whether all events have been drained and wait for the event queue to drain (as the RM State Store dispatcher is configured to drain its queue on stop).
# This condition never becomes true, and the AsyncDispatcher keeps waiting for the dispatcher event queue to drain until the JVM exits.
*Initial exception while posting RM State store event to queue*
{noformat}
2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED
2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted
java.lang.InterruptedException
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)
    at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
    at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338)
    at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619)
    at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
    at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
    at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
    at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786)
    at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838)
{noformat}
*JStack of AsyncDispatcher hanging on stop*
{noformat}
AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e waiting on condition [0x7fb9654e9000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for 0x000700b79250 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
    at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
    at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:113)
    at java.lang.Thread.run(Thread.java:744)
main prio=10 tid=0x7fb98000a800 nid=0x49c3 in Object.wait() [0x7fb989851000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on 0x000700b79430 (a
{noformat}
[jira] [Commented] (YARN-3381) Fix typo InvalidStateTransitonException
[ https://issues.apache.org/jira/browse/YARN-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14624703#comment-14624703 ] Hudson commented on YARN-3381: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #243 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/243/]) YARN-3381. Fix typo InvalidStateTransitonException. Contributed by Brahma Reddy Battula. (aajisaka: rev 19295b36d90e26616accee73b1f7743aab5df692) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/state/StateMachineFactory.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/async/impl/NMClientAsyncImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/state/StateMachine.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalizedResource.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/state/InvalidStateTransitonException.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/state/InvalidStateTransitionException.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/application/ApplicationImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainerImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java Fix typo InvalidStateTransitonException --- Key: YARN-3381 URL: https://issues.apache.org/jira/browse/YARN-3381 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.6.0 Reporter: Xiaoshuang LU Assignee: Brahma Reddy Battula Priority: Minor Fix For: 2.8.0 Attachments: YARN-3381-002.patch, YARN-3381-003.patch, YARN-3381-004-branch-2.patch, YARN-3381-004.patch, YARN-3381-005.patch, YARN-3381-006.patch, YARN-3381-007.patch, YARN-3381-008.patch, YARN-3381-010.patch, YARN-3381-011.patch, 
YARN-3381.patch It appears that InvalidStateTransitonException should be InvalidStateTransitionException. "Transition" was misspelled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
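A minimal sketch of a backward-compatible rename of this sort, assuming (as the commit file list above suggests, since both the old and the new exception classes appear in it) that the misspelled class was kept as a deprecated alias of the corrected one; the constructor signature is illustrative:
{code}
// Old, misspelled class kept only so that existing catch blocks keep compiling.
@Deprecated
public class InvalidStateTransitonException extends InvalidStateTransitionException {
  public InvalidStateTransitonException(Enum<?> currentState, Enum<?> event) {
    super(currentState, event);
  }
}
{code}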
[jira] [Commented] (YARN-3069) Document missing properties in yarn-default.xml
[ https://issues.apache.org/jira/browse/YARN-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14624700#comment-14624700 ] Hudson commented on YARN-3069: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #243 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/243/]) YARN-3069. Document missing properties in yarn-default.xml. Contributed by Ray Chiang. (aajisaka: rev d6675606dc5f141c9af4f76a37128f8de4cfedad) * hadoop-common-project/hadoop-common/src/site/markdown/DeprecatedProperties.md * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/conf/TestYarnConfigurationFields.java Document missing properties in yarn-default.xml --- Key: YARN-3069 URL: https://issues.apache.org/jira/browse/YARN-3069 Project: Hadoop YARN Issue Type: Improvement Components: documentation Reporter: Ray Chiang Assignee: Ray Chiang Labels: supportability Fix For: 2.8.0 Attachments: YARN-3069.001.patch, YARN-3069.002.patch, YARN-3069.003.patch, YARN-3069.004.patch, YARN-3069.005.patch, YARN-3069.006.patch, YARN-3069.007.patch, YARN-3069.008.patch, YARN-3069.009.patch, YARN-3069.010.patch, YARN-3069.011.patch, YARN-3069.012.patch, YARN-3069.013.patch The following properties are currently not defined in yarn-default.xml. These properties should either be A) documented in yarn-default.xml OR B) listed as an exception (with comments, e.g. for internal use) in the TestYarnConfigurationFields unit test Any comments for any of the properties below are welcome. org.apache.hadoop.yarn.server.sharedcachemanager.RemoteAppChecker org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore security.applicationhistory.protocol.acl yarn.app.container.log.backups yarn.app.container.log.dir yarn.app.container.log.filesize yarn.client.app-submission.poll-interval yarn.client.application-client-protocol.poll-timeout-ms yarn.is.minicluster yarn.log.server.url yarn.minicluster.control-resource-monitoring yarn.minicluster.fixed.ports yarn.minicluster.use-rpc yarn.node-labels.fs-store.retry-policy-spec yarn.node-labels.fs-store.root-dir yarn.node-labels.manager-class yarn.nodemanager.container-executor.os.sched.priority.adjustment yarn.nodemanager.container-monitor.process-tree.class yarn.nodemanager.disk-health-checker.enable yarn.nodemanager.docker-container-executor.image-name yarn.nodemanager.linux-container-executor.cgroups.delete-timeout-ms yarn.nodemanager.linux-container-executor.group yarn.nodemanager.log.deletion-threads-count yarn.nodemanager.user-home-dir yarn.nodemanager.webapp.https.address yarn.nodemanager.webapp.spnego-keytab-file yarn.nodemanager.webapp.spnego-principal yarn.nodemanager.windows-secure-container-executor.group yarn.resourcemanager.configuration.file-system-based-store yarn.resourcemanager.delegation-token-renewer.thread-count yarn.resourcemanager.delegation.key.update-interval yarn.resourcemanager.delegation.token.max-lifetime yarn.resourcemanager.delegation.token.renew-interval yarn.resourcemanager.history-writer.multi-threaded-dispatcher.pool-size yarn.resourcemanager.metrics.runtime.buckets yarn.resourcemanager.nm-tokens.master-key-rolling-interval-secs yarn.resourcemanager.reservation-system.class yarn.resourcemanager.reservation-system.enable yarn.resourcemanager.reservation-system.plan.follower yarn.resourcemanager.reservation-system.planfollower.time-step 
yarn.resourcemanager.rm.container-allocation.expiry-interval-ms yarn.resourcemanager.webapp.spnego-keytab-file yarn.resourcemanager.webapp.spnego-principal yarn.scheduler.include-port-in-node-name yarn.timeline-service.delegation.key.update-interval yarn.timeline-service.delegation.token.max-lifetime yarn.timeline-service.delegation.token.renew-interval yarn.timeline-service.generic-application-history.enabled yarn.timeline-service.generic-application-history.fs-history-store.compression-type yarn.timeline-service.generic-application-history.fs-history-store.uri yarn.timeline-service.generic-application-history.store-class yarn.timeline-service.http-cross-origin.enabled yarn.tracking.url.generator -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1449) AM-NM protocol changes to support container resizing
[ https://issues.apache.org/jira/browse/YARN-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14624713#comment-14624713 ] MENG DING commented on YARN-1449: - [~jianhe], I think the {{initContainersToIncrease()}} is still needed, because it may need to create the {{this.containersToIncrease}} if it is null. The {{AllocateRequestPBImpl.setAskList()}} uses the same logic. It seemed a little awkward to me too at first though. AM-NM protocol changes to support container resizing Key: YARN-1449 URL: https://issues.apache.org/jira/browse/YARN-1449 Project: Hadoop YARN Issue Type: Sub-task Components: api Reporter: Wangda Tan (No longer used) Assignee: MENG DING Attachments: YARN-1449.1.patch, YARN-1449.2.patch, YARN-1449.3.patch, yarn-1449.1.patch, yarn-1449.3.patch, yarn-1449.4.patch, yarn-1449.5.patch AM-NM protocol changes to support container resizing 1) IncreaseContainersResourceRequest and IncreaseContainersResourceResponse PB protocol and implementation 2) increaseContainersResources method in ContainerManagementProtocol 3) Update ContainerStatus protocol to include Resource 4) Relevant test cases -- This message was sent by Atlassian JIRA (v6.3.4#6332)
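A minimal sketch of the lazy-init PBImpl pattern being discussed; the setter name and element type are illustrative assumptions, not the exact patch:
{code}
public void setContainersToIncrease(List<Token> containersToIncrease) {
  if (containersToIncrease == null) {
    return;
  }
  // initContainersToIncrease() creates this.containersToIncrease when it is
  // still null, which is why the init method is needed even on the setter path,
  // mirroring the logic in AllocateRequestPBImpl.setAskList().
  initContainersToIncrease();
  this.containersToIncrease.clear();
  this.containersToIncrease.addAll(containersToIncrease);
}
{code}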
[jira] [Commented] (YARN-3069) Document missing properties in yarn-default.xml
[ https://issues.apache.org/jira/browse/YARN-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14624720#comment-14624720 ] Hudson commented on YARN-3069: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2182 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2182/]) YARN-3069. Document missing properties in yarn-default.xml. Contributed by Ray Chiang. (aajisaka: rev d6675606dc5f141c9af4f76a37128f8de4cfedad) * hadoop-common-project/hadoop-common/src/site/markdown/DeprecatedProperties.md * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/conf/TestYarnConfigurationFields.java Document missing properties in yarn-default.xml --- Key: YARN-3069 URL: https://issues.apache.org/jira/browse/YARN-3069 Project: Hadoop YARN Issue Type: Improvement Components: documentation Reporter: Ray Chiang Assignee: Ray Chiang Labels: supportability Fix For: 2.8.0 Attachments: YARN-3069.001.patch, YARN-3069.002.patch, YARN-3069.003.patch, YARN-3069.004.patch, YARN-3069.005.patch, YARN-3069.006.patch, YARN-3069.007.patch, YARN-3069.008.patch, YARN-3069.009.patch, YARN-3069.010.patch, YARN-3069.011.patch, YARN-3069.012.patch, YARN-3069.013.patch The following properties are currently not defined in yarn-default.xml. These properties should either be A) documented in yarn-default.xml OR B) listed as an exception (with comments, e.g. for internal use) in the TestYarnConfigurationFields unit test Any comments for any of the properties below are welcome. org.apache.hadoop.yarn.server.sharedcachemanager.RemoteAppChecker org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore security.applicationhistory.protocol.acl yarn.app.container.log.backups yarn.app.container.log.dir yarn.app.container.log.filesize yarn.client.app-submission.poll-interval yarn.client.application-client-protocol.poll-timeout-ms yarn.is.minicluster yarn.log.server.url yarn.minicluster.control-resource-monitoring yarn.minicluster.fixed.ports yarn.minicluster.use-rpc yarn.node-labels.fs-store.retry-policy-spec yarn.node-labels.fs-store.root-dir yarn.node-labels.manager-class yarn.nodemanager.container-executor.os.sched.priority.adjustment yarn.nodemanager.container-monitor.process-tree.class yarn.nodemanager.disk-health-checker.enable yarn.nodemanager.docker-container-executor.image-name yarn.nodemanager.linux-container-executor.cgroups.delete-timeout-ms yarn.nodemanager.linux-container-executor.group yarn.nodemanager.log.deletion-threads-count yarn.nodemanager.user-home-dir yarn.nodemanager.webapp.https.address yarn.nodemanager.webapp.spnego-keytab-file yarn.nodemanager.webapp.spnego-principal yarn.nodemanager.windows-secure-container-executor.group yarn.resourcemanager.configuration.file-system-based-store yarn.resourcemanager.delegation-token-renewer.thread-count yarn.resourcemanager.delegation.key.update-interval yarn.resourcemanager.delegation.token.max-lifetime yarn.resourcemanager.delegation.token.renew-interval yarn.resourcemanager.history-writer.multi-threaded-dispatcher.pool-size yarn.resourcemanager.metrics.runtime.buckets yarn.resourcemanager.nm-tokens.master-key-rolling-interval-secs yarn.resourcemanager.reservation-system.class yarn.resourcemanager.reservation-system.enable yarn.resourcemanager.reservation-system.plan.follower yarn.resourcemanager.reservation-system.planfollower.time-step 
yarn.resourcemanager.rm.container-allocation.expiry-interval-ms yarn.resourcemanager.webapp.spnego-keytab-file yarn.resourcemanager.webapp.spnego-principal yarn.scheduler.include-port-in-node-name yarn.timeline-service.delegation.key.update-interval yarn.timeline-service.delegation.token.max-lifetime yarn.timeline-service.delegation.token.renew-interval yarn.timeline-service.generic-application-history.enabled yarn.timeline-service.generic-application-history.fs-history-store.compression-type yarn.timeline-service.generic-application-history.fs-history-store.uri yarn.timeline-service.generic-application-history.store-class yarn.timeline-service.http-cross-origin.enabled yarn.tracking.url.generator -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3894) RM startup should fail for wrong CS xml NodeLabel capacity configuration
[ https://issues.apache.org/jira/browse/YARN-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14624722#comment-14624722 ] Hudson commented on YARN-3894: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2182 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2182/]) YARN-3894. RM startup should fail for wrong CS xml NodeLabel capacity configuration. (Bibin A Chundatt via wangda) (wangda: rev 5ed1fead6b5ec24bb0ce1a3db033c2ee1ede4bb4) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestQueueParsing.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java * hadoop-yarn-project/CHANGES.txt RM startup should fail for wrong CS xml NodeLabel capacity configuration - Key: YARN-3894 URL: https://issues.apache.org/jira/browse/YARN-3894 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Critical Fix For: 2.8.0 Attachments: 0001-YARN-3894.patch, 0002-YARN-3894.patch, capacity-scheduler.xml Currently in capacity Scheduler when capacity configuration is wrong RM will shutdown, but not incase of NodeLabels capacity mismatch In {{CapacityScheduler#initializeQueues}} {code} private void initializeQueues(CapacitySchedulerConfiguration conf) throws IOException { root = parseQueue(this, conf, null, CapacitySchedulerConfiguration.ROOT, queues, queues, noop); labelManager.reinitializeQueueLabels(getQueueToLabels()); root = parseQueue(this, conf, null, CapacitySchedulerConfiguration.ROOT, queues, queues, noop); LOG.info(Initialized root queue + root); initializeQueueMappings(); setQueueAcls(authorizer, queues); } {code} {{labelManager}} is initialized from queues and calculation for Label level capacity mismatch happens in {{parseQueue}} . So during initialization {{parseQueue}} the labels will be empty . *Steps to reproduce* # Configure RM with capacity scheduler # Add one or two node label from rmadmin # Configure capacity xml with nodelabel but issue with capacity configuration for already added label # Restart both RM # Check on service init of capacity scheduler node label list is populated *Expected* RM should not start *Current exception on reintialize check* {code} 2015-07-07 19:18:25,655 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Initialized queue: default: capacity=0.5, absoluteCapacity=0.5, usedResources=memory:0, vCores:0, usedCapacity=0.0, absoluteUsedCapacity=0.0, numApps=0, numContainers=0 2015-07-07 19:18:25,656 WARN org.apache.hadoop.yarn.server.resourcemanager.AdminService: Exception refresh queues. 
java.io.IOException: Failed to re-init queues at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:383) at org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshQueues(AdminService.java:376) at org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:605) at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:314) at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126) at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:824) at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:420) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) Caused by: java.lang.IllegalArgumentException: Illegal capacity of 0.5 for children of queue root for label=node2 at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.setChildQueues(ParentQueue.java:159) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:639) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitializeQueues(CapacityScheduler.java:503) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:379) ... 8 more 2015-07-07 19:18:25,656 WARN
[jira] [Updated] (YARN-3866) AM-RM protocol changes to support container resizing
[ https://issues.apache.org/jira/browse/YARN-3866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] MENG DING updated YARN-3866: Attachment: YARN-3866.3.patch Thanks all for the review and comments! Updated the patch to: * Mark all new API methods Unstable * Reuse the Container object for decreased/increased containers AM-RM protocol changes to support container resizing Key: YARN-3866 URL: https://issues.apache.org/jira/browse/YARN-3866 Project: Hadoop YARN Issue Type: Sub-task Components: api Reporter: MENG DING Assignee: MENG DING Attachments: YARN-3866.1.patch, YARN-3866.2.patch, YARN-3866.3.patch YARN-1447 and YARN-1448 are outdated. This ticket deals with AM-RM Protocol changes to support container resize according to the latest design in YARN-1197. 1) Add increase/decrease requests in AllocateRequest 2) Get approved increase/decrease requests from RM in AllocateResponse 3) Add relevant test cases -- This message was sent by Atlassian JIRA (v6.3.4#6332)
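A minimal sketch of what the two bullet points in the update could look like on the AM-RM protocol records; the accessor names are assumptions rather than the committed API:
{code}
// New resize-related accessors marked @Unstable, with the existing Container
// record reused for both increased and decreased containers.
@Public
@Unstable
public abstract List<Container> getIncreasedContainers();

@Public
@Unstable
public abstract List<Container> getDecreasedContainers();
{code}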
[jira] [Commented] (YARN-3894) RM startup should fail for wrong CS xml NodeLabel capacity configuration
[ https://issues.apache.org/jira/browse/YARN-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14624702#comment-14624702 ] Hudson commented on YARN-3894: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #243 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/243/]) YARN-3894. RM startup should fail for wrong CS xml NodeLabel capacity configuration. (Bibin A Chundatt via wangda) (wangda: rev 5ed1fead6b5ec24bb0ce1a3db033c2ee1ede4bb4) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestQueueParsing.java RM startup should fail for wrong CS xml NodeLabel capacity configuration - Key: YARN-3894 URL: https://issues.apache.org/jira/browse/YARN-3894 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Critical Fix For: 2.8.0 Attachments: 0001-YARN-3894.patch, 0002-YARN-3894.patch, capacity-scheduler.xml Currently in capacity Scheduler when capacity configuration is wrong RM will shutdown, but not incase of NodeLabels capacity mismatch In {{CapacityScheduler#initializeQueues}} {code} private void initializeQueues(CapacitySchedulerConfiguration conf) throws IOException { root = parseQueue(this, conf, null, CapacitySchedulerConfiguration.ROOT, queues, queues, noop); labelManager.reinitializeQueueLabels(getQueueToLabels()); root = parseQueue(this, conf, null, CapacitySchedulerConfiguration.ROOT, queues, queues, noop); LOG.info(Initialized root queue + root); initializeQueueMappings(); setQueueAcls(authorizer, queues); } {code} {{labelManager}} is initialized from queues and calculation for Label level capacity mismatch happens in {{parseQueue}} . So during initialization {{parseQueue}} the labels will be empty . *Steps to reproduce* # Configure RM with capacity scheduler # Add one or two node label from rmadmin # Configure capacity xml with nodelabel but issue with capacity configuration for already added label # Restart both RM # Check on service init of capacity scheduler node label list is populated *Expected* RM should not start *Current exception on reintialize check* {code} 2015-07-07 19:18:25,655 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Initialized queue: default: capacity=0.5, absoluteCapacity=0.5, usedResources=memory:0, vCores:0, usedCapacity=0.0, absoluteUsedCapacity=0.0, numApps=0, numContainers=0 2015-07-07 19:18:25,656 WARN org.apache.hadoop.yarn.server.resourcemanager.AdminService: Exception refresh queues. 
java.io.IOException: Failed to re-init queues at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:383) at org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshQueues(AdminService.java:376) at org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:605) at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:314) at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126) at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:824) at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:420) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) Caused by: java.lang.IllegalArgumentException: Illegal capacity of 0.5 for children of queue root for label=node2 at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.setChildQueues(ParentQueue.java:159) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:639) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitializeQueues(CapacityScheduler.java:503) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:379) ... 8 more 2015-07-07 19:18:25,656 WARN
[jira] [Commented] (YARN-3381) Fix typo InvalidStateTransitonException
[ https://issues.apache.org/jira/browse/YARN-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14624723#comment-14624723 ] Hudson commented on YARN-3381: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2182 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2182/]) YARN-3381. Fix typo InvalidStateTransitonException. Contributed by Brahma Reddy Battula. (aajisaka: rev 19295b36d90e26616accee73b1f7743aab5df692) * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/state/InvalidStateTransitonException.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/application/ApplicationImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/state/StateMachineFactory.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainerImpl.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java * hadoop-yarn-project/CHANGES.txt * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/async/impl/NMClientAsyncImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalizedResource.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/state/StateMachine.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/state/InvalidStateTransitionException.java Fix typo InvalidStateTransitonException --- Key: YARN-3381 URL: https://issues.apache.org/jira/browse/YARN-3381 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.6.0 Reporter: Xiaoshuang LU Assignee: Brahma Reddy Battula Priority: Minor Fix For: 2.8.0 Attachments: YARN-3381-002.patch, YARN-3381-003.patch, YARN-3381-004-branch-2.patch, YARN-3381-004.patch, YARN-3381-005.patch, YARN-3381-006.patch, YARN-3381-007.patch, YARN-3381-008.patch, YARN-3381-010.patch, YARN-3381-011.patch, 
YARN-3381.patch Appears that InvalidStateTransitonException should be InvalidStateTransitionException. Transition was misspelled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()
[ https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625866#comment-14625866 ] Varun Saxena commented on YARN-3893: *Reinitialization of Active Services is required*. When we stop the active services, the service state for all of them changes to STOPPED. If this RM were to become active again, we would try to start all the active services, and services can't transition to the STARTED state from the STOPPED state. They can only do so when they are in the INITED state. Both RM in active state when Admin#transitionToActive failure from refeshAll() -- Key: YARN-3893 URL: https://issues.apache.org/jira/browse/YARN-3893 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Critical Attachments: yarn-site.xml Cases that can cause this: # Capacity scheduler xml is wrongly configured during switch # Refresh ACL failure due to configuration # Refresh user group failure due to configuration Both RMs will continuously try to become active {code} dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin ./yarn rmadmin -getServiceState rm1 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable active dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin ./yarn rmadmin -getServiceState rm2 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable active {code} # Both Web UIs show active # Status shown as active for both RMs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
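A minimal sketch of the lifecycle constraint described in the comment, using the standard YARN service API; the service instance and configuration are illustrative:
{code}
CompositeService activeServices = new CompositeService("RMActiveServices");
activeServices.init(conf);   // NOTINITED -> INITED
activeServices.start();      // INITED    -> STARTED
activeServices.stop();       // STARTED   -> STOPPED
// A second start() is rejected because STOPPED -> STARTED is not a valid
// transition; the active services have to be re-created and re-initialized
// before the RM can transition back to active.
activeServices.start();      // throws ServiceStateException
{code}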
[jira] [Commented] (YARN-2005) Blacklisting support for scheduling AMs
[ https://issues.apache.org/jira/browse/YARN-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625689#comment-14625689 ] Anubhav Dhoot commented on YARN-2005: - The actual blacklist is already available in the RM REST API: http://localhost:23188/ws/v1/cluster/apps/application_1436839322176_0001/appattempts. We can add a metric if you still feel it's needed. Blacklisting support for scheduling AMs --- Key: YARN-2005 URL: https://issues.apache.org/jira/browse/YARN-2005 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 0.23.10, 2.4.0 Reporter: Jason Lowe Assignee: Anubhav Dhoot Attachments: YARN-2005.001.patch, YARN-2005.002.patch, YARN-2005.003.patch It would be nice if the RM supported blacklisting a node for an AM launch after the same node fails a configurable number of AM attempts. This would be similar to the blacklisting support for scheduling task attempts in the MapReduce AM, but for scheduling AM attempts on the RM side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3815) [Aggregation] Application/Flow/User/Queue Level Aggregations
[ https://issues.apache.org/jira/browse/YARN-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625703#comment-14625703 ] Junping Du commented on YARN-3815: -- Hi [~sjlee0], sorry for replying to your comments late. I have been busy delivering a quick POC patch for app-level aggregation (system metrics only, not including the conflicting-idea part) in YARN-3816. I will get back to your questions once I figure that out. [Aggregation] Application/Flow/User/Queue Level Aggregations Key: YARN-3815 URL: https://issues.apache.org/jira/browse/YARN-3815 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Junping Du Assignee: Junping Du Priority: Critical Attachments: Timeline Service Nextgen Flow, User, Queue Level Aggregations (v1).pdf, aggregation-design-discussion.pdf, hbase-schema-proposal-for-aggregation.pdf Per previous discussions in some design documents for YARN-2928, the basic scenario is that the query for stats can happen at: - Application level, expected return: an application with aggregated stats - Flow level, expected return: aggregated stats for a flow_run, flow_version and flow - User level, expected return: aggregated stats for applications submitted by the user - Queue level, expected return: aggregated stats for applications within the queue Application states are the basic building block for all other levels of aggregation. We can provide Flow/User/Queue level aggregated statistics based on application states (a dedicated table for application states is needed, which is missing from previous design documents such as the HBase/Phoenix schema design). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1621) Add CLI to list rows of task attempt ID, container ID, host of container, state of container
[ https://issues.apache.org/jira/browse/YARN-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625207#comment-14625207 ] Bartosz Ługowski commented on YARN-1621: I would appreciate if anyone could review this patch. Thanks. Add CLI to list rows of task attempt ID, container ID, host of container, state of container -- Key: YARN-1621 URL: https://issues.apache.org/jira/browse/YARN-1621 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.2.0 Reporter: Tassapol Athiapinya Assignee: Bartosz Ługowski Labels: BB2015-05-TBR Attachments: YARN-1621.1.patch, YARN-1621.2.patch, YARN-1621.3.patch, YARN-1621.4.patch, YARN-1621.5.patch, YARN-1621.6.patch As more applications are moved to YARN, we need generic CLI to list rows of task attempt ID, container ID, host of container, state of container. Today if YARN application running in a container does hang, there is no way to find out more info because a user does not know where each attempt is running in. For each running application, it is useful to differentiate between running/succeeded/failed/killed containers. {code:title=proposed yarn cli} $ yarn application -list-containers -applicationId appId [-containerState state of container] where containerState is optional filter to list container in given state only. container state can be running/succeeded/killed/failed/all. A user can specify more than one container state at once e.g. KILLED,FAILED. task attempt ID container ID host of container state of container {code} CLI should work with running application/completed application. If a container runs many task attempts, all attempts should be shown. That will likely be the case of Tez container-reuse application. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3816) [Aggregation] App-level Aggregation for YARN system metrics
[ https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-3816: - Attachment: YARN-3816-poc-v1.patch Uploading a quick POC patch for app-level aggregation that aggregates application metrics at the YARN system-metrics level (not including framework-specific counts/metrics). Please note that the patch is not ready for review yet, as it lacks basic polish and end-to-end testing. Significant changes will happen later, including at least: - separate calculations (SUM, AVG) out of TimelineMetrics as static methods - writing aggregated data should be moved from the entity table to a separate application table - newly added/modified APIs need more considerable refactoring - more key/complete unit tests should be added [Aggregation] App-level Aggregation for YARN system metrics --- Key: YARN-3816 URL: https://issues.apache.org/jira/browse/YARN-3816 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Junping Du Assignee: Junping Du Attachments: Application Level Aggregation of Timeline Data.pdf, YARN-3816-poc-v1.patch We need application-level aggregation of Timeline data: - To present end users aggregated stats for each application, including: resource (CPU, memory) consumption across all containers, number of containers launched/completed/failed, etc. We need this for apps while they are running as well as when they are done. - Also, framework-specific metrics, e.g. HDFS_BYTES_READ, should be aggregated to show details of states at the framework level. - Other levels of aggregation (Flow/User/Queue) can be computed more efficiently from application-level aggregations rather than raw entity-level data, as far fewer rows need to be scanned (filtering out non-aggregated entities such as events, configurations, etc.). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
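A minimal sketch of the first planned change (pulling SUM/AVG-style calculations out of TimelineMetrics as static helpers); the method placement and name are assumptions, not the eventual API:
{code}
// Sum all recorded values across a collection of per-application metrics.
public static long sum(Collection<TimelineMetric> metrics) {
  long total = 0;
  for (TimelineMetric metric : metrics) {
    for (Number value : metric.getValues().values()) {
      total += value.longValue();
    }
  }
  return total;
}
{code}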
[jira] [Commented] (YARN-3866) AM-RM protocol changes to support container resizing
[ https://issues.apache.org/jira/browse/YARN-3866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625137#comment-14625137 ] Wangda Tan commented on YARN-3866: -- Latest patch looks good, [~mding], could you set status of this JIRA to patch available to kick Jenkins? AM-RM protocol changes to support container resizing Key: YARN-3866 URL: https://issues.apache.org/jira/browse/YARN-3866 Project: Hadoop YARN Issue Type: Sub-task Components: api Reporter: MENG DING Assignee: MENG DING Attachments: YARN-3866.1.patch, YARN-3866.2.patch, YARN-3866.3.patch YARN-1447 and YARN-1448 are outdated. This ticket deals with AM-RM Protocol changes to support container resize according to the latest design in YARN-1197. 1) Add increase/decrease requests in AllocateRequest 2) Get approved increase/decrease requests from RM in AllocateResponse 3) Add relevant test cases -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3814) REST API implementation for getting raw entities in TimelineReader
[ https://issues.apache.org/jira/browse/YARN-3814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-3814: --- Summary: REST API implementation for getting raw entities in TimelineReader (was: REST API implementation for TimelineReader) REST API implementation for getting raw entities in TimelineReader -- Key: YARN-3814 URL: https://issues.apache.org/jira/browse/YARN-3814 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: YARN-2928 Reporter: Varun Saxena Assignee: Varun Saxena -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3873) pendingApplications in LeafQueue should also use OrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625163#comment-14625163 ] Wangda Tan commented on YARN-3873: -- Hi [~sunilg], I can understand the value of supporting customized comparator for pending applications (for example priority-based activation), but I'm not sure if using same comparator of orderingPolicy is also valid for pendingApplications. For example, fair comparator considers demand resources, this may not make sense when comparing pending applications. It makes more sense to me if we activate application by its submission order instead of demand (size of AM container resource request). How about change the JIRA purpose to be: support customized comparator to activate applications. You can do this by adding a getActivateIterator to OrderingPolicy or creating new interface for it. I also suggest to put this as a sub JIRA of YARN-3306 for better tracking. Thoughts? pendingApplications in LeafQueue should also use OrderingPolicy --- Key: YARN-3873 URL: https://issues.apache.org/jira/browse/YARN-3873 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 2.7.0 Reporter: Sunil G Assignee: Sunil G Attachments: 0001-YARN-3873.patch, 0002-YARN-3873.patch Currently *pendingApplications* in LeafQueue is using {{applicationComparator}} from CapacityScheduler. This can be changed and pendingApplications can use the OrderingPolicy configured in Queue level (Fifo/Fair as configured). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
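A minimal sketch of the suggestion above, i.e. a separate activation-order hook on OrderingPolicy; the method name getActivateIterator comes from the comment, the rest is assumed:
{code}
public interface OrderingPolicy<S extends SchedulableEntity> {
  // Existing: the order in which the scheduler offers resources to applications.
  Iterator<S> getAssignmentIterator();

  // Proposed: the order in which pending applications are activated, e.g.
  // submission order or priority rather than demand size.
  Iterator<S> getActivateIterator();
}
{code}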
[jira] [Updated] (YARN-3049) [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend
[ https://issues.apache.org/jira/browse/YARN-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-3049: -- Attachment: YARN-3049-WIP.1.patch Attaching a WIP patch so that the community can take a look; I still need to add the app-flow mapping and some missing fields. [Storage Implementation] Implement storage reader interface to fetch raw data from HBase backend Key: YARN-3049 URL: https://issues.apache.org/jira/browse/YARN-3049 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Zhijie Shen Attachments: YARN-3049-WIP.1.patch Implement existing ATS queries with the new ATS reader design. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3904) Adopt PhoenixTimelineWriter into time-based aggregation storage
[ https://issues.apache.org/jira/browse/YARN-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625206#comment-14625206 ] Hadoop QA commented on YARN-3904: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 15m 21s | Findbugs (version ) appears to be broken on YARN-2928. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 3 new or modified test files. | | {color:green}+1{color} | javac | 7m 46s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 55s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 15s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 38s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 37s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 0m 47s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | yarn tests | 7m 58s | Tests failed in hadoop-yarn-server-timelineservice. | | | | 44m 44s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.timelineservice.storage.TestHBaseTimelineWriterImpl | | | hadoop.yarn.server.timelineservice.aggregation.timebased.TestPhoenixAggregatorWriter | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12745086/YARN-3904-YARN-2928.003.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | YARN-2928 / 2d4a8f4 | | hadoop-yarn-server-timelineservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/8522/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8522/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8522/console | This message was automatically generated. Adopt PhoenixTimelineWriter into time-based aggregation storage --- Key: YARN-3904 URL: https://issues.apache.org/jira/browse/YARN-3904 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Li Lu Assignee: Li Lu Attachments: YARN-3904-YARN-2928.001.patch, YARN-3904-YARN-2928.002.patch, YARN-3904-YARN-2928.003.patch After we finished the design for time-based aggregation, we can adopt our existing Phoenix storage into the storage of the aggregated data. This JIRA proposes to move the Phoenix storage implementation from o.a.h.yarn.server.timelineservice.storage to o.a.h.yarn.server.timelineservice.aggregation.timebased, and make it a fully devoted writer for time-based aggregation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3116) [Collector wireup] We need an assured way to determine if a container is an AM container on NM
[ https://issues.apache.org/jira/browse/YARN-3116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625212#comment-14625212 ] Subru Krishnan commented on YARN-3116: -- Thanks [~zjshen] for reviewing and committing the patch. [Collector wireup] We need an assured way to determine if a container is an AM container on NM -- Key: YARN-3116 URL: https://issues.apache.org/jira/browse/YARN-3116 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, timelineserver Reporter: Zhijie Shen Assignee: Giovanni Matteo Fumarola Fix For: 2.8.0 Attachments: YARN-3116.patch, YARN-3116.v10.patch, YARN-3116.v2.patch, YARN-3116.v3.patch, YARN-3116.v4.patch, YARN-3116.v5.patch, YARN-3116.v6.patch, YARN-3116.v7.patch, YARN-3116.v8.patch, YARN-3116.v9.patch In YARN-3030, to start the per-app aggregator only for a started AM container, we need to determine whether the container is an AM container or not from the context in the NM (we can do it on the RM). This information is missing, so we worked around it by treating the container with ID _01 as the AM container. Unfortunately, this is neither a necessary nor a sufficient condition. We need a way to determine whether a container is an AM container on the NM. We can add a flag to the container object or create an API to make that judgement. Perhaps the distributed AM information may also be useful to YARN-2877. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
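A minimal sketch of the fragile workaround the description refers to, assuming the sequential container id within the app attempt is used; it illustrates why an explicit flag or API is needed:
{code}
// Heuristic only: treats the first container of an app attempt as the AM,
// which, as the JIRA notes, is neither a necessary nor a sufficient condition.
boolean assumedAMContainer = containerId.getContainerId() == 1; // "_000001"
{code}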
[jira] [Updated] (YARN-3904) Adopt PhoenixTimelineWriter into time-based aggregation storage
[ https://issues.apache.org/jira/browse/YARN-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-3904: Attachment: YARN-3904-YARN-2928.003.patch Fix findbugs warnings and some code formatting. Adopt PhoenixTimelineWriter into time-based aggregation storage --- Key: YARN-3904 URL: https://issues.apache.org/jira/browse/YARN-3904 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Li Lu Assignee: Li Lu Attachments: YARN-3904-YARN-2928.001.patch, YARN-3904-YARN-2928.002.patch, YARN-3904-YARN-2928.003.patch After we finished the design for time-based aggregation, we can adopt our existing Phoenix storage into the storage of the aggregated data. This JIRA proposes to move the Phoenix storage implementation from o.a.h.yarn.server.timelineservice.storage to o.a.h.yarn.server.timelineservice.aggregation.timebased, and make it a fully devoted writer for time-based aggregation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3644) Node manager shuts down if unable to connect with RM
[ https://issues.apache.org/jira/browse/YARN-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625088#comment-14625088 ] Varun Vasudev commented on YARN-3644: - One other minor comment - can you please change yarn.nodemanager.shutdown.on.RM.connection.failures to yarn.nodemanager.shutdown-on-rm-connection-failures? Node manager shuts down if unable to connect with RM Key: YARN-3644 URL: https://issues.apache.org/jira/browse/YARN-3644 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Srikanth Sundarrajan Assignee: Raju Bairishetti Attachments: YARN-3644.001.patch, YARN-3644.001.patch, YARN-3644.002.patch, YARN-3644.003.patch, YARN-3644.patch When NM is unable to connect to RM, NM shuts itself down. {code} } catch (ConnectException e) { //catch and throw the exception if tried MAX wait time to connect RM dispatcher.getEventHandler().handle( new NodeManagerEvent(NodeManagerEventType.SHUTDOWN)); throw new YarnRuntimeException(e); {code} In large clusters, if RM is down for maintenance for longer period, all the NMs shuts themselves down, requiring additional work to bring up the NMs. Setting the yarn.resourcemanager.connect.wait-ms to -1 has other side effects, where non connection failures are being retried infinitely by all YarnClients (via RMProxy). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
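A minimal sketch of how the shutdown behaviour could be made configurable, using the property name suggested in the review comment above; the surrounding method and the default value are assumptions, not the actual NodeStatusUpdater code:
{code}
private void handleRMConnectionFailure(ConnectException e) {
  boolean shutdownOnFailure = conf.getBoolean(
      "yarn.nodemanager.shutdown-on-rm-connection-failures", true);
  if (shutdownOnFailure) {
    // Current behaviour: tell the NM to shut itself down and rethrow.
    dispatcher.getEventHandler().handle(
        new NodeManagerEvent(NodeManagerEventType.SHUTDOWN));
    throw new YarnRuntimeException(e);
  }
  // Alternative behaviour for long RM maintenance windows: keep retrying.
  LOG.warn("Cannot connect to RM; will keep retrying", e);
}
{code}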
[jira] [Commented] (YARN-3453) Fair Scheduler: Parts of preemption logic uses DefaultResourceCalculator even in DRF mode causing thrashing
[ https://issues.apache.org/jira/browse/YARN-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625019#comment-14625019 ] Arun Suresh commented on YARN-3453: --- Thanks for the reviews [~kasha], [~ashwinshankar77] and [~peng.zhang] Will be committing this shortly.. Fair Scheduler: Parts of preemption logic uses DefaultResourceCalculator even in DRF mode causing thrashing --- Key: YARN-3453 URL: https://issues.apache.org/jira/browse/YARN-3453 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.6.0 Reporter: Ashwin Shankar Assignee: Arun Suresh Attachments: YARN-3453.1.patch, YARN-3453.2.patch, YARN-3453.3.patch, YARN-3453.4.patch, YARN-3453.5.patch There are two places in preemption code flow where DefaultResourceCalculator is used, even in DRF mode. Which basically results in more resources getting preempted than needed, and those extra preempted containers aren’t even getting to the “starved” queue since scheduling logic is based on DRF's Calculator. Following are the two places : 1. {code:title=FSLeafQueue.java|borderStyle=solid} private boolean isStarved(Resource share) {code} A queue shouldn’t be marked as “starved” if the dominant resource usage is = fair/minshare. 2. {code:title=FairScheduler.java|borderStyle=solid} protected Resource resToPreempt(FSLeafQueue sched, long curTime) {code} -- One more thing that I believe needs to change in DRF mode is : during a preemption round,if preempting a few containers results in satisfying needs of a resource type, then we should exit that preemption round, since the containers that we just preempted should bring the dominant resource usage to min/fair share. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
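A minimal sketch of the first change the description asks for, assuming the queue is handed the scheduler's configured calculator (DominantResourceCalculator in DRF mode) as {{rc}} instead of the hard-coded default:
{code}
private boolean isStarved(Resource share) {
  // rc is assumed to be the scheduler's configured ResourceCalculator, i.e.
  // DominantResourceCalculator under DRF, so starvation is judged on the
  // dominant resource rather than on memory alone.
  Resource desiredShare = Resources.min(rc, clusterResource, share, getDemand());
  return Resources.lessThan(rc, clusterResource, getResourceUsage(), desiredShare);
}
{code}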
[jira] [Commented] (YARN-3844) Make hadoop-yarn-project Native code -Wall-clean
[ https://issues.apache.org/jira/browse/YARN-3844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625050#comment-14625050 ] Hadoop QA commented on YARN-3844: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 5m 17s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 35s | There were no new javac warning messages. | | {color:green}+1{color} | release audit | 0m 20s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | whitespace | 0m 0s | The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 20s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 31s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | yarn tests | 6m 2s | Tests failed in hadoop-yarn-server-nodemanager. | | | | 21m 8s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.nodemanager.containermanager.container.TestContainer | | | hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService | | | hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12745078/YARN-3844.006.patch | | Optional Tests | javac unit | | git revision | trunk / 19295b3 | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/8521/artifact/patchprocess/whitespace.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8521/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8521/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8521/console | This message was automatically generated. Make hadoop-yarn-project Native code -Wall-clean Key: YARN-3844 URL: https://issues.apache.org/jira/browse/YARN-3844 Project: Hadoop YARN Issue Type: Sub-task Components: build Affects Versions: 2.7.0 Environment: As we specify -Wall as a default compilation flag, it would be helpful if the Native code was -Wall-clean Reporter: Alan Burlison Assignee: Alan Burlison Attachments: YARN-3844.001.patch, YARN-3844.002.patch, YARN-3844.006.patch As we specify -Wall as a default compilation flag, it would be helpful if the Native code was -Wall-clean -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3877) YarnClientImpl.submitApplication swallows exceptions
[ https://issues.apache.org/jira/browse/YARN-3877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14624782#comment-14624782 ] Varun Saxena commented on YARN-3877: [~ste...@apache.org] / [~chris.douglas] kindly review. YarnClientImpl.submitApplication swallows exceptions Key: YARN-3877 URL: https://issues.apache.org/jira/browse/YARN-3877 Project: Hadoop YARN Issue Type: Improvement Components: client Affects Versions: 2.7.2 Reporter: Steve Loughran Assignee: Varun Saxena Priority: Minor Attachments: YARN-3877.01.patch When {{YarnClientImpl.submitApplication}} spins waiting for the application to be accepted, any interruption during its sleep() calls is logged and swallowed. This makes it hard to interrupt the thread during shutdown. Really, it should throw some form of exception and let the caller deal with it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
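A minimal sketch of the behaviour the description asks for, assuming a sleep between poll attempts; the variable names are illustrative, not the committed patch:
{code}
try {
  Thread.sleep(submitPollIntervalMillis);
} catch (InterruptedException e) {
  // Restore the interrupt status and surface the failure to the caller
  // instead of logging and swallowing it.
  Thread.currentThread().interrupt();
  throw new YarnException("Interrupted while waiting for application "
      + applicationId + " to be accepted", e);
}
{code}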
[jira] [Commented] (YARN-3381) Fix typo InvalidStateTransitonException
[ https://issues.apache.org/jira/browse/YARN-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14624815#comment-14624815 ] Hudson commented on YARN-3381: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #253 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/253/]) YARN-3381. Fix typo InvalidStateTransitonException. Contributed by Brahma Reddy Battula. (aajisaka: rev 19295b36d90e26616accee73b1f7743aab5df692) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/application/ApplicationImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/state/StateMachineFactory.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainerImpl.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/state/InvalidStateTransitonException.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/state/InvalidStateTransitionException.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/async/impl/NMClientAsyncImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/state/StateMachine.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalizedResource.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java Fix typo InvalidStateTransitonException --- Key: YARN-3381 URL: https://issues.apache.org/jira/browse/YARN-3381 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.6.0 Reporter: Xiaoshuang LU Assignee: Brahma Reddy Battula Priority: Minor Fix For: 2.8.0 Attachments: YARN-3381-002.patch, YARN-3381-003.patch, YARN-3381-004-branch-2.patch, YARN-3381-004.patch, YARN-3381-005.patch, YARN-3381-006.patch, YARN-3381-007.patch, YARN-3381-008.patch, YARN-3381-010.patch, 
YARN-3381-011.patch, YARN-3381.patch Appears that InvalidStateTransitonException should be InvalidStateTransitionException. Transition was misspelled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
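Since the fix is essentially a class rename that touches many callers, a common compatibility trick is to keep the misspelled class as a deprecated alias of the corrected one. The sketch below is a simplified illustration of that idea (plain String message, no generics), not the actual Hadoop sources.
{code}
/** Old, misspelled name kept only as a deprecated alias so existing catch blocks still match. */
@Deprecated
public class InvalidStateTransitonException extends RuntimeException {
  public InvalidStateTransitonException(String message) {
    super(message);
  }
}

/** Correctly spelled exception; being a subtype, it is still caught by code catching the old name. */
class InvalidStateTransitionException extends InvalidStateTransitonException {
  public InvalidStateTransitionException(String currentState, String event) {
    super("Invalid event: " + event + " at " + currentState);
  }
}
{code}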
[jira] [Commented] (YARN-3908) Bugs in HBaseTimelineWriterImpl
[ https://issues.apache.org/jira/browse/YARN-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14624973#comment-14624973 ] Sangjin Lee commented on YARN-3908: --- I would appreciate your review on this. Thanks! Bugs in HBaseTimelineWriterImpl --- Key: YARN-3908 URL: https://issues.apache.org/jira/browse/YARN-3908 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Vrushali C Attachments: YARN-3908-YARN-2928.001.patch, YARN-3908-YARN-2928.002.patch, YARN-3908-YARN-2928.003.patch 1. In HBaseTimelineWriterImpl, the info column family contains the basic fields of a timeline entity plus events. However, entity#info map is not stored at all. 2. event#timestamp is also not persisted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
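For illustration only, a minimal sketch of how the two missing pieces could be persisted with the HBase 1.x client API. The column family name, qualifier layout and value encoding below are assumptions made for the example, not the writer's actual schema.
{code}
import java.util.Map;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public final class InfoColumnSketch {
  private static final byte[] INFO_FAMILY = Bytes.toBytes("i");

  /** Flatten the entity#info map into one qualifier per key under the info family. */
  static void addEntityInfo(Put put, Map<String, String> info) {
    for (Map.Entry<String, String> e : info.entrySet()) {
      put.addColumn(INFO_FAMILY, Bytes.toBytes("info!" + e.getKey()),
          Bytes.toBytes(e.getValue()));
    }
  }

  /** Persist the event timestamp explicitly instead of dropping it. */
  static void addEventTimestamp(Put put, String eventId, long timestamp) {
    put.addColumn(INFO_FAMILY, Bytes.toBytes("event!" + eventId),
        Bytes.toBytes(timestamp));
  }

  private InfoColumnSketch() {
  }
}
{code}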
[jira] [Updated] (YARN-3453) Fair Scheduler: Parts of preemption logic uses DefaultResourceCalculator even in DRF mode causing thrashing
[ https://issues.apache.org/jira/browse/YARN-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-3453: --- Summary: Fair Scheduler: Parts of preemption logic uses DefaultResourceCalculator even in DRF mode causing thrashing (was: Fair Scheduler : Parts of preemption logic uses DefaultResourceCalculator even in DRF mode causing thrashing) Fair Scheduler: Parts of preemption logic uses DefaultResourceCalculator even in DRF mode causing thrashing --- Key: YARN-3453 URL: https://issues.apache.org/jira/browse/YARN-3453 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.6.0 Reporter: Ashwin Shankar Assignee: Arun Suresh Attachments: YARN-3453.1.patch, YARN-3453.2.patch, YARN-3453.3.patch, YARN-3453.4.patch, YARN-3453.5.patch There are two places in the preemption code flow where DefaultResourceCalculator is used, even in DRF mode. This basically results in more resources getting preempted than needed, and those extra preempted containers aren’t even getting to the “starved” queue since the scheduling logic is based on DRF's Calculator. Following are the two places: 1. {code:title=FSLeafQueue.java|borderStyle=solid} private boolean isStarved(Resource share) {code} A queue shouldn’t be marked as “starved” if the dominant resource usage is >= fair/minshare. 2. {code:title=FairScheduler.java|borderStyle=solid} protected Resource resToPreempt(FSLeafQueue sched, long curTime) {code} -- One more thing that I believe needs to change in DRF mode is: during a preemption round, if preempting a few containers results in satisfying the needs of a resource type, then we should exit that preemption round, since the containers that we just preempted should bring the dominant resource usage to min/fair share. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
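To make the first point concrete, here is a small self-contained sketch (a hypothetical helper, not FSLeafQueue itself) of what a DRF-aware starvation check looks like: the comparison is done on the dominant share rather than on memory alone.
{code}
public final class DrfStarvationSketch {
  /**
   * Returns true only if the queue's dominant usage share is below its
   * dominant (min/fair) share; comparing memory alone over-reports starvation.
   */
  static boolean isStarvedUnderDrf(long usedMemMb, int usedVcores,
                                   long shareMemMb, int shareVcores,
                                   long clusterMemMb, int clusterVcores) {
    double usedDominant = Math.max((double) usedMemMb / clusterMemMb,
        (double) usedVcores / clusterVcores);
    double shareDominant = Math.max((double) shareMemMb / clusterMemMb,
        (double) shareVcores / clusterVcores);
    return usedDominant < shareDominant;
  }

  private DrfStarvationSketch() {
  }
}
{code}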
[jira] [Commented] (YARN-2934) Improve handling of container's stderr
[ https://issues.apache.org/jira/browse/YARN-2934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625060#comment-14625060 ] Gera Shegalov commented on YARN-2934: - Hi [~Naganarasimha], yes I was thinking the same, we should try to do it in the Java land. I'd prefer using RawLocalFileSystem#read(buf, off, len) in order not to mix in the java.io API. Since the NM webUI can read logs, we should have no problems accessing them from the NM JVM. Improve handling of container's stderr --- Key: YARN-2934 URL: https://issues.apache.org/jira/browse/YARN-2934 Project: Hadoop YARN Issue Type: Improvement Reporter: Gera Shegalov Assignee: Naganarasimha G R Priority: Critical Most YARN applications redirect stderr to some file. That's why, when a container launch fails with {{ExitCodeException}}, the message is empty. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
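As a rough illustration of the approach being discussed (plain JDK I/O here; the actual change may well go through the Hadoop FileSystem API instead), reading only the tail of the container's stderr keeps the diagnostics small while still surfacing the real error:
{code}
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

public final class StderrTailSketch {
  /** Read at most maxBytes from the end of the given stderr file. */
  static String readTail(String stderrPath, int maxBytes) throws IOException {
    try (RandomAccessFile raf = new RandomAccessFile(stderrPath, "r")) {
      long length = raf.length();
      int toRead = (int) Math.min(maxBytes, length);
      raf.seek(length - toRead);        // jump to the start of the tail
      byte[] buf = new byte[toRead];
      raf.readFully(buf);
      return new String(buf, StandardCharsets.UTF_8);
    }
  }

  private StderrTailSketch() {
  }
}
{code}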
[jira] [Updated] (YARN-3844) Make hadoop-yarn-project Native code -Wall-clean
[ https://issues.apache.org/jira/browse/YARN-3844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Burlison updated YARN-3844: Attachment: (was: YARN-3844.005.patch) Make hadoop-yarn-project Native code -Wall-clean Key: YARN-3844 URL: https://issues.apache.org/jira/browse/YARN-3844 Project: Hadoop YARN Issue Type: Sub-task Components: build Affects Versions: 2.7.0 Environment: As we specify -Wall as a default compilation flag, it would be helpful if the Native code was -Wall-clean Reporter: Alan Burlison Assignee: Alan Burlison Attachments: YARN-3844.001.patch, YARN-3844.002.patch, YARN-3844.006.patch As we specify -Wall as a default compilation flag, it would be helpful if the Native code was -Wall-clean -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3453) Fair Scheduler: Parts of preemption logic uses DefaultResourceCalculator even in DRF mode causing thrashing
[ https://issues.apache.org/jira/browse/YARN-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625000#comment-14625000 ] Karthik Kambatla commented on YARN-3453: +1 Fair Scheduler: Parts of preemption logic uses DefaultResourceCalculator even in DRF mode causing thrashing --- Key: YARN-3453 URL: https://issues.apache.org/jira/browse/YARN-3453 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.6.0 Reporter: Ashwin Shankar Assignee: Arun Suresh Attachments: YARN-3453.1.patch, YARN-3453.2.patch, YARN-3453.3.patch, YARN-3453.4.patch, YARN-3453.5.patch There are two places in the preemption code flow where DefaultResourceCalculator is used, even in DRF mode. This basically results in more resources getting preempted than needed, and those extra preempted containers aren’t even getting to the “starved” queue since the scheduling logic is based on DRF's Calculator. Following are the two places: 1. {code:title=FSLeafQueue.java|borderStyle=solid} private boolean isStarved(Resource share) {code} A queue shouldn’t be marked as “starved” if the dominant resource usage is >= fair/minshare. 2. {code:title=FairScheduler.java|borderStyle=solid} protected Resource resToPreempt(FSLeafQueue sched, long curTime) {code} -- One more thing that I believe needs to change in DRF mode is: during a preemption round, if preempting a few containers results in satisfying the needs of a resource type, then we should exit that preemption round, since the containers that we just preempted should bring the dominant resource usage to min/fair share. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3844) Make hadoop-yarn-project Native code -Wall-clean
[ https://issues.apache.org/jira/browse/YARN-3844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Burlison updated YARN-3844: Attachment: YARN-3844.006.patch Make hadoop-yarn-project Native code -Wall-clean Key: YARN-3844 URL: https://issues.apache.org/jira/browse/YARN-3844 Project: Hadoop YARN Issue Type: Sub-task Components: build Affects Versions: 2.7.0 Environment: As we specify -Wall as a default compilation flag, it would be helpful if the Native code was -Wall-clean Reporter: Alan Burlison Assignee: Alan Burlison Attachments: YARN-3844.001.patch, YARN-3844.002.patch, YARN-3844.006.patch As we specify -Wall as a default compilation flag, it would be helpful if the Native code was -Wall-clean -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14624769#comment-14624769 ] Varun Saxena commented on YARN-3878: [~jianhe] / [~kasha], added an addendum patch. AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Fix For: 2.7.2 Attachments: YARN-3878-addendum.patch, YARN-3878.01.patch, YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, YARN-3878.06.patch, YARN-3878.07.patch, YARN-3878.08.patch The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) 
{noformat} *JStack of AsyncDispatcher hanging on stop* {noformat} AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e waiting on condition [0x7fb9654e9000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x000700b79250 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:113) at java.lang.Thread.run(Thread.java:744) main prio=10 tid=0x7fb98000a800 nid=0x49c3 in Object.wait() [0x7fb989851000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at
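The hang described above comes from serviceStop() waiting unconditionally for a queue that will never drain once the handler thread has been interrupted. A minimal sketch of a safer wait, using hypothetical names rather than the actual AsyncDispatcher fields, bounds the wait and gives up when the handler thread is no longer alive:
{code}
import java.util.concurrent.BlockingQueue;

public final class BoundedDrainSketch {
  /** Wait for the queue to drain, but never longer than timeoutMs, and stop if the handler died. */
  static void awaitDrained(BlockingQueue<?> eventQueue, Thread eventHandlingThread,
      long timeoutMs) throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (!eventQueue.isEmpty()
        && eventHandlingThread.isAlive()
        && System.currentTimeMillis() < deadline) {
      Thread.sleep(100);   // re-check periodically instead of blocking forever
    }
  }

  private BoundedDrainSketch() {
  }
}
{code}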
[jira] [Commented] (YARN-1449) AM-NM protocol changes to support container resizing
[ https://issues.apache.org/jira/browse/YARN-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14624818#comment-14624818 ] MENG DING commented on YARN-1449: - This patch will not build by itself, as it has dependency on YARN-3866 (in particular, {{IncreaseContainersResourceRequestProto}}, {{IncreaseContainersResourceResponseProto}}). It is very difficult to cleanly separate out each patch. Currently I generate a big patch and split it into multiple ones based on files. If YARN-3866 passes initial review, maybe I can combine YARN-3866 and YARN-1449 into one patch and submit that for pre-commit build? AM-NM protocol changes to support container resizing Key: YARN-1449 URL: https://issues.apache.org/jira/browse/YARN-1449 Project: Hadoop YARN Issue Type: Sub-task Components: api Reporter: Wangda Tan (No longer used) Assignee: MENG DING Attachments: YARN-1449.1.patch, YARN-1449.2.patch, YARN-1449.3.patch, yarn-1449.1.patch, yarn-1449.3.patch, yarn-1449.4.patch, yarn-1449.5.patch AM-NM protocol changes to support container resizing 1) IncreaseContainersResourceRequest and IncreaseContainersResourceResponse PB protocol and implementation 2) increaseContainersResources method in ContainerManagementProtocol 3) Update ContainerStatus protocol to include Resource 4) Relevant test cases -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-3878: --- Attachment: (was: YARN-3878-addendum.patch) AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Fix For: 2.7.2 Attachments: YARN-3878.01.patch, YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, YARN-3878.06.patch, YARN-3878.07.patch, YARN-3878.08.patch The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) {noformat} *JStack of AsyncDispatcher hanging on stop* {noformat} 
AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e waiting on condition [0x7fb9654e9000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x000700b79250 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:113) at java.lang.Thread.run(Thread.java:744) main prio=10 tid=0x7fb98000a800 nid=0x49c3 in Object.wait() [0x7fb989851000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on 0x000700b79430 (a
[jira] [Commented] (YARN-3644) Node manager shuts down if unable to connect with RM
[ https://issues.apache.org/jira/browse/YARN-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14624909#comment-14624909 ] Varun Vasudev commented on YARN-3644: - [~raju.bairishetti] the latest patch also conflicts with a recent commit in NodeStatusUpdaterImpl. Can you please check it out? Thanks! Node manager shuts down if unable to connect with RM Key: YARN-3644 URL: https://issues.apache.org/jira/browse/YARN-3644 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Srikanth Sundarrajan Assignee: Raju Bairishetti Attachments: YARN-3644.001.patch, YARN-3644.001.patch, YARN-3644.002.patch, YARN-3644.003.patch, YARN-3644.patch When the NM is unable to connect to the RM, the NM shuts itself down. {code} } catch (ConnectException e) { //catch and throw the exception if tried MAX wait time to connect RM dispatcher.getEventHandler().handle( new NodeManagerEvent(NodeManagerEventType.SHUTDOWN)); throw new YarnRuntimeException(e); {code} In large clusters, if the RM is down for maintenance for a longer period, all the NMs shut themselves down, requiring additional work to bring up the NMs. Setting yarn.resourcemanager.connect.wait-ms to -1 has other side effects, where non-connection failures are retried infinitely by all YarnClients (via RMProxy). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
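A sketch of the behaviour being asked for, with a hypothetical helper rather than NodeStatusUpdaterImpl itself: keep retrying the RM connection with a fixed backoff and only report failure (leaving the shutdown decision to the caller) once an optional cap on the total wait is exceeded.
{code}
import java.net.ConnectException;

public final class RmReconnectSketch {
  interface RmCall {
    void run() throws ConnectException;
  }

  /** maxWaitMs < 0 means retry forever. Returns true once the call succeeds. */
  static boolean callWithRetry(RmCall call, long retryIntervalMs, long maxWaitMs)
      throws InterruptedException {
    long waited = 0;
    while (true) {
      try {
        call.run();
        return true;
      } catch (ConnectException e) {
        if (maxWaitMs >= 0 && waited >= maxWaitMs) {
          return false;            // caller decides whether to shut the NM down
        }
        Thread.sleep(retryIntervalMs);
        waited += retryIntervalMs;
      }
    }
  }

  private RmReconnectSketch() {
  }
}
{code}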
[jira] [Commented] (YARN-3069) Document missing properties in yarn-default.xml
[ https://issues.apache.org/jira/browse/YARN-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14624805#comment-14624805 ] Hudson commented on YARN-3069: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2201 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2201/]) YARN-3069. Document missing properties in yarn-default.xml. Contributed by Ray Chiang. (aajisaka: rev d6675606dc5f141c9af4f76a37128f8de4cfedad) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/conf/TestYarnConfigurationFields.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/CHANGES.txt * hadoop-common-project/hadoop-common/src/site/markdown/DeprecatedProperties.md Document missing properties in yarn-default.xml --- Key: YARN-3069 URL: https://issues.apache.org/jira/browse/YARN-3069 Project: Hadoop YARN Issue Type: Improvement Components: documentation Reporter: Ray Chiang Assignee: Ray Chiang Labels: supportability Fix For: 2.8.0 Attachments: YARN-3069.001.patch, YARN-3069.002.patch, YARN-3069.003.patch, YARN-3069.004.patch, YARN-3069.005.patch, YARN-3069.006.patch, YARN-3069.007.patch, YARN-3069.008.patch, YARN-3069.009.patch, YARN-3069.010.patch, YARN-3069.011.patch, YARN-3069.012.patch, YARN-3069.013.patch The following properties are currently not defined in yarn-default.xml. These properties should either be A) documented in yarn-default.xml OR B) listed as an exception (with comments, e.g. for internal use) in the TestYarnConfigurationFields unit test Any comments for any of the properties below are welcome. org.apache.hadoop.yarn.server.sharedcachemanager.RemoteAppChecker org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore security.applicationhistory.protocol.acl yarn.app.container.log.backups yarn.app.container.log.dir yarn.app.container.log.filesize yarn.client.app-submission.poll-interval yarn.client.application-client-protocol.poll-timeout-ms yarn.is.minicluster yarn.log.server.url yarn.minicluster.control-resource-monitoring yarn.minicluster.fixed.ports yarn.minicluster.use-rpc yarn.node-labels.fs-store.retry-policy-spec yarn.node-labels.fs-store.root-dir yarn.node-labels.manager-class yarn.nodemanager.container-executor.os.sched.priority.adjustment yarn.nodemanager.container-monitor.process-tree.class yarn.nodemanager.disk-health-checker.enable yarn.nodemanager.docker-container-executor.image-name yarn.nodemanager.linux-container-executor.cgroups.delete-timeout-ms yarn.nodemanager.linux-container-executor.group yarn.nodemanager.log.deletion-threads-count yarn.nodemanager.user-home-dir yarn.nodemanager.webapp.https.address yarn.nodemanager.webapp.spnego-keytab-file yarn.nodemanager.webapp.spnego-principal yarn.nodemanager.windows-secure-container-executor.group yarn.resourcemanager.configuration.file-system-based-store yarn.resourcemanager.delegation-token-renewer.thread-count yarn.resourcemanager.delegation.key.update-interval yarn.resourcemanager.delegation.token.max-lifetime yarn.resourcemanager.delegation.token.renew-interval yarn.resourcemanager.history-writer.multi-threaded-dispatcher.pool-size yarn.resourcemanager.metrics.runtime.buckets yarn.resourcemanager.nm-tokens.master-key-rolling-interval-secs yarn.resourcemanager.reservation-system.class yarn.resourcemanager.reservation-system.enable yarn.resourcemanager.reservation-system.plan.follower yarn.resourcemanager.reservation-system.planfollower.time-step 
yarn.resourcemanager.rm.container-allocation.expiry-interval-ms yarn.resourcemanager.webapp.spnego-keytab-file yarn.resourcemanager.webapp.spnego-principal yarn.scheduler.include-port-in-node-name yarn.timeline-service.delegation.key.update-interval yarn.timeline-service.delegation.token.max-lifetime yarn.timeline-service.delegation.token.renew-interval yarn.timeline-service.generic-application-history.enabled yarn.timeline-service.generic-application-history.fs-history-store.compression-type yarn.timeline-service.generic-application-history.fs-history-store.uri yarn.timeline-service.generic-application-history.store-class yarn.timeline-service.http-cross-origin.enabled yarn.tracking.url.generator -- This message was sent by Atlassian JIRA (v6.3.4#6332)
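For reference, the kind of check involved can be sketched with plain JDK XML parsing: collect every property name declared in yarn-default.xml so it can be diffed against the constants in YarnConfiguration. This is a standalone illustration, not the actual TestYarnConfigurationFields code.
{code}
import java.io.InputStream;
import java.util.HashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public final class DeclaredPropertiesSketch {
  /** Names of all <name> elements declared in yarn-default.xml (assumed to be on the classpath). */
  static Set<String> declaredProperties() throws Exception {
    Set<String> names = new HashSet<String>();
    try (InputStream in = DeclaredPropertiesSketch.class.getClassLoader()
        .getResourceAsStream("yarn-default.xml")) {
      Document doc = DocumentBuilderFactory.newInstance()
          .newDocumentBuilder().parse(in);
      NodeList nodes = doc.getElementsByTagName("name");
      for (int i = 0; i < nodes.getLength(); i++) {
        names.add(nodes.item(i).getTextContent().trim());
      }
    }
    return names;
  }

  private DeclaredPropertiesSketch() {
  }
}
{code}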
[jira] [Commented] (YARN-3381) Fix typo InvalidStateTransitonException
[ https://issues.apache.org/jira/browse/YARN-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14624808#comment-14624808 ] Hudson commented on YARN-3381: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2201 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2201/]) YARN-3381. Fix typo InvalidStateTransitonException. Contributed by Brahma Reddy Battula. (aajisaka: rev 19295b36d90e26616accee73b1f7743aab5df692) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/application/ApplicationImpl.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/JobImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalizedResource.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/state/InvalidStateTransitionException.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainerImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/state/StateMachine.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/state/InvalidStateTransitonException.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/async/impl/NMClientAsyncImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/state/StateMachineFactory.java Fix typo InvalidStateTransitonException --- Key: YARN-3381 URL: https://issues.apache.org/jira/browse/YARN-3381 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.6.0 Reporter: Xiaoshuang LU Assignee: Brahma Reddy Battula Priority: Minor Fix For: 2.8.0 Attachments: YARN-3381-002.patch, YARN-3381-003.patch, YARN-3381-004-branch-2.patch, YARN-3381-004.patch, YARN-3381-005.patch, YARN-3381-006.patch, YARN-3381-007.patch, YARN-3381-008.patch, YARN-3381-010.patch, YARN-3381-011.patch, 
YARN-3381.patch Appears that InvalidStateTransitonException should be InvalidStateTransitionException. Transition was misspelled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3894) RM startup should fail for wrong CS xml NodeLabel capacity configuration
[ https://issues.apache.org/jira/browse/YARN-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14624807#comment-14624807 ] Hudson commented on YARN-3894: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2201 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2201/]) YARN-3894. RM startup should fail for wrong CS xml NodeLabel capacity configuration. (Bibin A Chundatt via wangda) (wangda: rev 5ed1fead6b5ec24bb0ce1a3db033c2ee1ede4bb4) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestQueueParsing.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java RM startup should fail for wrong CS xml NodeLabel capacity configuration - Key: YARN-3894 URL: https://issues.apache.org/jira/browse/YARN-3894 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Critical Fix For: 2.8.0 Attachments: 0001-YARN-3894.patch, 0002-YARN-3894.patch, capacity-scheduler.xml Currently in Capacity Scheduler, when the capacity configuration is wrong the RM will shut down, but not in case of a NodeLabels capacity mismatch. In {{CapacityScheduler#initializeQueues}} {code} private void initializeQueues(CapacitySchedulerConfiguration conf) throws IOException { root = parseQueue(this, conf, null, CapacitySchedulerConfiguration.ROOT, queues, queues, noop); labelManager.reinitializeQueueLabels(getQueueToLabels()); root = parseQueue(this, conf, null, CapacitySchedulerConfiguration.ROOT, queues, queues, noop); LOG.info("Initialized root queue " + root); initializeQueueMappings(); setQueueAcls(authorizer, queues); } {code} {{labelManager}} is initialized from the queues, and the calculation for label-level capacity mismatch happens in {{parseQueue}}. So during the initial {{parseQueue}} the labels will be empty. *Steps to reproduce* # Configure RM with capacity scheduler # Add one or two node labels from rmadmin # Configure the capacity xml with the nodelabel, but with a wrong capacity configuration for the already added label # Restart both RMs # Check that on service init of the capacity scheduler the node label list is populated *Expected* RM should not start *Current exception on reinitialize check* {code} 2015-07-07 19:18:25,655 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Initialized queue: default: capacity=0.5, absoluteCapacity=0.5, usedResources=<memory:0, vCores:0>, usedCapacity=0.0, absoluteUsedCapacity=0.0, numApps=0, numContainers=0 2015-07-07 19:18:25,656 WARN org.apache.hadoop.yarn.server.resourcemanager.AdminService: Exception refresh queues. 
java.io.IOException: Failed to re-init queues at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:383) at org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshQueues(AdminService.java:376) at org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:605) at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:314) at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126) at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:824) at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:420) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) Caused by: java.lang.IllegalArgumentException: Illegal capacity of 0.5 for children of queue root for label=node2 at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.setChildQueues(ParentQueue.java:159) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:639) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitializeQueues(CapacityScheduler.java:503) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:379) ... 8 more 2015-07-07 19:18:25,656 WARN
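The validation being requested can be pictured with a small sketch (hypothetical types, not ParentQueue): when parsing a parent queue, the children's configured capacities are checked per accessible node label, not just for the default partition, and the RM fails fast on a mismatch.
{code}
import java.util.List;
import java.util.Map;

public final class LabelCapacityCheckSketch {
  private static final float EPSILON = 0.0001f;

  /** childCapacitiesByLabel: label -> configured capacities (as fractions) of each child queue. */
  static void validate(String parentQueue, Map<String, List<Float>> childCapacitiesByLabel) {
    for (Map.Entry<String, List<Float>> e : childCapacitiesByLabel.entrySet()) {
      float sum = 0f;
      for (float capacity : e.getValue()) {
        sum += capacity;
      }
      if (Math.abs(sum - 1.0f) > EPSILON) {
        // mirrors the message seen in the stack trace above
        throw new IllegalArgumentException("Illegal capacity of " + sum
            + " for children of queue " + parentQueue + " for label=" + e.getKey());
      }
    }
  }

  private LabelCapacityCheckSketch() {
  }
}
{code}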
[jira] [Updated] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-3878: --- Attachment: YARN-3878-addendum.patch AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Fix For: 2.7.2 Attachments: YARN-3878-addendum.patch, YARN-3878.01.patch, YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, YARN-3878.06.patch, YARN-3878.07.patch, YARN-3878.08.patch The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) {noformat} *JStack of AsyncDispatcher hanging on stop* 
{noformat} AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e waiting on condition [0x7fb9654e9000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x000700b79250 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:113) at java.lang.Thread.run(Thread.java:744) main prio=10 tid=0x7fb98000a800 nid=0x49c3 in Object.wait() [0x7fb989851000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14624834#comment-14624834 ] Varun Saxena commented on YARN-3878: Should I reopen the issue ? AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Fix For: 2.7.2 Attachments: YARN-3878-addendum.patch, YARN-3878.01.patch, YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, YARN-3878.06.patch, YARN-3878.07.patch, YARN-3878.08.patch The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) {noformat} *JStack 
of AsyncDispatcher hanging on stop* {noformat} AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e waiting on condition [0x7fb9654e9000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x000700b79250 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:113) at java.lang.Thread.run(Thread.java:744) main prio=10 tid=0x7fb98000a800 nid=0x49c3 in Object.wait() [0x7fb989851000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at
[jira] [Updated] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.
[ https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He updated YARN-1680: -- Assignee: Tan, Wangda (was: Chen He) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory. -- Key: YARN-1680 URL: https://issues.apache.org/jira/browse/YARN-1680 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Affects Versions: 2.2.0, 2.3.0 Environment: SuSE 11 SP2 + Hadoop-2.3 Reporter: Rohith Sharma K S Assignee: Tan, Wangda Attachments: YARN-1680-WIP.patch, YARN-1680-v2.patch, YARN-1680-v2.patch, YARN-1680.patch There are 4 NodeManagers with 8GB each. Total cluster capacity is 32GB. Cluster slow start is set to 1. A job is running and its reducer tasks occupied 29GB of the cluster. One NodeManager (NM-4) became unstable (3 map tasks got killed), and the MRAppMaster blacklisted the unstable NodeManager (NM-4). All reducer tasks are running in the cluster now. MRAppMaster does not preempt the reducers because, for the reducer preemption calculation, the headRoom is considering the blacklisted nodes' memory. This makes jobs hang forever (the ResourceManager does not assign any new containers on blacklisted nodes but returns an availableResources value that still considers the cluster free memory). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
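The headroom adjustment being proposed can be sketched as follows (a hypothetical helper, not the scheduler code): before reporting availableResources to the AM, subtract the free capacity sitting on nodes that the application has blacklisted, so the MRAppMaster's reducer-preemption math is not fooled by memory it can never be allocated.
{code}
import java.util.Collection;

public final class BlacklistAwareHeadroomSketch {
  static final class NodeFree {
    final long freeMemMb;
    final int freeVcores;
    NodeFree(long freeMemMb, int freeVcores) {
      this.freeMemMb = freeMemMb;
      this.freeVcores = freeVcores;
    }
  }

  /** Returns {memoryMb, vcores} headroom after excluding blacklisted nodes, clamped at zero. */
  static long[] adjustedHeadroom(long headroomMemMb, int headroomVcores,
      Collection<NodeFree> blacklistedNodes) {
    long mem = headroomMemMb;
    long vcores = headroomVcores;
    for (NodeFree node : blacklistedNodes) {
      mem -= node.freeMemMb;
      vcores -= node.freeVcores;
    }
    return new long[] { Math.max(0L, mem), Math.max(0L, vcores) };
  }

  private BlacklistAwareHeadroomSketch() {
  }
}
{code}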
[jira] [Commented] (YARN-1645) ContainerManager implementation to support container resizing
[ https://issues.apache.org/jira/browse/YARN-1645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14624921#comment-14624921 ] MENG DING commented on YARN-1645: - Thanks for the review [~jianhe] ! bq. This check should not be needed, because AM should be able to resize an existing container no matter RM restarted or not. I have some concerns regarding this that I hope to get some clarification on. According to the work-preserving RM restart documentation (http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerRestart.html): bq. RM recovers its running state by taking advantage of the container statuses sent from all NMs. NM will not kill the containers when it re-syncs with the restarted RM. It continues managing the containers and sends the container statuses across to RM when it re-registers. RM reconstructs the container instances and the associated applications’ scheduling status by absorbing these containers’ information Consider this scenario: * RM approves a container resource increase request and sends an increase token to AM. * Before AM actually increases the resource on NM, RM crashes and then restarts. Because of the work-preserving recovery, RM re-constructs the container resource based on the information sent by NM, and it is still the old resource allocation for the container before the increase. * Now AM does the increase action on NM. If NM doesn't reject this, it will start to enforce the container with the increased resource. Now the views of resource allocation between RM and NM are inconsistent. Thoughts? bq. A lot of code is duplicated between authorizeStartRequest and authorizeResourceIncreaseRequest - could you refactor the code to share the same code? Will do bq. Portion of the code belongs to YARN-1644 and the patch won't compile. This is the same situation as with YARN-1449. Everything is intertwined :-( May need to combine everything into a big patch to submit for the Jenkins build. ContainerManager implementation to support container resizing - Key: YARN-1645 URL: https://issues.apache.org/jira/browse/YARN-1645 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Wangda Tan Assignee: MENG DING Attachments: YARN-1645.1.patch, YARN-1645.2.patch, yarn-1645.1.patch Implementation of ContainerManager for container resize, including: 1) ContainerManager resize logic 2) Relevant test cases -- This message was sent by Atlassian JIRA (v6.3.4#6332)
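The NM-side check being debated can be sketched along these lines (hypothetical names, not the actual ContainerManagerImpl change): an increase is accepted only while its token is valid and its target resource is at least what the NM currently enforces, so a stale token from before an RM restart cannot silently push the NM out of sync with the RM's recovered view.
{code}
public final class IncreaseAuthorizationSketch {
  static void authorizeIncrease(long nowMs, long tokenExpiryMs,
      long targetMemMb, int targetVcores,
      long currentMemMb, int currentVcores) {
    if (nowMs > tokenExpiryMs) {
      throw new IllegalArgumentException("Container resource increase token has expired");
    }
    if (targetMemMb < currentMemMb || targetVcores < currentVcores) {
      // rejecting here keeps the NM's enforced resource consistent with the RM's view
      throw new IllegalArgumentException("Target resource <" + targetMemMb + " MB, "
          + targetVcores + " vcores> is smaller than the currently allocated resource");
    }
  }

  private IncreaseAuthorizationSketch() {
  }
}
{code}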
[jira] [Commented] (YARN-3894) RM startup should fail for wrong CS xml NodeLabel capacity configuration
[ https://issues.apache.org/jira/browse/YARN-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14624814#comment-14624814 ] Hudson commented on YARN-3894: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #253 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/253/]) YARN-3894. RM startup should fail for wrong CS xml NodeLabel capacity configuration. (Bibin A Chundatt via wangda) (wangda: rev 5ed1fead6b5ec24bb0ce1a3db033c2ee1ede4bb4) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestQueueParsing.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java RM startup should fail for wrong CS xml NodeLabel capacity configuration - Key: YARN-3894 URL: https://issues.apache.org/jira/browse/YARN-3894 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Critical Fix For: 2.8.0 Attachments: 0001-YARN-3894.patch, 0002-YARN-3894.patch, capacity-scheduler.xml Currently in Capacity Scheduler, when the capacity configuration is wrong the RM will shut down, but not in case of a NodeLabels capacity mismatch. In {{CapacityScheduler#initializeQueues}} {code} private void initializeQueues(CapacitySchedulerConfiguration conf) throws IOException { root = parseQueue(this, conf, null, CapacitySchedulerConfiguration.ROOT, queues, queues, noop); labelManager.reinitializeQueueLabels(getQueueToLabels()); root = parseQueue(this, conf, null, CapacitySchedulerConfiguration.ROOT, queues, queues, noop); LOG.info("Initialized root queue " + root); initializeQueueMappings(); setQueueAcls(authorizer, queues); } {code} {{labelManager}} is initialized from the queues, and the calculation for label-level capacity mismatch happens in {{parseQueue}}. So during the initial {{parseQueue}} the labels will be empty. *Steps to reproduce* # Configure RM with capacity scheduler # Add one or two node labels from rmadmin # Configure the capacity xml with the nodelabel, but with a wrong capacity configuration for the already added label # Restart both RMs # Check that on service init of the capacity scheduler the node label list is populated *Expected* RM should not start *Current exception on reinitialize check* {code} 2015-07-07 19:18:25,655 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Initialized queue: default: capacity=0.5, absoluteCapacity=0.5, usedResources=<memory:0, vCores:0>, usedCapacity=0.0, absoluteUsedCapacity=0.0, numApps=0, numContainers=0 2015-07-07 19:18:25,656 WARN org.apache.hadoop.yarn.server.resourcemanager.AdminService: Exception refresh queues. 
java.io.IOException: Failed to re-init queues at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:383) at org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshQueues(AdminService.java:376) at org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:605) at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:314) at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126) at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:824) at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:420) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) Caused by: java.lang.IllegalArgumentException: Illegal capacity of 0.5 for children of queue root for label=node2 at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.setChildQueues(ParentQueue.java:159) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:639) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitializeQueues(CapacityScheduler.java:503) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:379) ... 8 more 2015-07-07 19:18:25,656 WARN
[jira] [Commented] (YARN-3069) Document missing properties in yarn-default.xml
[ https://issues.apache.org/jira/browse/YARN-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14624812#comment-14624812 ] Hudson commented on YARN-3069: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #253 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/253/]) YARN-3069. Document missing properties in yarn-default.xml. Contributed by Ray Chiang. (aajisaka: rev d6675606dc5f141c9af4f76a37128f8de4cfedad) * hadoop-common-project/hadoop-common/src/site/markdown/DeprecatedProperties.md * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/conf/TestYarnConfigurationFields.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/CHANGES.txt Document missing properties in yarn-default.xml --- Key: YARN-3069 URL: https://issues.apache.org/jira/browse/YARN-3069 Project: Hadoop YARN Issue Type: Improvement Components: documentation Reporter: Ray Chiang Assignee: Ray Chiang Labels: supportability Fix For: 2.8.0 Attachments: YARN-3069.001.patch, YARN-3069.002.patch, YARN-3069.003.patch, YARN-3069.004.patch, YARN-3069.005.patch, YARN-3069.006.patch, YARN-3069.007.patch, YARN-3069.008.patch, YARN-3069.009.patch, YARN-3069.010.patch, YARN-3069.011.patch, YARN-3069.012.patch, YARN-3069.013.patch The following properties are currently not defined in yarn-default.xml. These properties should either be A) documented in yarn-default.xml OR B) listed as an exception (with comments, e.g. for internal use) in the TestYarnConfigurationFields unit test Any comments for any of the properties below are welcome. org.apache.hadoop.yarn.server.sharedcachemanager.RemoteAppChecker org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore security.applicationhistory.protocol.acl yarn.app.container.log.backups yarn.app.container.log.dir yarn.app.container.log.filesize yarn.client.app-submission.poll-interval yarn.client.application-client-protocol.poll-timeout-ms yarn.is.minicluster yarn.log.server.url yarn.minicluster.control-resource-monitoring yarn.minicluster.fixed.ports yarn.minicluster.use-rpc yarn.node-labels.fs-store.retry-policy-spec yarn.node-labels.fs-store.root-dir yarn.node-labels.manager-class yarn.nodemanager.container-executor.os.sched.priority.adjustment yarn.nodemanager.container-monitor.process-tree.class yarn.nodemanager.disk-health-checker.enable yarn.nodemanager.docker-container-executor.image-name yarn.nodemanager.linux-container-executor.cgroups.delete-timeout-ms yarn.nodemanager.linux-container-executor.group yarn.nodemanager.log.deletion-threads-count yarn.nodemanager.user-home-dir yarn.nodemanager.webapp.https.address yarn.nodemanager.webapp.spnego-keytab-file yarn.nodemanager.webapp.spnego-principal yarn.nodemanager.windows-secure-container-executor.group yarn.resourcemanager.configuration.file-system-based-store yarn.resourcemanager.delegation-token-renewer.thread-count yarn.resourcemanager.delegation.key.update-interval yarn.resourcemanager.delegation.token.max-lifetime yarn.resourcemanager.delegation.token.renew-interval yarn.resourcemanager.history-writer.multi-threaded-dispatcher.pool-size yarn.resourcemanager.metrics.runtime.buckets yarn.resourcemanager.nm-tokens.master-key-rolling-interval-secs yarn.resourcemanager.reservation-system.class yarn.resourcemanager.reservation-system.enable yarn.resourcemanager.reservation-system.plan.follower yarn.resourcemanager.reservation-system.planfollower.time-step 
yarn.resourcemanager.rm.container-allocation.expiry-interval-ms yarn.resourcemanager.webapp.spnego-keytab-file yarn.resourcemanager.webapp.spnego-principal yarn.scheduler.include-port-in-node-name yarn.timeline-service.delegation.key.update-interval yarn.timeline-service.delegation.token.max-lifetime yarn.timeline-service.delegation.token.renew-interval yarn.timeline-service.generic-application-history.enabled yarn.timeline-service.generic-application-history.fs-history-store.compression-type yarn.timeline-service.generic-application-history.fs-history-store.uri yarn.timeline-service.generic-application-history.store-class yarn.timeline-service.http-cross-origin.enabled yarn.tracking.url.generator -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3535) ResourceRequest should be restored back to scheduler when RMContainer is killed at ALLOCATED
[ https://issues.apache.org/jira/browse/YARN-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625238#comment-14625238 ] Arun Suresh commented on YARN-3535: --- Thanks for working on this [~peng.zhang]. We seem to be hitting this on our scale clusters as well, so it would be good to get this in soon. In our case the NM re-registration was caused by YARN-3842. The patch looks good to me. Any idea why the tests failed? ResourceRequest should be restored back to scheduler when RMContainer is killed at ALLOCATED - Key: YARN-3535 URL: https://issues.apache.org/jira/browse/YARN-3535 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Peng Zhang Assignee: Peng Zhang Priority: Critical Attachments: 0003-YARN-3535.patch, YARN-3535-001.patch, YARN-3535-002.patch, syslog.tgz, yarn-app.log During a rolling update of the NM, the AM's start of a container on that NM failed, and the job then hung. AM logs are attached. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3844) Make hadoop-yarn-project Native code -Wall-clean
[ https://issues.apache.org/jira/browse/YARN-3844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625339#comment-14625339 ] Colin Patrick McCabe commented on YARN-3844: {code} snprintf(pid_buf, sizeof(pid_buf), "%ld", (int64_t)pid); {code} This will generate a warning on 32-bit platforms, where {{long}} is only 32 bits and does not match {{int64_t}}. Make hadoop-yarn-project Native code -Wall-clean Key: YARN-3844 URL: https://issues.apache.org/jira/browse/YARN-3844 Project: Hadoop YARN Issue Type: Sub-task Components: build Affects Versions: 2.7.0 Environment: As we specify -Wall as a default compilation flag, it would be helpful if the Native code was -Wall-clean Reporter: Alan Burlison Assignee: Alan Burlison Attachments: YARN-3844.001.patch, YARN-3844.002.patch, YARN-3844.006.patch As we specify -Wall as a default compilation flag, it would be helpful if the Native code was -Wall-clean -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3885) ProportionalCapacityPreemptionPolicy doesn't preempt if queue is more than 2 level
[ https://issues.apache.org/jira/browse/YARN-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625302#comment-14625302 ] Wangda Tan commented on YARN-3885: -- Increased to blocker since this is a bad bug. Thanks for working on this [~ajithshetty]. I think the patch looks generally good except one thing also mentioned by [~xinxianyin]: bq. for non-leaf queues, min(sum of children's PREEMPTABLEs, extra) - (this is what CHILDRENPREEMPTABLE does in the patch) Setting {{ret.preemptableExtra = childrensPreemptable;}} does not seem to be enough. Could you add a test to verify that when sum(queue.children.preemptable) > extra, we still get the correct preemption result? ProportionalCapacityPreemptionPolicy doesn't preempt if queue is more than 2 level -- Key: YARN-3885 URL: https://issues.apache.org/jira/browse/YARN-3885 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 2.8.0 Reporter: Ajith S Assignee: Ajith S Priority: Blocker Attachments: YARN-3885.02.patch, YARN-3885.03.patch, YARN-3885.04.patch, YARN-3885.patch In {{ProportionalCapacityPreemptionPolicy.cloneQueues}}, the piece of code that calculates {{untouchable}} doesn't consider all the children; it considers only the immediate children. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
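The rule quoted in the comment above can be written down concretely. The following is a minimal, illustrative sketch of min(sum of children's preemptable, extra), not the actual ProportionalCapacityPreemptionPolicy code; the class, method, and variable names are assumptions made only for this example.
{code:title=PreemptableSketch.java|borderStyle=solid}
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: a parent queue's preemptable "extra" is capped both by its
// own extra (usage above guaranteed capacity) and by the sum of its children's
// preemptable amounts.
public final class PreemptableSketch {

  // ownExtra: usage above the queue's guaranteed capacity.
  // childrenPreemptable: preemptable amounts already computed for the child queues.
  public static long preemptableExtra(long ownExtra, List<Long> childrenPreemptable) {
    long childrenSum = 0L;
    for (long child : childrenPreemptable) {
      childrenSum += child;
    }
    // min(sum of children's preemptable, extra): the interesting case is
    // childrenSum > ownExtra, where the parent must still be capped at its own extra.
    return Math.min(childrenSum, ownExtra);
  }

  public static void main(String[] args) {
    // children could give up 6 in total, but the parent is only 4 over its guarantee
    System.out.println(preemptableExtra(4L, Arrays.asList(2L, 4L))); // prints 4
  }
}
{code}
The test asked for in the comment would exercise exactly the case in {{main}}, where the children's sum exceeds the parent's own extra.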
[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refreshAll()
[ https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625304#comment-14625304 ] Bibin A Chundatt commented on YARN-3893: [~varun_saxena] and [~sunilg], we only need to call {{rm.transitionToStandby(false)}} on exception, since it handles the transition to standby in the RM context, stops the active services, and does not reinitialize the queues. Both RM in active state when Admin#transitionToActive failure from refreshAll() -- Key: YARN-3893 URL: https://issues.apache.org/jira/browse/YARN-3893 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Critical Attachments: yarn-site.xml Cases that can cause this: # Capacity scheduler xml is wrongly configured during switch # Refresh ACL failure due to configuration # Refresh User group failure due to configuration Both RMs will then continuously try to become active: {code} dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin ./yarn rmadmin -getServiceState rm1 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable active dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin ./yarn rmadmin -getServiceState rm2 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable active {code} # Both Web UIs are active # Status shown as active for both RMs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
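To make the suggested handling concrete, here is a minimal sketch of the idea in the comment above, not the actual AdminService/ResourceManager code. Only the call to {{rm.transitionToStandby(false)}} on failure is taken from the comment; the interface and the {{refreshAll()}} placeholder are assumptions for illustration.
{code:title=TransitionSketch.java|borderStyle=solid}
// Illustrative sketch: if refreshing configuration fails after the RM has started its
// active services, roll back to standby instead of leaving both RMs in active state.
public final class TransitionSketch {

  // Hypothetical stand-in for the real ResourceManager; only the two calls used below.
  interface ResourceManagerLike {
    void transitionToActive() throws Exception;
    void transitionToStandby(boolean reinitialize) throws Exception;
  }

  private final ResourceManagerLike rm;

  TransitionSketch(ResourceManagerLike rm) {
    this.rm = rm;
  }

  void transitionToActive() throws Exception {
    rm.transitionToActive();      // start active services
    try {
      refreshAll();               // refresh queues, ACLs, user-group mappings, ...
    } catch (Exception e) {
      // Roll back so only one RM stays active: stop the active services via the RM
      // context and do not reinitialize the queues.
      rm.transitionToStandby(false);
      throw e;
    }
  }

  private void refreshAll() throws Exception {
    // placeholder for the configuration refreshes that can fail (see the cases above)
  }
}
{code}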
[jira] [Created] (YARN-3920) FairScheduler Reserving a node for a container should be configurable to allow it to be used only for large containers
Anubhav Dhoot created YARN-3920: --- Summary: FairScheduler Reserving a node for a container should be configurable to allow it to be used only for large containers Key: YARN-3920 URL: https://issues.apache.org/jira/browse/YARN-3920 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Reserving a node for a container was designed to prevent large containers from being starved by small requests that keep landing on a node. Today we let this be used even for small container requests. This has a huge impact on scheduling, since we block other scheduling requests until that reservation is fulfilled. We should make this configurable so its impact can be minimized by limiting it to large container requests, as originally intended. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
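As an illustration of "limiting it to large container requests", here is a minimal sketch of such a threshold check. It is not FairScheduler code; the class name, the threshold semantics, and the idea of expressing the threshold in MB are all assumptions made for this example.
{code:title=ReservationThresholdSketch.java|borderStyle=solid}
// Illustrative sketch: only fall back to reserving a node when the pending request is
// "large" relative to a configurable threshold; small requests simply wait for another
// node with enough free capacity instead of blocking one.
public final class ReservationThresholdSketch {

  private final long reservationThresholdMb;  // hypothetical configurable threshold

  public ReservationThresholdSketch(long reservationThresholdMb) {
    this.reservationThresholdMb = reservationThresholdMb;
  }

  /** Reserve only if the request does not fit now and is large enough to risk starvation. */
  public boolean shouldReserve(long requestedMb, long nodeFreeMb) {
    boolean doesNotFitNow = requestedMb > nodeFreeMb;
    boolean isLargeRequest = requestedMb >= reservationThresholdMb;
    return doesNotFitNow && isLargeRequest;
  }

  public static void main(String[] args) {
    ReservationThresholdSketch policy = new ReservationThresholdSketch(4096);
    System.out.println(policy.shouldReserve(512, 256));   // false: small request, no reservation
    System.out.println(policy.shouldReserve(8192, 2048)); // true: large request may reserve
  }
}
{code}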
[jira] [Commented] (YARN-3884) RMContainerImpl transition from RESERVED to KILL apphistory status not updated
[ https://issues.apache.org/jira/browse/YARN-3884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625307#comment-14625307 ] Bibin A Chundatt commented on YARN-3884: Please review patch attached RMContainerImpl transition from RESERVED to KILL apphistory status not updated -- Key: YARN-3884 URL: https://issues.apache.org/jira/browse/YARN-3884 Project: Hadoop YARN Issue Type: Bug Environment: Suse11 Sp3 Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Attachments: 0001-YARN-3884.patch, Apphistory Container Status.jpg, Elapsed Time.jpg, Test Result-Container status.jpg Setup === 1 NM 3072 16 cores each Steps to reproduce === 1.Submit apps to Queue 1 with 512 mb 1 core 2.Submit apps to Queue 2 with 512 mb and 5 core lots of containers get reserved and unreserved in this case {code} 2015-07-02 20:45:31,169 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_e24_1435849994778_0002_01_13 Container Transitioned from NEW to RESERVED 2015-07-02 20:45:31,170 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: Reserved container application=application_1435849994778_0002 resource=memory:512, vCores:5 queue=QueueA: capacity=0.4, absoluteCapacity=0.4, usedResources=memory:2560, vCores:21, usedCapacity=1.6410257, absoluteUsedCapacity=0.65625, numApps=1, numContainers=5 usedCapacity=1.6410257 absoluteUsedCapacity=0.65625 used=memory:2560, vCores:21 cluster=memory:6144, vCores:32 2015-07-02 20:45:31,170 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Re-sorting assigned queue: root.QueueA stats: QueueA: capacity=0.4, absoluteCapacity=0.4, usedResources=memory:3072, vCores:26, usedCapacity=2.0317461, absoluteUsedCapacity=0.8125, numApps=1, numContainers=6 2015-07-02 20:45:31,170 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: assignedContainer queue=root usedCapacity=0.96875 absoluteUsedCapacity=0.96875 used=memory:5632, vCores:31 cluster=memory:6144, vCores:32 2015-07-02 20:45:31,191 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_e24_1435849994778_0001_01_14 Container Transitioned from NEW to ALLOCATED 2015-07-02 20:45:31,191 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf OPERATION=AM Allocated ContainerTARGET=SchedulerApp RESULT=SUCCESS APPID=application_1435849994778_0001 CONTAINERID=container_e24_1435849994778_0001_01_14 2015-07-02 20:45:31,191 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode: Assigned container container_e24_1435849994778_0001_01_14 of capacity memory:512, vCores:1 on host host-10-19-92-117:64318, which has 6 containers, memory:3072, vCores:14 used and memory:0, vCores:2 available after allocation 2015-07-02 20:45:31,191 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: assignedContainer application attempt=appattempt_1435849994778_0001_01 container=Container: [ContainerId: container_e24_1435849994778_0001_01_14, NodeId: host-10-19-92-117:64318, NodeHttpAddress: host-10-19-92-117:65321, Resource: memory:512, vCores:1, Priority: 20, Token: null, ] queue=default: capacity=0.2, absoluteCapacity=0.2, usedResources=memory:2560, vCores:5, usedCapacity=2.0846906, absoluteUsedCapacity=0.4166, numApps=1, numContainers=5 clusterResource=memory:6144, vCores:32 2015-07-02 20:45:31,191 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Re-sorting assigned 
queue: root.default stats: default: capacity=0.2, absoluteCapacity=0.2, usedResources=memory:3072, vCores:6, usedCapacity=2.5016286, absoluteUsedCapacity=0.5, numApps=1, numContainers=6 2015-07-02 20:45:31,191 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: assignedContainer queue=root usedCapacity=1.0 absoluteUsedCapacity=1.0 used=memory:6144, vCores:32 cluster=memory:6144, vCores:32 2015-07-02 20:45:32,143 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_e24_1435849994778_0001_01_14 Container Transitioned from ALLOCATED to ACQUIRED 2015-07-02 20:45:32,174 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Trying to fulfill reservation for application application_1435849994778_0002 on node: host-10-19-92-143:64318 2015-07-02 20:45:32,174 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: Reserved container
[jira] [Commented] (YARN-3908) Bugs in HBaseTimelineWriterImpl
[ https://issues.apache.org/jira/browse/YARN-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625369#comment-14625369 ] Zhijie Shen commented on YARN-3908: --- [~vrushalic] and [~sjlee0], thanks for helping fix the problems. I have two questions: 1. In fact, I'm wondering if we should put info and events into a separate column family, like what we did for configs/metrics? 2. We don't want to store the metric type, do we? Bugs in HBaseTimelineWriterImpl --- Key: YARN-3908 URL: https://issues.apache.org/jira/browse/YARN-3908 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Vrushali C Attachments: YARN-3908-YARN-2928.001.patch, YARN-3908-YARN-2928.002.patch, YARN-3908-YARN-2928.003.patch 1. In HBaseTimelineWriterImpl, the info column family contains the basic fields of a timeline entity plus events. However, the entity#info map is not stored at all. 2. event#timestamp is also not persisted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3885) ProportionalCapacityPreemptionPolicy doesn't preempt if queue is more than 2 level
[ https://issues.apache.org/jira/browse/YARN-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3885: - Priority: Blocker (was: Critical) ProportionalCapacityPreemptionPolicy doesn't preempt if queue is more than 2 level -- Key: YARN-3885 URL: https://issues.apache.org/jira/browse/YARN-3885 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 2.8.0 Reporter: Ajith S Assignee: Ajith S Priority: Blocker Attachments: YARN-3885.02.patch, YARN-3885.03.patch, YARN-3885.04.patch, YARN-3885.patch In {{ProportionalCapacityPreemptionPolicy.cloneQueues}}, the piece of code that calculates {{untouchable}} doesn't consider all the children; it considers only the immediate children. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3866) AM-RM protocol changes to support container resizing
[ https://issues.apache.org/jira/browse/YARN-3866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] MENG DING updated YARN-3866: Attachment: YARN-3866-YARN-1197.4.patch YARN-3866.3.patch doesn't build by itself because the {{IncreaseContainersResourceRequest}} and {{IncreaseContainersResourceResponse}} in {{TestPBImplRecords.java}} are defined in YARN-1449. I have removed them from {{TestPBImplRecords.java}} and generated new patch YARN-3866-YARN-1197.4.patch. I will add them back in YARN-1449. AM-RM protocol changes to support container resizing Key: YARN-3866 URL: https://issues.apache.org/jira/browse/YARN-3866 Project: Hadoop YARN Issue Type: Sub-task Components: api Reporter: MENG DING Assignee: MENG DING Attachments: YARN-3866-YARN-1197.4.patch, YARN-3866.1.patch, YARN-3866.2.patch, YARN-3866.3.patch YARN-1447 and YARN-1448 are outdated. This ticket deals with AM-RM Protocol changes to support container resize according to the latest design in YARN-1197. 1) Add increase/decrease requests in AllocateRequest 2) Get approved increase/decrease requests from RM in AllocateResponse 3) Add relevant test cases -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1449) AM-NM protocol changes to support container resizing
[ https://issues.apache.org/jira/browse/YARN-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625331#comment-14625331 ] Wangda Tan commented on YARN-1449: -- [~mding], you can submit them together to trigger pre-commit build, I think YARN-3866 will go first. AM-NM protocol changes to support container resizing Key: YARN-1449 URL: https://issues.apache.org/jira/browse/YARN-1449 Project: Hadoop YARN Issue Type: Sub-task Components: api Reporter: Wangda Tan (No longer used) Assignee: MENG DING Attachments: YARN-1449.1.patch, YARN-1449.2.patch, YARN-1449.3.patch, yarn-1449.1.patch, yarn-1449.3.patch, yarn-1449.4.patch, yarn-1449.5.patch AM-NM protocol changes to support container resizing 1) IncreaseContainersResourceRequest and IncreaseContainersResourceResponse PB protocol and implementation 2) increaseContainersResources method in ContainerManagementProtocol 3) Update ContainerStatus protocol to include Resource 4) Relevant test cases -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3844) Make hadoop-yarn-project Native code -Wall-clean
[ https://issues.apache.org/jira/browse/YARN-3844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625371#comment-14625371 ] Alan Burlison commented on YARN-3844: - Oops yes, good catch - thanks. I'll change it to use PRId64 like the others, and I'll check to see if there are other instances elsewhere. Make hadoop-yarn-project Native code -Wall-clean Key: YARN-3844 URL: https://issues.apache.org/jira/browse/YARN-3844 Project: Hadoop YARN Issue Type: Sub-task Components: build Affects Versions: 2.7.0 Environment: As we specify -Wall as a default compilation flag, it would be helpful if the Native code was -Wall-clean Reporter: Alan Burlison Assignee: Alan Burlison Attachments: YARN-3844.001.patch, YARN-3844.002.patch, YARN-3844.006.patch As we specify -Wall as a default compilation flag, it would be helpful if the Native code was -Wall-clean -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3916) DrainDispatcher#await should wait till event has been completely handled
[ https://issues.apache.org/jira/browse/YARN-3916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14624773#comment-14624773 ] Varun Saxena commented on YARN-3916: [~jianhe], I have added an addendum patch on YARN-3878. It adds back the previous drained flag, resets it on InterruptedException, and keeps the bits related to YARN-3878 that were required. DrainDispatcher#await should wait till event has been completely handled Key: YARN-3916 URL: https://issues.apache.org/jira/browse/YARN-3916 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Attachments: YARN-3916.01.patch, YARN-3916.02.patch DrainDispatcher#await should wait till the event has been completely handled. Currently it only checks whether the event queue has become empty, and in many tests we directly check for a state change after calling await. Sometimes the state has not changed by the time we check it because the event has not been completely handled. *This is causing test failures* such as YARN-3909 and YARN-3910 and may cause other test failures as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
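To make the requirement concrete, here is a minimal sketch of an await() that returns only after every enqueued event has been completely handled, rather than merely when the queue is empty. It is not the real DrainDispatcher or the YARN-3916 patch; the counter-based approach and all names below are assumptions for illustration.
{code:title=DrainSketch.java|borderStyle=solid}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch: count each event from enqueue until the handler has finished
// with it, so await() cannot return while the last event is still being processed.
public final class DrainSketch {

  private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
  private final AtomicInteger pending = new AtomicInteger();

  public void handle(Runnable event) {
    pending.incrementAndGet();          // counted from enqueue ...
    queue.add(event);
  }

  /** Dispatcher thread body. */
  public void dispatchLoop() throws InterruptedException {
    while (!Thread.currentThread().isInterrupted()) {
      Runnable event = queue.take();
      try {
        event.run();
      } finally {
        pending.decrementAndGet();      // ... until completely handled
      }
    }
  }

  /** Returns only after every enqueued event has been completely handled. */
  public void await() throws InterruptedException {
    while (pending.get() > 0) {
      Thread.sleep(10);
    }
  }
}
{code}
A check based on queue emptiness alone can observe stale state in the window between dequeuing an event and finishing its handling; counting until the handler completes closes that window.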
[jira] [Commented] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.
[ https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14624917#comment-14624917 ] Chen He commented on YARN-1680: --- Sorry for the lateness. [~wangda], I have assigned this issue to you. availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory. -- Key: YARN-1680 URL: https://issues.apache.org/jira/browse/YARN-1680 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Affects Versions: 2.2.0, 2.3.0 Environment: SuSE 11 SP2 + Hadoop-2.3 Reporter: Rohith Sharma K S Assignee: Tan, Wangda Attachments: YARN-1680-WIP.patch, YARN-1680-v2.patch, YARN-1680-v2.patch, YARN-1680.patch There are 4 NodeManagers with 8GB each, so the total cluster capacity is 32GB. Cluster slow start is set to 1. A running job's reducer tasks occupy 29GB of the cluster. One NodeManager (NM-4) became unstable (3 map tasks got killed), so the MRAppMaster blacklisted the unstable NodeManager (NM-4). All reducer tasks are now running in the cluster. The MRAppMaster does not preempt the reducers because, in the reducer preemption calculation, the headroom still counts the blacklisted node's memory. This makes jobs hang forever (the ResourceManager does not assign any new containers on blacklisted nodes, but the availableResources it returns still reflects the cluster's free memory). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
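The headroom problem described above can be illustrated with a small sketch: when computing the resources reported back to an AM, skip nodes that the AM has blacklisted. This is not the CapacityScheduler code; the map-based representation and all names are assumptions for this example.
{code:title=HeadroomSketch.java|borderStyle=solid}
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative sketch: headroom reported to an AM should not count free memory on
// nodes the AM has blacklisted, because it can never get containers there.
public final class HeadroomSketch {

  /** freeMbByNode: nodeId -> free memory (MB); blacklisted: nodes the AM will not use. */
  public static long headroomMb(Map<String, Long> freeMbByNode, Set<String> blacklisted) {
    long headroom = 0L;
    for (Map.Entry<String, Long> entry : freeMbByNode.entrySet()) {
      if (!blacklisted.contains(entry.getKey())) {
        headroom += entry.getValue();   // only count nodes the AM can actually use
      }
    }
    return headroom;
  }

  public static void main(String[] args) {
    Map<String, Long> free = new HashMap<>();
    free.put("NM-1", 1024L);
    free.put("NM-4", 3072L);            // unstable node, blacklisted by the AM
    Set<String> blacklisted = new HashSet<>();
    blacklisted.add("NM-4");
    System.out.println(headroomMb(free, blacklisted)); // 1024, not 4096
  }
}
{code}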
[jira] [Commented] (YARN-3866) AM-RM protocol changes to support container resizing
[ https://issues.apache.org/jira/browse/YARN-3866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625521#comment-14625521 ] Wangda Tan commented on YARN-3866: -- Rekicked Jenkins to see if the problem still exists. AM-RM protocol changes to support container resizing Key: YARN-3866 URL: https://issues.apache.org/jira/browse/YARN-3866 Project: Hadoop YARN Issue Type: Sub-task Components: api Reporter: MENG DING Assignee: MENG DING Attachments: YARN-3866-YARN-1197.4.patch, YARN-3866.1.patch, YARN-3866.2.patch, YARN-3866.3.patch YARN-1447 and YARN-1448 are outdated. This ticket deals with AM-RM Protocol changes to support container resize according to the latest design in YARN-1197. 1) Add increase/decrease requests in AllocateRequest 2) Get approved increase/decrease requests from RM in AllocateResponse 3) Add relevant test cases -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3908) Bugs in HBaseTimelineWriterImpl
[ https://issues.apache.org/jira/browse/YARN-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625387#comment-14625387 ] Sangjin Lee commented on YARN-3908: --- bq. 2. We don't want to store the metric type, do we? Maybe I was mistaken when I read your comment that said "I also realized that the metric type is not persisted too." I took it to mean that you were suggesting that we persist it. I also had an offline chat with [~vrushalic], and she clarified that we probably do not need to persist the metric type and that we can distinguish between a single-value metric and a time series as you described. We still need to make some changes to ensure that the HBase writer sets the right min version when it writes a single-value metric (currently it's not being done). Bugs in HBaseTimelineWriterImpl --- Key: YARN-3908 URL: https://issues.apache.org/jira/browse/YARN-3908 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Vrushali C Attachments: YARN-3908-YARN-2928.001.patch, YARN-3908-YARN-2928.002.patch, YARN-3908-YARN-2928.003.patch 1. In HBaseTimelineWriterImpl, the info column family contains the basic fields of a timeline entity plus events. However, the entity#info map is not stored at all. 2. event#timestamp is also not persisted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
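The distinction under discussion, a single-value metric versus a time series, can be sketched as below. This is purely illustrative and not the HBase writer code; the map-based representation of a metric's values is an assumption for this example, and, as a later comment notes, inferring the kind from the number of values is a heuristic rather than a guarantee.
{code:title=MetricKindSketch.java|borderStyle=solid}
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch: treat a metric with exactly one (timestamp, value) pair as a
// single value, and anything with more points as a time series.
public final class MetricKindSketch {

  public static boolean isSingleValue(Map<Long, Number> timestampToValue) {
    return timestampToValue.size() == 1;
  }

  public static void main(String[] args) {
    Map<Long, Number> series = new TreeMap<>();
    series.put(1L, 10);
    series.put(2L, 12);
    System.out.println(isSingleValue(series));  // false -> treated as a time series
  }
}
{code}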
[jira] [Created] (YARN-3921) Permission denied errors for local usercache directories when attempting to run MapReduce job on Kerberos enabled cluster
Zack Marsh created YARN-3921: Summary: Permission denied errors for local usercache directories when attempting to run MapReduce job on Kerberos enabled cluster Key: YARN-3921 URL: https://issues.apache.org/jira/browse/YARN-3921 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.1 Environment: sles11sp3 Reporter: Zack Marsh Prior to enabling Kerberos on the Hadoop cluster, I am able to run a simple MapReduce example as the Linux user 'tdatuser': {code} iripiri1:~ # su tdatuser tdatuser@piripiri1:/root yarn jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples-2.*.jar pi 16 1 Number of Maps = 16 Samples per Map = 1 Wrote input for Map #0 Wrote input for Map #1 Wrote input for Map #2 Wrote input for Map #3 Wrote input for Map #4 Wrote input for Map #5 Wrote input for Map #6 Wrote input for Map #7 Wrote input for Map #8 Wrote input for Map #9 Wrote input for Map #10 Wrote input for Map #11 Wrote input for Map #12 Wrote input for Map #13 Wrote input for Map #14 Wrote input for Map #15 Starting Job 15/07/13 17:02:31 INFO impl.TimelineClientImpl: Timeline service address: http:/ s/v1/timeline/ 15/07/13 17:02:31 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to 15/07/13 17:02:31 INFO input.FileInputFormat: Total input paths to process : 16 15/07/13 17:02:31 INFO mapreduce.JobSubmitter: number of splits:16 15/07/13 17:02:31 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_14 15/07/13 17:02:32 INFO impl.YarnClientImpl: Submitted application application_14 15/07/13 17:02:32 INFO mapreduce.Job: The url to track the job: http://piripiri3 cation_1436821014431_0003/ 15/07/13 17:02:32 INFO mapreduce.Job: Running job: job_1436821014431_0003 15/07/13 17:05:50 INFO mapreduce.Job: Job job_1436821014431_0003 running in uber mode : false 15/07/13 17:05:50 INFO mapreduce.Job: map 0% reduce 0% 15/07/13 17:05:56 INFO mapreduce.Job: map 6% reduce 0% 15/07/13 17:06:00 INFO mapreduce.Job: map 13% reduce 0% 15/07/13 17:06:01 INFO mapreduce.Job: map 38% reduce 0% 15/07/13 17:06:05 INFO mapreduce.Job: map 44% reduce 0% 15/07/13 17:06:07 INFO mapreduce.Job: map 63% reduce 0% 15/07/13 17:06:09 INFO mapreduce.Job: map 69% reduce 0% 15/07/13 17:06:11 INFO mapreduce.Job: map 75% reduce 0% 15/07/13 17:06:12 INFO mapreduce.Job: map 81% reduce 0% 15/07/13 17:06:13 INFO mapreduce.Job: map 81% reduce 25% 15/07/13 17:06:14 INFO mapreduce.Job: map 94% reduce 25% 15/07/13 17:06:16 INFO mapreduce.Job: map 100% reduce 31% 15/07/13 17:06:17 INFO mapreduce.Job: map 100% reduce 100% 15/07/13 17:06:17 INFO mapreduce.Job: Job job_1436821014431_0003 completed successfully 15/07/13 17:06:17 INFO mapreduce.Job: Counters: 49 File System Counters FILE: Number of bytes read=358 FILE: Number of bytes written=2249017 FILE: Number of read operations=0 FILE: Number of large read operations=0 FILE: Number of write operations=0 HDFS: Number of bytes read=4198 HDFS: Number of bytes written=215 HDFS: Number of read operations=67 HDFS: Number of large read operations=0 HDFS: Number of write operations=3 Job Counters Launched map tasks=16 Launched reduce tasks=1 Data-local map tasks=16 Total time spent by all maps in occupied slots (ms)=160498 Total time spent by all reduces in occupied slots (ms)=27302 Total time spent by all map tasks (ms)=80249 Total time spent by all reduce tasks (ms)=13651 Total vcore-seconds taken by all map tasks=80249 Total vcore-seconds taken by all reduce tasks=13651 Total megabyte-seconds taken by all map tasks=246524928 Total megabyte-seconds taken by all reduce tasks=41935872 
Map-Reduce Framework Map input records=16 Map output records=32 Map output bytes=288 Map output materialized bytes=448 Input split bytes=2310 Combine input records=0 Combine output records=0 Reduce input groups=2 Reduce shuffle bytes=448 Reduce input records=32 Reduce output records=0 Spilled Records=64 Shuffled Maps
[jira] [Comment Edited] (YARN-3921) Permission denied errors for local usercache directories when attempting to run MapReduce job on Kerberos enabled cluster
[ https://issues.apache.org/jira/browse/YARN-3921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625438#comment-14625438 ] Allen Wittenauer edited comment on YARN-3921 at 7/13/15 9:57 PM: - AFAIK, YARN won't change the permissions on the work dirs when you switch modes. The assumption is the ops folks/tools will handle this as part of the transition. amabari-qa's dir changing seems to be more related to something else (are these machines being managed via ambari and, like a naughty child, ambari is putting these where they don't belong?) given that you didn't say that a job belonging to the user ambari-qa job was run... was (Author: aw): AFAIK, YARN won't change the permissions on the work dirs when you switch modes. The assumption is the ops folks/tools will handle this as part of the transition. amabari-qa's dir changing seems to be more related to something else (are these machines being managed via ambari and, like a naughty child, ambari is putting these were they don't belong?) given that you didn't say that a job belonging to the user ambari-qa job was run... Permission denied errors for local usercache directories when attempting to run MapReduce job on Kerberos enabled cluster -- Key: YARN-3921 URL: https://issues.apache.org/jira/browse/YARN-3921 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.1 Environment: sles11sp3 Reporter: Zack Marsh Prior to enabling Kerberos on the Hadoop cluster, I am able to run a simple MapReduce example as the Linux user 'tdatuser': {code} iripiri1:~ # su tdatuser tdatuser@piripiri1:/root yarn jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples-2.*.jar pi 16 1 Number of Maps = 16 Samples per Map = 1 Wrote input for Map #0 Wrote input for Map #1 Wrote input for Map #2 Wrote input for Map #3 Wrote input for Map #4 Wrote input for Map #5 Wrote input for Map #6 Wrote input for Map #7 Wrote input for Map #8 Wrote input for Map #9 Wrote input for Map #10 Wrote input for Map #11 Wrote input for Map #12 Wrote input for Map #13 Wrote input for Map #14 Wrote input for Map #15 Starting Job 15/07/13 17:02:31 INFO impl.TimelineClientImpl: Timeline service address: http:/ s/v1/timeline/ 15/07/13 17:02:31 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to 15/07/13 17:02:31 INFO input.FileInputFormat: Total input paths to process : 16 15/07/13 17:02:31 INFO mapreduce.JobSubmitter: number of splits:16 15/07/13 17:02:31 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_14 15/07/13 17:02:32 INFO impl.YarnClientImpl: Submitted application application_14 15/07/13 17:02:32 INFO mapreduce.Job: The url to track the job: http://piripiri3 cation_1436821014431_0003/ 15/07/13 17:02:32 INFO mapreduce.Job: Running job: job_1436821014431_0003 15/07/13 17:05:50 INFO mapreduce.Job: Job job_1436821014431_0003 running in uber mode : false 15/07/13 17:05:50 INFO mapreduce.Job: map 0% reduce 0% 15/07/13 17:05:56 INFO mapreduce.Job: map 6% reduce 0% 15/07/13 17:06:00 INFO mapreduce.Job: map 13% reduce 0% 15/07/13 17:06:01 INFO mapreduce.Job: map 38% reduce 0% 15/07/13 17:06:05 INFO mapreduce.Job: map 44% reduce 0% 15/07/13 17:06:07 INFO mapreduce.Job: map 63% reduce 0% 15/07/13 17:06:09 INFO mapreduce.Job: map 69% reduce 0% 15/07/13 17:06:11 INFO mapreduce.Job: map 75% reduce 0% 15/07/13 17:06:12 INFO mapreduce.Job: map 81% reduce 0% 15/07/13 17:06:13 INFO mapreduce.Job: map 81% reduce 25% 15/07/13 17:06:14 INFO mapreduce.Job: map 94% reduce 25% 15/07/13 17:06:16 INFO mapreduce.Job: map 100% reduce 
31% 15/07/13 17:06:17 INFO mapreduce.Job: map 100% reduce 100% 15/07/13 17:06:17 INFO mapreduce.Job: Job job_1436821014431_0003 completed successfully 15/07/13 17:06:17 INFO mapreduce.Job: Counters: 49 File System Counters FILE: Number of bytes read=358 FILE: Number of bytes written=2249017 FILE: Number of read operations=0 FILE: Number of large read operations=0 FILE: Number of write operations=0 HDFS: Number of bytes read=4198 HDFS: Number of bytes written=215 HDFS: Number of read operations=67 HDFS: Number of large read operations=0 HDFS: Number of write operations=3
[jira] [Commented] (YARN-3921) Permission denied errors for local usercache directories when attempting to run MapReduce job on Kerberos enabled cluster
[ https://issues.apache.org/jira/browse/YARN-3921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625438#comment-14625438 ] Allen Wittenauer commented on YARN-3921: AFAIK, YARN won't change the permissions on the work dirs when you switch modes. The assumption is the ops folks/tools will handle this as part of the transition. amabari-qa's dir changing seems to be more related to something else (are these machines being managed via ambari and, like a naughty child, ambari is putting these were they don't belong?) given that you didn't say that a job belonging to the user ambari-qa job was run... Permission denied errors for local usercache directories when attempting to run MapReduce job on Kerberos enabled cluster -- Key: YARN-3921 URL: https://issues.apache.org/jira/browse/YARN-3921 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.1 Environment: sles11sp3 Reporter: Zack Marsh Prior to enabling Kerberos on the Hadoop cluster, I am able to run a simple MapReduce example as the Linux user 'tdatuser': {code} iripiri1:~ # su tdatuser tdatuser@piripiri1:/root yarn jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples-2.*.jar pi 16 1 Number of Maps = 16 Samples per Map = 1 Wrote input for Map #0 Wrote input for Map #1 Wrote input for Map #2 Wrote input for Map #3 Wrote input for Map #4 Wrote input for Map #5 Wrote input for Map #6 Wrote input for Map #7 Wrote input for Map #8 Wrote input for Map #9 Wrote input for Map #10 Wrote input for Map #11 Wrote input for Map #12 Wrote input for Map #13 Wrote input for Map #14 Wrote input for Map #15 Starting Job 15/07/13 17:02:31 INFO impl.TimelineClientImpl: Timeline service address: http:/ s/v1/timeline/ 15/07/13 17:02:31 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to 15/07/13 17:02:31 INFO input.FileInputFormat: Total input paths to process : 16 15/07/13 17:02:31 INFO mapreduce.JobSubmitter: number of splits:16 15/07/13 17:02:31 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_14 15/07/13 17:02:32 INFO impl.YarnClientImpl: Submitted application application_14 15/07/13 17:02:32 INFO mapreduce.Job: The url to track the job: http://piripiri3 cation_1436821014431_0003/ 15/07/13 17:02:32 INFO mapreduce.Job: Running job: job_1436821014431_0003 15/07/13 17:05:50 INFO mapreduce.Job: Job job_1436821014431_0003 running in uber mode : false 15/07/13 17:05:50 INFO mapreduce.Job: map 0% reduce 0% 15/07/13 17:05:56 INFO mapreduce.Job: map 6% reduce 0% 15/07/13 17:06:00 INFO mapreduce.Job: map 13% reduce 0% 15/07/13 17:06:01 INFO mapreduce.Job: map 38% reduce 0% 15/07/13 17:06:05 INFO mapreduce.Job: map 44% reduce 0% 15/07/13 17:06:07 INFO mapreduce.Job: map 63% reduce 0% 15/07/13 17:06:09 INFO mapreduce.Job: map 69% reduce 0% 15/07/13 17:06:11 INFO mapreduce.Job: map 75% reduce 0% 15/07/13 17:06:12 INFO mapreduce.Job: map 81% reduce 0% 15/07/13 17:06:13 INFO mapreduce.Job: map 81% reduce 25% 15/07/13 17:06:14 INFO mapreduce.Job: map 94% reduce 25% 15/07/13 17:06:16 INFO mapreduce.Job: map 100% reduce 31% 15/07/13 17:06:17 INFO mapreduce.Job: map 100% reduce 100% 15/07/13 17:06:17 INFO mapreduce.Job: Job job_1436821014431_0003 completed successfully 15/07/13 17:06:17 INFO mapreduce.Job: Counters: 49 File System Counters FILE: Number of bytes read=358 FILE: Number of bytes written=2249017 FILE: Number of read operations=0 FILE: Number of large read operations=0 FILE: Number of write operations=0 HDFS: Number of bytes read=4198 HDFS: Number of bytes written=215 HDFS: Number of 
read operations=67 HDFS: Number of large read operations=0 HDFS: Number of write operations=3 Job Counters Launched map tasks=16 Launched reduce tasks=1 Data-local map tasks=16 Total time spent by all maps in occupied slots (ms)=160498 Total time spent by all reduces in occupied slots (ms)=27302 Total time spent by all map tasks (ms)=80249 Total time spent by all reduce tasks (ms)=13651 Total vcore-seconds taken by all map
[jira] [Commented] (YARN-3921) Permission denied errors for local usercache directories when attempting to run MapReduce job on Kerberos enabled cluster
[ https://issues.apache.org/jira/browse/YARN-3921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625448#comment-14625448 ] Zack Marsh commented on YARN-3921: -- Yes this cluster is being manager by Ambari, and yes there were jobs belonging to the user ambari-qa ran before and after enabling Kerberos. Given what you said, it sounds like the fix for this issue is just changing the ownersip on these usercache directories and the directories/files within to the appropriate user. Permission denied errors for local usercache directories when attempting to run MapReduce job on Kerberos enabled cluster -- Key: YARN-3921 URL: https://issues.apache.org/jira/browse/YARN-3921 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.1 Environment: sles11sp3 Reporter: Zack Marsh Prior to enabling Kerberos on the Hadoop cluster, I am able to run a simple MapReduce example as the Linux user 'tdatuser': {code} iripiri1:~ # su tdatuser tdatuser@piripiri1:/root yarn jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples-2.*.jar pi 16 1 Number of Maps = 16 Samples per Map = 1 Wrote input for Map #0 Wrote input for Map #1 Wrote input for Map #2 Wrote input for Map #3 Wrote input for Map #4 Wrote input for Map #5 Wrote input for Map #6 Wrote input for Map #7 Wrote input for Map #8 Wrote input for Map #9 Wrote input for Map #10 Wrote input for Map #11 Wrote input for Map #12 Wrote input for Map #13 Wrote input for Map #14 Wrote input for Map #15 Starting Job 15/07/13 17:02:31 INFO impl.TimelineClientImpl: Timeline service address: http:/ s/v1/timeline/ 15/07/13 17:02:31 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to 15/07/13 17:02:31 INFO input.FileInputFormat: Total input paths to process : 16 15/07/13 17:02:31 INFO mapreduce.JobSubmitter: number of splits:16 15/07/13 17:02:31 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_14 15/07/13 17:02:32 INFO impl.YarnClientImpl: Submitted application application_14 15/07/13 17:02:32 INFO mapreduce.Job: The url to track the job: http://piripiri3 cation_1436821014431_0003/ 15/07/13 17:02:32 INFO mapreduce.Job: Running job: job_1436821014431_0003 15/07/13 17:05:50 INFO mapreduce.Job: Job job_1436821014431_0003 running in uber mode : false 15/07/13 17:05:50 INFO mapreduce.Job: map 0% reduce 0% 15/07/13 17:05:56 INFO mapreduce.Job: map 6% reduce 0% 15/07/13 17:06:00 INFO mapreduce.Job: map 13% reduce 0% 15/07/13 17:06:01 INFO mapreduce.Job: map 38% reduce 0% 15/07/13 17:06:05 INFO mapreduce.Job: map 44% reduce 0% 15/07/13 17:06:07 INFO mapreduce.Job: map 63% reduce 0% 15/07/13 17:06:09 INFO mapreduce.Job: map 69% reduce 0% 15/07/13 17:06:11 INFO mapreduce.Job: map 75% reduce 0% 15/07/13 17:06:12 INFO mapreduce.Job: map 81% reduce 0% 15/07/13 17:06:13 INFO mapreduce.Job: map 81% reduce 25% 15/07/13 17:06:14 INFO mapreduce.Job: map 94% reduce 25% 15/07/13 17:06:16 INFO mapreduce.Job: map 100% reduce 31% 15/07/13 17:06:17 INFO mapreduce.Job: map 100% reduce 100% 15/07/13 17:06:17 INFO mapreduce.Job: Job job_1436821014431_0003 completed successfully 15/07/13 17:06:17 INFO mapreduce.Job: Counters: 49 File System Counters FILE: Number of bytes read=358 FILE: Number of bytes written=2249017 FILE: Number of read operations=0 FILE: Number of large read operations=0 FILE: Number of write operations=0 HDFS: Number of bytes read=4198 HDFS: Number of bytes written=215 HDFS: Number of read operations=67 HDFS: Number of large read operations=0 HDFS: Number of write operations=3 Job Counters Launched map 
tasks=16 Launched reduce tasks=1 Data-local map tasks=16 Total time spent by all maps in occupied slots (ms)=160498 Total time spent by all reduces in occupied slots (ms)=27302 Total time spent by all map tasks (ms)=80249 Total time spent by all reduce tasks (ms)=13651 Total vcore-seconds taken by all map tasks=80249 Total vcore-seconds taken by all reduce tasks=13651 Total megabyte-seconds taken by
[jira] [Resolved] (YARN-3921) Permission denied errors for local usercache directories when attempting to run MapReduce job on Kerberos enabled cluster
[ https://issues.apache.org/jira/browse/YARN-3921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zack Marsh resolved YARN-3921. -- Resolution: Not A Problem Permission denied errors for local usercache directories when attempting to run MapReduce job on Kerberos enabled cluster -- Key: YARN-3921 URL: https://issues.apache.org/jira/browse/YARN-3921 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.1 Environment: sles11sp3 Reporter: Zack Marsh Prior to enabling Kerberos on the Hadoop cluster, I am able to run a simple MapReduce example as the Linux user 'tdatuser': {code} iripiri1:~ # su tdatuser tdatuser@piripiri1:/root yarn jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples-2.*.jar pi 16 1 Number of Maps = 16 Samples per Map = 1 Wrote input for Map #0 Wrote input for Map #1 Wrote input for Map #2 Wrote input for Map #3 Wrote input for Map #4 Wrote input for Map #5 Wrote input for Map #6 Wrote input for Map #7 Wrote input for Map #8 Wrote input for Map #9 Wrote input for Map #10 Wrote input for Map #11 Wrote input for Map #12 Wrote input for Map #13 Wrote input for Map #14 Wrote input for Map #15 Starting Job 15/07/13 17:02:31 INFO impl.TimelineClientImpl: Timeline service address: http:/ s/v1/timeline/ 15/07/13 17:02:31 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to 15/07/13 17:02:31 INFO input.FileInputFormat: Total input paths to process : 16 15/07/13 17:02:31 INFO mapreduce.JobSubmitter: number of splits:16 15/07/13 17:02:31 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_14 15/07/13 17:02:32 INFO impl.YarnClientImpl: Submitted application application_14 15/07/13 17:02:32 INFO mapreduce.Job: The url to track the job: http://piripiri3 cation_1436821014431_0003/ 15/07/13 17:02:32 INFO mapreduce.Job: Running job: job_1436821014431_0003 15/07/13 17:05:50 INFO mapreduce.Job: Job job_1436821014431_0003 running in uber mode : false 15/07/13 17:05:50 INFO mapreduce.Job: map 0% reduce 0% 15/07/13 17:05:56 INFO mapreduce.Job: map 6% reduce 0% 15/07/13 17:06:00 INFO mapreduce.Job: map 13% reduce 0% 15/07/13 17:06:01 INFO mapreduce.Job: map 38% reduce 0% 15/07/13 17:06:05 INFO mapreduce.Job: map 44% reduce 0% 15/07/13 17:06:07 INFO mapreduce.Job: map 63% reduce 0% 15/07/13 17:06:09 INFO mapreduce.Job: map 69% reduce 0% 15/07/13 17:06:11 INFO mapreduce.Job: map 75% reduce 0% 15/07/13 17:06:12 INFO mapreduce.Job: map 81% reduce 0% 15/07/13 17:06:13 INFO mapreduce.Job: map 81% reduce 25% 15/07/13 17:06:14 INFO mapreduce.Job: map 94% reduce 25% 15/07/13 17:06:16 INFO mapreduce.Job: map 100% reduce 31% 15/07/13 17:06:17 INFO mapreduce.Job: map 100% reduce 100% 15/07/13 17:06:17 INFO mapreduce.Job: Job job_1436821014431_0003 completed successfully 15/07/13 17:06:17 INFO mapreduce.Job: Counters: 49 File System Counters FILE: Number of bytes read=358 FILE: Number of bytes written=2249017 FILE: Number of read operations=0 FILE: Number of large read operations=0 FILE: Number of write operations=0 HDFS: Number of bytes read=4198 HDFS: Number of bytes written=215 HDFS: Number of read operations=67 HDFS: Number of large read operations=0 HDFS: Number of write operations=3 Job Counters Launched map tasks=16 Launched reduce tasks=1 Data-local map tasks=16 Total time spent by all maps in occupied slots (ms)=160498 Total time spent by all reduces in occupied slots (ms)=27302 Total time spent by all map tasks (ms)=80249 Total time spent by all reduce tasks (ms)=13651 Total vcore-seconds taken by all map tasks=80249 Total vcore-seconds 
taken by all reduce tasks=13651 Total megabyte-seconds taken by all map tasks=246524928 Total megabyte-seconds taken by all reduce tasks=41935872 Map-Reduce Framework Map input records=16 Map output records=32 Map output bytes=288 Map output materialized bytes=448 Input
[jira] [Commented] (YARN-3921) Permission denied errors for local usercache directories when attempting to run MapReduce job on Kerberos enabled cluster
[ https://issues.apache.org/jira/browse/YARN-3921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625480#comment-14625480 ] Zack Marsh commented on YARN-3921: -- Okay, thanks for the responses. I've created AMBARI-12402 for this issue (I don't think I have the permissions to move this issue to the Ambari project). Permission denied errors for local usercache directories when attempting to run MapReduce job on Kerberos enabled cluster -- Key: YARN-3921 URL: https://issues.apache.org/jira/browse/YARN-3921 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.1 Environment: sles11sp3 Reporter: Zack Marsh Prior to enabling Kerberos on the Hadoop cluster, I am able to run a simple MapReduce example as the Linux user 'tdatuser': {code} iripiri1:~ # su tdatuser tdatuser@piripiri1:/root yarn jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples-2.*.jar pi 16 1 Number of Maps = 16 Samples per Map = 1 Wrote input for Map #0 Wrote input for Map #1 Wrote input for Map #2 Wrote input for Map #3 Wrote input for Map #4 Wrote input for Map #5 Wrote input for Map #6 Wrote input for Map #7 Wrote input for Map #8 Wrote input for Map #9 Wrote input for Map #10 Wrote input for Map #11 Wrote input for Map #12 Wrote input for Map #13 Wrote input for Map #14 Wrote input for Map #15 Starting Job 15/07/13 17:02:31 INFO impl.TimelineClientImpl: Timeline service address: http:/ s/v1/timeline/ 15/07/13 17:02:31 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to 15/07/13 17:02:31 INFO input.FileInputFormat: Total input paths to process : 16 15/07/13 17:02:31 INFO mapreduce.JobSubmitter: number of splits:16 15/07/13 17:02:31 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_14 15/07/13 17:02:32 INFO impl.YarnClientImpl: Submitted application application_14 15/07/13 17:02:32 INFO mapreduce.Job: The url to track the job: http://piripiri3 cation_1436821014431_0003/ 15/07/13 17:02:32 INFO mapreduce.Job: Running job: job_1436821014431_0003 15/07/13 17:05:50 INFO mapreduce.Job: Job job_1436821014431_0003 running in uber mode : false 15/07/13 17:05:50 INFO mapreduce.Job: map 0% reduce 0% 15/07/13 17:05:56 INFO mapreduce.Job: map 6% reduce 0% 15/07/13 17:06:00 INFO mapreduce.Job: map 13% reduce 0% 15/07/13 17:06:01 INFO mapreduce.Job: map 38% reduce 0% 15/07/13 17:06:05 INFO mapreduce.Job: map 44% reduce 0% 15/07/13 17:06:07 INFO mapreduce.Job: map 63% reduce 0% 15/07/13 17:06:09 INFO mapreduce.Job: map 69% reduce 0% 15/07/13 17:06:11 INFO mapreduce.Job: map 75% reduce 0% 15/07/13 17:06:12 INFO mapreduce.Job: map 81% reduce 0% 15/07/13 17:06:13 INFO mapreduce.Job: map 81% reduce 25% 15/07/13 17:06:14 INFO mapreduce.Job: map 94% reduce 25% 15/07/13 17:06:16 INFO mapreduce.Job: map 100% reduce 31% 15/07/13 17:06:17 INFO mapreduce.Job: map 100% reduce 100% 15/07/13 17:06:17 INFO mapreduce.Job: Job job_1436821014431_0003 completed successfully 15/07/13 17:06:17 INFO mapreduce.Job: Counters: 49 File System Counters FILE: Number of bytes read=358 FILE: Number of bytes written=2249017 FILE: Number of read operations=0 FILE: Number of large read operations=0 FILE: Number of write operations=0 HDFS: Number of bytes read=4198 HDFS: Number of bytes written=215 HDFS: Number of read operations=67 HDFS: Number of large read operations=0 HDFS: Number of write operations=3 Job Counters Launched map tasks=16 Launched reduce tasks=1 Data-local map tasks=16 Total time spent by all maps in occupied slots (ms)=160498 Total time spent by all reduces in occupied slots 
(ms)=27302 Total time spent by all map tasks (ms)=80249 Total time spent by all reduce tasks (ms)=13651 Total vcore-seconds taken by all map tasks=80249 Total vcore-seconds taken by all reduce tasks=13651 Total megabyte-seconds taken by all map tasks=246524928 Total megabyte-seconds taken by all reduce tasks=41935872 Map-Reduce Framework Map input
[jira] [Commented] (YARN-3866) AM-RM protocol changes to support container resizing
[ https://issues.apache.org/jira/browse/YARN-3866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625513#comment-14625513 ] MENG DING commented on YARN-3866: - * FindBugs doesn't seem to be working in the YARN-1197 branch (my own tests work fine, though) * The checkstyle issues are false alarms; the flagged lines are needed for the Javadoc to work {code:title=ContainerResourceChangeRequest.java|borderStyle=solid} +import org.apache.hadoop.classification.InterfaceAudience.Public; +import org.apache.hadoop.classification.InterfaceStability.Unstable; +import org.apache.hadoop.yarn.api.ApplicationMasterProtocol; --- (needed for the Javadoc-referenced method) +import org.apache.hadoop.yarn.util.Records; + +/** + * {@code ContainerResourceChangeRequest} represents the request made by an + * application to the {@code ResourceManager} to change resource allocation of + * a running {@code Container}. + * <p> + * It includes: + * <ul> + * <li>{@link ContainerId} for the container.</li> + * <li> + * {@link Resource} capability of the container after the resource change + * is completed. + * </li> + * </ul> + * + * @see ApplicationMasterProtocol#allocate(org.apache.hadoop.yarn.api.protocolrecords.AllocateRequest) (can't break this line, otherwise the Javadoc won't work.) + */ {code} AM-RM protocol changes to support container resizing Key: YARN-3866 URL: https://issues.apache.org/jira/browse/YARN-3866 Project: Hadoop YARN Issue Type: Sub-task Components: api Reporter: MENG DING Assignee: MENG DING Attachments: YARN-3866-YARN-1197.4.patch, YARN-3866.1.patch, YARN-3866.2.patch, YARN-3866.3.patch YARN-1447 and YARN-1448 are outdated. This ticket deals with AM-RM Protocol changes to support container resize according to the latest design in YARN-1197. 1) Add increase/decrease requests in AllocateRequest 2) Get approved increase/decrease requests from RM in AllocateResponse 3) Add relevant test cases -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625396#comment-14625396 ] Jian He commented on YARN-3878: --- re-opened this, also reverted the previous patch. [~varun_saxena], could you upload a clean delta patch for this ? thanks ! AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Fix For: 2.7.2 Attachments: YARN-3878-addendum.patch, YARN-3878.01.patch, YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, YARN-3878.06.patch, YARN-3878.07.patch, YARN-3878.08.patch The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) {noformat} *JStack of AsyncDispatcher hanging on stop* {noformat} AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e waiting on condition [0x7fb9654e9000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x000700b79250 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:113) at java.lang.Thread.run(Thread.java:744) main prio=10 tid=0x7fb98000a800 nid=0x49c3 in Object.wait() [0x7fb989851000]
[jira] [Commented] (YARN-3908) Bugs in HBaseTimelineWriterImpl
[ https://issues.apache.org/jira/browse/YARN-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625397#comment-14625397 ] Zhijie Shen commented on YARN-3908: --- Yeah, but the method based on the number of metric values is not guaranteed; are we okay with that? Bugs in HBaseTimelineWriterImpl --- Key: YARN-3908 URL: https://issues.apache.org/jira/browse/YARN-3908 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Vrushali C Attachments: YARN-3908-YARN-2928.001.patch, YARN-3908-YARN-2928.002.patch, YARN-3908-YARN-2928.003.patch 1. In HBaseTimelineWriterImpl, the info column family contains the basic fields of a timeline entity plus events. However, the entity#info map is not stored at all. 2. event#timestamp is also not persisted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3921) Permission denied errors for local usercache directories when attempting to run MapReduce job on Kerberos enabled cluster
[ https://issues.apache.org/jira/browse/YARN-3921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zack Marsh updated YARN-3921: - Description: Prior to enabling Kerberos on the Hadoop cluster, I am able to run a simple MapReduce example as the Linux user 'tdatuser': {code} iripiri1:~ # su tdatuser tdatuser@piripiri1:/root yarn jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples-2.*.jar pi 16 1 Number of Maps = 16 Samples per Map = 1 Wrote input for Map #0 Wrote input for Map #1 Wrote input for Map #2 Wrote input for Map #3 Wrote input for Map #4 Wrote input for Map #5 Wrote input for Map #6 Wrote input for Map #7 Wrote input for Map #8 Wrote input for Map #9 Wrote input for Map #10 Wrote input for Map #11 Wrote input for Map #12 Wrote input for Map #13 Wrote input for Map #14 Wrote input for Map #15 Starting Job 15/07/13 17:02:31 INFO impl.TimelineClientImpl: Timeline service address: http:/ s/v1/timeline/ 15/07/13 17:02:31 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to 15/07/13 17:02:31 INFO input.FileInputFormat: Total input paths to process : 16 15/07/13 17:02:31 INFO mapreduce.JobSubmitter: number of splits:16 15/07/13 17:02:31 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_14 15/07/13 17:02:32 INFO impl.YarnClientImpl: Submitted application application_14 15/07/13 17:02:32 INFO mapreduce.Job: The url to track the job: http://piripiri3 cation_1436821014431_0003/ 15/07/13 17:02:32 INFO mapreduce.Job: Running job: job_1436821014431_0003 15/07/13 17:05:50 INFO mapreduce.Job: Job job_1436821014431_0003 running in uber mode : false 15/07/13 17:05:50 INFO mapreduce.Job: map 0% reduce 0% 15/07/13 17:05:56 INFO mapreduce.Job: map 6% reduce 0% 15/07/13 17:06:00 INFO mapreduce.Job: map 13% reduce 0% 15/07/13 17:06:01 INFO mapreduce.Job: map 38% reduce 0% 15/07/13 17:06:05 INFO mapreduce.Job: map 44% reduce 0% 15/07/13 17:06:07 INFO mapreduce.Job: map 63% reduce 0% 15/07/13 17:06:09 INFO mapreduce.Job: map 69% reduce 0% 15/07/13 17:06:11 INFO mapreduce.Job: map 75% reduce 0% 15/07/13 17:06:12 INFO mapreduce.Job: map 81% reduce 0% 15/07/13 17:06:13 INFO mapreduce.Job: map 81% reduce 25% 15/07/13 17:06:14 INFO mapreduce.Job: map 94% reduce 25% 15/07/13 17:06:16 INFO mapreduce.Job: map 100% reduce 31% 15/07/13 17:06:17 INFO mapreduce.Job: map 100% reduce 100% 15/07/13 17:06:17 INFO mapreduce.Job: Job job_1436821014431_0003 completed successfully 15/07/13 17:06:17 INFO mapreduce.Job: Counters: 49 File System Counters FILE: Number of bytes read=358 FILE: Number of bytes written=2249017 FILE: Number of read operations=0 FILE: Number of large read operations=0 FILE: Number of write operations=0 HDFS: Number of bytes read=4198 HDFS: Number of bytes written=215 HDFS: Number of read operations=67 HDFS: Number of large read operations=0 HDFS: Number of write operations=3 Job Counters Launched map tasks=16 Launched reduce tasks=1 Data-local map tasks=16 Total time spent by all maps in occupied slots (ms)=160498 Total time spent by all reduces in occupied slots (ms)=27302 Total time spent by all map tasks (ms)=80249 Total time spent by all reduce tasks (ms)=13651 Total vcore-seconds taken by all map tasks=80249 Total vcore-seconds taken by all reduce tasks=13651 Total megabyte-seconds taken by all map tasks=246524928 Total megabyte-seconds taken by all reduce tasks=41935872 Map-Reduce Framework Map input records=16 Map output records=32 Map output bytes=288 Map output materialized bytes=448 Input split bytes=2310 Combine input records=0 
Combine output records=0 Reduce input groups=2 Reduce shuffle bytes=448 Reduce input records=32 Reduce output records=0 Spilled Records=64 Shuffled Maps =16 Failed Shuffles=0 Merged Map outputs=16 GC time elapsed (ms)=1501 CPU time spent (ms)=13670 Physical memory (bytes) snapshot=13480296448
[jira] [Commented] (YARN-3921) Permission denied errors for local usercache directories when attempting to run MapReduce job on Kerberos enabled cluster
[ https://issues.apache.org/jira/browse/YARN-3921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625455#comment-14625455 ] Allen Wittenauer commented on YARN-3921: Yeah. In general when switching from non-K to K, people move to LinuxContainerExecutor first. When they do that, a general going over of all permissions is done first since things like temp dirs tend to require rwx+sticky. LCE is what is almost certainly forcing the permissions failures in your jobs. Permission denied errors for local usercache directories when attempting to run MapReduce job on Kerberos enabled cluster -- Key: YARN-3921 URL: https://issues.apache.org/jira/browse/YARN-3921 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.1 Environment: sles11sp3 Reporter: Zack Marsh Prior to enabling Kerberos on the Hadoop cluster, I am able to run a simple MapReduce example as the Linux user 'tdatuser': {code} iripiri1:~ # su tdatuser tdatuser@piripiri1:/root yarn jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples-2.*.jar pi 16 1 Number of Maps = 16 Samples per Map = 1 Wrote input for Map #0 Wrote input for Map #1 Wrote input for Map #2 Wrote input for Map #3 Wrote input for Map #4 Wrote input for Map #5 Wrote input for Map #6 Wrote input for Map #7 Wrote input for Map #8 Wrote input for Map #9 Wrote input for Map #10 Wrote input for Map #11 Wrote input for Map #12 Wrote input for Map #13 Wrote input for Map #14 Wrote input for Map #15 Starting Job 15/07/13 17:02:31 INFO impl.TimelineClientImpl: Timeline service address: http:/ s/v1/timeline/ 15/07/13 17:02:31 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to 15/07/13 17:02:31 INFO input.FileInputFormat: Total input paths to process : 16 15/07/13 17:02:31 INFO mapreduce.JobSubmitter: number of splits:16 15/07/13 17:02:31 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_14 15/07/13 17:02:32 INFO impl.YarnClientImpl: Submitted application application_14 15/07/13 17:02:32 INFO mapreduce.Job: The url to track the job: http://piripiri3 cation_1436821014431_0003/ 15/07/13 17:02:32 INFO mapreduce.Job: Running job: job_1436821014431_0003 15/07/13 17:05:50 INFO mapreduce.Job: Job job_1436821014431_0003 running in uber mode : false 15/07/13 17:05:50 INFO mapreduce.Job: map 0% reduce 0% 15/07/13 17:05:56 INFO mapreduce.Job: map 6% reduce 0% 15/07/13 17:06:00 INFO mapreduce.Job: map 13% reduce 0% 15/07/13 17:06:01 INFO mapreduce.Job: map 38% reduce 0% 15/07/13 17:06:05 INFO mapreduce.Job: map 44% reduce 0% 15/07/13 17:06:07 INFO mapreduce.Job: map 63% reduce 0% 15/07/13 17:06:09 INFO mapreduce.Job: map 69% reduce 0% 15/07/13 17:06:11 INFO mapreduce.Job: map 75% reduce 0% 15/07/13 17:06:12 INFO mapreduce.Job: map 81% reduce 0% 15/07/13 17:06:13 INFO mapreduce.Job: map 81% reduce 25% 15/07/13 17:06:14 INFO mapreduce.Job: map 94% reduce 25% 15/07/13 17:06:16 INFO mapreduce.Job: map 100% reduce 31% 15/07/13 17:06:17 INFO mapreduce.Job: map 100% reduce 100% 15/07/13 17:06:17 INFO mapreduce.Job: Job job_1436821014431_0003 completed successfully 15/07/13 17:06:17 INFO mapreduce.Job: Counters: 49 File System Counters FILE: Number of bytes read=358 FILE: Number of bytes written=2249017 FILE: Number of read operations=0 FILE: Number of large read operations=0 FILE: Number of write operations=0 HDFS: Number of bytes read=4198 HDFS: Number of bytes written=215 HDFS: Number of read operations=67 HDFS: Number of large read operations=0 HDFS: Number of write operations=3 Job Counters Launched map tasks=16 
Launched reduce tasks=1 Data-local map tasks=16 Total time spent by all maps in occupied slots (ms)=160498 Total time spent by all reduces in occupied slots (ms)=27302 Total time spent by all map tasks (ms)=80249 Total time spent by all reduce tasks (ms)=13651 Total vcore-seconds taken by all map tasks=80249 Total vcore-seconds taken by all reduce tasks=13651 Total megabyte-seconds taken by all
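A minimal sketch of the rwx+sticky check implied by the YARN-3921 comment above, assuming a Linux JDK that exposes the "unix:mode" file attribute; the class name and the default path are hypothetical, and the directory to check would be whatever yarn.nodemanager.local-dirs (or the relevant temp dir) points to:
{code}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Hypothetical helper: reports whether a NodeManager local/temp directory is
 * world-accessible (rwx for all) and has the sticky bit set, the kind of
 * layout LinuxContainerExecutor typically expects for shared directories.
 */
public class LocalDirPermissionCheck {
  public static void main(String[] args) throws IOException {
    Path dir = Paths.get(args.length > 0 ? args[0] : "/tmp");
    // "unix:mode" is available on Linux JDKs; it returns the raw mode bits.
    int mode = (Integer) Files.getAttribute(dir, "unix:mode");
    boolean sticky = (mode & 01000) != 0;      // sticky bit
    boolean worldRwx = (mode & 0777) == 0777;  // rwx for owner, group, others
    System.out.printf("%s mode=%o sticky=%b worldRwx=%b%n",
        dir, mode & 07777, sticky, worldRwx);
  }
}
{code}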
[jira] [Commented] (YARN-1449) AM-NM protocol changes to support container resizing
[ https://issues.apache.org/jira/browse/YARN-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625459#comment-14625459 ] Jian He commented on YARN-1449: --- bq. because it may need to create this.containersToIncrease if it is null. If this.containersToIncrease is null, won't this call return upfront? {code} if (containersToIncrease == null) { return; } {code} bq. The AllocateRequestPBImpl.setAskList() uses the same logic. Actually, many places use this logic. No need to follow that pattern. The PB implementations are often copy-and-paste code that people pay less attention to. AM-NM protocol changes to support container resizing Key: YARN-1449 URL: https://issues.apache.org/jira/browse/YARN-1449 Project: Hadoop YARN Issue Type: Sub-task Components: api Reporter: Wangda Tan (No longer used) Assignee: MENG DING Attachments: YARN-1449.1.patch, YARN-1449.2.patch, YARN-1449.3.patch, yarn-1449.1.patch, yarn-1449.3.patch, yarn-1449.4.patch, yarn-1449.5.patch AM-NM protocol changes to support container resizing 1) IncreaseContainersResourceRequest and IncreaseContainersResourceResponse PB protocol and implementation 2) increaseContainersResources method in ContainerManagementProtocol 3) Update ContainerStatus protocol to include Resource 4) Relevant test cases -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3866) AM-RM protocol changes to support container resizing
[ https://issues.apache.org/jira/browse/YARN-3866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625486#comment-14625486 ] Hadoop QA commented on YARN-3866: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 16m 48s | Findbugs (version ) appears to be broken on YARN-1197. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 7 new or modified test files. | | {color:green}+1{color} | javac | 7m 42s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 40s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 31s | The applied patch generated 2 new checkstyle issues (total was 22, now 15). | | {color:green}+1{color} | whitespace | 0m 8s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 23s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 4m 16s | The patch appears to introduce 1 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | mapreduce tests | 9m 5s | Tests passed in hadoop-mapreduce-client-app. | | {color:green}+1{color} | yarn tests | 0m 23s | Tests passed in hadoop-yarn-api. | | {color:green}+1{color} | yarn tests | 1m 53s | Tests passed in hadoop-yarn-common. | | | | 54m 1s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-mapreduce-client-app | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12745125/YARN-3866-YARN-1197.4.patch | | Optional Tests | javac unit findbugs checkstyle javadoc | | git revision | YARN-1197 / 47f4c54 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8523/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8523/artifact/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-app.html | | hadoop-mapreduce-client-app test log | https://builds.apache.org/job/PreCommit-YARN-Build/8523/artifact/patchprocess/testrun_hadoop-mapreduce-client-app.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8523/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8523/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8523/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8523/console | This message was automatically generated. AM-RM protocol changes to support container resizing Key: YARN-3866 URL: https://issues.apache.org/jira/browse/YARN-3866 Project: Hadoop YARN Issue Type: Sub-task Components: api Reporter: MENG DING Assignee: MENG DING Attachments: YARN-3866-YARN-1197.4.patch, YARN-3866.1.patch, YARN-3866.2.patch, YARN-3866.3.patch YARN-1447 and YARN-1448 are outdated. 
This ticket deals with AM-RM Protocol changes to support container resize according to the latest design in YARN-1197. 1) Add increase/decrease requests in AllocateRequest 2) Get approved increase/decrease requests from RM in AllocateResponse 3) Add relevant test cases -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3916) DrainDispatcher#await should wait till event has been completely handled
[ https://issues.apache.org/jira/browse/YARN-3916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625382#comment-14625382 ] Jian He commented on YARN-3916: --- Thanks [~varun_saxena]! I will re-open YARN-3878 and close this as a duplicate of it, and will look at YARN-3878. DrainDispatcher#await should wait till event has been completely handled Key: YARN-3916 URL: https://issues.apache.org/jira/browse/YARN-3916 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Attachments: YARN-3916.01.patch, YARN-3916.02.patch DrainDispatcher#await should wait till the event has been completely handled. Currently it only checks whether the event queue has become empty. In many tests we directly check for a state change after calling await. Sometimes the states have not changed by the time we check them because the event has not been completely handled. *This is causing test failures* such as YARN-3909 and YARN-3910 and may cause other test failures as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
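A minimal sketch of the idea behind the fix discussed above, assuming a dispatcher that counts events still being handled; the class and field names are hypothetical, not the actual DrainDispatcher code:
{code}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

public class DrainAwareDispatcherSketch {
  private final BlockingQueue<Runnable> eventQueue = new LinkedBlockingQueue<>();
  // Counts events that have been enqueued but not yet completely handled.
  private final AtomicInteger pending = new AtomicInteger();

  public void handle(Runnable event) throws InterruptedException {
    pending.incrementAndGet();
    eventQueue.put(event);
  }

  // Dispatcher thread body: the counter is decremented only after the
  // handler has finished, not when the event is taken off the queue.
  void dispatchLoop() throws InterruptedException {
    while (true) {
      Runnable event = eventQueue.take();
      try {
        event.run();
      } finally {
        pending.decrementAndGet();
      }
    }
  }

  // Returns only when every enqueued event has been fully handled,
  // not merely when the queue itself has become empty.
  public void await() throws InterruptedException {
    while (pending.get() > 0) {
      Thread.sleep(10);
    }
  }
}
{code}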
[jira] [Reopened] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He reopened YARN-3878: --- AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Fix For: 2.7.2 Attachments: YARN-3878-addendum.patch, YARN-3878.01.patch, YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, YARN-3878.06.patch, YARN-3878.07.patch, YARN-3878.08.patch The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. *Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) {noformat} *JStack of AsyncDispatcher hanging on stop* {noformat} AsyncDispatcher event handler 
prio=10 tid=0x7fb980222800 nid=0x4b1e waiting on condition [0x7fb9654e9000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x000700b79250 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:113) at java.lang.Thread.run(Thread.java:744) main prio=10 tid=0x7fb98000a800 nid=0x49c3 in Object.wait() [0x7fb989851000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on 0x000700b79430 (a java.lang.Object) at
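A minimal sketch of the wait-for-drained loop described in the hang above, assuming a drained flag that is set by the handler thread; if that flag is never set (for example because the enqueueing thread was interrupted), an unbounded wait blocks exactly as in the jstack, so this sketch bounds the wait. Names are illustrative, not the actual AsyncDispatcher code:
{code}
public class DrainOnStopSketch {
  private volatile boolean drained = false;
  private final Object waitForDrained = new Object();

  // Called from serviceStop() when the dispatcher is configured to drain
  // events on stop. An unbounded "while (!drained) wait()" reproduces the
  // hang; the deadline keeps JVM shutdown from blocking forever.
  public void waitForDrain(long timeoutMs) throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    synchronized (waitForDrained) {
      while (!drained && System.currentTimeMillis() < deadline) {
        waitForDrained.wait(100);
      }
    }
  }

  // Called by the handler thread once the event queue is observed empty.
  public void markDrained() {
    synchronized (waitForDrained) {
      drained = true;
      waitForDrained.notifyAll();
    }
  }
}
{code}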
[jira] [Commented] (YARN-3908) Bugs in HBaseTimelineWriterImpl
[ https://issues.apache.org/jira/browse/YARN-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625414#comment-14625414 ] Sangjin Lee commented on YARN-3908: --- When time series data expires after the TTL (except for the latest value), the metric will only contain a single value. For all practical purposes, the metric at that point would act like a single value. We thought that would be fine. Do you see a situation where (probably on the read path) we need to recognize some metric as a time series and do something different *even though* there is only one value in the column? Bugs in HBaseTimelineWriterImpl --- Key: YARN-3908 URL: https://issues.apache.org/jira/browse/YARN-3908 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Vrushali C Attachments: YARN-3908-YARN-2928.001.patch, YARN-3908-YARN-2928.002.patch, YARN-3908-YARN-2928.003.patch 1. In HBaseTimelineWriterImpl, the info column family contains the basic fields of a timeline entity plus events. However, the entity#info map is not stored at all. 2. event#timestamp is also not persisted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
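For context on how a metric time series can collapse to a single value under a TTL, a minimal write-path sketch, assuming an HBase 1.x client; the table, column family, and qualifier layout here are hypothetical, not the actual HBaseTimelineWriterImpl schema:
{code}
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class MetricWriteSketch {
  // Each metric reading is written as a cell version keyed by its own
  // timestamp, so a column-family TTL can age out old versions while the
  // latest reading survives as a single remaining value.
  static void writeMetric(Connection conn, byte[] rowKey, String metricId,
                          long timestamp, long value) throws java.io.IOException {
    try (Table table = conn.getTable(TableName.valueOf("timeline_entity"))) {
      Put put = new Put(rowKey);
      put.addColumn(Bytes.toBytes("m"), Bytes.toBytes(metricId), timestamp,
          Bytes.toBytes(value));
      table.put(put);
    }
  }
}
{code}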
[jira] [Commented] (YARN-3878) AsyncDispatcher can hang while stopping if it is configured for draining events on stop
[ https://issues.apache.org/jira/browse/YARN-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625436#comment-14625436 ] Hudson commented on YARN-3878: -- SUCCESS: Integrated in Hadoop-trunk-Commit #8157 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8157/]) Revert YARN-3878. AsyncDispatcher can hang while stopping if it is configured for draining events on stop. (Varun Saxena via kasha) (jianhe: rev 2466460d4cd13ad5837c044476b26e63082c1d37) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/event/TestAsyncDispatcher.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/event/DrainDispatcher.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/event/AsyncDispatcher.java AsyncDispatcher can hang while stopping if it is configured for draining events on stop --- Key: YARN-3878 URL: https://issues.apache.org/jira/browse/YARN-3878 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Varun Saxena Assignee: Varun Saxena Priority: Critical Fix For: 2.7.2 Attachments: YARN-3878-addendum.patch, YARN-3878.01.patch, YARN-3878.02.patch, YARN-3878.03.patch, YARN-3878.04.patch, YARN-3878.05.patch, YARN-3878.06.patch, YARN-3878.07.patch, YARN-3878.08.patch The sequence of events is as under : # RM is stopped while putting a RMStateStore Event to RMStateStore's AsyncDispatcher. This leads to an Interrupted Exception being thrown. # As RM is being stopped, RMStateStore's AsyncDispatcher is also stopped. On {{serviceStop}}, we will check if all events have been drained and wait for event queue to drain(as RM State Store dispatcher is configured for queue to drain on stop). # This condition never becomes true and AsyncDispatcher keeps on waiting incessantly for dispatcher event queue to drain till JVM exits. 
*Initial exception while posting RM State store event to queue* {noformat} 2015-06-27 20:08:35,922 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED 2015-06-27 20:08:35,923 WARN [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.updateApplicationAttemptState(RMStateStore.java:652) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.rememberTargetTransitionsAndStoreState(RMAppAttemptImpl.java:1173) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.access$3300(RMAppAttemptImpl.java:109) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1650) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$ContainerFinishedTransition.transition(RMAppAttemptImpl.java:1619) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:786) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:108) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:838) {noformat} *JStack of AsyncDispatcher hanging on stop* {noformat} AsyncDispatcher event handler prio=10 tid=0x7fb980222800 nid=0x4b1e waiting on condition [0x7fb9654e9000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x000700b79250 (a
[jira] [Commented] (YARN-3921) Permission denied errors for local usercache directories when attempting to run MapReduce job on Kerberos enabled cluster
[ https://issues.apache.org/jira/browse/YARN-3921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625463#comment-14625463 ] Allen Wittenauer commented on YARN-3921: So, this is probably a bug in Ambari, really. Permission denied errors for local usercache directories when attempting to run MapReduce job on Kerberos enabled cluster -- Key: YARN-3921 URL: https://issues.apache.org/jira/browse/YARN-3921 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.1 Environment: sles11sp3 Reporter: Zack Marsh Prior to enabling Kerberos on the Hadoop cluster, I am able to run a simple MapReduce example as the Linux user 'tdatuser': {code} iripiri1:~ # su tdatuser tdatuser@piripiri1:/root yarn jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples-2.*.jar pi 16 1 Number of Maps = 16 Samples per Map = 1 Wrote input for Map #0 Wrote input for Map #1 Wrote input for Map #2 Wrote input for Map #3 Wrote input for Map #4 Wrote input for Map #5 Wrote input for Map #6 Wrote input for Map #7 Wrote input for Map #8 Wrote input for Map #9 Wrote input for Map #10 Wrote input for Map #11 Wrote input for Map #12 Wrote input for Map #13 Wrote input for Map #14 Wrote input for Map #15 Starting Job 15/07/13 17:02:31 INFO impl.TimelineClientImpl: Timeline service address: http:/ s/v1/timeline/ 15/07/13 17:02:31 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to 15/07/13 17:02:31 INFO input.FileInputFormat: Total input paths to process : 16 15/07/13 17:02:31 INFO mapreduce.JobSubmitter: number of splits:16 15/07/13 17:02:31 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_14 15/07/13 17:02:32 INFO impl.YarnClientImpl: Submitted application application_14 15/07/13 17:02:32 INFO mapreduce.Job: The url to track the job: http://piripiri3 cation_1436821014431_0003/ 15/07/13 17:02:32 INFO mapreduce.Job: Running job: job_1436821014431_0003 15/07/13 17:05:50 INFO mapreduce.Job: Job job_1436821014431_0003 running in uber mode : false 15/07/13 17:05:50 INFO mapreduce.Job: map 0% reduce 0% 15/07/13 17:05:56 INFO mapreduce.Job: map 6% reduce 0% 15/07/13 17:06:00 INFO mapreduce.Job: map 13% reduce 0% 15/07/13 17:06:01 INFO mapreduce.Job: map 38% reduce 0% 15/07/13 17:06:05 INFO mapreduce.Job: map 44% reduce 0% 15/07/13 17:06:07 INFO mapreduce.Job: map 63% reduce 0% 15/07/13 17:06:09 INFO mapreduce.Job: map 69% reduce 0% 15/07/13 17:06:11 INFO mapreduce.Job: map 75% reduce 0% 15/07/13 17:06:12 INFO mapreduce.Job: map 81% reduce 0% 15/07/13 17:06:13 INFO mapreduce.Job: map 81% reduce 25% 15/07/13 17:06:14 INFO mapreduce.Job: map 94% reduce 25% 15/07/13 17:06:16 INFO mapreduce.Job: map 100% reduce 31% 15/07/13 17:06:17 INFO mapreduce.Job: map 100% reduce 100% 15/07/13 17:06:17 INFO mapreduce.Job: Job job_1436821014431_0003 completed successfully 15/07/13 17:06:17 INFO mapreduce.Job: Counters: 49 File System Counters FILE: Number of bytes read=358 FILE: Number of bytes written=2249017 FILE: Number of read operations=0 FILE: Number of large read operations=0 FILE: Number of write operations=0 HDFS: Number of bytes read=4198 HDFS: Number of bytes written=215 HDFS: Number of read operations=67 HDFS: Number of large read operations=0 HDFS: Number of write operations=3 Job Counters Launched map tasks=16 Launched reduce tasks=1 Data-local map tasks=16 Total time spent by all maps in occupied slots (ms)=160498 Total time spent by all reduces in occupied slots (ms)=27302 Total time spent by all map tasks (ms)=80249 Total time spent by all reduce tasks (ms)=13651 
Total vcore-seconds taken by all map tasks=80249 Total vcore-seconds taken by all reduce tasks=13651 Total megabyte-seconds taken by all map tasks=246524928 Total megabyte-seconds taken by all reduce tasks=41935872 Map-Reduce Framework Map input records=16 Map output records=32 Map output bytes=288
[jira] [Commented] (YARN-1449) AM-NM protocol changes to support container resizing
[ https://issues.apache.org/jira/browse/YARN-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625472#comment-14625472 ] MENG DING commented on YARN-1449: - Oh, the diff file doesn't show the entire context. The {{containersToIncrease}} refers to the parameter being passed in, so it is only in the scope of the {{setContainersToIncrease}} function.
{code}
+  @Override
+  public void setContainersToIncrease(List<Token> containersToIncrease) {
+    if (containersToIncrease == null) {
+      return;
+    }
+    initContainersToIncrease();
+    this.containersToIncrease.clear();
+    this.containersToIncrease.addAll(containersToIncrease);
+  }
{code}
AM-NM protocol changes to support container resizing Key: YARN-1449 URL: https://issues.apache.org/jira/browse/YARN-1449 Project: Hadoop YARN Issue Type: Sub-task Components: api Reporter: Wangda Tan (No longer used) Assignee: MENG DING Attachments: YARN-1449.1.patch, YARN-1449.2.patch, YARN-1449.3.patch, yarn-1449.1.patch, yarn-1449.3.patch, yarn-1449.4.patch, yarn-1449.5.patch AM-NM protocol changes to support container resizing 1) IncreaseContainersResourceRequest and IncreaseContainersResourceResponse PB protocol and implementation 2) increaseContainersResources method in ContainerManagementProtocol 3) Update ContainerStatus protocol to include Resource 4) Relevant test cases -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1449) AM-NM protocol changes to support container resizing
[ https://issues.apache.org/jira/browse/YARN-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625479#comment-14625479 ] Jian He commented on YARN-1449: --- Ah, I missed that too. I found one example; I think we can use logic similar to ApplicationSubmissionContextPBImpl. AM-NM protocol changes to support container resizing Key: YARN-1449 URL: https://issues.apache.org/jira/browse/YARN-1449 Project: Hadoop YARN Issue Type: Sub-task Components: api Reporter: Wangda Tan (No longer used) Assignee: MENG DING Attachments: YARN-1449.1.patch, YARN-1449.2.patch, YARN-1449.3.patch, yarn-1449.1.patch, yarn-1449.3.patch, yarn-1449.4.patch, yarn-1449.5.patch AM-NM protocol changes to support container resizing 1) IncreaseContainersResourceRequest and IncreaseContainersResourceResponse PB protocol and implementation 2) increaseContainersResources method in ContainerManagementProtocol 3) Update ContainerStatus protocol to include Resource 4) Relevant test cases -- This message was sent by Atlassian JIRA (v6.3.4#6332)
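A sketch of the alternative being suggested, modeled loosely on the setter style of ApplicationSubmissionContextPBImpl rather than on the committed patch; it is a method-level fragment in the same form as the quoted diff, and {{maybeInitBuilder}} and the {{clearIncreaseContainers}} proto field name are assumptions:
{code}
@Override
public void setContainersToIncrease(List<Token> containersToIncrease) {
  maybeInitBuilder();
  if (containersToIncrease == null) {
    // Clear the proto field instead of returning early, so that a null
    // argument actually resets any previously set list.
    builder.clearIncreaseContainers();
    this.containersToIncrease = null;
    return;
  }
  this.containersToIncrease = new ArrayList<>(containersToIncrease);
}
{code}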