[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()
[ https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14708470#comment-14708470 ] Rohith Sharma K S commented on YARN-3893: - I had a closer look at both of the solutions above. The potential issues with each are: # Moving createAndInitService just before starting activeServices in transitionToActive. ## Switch time will be impacted, since every transitionToActive would initialize the active services. ## RMWebApp has a dependency on clientRMService for starting webapps; without clientRMService initialization, RMWebApp cannot be started. # Moving refreshAll before transitionToActive in AdminService is the same as triggering RMAdminCLI on a standby node. That call throws a StandbyException and is retried against the active RM by RMAdminCLI. When it comes to AdminService#transitionToActive(), refreshing before {{rm.transitionedToActive}} throws a standby exception. Both RM in active state when Admin#transitionToActive failure from refeshAll() -- Key: YARN-3893 URL: https://issues.apache.org/jira/browse/YARN-3893 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.7.1 Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Critical Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 0003-YARN-3893.patch, 0004-YARN-3893.patch, yarn-site.xml Cases that can cause this. # Capacity scheduler xml is wrongly configured during switch # Refresh ACL failure due to configuration # Refresh User group failure due to configuration Continuously both RM will try to be active {code} dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin ./yarn rmadmin -getServiceState rm1 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable active dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin ./yarn rmadmin -getServiceState rm2 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable active {code} # Both Web UI active # Status shown as active for both RM -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()
[ https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14708471#comment-14708471 ] Rohith Sharma K S commented on YARN-3893: - I think for any configuration issue hit while transitioning to active, AdminService should not allow the JVM to continue. If AdminService throws an exception back to the elector, the elector again tries to make the RM active, which loops forever and fills the logs. Two calls can be points of failure: first {{rm.transitionedToActive}}, second {{refreshAll()}}. # If {{rm.transitionedToActive}} fails, the RM services are stopped and the RM ends up in STANDBY state. # If {{refreshAll()}} fails, BOTH RMs end up in ACTIVE state, as per this defect. Continuing RM services with an invalid configuration is not a good idea; moreover, invalid configurations should be reported to the user immediately. So it would be better to make use of the fail-fast configuration to exit the RM JVM. If that configuration is set to false, then call {{rm.handleTransitionToStandBy}} instead. Both RM in active state when Admin#transitionToActive failure from refeshAll() -- Key: YARN-3893 URL: https://issues.apache.org/jira/browse/YARN-3893 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.7.1 Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Critical Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 0003-YARN-3893.patch, 0004-YARN-3893.patch, yarn-site.xml Cases that can cause this. # Capacity scheduler xml is wrongly configured during switch # Refresh ACL failure due to configuration # Refresh User group failure due to configuration Continuously both RM will try to be active {code} dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin ./yarn rmadmin -getServiceState rm1 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable active dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin ./yarn rmadmin -getServiceState rm2 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable active {code} # Both Web UI active # Status shown as active for both RM -- This message was sent by Atlassian JIRA (v6.3.4#6332)
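A rough sketch of the fail-fast handling described in the YARN-3893 comment above is shown below. This is only an illustration of the idea, not the attached patch: the {{failFastOnBadConf}} flag, the {{Rm}} interface and the {{handleTransitionToStandBy()}} hook are placeholder names introduced here for the example.
{code}
import java.io.IOException;
import org.apache.hadoop.ha.ServiceFailedException;
import org.apache.hadoop.util.ExitUtil;

// Simplified sketch of the ordering and fail-fast handling discussed above;
// names marked "assumed" are illustrative, not the committed YARN-3893 change.
class AdminServiceSketch {
  interface Rm {
    void transitionToActive() throws Exception;
    void handleTransitionToStandBy();          // assumed hook back to standby
  }

  private final Rm rm;
  private final boolean failFastOnBadConf;     // assumed config-driven flag

  AdminServiceSketch(Rm rm, boolean failFastOnBadConf) {
    this.rm = rm;
    this.failFastOnBadConf = failFastOnBadConf;
  }

  void refreshAll() throws IOException {
    // reload scheduler / ACL / user-group configuration
  }

  synchronized void transitionToActive() throws ServiceFailedException {
    try {
      rm.transitionToActive();        // failure here already lands the RM in standby
    } catch (Exception e) {
      throw new ServiceFailedException("Error when transitioning to Active mode", e);
    }
    try {
      refreshAll();                   // the point of failure described in this JIRA
    } catch (Exception e) {
      if (failFastOnBadConf) {
        // Invalid configuration: stop the JVM instead of leaving two active RMs.
        ExitUtil.terminate(-1, e);
      }
      // Otherwise fall back to standby so at most one RM stays active.
      rm.handleTransitionToStandBy();
      throw new ServiceFailedException("refreshAll() failed", e);
    }
  }
}
{code}
Exiting the JVM on an invalid configuration surfaces the problem immediately, instead of letting the elector retry transitionToActive in a loop while both RMs keep reporting ACTIVE.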
[jira] [Commented] (YARN-4068) Support appUpdated event in TimelineV2 to publish details for movetoqueue, change in priority
[ https://issues.apache.org/jira/browse/YARN-4068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14708462#comment-14708462 ] Naganarasimha G R commented on YARN-4068: - Thanks [~sunilg] for holding. Currently it is not completely decided; it might be an existing jira or a new one. Once confirmed, I will update you about it. Support appUpdated event in TimelineV2 to publish details for movetoqueue, change in priority - Key: YARN-4068 URL: https://issues.apache.org/jira/browse/YARN-4068 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sunil G Assignee: Sunil G YARN-4044 supports appUpdated event changes to TimelineV1. This jira is to track and port the appUpdated changes in V2 for - movetoqueue - updateAppPriority -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()
[ https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708425#comment-14708425 ] Bibin A Chundatt commented on YARN-3893: Hi [~rohithsharma] Any comments on this.? Both RM in active state when Admin#transitionToActive failure from refeshAll() -- Key: YARN-3893 URL: https://issues.apache.org/jira/browse/YARN-3893 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.7.1 Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Critical Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 0003-YARN-3893.patch, 0004-YARN-3893.patch, yarn-site.xml Cases that can cause this. # Capacity scheduler xml is wrongly configured during switch # Refresh ACL failure due to configuration # Refresh User group failure due to configuration Continuously both RM will try to be active {code} dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin ./yarn rmadmin -getServiceState rm1 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable active dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin ./yarn rmadmin -getServiceState rm2 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable active {code} # Both Web UI active # Status shown as active for both RM -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3717) Improve RM node labels web UI
[ https://issues.apache.org/jira/browse/YARN-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-3717: Attachment: YARN-3717.20150822-1.patch Attaching an initial patch with the REST and CLI changes along with the web page changes. Improve RM node labels web UI - Key: YARN-3717 URL: https://issues.apache.org/jira/browse/YARN-3717 Project: Hadoop YARN Issue Type: Sub-task Reporter: Naganarasimha G R Assignee: Naganarasimha G R Attachments: RMLogsForHungJob.log, YARN-3717.20150822-1.patch 1 Add the default node-label expression for each queue in the scheduler page. 2 In the Application/AppAttempt page, show the app's configured node label expression for the AM and the job -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4071) After YARN-221, ContainerLogsRetentionPolicy.java still remains in yarn-common causing compilation error
Sunil G created YARN-4071: - Summary: After YARN-221, ContainerLogsRetentionPolicy.java still remains in yarn-common causing compilation error Key: YARN-4071 URL: https://issues.apache.org/jira/browse/YARN-4071 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Sunil G Assignee: Sunil G Priority: Critical As part of YARN-221, {{ContainerLogsRetentionPolicy.java}} was to be moved to {{org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation}} and renamed {{AllContainerLogAggregationPolicy.java}}. The file still remains in yarn-common, causing the build to break. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4071) After YARN-221, ContainerLogsRetentionPolicy.java still remains in yarn-common causing compilation error
[ https://issues.apache.org/jira/browse/YARN-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-4071: -- Attachment: 0001-YARN-4071.patch Attaching a patch. cc/[~mingma] and [~xgong] After YARN-221, ContainerLogsRetentionPolicy.java still remains in yarn-common causing compilation error Key: YARN-4071 URL: https://issues.apache.org/jira/browse/YARN-4071 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Sunil G Assignee: Sunil G Priority: Critical Attachments: 0001-YARN-4071.patch {{ContainerLogsRetentionPolicy.java}} to be moved to {{org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation}} and to be renamed as {{AllContainerLogAggregationPolicy.java}} after YARN-221 Now this file still remains in the yarn-common and causing build to break. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3697) FairScheduler: ContinuousSchedulingThread can't be shutdown after stop sometimes.
[ https://issues.apache.org/jira/browse/YARN-3697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-3697: Attachment: YARN-3697.001.patch FairScheduler: ContinuousSchedulingThread can't be shutdown after stop sometimes. -- Key: YARN-3697 URL: https://issues.apache.org/jira/browse/YARN-3697 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.7.0 Reporter: zhihai xu Assignee: zhihai xu Priority: Critical Attachments: YARN-3697.000.patch, YARN-3697.001.patch FairScheduler: ContinuousSchedulingThread can't be shutdown after stop sometimes. The reason is that the InterruptedException is swallowed in continuousSchedulingAttempt {code} try { if (node != null && Resources.fitsIn(minimumAllocation, node.getAvailableResource())) { attemptScheduling(node); } } catch (Throwable ex) { LOG.error("Error while attempting scheduling for node " + node + ": " + ex.toString(), ex); } {code} I saw the following exception after stop: {code} 2015-05-17 23:30:43,065 WARN [FairSchedulerContinuousScheduling] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$ContainerStartedTransition.transition(RMContainerImpl.java:467) at org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$ContainerStartedTransition.transition(RMContainerImpl.java:462) at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:387) at org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:58) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.allocate(FSAppAttempt.java:357) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:516) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:649) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:803) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.assignContainer(FSLeafQueue.java:334) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java:173) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.attemptScheduling(FairScheduler.java:1082) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousSchedulingAttempt(FairScheduler.java:1014) at
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$ContinuousSchedulingThread.run(FairScheduler.java:285) 2015-05-17 23:30:43,066 ERROR [FairSchedulerContinuousScheduling] fair.FairScheduler (FairScheduler.java:continuousSchedulingAttempt(1017)) - Error while attempting scheduling for node host: 127.0.0.2:2 #containers=1 available=memory:7168, vCores:7 used=memory:1024, vCores:1: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.lang.InterruptedException org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.lang.InterruptedException at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:249) at org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$ContainerStartedTransition.transition(RMContainerImpl.java:467) at org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$ContainerStartedTransition.transition(RMContainerImpl.java:462) at
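For readers following the YARN-3697 description above: the {{catch (Throwable ex)}} block keeps the continuous-scheduling loop alive even when the thread has been interrupted during shutdown. A minimal, hedged sketch of the general fix shape — letting interruption end the loop instead of being logged and ignored — is below; it illustrates the idea only, is not zhihai xu's attached patch, and uses placeholder class and method names.
{code}
// Illustrative only: a scheduling loop that stops on interruption instead of
// swallowing it inside a broad catch. Names here are placeholders.
class ContinuousSchedulingLoopSketch implements Runnable {
  @Override
  public void run() {
    while (!Thread.currentThread().isInterrupted()) {
      try {
        scheduleOnce();                        // stands in for attemptScheduling(node)
      } catch (InterruptedException ie) {
        Thread.currentThread().interrupt();    // restore the flag and leave the loop
        break;
      } catch (Throwable t) {
        // Any other error is logged and scheduling continues, as in the original code.
        System.err.println("Error while attempting scheduling: " + t);
      }
    }
  }

  private void scheduleOnce() throws InterruptedException {
    // a real implementation would pick a node and call attemptScheduling(node)
  }
}
{code}
Note that in the stack trace above the interruption actually surfaces wrapped in a YarnRuntimeException thrown by AsyncDispatcher, so a real fix also needs to recognize that wrapped case (or check the thread's interrupted status) before deciding whether to keep looping.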
[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()
[ https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708491#comment-14708491 ] Varun Saxena commented on YARN-3893: Using fail fast makes sense. Both RM in active state when Admin#transitionToActive failure from refeshAll() -- Key: YARN-3893 URL: https://issues.apache.org/jira/browse/YARN-3893 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.7.1 Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Critical Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 0003-YARN-3893.patch, 0004-YARN-3893.patch, yarn-site.xml Cases that can cause this. # Capacity scheduler xml is wrongly configured during switch # Refresh ACL failure due to configuration # Refresh User group failure due to configuration Continuously both RM will try to be active {code} dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin ./yarn rmadmin -getServiceState rm1 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable active dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin ./yarn rmadmin -getServiceState rm2 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable active {code} # Both Web UI active # Status shown as active for both RM -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4058) Miscellaneous issues in NodeManager project
[ https://issues.apache.org/jira/browse/YARN-4058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14708467#comment-14708467 ] Naganarasimha G R commented on YARN-4058: - Thanks [~sjlee0] for the review comments. Yes, it makes sense to move the ApplicationACLsManager-related modifications into trunk itself, as it might involve more changes if we touch the constructor. bq. This style (having values like null first and then the expression) is a carry-over from the C-style defensive coding, and is no longer needed in java. OK, will take care of it in the next patch. bq. Then the timeline client must be stopped properly as part of lifecycle management. True, we need to handle this too; will take care of it in YARN-3367. As for where to do it: my initial thought was to do it when ApplicationImpl (the NM class) reaches {{ApplicationState.FINISH_APPLICATION}}. Miscellaneous issues in NodeManager project --- Key: YARN-4058 URL: https://issues.apache.org/jira/browse/YARN-4058 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Naganarasimha G R Assignee: Naganarasimha G R Priority: Minor Attachments: YARN-4058.YARN-2928.001.patch # TestSystemMetricsPublisherForV2.testPublishApplicationMetrics is failing # Unused ApplicationACLsManager in ContainerManagerImpl # In ContainerManagerImpl.startContainerInternal an ApplicationImpl instance is created and only then checked against context.getApplications(); every time an ApplicationImpl is created, its state machine is initialized and a TimelineClient is created, which is required only if the application is added to the context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
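Regarding the third item in the YARN-4058 description above (constructing ApplicationImpl before checking the context): the general check-before-create shape is sketched below. It is a hedged illustration with placeholder types, not the attached YARN-2928 branch patch; the real NM classes and their construction cost are more involved.
{code}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Placeholder types: the point is only that the expensive per-application
// object (state machine, timeline client, ...) is built only when the
// application is not already tracked.
class AppRegistrySketch {
  static class AppState {
    AppState(String appId) {
      // expensive initialization happens here in the real ApplicationImpl
    }
  }

  private final ConcurrentMap<String, AppState> apps = new ConcurrentHashMap<String, AppState>();

  AppState getOrCreate(String appId) {
    AppState existing = apps.get(appId);
    if (existing != null) {
      return existing;                    // common case: nothing new is constructed
    }
    AppState created = new AppState(appId);
    AppState raced = apps.putIfAbsent(appId, created);
    // Under a race another thread may have registered first; use its instance.
    return raced != null ? raced : created;
  }
}
{code}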
[jira] [Updated] (YARN-3697) FairScheduler: ContinuousSchedulingThread can't be shutdown after stop sometimes.
[ https://issues.apache.org/jira/browse/YARN-3697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-3697: Attachment: YARN-3697.001.patch FairScheduler: ContinuousSchedulingThread can't be shutdown after stop sometimes. -- Key: YARN-3697 URL: https://issues.apache.org/jira/browse/YARN-3697 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.7.0 Reporter: zhihai xu Assignee: zhihai xu Priority: Critical Attachments: YARN-3697.000.patch, YARN-3697.001.patch FairScheduler: ContinuousSchedulingThread can't be shutdown after stop sometimes. The reason is because the InterruptedException is blocked in continuousSchedulingAttempt {code} try { if (node != null Resources.fitsIn(minimumAllocation, node.getAvailableResource())) { attemptScheduling(node); } } catch (Throwable ex) { LOG.error(Error while attempting scheduling for node + node + : + ex.toString(), ex); } {code} I saw the following exception after stop: {code} 2015-05-17 23:30:43,065 WARN [FairSchedulerContinuousScheduling] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$ContainerStartedTransition.transition(RMContainerImpl.java:467) at org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$ContainerStartedTransition.transition(RMContainerImpl.java:462) at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:387) at org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:58) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.allocate(FSAppAttempt.java:357) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:516) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:649) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:803) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.assignContainer(FSLeafQueue.java:334) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java:173) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.attemptScheduling(FairScheduler.java:1082) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousSchedulingAttempt(FairScheduler.java:1014) at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$ContinuousSchedulingThread.run(FairScheduler.java:285) 2015-05-17 23:30:43,066 ERROR [FairSchedulerContinuousScheduling] fair.FairScheduler (FairScheduler.java:continuousSchedulingAttempt(1017)) - Error while attempting scheduling for node host: 127.0.0.2:2 #containers=1 available=memory:7168, vCores:7 used=memory:1024, vCores:1: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.lang.InterruptedException org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.lang.InterruptedException at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:249) at org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$ContainerStartedTransition.transition(RMContainerImpl.java:467) at org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$ContainerStartedTransition.transition(RMContainerImpl.java:462) at
[jira] [Updated] (YARN-3697) FairScheduler: ContinuousSchedulingThread can't be shutdown after stop sometimes.
[ https://issues.apache.org/jira/browse/YARN-3697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-3697: Attachment: (was: YARN-3697.001.patch) FairScheduler: ContinuousSchedulingThread can't be shutdown after stop sometimes. -- Key: YARN-3697 URL: https://issues.apache.org/jira/browse/YARN-3697 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.7.0 Reporter: zhihai xu Assignee: zhihai xu Priority: Critical Attachments: YARN-3697.000.patch, YARN-3697.001.patch FairScheduler: ContinuousSchedulingThread can't be shutdown after stop sometimes. The reason is because the InterruptedException is blocked in continuousSchedulingAttempt {code} try { if (node != null Resources.fitsIn(minimumAllocation, node.getAvailableResource())) { attemptScheduling(node); } } catch (Throwable ex) { LOG.error(Error while attempting scheduling for node + node + : + ex.toString(), ex); } {code} I saw the following exception after stop: {code} 2015-05-17 23:30:43,065 WARN [FairSchedulerContinuousScheduling] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$ContainerStartedTransition.transition(RMContainerImpl.java:467) at org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$ContainerStartedTransition.transition(RMContainerImpl.java:462) at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:387) at org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:58) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.allocate(FSAppAttempt.java:357) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:516) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:649) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:803) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.assignContainer(FSLeafQueue.java:334) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java:173) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.attemptScheduling(FairScheduler.java:1082) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousSchedulingAttempt(FairScheduler.java:1014) at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$ContinuousSchedulingThread.run(FairScheduler.java:285) 2015-05-17 23:30:43,066 ERROR [FairSchedulerContinuousScheduling] fair.FairScheduler (FairScheduler.java:continuousSchedulingAttempt(1017)) - Error while attempting scheduling for node host: 127.0.0.2:2 #containers=1 available=memory:7168, vCores:7 used=memory:1024, vCores:1: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.lang.InterruptedException org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.lang.InterruptedException at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:249) at org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$ContainerStartedTransition.transition(RMContainerImpl.java:467) at org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$ContainerStartedTransition.transition(RMContainerImpl.java:462) at
[jira] [Commented] (YARN-3697) FairScheduler: ContinuousSchedulingThread can't be shutdown after stop sometimes.
[ https://issues.apache.org/jira/browse/YARN-3697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708310#comment-14708310 ] zhihai xu commented on YARN-3697: - Hi [~kasha], thanks for the review! That is a good suggestion, I attached a new patch YARN-3697.001.patch which addressed your comments with two tests. Please review it. thanks again! FairScheduler: ContinuousSchedulingThread can't be shutdown after stop sometimes. -- Key: YARN-3697 URL: https://issues.apache.org/jira/browse/YARN-3697 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.7.0 Reporter: zhihai xu Assignee: zhihai xu Priority: Critical Attachments: YARN-3697.000.patch, YARN-3697.001.patch FairScheduler: ContinuousSchedulingThread can't be shutdown after stop sometimes. The reason is because the InterruptedException is blocked in continuousSchedulingAttempt {code} try { if (node != null Resources.fitsIn(minimumAllocation, node.getAvailableResource())) { attemptScheduling(node); } } catch (Throwable ex) { LOG.error(Error while attempting scheduling for node + node + : + ex.toString(), ex); } {code} I saw the following exception after stop: {code} 2015-05-17 23:30:43,065 WARN [FairSchedulerContinuousScheduling] event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher thread interrupted java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244) at org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$ContainerStartedTransition.transition(RMContainerImpl.java:467) at org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$ContainerStartedTransition.transition(RMContainerImpl.java:462) at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:387) at org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:58) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.allocate(FSAppAttempt.java:357) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:516) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:649) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:803) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.assignContainer(FSLeafQueue.java:334) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java:173) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.attemptScheduling(FairScheduler.java:1082) at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousSchedulingAttempt(FairScheduler.java:1014) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$ContinuousSchedulingThread.run(FairScheduler.java:285) 2015-05-17 23:30:43,066 ERROR [FairSchedulerContinuousScheduling] fair.FairScheduler (FairScheduler.java:continuousSchedulingAttempt(1017)) - Error while attempting scheduling for node host: 127.0.0.2:2 #containers=1 available=memory:7168, vCores:7 used=memory:1024, vCores:1: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.lang.InterruptedException org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.lang.InterruptedException at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:249) at
[jira] [Commented] (YARN-221) NM should provide a way for AM to tell it not to aggregate logs.
[ https://issues.apache.org/jira/browse/YARN-221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708318#comment-14708318 ] Arun Suresh commented on YARN-221: -- Looks like trunk does not compile correctly after this.. NM should provide a way for AM to tell it not to aggregate logs. Key: YARN-221 URL: https://issues.apache.org/jira/browse/YARN-221 Project: Hadoop YARN Issue Type: Sub-task Components: log-aggregation, nodemanager Reporter: Robert Joseph Evans Assignee: Ming Ma Fix For: 2.8.0 Attachments: YARN-221-6.patch, YARN-221-7.patch, YARN-221-8.patch, YARN-221-9.patch, YARN-221-trunk-v1.patch, YARN-221-trunk-v2.patch, YARN-221-trunk-v3.patch, YARN-221-trunk-v4.patch, YARN-221-trunk-v5.patch The NodeManager should provide a way for an AM to tell it that either the logs should not be aggregated, that they should be aggregated with a high priority, or that they should be aggregated but with a lower priority. The AM should be able to do this in the ContainerLaunch context to provide a default value, but should also be able to update the value when the container is released. This would allow for the NM to not aggregate logs in some cases, and avoid connection to the NN at all. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
[ https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-4024: -- Attachment: YARN-4024-v7.patch Thanks for your comments, [~adhoot], updated the patch. YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat -- Key: YARN-4024 URL: https://issues.apache.org/jira/browse/YARN-4024 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Reporter: Wangda Tan Assignee: Hong Zhiguo Attachments: YARN-4024-draft-v2.patch, YARN-4024-draft-v3.patch, YARN-4024-draft.patch, YARN-4024-v4.patch, YARN-4024-v5.patch, YARN-4024-v6.patch, YARN-4024-v7.patch Currently, YARN RM NodesListManager will resolve IP address every time when node doing heartbeat. When DNS server becomes slow, NM heartbeat will be blocked and cannot make progress. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-221) NM should provide a way for AM to tell it not to aggregate logs.
[ https://issues.apache.org/jira/browse/YARN-221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708582#comment-14708582 ] Ming Ma commented on YARN-221: -- +1 on the addendum patch. NM should provide a way for AM to tell it not to aggregate logs. Key: YARN-221 URL: https://issues.apache.org/jira/browse/YARN-221 Project: Hadoop YARN Issue Type: Sub-task Components: log-aggregation, nodemanager Reporter: Robert Joseph Evans Assignee: Ming Ma Fix For: 2.8.0 Attachments: YARN-221-6.patch, YARN-221-7.patch, YARN-221-8.patch, YARN-221-9.patch, YARN-221-addendum.1.patch, YARN-221-trunk-v1.patch, YARN-221-trunk-v2.patch, YARN-221-trunk-v3.patch, YARN-221-trunk-v4.patch, YARN-221-trunk-v5.patch The NodeManager should provide a way for an AM to tell it that either the logs should not be aggregated, that they should be aggregated with a high priority, or that they should be aggregated but with a lower priority. The AM should be able to do this in the ContainerLaunch context to provide a default value, but should also be able to update the value when the container is released. This would allow for the NM to not aggregate logs in some cases, and avoid connection to the NN at all. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-221) NM should provide a way for AM to tell it not to aggregate logs.
[ https://issues.apache.org/jira/browse/YARN-221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708630#comment-14708630 ] Hudson commented on YARN-221: - SUCCESS: Integrated in Hadoop-Yarn-trunk #1030 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/1030/]) YARN-221. Addendum patch to compilation issue which is caused by missing (xgong: rev b71c6006f579ac6f0755975a9b908b0062618b46) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AllContainerLogAggregationPolicy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/ContainerLogsRetentionPolicy.java NM should provide a way for AM to tell it not to aggregate logs. Key: YARN-221 URL: https://issues.apache.org/jira/browse/YARN-221 Project: Hadoop YARN Issue Type: Sub-task Components: log-aggregation, nodemanager Reporter: Robert Joseph Evans Assignee: Ming Ma Fix For: 2.8.0 Attachments: YARN-221-6.patch, YARN-221-7.patch, YARN-221-8.patch, YARN-221-9.patch, YARN-221-addendum.1.patch, YARN-221-trunk-v1.patch, YARN-221-trunk-v2.patch, YARN-221-trunk-v3.patch, YARN-221-trunk-v4.patch, YARN-221-trunk-v5.patch The NodeManager should provide a way for an AM to tell it that either the logs should not be aggregated, that they should be aggregated with a high priority, or that they should be aggregated but with a lower priority. The AM should be able to do this in the ContainerLaunch context to provide a default value, but should also be able to update the value when the container is released. This would allow for the NM to not aggregate logs in some cases, and avoid connection to the NN at all. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4071) After YARN-221, ContainerLogsRetentionPolicy.java still remains in yarn-common causing compilation error
[ https://issues.apache.org/jira/browse/YARN-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708628#comment-14708628 ] Brahma Reddy Battula commented on YARN-4071: [~sunilg] thanks for reporting this...[~xgong] had given YARN-221-addendum.1.patch and it's committed.. Now we can close this jira.. After YARN-221, ContainerLogsRetentionPolicy.java still remains in yarn-common causing compilation error Key: YARN-4071 URL: https://issues.apache.org/jira/browse/YARN-4071 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Sunil G Assignee: Sunil G Priority: Critical Attachments: 0001-YARN-4071.patch {{ContainerLogsRetentionPolicy.java}} to be moved to {{org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation}} and to be renamed as {{AllContainerLogAggregationPolicy.java}} after YARN-221 Now this file still remains in the yarn-common and causing build to break. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-221) NM should provide a way for AM to tell it not to aggregate logs.
[ https://issues.apache.org/jira/browse/YARN-221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708596#comment-14708596 ] Xuan Gong commented on YARN-221: Committed the addendum patch into trunk/branch-2. Thanks for the review, Ming NM should provide a way for AM to tell it not to aggregate logs. Key: YARN-221 URL: https://issues.apache.org/jira/browse/YARN-221 Project: Hadoop YARN Issue Type: Sub-task Components: log-aggregation, nodemanager Reporter: Robert Joseph Evans Assignee: Ming Ma Fix For: 2.8.0 Attachments: YARN-221-6.patch, YARN-221-7.patch, YARN-221-8.patch, YARN-221-9.patch, YARN-221-addendum.1.patch, YARN-221-trunk-v1.patch, YARN-221-trunk-v2.patch, YARN-221-trunk-v3.patch, YARN-221-trunk-v4.patch, YARN-221-trunk-v5.patch The NodeManager should provide a way for an AM to tell it that either the logs should not be aggregated, that they should be aggregated with a high priority, or that they should be aggregated but with a lower priority. The AM should be able to do this in the ContainerLaunch context to provide a default value, but should also be able to update the value when the container is released. This would allow for the NM to not aggregate logs in some cases, and avoid connection to the NN at all. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (YARN-221) NM should provide a way for AM to tell it not to aggregate logs.
[ https://issues.apache.org/jira/browse/YARN-221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong reopened YARN-221: NM should provide a way for AM to tell it not to aggregate logs. Key: YARN-221 URL: https://issues.apache.org/jira/browse/YARN-221 Project: Hadoop YARN Issue Type: Sub-task Components: log-aggregation, nodemanager Reporter: Robert Joseph Evans Assignee: Ming Ma Fix For: 2.8.0 Attachments: YARN-221-6.patch, YARN-221-7.patch, YARN-221-8.patch, YARN-221-9.patch, YARN-221-trunk-v1.patch, YARN-221-trunk-v2.patch, YARN-221-trunk-v3.patch, YARN-221-trunk-v4.patch, YARN-221-trunk-v5.patch The NodeManager should provide a way for an AM to tell it that either the logs should not be aggregated, that they should be aggregated with a high priority, or that they should be aggregated but with a lower priority. The AM should be able to do this in the ContainerLaunch context to provide a default value, but should also be able to update the value when the container is released. This would allow for the NM to not aggregate logs in some cases, and avoid connection to the NN at all. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-221) NM should provide a way for AM to tell it not to aggregate logs.
[ https://issues.apache.org/jira/browse/YARN-221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14708566#comment-14708566 ] Xuan Gong commented on YARN-221: Yes, reopening this and attaching an addendum patch to fix the compilation issue. NM should provide a way for AM to tell it not to aggregate logs. Key: YARN-221 URL: https://issues.apache.org/jira/browse/YARN-221 Project: Hadoop YARN Issue Type: Sub-task Components: log-aggregation, nodemanager Reporter: Robert Joseph Evans Assignee: Ming Ma Fix For: 2.8.0 Attachments: YARN-221-6.patch, YARN-221-7.patch, YARN-221-8.patch, YARN-221-9.patch, YARN-221-trunk-v1.patch, YARN-221-trunk-v2.patch, YARN-221-trunk-v3.patch, YARN-221-trunk-v4.patch, YARN-221-trunk-v5.patch The NodeManager should provide a way for an AM to tell it that either the logs should not be aggregated, that they should be aggregated with a high priority, or that they should be aggregated but with a lower priority. The AM should be able to do this in the ContainerLaunch context to provide a default value, but should also be able to update the value when the container is released. This would allow for the NM to not aggregate logs in some cases, and avoid connection to the NN at all. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-221) NM should provide a way for AM to tell it not to aggregate logs.
[ https://issues.apache.org/jira/browse/YARN-221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-221: --- Attachment: YARN-221-addendum.1.patch NM should provide a way for AM to tell it not to aggregate logs. Key: YARN-221 URL: https://issues.apache.org/jira/browse/YARN-221 Project: Hadoop YARN Issue Type: Sub-task Components: log-aggregation, nodemanager Reporter: Robert Joseph Evans Assignee: Ming Ma Fix For: 2.8.0 Attachments: YARN-221-6.patch, YARN-221-7.patch, YARN-221-8.patch, YARN-221-9.patch, YARN-221-addendum.1.patch, YARN-221-trunk-v1.patch, YARN-221-trunk-v2.patch, YARN-221-trunk-v3.patch, YARN-221-trunk-v4.patch, YARN-221-trunk-v5.patch The NodeManager should provide a way for an AM to tell it that either the logs should not be aggregated, that they should be aggregated with a high priority, or that they should be aggregated but with a lower priority. The AM should be able to do this in the ContainerLaunch context to provide a default value, but should also be able to update the value when the container is released. This would allow for the NM to not aggregate logs in some cases, and avoid connection to the NN at all. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-221) NM should provide a way for AM to tell it not to aggregate logs.
[ https://issues.apache.org/jira/browse/YARN-221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708594#comment-14708594 ] Hudson commented on YARN-221: - FAILURE: Integrated in Hadoop-trunk-Commit #8341 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8341/]) YARN-221. Addendum patch to compilation issue which is caused by missing (xgong: rev b71c6006f579ac6f0755975a9b908b0062618b46) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/ContainerLogsRetentionPolicy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AllContainerLogAggregationPolicy.java NM should provide a way for AM to tell it not to aggregate logs. Key: YARN-221 URL: https://issues.apache.org/jira/browse/YARN-221 Project: Hadoop YARN Issue Type: Sub-task Components: log-aggregation, nodemanager Reporter: Robert Joseph Evans Assignee: Ming Ma Fix For: 2.8.0 Attachments: YARN-221-6.patch, YARN-221-7.patch, YARN-221-8.patch, YARN-221-9.patch, YARN-221-addendum.1.patch, YARN-221-trunk-v1.patch, YARN-221-trunk-v2.patch, YARN-221-trunk-v3.patch, YARN-221-trunk-v4.patch, YARN-221-trunk-v5.patch The NodeManager should provide a way for an AM to tell it that either the logs should not be aggregated, that they should be aggregated with a high priority, or that they should be aggregated but with a lower priority. The AM should be able to do this in the ContainerLaunch context to provide a default value, but should also be able to update the value when the container is released. This would allow for the NM to not aggregate logs in some cases, and avoid connection to the NN at all. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-221) NM should provide a way for AM to tell it not to aggregate logs.
[ https://issues.apache.org/jira/browse/YARN-221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708660#comment-14708660 ] Hudson commented on YARN-221: - SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #301 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/301/]) YARN-221. Addendum patch to compilation issue which is caused by missing (xgong: rev b71c6006f579ac6f0755975a9b908b0062618b46) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/ContainerLogsRetentionPolicy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AllContainerLogAggregationPolicy.java NM should provide a way for AM to tell it not to aggregate logs. Key: YARN-221 URL: https://issues.apache.org/jira/browse/YARN-221 Project: Hadoop YARN Issue Type: Sub-task Components: log-aggregation, nodemanager Reporter: Robert Joseph Evans Assignee: Ming Ma Fix For: 2.8.0 Attachments: YARN-221-6.patch, YARN-221-7.patch, YARN-221-8.patch, YARN-221-9.patch, YARN-221-addendum.1.patch, YARN-221-trunk-v1.patch, YARN-221-trunk-v2.patch, YARN-221-trunk-v3.patch, YARN-221-trunk-v4.patch, YARN-221-trunk-v5.patch The NodeManager should provide a way for an AM to tell it that either the logs should not be aggregated, that they should be aggregated with a high priority, or that they should be aggregated but with a lower priority. The AM should be able to do this in the ContainerLaunch context to provide a default value, but should also be able to update the value when the container is released. This would allow for the NM to not aggregate logs in some cases, and avoid connection to the NN at all. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-221) NM should provide a way for AM to tell it not to aggregate logs.
[ https://issues.apache.org/jira/browse/YARN-221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708734#comment-14708734 ] Hudson commented on YARN-221: - FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #289 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/289/]) YARN-221. Addendum patch to compilation issue which is caused by missing (xgong: rev b71c6006f579ac6f0755975a9b908b0062618b46) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AllContainerLogAggregationPolicy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/ContainerLogsRetentionPolicy.java NM should provide a way for AM to tell it not to aggregate logs. Key: YARN-221 URL: https://issues.apache.org/jira/browse/YARN-221 Project: Hadoop YARN Issue Type: Sub-task Components: log-aggregation, nodemanager Reporter: Robert Joseph Evans Assignee: Ming Ma Fix For: 2.8.0 Attachments: YARN-221-6.patch, YARN-221-7.patch, YARN-221-8.patch, YARN-221-9.patch, YARN-221-addendum.1.patch, YARN-221-trunk-v1.patch, YARN-221-trunk-v2.patch, YARN-221-trunk-v3.patch, YARN-221-trunk-v4.patch, YARN-221-trunk-v5.patch The NodeManager should provide a way for an AM to tell it that either the logs should not be aggregated, that they should be aggregated with a high priority, or that they should be aggregated but with a lower priority. The AM should be able to do this in the ContainerLaunch context to provide a default value, but should also be able to update the value when the container is released. This would allow for the NM to not aggregate logs in some cases, and avoid connection to the NN at all. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3717) Improve RM node labels web UI
[ https://issues.apache.org/jira/browse/YARN-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708816#comment-14708816 ] Hadoop QA commented on YARN-3717: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 24m 16s | Pre-patch trunk has 7 extant Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | javac | 7m 40s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 45s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | site | 3m 1s | Site still builds. | | {color:red}-1{color} | checkstyle | 2m 37s | The applied patch generated 3 new checkstyle issues (total was 16, now 18). | | {color:red}-1{color} | whitespace | 0m 5s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 32s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 7m 21s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | yarn tests | 0m 23s | Tests passed in hadoop-yarn-api. | | {color:red}-1{color} | yarn tests | 3m 23s | Tests failed in hadoop-yarn-client. | | {color:red}-1{color} | yarn tests | 0m 19s | Tests failed in hadoop-yarn-common. | | {color:red}-1{color} | yarn tests | 0m 13s | Tests failed in hadoop-yarn-server-applicationhistoryservice. | | {color:red}-1{color} | yarn tests | 0m 14s | Tests failed in hadoop-yarn-server-common. | | {color:red}-1{color} | yarn tests | 0m 20s | Tests failed in hadoop-yarn-server-resourcemanager. 
| | | | 63m 2s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.client.TestResourceTrackerOnHA | | | hadoop.yarn.client.api.impl.TestYarnClient | | | hadoop.yarn.client.api.impl.TestAMRMClient | | | hadoop.yarn.client.api.impl.TestNMClient | | | hadoop.yarn.client.TestApplicationMasterServiceProtocolOnHA | | | hadoop.yarn.client.cli.TestYarnCLI | | Timed out tests | org.apache.hadoop.yarn.client.TestRMFailover | | Failed build | hadoop-yarn-common | | | hadoop-yarn-server-applicationhistoryservice | | | hadoop-yarn-server-common | | | hadoop-yarn-server-resourcemanager | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12751920/YARN-3717.20150822-1.patch | | Optional Tests | javadoc javac unit findbugs checkstyle site | | git revision | trunk / b71c600 | | Pre-patch Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/8895/artifact/patchprocess/trunkFindbugsWarningshadoop-yarn-server-common.html | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/8895/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/8895/artifact/patchprocess/whitespace.txt | | hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-YARN-Build/8895/artifact/patchprocess/testrun_hadoop-yarn-api.txt | | hadoop-yarn-client test log | https://builds.apache.org/job/PreCommit-YARN-Build/8895/artifact/patchprocess/testrun_hadoop-yarn-client.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8895/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | hadoop-yarn-server-applicationhistoryservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/8895/artifact/patchprocess/testrun_hadoop-yarn-server-applicationhistoryservice.txt | | hadoop-yarn-server-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/8895/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8895/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8895/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8895/console | This message was automatically generated. Improve RM node labels web UI - Key: YARN-3717 URL: https://issues.apache.org/jira/browse/YARN-3717 Project: Hadoop YARN
[jira] [Commented] (YARN-3896) RMNode transitioned from RUNNING to REBOOTED because its response id had not been reset
[ https://issues.apache.org/jira/browse/YARN-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708800#comment-14708800 ] Hadoop QA commented on YARN-3896: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 16m 53s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | javac | 7m 43s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 10s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 8s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 28s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 16s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | tools/hadoop tests | 0m 52s | Tests passed in hadoop-sls. | | {color:red}-1{color} | yarn tests | 51m 13s | Tests failed in hadoop-yarn-server-resourcemanager. | | | | 92m 42s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12751826/YARN-3896.07.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / b71c600 | | hadoop-sls test log | https://builds.apache.org/job/PreCommit-YARN-Build/8894/artifact/patchprocess/testrun_hadoop-sls.txt | | hadoop-yarn-server-resourcemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/8894/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8894/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8894/console | This message was automatically generated. 
RMNode transitioned from RUNNING to REBOOTED because its response id had not been reset --- Key: YARN-3896 URL: https://issues.apache.org/jira/browse/YARN-3896 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Jun Gong Assignee: Jun Gong Attachments: 0001-YARN-3896.patch, YARN-3896.01.patch, YARN-3896.02.patch, YARN-3896.03.patch, YARN-3896.04.patch, YARN-3896.05.patch, YARN-3896.06.patch, YARN-3896.07.patch
{noformat}
2015-07-03 16:49:39,075 INFO org.apache.hadoop.yarn.util.RackResolver: Resolved 10.208.132.153 to /default-rack
2015-07-03 16:49:39,075 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: Reconnect from the node at: 10.208.132.153
2015-07-03 16:49:39,075 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: NodeManager from node 10.208.132.153(cmPort: 8041 httpPort: 8080) registered with capability: memory:6144, vCores:60, diskCapacity:213, assigned nodeId 10.208.132.153:8041
2015-07-03 16:49:39,104 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: Too far behind rm response id:2506413 nm response id:0
2015-07-03 16:49:39,137 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating Node 10.208.132.153:8041 as it is now REBOOTED
2015-07-03 16:49:39,137 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: 10.208.132.153:8041 Node Transitioned from RUNNING to REBOOTED
{noformat}
The node (10.208.132.153) reconnected with the RM. When it re-registered, the RM reset its lastNodeHeartbeatResponse id to 0 asynchronously, but the node's next heartbeat arrived before the RM had finished resetting the id to 0.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
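To make the race above concrete, here is a minimal, self-contained sketch of the response-id bookkeeping involved. It is not the real ResourceTrackerService/RMNodeImpl code; the class and method names are illustrative only, and the reconnect reset is shown as a separate call to mirror the asynchronous reset described in this issue.
{code}
// Illustrative sketch only -- a simplified stand-in for the RM-side heartbeat
// bookkeeping, not the actual YARN ResourceTrackerService implementation.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class HeartbeatResponseIdSketch {

  // Last response id the RM believes it has sent to each node.
  private final Map<String, Integer> lastResponseId = new ConcurrentHashMap<>();

  /** Called when a node re-registers. In the buggy flow this reset is applied
   *  asynchronously (via an event), so a heartbeat can arrive before it. */
  public void onReconnect(String nodeId) {
    lastResponseId.put(nodeId, 0);
  }

  /** Heartbeat handling: a node whose id is far behind the RM's is told to
   *  resync, which is the RUNNING -> REBOOTED transition seen in the log. */
  public String onHeartbeat(String nodeId, int nmResponseId) {
    int rmResponseId = lastResponseId.getOrDefault(nodeId, 0);
    if (nmResponseId + 1 == rmResponseId) {
      return "DUPLICATE";   // same heartbeat resent; resend last response
    } else if (nmResponseId + 1 < rmResponseId) {
      // "Too far behind rm response id:<rm> nm response id:<nm>"
      return "RESYNC";      // node is deactivated as REBOOTED
    }
    int next = rmResponseId + 1;
    lastResponseId.put(nodeId, next);
    return "OK(" + next + ")";
  }
}
{code}
With this bookkeeping, a heartbeat carrying id 0 that reaches the RM while the RM still holds the old id (2506413 in the log above) falls into the "too far behind" branch, so the reconnect reset has to take effect before heartbeats are served again for the reconnect to be safe.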
[jira] [Commented] (YARN-3896) RMNode transitioned from RUNNING to REBOOTED because its response id had not been reset
[ https://issues.apache.org/jira/browse/YARN-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708810#comment-14708810 ] Rohith Sharma K S commented on YARN-3896: - Test failures are unrelated to the patch; committing shortly.
RMNode transitioned from RUNNING to REBOOTED because its response id had not been reset --- Key: YARN-3896 URL: https://issues.apache.org/jira/browse/YARN-3896 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Jun Gong Assignee: Jun Gong Attachments: 0001-YARN-3896.patch, YARN-3896.01.patch, YARN-3896.02.patch, YARN-3896.03.patch, YARN-3896.04.patch, YARN-3896.05.patch, YARN-3896.06.patch, YARN-3896.07.patch
{noformat}
2015-07-03 16:49:39,075 INFO org.apache.hadoop.yarn.util.RackResolver: Resolved 10.208.132.153 to /default-rack
2015-07-03 16:49:39,075 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: Reconnect from the node at: 10.208.132.153
2015-07-03 16:49:39,075 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: NodeManager from node 10.208.132.153(cmPort: 8041 httpPort: 8080) registered with capability: memory:6144, vCores:60, diskCapacity:213, assigned nodeId 10.208.132.153:8041
2015-07-03 16:49:39,104 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: Too far behind rm response id:2506413 nm response id:0
2015-07-03 16:49:39,137 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating Node 10.208.132.153:8041 as it is now REBOOTED
2015-07-03 16:49:39,137 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: 10.208.132.153:8041 Node Transitioned from RUNNING to REBOOTED
{noformat}
The node (10.208.132.153) reconnected with the RM. When it re-registered, the RM reset its lastNodeHeartbeatResponse id to 0 asynchronously, but the node's next heartbeat arrived before the RM had finished resetting the id to 0.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-221) NM should provide a way for AM to tell it not to aggregate logs.
[ https://issues.apache.org/jira/browse/YARN-221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708694#comment-14708694 ] Hudson commented on YARN-221: - SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #297 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/297/]) YARN-221. Addendum patch to compilation issue which is caused by missing (xgong: rev b71c6006f579ac6f0755975a9b908b0062618b46)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AllContainerLogAggregationPolicy.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/ContainerLogsRetentionPolicy.java
NM should provide a way for AM to tell it not to aggregate logs. Key: YARN-221 URL: https://issues.apache.org/jira/browse/YARN-221 Project: Hadoop YARN Issue Type: Sub-task Components: log-aggregation, nodemanager Reporter: Robert Joseph Evans Assignee: Ming Ma Fix For: 2.8.0 Attachments: YARN-221-6.patch, YARN-221-7.patch, YARN-221-8.patch, YARN-221-9.patch, YARN-221-addendum.1.patch, YARN-221-trunk-v1.patch, YARN-221-trunk-v2.patch, YARN-221-trunk-v3.patch, YARN-221-trunk-v4.patch, YARN-221-trunk-v5.patch The NodeManager should provide a way for an AM to tell it that the logs should not be aggregated, that they should be aggregated with a high priority, or that they should be aggregated but with a lower priority. The AM should be able to do this in the ContainerLaunchContext to provide a default value, but should also be able to update the value when the container is released. This would allow the NM to skip log aggregation in some cases and avoid connecting to the NN at all.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
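For context on the AM-facing side of the feature described above, here is a hedged sketch of how an application might express a per-application log aggregation policy at submission time. LogAggregationContext and ApplicationSubmissionContext are existing YARN records, but the policy-class setter and the AllContainerLogAggregationPolicy class name are assumed from this JIRA's commit (see the changed files listed above); verify the exact API against the committed 2.8.0 sources before relying on it.
{code}
// Hedged sketch: AM-side configuration of a log aggregation policy.
// The setLogAggregationPolicyClassName setter is assumed to be the one
// added by YARN-221; double-check the committed API.
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.LogAggregationContext;

public class LogAggregationPolicyExample {

  static void configure(ApplicationSubmissionContext appContext) {
    // Existing API: which log files to include/exclude from aggregation.
    LogAggregationContext logCtx =
        LogAggregationContext.newInstance("stdout|stderr", "");

    // Assumed setter from YARN-221: which containers' logs the NM aggregates
    // (all containers, only failed/killed ones, none, ...).
    logCtx.setLogAggregationPolicyClassName(
        "org.apache.hadoop.yarn.server.nodemanager.containermanager."
        + "logaggregation.AllContainerLogAggregationPolicy");

    appContext.setLogAggregationContext(logCtx);
  }
}
{code}
The key design point is that the policy travels with the application submission, so the NM can decide per container whether to upload logs at all, rather than applying one cluster-wide rule.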
[jira] [Commented] (YARN-221) NM should provide a way for AM to tell it not to aggregate logs.
[ https://issues.apache.org/jira/browse/YARN-221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708743#comment-14708743 ] Hudson commented on YARN-221: - FAILURE: Integrated in Hadoop-Hdfs-trunk #2227 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2227/]) YARN-221. Addendum patch to compilation issue which is caused by missing (xgong: rev b71c6006f579ac6f0755975a9b908b0062618b46)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/ContainerLogsRetentionPolicy.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AllContainerLogAggregationPolicy.java
NM should provide a way for AM to tell it not to aggregate logs. Key: YARN-221 URL: https://issues.apache.org/jira/browse/YARN-221 Project: Hadoop YARN Issue Type: Sub-task Components: log-aggregation, nodemanager Reporter: Robert Joseph Evans Assignee: Ming Ma Fix For: 2.8.0 Attachments: YARN-221-6.patch, YARN-221-7.patch, YARN-221-8.patch, YARN-221-9.patch, YARN-221-addendum.1.patch, YARN-221-trunk-v1.patch, YARN-221-trunk-v2.patch, YARN-221-trunk-v3.patch, YARN-221-trunk-v4.patch, YARN-221-trunk-v5.patch The NodeManager should provide a way for an AM to tell it that the logs should not be aggregated, that they should be aggregated with a high priority, or that they should be aggregated but with a lower priority. The AM should be able to do this in the ContainerLaunchContext to provide a default value, but should also be able to update the value when the container is released. This would allow the NM to skip log aggregation in some cases and avoid connecting to the NN at all.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-221) NM should provide a way for AM to tell it not to aggregate logs.
[ https://issues.apache.org/jira/browse/YARN-221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708716#comment-14708716 ] Hudson commented on YARN-221: - SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2246 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2246/]) YARN-221. Addendum patch to compilation issue which is caused by missing (xgong: rev b71c6006f579ac6f0755975a9b908b0062618b46)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/ContainerLogsRetentionPolicy.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AllContainerLogAggregationPolicy.java
NM should provide a way for AM to tell it not to aggregate logs. Key: YARN-221 URL: https://issues.apache.org/jira/browse/YARN-221 Project: Hadoop YARN Issue Type: Sub-task Components: log-aggregation, nodemanager Reporter: Robert Joseph Evans Assignee: Ming Ma Fix For: 2.8.0 Attachments: YARN-221-6.patch, YARN-221-7.patch, YARN-221-8.patch, YARN-221-9.patch, YARN-221-addendum.1.patch, YARN-221-trunk-v1.patch, YARN-221-trunk-v2.patch, YARN-221-trunk-v3.patch, YARN-221-trunk-v4.patch, YARN-221-trunk-v5.patch The NodeManager should provide a way for an AM to tell it that the logs should not be aggregated, that they should be aggregated with a high priority, or that they should be aggregated but with a lower priority. The AM should be able to do this in the ContainerLaunchContext to provide a default value, but should also be able to update the value when the container is released. This would allow the NM to skip log aggregation in some cases and avoid connecting to the NN at all.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
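On the NM side, the behaviour requested above ("do not aggregate logs in some cases") reduces to a per-container yes/no decision. The interface below is a hypothetical simplification written only to show the shape of that decision; it is not the committed YARN policy API, and the names are invented for illustration.
{code}
// Hypothetical, simplified NM-side policy -- illustrative only, not the
// committed YARN interface. It captures the per-container decision an
// NM log aggregation policy has to make.
public interface ContainerLogPolicySketch {
  /** @return true if this container's logs should be uploaded to HDFS. */
  boolean shouldAggregate(int exitCode, boolean amContainer);
}

/** Example policy: aggregate only AM containers and failed containers, so
 *  logs of successful worker containers never reach the NN/HDFS at all. */
class FailedOrAmOnlyPolicy implements ContainerLogPolicySketch {
  @Override
  public boolean shouldAggregate(int exitCode, boolean amContainer) {
    return amContainer || exitCode != 0;
  }
}
{code}
A policy like this is what lets the NM skip the NN connection entirely when nothing in an application qualifies for aggregation.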