[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

2015-08-23 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708470#comment-14708470
 ] 

Rohith Sharma K S commented on YARN-3893:
-

I took a closer look at both of the solutions above. The potential issues with 
each are listed here (a simplified sketch of the current flow follows this list):
# Moving createAndInitService to just before starting activeServices in 
transitionToActive. 
## Switch time will be impacted, since every transitionToActive would then 
initialize the active services.
## RMWebApp has a dependency on ClientRMService for starting the web apps; 
without ClientRMService being initialized, RMWebApp cannot be started. 
# Moving refreshAll before transitionToActive in AdminService is the same as 
triggering RMAdminCLI on a standby node. That call throws a StandbyException and 
is retried against the active RM by RMAdminCLI. So in 
AdminService#transitionToActive(), refreshing before 
{{rm.transitionedToActive}} throws a StandbyException. 
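For reference, a simplified sketch of the ordering being discussed. This is not the 
actual AdminService source; it only assumes the current flow starts the active 
services via {{rm.transitionToActive()}} and then runs {{refreshAll()}}:

{code}
// Simplified, hypothetical sketch of AdminService#transitionToActive.
// Option 2 above would swap the two calls, but refreshAll() executed while the
// RM is still standby would itself fail with a StandbyException.
public synchronized void transitionToActive(StateChangeRequestInfo reqInfo)
    throws IOException {
  // ... access checks and HA-state validation elided ...
  rm.transitionToActive();   // start active services first
  try {
    refreshAll();            // then refresh queues, ACLs, user-group mappings, etc.
  } catch (Exception e) {
    // As reported in this JIRA, a failure here can leave both RMs active.
    throw new IOException("refreshAll failed during transition to active", e);
  }
}
{code}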



 Both RM in active state when Admin#transitionToActive failure from refeshAll()
 --

 Key: YARN-3893
 URL: https://issues.apache.org/jira/browse/YARN-3893
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.7.1
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical
 Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
 0003-YARN-3893.patch, 0004-YARN-3893.patch, yarn-site.xml


 Cases that can cause this.
 # Capacity scheduler xml is wrongly configured during switch
 # Refresh ACL failure due to configuration
 # Refresh User group failure due to configuration
 Continuously both RM will try to be active
 {code}
 dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm1
 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm2
 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 {code}
 # Both Web UI active
 # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

2015-08-23 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708471#comment-14708471
 ] 

Rohith Sharma K S commented on YARN-3893:
-

I think that for any configuration issue hit while transitioning to active, AdminService 
should not allow the JVM to continue, because if AdminService throws the exception 
back to the elector, the elector tries to make the RM active again, which loops 
forever and fills the logs. 
There are two calls that can be the point of failure: first 
{{rm.transitionedToActive}}, and second {{refreshAll()}}. 
# If {{rm.transitionedToActive}} fails, the RM services are stopped and the RM 
ends up in STANDBY state.
# If {{refreshAll()}} fails, BOTH RMs end up in ACTIVE state, as per this 
defect. Continuing the RM services with an invalid configuration is not a good idea; 
moreover, an invalid configuration should be reported to the user immediately. So it 
would be better to use a fail-fast configuration to exit the RM JVM. If that 
configuration is set to false, then call {{rm.handleTransitionToStandBy}} (see the sketch below).
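A minimal sketch of this proposal, not the committed patch: the fail-fast flag name 
and the surrounding fields ({{rm}}, {{LOG}}) are assumed context, and 
{{rm.handleTransitionToStandBy}} is the hook named above:

{code}
// Hypothetical handling of a refreshAll() failure inside transitionToActive.
try {
  refreshAll();
} catch (Exception e) {
  LOG.error("refreshAll() failed while transitioning to active", e);
  if (failFastEnabled) {
    // Invalid configuration: report it immediately and stop the JVM
    // (org.apache.hadoop.util.ExitUtil).
    ExitUtil.terminate(1, e);
  } else {
    // Otherwise fall back to standby instead of staying (wrongly) active.
    rm.handleTransitionToStandBy();
    throw new IOException("refreshAll failed during transition to active", e);
  }
}
{code}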

 Both RM in active state when Admin#transitionToActive failure from refeshAll()
 --

 Key: YARN-3893
 URL: https://issues.apache.org/jira/browse/YARN-3893
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.7.1
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical
 Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
 0003-YARN-3893.patch, 0004-YARN-3893.patch, yarn-site.xml


 Cases that can cause this.
 # Capacity scheduler xml is wrongly configured during switch
 # Refresh ACL failure due to configuration
 # Refresh User group failure due to configuration
 Continuously both RM will try to be active
 {code}
 dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm1
 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm2
 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 {code}
 # Both Web UI active
 # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4068) Support appUpdated event in TimelineV2 to publish details for movetoqueue, change in priority

2015-08-23 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708462#comment-14708462
 ] 

Naganarasimha G R commented on YARN-4068:
-

Thanks [~sunilg] for holding.
Currently it is not completely decided; it might be an existing JIRA or a new 
one. Once confirmed, I will update you about it.


 Support appUpdated event in TimelineV2 to publish details for movetoqueue, 
 change in priority
 -

 Key: YARN-4068
 URL: https://issues.apache.org/jira/browse/YARN-4068
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sunil G
Assignee: Sunil G

 YARN-4044 supports appUpdated event changes to TimelineV1. This jira is to 
 track and port appUpdated changes in V2 for
 - movetoqueue
 - updateAppPriority



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

2015-08-23 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708425#comment-14708425
 ] 

Bibin A Chundatt commented on YARN-3893:


Hi [~rohithsharma], 
Any comments on this? 


 Both RM in active state when Admin#transitionToActive failure from refeshAll()
 --

 Key: YARN-3893
 URL: https://issues.apache.org/jira/browse/YARN-3893
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.7.1
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical
 Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
 0003-YARN-3893.patch, 0004-YARN-3893.patch, yarn-site.xml


 Cases that can cause this.
 # Capacity scheduler xml is wrongly configured during switch
 # Refresh ACL failure due to configuration
 # Refresh User group failure due to configuration
 Continuously both RM will try to be active
 {code}
 dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm1
 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm2
 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 {code}
 # Both Web UI active
 # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3717) Improve RM node labels web UI

2015-08-23 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-3717:

Attachment: YARN-3717.20150822-1.patch

Attaching an initial patch with the REST and CLI changes along with the web page changes.

 Improve RM node labels web UI
 -

 Key: YARN-3717
 URL: https://issues.apache.org/jira/browse/YARN-3717
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Naganarasimha G R
Assignee: Naganarasimha G R
 Attachments: RMLogsForHungJob.log, YARN-3717.20150822-1.patch


 1 Add the default-node-Label expression for each queue in scheduler page.
 2 In Application/Appattempt page  show the app configured node label 
 expression for AM and Job



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4071) After YARN-221, ContainerLogsRetentionPolicy.java still remains in yarn-common causing compilation error

2015-08-23 Thread Sunil G (JIRA)
Sunil G created YARN-4071:
-

 Summary: After YARN-221, ContainerLogsRetentionPolicy.java still 
remains in yarn-common causing compilation error
 Key: YARN-4071
 URL: https://issues.apache.org/jira/browse/YARN-4071
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Sunil G
Assignee: Sunil G
Priority: Critical


{{ContainerLogsRetentionPolicy.java}} was to be moved to 
{{org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation}} 
and renamed to {{AllContainerLogAggregationPolicy.java}} after YARN-221.

However, the file still remains in yarn-common and is causing the build to break.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4071) After YARN-221, ContainerLogsRetentionPolicy.java still remains in yarn-common causing compilation error

2015-08-23 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-4071:
--
Attachment: 0001-YARN-4071.patch

Attaching a patch. 
cc/[~mingma] and [~xgong]

 After YARN-221, ContainerLogsRetentionPolicy.java still remains in 
 yarn-common causing compilation error
 

 Key: YARN-4071
 URL: https://issues.apache.org/jira/browse/YARN-4071
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Sunil G
Assignee: Sunil G
Priority: Critical
 Attachments: 0001-YARN-4071.patch


 {{ContainerLogsRetentionPolicy.java}} to be moved to 
 {{org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation}} 
 and to be renamed as {{AllContainerLogAggregationPolicy.java}} after YARN-221
 Now this file still remains in the yarn-common and causing build to break.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3697) FairScheduler: ContinuousSchedulingThread can't be shutdown after stop sometimes.

2015-08-23 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-3697:

Attachment: YARN-3697.001.patch

 FairScheduler: ContinuousSchedulingThread can't be shutdown after stop 
 sometimes. 
 --

 Key: YARN-3697
 URL: https://issues.apache.org/jira/browse/YARN-3697
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.7.0
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Critical
 Attachments: YARN-3697.000.patch, YARN-3697.001.patch


 FairScheduler: ContinuousSchedulingThread can't be shutdown after stop 
 sometimes. 
 The reason is because the InterruptedException is blocked in 
 continuousSchedulingAttempt
 {code}
 try {
   if (node != null && Resources.fitsIn(minimumAllocation,
       node.getAvailableResource())) {
     attemptScheduling(node);
   }
 } catch (Throwable ex) {
   LOG.error("Error while attempting scheduling for node " + node +
       ": " + ex.toString(), ex);
 }
 {code}
 I saw the following exception after stop:
 {code}
 2015-05-17 23:30:43,065 WARN  [FairSchedulerContinuousScheduling] 
 event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher 
 thread interrupted
 java.lang.InterruptedException
   at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)
   at 
 java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
   at 
 java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$ContainerStartedTransition.transition(RMContainerImpl.java:467)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$ContainerStartedTransition.transition(RMContainerImpl.java:462)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:387)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:58)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.allocate(FSAppAttempt.java:357)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:516)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:649)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:803)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.assignContainer(FSLeafQueue.java:334)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java:173)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.attemptScheduling(FairScheduler.java:1082)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousSchedulingAttempt(FairScheduler.java:1014)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$ContinuousSchedulingThread.run(FairScheduler.java:285)
 2015-05-17 23:30:43,066 ERROR [FairSchedulerContinuousScheduling] 
 fair.FairScheduler (FairScheduler.java:continuousSchedulingAttempt(1017)) - 
 Error while attempting scheduling for node host: 127.0.0.2:2 #containers=1 
 available=<memory:7168, vCores:7> used=<memory:1024, vCores:1>: 
 org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
 java.lang.InterruptedException
 org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
 java.lang.InterruptedException
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:249)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$ContainerStartedTransition.transition(RMContainerImpl.java:467)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$ContainerStartedTransition.transition(RMContainerImpl.java:462)
   at 
 

[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

2015-08-23 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708491#comment-14708491
 ] 

Varun Saxena commented on YARN-3893:


Using fail fast makes sense.

 Both RM in active state when Admin#transitionToActive failure from refeshAll()
 --

 Key: YARN-3893
 URL: https://issues.apache.org/jira/browse/YARN-3893
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.7.1
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical
 Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
 0003-YARN-3893.patch, 0004-YARN-3893.patch, yarn-site.xml


 Cases that can cause this.
 # Capacity scheduler xml is wrongly configured during switch
 # Refresh ACL failure due to configuration
 # Refresh User group failure due to configuration
 Continuously both RM will try to be active
 {code}
 dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm1
 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm2
 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 {code}
 # Both Web UI active
 # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4058) Miscellaneous issues in NodeManager project

2015-08-23 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708467#comment-14708467
 ] 

Naganarasimha G R commented on YARN-4058:
-

Thanks [~sjlee0] for the review comments. 
Yes, it makes sense to move the ApplicationACLsManager-related modifications in 
trunk itself, as it might involve more changes if we touch the constructor. 
bq. This style (having values like null first and then the expression) is a 
carry-over from the C-style defensive coding, and is no longer needed in java.
Ok, will take care of it in the next patch.
bq. Then the timeline client must be stopped properly as part of lifecycle 
management.
True, we need to handle this too; will take care of it in YARN-3367. As for where 
to do it: my initial thought was to do it when 
ApplicationImpl (the NM class) reaches {{ApplicationState.FINISH_APPLICATION}}.
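A rough, purely illustrative sketch of that idea; the transition class and the 
{{timelineClient}} field are assumptions based on this comment, not the eventual patch:

{code}
// Hypothetical transition in the NM ApplicationImpl state machine: once the
// application reaches its finish state, stop the per-application timeline
// client so its threads and connections are released.
static class StopTimelineClientTransition implements
    SingleArcTransition<ApplicationImpl, ApplicationEvent> {
  @Override
  public void transition(ApplicationImpl app, ApplicationEvent event) {
    if (app.timelineClient != null) {
      // TimelineClient is a Hadoop service, so stop() releases its resources.
      app.timelineClient.stop();
    }
  }
}
{code}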

 Miscellaneous issues in NodeManager project
 ---

 Key: YARN-4058
 URL: https://issues.apache.org/jira/browse/YARN-4058
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Naganarasimha G R
Assignee: Naganarasimha G R
Priority: Minor
 Attachments: YARN-4058.YARN-2928.001.patch


 # TestSystemMetricsPublisherForV2.testPublishApplicationMetrics is failing 
 # Unused ApplicationACLsManager in ContainerManagerImpl
 # In ContainerManagerImpl.startContainerInternal ApplicationImpl instance is 
 created and then checked whether it exists in context.getApplications(). 
 everytime ApplicationImpl is created state machine is intialized and 
 TimelineClient is created which is required only if added to the context.
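A purely illustrative sketch of the check-before-create ordering suggested by item 3 
above; {{createApplicationImpl}} is a hypothetical factory standing in for the real 
constructor call:

{code}
// Look up first, and only construct the ApplicationImpl -- which initializes a
// state machine and a timeline client -- when the app is not already tracked.
Application app = context.getApplications().get(applicationId);
if (app == null) {
  Application newApp = createApplicationImpl(applicationId);  // hypothetical factory
  Application existing = context.getApplications().putIfAbsent(applicationId, newApp);
  app = (existing == null) ? newApp : existing;
}
{code}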



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3697) FairScheduler: ContinuousSchedulingThread can't be shutdown after stop sometimes.

2015-08-23 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-3697:

Attachment: YARN-3697.001.patch

 FairScheduler: ContinuousSchedulingThread can't be shutdown after stop 
 sometimes. 
 --

 Key: YARN-3697
 URL: https://issues.apache.org/jira/browse/YARN-3697
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.7.0
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Critical
 Attachments: YARN-3697.000.patch, YARN-3697.001.patch


 FairScheduler: ContinuousSchedulingThread can't be shutdown after stop 
 sometimes. 
 The reason is because the InterruptedException is blocked in 
 continuousSchedulingAttempt
 {code}
 try {
   if (node != null && Resources.fitsIn(minimumAllocation,
       node.getAvailableResource())) {
     attemptScheduling(node);
   }
 } catch (Throwable ex) {
   LOG.error("Error while attempting scheduling for node " + node +
       ": " + ex.toString(), ex);
 }
 {code}
 I saw the following exception after stop:
 {code}
 2015-05-17 23:30:43,065 WARN  [FairSchedulerContinuousScheduling] 
 event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher 
 thread interrupted
 java.lang.InterruptedException
   at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)
   at 
 java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
   at 
 java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$ContainerStartedTransition.transition(RMContainerImpl.java:467)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$ContainerStartedTransition.transition(RMContainerImpl.java:462)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:387)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:58)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.allocate(FSAppAttempt.java:357)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:516)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:649)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:803)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.assignContainer(FSLeafQueue.java:334)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java:173)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.attemptScheduling(FairScheduler.java:1082)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousSchedulingAttempt(FairScheduler.java:1014)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$ContinuousSchedulingThread.run(FairScheduler.java:285)
 2015-05-17 23:30:43,066 ERROR [FairSchedulerContinuousScheduling] 
 fair.FairScheduler (FairScheduler.java:continuousSchedulingAttempt(1017)) - 
 Error while attempting scheduling for node host: 127.0.0.2:2 #containers=1 
 available=<memory:7168, vCores:7> used=<memory:1024, vCores:1>: 
 org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
 java.lang.InterruptedException
 org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
 java.lang.InterruptedException
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:249)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$ContainerStartedTransition.transition(RMContainerImpl.java:467)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$ContainerStartedTransition.transition(RMContainerImpl.java:462)
   at 
 

[jira] [Updated] (YARN-3697) FairScheduler: ContinuousSchedulingThread can't be shutdown after stop sometimes.

2015-08-23 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-3697:

Attachment: (was: YARN-3697.001.patch)

 FairScheduler: ContinuousSchedulingThread can't be shutdown after stop 
 sometimes. 
 --

 Key: YARN-3697
 URL: https://issues.apache.org/jira/browse/YARN-3697
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.7.0
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Critical
 Attachments: YARN-3697.000.patch, YARN-3697.001.patch


 FairScheduler: ContinuousSchedulingThread can't be shutdown after stop 
 sometimes. 
 The reason is because the InterruptedException is blocked in 
 continuousSchedulingAttempt
 {code}
 try {
   if (node != null && Resources.fitsIn(minimumAllocation,
       node.getAvailableResource())) {
     attemptScheduling(node);
   }
 } catch (Throwable ex) {
   LOG.error("Error while attempting scheduling for node " + node +
       ": " + ex.toString(), ex);
 }
 {code}
 I saw the following exception after stop:
 {code}
 2015-05-17 23:30:43,065 WARN  [FairSchedulerContinuousScheduling] 
 event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher 
 thread interrupted
 java.lang.InterruptedException
   at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)
   at 
 java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
   at 
 java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$ContainerStartedTransition.transition(RMContainerImpl.java:467)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$ContainerStartedTransition.transition(RMContainerImpl.java:462)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:387)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:58)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.allocate(FSAppAttempt.java:357)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:516)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:649)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:803)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.assignContainer(FSLeafQueue.java:334)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java:173)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.attemptScheduling(FairScheduler.java:1082)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousSchedulingAttempt(FairScheduler.java:1014)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$ContinuousSchedulingThread.run(FairScheduler.java:285)
 2015-05-17 23:30:43,066 ERROR [FairSchedulerContinuousScheduling] 
 fair.FairScheduler (FairScheduler.java:continuousSchedulingAttempt(1017)) - 
 Error while attempting scheduling for node host: 127.0.0.2:2 #containers=1 
 available=<memory:7168, vCores:7> used=<memory:1024, vCores:1>: 
 org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
 java.lang.InterruptedException
 org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
 java.lang.InterruptedException
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:249)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$ContainerStartedTransition.transition(RMContainerImpl.java:467)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$ContainerStartedTransition.transition(RMContainerImpl.java:462)
   at 
 

[jira] [Commented] (YARN-3697) FairScheduler: ContinuousSchedulingThread can't be shutdown after stop sometimes.

2015-08-23 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708310#comment-14708310
 ] 

zhihai xu commented on YARN-3697:
-

Hi [~kasha], thanks for the review! That is a good suggestion. I attached a new 
patch, YARN-3697.001.patch, which addresses your comments and adds two tests. Please 
review it. Thanks again!
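For readers following the discussion, one way to keep the shutdown path working is to 
stop swallowing the interrupt in the scheduling loop. This is a minimal sketch, not 
necessarily what the attached patch does ({{continuousSchedulingAttempt()}} is from the 
code quoted below; the sleep getter is an assumed helper):

{code}
// Sketch: let the interrupt end the loop instead of being swallowed by the
// catch (Throwable) inside continuousSchedulingAttempt.
while (!Thread.currentThread().isInterrupted()) {
  try {
    continuousSchedulingAttempt();
    Thread.sleep(getContinuousSchedulingSleepMs());
  } catch (InterruptedException e) {
    // Restore the flag and exit so the thread can actually be joined on stop().
    Thread.currentThread().interrupt();
    break;
  }
}
{code}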

 FairScheduler: ContinuousSchedulingThread can't be shutdown after stop 
 sometimes. 
 --

 Key: YARN-3697
 URL: https://issues.apache.org/jira/browse/YARN-3697
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.7.0
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Critical
 Attachments: YARN-3697.000.patch, YARN-3697.001.patch


 FairScheduler: ContinuousSchedulingThread can't be shutdown after stop 
 sometimes. 
 The reason is because the InterruptedException is blocked in 
 continuousSchedulingAttempt
 {code}
 try {
   if (node != null && Resources.fitsIn(minimumAllocation,
       node.getAvailableResource())) {
     attemptScheduling(node);
   }
 } catch (Throwable ex) {
   LOG.error("Error while attempting scheduling for node " + node +
       ": " + ex.toString(), ex);
 }
 {code}
 I saw the following exception after stop:
 {code}
 2015-05-17 23:30:43,065 WARN  [FairSchedulerContinuousScheduling] 
 event.AsyncDispatcher (AsyncDispatcher.java:handle(247)) - AsyncDispatcher 
 thread interrupted
 java.lang.InterruptedException
   at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)
   at 
 java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
   at 
 java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:244)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$ContainerStartedTransition.transition(RMContainerImpl.java:467)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl$ContainerStartedTransition.transition(RMContainerImpl.java:462)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:387)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl.handle(RMContainerImpl.java:58)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.allocate(FSAppAttempt.java:357)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:516)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:649)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.assignContainer(FSAppAttempt.java:803)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.assignContainer(FSLeafQueue.java:334)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java:173)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.attemptScheduling(FairScheduler.java:1082)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.continuousSchedulingAttempt(FairScheduler.java:1014)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$ContinuousSchedulingThread.run(FairScheduler.java:285)
 2015-05-17 23:30:43,066 ERROR [FairSchedulerContinuousScheduling] 
 fair.FairScheduler (FairScheduler.java:continuousSchedulingAttempt(1017)) - 
 Error while attempting scheduling for node host: 127.0.0.2:2 #containers=1 
 available=<memory:7168, vCores:7> used=<memory:1024, vCores:1>: 
 org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
 java.lang.InterruptedException
 org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
 java.lang.InterruptedException
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:249)
   at 
 

[jira] [Commented] (YARN-221) NM should provide a way for AM to tell it not to aggregate logs.

2015-08-23 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708318#comment-14708318
 ] 

Arun Suresh commented on YARN-221:
--

Looks like trunk does not compile correctly after this..

 NM should provide a way for AM to tell it not to aggregate logs.
 

 Key: YARN-221
 URL: https://issues.apache.org/jira/browse/YARN-221
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: log-aggregation, nodemanager
Reporter: Robert Joseph Evans
Assignee: Ming Ma
 Fix For: 2.8.0

 Attachments: YARN-221-6.patch, YARN-221-7.patch, YARN-221-8.patch, 
 YARN-221-9.patch, YARN-221-trunk-v1.patch, YARN-221-trunk-v2.patch, 
 YARN-221-trunk-v3.patch, YARN-221-trunk-v4.patch, YARN-221-trunk-v5.patch


 The NodeManager should provide a way for an AM to tell it that either the 
 logs should not be aggregated, that they should be aggregated with a high 
 priority, or that they should be aggregated but with a lower priority.  The 
 AM should be able to do this in the ContainerLaunch context to provide a 
 default value, but should also be able to update the value when the container 
 is released.
 This would allow for the NM to not aggregate logs in some cases, and avoid 
 connection to the NN at all.
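For context, a hedged usage sketch of the feature: the policy setter name below is an 
assumption based on this JIRA's attachments and may differ from the committed API.

{code}
// Hypothetical: a client selecting a per-app log aggregation policy through the
// LogAggregationContext carried in its ApplicationSubmissionContext.
LogAggregationContext logCtx = LogAggregationContext.newInstance(".*", null);
// Setter name assumed from this JIRA; verify against the committed patch.
logCtx.setLogAggregationPolicyClassName(
    "org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation"
        + ".AllContainerLogAggregationPolicy");
appSubmissionContext.setLogAggregationContext(logCtx);
{code}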



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4024) YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat

2015-08-23 Thread Hong Zhiguo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Zhiguo updated YARN-4024:
--
Attachment: YARN-4024-v7.patch

Thanks for your comments, [~adhoot], updated the patch.

 YARN RM should avoid unnecessary resolving IP when NMs doing heartbeat
 --

 Key: YARN-4024
 URL: https://issues.apache.org/jira/browse/YARN-4024
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Reporter: Wangda Tan
Assignee: Hong Zhiguo
 Attachments: YARN-4024-draft-v2.patch, YARN-4024-draft-v3.patch, 
 YARN-4024-draft.patch, YARN-4024-v4.patch, YARN-4024-v5.patch, 
 YARN-4024-v6.patch, YARN-4024-v7.patch


 Currently, YARN RM NodesListManager will resolve IP address every time when 
 node doing heartbeat. When DNS server becomes slow, NM heartbeat will be 
 blocked and cannot make progress.
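A minimal illustration of the kind of caching this patch is aimed at; this is a 
hypothetical helper, not the NodesListManager code itself (expiry and refresh of stale 
entries are omitted for brevity):

{code}
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical cache so that NM heartbeats do not hit DNS on every call.
class CachedHostResolver {
  private final ConcurrentHashMap<String, String> hostToIp = new ConcurrentHashMap<>();

  String resolve(String hostName) throws UnknownHostException {
    String ip = hostToIp.get(hostName);
    if (ip == null) {
      // Only the first lookup for a given host pays the DNS cost.
      ip = InetAddress.getByName(hostName).getHostAddress();
      hostToIp.putIfAbsent(hostName, ip);
    }
    return ip;
  }
}
{code}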



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-221) NM should provide a way for AM to tell it not to aggregate logs.

2015-08-23 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708582#comment-14708582
 ] 

Ming Ma commented on YARN-221:
--

+1 on the addendum patch.

 NM should provide a way for AM to tell it not to aggregate logs.
 

 Key: YARN-221
 URL: https://issues.apache.org/jira/browse/YARN-221
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: log-aggregation, nodemanager
Reporter: Robert Joseph Evans
Assignee: Ming Ma
 Fix For: 2.8.0

 Attachments: YARN-221-6.patch, YARN-221-7.patch, YARN-221-8.patch, 
 YARN-221-9.patch, YARN-221-addendum.1.patch, YARN-221-trunk-v1.patch, 
 YARN-221-trunk-v2.patch, YARN-221-trunk-v3.patch, YARN-221-trunk-v4.patch, 
 YARN-221-trunk-v5.patch


 The NodeManager should provide a way for an AM to tell it that either the 
 logs should not be aggregated, that they should be aggregated with a high 
 priority, or that they should be aggregated but with a lower priority.  The 
 AM should be able to do this in the ContainerLaunch context to provide a 
 default value, but should also be able to update the value when the container 
 is released.
 This would allow for the NM to not aggregate logs in some cases, and avoid 
 connection to the NN at all.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-221) NM should provide a way for AM to tell it not to aggregate logs.

2015-08-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708630#comment-14708630
 ] 

Hudson commented on YARN-221:
-

SUCCESS: Integrated in Hadoop-Yarn-trunk #1030 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/1030/])
YARN-221. Addendum patch to compilation issue which is caused by missing 
(xgong: rev b71c6006f579ac6f0755975a9b908b0062618b46)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AllContainerLogAggregationPolicy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/ContainerLogsRetentionPolicy.java


 NM should provide a way for AM to tell it not to aggregate logs.
 

 Key: YARN-221
 URL: https://issues.apache.org/jira/browse/YARN-221
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: log-aggregation, nodemanager
Reporter: Robert Joseph Evans
Assignee: Ming Ma
 Fix For: 2.8.0

 Attachments: YARN-221-6.patch, YARN-221-7.patch, YARN-221-8.patch, 
 YARN-221-9.patch, YARN-221-addendum.1.patch, YARN-221-trunk-v1.patch, 
 YARN-221-trunk-v2.patch, YARN-221-trunk-v3.patch, YARN-221-trunk-v4.patch, 
 YARN-221-trunk-v5.patch


 The NodeManager should provide a way for an AM to tell it that either the 
 logs should not be aggregated, that they should be aggregated with a high 
 priority, or that they should be aggregated but with a lower priority.  The 
 AM should be able to do this in the ContainerLaunch context to provide a 
 default value, but should also be able to update the value when the container 
 is released.
 This would allow for the NM to not aggregate logs in some cases, and avoid 
 connection to the NN at all.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4071) After YARN-221, ContainerLogsRetentionPolicy.java still remains in yarn-common causing compilation error

2015-08-23 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708628#comment-14708628
 ] 

Brahma Reddy Battula commented on YARN-4071:


[~sunilg], thanks for reporting this. [~xgong] provided YARN-221-addendum.1.patch 
and it has been committed, so we can now close this JIRA.



 After YARN-221, ContainerLogsRetentionPolicy.java still remains in 
 yarn-common causing compilation error
 

 Key: YARN-4071
 URL: https://issues.apache.org/jira/browse/YARN-4071
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Sunil G
Assignee: Sunil G
Priority: Critical
 Attachments: 0001-YARN-4071.patch


 {{ContainerLogsRetentionPolicy.java}} to be moved to 
 {{org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation}} 
 and to be renamed as {{AllContainerLogAggregationPolicy.java}} after YARN-221
 Now this file still remains in the yarn-common and causing build to break.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-221) NM should provide a way for AM to tell it not to aggregate logs.

2015-08-23 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708596#comment-14708596
 ] 

Xuan Gong commented on YARN-221:


Committed the addendum patch into trunk/branch-2. Thanks for the review, Ming

 NM should provide a way for AM to tell it not to aggregate logs.
 

 Key: YARN-221
 URL: https://issues.apache.org/jira/browse/YARN-221
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: log-aggregation, nodemanager
Reporter: Robert Joseph Evans
Assignee: Ming Ma
 Fix For: 2.8.0

 Attachments: YARN-221-6.patch, YARN-221-7.patch, YARN-221-8.patch, 
 YARN-221-9.patch, YARN-221-addendum.1.patch, YARN-221-trunk-v1.patch, 
 YARN-221-trunk-v2.patch, YARN-221-trunk-v3.patch, YARN-221-trunk-v4.patch, 
 YARN-221-trunk-v5.patch


 The NodeManager should provide a way for an AM to tell it that either the 
 logs should not be aggregated, that they should be aggregated with a high 
 priority, or that they should be aggregated but with a lower priority.  The 
 AM should be able to do this in the ContainerLaunch context to provide a 
 default value, but should also be able to update the value when the container 
 is released.
 This would allow for the NM to not aggregate logs in some cases, and avoid 
 connection to the NN at all.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (YARN-221) NM should provide a way for AM to tell it not to aggregate logs.

2015-08-23 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong reopened YARN-221:


 NM should provide a way for AM to tell it not to aggregate logs.
 

 Key: YARN-221
 URL: https://issues.apache.org/jira/browse/YARN-221
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: log-aggregation, nodemanager
Reporter: Robert Joseph Evans
Assignee: Ming Ma
 Fix For: 2.8.0

 Attachments: YARN-221-6.patch, YARN-221-7.patch, YARN-221-8.patch, 
 YARN-221-9.patch, YARN-221-trunk-v1.patch, YARN-221-trunk-v2.patch, 
 YARN-221-trunk-v3.patch, YARN-221-trunk-v4.patch, YARN-221-trunk-v5.patch


 The NodeManager should provide a way for an AM to tell it that either the 
 logs should not be aggregated, that they should be aggregated with a high 
 priority, or that they should be aggregated but with a lower priority.  The 
 AM should be able to do this in the ContainerLaunch context to provide a 
 default value, but should also be able to update the value when the container 
 is released.
 This would allow for the NM to not aggregate logs in some cases, and avoid 
 connection to the NN at all.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-221) NM should provide a way for AM to tell it not to aggregate logs.

2015-08-23 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708566#comment-14708566
 ] 

Xuan Gong commented on YARN-221:


Yes, reopening this and attaching an addendum patch to fix the compilation issue.

 NM should provide a way for AM to tell it not to aggregate logs.
 

 Key: YARN-221
 URL: https://issues.apache.org/jira/browse/YARN-221
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: log-aggregation, nodemanager
Reporter: Robert Joseph Evans
Assignee: Ming Ma
 Fix For: 2.8.0

 Attachments: YARN-221-6.patch, YARN-221-7.patch, YARN-221-8.patch, 
 YARN-221-9.patch, YARN-221-trunk-v1.patch, YARN-221-trunk-v2.patch, 
 YARN-221-trunk-v3.patch, YARN-221-trunk-v4.patch, YARN-221-trunk-v5.patch


 The NodeManager should provide a way for an AM to tell it that either the 
 logs should not be aggregated, that they should be aggregated with a high 
 priority, or that they should be aggregated but with a lower priority.  The 
 AM should be able to do this in the ContainerLaunch context to provide a 
 default value, but should also be able to update the value when the container 
 is released.
 This would allow for the NM to not aggregate logs in some cases, and avoid 
 connection to the NN at all.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-221) NM should provide a way for AM to tell it not to aggregate logs.

2015-08-23 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-221:
---
Attachment: YARN-221-addendum.1.patch

 NM should provide a way for AM to tell it not to aggregate logs.
 

 Key: YARN-221
 URL: https://issues.apache.org/jira/browse/YARN-221
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: log-aggregation, nodemanager
Reporter: Robert Joseph Evans
Assignee: Ming Ma
 Fix For: 2.8.0

 Attachments: YARN-221-6.patch, YARN-221-7.patch, YARN-221-8.patch, 
 YARN-221-9.patch, YARN-221-addendum.1.patch, YARN-221-trunk-v1.patch, 
 YARN-221-trunk-v2.patch, YARN-221-trunk-v3.patch, YARN-221-trunk-v4.patch, 
 YARN-221-trunk-v5.patch


 The NodeManager should provide a way for an AM to tell it that either the 
 logs should not be aggregated, that they should be aggregated with a high 
 priority, or that they should be aggregated but with a lower priority.  The 
 AM should be able to do this in the ContainerLaunch context to provide a 
 default value, but should also be able to update the value when the container 
 is released.
 This would allow for the NM to not aggregate logs in some cases, and avoid 
 connection to the NN at all.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-221) NM should provide a way for AM to tell it not to aggregate logs.

2015-08-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708594#comment-14708594
 ] 

Hudson commented on YARN-221:
-

FAILURE: Integrated in Hadoop-trunk-Commit #8341 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8341/])
YARN-221. Addendum patch to compilation issue which is caused by missing 
(xgong: rev b71c6006f579ac6f0755975a9b908b0062618b46)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/ContainerLogsRetentionPolicy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AllContainerLogAggregationPolicy.java


 NM should provide a way for AM to tell it not to aggregate logs.
 

 Key: YARN-221
 URL: https://issues.apache.org/jira/browse/YARN-221
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: log-aggregation, nodemanager
Reporter: Robert Joseph Evans
Assignee: Ming Ma
 Fix For: 2.8.0

 Attachments: YARN-221-6.patch, YARN-221-7.patch, YARN-221-8.patch, 
 YARN-221-9.patch, YARN-221-addendum.1.patch, YARN-221-trunk-v1.patch, 
 YARN-221-trunk-v2.patch, YARN-221-trunk-v3.patch, YARN-221-trunk-v4.patch, 
 YARN-221-trunk-v5.patch


 The NodeManager should provide a way for an AM to tell it that either the 
 logs should not be aggregated, that they should be aggregated with a high 
 priority, or that they should be aggregated but with a lower priority.  The 
 AM should be able to do this in the ContainerLaunch context to provide a 
 default value, but should also be able to update the value when the container 
 is released.
 This would allow for the NM to not aggregate logs in some cases, and avoid 
 connection to the NN at all.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-221) NM should provide a way for AM to tell it not to aggregate logs.

2015-08-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708660#comment-14708660
 ] 

Hudson commented on YARN-221:
-

SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #301 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/301/])
YARN-221. Addendum patch to compilation issue which is caused by missing 
(xgong: rev b71c6006f579ac6f0755975a9b908b0062618b46)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/ContainerLogsRetentionPolicy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AllContainerLogAggregationPolicy.java


 NM should provide a way for AM to tell it not to aggregate logs.
 

 Key: YARN-221
 URL: https://issues.apache.org/jira/browse/YARN-221
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: log-aggregation, nodemanager
Reporter: Robert Joseph Evans
Assignee: Ming Ma
 Fix For: 2.8.0

 Attachments: YARN-221-6.patch, YARN-221-7.patch, YARN-221-8.patch, 
 YARN-221-9.patch, YARN-221-addendum.1.patch, YARN-221-trunk-v1.patch, 
 YARN-221-trunk-v2.patch, YARN-221-trunk-v3.patch, YARN-221-trunk-v4.patch, 
 YARN-221-trunk-v5.patch


 The NodeManager should provide a way for an AM to tell it that either the 
 logs should not be aggregated, that they should be aggregated with a high 
 priority, or that they should be aggregated but with a lower priority.  The 
 AM should be able to do this in the ContainerLaunch context to provide a 
 default value, but should also be able to update the value when the container 
 is released.
 This would allow for the NM to not aggregate logs in some cases, and avoid 
 connection to the NN at all.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-221) NM should provide a way for AM to tell it not to aggregate logs.

2015-08-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708734#comment-14708734
 ] 

Hudson commented on YARN-221:
-

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #289 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/289/])
YARN-221. Addendum patch to compilation issue which is caused by missing 
(xgong: rev b71c6006f579ac6f0755975a9b908b0062618b46)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AllContainerLogAggregationPolicy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/ContainerLogsRetentionPolicy.java


 NM should provide a way for AM to tell it not to aggregate logs.
 

 Key: YARN-221
 URL: https://issues.apache.org/jira/browse/YARN-221
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: log-aggregation, nodemanager
Reporter: Robert Joseph Evans
Assignee: Ming Ma
 Fix For: 2.8.0

 Attachments: YARN-221-6.patch, YARN-221-7.patch, YARN-221-8.patch, 
 YARN-221-9.patch, YARN-221-addendum.1.patch, YARN-221-trunk-v1.patch, 
 YARN-221-trunk-v2.patch, YARN-221-trunk-v3.patch, YARN-221-trunk-v4.patch, 
 YARN-221-trunk-v5.patch


 The NodeManager should provide a way for an AM to tell it that either the 
 logs should not be aggregated, that they should be aggregated with a high 
 priority, or that they should be aggregated but with a lower priority.  The 
 AM should be able to do this in the ContainerLaunch context to provide a 
 default value, but should also be able to update the value when the container 
 is released.
 This would allow for the NM to not aggregate logs in some cases, and avoid 
 connection to the NN at all.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3717) Improve RM node labels web UI

2015-08-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708816#comment-14708816
 ] 

Hadoop QA commented on YARN-3717:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  24m 16s | Pre-patch trunk has 7 extant 
Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 2 new or modified test files. |
| {color:green}+1{color} | javac |   7m 40s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 45s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | site |   3m  1s | Site still builds. |
| {color:red}-1{color} | checkstyle |   2m 37s | The applied patch generated  3 
new checkstyle issues (total was 16, now 18). |
| {color:red}-1{color} | whitespace |   0m  5s | The patch has 1  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 32s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   7m 21s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   0m 23s | Tests passed in 
hadoop-yarn-api. |
| {color:red}-1{color} | yarn tests |   3m 23s | Tests failed in 
hadoop-yarn-client. |
| {color:red}-1{color} | yarn tests |   0m 19s | Tests failed in 
hadoop-yarn-common. |
| {color:red}-1{color} | yarn tests |   0m 13s | Tests failed in 
hadoop-yarn-server-applicationhistoryservice. |
| {color:red}-1{color} | yarn tests |   0m 14s | Tests failed in 
hadoop-yarn-server-common. |
| {color:red}-1{color} | yarn tests |   0m 20s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | |  63m  2s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.yarn.client.TestResourceTrackerOnHA |
|   | hadoop.yarn.client.api.impl.TestYarnClient |
|   | hadoop.yarn.client.api.impl.TestAMRMClient |
|   | hadoop.yarn.client.api.impl.TestNMClient |
|   | hadoop.yarn.client.TestApplicationMasterServiceProtocolOnHA |
|   | hadoop.yarn.client.cli.TestYarnCLI |
| Timed out tests | org.apache.hadoop.yarn.client.TestRMFailover |
| Failed build | hadoop-yarn-common |
|   | hadoop-yarn-server-applicationhistoryservice |
|   | hadoop-yarn-server-common |
|   | hadoop-yarn-server-resourcemanager |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12751920/YARN-3717.20150822-1.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle site |
| git revision | trunk / b71c600 |
| Pre-patch Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/8895/artifact/patchprocess/trunkFindbugsWarningshadoop-yarn-server-common.html
 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/8895/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/8895/artifact/patchprocess/whitespace.txt
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8895/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-client test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8895/artifact/patchprocess/testrun_hadoop-yarn-client.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8895/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-applicationhistoryservice test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8895/artifact/patchprocess/testrun_hadoop-yarn-server-applicationhistoryservice.txt
 |
| hadoop-yarn-server-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8895/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8895/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8895/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8895/console |


This message was automatically generated.

 Improve RM node labels web UI
 -

 Key: YARN-3717
 URL: https://issues.apache.org/jira/browse/YARN-3717
 Project: Hadoop YARN
 

[jira] [Commented] (YARN-3896) RMNode transitioned from RUNNING to REBOOTED because its response id had not been reset

2015-08-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708800#comment-14708800
 ] 

Hadoop QA commented on YARN-3896:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  16m 53s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 2 new or modified test files. |
| {color:green}+1{color} | javac |   7m 43s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 10s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m  8s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 28s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   2m 16s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | tools/hadoop tests |   0m 52s | Tests passed in 
hadoop-sls. |
| {color:red}-1{color} | yarn tests |  51m 13s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | |  92m 42s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12751826/YARN-3896.07.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / b71c600 |
| hadoop-sls test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8894/artifact/patchprocess/testrun_hadoop-sls.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8894/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8894/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8894/console |


This message was automatically generated.

 RMNode transitioned from RUNNING to REBOOTED because its response id had not 
 been reset
 ---

 Key: YARN-3896
 URL: https://issues.apache.org/jira/browse/YARN-3896
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Jun Gong
Assignee: Jun Gong
 Attachments: 0001-YARN-3896.patch, YARN-3896.01.patch, 
 YARN-3896.02.patch, YARN-3896.03.patch, YARN-3896.04.patch, 
 YARN-3896.05.patch, YARN-3896.06.patch, YARN-3896.07.patch


 {noformat}
 2015-07-03 16:49:39,075 INFO org.apache.hadoop.yarn.util.RackResolver: 
 Resolved 10.208.132.153 to /default-rack
 2015-07-03 16:49:39,075 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: 
 Reconnect from the node at: 10.208.132.153
 2015-07-03 16:49:39,075 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: 
 NodeManager from node 10.208.132.153(cmPort: 8041 httpPort: 8080) registered 
 with capability: memory:6144, vCores:60, diskCapacity:213, assigned nodeId 
 10.208.132.153:8041
 2015-07-03 16:49:39,104 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: Too far 
 behind rm response id:2506413 nm response id:0
 2015-07-03 16:49:39,137 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating 
 Node 10.208.132.153:8041 as it is now REBOOTED
 2015-07-03 16:49:39,137 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: 
 10.208.132.153:8041 Node Transitioned from RUNNING to REBOOTED
 {noformat}
 The node (10.208.132.153) reconnected with the RM. When it re-registered, 
 the RM reset its lastNodeHeartbeatResponse id to 0 asynchronously, but the 
 node's next heartbeat arrived before the RM had finished resetting the id to 0.
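
 A minimal, self-contained sketch of the race follows. The comparison is a 
 simplified paraphrase of the resource tracker's response-id check (an 
 assumption for illustration, not the patched code), using the ids from the 
 log above:
 {code}
 // Simplified sketch (assumed, not the actual ResourceTrackerService code) of
 // the check behind "Too far behind rm response id:2506413 nm response id:0".
 public class ResponseIdRaceSketch {
   public static void main(String[] args) {
     // After re-registration the NM heartbeats with id 0, while the RM's
     // stored id is only reset to 0 by an asynchronous reconnect event.
     int nmResponseId = 0;
     int rmStoredResponseId = 2506413; // stale: reset event not yet processed

     if (nmResponseId + 1 == rmStoredResponseId) {
       System.out.println("duplicate heartbeat: resend the last response");
     } else if (nmResponseId + 1 < rmStoredResponseId) {
       // This branch fires during the race; the node is asked to resync and
       // the RMNode transitions RUNNING -> REBOOTED, as in the log above.
       System.out.println("too far behind: ask the node to resync");
     } else {
       System.out.println("normal heartbeat: advance the stored id");
     }
   }
 }
 {code}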



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3896) RMNode transitioned from RUNNING to REBOOTED because its response id had not been reset

2015-08-23 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708810#comment-14708810
 ] 

Rohith Sharma K S commented on YARN-3896:
-

Test failures are unrelated to the patch. Committing shortly.

 RMNode transitioned from RUNNING to REBOOTED because its response id had not 
 been reset
 ---

 Key: YARN-3896
 URL: https://issues.apache.org/jira/browse/YARN-3896
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Jun Gong
Assignee: Jun Gong
 Attachments: 0001-YARN-3896.patch, YARN-3896.01.patch, 
 YARN-3896.02.patch, YARN-3896.03.patch, YARN-3896.04.patch, 
 YARN-3896.05.patch, YARN-3896.06.patch, YARN-3896.07.patch


 {noformat}
 2015-07-03 16:49:39,075 INFO org.apache.hadoop.yarn.util.RackResolver: 
 Resolved 10.208.132.153 to /default-rack
 2015-07-03 16:49:39,075 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: 
 Reconnect from the node at: 10.208.132.153
 2015-07-03 16:49:39,075 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: 
 NodeManager from node 10.208.132.153(cmPort: 8041 httpPort: 8080) registered 
 with capability: memory:6144, vCores:60, diskCapacity:213, assigned nodeId 
 10.208.132.153:8041
 2015-07-03 16:49:39,104 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: Too far 
 behind rm response id:2506413 nm response id:0
 2015-07-03 16:49:39,137 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating 
 Node 10.208.132.153:8041 as it is now REBOOTED
 2015-07-03 16:49:39,137 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: 
 10.208.132.153:8041 Node Transitioned from RUNNING to REBOOTED
 {noformat}
 The node (10.208.132.153) reconnected with the RM. When it re-registered, 
 the RM reset its lastNodeHeartbeatResponse id to 0 asynchronously, but the 
 node's next heartbeat arrived before the RM had finished resetting the id to 0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-221) NM should provide a way for AM to tell it not to aggregate logs.

2015-08-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708694#comment-14708694
 ] 

Hudson commented on YARN-221:
-

SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #297 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/297/])
YARN-221. Addendum patch to compilation issue which is caused by missing 
(xgong: rev b71c6006f579ac6f0755975a9b908b0062618b46)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AllContainerLogAggregationPolicy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/ContainerLogsRetentionPolicy.java


 NM should provide a way for AM to tell it not to aggregate logs.
 

 Key: YARN-221
 URL: https://issues.apache.org/jira/browse/YARN-221
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: log-aggregation, nodemanager
Reporter: Robert Joseph Evans
Assignee: Ming Ma
 Fix For: 2.8.0

 Attachments: YARN-221-6.patch, YARN-221-7.patch, YARN-221-8.patch, 
 YARN-221-9.patch, YARN-221-addendum.1.patch, YARN-221-trunk-v1.patch, 
 YARN-221-trunk-v2.patch, YARN-221-trunk-v3.patch, YARN-221-trunk-v4.patch, 
 YARN-221-trunk-v5.patch


 The NodeManager should provide a way for an AM to tell it that the logs 
 should either not be aggregated, be aggregated with a high priority, or be 
 aggregated with a lower priority. The AM should be able to do this in the 
 ContainerLaunchContext to provide a default value, but should also be able to 
 update the value when the container is released.
 This would allow the NM to skip log aggregation in some cases and avoid 
 connecting to the NN at all.
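
 A hedged sketch of how an AM/client could opt out on the submission path is 
 below. It assumes the policy-class hook on LogAggregationContext that this 
 JIRA introduces; the setter name and the "none" policy class are illustrative 
 assumptions (only AllContainerLogAggregationPolicy is named in the commit 
 file list above):
 {code}
 import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
 import org.apache.hadoop.yarn.api.records.LogAggregationContext;
 import org.apache.hadoop.yarn.util.Records;

 public class OptOutOfLogAggregation {
   public static void configure(ApplicationSubmissionContext appContext) {
     LogAggregationContext logCtx =
         Records.newRecord(LogAggregationContext.class);
     // Assumed setter: point the NM at a policy that aggregates no container
     // logs, so it never has to contact the NN for this application's logs.
     logCtx.setLogAggregationPolicyClassName(
         "org.apache.hadoop.yarn.server.nodemanager.containermanager."
             + "logaggregation.NoneContainerLogAggregationPolicy");
     appContext.setLogAggregationContext(logCtx);
   }
 }
 {code}
 With a per-application (or per-container) policy like this, the NM can decide 
 at container completion whether any logs need uploading, which is what lets 
 it skip the NN connection entirely.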



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-221) NM should provide a way for AM to tell it not to aggregate logs.

2015-08-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708743#comment-14708743
 ] 

Hudson commented on YARN-221:
-

FAILURE: Integrated in Hadoop-Hdfs-trunk #2227 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2227/])
YARN-221. Addendum patch to compilation issue which is caused by missing 
(xgong: rev b71c6006f579ac6f0755975a9b908b0062618b46)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/ContainerLogsRetentionPolicy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AllContainerLogAggregationPolicy.java


 NM should provide a way for AM to tell it not to aggregate logs.
 

 Key: YARN-221
 URL: https://issues.apache.org/jira/browse/YARN-221
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: log-aggregation, nodemanager
Reporter: Robert Joseph Evans
Assignee: Ming Ma
 Fix For: 2.8.0

 Attachments: YARN-221-6.patch, YARN-221-7.patch, YARN-221-8.patch, 
 YARN-221-9.patch, YARN-221-addendum.1.patch, YARN-221-trunk-v1.patch, 
 YARN-221-trunk-v2.patch, YARN-221-trunk-v3.patch, YARN-221-trunk-v4.patch, 
 YARN-221-trunk-v5.patch


 The NodeManager should provide a way for an AM to tell it that the logs 
 should either not be aggregated, be aggregated with a high priority, or be 
 aggregated with a lower priority. The AM should be able to do this in the 
 ContainerLaunchContext to provide a default value, but should also be able to 
 update the value when the container is released.
 This would allow the NM to skip log aggregation in some cases and avoid 
 connecting to the NN at all.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-221) NM should provide a way for AM to tell it not to aggregate logs.

2015-08-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708716#comment-14708716
 ] 

Hudson commented on YARN-221:
-

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2246 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2246/])
YARN-221. Addendum patch to compilation issue which is caused by missing 
(xgong: rev b71c6006f579ac6f0755975a9b908b0062618b46)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/ContainerLogsRetentionPolicy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AllContainerLogAggregationPolicy.java


 NM should provide a way for AM to tell it not to aggregate logs.
 

 Key: YARN-221
 URL: https://issues.apache.org/jira/browse/YARN-221
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: log-aggregation, nodemanager
Reporter: Robert Joseph Evans
Assignee: Ming Ma
 Fix For: 2.8.0

 Attachments: YARN-221-6.patch, YARN-221-7.patch, YARN-221-8.patch, 
 YARN-221-9.patch, YARN-221-addendum.1.patch, YARN-221-trunk-v1.patch, 
 YARN-221-trunk-v2.patch, YARN-221-trunk-v3.patch, YARN-221-trunk-v4.patch, 
 YARN-221-trunk-v5.patch


 The NodeManager should provide a way for an AM to tell it that the logs 
 should either not be aggregated, be aggregated with a high priority, or be 
 aggregated with a lower priority. The AM should be able to do this in the 
 ContainerLaunchContext to provide a default value, but should also be able to 
 update the value when the container is released.
 This would allow the NM to skip log aggregation in some cases and avoid 
 connecting to the NN at all.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)