[jira] [Commented] (YARN-2811) Fair Scheduler is violating max memory settings in 2.4

2014-11-14 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14212015#comment-14212015
 ] 

Sandy Ryza commented on YARN-2811:
--

This looks almost good to go - the last thing is that we should use 
Resources.fitsIn instead of Resources.lessThanOrEqual(RESOURCE_CALCULATOR...), 
as the latter will only consider memory.
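
For context, a minimal sketch of why that matters (assuming the Hadoop 2.x 
Resource/Resources APIs; the class name and the numbers below are made up for 
illustration): with the DefaultResourceCalculator, lessThanOrEqual compares 
memory only, while fitsIn requires every dimension to fit.
{code}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator;
import org.apache.hadoop.yarn.util.resource.ResourceCalculator;
import org.apache.hadoop.yarn.util.resource.Resources;

public class MaxShareCheckSketch {
  private static final ResourceCalculator RESOURCE_CALCULATOR =
      new DefaultResourceCalculator();

  public static void main(String[] args) {
    Resource cluster  = Resource.newInstance(64 * 1024, 64); // 64 GB, 64 vcores
    Resource maxShare = Resource.newInstance(8 * 1024, 8);   // queue max: 8 GB, 8 vcores
    Resource usage    = Resource.newInstance(4 * 1024, 16);  // 4 GB, 16 vcores

    // DefaultResourceCalculator compares memory only, so this reports true
    // even though vcore usage is above the queue's max share.
    boolean memoryOnly =
        Resources.lessThanOrEqual(RESOURCE_CALCULATOR, cluster, usage, maxShare);

    // fitsIn checks each dimension, so the vcore overshoot is caught.
    boolean allDimensions = Resources.fitsIn(usage, maxShare);

    System.out.println("lessThanOrEqual (memory only): " + memoryOnly);    // true
    System.out.println("fitsIn (memory and vcores):    " + allDimensions); // false
  }
}
{code}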

 Fair Scheduler is violating max memory settings in 2.4
 --

 Key: YARN-2811
 URL: https://issues.apache.org/jira/browse/YARN-2811
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Siqi Li
Assignee: Siqi Li
 Attachments: YARN-2811.v1.patch, YARN-2811.v2.patch, 
 YARN-2811.v3.patch, YARN-2811.v4.patch, YARN-2811.v5.patch, 
 YARN-2811.v6.patch, YARN-2811.v7.patch


 This has been seen on several queues: the allocated MB goes significantly 
 above the configured max MB, and it appears to have started with the 2.4 
 upgrade. It could be a regression introduced between 2.0 and 2.4.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2863) ResourceManager will shutdown when job's queue is empty

2014-11-14 Thread yangping wu (JIRA)
yangping wu created YARN-2863:
-

 Summary: ResourceManager will shutdown when job's queue is empty
 Key: YARN-2863
 URL: https://issues.apache.org/jira/browse/YARN-2863
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.2.0
Reporter: yangping wu


When I submit a job to the Hadoop cluster but don't specify a queue name, as follows:

 $HADOOP_HOMEhadoop jar statistics.jar com.wyp.Sts -Dmapreduce.job.queuename= 

and if yarn.scheduler.fair.allow-undeclared-pools is not overridden by the 
user (the default is true), then QueueManager will call the createLeafQueue 
method to create the queue, even though I didn't set the mapreduce.job.queuename 
property. It then throws a MetricsException:

2014-11-14 16:07:57,358 FATAL 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
handling event type APP_ADDED to the scheduler
org.apache.hadoop.metrics2.MetricsException: Metrics source 
QueueMetrics,q0=root already exists!
at 
org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:126)
at 
org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:107)
at 
org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:217)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:94)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueue.init(FSQueue.java:57)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.init(FSLeafQueue.java:57)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.createLeafQueue(QueueManager.java:191)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.getLeafQueue(QueueManager.java:136)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.assignToQueue(FairScheduler.java:652)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplication(FairScheduler.java:610)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1015)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
at java.lang.Thread.run(Thread.java:744)
2014-11-14 16:07:57,359 INFO 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
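
The trace suggests that an empty queue name ends up resolving to the root queue, 
whose QueueMetrics source is already registered, so createLeafQueue fails and the 
scheduler event handler brings the RM down. Purely as a hedged illustration (this 
is not the committed fix; QueueNameGuard and the default-queue name are invented 
here), a defensive guard could fall back to the default queue before asking 
QueueManager to create a leaf queue for a blank name:
{code}
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager;

final class QueueNameGuard {
  // Assumed default leaf queue name; the actual scheduler configuration may differ.
  private static final String DEFAULT_QUEUE = "root.default";

  // Route applications with a null or blank queue name to the default queue
  // instead of letting QueueManager.createLeafQueue() collide with the
  // already-registered root QueueMetrics source.
  static FSLeafQueue resolveQueue(QueueManager queueMgr, String requestedQueue) {
    String name = (requestedQueue == null || requestedQueue.trim().isEmpty())
        ? DEFAULT_QUEUE
        : requestedQueue;
    return queueMgr.getLeafQueue(name, true /* create if undeclared pools are allowed */);
  }
}
{code}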



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2863) ResourceManager will shutdown when job's queue is empty

2014-11-14 Thread yangping wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yangping wu updated YARN-2863:
--
Description: 
When I submit a job to the Hadoop cluster but don't specify a queue name, as follows:
{code}
 $HADOOP_HOMEhadoop jar statistics.jar com.wyp.Sts -Dmapreduce.job.queuename= 
{code}
and if yarn.scheduler.fair.allow-undeclared-pools is not overridden by the 
user (the default is true), then QueueManager will call the createLeafQueue 
method to create the queue, even though I didn't set the mapreduce.job.queuename 
property. It then throws a MetricsException:
{code}
2014-11-14 16:07:57,358 FATAL 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
handling event type APP_ADDED to the scheduler
org.apache.hadoop.metrics2.MetricsException: Metrics source 
QueueMetrics,q0=root already exists!
at 
org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:126)
at 
org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:107)
at 
org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:217)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:94)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueue.init(FSQueue.java:57)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.init(FSLeafQueue.java:57)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.createLeafQueue(QueueManager.java:191)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.getLeafQueue(QueueManager.java:136)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.assignToQueue(FairScheduler.java:652)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplication(FairScheduler.java:610)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1015)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
at java.lang.Thread.run(Thread.java:744)
2014-11-14 16:07:57,359 INFO 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
{code}

  was:
When I submit a job to the Hadoop cluster but don't specify a queue name, as follows:

 $HADOOP_HOMEhadoop jar statistics.jar com.wyp.Sts -Dmapreduce.job.queuename= 

and if yarn.scheduler.fair.allow-undeclared-pools is not overridden by the 
user (the default is true), then QueueManager will call the createLeafQueue 
method to create the queue, even though I didn't set the mapreduce.job.queuename 
property. It then throws a MetricsException:
{code}
2014-11-14 16:07:57,358 FATAL 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
handling event type APP_ADDED to the scheduler
org.apache.hadoop.metrics2.MetricsException: Metrics source 
QueueMetrics,q0=root already exists!
at 
org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:126)
at 
org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:107)
at 
org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:217)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:94)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueue.init(FSQueue.java:57)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.init(FSLeafQueue.java:57)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.createLeafQueue(QueueManager.java:191)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.getLeafQueue(QueueManager.java:136)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.assignToQueue(FairScheduler.java:652)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplication(FairScheduler.java:610)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1015)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
at java.lang.Thread.run(Thread.java:744)
2014-11-14 16:07:57,359 INFO 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
{code}



[jira] [Updated] (YARN-2863) ResourceManager will shutdown when job's queue is empty

2014-11-14 Thread yangping wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yangping wu updated YARN-2863:
--
Description: 
When I submit a job to the Hadoop cluster but don't specify a queue name, as follows:
{code}
 $HADOOP_HOMEhadoop jar statistics.jar com.wyp.Sts -Dmapreduce.job.queuename= 
{code}
and if {code}yarn.scheduler.fair.allow-undeclared-pools{code} is not overridden 
by the user (the default is true), then QueueManager will call the createLeafQueue 
method to create the queue, even though I didn't set the mapreduce.job.queuename 
property. It then throws a MetricsException:
{code}
2014-11-14 16:07:57,358 FATAL 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
handling event type APP_ADDED to the scheduler
org.apache.hadoop.metrics2.MetricsException: Metrics source 
QueueMetrics,q0=root already exists!
at 
org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:126)
at 
org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:107)
at 
org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:217)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:94)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueue.init(FSQueue.java:57)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.init(FSLeafQueue.java:57)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.createLeafQueue(QueueManager.java:191)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.getLeafQueue(QueueManager.java:136)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.assignToQueue(FairScheduler.java:652)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplication(FairScheduler.java:610)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1015)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
at java.lang.Thread.run(Thread.java:744)
2014-11-14 16:07:57,359 INFO 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
{code}

  was:
When I submit a job to the Hadoop cluster but don't specify a queue name, as follows:
{code}
 $HADOOP_HOMEhadoop jar statistics.jar com.wyp.Sts -Dmapreduce.job.queuename= 
{code}
and if yarn.scheduler.fair.allow-undeclared-pools is not overridden by the 
user (the default is true), then QueueManager will call the createLeafQueue 
method to create the queue, even though I didn't set the mapreduce.job.queuename 
property. It then throws a MetricsException:
{code}
2014-11-14 16:07:57,358 FATAL 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
handling event type APP_ADDED to the scheduler
org.apache.hadoop.metrics2.MetricsException: Metrics source 
QueueMetrics,q0=root already exists!
at 
org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:126)
at 
org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:107)
at 
org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:217)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:94)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueue.init(FSQueue.java:57)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.init(FSLeafQueue.java:57)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.createLeafQueue(QueueManager.java:191)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.getLeafQueue(QueueManager.java:136)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.assignToQueue(FairScheduler.java:652)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplication(FairScheduler.java:610)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1015)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
at java.lang.Thread.run(Thread.java:744)
2014-11-14 16:07:57,359 INFO 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
{code}


 

[jira] [Updated] (YARN-2863) ResourceManager will shutdown when job's queue is empty

2014-11-14 Thread yangping wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yangping wu updated YARN-2863:
--
Description: 
When I submit a job to the Hadoop cluster but don't specify a queue name, as follows:

 $HADOOP_HOMEhadoop jar statistics.jar com.wyp.Sts -Dmapreduce.job.queuename= 

and if yarn.scheduler.fair.allow-undeclared-pools is not overridden by the 
user (the default is true), then QueueManager will call the createLeafQueue 
method to create the queue, even though I didn't set the mapreduce.job.queuename 
property. It then throws a MetricsException:
{code}
2014-11-14 16:07:57,358 FATAL 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
handling event type APP_ADDED to the scheduler
org.apache.hadoop.metrics2.MetricsException: Metrics source 
QueueMetrics,q0=root already exists!
at 
org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:126)
at 
org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:107)
at 
org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:217)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:94)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueue.init(FSQueue.java:57)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.init(FSLeafQueue.java:57)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.createLeafQueue(QueueManager.java:191)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.getLeafQueue(QueueManager.java:136)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.assignToQueue(FairScheduler.java:652)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplication(FairScheduler.java:610)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1015)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
at java.lang.Thread.run(Thread.java:744)
2014-11-14 16:07:57,359 INFO 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
{code}

  was:
When I submit a job to the Hadoop cluster but don't specify a queue name, as follows:

 $HADOOP_HOMEhadoop jar statistics.jar com.wyp.Sts -Dmapreduce.job.queuename= 

and if yarn.scheduler.fair.allow-undeclared-pools is not overridden by the 
user (the default is true), then QueueManager will call the createLeafQueue 
method to create the queue, even though I didn't set the mapreduce.job.queuename 
property. It then throws a MetricsException:

2014-11-14 16:07:57,358 FATAL 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
handling event type APP_ADDED to the scheduler
org.apache.hadoop.metrics2.MetricsException: Metrics source 
QueueMetrics,q0=root already exists!
at 
org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:126)
at 
org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:107)
at 
org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:217)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:94)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueue.init(FSQueue.java:57)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.init(FSLeafQueue.java:57)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.createLeafQueue(QueueManager.java:191)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.getLeafQueue(QueueManager.java:136)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.assignToQueue(FairScheduler.java:652)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplication(FairScheduler.java:610)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1015)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
at java.lang.Thread.run(Thread.java:744)
2014-11-14 16:07:57,359 INFO 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..



[jira] [Updated] (YARN-2863) ResourceManager will shutdown when job's queue is empty

2014-11-14 Thread yangping wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yangping wu updated YARN-2863:
--
Description: 
When I submit a job to the Hadoop cluster but don't specify a queue name, as follows:
{code}
 $HADOOP_HOMEhadoop jar statistics.jar com.wyp.Sts -Dmapreduce.job.queuename= 
{code}
and if yarn.scheduler.fair.allow-undeclared-pools is not overridden by the 
user (the default is true), then QueueManager will call the createLeafQueue 
method to create the queue, even though I didn't set the mapreduce.job.queuename 
property. It then throws a MetricsException:
{code}
2014-11-14 16:07:57,358 FATAL 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
handling event type APP_ADDED to the scheduler
org.apache.hadoop.metrics2.MetricsException: Metrics source 
QueueMetrics,q0=root already exists!
at 
org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:126)
at 
org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:107)
at 
org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:217)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:94)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueue.init(FSQueue.java:57)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.init(FSLeafQueue.java:57)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.createLeafQueue(QueueManager.java:191)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.getLeafQueue(QueueManager.java:136)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.assignToQueue(FairScheduler.java:652)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplication(FairScheduler.java:610)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1015)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
at java.lang.Thread.run(Thread.java:744)
2014-11-14 16:07:57,359 INFO 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
{code}

  was:
When I submit a job to the Hadoop cluster but don't specify a queue name, as follows:
{code}
 $HADOOP_HOMEhadoop jar statistics.jar com.wyp.Sts -Dmapreduce.job.queuename= 
{code}
and if {code}yarn.scheduler.fair.allow-undeclared-pools{code} is not overridden 
by the user (the default is true), then QueueManager will call the createLeafQueue 
method to create the queue, even though I didn't set the mapreduce.job.queuename 
property. It then throws a MetricsException:
{code}
2014-11-14 16:07:57,358 FATAL 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
handling event type APP_ADDED to the scheduler
org.apache.hadoop.metrics2.MetricsException: Metrics source 
QueueMetrics,q0=root already exists!
at 
org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:126)
at 
org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:107)
at 
org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:217)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:94)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueue.init(FSQueue.java:57)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.init(FSLeafQueue.java:57)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.createLeafQueue(QueueManager.java:191)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.getLeafQueue(QueueManager.java:136)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.assignToQueue(FairScheduler.java:652)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplication(FairScheduler.java:610)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1015)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
at java.lang.Thread.run(Thread.java:744)
2014-11-14 16:07:57,359 INFO 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
{code}



[jira] [Updated] (YARN-2863) ResourceManager will shutdown when job's queue is empty

2014-11-14 Thread yangping wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yangping wu updated YARN-2863:
--
Description: 
When I submit a job to the Hadoop cluster but don't specify a queue name, as follows:
{code}
 $HADOOP_HOMEhadoop jar statistics.jar com.wyp.Sts -Dmapreduce.job.queuename= 
{code}
and if *yarn.scheduler.fair.allow-undeclared-pools* is not overridden by the 
user (the default is true), then QueueManager will call the createLeafQueue 
method to create the queue, even though I didn't set the mapreduce.job.queuename 
property. It then throws a MetricsException:
{code}
2014-11-14 16:07:57,358 FATAL 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
handling event type APP_ADDED to the scheduler
org.apache.hadoop.metrics2.MetricsException: Metrics source 
QueueMetrics,q0=root already exists!
at 
org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:126)
at 
org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:107)
at 
org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:217)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:94)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueue.init(FSQueue.java:57)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.init(FSLeafQueue.java:57)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.createLeafQueue(QueueManager.java:191)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.getLeafQueue(QueueManager.java:136)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.assignToQueue(FairScheduler.java:652)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplication(FairScheduler.java:610)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1015)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
at java.lang.Thread.run(Thread.java:744)
2014-11-14 16:07:57,359 INFO 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
{code}

  was:
When I submit a job to the Hadoop cluster but don't specify a queue name, as follows:
{code}
 $HADOOP_HOMEhadoop jar statistics.jar com.wyp.Sts -Dmapreduce.job.queuename= 
{code}
and if yarn.scheduler.fair.allow-undeclared-pools is not overridden by the 
user (the default is true), then QueueManager will call the createLeafQueue 
method to create the queue, even though I didn't set the mapreduce.job.queuename 
property. It then throws a MetricsException:
{code}
2014-11-14 16:07:57,358 FATAL 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
handling event type APP_ADDED to the scheduler
org.apache.hadoop.metrics2.MetricsException: Metrics source 
QueueMetrics,q0=root already exists!
at 
org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:126)
at 
org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:107)
at 
org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:217)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:94)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueue.init(FSQueue.java:57)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.init(FSLeafQueue.java:57)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.createLeafQueue(QueueManager.java:191)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.getLeafQueue(QueueManager.java:136)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.assignToQueue(FairScheduler.java:652)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplication(FairScheduler.java:610)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1015)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
at java.lang.Thread.run(Thread.java:744)
2014-11-14 16:07:57,359 INFO 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
{code}



[jira] [Updated] (YARN-2863) ResourceManager will shutdown when job's queue is empty

2014-11-14 Thread yangping wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yangping wu updated YARN-2863:
--
Description: 
When I submit a job to the Hadoop cluster but don't specify a queue name, as follows:
{code}
 $HADOOP_HOMEhadoop jar statistics.jar com.wyp.Sts -Dmapreduce.job.queuename= 
{code}
and if *yarn.scheduler.fair.allow-undeclared-pools* is not overridden by the 
user (the default is true), then QueueManager will call the createLeafQueue 
method to create the queue because mapreduce.job.queuename is empty. It then 
throws a MetricsException:
{code}
2014-11-14 16:07:57,358 FATAL 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
handling event type APP_ADDED to the scheduler
org.apache.hadoop.metrics2.MetricsException: Metrics source 
QueueMetrics,q0=root already exists!
at 
org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:126)
at 
org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:107)
at 
org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:217)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:94)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueue.init(FSQueue.java:57)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.init(FSLeafQueue.java:57)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.createLeafQueue(QueueManager.java:191)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.getLeafQueue(QueueManager.java:136)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.assignToQueue(FairScheduler.java:652)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplication(FairScheduler.java:610)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1015)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
at java.lang.Thread.run(Thread.java:744)
2014-11-14 16:07:57,359 INFO 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
{code}

  was:
When I submit a job to the Hadoop cluster but don't specify a queue name, as follows:
{code}
 $HADOOP_HOMEhadoop jar statistics.jar com.wyp.Sts -Dmapreduce.job.queuename= 
{code}
and if *yarn.scheduler.fair.allow-undeclared-pools* is not overridden by the 
user (the default is true), then QueueManager will call the createLeafQueue 
method to create the queue, even though I didn't set the mapreduce.job.queuename 
property. It then throws a MetricsException:
{code}
2014-11-14 16:07:57,358 FATAL 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
handling event type APP_ADDED to the scheduler
org.apache.hadoop.metrics2.MetricsException: Metrics source 
QueueMetrics,q0=root already exists!
at 
org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:126)
at 
org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:107)
at 
org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:217)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:94)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueue.init(FSQueue.java:57)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.init(FSLeafQueue.java:57)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.createLeafQueue(QueueManager.java:191)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.getLeafQueue(QueueManager.java:136)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.assignToQueue(FairScheduler.java:652)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplication(FairScheduler.java:610)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1015)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
at java.lang.Thread.run(Thread.java:744)
2014-11-14 16:07:57,359 INFO 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
{code}



[jira] [Updated] (YARN-2863) ResourceManager will shutdown when job's queue is empty

2014-11-14 Thread yangping wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yangping wu updated YARN-2863:
--
Description: 
When I submit a job to the Hadoop cluster but don't specify a queue name, as follows:
{code}
 $HADOOP_HOMEhadoop jar statistics.jar com.wyp.Sts -Dmapreduce.job.queuename= 
{code}
and if *yarn.scheduler.fair.allow-undeclared-pools* is not overridden by the 
user (the default is true), then QueueManager will call the createLeafQueue 
method to create the queue because mapreduce.job.queuename is empty. But this 
throws a MetricsException:
{code}
2014-11-14 16:07:57,358 FATAL 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
handling event type APP_ADDED to the scheduler
org.apache.hadoop.metrics2.MetricsException: Metrics source 
QueueMetrics,q0=root already exists!
at 
org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:126)
at 
org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:107)
at 
org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:217)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:94)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueue.init(FSQueue.java:57)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.init(FSLeafQueue.java:57)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.createLeafQueue(QueueManager.java:191)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.getLeafQueue(QueueManager.java:136)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.assignToQueue(FairScheduler.java:652)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplication(FairScheduler.java:610)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1015)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
at java.lang.Thread.run(Thread.java:744)
2014-11-14 16:07:57,359 INFO 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
{code}

  was:
When I submit a job to the Hadoop cluster but don't specify a queue name, as follows:
{code}
 $HADOOP_HOMEhadoop jar statistics.jar com.wyp.Sts -Dmapreduce.job.queuename= 
{code}
and if *yarn.scheduler.fair.allow-undeclared-pools* is not overridden by the 
user (the default is true), then QueueManager will call the createLeafQueue 
method to create the queue because mapreduce.job.queuename is empty. It then 
throws a MetricsException:
{code}
2014-11-14 16:07:57,358 FATAL 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
handling event type APP_ADDED to the scheduler
org.apache.hadoop.metrics2.MetricsException: Metrics source 
QueueMetrics,q0=root already exists!
at 
org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:126)
at 
org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:107)
at 
org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:217)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:94)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueue.init(FSQueue.java:57)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.init(FSLeafQueue.java:57)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.createLeafQueue(QueueManager.java:191)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.getLeafQueue(QueueManager.java:136)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.assignToQueue(FairScheduler.java:652)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplication(FairScheduler.java:610)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1015)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
at java.lang.Thread.run(Thread.java:744)
2014-11-14 16:07:57,359 INFO 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
{code}



[jira] [Commented] (YARN-2603) ApplicationConstants missing HADOOP_MAPRED_HOME

2014-11-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14212162#comment-14212162
 ] 

Hudson commented on YARN-2603:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #5 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/5/])
Revert YARN-2603. ApplicationConstants missing HADOOP_MAPRED_HOME (Ray Chiang 
via aw) (vinodkv: rev 4ae9780e6a05bfd6b93f1c871c22761ddd8b19cb)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationConstants.java
* hadoop-yarn-project/CHANGES.txt


 ApplicationConstants missing HADOOP_MAPRED_HOME
 ---

 Key: YARN-2603
 URL: https://issues.apache.org/jira/browse/YARN-2603
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Allen Wittenauer
Assignee: Ray Chiang
  Labels: newbie
 Attachments: YARN-2603-01.patch


 The Environment enum should have HADOOP_MAPRED_HOME listed as well.
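
 For illustration only (a hedged sketch of the shape of the change, not the 
 attached patch; only a few of the real entries are shown), the request is 
 essentially one more constant in the Environment enum of ApplicationConstants:
 {code}
 public enum Environment {
   HADOOP_COMMON_HOME("HADOOP_COMMON_HOME"),
   HADOOP_HDFS_HOME("HADOOP_HDFS_HOME"),
   HADOOP_YARN_HOME("HADOOP_YARN_HOME"),
   HADOOP_MAPRED_HOME("HADOOP_MAPRED_HOME"); // the missing entry this issue asks for

   private final String variable;

   Environment(String variable) {
     this.variable = variable;
   }

   public String key() {
     return variable;
   }
 }
 {code}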



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2846) Incorrect persist exit code for running containers in reacquireContainer() that interrupted by NodeManager restart.

2014-11-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14212156#comment-14212156
 ] 

Hudson commented on YARN-2846:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #5 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/5/])
YARN-2846. Incorrect persist exit code for running containers in 
reacquireContainer() that interrupted by NodeManager restart. Contributed by 
Junping Du (jlowe: rev 33ea5ae92b9dd3abace104903d9a94d17dd75af5)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/RecoveredContainerLaunch.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/ContainerExecutor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java


 Incorrect persist exit code for running containers in reacquireContainer() 
 that interrupted by NodeManager restart.
 ---

 Key: YARN-2846
 URL: https://issues.apache.org/jira/browse/YARN-2846
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Junping Du
Assignee: Junping Du
Priority: Blocker
 Fix For: 2.6.0

 Attachments: YARN-2846-demo.patch, YARN-2846.patch


 The NM restart work-preserving feature can cause a running AM container to be 
 marked LOST and killed while the NM daemon is being stopped. The exception is 
 shown below:
 {code}
 2014-11-11 00:48:35,214 INFO  monitor.ContainersMonitorImpl 
 (ContainersMonitorImpl.java:run(408)) - Memory usage of ProcessTree 22140 for 
 container-id container_1415666714233_0001_01_84: 53.8 MB of 512 MB 
 physical memory used; 931.3 MB of 1.0 GB virtual memory used
 2014-11-11 00:48:35,223 ERROR nodemanager.NodeManager 
 (SignalLogger.java:handle(60)) - RECEIVED SIGNAL 15: SIGTERM
 2014-11-11 00:48:35,299 INFO  mortbay.log (Slf4jLog.java:info(67)) - Stopped 
 HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:50060
 2014-11-11 00:48:35,337 INFO  containermanager.ContainerManagerImpl 
 (ContainerManagerImpl.java:cleanUpApplicationsOnNMShutDown(512)) - 
 Applications still running : [application_1415666714233_0001]
 2014-11-11 00:48:35,338 INFO  ipc.Server (Server.java:stop(2437)) - Stopping 
 server on 45454
 2014-11-11 00:48:35,344 INFO  ipc.Server (Server.java:run(706)) - Stopping 
 IPC Server listener on 45454
 2014-11-11 00:48:35,346 INFO  logaggregation.LogAggregationService 
 (LogAggregationService.java:serviceStop(141)) - 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService
  waiting for pending aggregation during exit
 2014-11-11 00:48:35,347 INFO  ipc.Server (Server.java:run(832)) - Stopping 
 IPC Server Responder
 2014-11-11 00:48:35,347 INFO  logaggregation.AppLogAggregatorImpl 
 (AppLogAggregatorImpl.java:abortLogAggregation(502)) - Aborting log 
 aggregation for application_1415666714233_0001
 2014-11-11 00:48:35,348 WARN  logaggregation.AppLogAggregatorImpl 
 (AppLogAggregatorImpl.java:run(382)) - Aggregation did not complete for 
 application application_1415666714233_0001
 2014-11-11 00:48:35,358 WARN  monitor.ContainersMonitorImpl 
 (ContainersMonitorImpl.java:run(476)) - 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl
  is interrupted. Exiting.
 2014-11-11 00:48:35,406 ERROR launcher.RecoveredContainerLaunch 
 (RecoveredContainerLaunch.java:call(87)) - Unable to recover container 
 container_1415666714233_0001_01_01
 java.io.IOException: Interrupted while waiting for process 20001 to exit
 at 
 org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor.reacquireContainer(ContainerExecutor.java:180)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:82)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:46)
 at java.util.concurrent.FutureTask.run(FutureTask.java:262)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)
 Caused by: java.lang.InterruptedException: sleep interrupted
 at java.lang.Thread.sleep(Native Method)
 at 
 

[jira] [Commented] (YARN-2766) ApplicationHistoryManager is expected to return a sorted list of apps/attempts/containers

2014-11-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14212167#comment-14212167
 ] 

Hudson commented on YARN-2766:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #5 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/5/])
YARN-2766. Made ApplicationHistoryManager return a sorted list of apps, 
attempts and containers. Contributed by Robert Kanter. (zjshen: rev 
3648cb57c9f018a3a339c26f5a0ca2779485521a)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryClientService.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerOnTimelineStore.java


  ApplicationHistoryManager is expected to return a sorted list of 
 apps/attempts/containers
 --

 Key: YARN-2766
 URL: https://issues.apache.org/jira/browse/YARN-2766
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: 2.6.0
Reporter: Robert Kanter
Assignee: Robert Kanter
 Fix For: 2.7.0

 Attachments: YARN-2766.patch, YARN-2766.patch, YARN-2766.patch, 
 YARN-2766.patch


 {{TestApplicationHistoryClientService.testContainers}} and 
 {{TestApplicationHistoryClientService.testApplicationAttempts}} both fail 
 because the test assertions are assuming a returned Collection is in a 
 certain order.  The collection comes from a HashMap, so the order is not 
 guaranteed, plus, according to [this 
 page|http://docs.oracle.com/javase/8/docs/technotes/guides/collections/changes8.html],
  there are situations where the iteration order of a HashMap will be 
 different between Java 7 and 8.
 We should fix the test code to not assume a specific ordering.
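
 As a small, hedged illustration of an order-independent check (a generic helper, 
 not the committed patch), a test can sort the returned collection by a stable key 
 before asserting, so HashMap iteration-order differences between Java 7 and 8 no 
 longer matter:
 {code}
 import java.util.ArrayList;
 import java.util.Collections;
 import java.util.Comparator;
 import java.util.List;

 final class OrderIndependentAsserts {
   // Copy the returned collection and sort it by a stable key (for example an
   // application/attempt/container id) before comparing it with the expected list.
   static <T> List<T> sortedCopy(Iterable<T> values, Comparator<T> byStableKey) {
     List<T> copy = new ArrayList<T>();
     for (T value : values) {
       copy.add(value);
     }
     Collections.sort(copy, byStableKey);
     return copy;
   }
 }
 {code}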



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2853) Killing app may hang while AM is unregistering

2014-11-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14212166#comment-14212166
 ] 

Hudson commented on YARN-2853:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #5 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/5/])
YARN-2853. Fixed a bug in ResourceManager causing apps to hang when the user 
kill request races with ApplicationMaster finish. Contributed by Jian He. 
(vinodkv: rev 3651fe1b089851b38be351c00a9899817166bf3e)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRM.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
YARN-2853. Merging to branch-2.6 for hadoop-2.6.0-rc1. (acmurthy: rev 
d648e60ebab7f1942dba92e9cd2cb62b8d70419b)
* hadoop-yarn-project/CHANGES.txt


 Killing app may hang while AM is unregistering
 --

 Key: YARN-2853
 URL: https://issues.apache.org/jira/browse/YARN-2853
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Jian He
 Fix For: 2.6.0

 Attachments: YARN-2853.1.patch, YARN-2853.1.patch, YARN-2853.2.patch, 
 YARN-2853.3.patch


 When killing an app, the app first moves to the KILLING state. If the RMAppAttempt 
 receives the attempt_unregister event before the attempt_kill event, it will ignore 
 the later attempt_kill event. Hence, the RMApp can never move to the KILLED state 
 and stays in the KILLING state forever.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2635) TestRM, TestRMRestart, TestClientToAMTokens should run with both CS and FS

2014-11-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14212158#comment-14212158
 ] 

Hudson commented on YARN-2635:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #5 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/5/])
YARN-2635. Merging to branch-2.6 for hadoop-2.6.0-rc1. (acmurthy: rev 
81dc0ac6dcf2f34ad607da815ea0144f178691a9)
* hadoop-yarn-project/CHANGES.txt


 TestRM, TestRMRestart, TestClientToAMTokens should run with both CS and FS
 --

 Key: YARN-2635
 URL: https://issues.apache.org/jira/browse/YARN-2635
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Wei Yan
Assignee: Wei Yan
 Fix For: 2.6.0

 Attachments: YARN-2635-1.patch, YARN-2635-2.patch, yarn-2635-3.patch, 
 yarn-2635-4.patch


 If we change the scheduler from the Capacity Scheduler to the Fair Scheduler, 
 TestRMRestart fails.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2856) Application recovery throw InvalidStateTransitonException: Invalid event: ATTEMPT_KILLED at ACCEPTED

2014-11-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14212157#comment-14212157
 ] 

Hudson commented on YARN-2856:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #5 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/5/])
YARN-2856. Fixed RMAppImpl to handle ATTEMPT_KILLED event at ACCEPTED state on 
app recovery. Contributed by Rohith Sharmaks (jianhe: rev 
d005404ef7211fe96ce1801ed267a249568540fd)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* hadoop-yarn-project/CHANGES.txt


 Application recovery throw InvalidStateTransitonException: Invalid event: 
 ATTEMPT_KILLED at ACCEPTED
 

 Key: YARN-2856
 URL: https://issues.apache.org/jira/browse/YARN-2856
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Rohith
Assignee: Rohith
Priority: Critical
 Fix For: 2.7.0

 Attachments: YARN-2856.1.patch, YARN-2856.patch


 It is observed that recovering an application whose attempt finished in the 
 KILLED state throws the exception below, and the application remains in the 
 ACCEPTED state forever.
 {code}
 2014-11-12 02:34:10,602 | ERROR | AsyncDispatcher event handler | Can't 
 handle this event at current state | 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:673)
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 ATTEMPT_KILLED at ACCEPTED
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:671)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:90)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:730)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:714)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
   at java.lang.Thread.run(Thread.java:745)
 {code}
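
 For orientation, a toy, hedged sketch (not the actual RMAppImpl change; the enums, 
 event class, and transition body below are invented) of how YARN's 
 StateMachineFactory behaves: an event arriving in a state with no registered arc 
 raises InvalidStateTransitonException, which is what recovery hits here, and 
 registering an ATTEMPT_KILLED arc at ACCEPTED is the shape of the fix.
 {code}
 import org.apache.hadoop.yarn.event.AbstractEvent;
 import org.apache.hadoop.yarn.state.SingleArcTransition;
 import org.apache.hadoop.yarn.state.StateMachine;
 import org.apache.hadoop.yarn.state.StateMachineFactory;

 class RecoveryTransitionSketch {
   enum AppState { ACCEPTED, KILLED }
   enum AppEventType { ATTEMPT_KILLED }

   static class AppEvent extends AbstractEvent<AppEventType> {
     AppEvent(AppEventType type) { super(type); }
   }

   private static final StateMachineFactory
       <RecoveryTransitionSketch, AppState, AppEventType, AppEvent> FACTORY =
       new StateMachineFactory
           <RecoveryTransitionSketch, AppState, AppEventType, AppEvent>(AppState.ACCEPTED)
           // Without an arc like this one, ATTEMPT_KILLED at ACCEPTED is an
           // invalid transition and the dispatcher logs the exception above.
           .addTransition(AppState.ACCEPTED, AppState.KILLED, AppEventType.ATTEMPT_KILLED,
               new SingleArcTransition<RecoveryTransitionSketch, AppEvent>() {
                 @Override
                 public void transition(RecoveryTransitionSketch app, AppEvent event) {
                   // cleanup / diagnostics for the killed attempt would go here
                 }
               })
           .installTopology();

   private final StateMachine<AppState, AppEventType, AppEvent> stateMachine =
       FACTORY.make(this);

   void onAttemptKilled() {
     stateMachine.doTransition(AppEventType.ATTEMPT_KILLED,
         new AppEvent(AppEventType.ATTEMPT_KILLED));
   }
 }
 {code}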



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2853) Killing app may hang while AM is unregistering

2014-11-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14212195#comment-14212195
 ] 

Hudson commented on YARN-2853:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #743 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/743/])
YARN-2853. Fixed a bug in ResourceManager causing apps to hang when the user 
kill request races with ApplicationMaster finish. Contributed by Jian He. 
(vinodkv: rev 3651fe1b089851b38be351c00a9899817166bf3e)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRM.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
YARN-2853. Merging to branch-2.6 for hadoop-2.6.0-rc1. (acmurthy: rev 
d648e60ebab7f1942dba92e9cd2cb62b8d70419b)
* hadoop-yarn-project/CHANGES.txt


 Killing app may hang while AM is unregistering
 --

 Key: YARN-2853
 URL: https://issues.apache.org/jira/browse/YARN-2853
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Jian He
 Fix For: 2.6.0

 Attachments: YARN-2853.1.patch, YARN-2853.1.patch, YARN-2853.2.patch, 
 YARN-2853.3.patch


 When killing an app, the app first moves to the KILLING state. If the RMAppAttempt 
 receives the attempt_unregister event before the attempt_kill event, it will ignore 
 the later attempt_kill event. Hence, the RMApp can never move to the KILLED state 
 and stays in the KILLING state forever.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2635) TestRM, TestRMRestart, TestClientToAMTokens should run with both CS and FS

2014-11-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14212187#comment-14212187
 ] 

Hudson commented on YARN-2635:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #743 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/743/])
YARN-2635. Merging to branch-2.6 for hadoop-2.6.0-rc1. (acmurthy: rev 
81dc0ac6dcf2f34ad607da815ea0144f178691a9)
* hadoop-yarn-project/CHANGES.txt


 TestRM, TestRMRestart, TestClientToAMTokens should run with both CS and FS
 --

 Key: YARN-2635
 URL: https://issues.apache.org/jira/browse/YARN-2635
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Wei Yan
Assignee: Wei Yan
 Fix For: 2.6.0

 Attachments: YARN-2635-1.patch, YARN-2635-2.patch, yarn-2635-3.patch, 
 yarn-2635-4.patch


 If we change the scheduler from the Capacity Scheduler to the Fair Scheduler, 
 TestRMRestart fails.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2766) ApplicationHistoryManager is expected to return a sorted list of apps/attempts/containers

2014-11-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14212196#comment-14212196
 ] 

Hudson commented on YARN-2766:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #743 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/743/])
YARN-2766. Made ApplicationHistoryManager return a sorted list of apps, 
attempts and containers. Contributed by Robert Kanter. (zjshen: rev 
3648cb57c9f018a3a339c26f5a0ca2779485521a)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryClientService.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerOnTimelineStore.java


  ApplicationHistoryManager is expected to return a sorted list of 
 apps/attempts/containers
 --

 Key: YARN-2766
 URL: https://issues.apache.org/jira/browse/YARN-2766
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: 2.6.0
Reporter: Robert Kanter
Assignee: Robert Kanter
 Fix For: 2.7.0

 Attachments: YARN-2766.patch, YARN-2766.patch, YARN-2766.patch, 
 YARN-2766.patch


 {{TestApplicationHistoryClientService.testContainers}} and 
 {{TestApplicationHistoryClientService.testApplicationAttempts}} both fail 
 because the test assertions are assuming a returned Collection is in a 
 certain order.  The collection comes from a HashMap, so the order is not 
 guaranteed, plus, according to [this 
 page|http://docs.oracle.com/javase/8/docs/technotes/guides/collections/changes8.html],
  there are situations where the iteration order of a HashMap will be 
 different between Java 7 and 8.
 We should fix the test code to not assume a specific ordering.
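
For context, a small self-contained Java sketch of an order-insensitive check (illustrative names only; not the actual YARN test code): since the values coming out of a HashMap have no guaranteed iteration order, sort a copy of the returned collection before comparing it to the expected list.
{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class OrderInsensitiveCheck {
  public static void main(String[] args) {
    // A HashMap's values() view has no guaranteed iteration order.
    Map<Integer, String> containers = new HashMap<>();
    containers.put(2, "container_2");
    containers.put(1, "container_1");
    containers.put(3, "container_3");

    Collection<String> returned = containers.values();

    // Sort a copy instead of asserting on the raw iteration order.
    List<String> actual = new ArrayList<>(returned);
    Collections.sort(actual);
    List<String> expected = Arrays.asList("container_1", "container_2", "container_3");

    if (!actual.equals(expected)) {
      throw new AssertionError("unexpected containers: " + actual);
    }
    System.out.println("order-insensitive check passed");
  }
}
{code}
The committed change instead made ApplicationHistoryManager itself return sorted lists, which keeps the callers simple; the sketch above only illustrates the underlying ordering problem.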



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2846) Incorrect persist exit code for running containers in reacquireContainer() that interrupted by NodeManager restart.

2014-11-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14212185#comment-14212185
 ] 

Hudson commented on YARN-2846:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #743 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/743/])
YARN-2846. Incorrect persist exit code for running containers in 
reacquireContainer() that interrupted by NodeManager restart. Contributed by 
Junping Du (jlowe: rev 33ea5ae92b9dd3abace104903d9a94d17dd75af5)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/ContainerExecutor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/RecoveredContainerLaunch.java
* hadoop-yarn-project/CHANGES.txt


 Incorrect persist exit code for running containers in reacquireContainer() 
 that interrupted by NodeManager restart.
 ---

 Key: YARN-2846
 URL: https://issues.apache.org/jira/browse/YARN-2846
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Junping Du
Assignee: Junping Du
Priority: Blocker
 Fix For: 2.6.0

 Attachments: YARN-2846-demo.patch, YARN-2846.patch


 The NM restart work-preserving feature can cause a running AM container to be 
 marked LOST and killed while the NM daemon is being stopped. The exception is 
 shown below:
 {code}
 2014-11-11 00:48:35,214 INFO  monitor.ContainersMonitorImpl 
 (ContainersMonitorImpl.java:run(408)) - Memory usage of ProcessTree 22140 for 
 container-id container_1415666714233_0001_01_84: 53.8 MB of 512 MB 
 physical memory used; 931.3 MB of 1.0 GB virtual memory used
 2014-11-11 00:48:35,223 ERROR nodemanager.NodeManager 
 (SignalLogger.java:handle(60)) - RECEIVED SIGNAL 15: SIGTERM
 2014-11-11 00:48:35,299 INFO  mortbay.log (Slf4jLog.java:info(67)) - Stopped 
 HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:50060
 2014-11-11 00:48:35,337 INFO  containermanager.ContainerManagerImpl 
 (ContainerManagerImpl.java:cleanUpApplicationsOnNMShutDown(512)) - 
 Applications still running : [application_1415666714233_0001]
 2014-11-11 00:48:35,338 INFO  ipc.Server (Server.java:stop(2437)) - Stopping 
 server on 45454
 2014-11-11 00:48:35,344 INFO  ipc.Server (Server.java:run(706)) - Stopping 
 IPC Server listener on 45454
 2014-11-11 00:48:35,346 INFO  logaggregation.LogAggregationService 
 (LogAggregationService.java:serviceStop(141)) - 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService
  waiting for pending aggregation during exit
 2014-11-11 00:48:35,347 INFO  ipc.Server (Server.java:run(832)) - Stopping 
 IPC Server Responder
 2014-11-11 00:48:35,347 INFO  logaggregation.AppLogAggregatorImpl 
 (AppLogAggregatorImpl.java:abortLogAggregation(502)) - Aborting log 
 aggregation for application_1415666714233_0001
 2014-11-11 00:48:35,348 WARN  logaggregation.AppLogAggregatorImpl 
 (AppLogAggregatorImpl.java:run(382)) - Aggregation did not complete for 
 application application_1415666714233_0001
 2014-11-11 00:48:35,358 WARN  monitor.ContainersMonitorImpl 
 (ContainersMonitorImpl.java:run(476)) - 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl
  is interrupted. Exiting.
 2014-11-11 00:48:35,406 ERROR launcher.RecoveredContainerLaunch 
 (RecoveredContainerLaunch.java:call(87)) - Unable to recover container 
 container_1415666714233_0001_01_01
 java.io.IOException: Interrupted while waiting for process 20001 to exit
 at 
 org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor.reacquireContainer(ContainerExecutor.java:180)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:82)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:46)
 at java.util.concurrent.FutureTask.run(FutureTask.java:262)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)
 Caused by: java.lang.InterruptedException: sleep interrupted
 at java.lang.Thread.sleep(Native Method)
 at 
 org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor.reacquireContainer(ContainerExecutor.java:177)
  

[jira] [Commented] (YARN-2603) ApplicationConstants missing HADOOP_MAPRED_HOME

2014-11-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14212191#comment-14212191
 ] 

Hudson commented on YARN-2603:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #743 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/743/])
Revert YARN-2603. ApplicationConstants missing HADOOP_MAPRED_HOME (Ray Chiang 
via aw) (vinodkv: rev 4ae9780e6a05bfd6b93f1c871c22761ddd8b19cb)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationConstants.java
* hadoop-yarn-project/CHANGES.txt


 ApplicationConstants missing HADOOP_MAPRED_HOME
 ---

 Key: YARN-2603
 URL: https://issues.apache.org/jira/browse/YARN-2603
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Allen Wittenauer
Assignee: Ray Chiang
  Labels: newbie
 Attachments: YARN-2603-01.patch


 The Environment enum should have HADOOP_MAPRED_HOME listed as well.
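
As a rough illustration of what "listed in the Environment enum" means (a standalone Java sketch with assumed names, not the actual ApplicationConstants source), an enum constant here essentially just names the environment variable so launch code can refer to it symbolically:
{code}
// Standalone sketch of the enum-constant pattern; hypothetical class name.
public class EnvSketch {

  enum Environment {
    HADOOP_COMMON_HOME("HADOOP_COMMON_HOME"),
    HADOOP_HDFS_HOME("HADOOP_HDFS_HOME"),
    HADOOP_YARN_HOME("HADOOP_YARN_HOME"),
    // The constant YARN-2603 asks for:
    HADOOP_MAPRED_HOME("HADOOP_MAPRED_HOME");

    private final String variable;

    Environment(String variable) {
      this.variable = variable;
    }

    /** The raw environment variable name. */
    String key() {
      return variable;
    }
  }

  public static void main(String[] args) {
    System.out.println(Environment.HADOOP_MAPRED_HOME.key());
  }
}
{code}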



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2856) Application recovery throw InvalidStateTransitonException: Invalid event: ATTEMPT_KILLED at ACCEPTED

2014-11-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14212186#comment-14212186
 ] 

Hudson commented on YARN-2856:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #743 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/743/])
YARN-2856. Fixed RMAppImpl to handle ATTEMPT_KILLED event at ACCEPTED state on 
app recovery. Contributed by Rohith Sharmaks (jianhe: rev 
d005404ef7211fe96ce1801ed267a249568540fd)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java


 Application recovery throw InvalidStateTransitonException: Invalid event: 
 ATTEMPT_KILLED at ACCEPTED
 

 Key: YARN-2856
 URL: https://issues.apache.org/jira/browse/YARN-2856
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Rohith
Assignee: Rohith
Priority: Critical
 Fix For: 2.7.0

 Attachments: YARN-2856.1.patch, YARN-2856.patch


 It is observed that recovering an application whose attempt ended in the KILLED 
 final state throws the exception below, and the application remains in the 
 ACCEPTED state forever.
 {code}
 2014-11-12 02:34:10,602 | ERROR | AsyncDispatcher event handler | Can't 
 handle this event at current state | 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:673)
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 ATTEMPT_KILLED at ACCEPTED
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:671)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:90)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:730)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:714)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
   at java.lang.Thread.run(Thread.java:745)
 {code}
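
To make the failure mode concrete, here is a small self-contained Java sketch (assumed names only; not RMAppImpl and not the committed patch) of a transition table that rejects an unregistered (state, event) pair, and how registering ACCEPTED + ATTEMPT_KILLED -> KILLED would let recovery replay the attempt's final state:
{code}
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only -- hypothetical class, not Hadoop source.
public class RecoveryTransitionSketch {

  enum State { ACCEPTED, RUNNING, KILLED }
  enum Event { ATTEMPT_REGISTERED, ATTEMPT_KILLED }

  // (state, event) -> next state; any missing pair is an "invalid transition".
  static final Map<String, State> TABLE = new HashMap<>();
  static {
    TABLE.put(key(State.ACCEPTED, Event.ATTEMPT_REGISTERED), State.RUNNING);
    // Without the next line, replaying ATTEMPT_KILLED while the recovered app
    // is still ACCEPTED fails, mirroring the InvalidStateTransitonException above.
    TABLE.put(key(State.ACCEPTED, Event.ATTEMPT_KILLED), State.KILLED);
  }

  static String key(State s, Event e) { return s + "/" + e; }

  static State transition(State current, Event event) {
    State next = TABLE.get(key(current, event));
    if (next == null) {
      throw new IllegalStateException("Invalid event: " + event + " at " + current);
    }
    return next;
  }

  public static void main(String[] args) {
    // Recovery replays the attempt's KILLED final state while the app is ACCEPTED.
    System.out.println(transition(State.ACCEPTED, Event.ATTEMPT_KILLED)); // KILLED
  }
}
{code}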



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2863) ResourceManager will shutdown when job's queuename is empty

2014-11-14 Thread yangping wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yangping wu updated YARN-2863:
--
Description: 
When I submit a job to the hadoop cluster but don't specify a queuename, as follows:
{code}
 $HADOOP_HOMEhadoop jar statistics.jar com.iteblog.Sts 
-Dmapreduce.job.queuename=   
 {code}
 and if *yarn.scheduler.fair.allow-undeclared-pools* is not overridden by the 
user (the default is true), then QueueManager will call the createLeafQueue 
method to create the queue because mapreduce.job.queuename is empty. But this 
will throw a MetricsException:
{code}
2014-11-14 16:07:57,358 FATAL 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
handling event type APP_ADDED to the scheduler
org.apache.hadoop.metrics2.MetricsException: Metrics source 
QueueMetrics,q0=root already exists!
at 
org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:126)
at 
org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:107)
at 
org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:217)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:94)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueue.init(FSQueue.java:57)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.init(FSLeafQueue.java:57)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.createLeafQueue(QueueManager.java:191)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.getLeafQueue(QueueManager.java:136)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.assignToQueue(FairScheduler.java:652)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplication(FairScheduler.java:610)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1015)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
at java.lang.Thread.run(Thread.java:744)
2014-11-14 16:07:57,359 INFO 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
{code}

  was:
When I submit a job to the hadoop cluster but don't specify a queuename, as follows:
{code}
 $HADOOP_HOMEhadoop jar statistics.jar com.wyp.Sts -Dmapreduce.job.queuename= 
  
 {code}
 and if *yarn.scheduler.fair.allow-undeclared-pools* is not overridden by the 
user (the default is true), then QueueManager will call the createLeafQueue 
method to create the queue because mapreduce.job.queuename is empty. But this 
will throw a MetricsException:
{code}
2014-11-14 16:07:57,358 FATAL 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
handling event type APP_ADDED to the scheduler
org.apache.hadoop.metrics2.MetricsException: Metrics source 
QueueMetrics,q0=root already exists!
at 
org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:126)
at 
org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:107)
at 
org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:217)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:94)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueue.init(FSQueue.java:57)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.init(FSLeafQueue.java:57)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.createLeafQueue(QueueManager.java:191)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.getLeafQueue(QueueManager.java:136)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.assignToQueue(FairScheduler.java:652)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplication(FairScheduler.java:610)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1015)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
at java.lang.Thread.run(Thread.java:744)
2014-11-14 16:07:57,359 INFO 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
{code}



[jira] [Updated] (YARN-2863) ResourceManager will shutdown when job's queuename is empty

2014-11-14 Thread yangping wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yangping wu updated YARN-2863:
--
Summary: ResourceManager will shutdown when job's queuename is empty  (was: 
ResourceManager will shutdown when job's queue is empty)

 ResourceManager will shutdown when job's queuename is empty
 ---

 Key: YARN-2863
 URL: https://issues.apache.org/jira/browse/YARN-2863
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.2.0
Reporter: yangping wu
   Original Estimate: 8h
  Remaining Estimate: 8h

 When I submit a job to the hadoop cluster but don't specify a queuename, as 
 follows:
 {code}
  $HADOOP_HOMEhadoop jar statistics.jar com.wyp.Sts 
 -Dmapreduce.job.queuename=   
  {code}
  and if *yarn.scheduler.fair.allow-undeclared-pools* is not overridden by the 
 user (the default is true), then QueueManager will call the createLeafQueue 
 method to create the queue because mapreduce.job.queuename is empty. But this 
 will throw a MetricsException:
 {code}
 2014-11-14 16:07:57,358 FATAL 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
 handling event type APP_ADDED to the scheduler
 org.apache.hadoop.metrics2.MetricsException: Metrics source 
 QueueMetrics,q0=root already exists!
 at 
 org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:126)
 at 
 org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:107)
 at 
 org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:217)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:94)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueue.init(FSQueue.java:57)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.init(FSLeafQueue.java:57)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.createLeafQueue(QueueManager.java:191)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.getLeafQueue(QueueManager.java:136)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.assignToQueue(FairScheduler.java:652)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplication(FairScheduler.java:610)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1015)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
 at java.lang.Thread.run(Thread.java:744)
 2014-11-14 16:07:57,359 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
 {code}
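
One possible guard against this, shown as an illustrative Java sketch (assumed names and default value; not FairScheduler code or an accepted patch), is to normalize an empty mapreduce.job.queuename to the default queue before the scheduler ever tries to create a leaf queue for it:
{code}
// Illustrative sketch only -- hypothetical class, not Hadoop source.
public class QueueNameGuard {

  // Assumed fallback queue name for the sketch.
  static final String DEFAULT_QUEUE = "default";

  /** Fall back to the default queue when the submitted queue name is empty. */
  static String resolveQueueName(String requested) {
    if (requested == null || requested.trim().isEmpty()) {
      return DEFAULT_QUEUE;
    }
    return requested.trim();
  }

  public static void main(String[] args) {
    // A job submitted with -Dmapreduce.job.queuename= arrives with an empty name.
    System.out.println(resolveQueueName(""));           // default
    System.out.println(resolveQueueName("analytics"));  // analytics
  }
}
{code}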



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2863) ResourceManager will shutdown when job's queuename is empty

2014-11-14 Thread yangping wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yangping wu updated YARN-2863:
--
Description: 
When I submit a job to the hadoop cluster but don't specify a queuename, as follows:
{code}
 $HADOOP_HOME/bin/hadoop jar statistics.jar com.iteblog.Sts 
-Dmapreduce.job.queuename=   
 {code}
 and if *yarn.scheduler.fair.allow-undeclared-pools* is not overridden by the 
user (the default is true), then QueueManager will call the createLeafQueue 
method to create the queue because mapreduce.job.queuename is empty. But this 
will throw a MetricsException:
{code}
2014-11-14 16:07:57,358 FATAL 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
handling event type APP_ADDED to the scheduler
org.apache.hadoop.metrics2.MetricsException: Metrics source 
QueueMetrics,q0=root already exists!
at 
org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:126)
at 
org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:107)
at 
org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:217)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:94)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueue.init(FSQueue.java:57)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.init(FSLeafQueue.java:57)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.createLeafQueue(QueueManager.java:191)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.getLeafQueue(QueueManager.java:136)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.assignToQueue(FairScheduler.java:652)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplication(FairScheduler.java:610)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1015)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
at java.lang.Thread.run(Thread.java:744)
2014-11-14 16:07:57,359 INFO 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
{code}

  was:
When I submit a job to the hadoop cluster but don't specify a queuename, as follows:
{code}
 $HADOOP_HOMEhadoop jar statistics.jar com.iteblog.Sts 
-Dmapreduce.job.queuename=   
 {code}
 and if *yarn.scheduler.fair.allow-undeclared-pools* is not overridden by the 
user (the default is true), then QueueManager will call the createLeafQueue 
method to create the queue because mapreduce.job.queuename is empty. But this 
will throw a MetricsException:
{code}
2014-11-14 16:07:57,358 FATAL 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
handling event type APP_ADDED to the scheduler
org.apache.hadoop.metrics2.MetricsException: Metrics source 
QueueMetrics,q0=root already exists!
at 
org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:126)
at 
org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:107)
at 
org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:217)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueueMetrics.forQueue(FSQueueMetrics.java:94)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSQueue.init(FSQueue.java:57)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.init(FSLeafQueue.java:57)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.createLeafQueue(QueueManager.java:191)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.getLeafQueue(QueueManager.java:136)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.assignToQueue(FairScheduler.java:652)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplication(FairScheduler.java:610)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1015)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
at java.lang.Thread.run(Thread.java:744)
2014-11-14 16:07:57,359 INFO 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
{code}


 

[jira] [Commented] (YARN-2635) TestRM, TestRMRestart, TestClientToAMTokens should run with both CS and FS

2014-11-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14212279#comment-14212279
 ] 

Hudson commented on YARN-2635:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1933 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1933/])
YARN-2635. Merging to branch-2.6 for hadoop-2.6.0-rc1. (acmurthy: rev 
81dc0ac6dcf2f34ad607da815ea0144f178691a9)
* hadoop-yarn-project/CHANGES.txt


 TestRM, TestRMRestart, TestClientToAMTokens should run with both CS and FS
 --

 Key: YARN-2635
 URL: https://issues.apache.org/jira/browse/YARN-2635
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Wei Yan
Assignee: Wei Yan
 Fix For: 2.6.0

 Attachments: YARN-2635-1.patch, YARN-2635-2.patch, yarn-2635-3.patch, 
 yarn-2635-4.patch


 If we change the scheduler from Capacity Scheduler to Fair Scheduler, the 
 TestRMRestart would fail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2603) ApplicationConstants missing HADOOP_MAPRED_HOME

2014-11-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14212283#comment-14212283
 ] 

Hudson commented on YARN-2603:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1933 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1933/])
Revert YARN-2603. ApplicationConstants missing HADOOP_MAPRED_HOME (Ray Chiang 
via aw) (vinodkv: rev 4ae9780e6a05bfd6b93f1c871c22761ddd8b19cb)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationConstants.java
* hadoop-yarn-project/CHANGES.txt


 ApplicationConstants missing HADOOP_MAPRED_HOME
 ---

 Key: YARN-2603
 URL: https://issues.apache.org/jira/browse/YARN-2603
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Allen Wittenauer
Assignee: Ray Chiang
  Labels: newbie
 Attachments: YARN-2603-01.patch


 The Environment enum should have HADOOP_MAPRED_HOME listed as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2856) Application recovery throw InvalidStateTransitonException: Invalid event: ATTEMPT_KILLED at ACCEPTED

2014-11-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14212278#comment-14212278
 ] 

Hudson commented on YARN-2856:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1933 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1933/])
YARN-2856. Fixed RMAppImpl to handle ATTEMPT_KILLED event at ACCEPTED state on 
app recovery. Contributed by Rohith Sharmaks (jianhe: rev 
d005404ef7211fe96ce1801ed267a249568540fd)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* hadoop-yarn-project/CHANGES.txt


 Application recovery throw InvalidStateTransitonException: Invalid event: 
 ATTEMPT_KILLED at ACCEPTED
 

 Key: YARN-2856
 URL: https://issues.apache.org/jira/browse/YARN-2856
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Rohith
Assignee: Rohith
Priority: Critical
 Fix For: 2.7.0

 Attachments: YARN-2856.1.patch, YARN-2856.patch


 It is observed that recovering an application whose attempt ended in the KILLED 
 final state throws the exception below, and the application remains in the 
 ACCEPTED state forever.
 {code}
 2014-11-12 02:34:10,602 | ERROR | AsyncDispatcher event handler | Can't 
 handle this event at current state | 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:673)
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 ATTEMPT_KILLED at ACCEPTED
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:671)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:90)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:730)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:714)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
   at java.lang.Thread.run(Thread.java:745)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2846) Incorrect persist exit code for running containers in reacquireContainer() that interrupted by NodeManager restart.

2014-11-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14212277#comment-14212277
 ] 

Hudson commented on YARN-2846:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1933 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1933/])
YARN-2846. Incorrect persist exit code for running containers in 
reacquireContainer() that interrupted by NodeManager restart. Contributed by 
Junping Du (jlowe: rev 33ea5ae92b9dd3abace104903d9a94d17dd75af5)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/ContainerExecutor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/RecoveredContainerLaunch.java


 Incorrect persist exit code for running containers in reacquireContainer() 
 that interrupted by NodeManager restart.
 ---

 Key: YARN-2846
 URL: https://issues.apache.org/jira/browse/YARN-2846
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Junping Du
Assignee: Junping Du
Priority: Blocker
 Fix For: 2.6.0

 Attachments: YARN-2846-demo.patch, YARN-2846.patch


 The NM restart work-preserving feature can cause a running AM container to be 
 marked LOST and killed while the NM daemon is being stopped. The exception is 
 shown below:
 {code}
 2014-11-11 00:48:35,214 INFO  monitor.ContainersMonitorImpl 
 (ContainersMonitorImpl.java:run(408)) - Memory usage of ProcessTree 22140 for 
 container-id container_1415666714233_0001_01_84: 53.8 MB of 512 MB 
 physical memory used; 931.3 MB of 1.0 GB virtual memory used
 2014-11-11 00:48:35,223 ERROR nodemanager.NodeManager 
 (SignalLogger.java:handle(60)) - RECEIVED SIGNAL 15: SIGTERM
 2014-11-11 00:48:35,299 INFO  mortbay.log (Slf4jLog.java:info(67)) - Stopped 
 HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:50060
 2014-11-11 00:48:35,337 INFO  containermanager.ContainerManagerImpl 
 (ContainerManagerImpl.java:cleanUpApplicationsOnNMShutDown(512)) - 
 Applications still running : [application_1415666714233_0001]
 2014-11-11 00:48:35,338 INFO  ipc.Server (Server.java:stop(2437)) - Stopping 
 server on 45454
 2014-11-11 00:48:35,344 INFO  ipc.Server (Server.java:run(706)) - Stopping 
 IPC Server listener on 45454
 2014-11-11 00:48:35,346 INFO  logaggregation.LogAggregationService 
 (LogAggregationService.java:serviceStop(141)) - 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService
  waiting for pending aggregation during exit
 2014-11-11 00:48:35,347 INFO  ipc.Server (Server.java:run(832)) - Stopping 
 IPC Server Responder
 2014-11-11 00:48:35,347 INFO  logaggregation.AppLogAggregatorImpl 
 (AppLogAggregatorImpl.java:abortLogAggregation(502)) - Aborting log 
 aggregation for application_1415666714233_0001
 2014-11-11 00:48:35,348 WARN  logaggregation.AppLogAggregatorImpl 
 (AppLogAggregatorImpl.java:run(382)) - Aggregation did not complete for 
 application application_1415666714233_0001
 2014-11-11 00:48:35,358 WARN  monitor.ContainersMonitorImpl 
 (ContainersMonitorImpl.java:run(476)) - 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl
  is interrupted. Exiting.
 2014-11-11 00:48:35,406 ERROR launcher.RecoveredContainerLaunch 
 (RecoveredContainerLaunch.java:call(87)) - Unable to recover container 
 container_1415666714233_0001_01_01
 java.io.IOException: Interrupted while waiting for process 20001 to exit
 at 
 org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor.reacquireContainer(ContainerExecutor.java:180)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:82)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:46)
 at java.util.concurrent.FutureTask.run(FutureTask.java:262)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)
 Caused by: java.lang.InterruptedException: sleep interrupted
 at java.lang.Thread.sleep(Native Method)
 at 
 org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor.reacquireContainer(ContainerExecutor.java:177)

[jira] [Commented] (YARN-2635) TestRM, TestRMRestart, TestClientToAMTokens should run with both CS and FS

2014-11-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14212292#comment-14212292
 ] 

Hudson commented on YARN-2635:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #5 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/5/])
YARN-2635. Merging to branch-2.6 for hadoop-2.6.0-rc1. (acmurthy: rev 
81dc0ac6dcf2f34ad607da815ea0144f178691a9)
* hadoop-yarn-project/CHANGES.txt


 TestRM, TestRMRestart, TestClientToAMTokens should run with both CS and FS
 --

 Key: YARN-2635
 URL: https://issues.apache.org/jira/browse/YARN-2635
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Wei Yan
Assignee: Wei Yan
 Fix For: 2.6.0

 Attachments: YARN-2635-1.patch, YARN-2635-2.patch, yarn-2635-3.patch, 
 yarn-2635-4.patch


 If we change the scheduler from Capacity Scheduler to Fair Scheduler, the 
 TestRMRestart would fail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2853) Killing app may hang while AM is unregistering

2014-11-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14212300#comment-14212300
 ] 

Hudson commented on YARN-2853:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #5 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/5/])
YARN-2853. Fixed a bug in ResourceManager causing apps to hang when the user 
kill request races with ApplicationMaster finish. Contributed by Jian He. 
(vinodkv: rev 3651fe1b089851b38be351c00a9899817166bf3e)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRM.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
YARN-2853. Merging to branch-2.6 for hadoop-2.6.0-rc1. (acmurthy: rev 
d648e60ebab7f1942dba92e9cd2cb62b8d70419b)
* hadoop-yarn-project/CHANGES.txt


 Killing app may hang while AM is unregistering
 --

 Key: YARN-2853
 URL: https://issues.apache.org/jira/browse/YARN-2853
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Jian He
 Fix For: 2.6.0

 Attachments: YARN-2853.1.patch, YARN-2853.1.patch, YARN-2853.2.patch, 
 YARN-2853.3.patch


 When an app is killed, it first moves to the KILLING state. If the RMAppAttempt 
 receives the attempt_unregister event before the attempt_kill event, it ignores 
 the later attempt_kill event. Hence, the RMApp cannot move to the KILLED state 
 and stays in KILLING forever.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2856) Application recovery throw InvalidStateTransitonException: Invalid event: ATTEMPT_KILLED at ACCEPTED

2014-11-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14212291#comment-14212291
 ] 

Hudson commented on YARN-2856:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #5 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/5/])
YARN-2856. Fixed RMAppImpl to handle ATTEMPT_KILLED event at ACCEPTED state on 
app recovery. Contributed by Rohith Sharmaks (jianhe: rev 
d005404ef7211fe96ce1801ed267a249568540fd)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java


 Application recovery throw InvalidStateTransitonException: Invalid event: 
 ATTEMPT_KILLED at ACCEPTED
 

 Key: YARN-2856
 URL: https://issues.apache.org/jira/browse/YARN-2856
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Rohith
Assignee: Rohith
Priority: Critical
 Fix For: 2.7.0

 Attachments: YARN-2856.1.patch, YARN-2856.patch


 It is observed that recovering an application whose attempt ended in the KILLED 
 final state throws the exception below, and the application remains in the 
 ACCEPTED state forever.
 {code}
 2014-11-12 02:34:10,602 | ERROR | AsyncDispatcher event handler | Can't 
 handle this event at current state | 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:673)
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 ATTEMPT_KILLED at ACCEPTED
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:671)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:90)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:730)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:714)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
   at java.lang.Thread.run(Thread.java:745)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2766) ApplicationHistoryManager is expected to return a sorted list of apps/attempts/containers

2014-11-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14212301#comment-14212301
 ] 

Hudson commented on YARN-2766:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #5 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/5/])
YARN-2766. Made ApplicationHistoryManager return a sorted list of apps, 
attempts and containers. Contributed by Robert Kanter. (zjshen: rev 
3648cb57c9f018a3a339c26f5a0ca2779485521a)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryClientService.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerOnTimelineStore.java


  ApplicationHistoryManager is expected to return a sorted list of 
 apps/attempts/containers
 --

 Key: YARN-2766
 URL: https://issues.apache.org/jira/browse/YARN-2766
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: 2.6.0
Reporter: Robert Kanter
Assignee: Robert Kanter
 Fix For: 2.7.0

 Attachments: YARN-2766.patch, YARN-2766.patch, YARN-2766.patch, 
 YARN-2766.patch


 {{TestApplicationHistoryClientService.testContainers}} and 
 {{TestApplicationHistoryClientService.testApplicationAttempts}} both fail 
 because the test assertions are assuming a returned Collection is in a 
 certain order.  The collection comes from a HashMap, so the order is not 
 guaranteed, plus, according to [this 
 page|http://docs.oracle.com/javase/8/docs/technotes/guides/collections/changes8.html],
  there are situations where the iteration order of a HashMap will be 
 different between Java 7 and 8.
 We should fix the test code to not assume a specific ordering.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2846) Incorrect persist exit code for running containers in reacquireContainer() that interrupted by NodeManager restart.

2014-11-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14212353#comment-14212353
 ] 

Hudson commented on YARN-2846:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1957 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1957/])
YARN-2846. Incorrect persist exit code for running containers in 
reacquireContainer() that interrupted by NodeManager restart. Contributed by 
Junping Du (jlowe: rev 33ea5ae92b9dd3abace104903d9a94d17dd75af5)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/ContainerExecutor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/RecoveredContainerLaunch.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java


 Incorrect persist exit code for running containers in reacquireContainer() 
 that interrupted by NodeManager restart.
 ---

 Key: YARN-2846
 URL: https://issues.apache.org/jira/browse/YARN-2846
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Junping Du
Assignee: Junping Du
Priority: Blocker
 Fix For: 2.6.0

 Attachments: YARN-2846-demo.patch, YARN-2846.patch


 The NM restart work-preserving feature can cause a running AM container to be 
 marked LOST and killed while the NM daemon is being stopped. The exception is 
 shown below:
 {code}
 2014-11-11 00:48:35,214 INFO  monitor.ContainersMonitorImpl 
 (ContainersMonitorImpl.java:run(408)) - Memory usage of ProcessTree 22140 for 
 container-id container_1415666714233_0001_01_84: 53.8 MB of 512 MB 
 physical memory used; 931.3 MB of 1.0 GB virtual memory used
 2014-11-11 00:48:35,223 ERROR nodemanager.NodeManager 
 (SignalLogger.java:handle(60)) - RECEIVED SIGNAL 15: SIGTERM
 2014-11-11 00:48:35,299 INFO  mortbay.log (Slf4jLog.java:info(67)) - Stopped 
 HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:50060
 2014-11-11 00:48:35,337 INFO  containermanager.ContainerManagerImpl 
 (ContainerManagerImpl.java:cleanUpApplicationsOnNMShutDown(512)) - 
 Applications still running : [application_1415666714233_0001]
 2014-11-11 00:48:35,338 INFO  ipc.Server (Server.java:stop(2437)) - Stopping 
 server on 45454
 2014-11-11 00:48:35,344 INFO  ipc.Server (Server.java:run(706)) - Stopping 
 IPC Server listener on 45454
 2014-11-11 00:48:35,346 INFO  logaggregation.LogAggregationService 
 (LogAggregationService.java:serviceStop(141)) - 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService
  waiting for pending aggregation during exit
 2014-11-11 00:48:35,347 INFO  ipc.Server (Server.java:run(832)) - Stopping 
 IPC Server Responder
 2014-11-11 00:48:35,347 INFO  logaggregation.AppLogAggregatorImpl 
 (AppLogAggregatorImpl.java:abortLogAggregation(502)) - Aborting log 
 aggregation for application_1415666714233_0001
 2014-11-11 00:48:35,348 WARN  logaggregation.AppLogAggregatorImpl 
 (AppLogAggregatorImpl.java:run(382)) - Aggregation did not complete for 
 application application_1415666714233_0001
 2014-11-11 00:48:35,358 WARN  monitor.ContainersMonitorImpl 
 (ContainersMonitorImpl.java:run(476)) - 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl
  is interrupted. Exiting.
 2014-11-11 00:48:35,406 ERROR launcher.RecoveredContainerLaunch 
 (RecoveredContainerLaunch.java:call(87)) - Unable to recover container 
 container_1415666714233_0001_01_01
 java.io.IOException: Interrupted while waiting for process 20001 to exit
 at 
 org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor.reacquireContainer(ContainerExecutor.java:180)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:82)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:46)
 at java.util.concurrent.FutureTask.run(FutureTask.java:262)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)
 Caused by: java.lang.InterruptedException: sleep interrupted
 at java.lang.Thread.sleep(Native Method)
 at 
 

[jira] [Commented] (YARN-2766) ApplicationHistoryManager is expected to return a sorted list of apps/attempts/containers

2014-11-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14212365#comment-14212365
 ] 

Hudson commented on YARN-2766:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1957 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1957/])
YARN-2766. Made ApplicationHistoryManager return a sorted list of apps, 
attempts and containers. Contributed by Robert Kanter. (zjshen: rev 
3648cb57c9f018a3a339c26f5a0ca2779485521a)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryClientService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerOnTimelineStore.java


  ApplicationHistoryManager is expected to return a sorted list of 
 apps/attempts/containers
 --

 Key: YARN-2766
 URL: https://issues.apache.org/jira/browse/YARN-2766
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: 2.6.0
Reporter: Robert Kanter
Assignee: Robert Kanter
 Fix For: 2.7.0

 Attachments: YARN-2766.patch, YARN-2766.patch, YARN-2766.patch, 
 YARN-2766.patch


 {{TestApplicationHistoryClientService.testContainers}} and 
 {{TestApplicationHistoryClientService.testApplicationAttempts}} both fail 
 because the test assertions are assuming a returned Collection is in a 
 certain order.  The collection comes from a HashMap, so the order is not 
 guaranteed, plus, according to [this 
 page|http://docs.oracle.com/javase/8/docs/technotes/guides/collections/changes8.html],
  there are situations where the iteration order of a HashMap will be 
 different between Java 7 and 8.
 We should fix the test code to not assume a specific ordering.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2856) Application recovery throw InvalidStateTransitonException: Invalid event: ATTEMPT_KILLED at ACCEPTED

2014-11-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14212354#comment-14212354
 ] 

Hudson commented on YARN-2856:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1957 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1957/])
YARN-2856. Fixed RMAppImpl to handle ATTEMPT_KILLED event at ACCEPTED state on 
app recovery. Contributed by Rohith Sharmaks (jianhe: rev 
d005404ef7211fe96ce1801ed267a249568540fd)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java


 Application recovery throw InvalidStateTransitonException: Invalid event: 
 ATTEMPT_KILLED at ACCEPTED
 

 Key: YARN-2856
 URL: https://issues.apache.org/jira/browse/YARN-2856
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Rohith
Assignee: Rohith
Priority: Critical
 Fix For: 2.7.0

 Attachments: YARN-2856.1.patch, YARN-2856.patch


 It is observed that recovering an application whose attempt ended in the KILLED 
 final state throws the exception below, and the application remains in the 
 ACCEPTED state forever.
 {code}
 2014-11-12 02:34:10,602 | ERROR | AsyncDispatcher event handler | Can't 
 handle this event at current state | 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:673)
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 ATTEMPT_KILLED at ACCEPTED
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:671)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:90)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:730)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:714)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
   at java.lang.Thread.run(Thread.java:745)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2603) ApplicationConstants missing HADOOP_MAPRED_HOME

2014-11-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14212360#comment-14212360
 ] 

Hudson commented on YARN-2603:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1957 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1957/])
Revert YARN-2603. ApplicationConstants missing HADOOP_MAPRED_HOME (Ray Chiang 
via aw) (vinodkv: rev 4ae9780e6a05bfd6b93f1c871c22761ddd8b19cb)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationConstants.java
* hadoop-yarn-project/CHANGES.txt


 ApplicationConstants missing HADOOP_MAPRED_HOME
 ---

 Key: YARN-2603
 URL: https://issues.apache.org/jira/browse/YARN-2603
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Allen Wittenauer
Assignee: Ray Chiang
  Labels: newbie
 Attachments: YARN-2603-01.patch


 The Environment enum should have HADOOP_MAPRED_HOME listed as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2853) Killing app may hang while AM is unregistering

2014-11-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14212364#comment-14212364
 ] 

Hudson commented on YARN-2853:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1957 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1957/])
YARN-2853. Fixed a bug in ResourceManager causing apps to hang when the user 
kill request races with ApplicationMaster finish. Contributed by Jian He. 
(vinodkv: rev 3651fe1b089851b38be351c00a9899817166bf3e)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRM.java
* hadoop-yarn-project/CHANGES.txt
YARN-2853. Merging to branch-2.6 for hadoop-2.6.0-rc1. (acmurthy: rev 
d648e60ebab7f1942dba92e9cd2cb62b8d70419b)
* hadoop-yarn-project/CHANGES.txt


 Killing app may hang while AM is unregistering
 --

 Key: YARN-2853
 URL: https://issues.apache.org/jira/browse/YARN-2853
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Jian He
 Fix For: 2.6.0

 Attachments: YARN-2853.1.patch, YARN-2853.1.patch, YARN-2853.2.patch, 
 YARN-2853.3.patch


 When an app is killed, it first moves to the KILLING state. If the RMAppAttempt 
 receives the attempt_unregister event before the attempt_kill event, it ignores 
 the later attempt_kill event. Hence, the RMApp cannot move to the KILLED state 
 and stays in KILLING forever.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2635) TestRM, TestRMRestart, TestClientToAMTokens should run with both CS and FS

2014-11-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14212356#comment-14212356
 ] 

Hudson commented on YARN-2635:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1957 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1957/])
YARN-2635. Merging to branch-2.6 for hadoop-2.6.0-rc1. (acmurthy: rev 
81dc0ac6dcf2f34ad607da815ea0144f178691a9)
* hadoop-yarn-project/CHANGES.txt


 TestRM, TestRMRestart, TestClientToAMTokens should run with both CS and FS
 --

 Key: YARN-2635
 URL: https://issues.apache.org/jira/browse/YARN-2635
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Wei Yan
Assignee: Wei Yan
 Fix For: 2.6.0

 Attachments: YARN-2635-1.patch, YARN-2635-2.patch, yarn-2635-3.patch, 
 yarn-2635-4.patch


 If we change the scheduler from Capacity Scheduler to Fair Scheduler, the 
 TestRMRestart would fail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2766) ApplicationHistoryManager is expected to return a sorted list of apps/attempts/containers

2014-11-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14212390#comment-14212390
 ] 

Hudson commented on YARN-2766:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #5 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/5/])
YARN-2766. Made ApplicationHistoryManager return a sorted list of apps, 
attempts and containers. Contributed by Robert Kanter. (zjshen: rev 
3648cb57c9f018a3a339c26f5a0ca2779485521a)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerOnTimelineStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryClientService.java


  ApplicationHistoryManager is expected to return a sorted list of 
 apps/attempts/containers
 --

 Key: YARN-2766
 URL: https://issues.apache.org/jira/browse/YARN-2766
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: 2.6.0
Reporter: Robert Kanter
Assignee: Robert Kanter
 Fix For: 2.7.0

 Attachments: YARN-2766.patch, YARN-2766.patch, YARN-2766.patch, 
 YARN-2766.patch


 {{TestApplicationHistoryClientService.testContainers}} and 
 {{TestApplicationHistoryClientService.testApplicationAttempts}} both fail 
 because the test assertions are assuming a returned Collection is in a 
 certain order.  The collection comes from a HashMap, so the order is not 
 guaranteed, plus, according to [this 
 page|http://docs.oracle.com/javase/8/docs/technotes/guides/collections/changes8.html],
  there are situations where the iteration order of a HashMap will be 
 different between Java 7 and 8.
 We should fix the test code to not assume a specific ordering.
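 Below is a minimal sketch of the order-independent style of assertion argued for here, 
 using plain JUnit and sets; the IDs and helper name are made up for illustration only.
 {code}
 import static org.junit.Assert.assertEquals;

 import java.util.Arrays;
 import java.util.HashSet;
 import java.util.List;
 import java.util.Set;

 public class OrderIndependentAssertionSketch {

   // Compare expected and actual IDs as sets so the assertion does not depend on
   // the iteration order of the underlying HashMap.
   static void assertSameIds(List<String> expectedIds, List<String> actualIds) {
     Set<String> expected = new HashSet<>(expectedIds);
     Set<String> actual = new HashSet<>(actualIds);
     assertEquals("returned ids differ", expected, actual);
   }

   public static void main(String[] args) {
     // Passes no matter which order the store returned the containers in.
     assertSameIds(
         Arrays.asList("container_1", "container_2"),
         Arrays.asList("container_2", "container_1"));
   }
 }
 {code}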



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2853) Killing app may hang while AM is unregistering

2014-11-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14212389#comment-14212389
 ] 

Hudson commented on YARN-2853:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #5 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/5/])
YARN-2853. Fixed a bug in ResourceManager causing apps to hang when the user 
kill request races with ApplicationMaster finish. Contributed by Jian He. 
(vinodkv: rev 3651fe1b089851b38be351c00a9899817166bf3e)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRM.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/TestRMAppTransitions.java
YARN-2853. Merging to branch-2.6 for hadoop-2.6.0-rc1. (acmurthy: rev 
d648e60ebab7f1942dba92e9cd2cb62b8d70419b)
* hadoop-yarn-project/CHANGES.txt


 Killing app may hang while AM is unregistering
 --

 Key: YARN-2853
 URL: https://issues.apache.org/jira/browse/YARN-2853
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Jian He
 Fix For: 2.6.0

 Attachments: YARN-2853.1.patch, YARN-2853.1.patch, YARN-2853.2.patch, 
 YARN-2853.3.patch


 When killing an app, the app first moves to the KILLING state. If RMAppAttempt 
 receives the attempt_unregister event before the attempt_kill event, it will 
 ignore the later attempt_kill event. Hence, RMApp won't be able to move to the 
 KILLED state and stays in the KILLING state forever.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2603) ApplicationConstants missing HADOOP_MAPRED_HOME

2014-11-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14212385#comment-14212385
 ] 

Hudson commented on YARN-2603:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #5 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/5/])
Revert YARN-2603. ApplicationConstants missing HADOOP_MAPRED_HOME (Ray Chiang 
via aw) (vinodkv: rev 4ae9780e6a05bfd6b93f1c871c22761ddd8b19cb)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationConstants.java


 ApplicationConstants missing HADOOP_MAPRED_HOME
 ---

 Key: YARN-2603
 URL: https://issues.apache.org/jira/browse/YARN-2603
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Allen Wittenauer
Assignee: Ray Chiang
  Labels: newbie
 Attachments: YARN-2603-01.patch


 The Environment enum should have HADOOP_MAPRED_HOME listed as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2635) TestRM, TestRMRestart, TestClientToAMTokens should run with both CS and FS

2014-11-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14212381#comment-14212381
 ] 

Hudson commented on YARN-2635:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #5 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/5/])
YARN-2635. Merging to branch-2.6 for hadoop-2.6.0-rc1. (acmurthy: rev 
81dc0ac6dcf2f34ad607da815ea0144f178691a9)
* hadoop-yarn-project/CHANGES.txt


 TestRM, TestRMRestart, TestClientToAMTokens should run with both CS and FS
 --

 Key: YARN-2635
 URL: https://issues.apache.org/jira/browse/YARN-2635
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Wei Yan
Assignee: Wei Yan
 Fix For: 2.6.0

 Attachments: YARN-2635-1.patch, YARN-2635-2.patch, yarn-2635-3.patch, 
 yarn-2635-4.patch


 If we change the scheduler from the Capacity Scheduler to the Fair Scheduler, 
 TestRMRestart would fail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2846) Incorrect persist exit code for running containers in reacquireContainer() that interrupted by NodeManager restart.

2014-11-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14212378#comment-14212378
 ] 

Hudson commented on YARN-2846:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #5 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/5/])
YARN-2846. Incorrect persist exit code for running containers in 
reacquireContainer() that interrupted by NodeManager restart. Contributed by 
Junping Du (jlowe: rev 33ea5ae92b9dd3abace104903d9a94d17dd75af5)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/RecoveredContainerLaunch.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/ContainerExecutor.java


 Incorrect persist exit code for running containers in reacquireContainer() 
 that interrupted by NodeManager restart.
 ---

 Key: YARN-2846
 URL: https://issues.apache.org/jira/browse/YARN-2846
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Junping Du
Assignee: Junping Du
Priority: Blocker
 Fix For: 2.6.0

 Attachments: YARN-2846-demo.patch, YARN-2846.patch


 The NM restart work-preserving feature could cause a running AM container to get 
 marked LOST and killed while the NM daemon is being stopped. The exception is like below:
 {code}
 2014-11-11 00:48:35,214 INFO  monitor.ContainersMonitorImpl 
 (ContainersMonitorImpl.java:run(408)) - Memory usage of ProcessTree 22140 for 
 container-id container_1415666714233_0001_01_84: 53.8 MB of 512 MB 
 physical memory used; 931.3 MB of 1.0 GB virtual memory used
 2014-11-11 00:48:35,223 ERROR nodemanager.NodeManager 
 (SignalLogger.java:handle(60)) - RECEIVED SIGNAL 15: SIGTERM
 2014-11-11 00:48:35,299 INFO  mortbay.log (Slf4jLog.java:info(67)) - Stopped 
 HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:50060
 2014-11-11 00:48:35,337 INFO  containermanager.ContainerManagerImpl 
 (ContainerManagerImpl.java:cleanUpApplicationsOnNMShutDown(512)) - 
 Applications still running : [application_1415666714233_0001]
 2014-11-11 00:48:35,338 INFO  ipc.Server (Server.java:stop(2437)) - Stopping 
 server on 45454
 2014-11-11 00:48:35,344 INFO  ipc.Server (Server.java:run(706)) - Stopping 
 IPC Server listener on 45454
 2014-11-11 00:48:35,346 INFO  logaggregation.LogAggregationService 
 (LogAggregationService.java:serviceStop(141)) - 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService
  waiting for pending aggregation during exit
 2014-11-11 00:48:35,347 INFO  ipc.Server (Server.java:run(832)) - Stopping 
 IPC Server Responder
 2014-11-11 00:48:35,347 INFO  logaggregation.AppLogAggregatorImpl 
 (AppLogAggregatorImpl.java:abortLogAggregation(502)) - Aborting log 
 aggregation for application_1415666714233_0001
 2014-11-11 00:48:35,348 WARN  logaggregation.AppLogAggregatorImpl 
 (AppLogAggregatorImpl.java:run(382)) - Aggregation did not complete for 
 application application_1415666714233_0001
 2014-11-11 00:48:35,358 WARN  monitor.ContainersMonitorImpl 
 (ContainersMonitorImpl.java:run(476)) - 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl
  is interrupted. Exiting.
 2014-11-11 00:48:35,406 ERROR launcher.RecoveredContainerLaunch 
 (RecoveredContainerLaunch.java:call(87)) - Unable to recover container 
 container_1415666714233_0001_01_01
 java.io.IOException: Interrupted while waiting for process 20001 to exit
 at 
 org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor.reacquireContainer(ContainerExecutor.java:180)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:82)
 at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:46)
 at java.util.concurrent.FutureTask.run(FutureTask.java:262)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)
 Caused by: java.lang.InterruptedException: sleep interrupted
 at java.lang.Thread.sleep(Native Method)
 at 
 

[jira] [Created] (YARN-2864) TestRMWebServicesAppsModification fails in trunk

2014-11-14 Thread Ted Yu (JIRA)
Ted Yu created YARN-2864:


 Summary: TestRMWebServicesAppsModification fails in trunk
 Key: YARN-2864
 URL: https://issues.apache.org/jira/browse/YARN-2864
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Ted Yu
Priority: Minor


From https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/5/console :
{code}
Tests run: 32, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 151.14 sec <<< FAILURE! - in 
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
testGetNewApplicationAndSubmit[1](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification)
  Time elapsed: 0.276 sec  <<< ERROR!
java.lang.NoClassDefFoundError: org/apache/hadoop/io/FastByteComparisons
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
at 
org.apache.hadoop.io.WritableComparator.compareBytes(WritableComparator.java:187)
at 
org.apache.hadoop.io.BinaryComparable.compareTo(BinaryComparable.java:50)
at 
org.apache.hadoop.io.BinaryComparable.equals(BinaryComparable.java:72)
at org.apache.hadoop.io.Text.equals(Text.java:348)
at java.util.ArrayList.indexOf(ArrayList.java:216)
at java.util.ArrayList.contains(ArrayList.java:199)
at 
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testAppSubmit(TestRMWebServicesAppsModification.java:844)
at 
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testGetNewApplicationAndSubmit(TestRMWebServicesAppsModification.java:726)

testGetNewApplicationAndSubmit[3](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification)
  Time elapsed: 0.225 sec  <<< ERROR!
java.lang.NoClassDefFoundError: org/apache/hadoop/io/FastByteComparisons
at 
org.apache.hadoop.io.WritableComparator.compareBytes(WritableComparator.java:187)
at 
org.apache.hadoop.io.BinaryComparable.compareTo(BinaryComparable.java:50)
at 
org.apache.hadoop.io.BinaryComparable.equals(BinaryComparable.java:72)
at org.apache.hadoop.io.Text.equals(Text.java:348)
at java.util.ArrayList.indexOf(ArrayList.java:216)
at java.util.ArrayList.contains(ArrayList.java:199)
at 
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testAppSubmit(TestRMWebServicesAppsModification.java:844)
at 
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testGetNewApplicationAndSubmit(TestRMWebServicesAppsModification.java:726)
{code}
Running on MacBook, I got (with Java 1.7.0_60):
{code}
Running 
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
Tests run: 32, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 146.749 sec <<< FAILURE! - in 
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
testGetNewApplicationAndSubmit[1](org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification)
  Time elapsed: 0.185 sec  <<< FAILURE!
java.lang.AssertionError: expected:<Accepted> but was:<Internal Server Error>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:144)
at 
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testAppSubmit(TestRMWebServicesAppsModification.java:799)
at 
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification.testGetNewApplicationAndSubmit(TestRMWebServicesAppsModification.java:726)
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2745) YARN new pluggable scheduler which does multi-resource packing

2014-11-14 Thread Srikanth Kandula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14212586#comment-14212586
 ] 

Srikanth Kandula commented on YARN-2745:


Thanks Karthik, that is an interesting thought. It seems that several of the 
proposed work items (resource estimation, expanded asks, modifications to task 
matching on NM heartbeat) have to happen regardless of whether this is a new 
scheduler or a flag atop existing ones like FairScheduler. Do you foresee any 
additional complications in building this as a flag as opposed to a stand-alone 
scheduler? We will take this offline.

 YARN new pluggable scheduler which does multi-resource packing
 --

 Key: YARN-2745
 URL: https://issues.apache.org/jira/browse/YARN-2745
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager, scheduler
Reporter: Robert Grandl
 Attachments: sigcomm_14_tetris_talk.pptx, tetris_paper.pdf


 In this umbrella JIRA we propose a new pluggable scheduler which accounts 
 for all resources used by a task (CPU, memory, disk, network) and is able 
 to balance three competing objectives: fairness, improved cluster utilization, 
 and reduced average job completion time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2165) Timelineserver should validate that yarn.timeline-service.ttl-ms is greater than zero

2014-11-14 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14212595#comment-14212595
 ] 

Zhijie Shen commented on YARN-2165:
---

bq. should the check be (<= 0) instead of (< 0)? Since 0 ttl and ttlInterval 
have no real meanings.

Agree.

To be more general, it's better to do the sanity check for all the numeric 
configurations while initializing the timeline server, making sure a valid 
number has been set. Here's the current list.

{code}
  <property>
    <description>Time to live for timeline store data in milliseconds.</description>
    <name>yarn.timeline-service.ttl-ms</name>
    <value>604800000</value>
  </property>

  <property>
    <description>Length of time to wait between deletion cycles of leveldb
    timeline store in milliseconds.</description>
    <name>yarn.timeline-service.leveldb-timeline-store.ttl-interval-ms</name>
    <value>300000</value>
  </property>

  <property>
    <description>Size of read cache for uncompressed blocks for leveldb
    timeline store in bytes.</description>
    <name>yarn.timeline-service.leveldb-timeline-store.read-cache-size</name>
    <value>104857600</value>
  </property>

  <property>
    <description>Size of cache for recently read entity start times for leveldb
    timeline store in number of entities.</description>
    <name>yarn.timeline-service.leveldb-timeline-store.start-time-read-cache-size</name>
    <value>10000</value>
  </property>

  <property>
    <description>Size of cache for recently written entity start times for
    leveldb timeline store in number of entities.</description>
    <name>yarn.timeline-service.leveldb-timeline-store.start-time-write-cache-size</name>
    <value>10000</value>
  </property>

  <property>
    <description>Handler thread count to serve the client RPC requests.</description>
    <name>yarn.timeline-service.handler-thread-count</name>
    <value>10</value>
  </property>

  <property>
    <description>
    Default maximum number of retries for the timeline service client.
    </description>
    <name>yarn.timeline-service.client.max-retries</name>
    <value>30</value>
  </property>

  <property>
    <description>
    Default retry time interval for the timeline service client.
    </description>
    <name>yarn.timeline-service.client.retry-interval-ms</name>
    <value>1000</value>
  </property>
{code}
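Below is a minimal sketch of the one-pass check suggested above, using plain Configuration 
lookups with the property names from this list; the helper name checkPositive is made up 
for illustration and is not existing timeline server code.
{code}
import org.apache.hadoop.conf.Configuration;

public class TimelineConfigSanityCheck {

  // Hypothetical helper: fail fast if any of the given numeric settings is not
  // strictly positive (a zero or negative TTL/interval has no real meaning).
  static void checkPositive(Configuration conf, String... keys) {
    for (String key : keys) {
      long value = conf.getLong(key, 1L); // unset keys fall back to a harmless default here
      if (value <= 0) {
        throw new IllegalArgumentException(
            key + " should be greater than zero, but was " + value);
      }
    }
  }

  public static void main(String[] args) {
    Configuration conf = new Configuration();
    checkPositive(conf,
        "yarn.timeline-service.ttl-ms",
        "yarn.timeline-service.leveldb-timeline-store.ttl-interval-ms",
        "yarn.timeline-service.leveldb-timeline-store.read-cache-size",
        "yarn.timeline-service.handler-thread-count");
  }
}
{code}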

 Timelineserver should validate that yarn.timeline-service.ttl-ms is greater 
 than zero
 -

 Key: YARN-2165
 URL: https://issues.apache.org/jira/browse/YARN-2165
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Karam Singh
 Attachments: YARN-2165.patch


 Timelineserver should validate that yarn.timeline-service.ttl-ms is greater 
 than zero
 Currently, if we set yarn.timeline-service.ttl-ms=0 
 or yarn.timeline-service.ttl-ms=-86400, 
 the timeline server starts successfully without complaining:
 {code}
 2014-06-15 14:52:16,562 INFO  timeline.LeveldbTimelineStore 
 (LeveldbTimelineStore.java:init(247)) - Starting deletion thread with ttl 
 -60480 and cycle interval 30
 {code}
 At startup, the timeline server should validate that yarn.timeline-service.ttl-ms > 0; 
 otherwise, especially for a negative value, the discard-old-entities timestamp will be set 
 to a future value, which may lead to inconsistent behavior.
 {code}
 public void run() {
   while (true) {
 long timestamp = System.currentTimeMillis() - ttl;
 try {
   discardOldEntities(timestamp);
   Thread.sleep(ttlInterval);
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2166) Timelineserver should validate that yarn.timeline-service.leveldb-timeline-store.ttl-interval-ms is greater than zero when level db is for timeline store

2014-11-14 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14212600#comment-14212600
 ] 

Zhijie Shen commented on YARN-2166:
---

See the comments on 
[YARN-2165|https://issues.apache.org/jira/browse/YARN-2165?focusedCommentId=14212595page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14212595].
 How about having one pass to do a sanity check for all the numeric configs?

 Timelineserver should validate that 
 yarn.timeline-service.leveldb-timeline-store.ttl-interval-ms is greater than 
 zero when level db is for timeline store
 -

 Key: YARN-2166
 URL: https://issues.apache.org/jira/browse/YARN-2166
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Karam Singh

 Timelineserver should validate that 
 yarn.timeline-service.leveldb-timeline-store.ttl-interval-ms is greater than 
 zero when level db is for timeline store
 Otherwise, if we start the timeline server with 
 yarn.timeline-service.leveldb-timeline-store.ttl-interval-ms=-5000, 
 the timeline server starts, but the Thread.sleep call in EntityDeletionThread.run keeps 
 throwing an uncaught exception because of the negative value:
 {code}
 2014-06-16 10:22:03,537 ERROR yarn.YarnUncaughtExceptionHandler 
 (YarnUncaughtExceptionHandler.java:uncaughtException(68)) - Thread 
 Thread[Thread-4,5,main] threw an Exception.
 java.lang.IllegalArgumentException: timeout value is negative
 at java.lang.Thread.sleep(Native Method)
 at 
 org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore$EntityDeletionThread.run(LeveldbTimelineStore.java:257)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2811) Fair Scheduler is violating max memory settings in 2.4

2014-11-14 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated YARN-2811:
--
Attachment: YARN-2811.v8.patch

 Fair Scheduler is violating max memory settings in 2.4
 --

 Key: YARN-2811
 URL: https://issues.apache.org/jira/browse/YARN-2811
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Siqi Li
Assignee: Siqi Li
 Attachments: YARN-2811.v1.patch, YARN-2811.v2.patch, 
 YARN-2811.v3.patch, YARN-2811.v4.patch, YARN-2811.v5.patch, 
 YARN-2811.v6.patch, YARN-2811.v7.patch, YARN-2811.v8.patch


 This has been seen on several queues showing the allocated MB going 
 significantly above the max MB and it appears to have started with the 2.4 
 upgrade. It could be a regression bug from 2.0 to 2.4



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2862) RM might not start if the machine was hard shutdown and FileSystemRMStateStore was used

2014-11-14 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14212640#comment-14212640
 ] 

Ming Ma commented on YARN-2862:
---

Here are some possible ways to fix it.

1) Fix RMAppManager's recoverApplication to ignore any unrecoverable app.
2) Fix the RawLocalFileSystem used by FileSystemRMStateStore to force-sync data to 
the disk device.
3) Fix FileSystemRMStateStore to skip apps with a null ApplicationState#context.

Sounds like #3 is the best given the usage scenario of FileSystemRMStateStore. 
Also, the RM should expect each implementation of RMStateStore#loadState to load 
valid ApplicationState into RMState.

Thoughts?
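For illustration, a rough sketch of what option 3 could look like, written with simplified 
stand-in types rather than the real FileSystemRMStateStore code:
{code}
import java.util.HashMap;
import java.util.Map;

// Stand-in types for illustration only; not the actual RM state store classes.
public class SkipUnrecoverableAppsSketch {

  static class StoredApp {
    final String appId;
    final Object submissionContext; // null when the on-disk file was truncated to zero bytes
    StoredApp(String appId, Object submissionContext) {
      this.appId = appId;
      this.submissionContext = submissionContext;
    }
  }

  // Keep only the apps whose context was read back successfully; skip the rest with a
  // warning instead of failing the whole RM recovery with an NPE.
  static Map<String, StoredApp> loadRecoverableApps(Iterable<StoredApp> storedApps) {
    Map<String, StoredApp> recovered = new HashMap<String, StoredApp>();
    for (StoredApp app : storedApps) {
      if (app == null || app.submissionContext == null) {
        System.err.println("Skipping unrecoverable application state"
            + (app == null ? "" : " for " + app.appId));
        continue;
      }
      recovered.put(app.appId, app);
    }
    return recovered;
  }
}
{code}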

 RM might not start if the machine was hard shutdown and 
 FileSystemRMStateStore was used
 ---

 Key: YARN-2862
 URL: https://issues.apache.org/jira/browse/YARN-2862
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Ming Ma

 This might be a known issue. Given FileSystemRMStateStore isn't used for the HA 
 scenario, it might not be that important, unless there is something we need 
 to fix at the RM layer to make it more tolerant to RMStateStore issues.
 When the RM was hard shutdown, the OS might not get a chance to persist blocks. Some 
 of the stored application data ends up with size zero after reboot, and the RM 
 didn't like that.
 {noformat}
 ls -al 
 /var/log/hadoop/rmstore/FSRMStateRoot/RMAppRoot/application_1412702189634_324351
 total 156
 drwxr-xr-x.2 x y   4096 Nov 13 16:45 .
 drwxr-xr-x. 1524 x y 151552 Nov 13 16:45 ..
 -rw-r--r--.1 x y  0 Nov 13 16:45 
 appattempt_1412702189634_324351_01
 -rw-r--r--.1 x y  0 Nov 13 16:45 
 .appattempt_1412702189634_324351_01.crc
 -rw-r--r--.1 x y  0 Nov 13 16:45 application_1412702189634_324351
 -rw-r--r--.1 x y  0 Nov 13 16:45 .application_1412702189634_324351.crc
 {noformat}
 When RM starts up
 {noformat}
 2014-11-13 16:55:25,844 WARN org.apache.hadoop.fs.FSInputChecker: Problem 
 opening checksum file: 
 file:/var/log/hadoop/rmstore/FSRMStateRoot/RMAppRoot/application_1412702189634_324351/application_1412702189634_324351.
   Ignoring exception:
 java.io.EOFException
 at java.io.DataInputStream.readFully(DataInputStream.java:197)
 at java.io.DataInputStream.readFully(DataInputStream.java:169)
 at 
 org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.init(ChecksumFileSystem.java:146)
 at 
 org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:339)
 at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:792)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.readFile(FileSystemRMStateStore.java:501)
 ...
 2014-11-13 17:40:48,876 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Failed to 
 load/recover state
 java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ApplicationState.getAppId(RMStateStore.java:184)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:306)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:425)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1027)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:484)
 at 
 org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:834)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2862) RM might not start if the machine was hard shutdown and FileSystemRMStateStore was used

2014-11-14 Thread Gera Shegalov (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14212671#comment-14212671
 ] 

Gera Shegalov commented on YARN-2862:
-

[~mingma], It's potentially already fixed by YARN-2010. We can try it for our 
scenario.

 RM might not start if the machine was hard shutdown and 
 FileSystemRMStateStore was used
 ---

 Key: YARN-2862
 URL: https://issues.apache.org/jira/browse/YARN-2862
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Ming Ma

 This might be a known issue. Given FileSystemRMStateStore isn't used for the HA 
 scenario, it might not be that important, unless there is something we need 
 to fix at the RM layer to make it more tolerant to RMStateStore issues.
 When the RM was hard shutdown, the OS might not get a chance to persist blocks. Some 
 of the stored application data ends up with size zero after reboot, and the RM 
 didn't like that.
 {noformat}
 ls -al 
 /var/log/hadoop/rmstore/FSRMStateRoot/RMAppRoot/application_1412702189634_324351
 total 156
 drwxr-xr-x.2 x y   4096 Nov 13 16:45 .
 drwxr-xr-x. 1524 x y 151552 Nov 13 16:45 ..
 -rw-r--r--.1 x y  0 Nov 13 16:45 
 appattempt_1412702189634_324351_01
 -rw-r--r--.1 x y  0 Nov 13 16:45 
 .appattempt_1412702189634_324351_01.crc
 -rw-r--r--.1 x y  0 Nov 13 16:45 application_1412702189634_324351
 -rw-r--r--.1 x y  0 Nov 13 16:45 .application_1412702189634_324351.crc
 {noformat}
 When RM starts up
 {noformat}
 2014-11-13 16:55:25,844 WARN org.apache.hadoop.fs.FSInputChecker: Problem 
 opening checksum file: 
 file:/var/log/hadoop/rmstore/FSRMStateRoot/RMAppRoot/application_1412702189634_324351/application_1412702189634_324351.
   Ignoring exception:
 java.io.EOFException
 at java.io.DataInputStream.readFully(DataInputStream.java:197)
 at java.io.DataInputStream.readFully(DataInputStream.java:169)
 at 
 org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.init(ChecksumFileSystem.java:146)
 at 
 org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:339)
 at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:792)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.readFile(FileSystemRMStateStore.java:501)
 ...
 2014-11-13 17:40:48,876 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Failed to 
 load/recover state
 java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ApplicationState.getAppId(RMStateStore.java:184)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:306)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:425)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1027)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:484)
 at 
 org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:834)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2604) Scheduler should consider max-allocation-* in conjunction with the largest node

2014-11-14 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14212672#comment-14212672
 ] 

Karthik Kambatla commented on YARN-2604:


+1 to what Jason said. Reusing the configs introduced in YARN-2001 sounds like the 
right way to me too. 

 Scheduler should consider max-allocation-* in conjunction with the largest 
 node
 ---

 Key: YARN-2604
 URL: https://issues.apache.org/jira/browse/YARN-2604
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Affects Versions: 2.5.1
Reporter: Karthik Kambatla
Assignee: Robert Kanter
 Attachments: YARN-2604.patch, YARN-2604.patch, YARN-2604.patch


 If the scheduler max-allocation-* values are larger than the resources 
 available on the largest node in the cluster, an application requesting 
 resources between the two values will be accepted by the scheduler but the 
 requests will never be satisfied. The app essentially hangs forever. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2811) Fair Scheduler is violating max memory settings in 2.4

2014-11-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14212769#comment-14212769
 ] 

Hadoop QA commented on YARN-2811:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12681587/YARN-2811.v8.patch
  against trunk revision 1a1dcce.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5844//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5844//console

This message is automatically generated.

 Fair Scheduler is violating max memory settings in 2.4
 --

 Key: YARN-2811
 URL: https://issues.apache.org/jira/browse/YARN-2811
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Siqi Li
Assignee: Siqi Li
 Attachments: YARN-2811.v1.patch, YARN-2811.v2.patch, 
 YARN-2811.v3.patch, YARN-2811.v4.patch, YARN-2811.v5.patch, 
 YARN-2811.v6.patch, YARN-2811.v7.patch, YARN-2811.v8.patch


 This has been seen on several queues showing the allocated MB going 
 significantly above the max MB and it appears to have started with the 2.4 
 upgrade. It could be a regression bug from 2.0 to 2.4



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-1530) [Umbrella] Store, manage and serve per-framework application-timeline data

2014-11-14 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai reassigned YARN-1530:
---

Assignee: Mit Desai

 [Umbrella] Store, manage and serve per-framework application-timeline data
 --

 Key: YARN-1530
 URL: https://issues.apache.org/jira/browse/YARN-1530
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
Assignee: Mit Desai
 Attachments: ATS-Write-Pipeline-Design-Proposal.pdf, 
 ATS-meet-up-8-28-2014-notes.pdf, application timeline design-20140108.pdf, 
 application timeline design-20140116.pdf, application timeline 
 design-20140130.pdf, application timeline design-20140210.pdf


 This is a sibling JIRA for YARN-321.
 Today, each application/framework has to store and serve per-framework 
 data all by itself, as YARN doesn't have a common solution. This JIRA attempts 
 to solve the storage, management and serving of per-framework data from 
 various applications, both running and finished. The aim is to change YARN to 
 collect and store data in a generic manner, with plugin points for frameworks 
 to do their own thing w.r.t. interpretation and serving.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1530) [Umbrella] Store, manage and serve per-framework application-timeline data

2014-11-14 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated YARN-1530:

Assignee: (was: Mit Desai)

 [Umbrella] Store, manage and serve per-framework application-timeline data
 --

 Key: YARN-1530
 URL: https://issues.apache.org/jira/browse/YARN-1530
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
 Attachments: ATS-Write-Pipeline-Design-Proposal.pdf, 
 ATS-meet-up-8-28-2014-notes.pdf, application timeline design-20140108.pdf, 
 application timeline design-20140116.pdf, application timeline 
 design-20140130.pdf, application timeline design-20140210.pdf


 This is a sibling JIRA for YARN-321.
 Today, each application/framework has to store and serve per-framework 
 data all by itself, as YARN doesn't have a common solution. This JIRA attempts 
 to solve the storage, management and serving of per-framework data from 
 various applications, both running and finished. The aim is to change YARN to 
 collect and store data in a generic manner, with plugin points for frameworks 
 to do their own thing w.r.t. interpretation and serving.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-2375) Allow enabling/disabling timeline server per framework

2014-11-14 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai reassigned YARN-2375:
---

Assignee: Mit Desai

 Allow enabling/disabling timeline server per framework
 --

 Key: YARN-2375
 URL: https://issues.apache.org/jira/browse/YARN-2375
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jonathan Eagles
Assignee: Mit Desai





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2588) Standby RM does not transitionToActive if previous transitionToActive is failed with ZK exception.

2014-11-14 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14212775#comment-14212775
 ] 

Rohith commented on YARN-2588:
--

There is another similar hidden issue after this patch. Should I raise another 
JIRA or provide an add-on patch to this JIRA?

 Standby RM does not transitionToActive if previous transitionToActive is 
 failed with ZK exception.
 --

 Key: YARN-2588
 URL: https://issues.apache.org/jira/browse/YARN-2588
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 3.0.0, 2.6.0, 2.5.1
Reporter: Rohith
Assignee: Rohith
 Fix For: 2.6.0

 Attachments: YARN-2588.1.patch, YARN-2588.2.patch, YARN-2588.patch


 Consider a scenario where the standby RM fails to transition to Active because 
 of a ZK exception (connectionLoss or SessionExpired). Then any further 
 transition to Active for the same RM does not move the RM to the Active state.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2588) Standby RM does not transitionToActive if previous transitionToActive is failed with ZK exception.

2014-11-14 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14212777#comment-14212777
 ] 

Karthik Kambatla commented on YARN-2588:


Let us do it on another JIRA, given this is already committed to 2.6.0.

 Standby RM does not transitionToActive if previous transitionToActive is 
 failed with ZK exception.
 --

 Key: YARN-2588
 URL: https://issues.apache.org/jira/browse/YARN-2588
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 3.0.0, 2.6.0, 2.5.1
Reporter: Rohith
Assignee: Rohith
 Fix For: 2.6.0

 Attachments: YARN-2588.1.patch, YARN-2588.2.patch, YARN-2588.patch


 Consider a scenario where the standby RM fails to transition to Active because 
 of a ZK exception (connectionLoss or SessionExpired). Then any further 
 transition to Active for the same RM does not move the RM to the Active state.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2857) ConcurrentModificationException in ContainerLogAppender

2014-11-14 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14212792#comment-14212792
 ] 

Jason Lowe commented on YARN-2857:
--

+1 lgtm.  Committing this.

 ConcurrentModificationException in ContainerLogAppender
 ---

 Key: YARN-2857
 URL: https://issues.apache.org/jira/browse/YARN-2857
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Mohammad Kamrul Islam
Assignee: Mohammad Kamrul Islam
Priority: Critical
 Attachments: ContainerLogAppender.java, MAPREDUCE-6139-test.01.patch, 
 MAPREDUCE-6139.1.patch, MAPREDUCE-6139.2.patch, MAPREDUCE-6139.3.patch, 
 YARN-2857.3.patch


 Context:
 * Hadoop-2.3.0
 * Using Oozie 4.0.1
 * Pig version 0.11.x
 The job is submitted by Oozie to launch Pig script.
 The following exception traces were found on MR task log:
 In syslog:
 {noformat}
 2014-10-24 20:37:29,317 WARN [Thread-5] 
 org.apache.hadoop.util.ShutdownHookManager: ShutdownHook '' failed, 
 java.util.ConcurrentModificationException
 java.util.ConcurrentModificationException
   at 
 java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:966)
   at java.util.LinkedList$ListItr.next(LinkedList.java:888)
   at 
 org.apache.hadoop.yarn.ContainerLogAppender.close(ContainerLogAppender.java:94)
   at 
 org.apache.log4j.helpers.AppenderAttachableImpl.removeAllAppenders(AppenderAttachableImpl.java:141)
   at org.apache.log4j.Category.removeAllAppenders(Category.java:891)
   at org.apache.log4j.Hierarchy.shutdown(Hierarchy.java:471)
   at org.apache.log4j.LogManager.shutdown(LogManager.java:267)
   at org.apache.hadoop.mapred.TaskLog.syncLogsShutdown(TaskLog.java:286)
   at org.apache.hadoop.mapred.TaskLog$2.run(TaskLog.java:339)
   at 
 org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
 2014-10-24 20:37:29,395 INFO [main] 
 org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping MapTask metrics 
 system...
 {noformat}
 in stderr:
 {noformat}
 java.util.ConcurrentModificationException
   at 
 java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:966)
   at java.util.LinkedList$ListItr.next(LinkedList.java:888)
   at 
 org.apache.hadoop.yarn.ContainerLogAppender.close(ContainerLogAppender.java:94)
   at 
 org.apache.log4j.helpers.AppenderAttachableImpl.removeAllAppenders(AppenderAttachableImpl.java:141)
   at org.apache.log4j.Category.removeAllAppenders(Category.java:891)
   at 
 org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:759)
   at 
 org.apache.log4j.PropertyConfigurator.configureRootCategory(PropertyConfigurator.java:648)
   at 
 org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:514)
   at 
 org.apache.log4j.PropertyConfigurator.configure(PropertyConfigurator.java:440)
   at org.apache.pig.Main.configureLog4J(Main.java:740)
   at org.apache.pig.Main.run(Main.java:384)
   at org.apache.pig.PigRunner.run(PigRunner.java:49)
   at org.apache.oozie.action.hadoop.PigMain.runPigJob(PigMain.java:283)
   at org.apache.oozie.action.hadoop.PigMain.run(PigMain.java:223)
   at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:37)
   at org.apache.oozie.action.hadoop.PigMain.main(PigMain.java:76)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:483)
   at 
 org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:226)
   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:422)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
 {noformat}
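 A minimal sketch of one general way to avoid this kind of race in an appender's close(): 
 copy the buffered events under a lock and iterate over the copy. This is only an 
 illustration of the pattern, not the patch attached to this issue.
 {code}
 import java.util.ArrayList;
 import java.util.LinkedList;
 import java.util.List;
 import java.util.Queue;

 // Illustration only: iterate over a private snapshot so a concurrent append or
 // removeAllAppenders cannot invalidate the iterator and throw
 // ConcurrentModificationException.
 public class SnapshotCloseSketch {
   private final Queue<String> tail = new LinkedList<String>();

   public synchronized void append(String event) {
     tail.add(event);
   }

   public void close() {
     List<String> snapshot;
     synchronized (this) {
       snapshot = new ArrayList<String>(tail); // copy under the lock
       tail.clear();
     }
     for (String event : snapshot) {           // safe: iterating the copy
       System.out.println(event);              // stand-in for flushing the tail to the log file
     }
   }
 }
 {code}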



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2857) ConcurrentModificationException in ContainerLogAppender

2014-11-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14212810#comment-14212810
 ] 

Hudson commented on YARN-2857:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6545 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6545/])
YARN-2857. ConcurrentModificationException in ContainerLogAppender. Contributed 
by Mohammad Kamrul Islam (jlowe: rev f2fe8a800e5b0f3875931adba9ae20c6a95aa4ff)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ContainerLogAppender.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/TestContainerLogAppender.java


 ConcurrentModificationException in ContainerLogAppender
 ---

 Key: YARN-2857
 URL: https://issues.apache.org/jira/browse/YARN-2857
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Mohammad Kamrul Islam
Assignee: Mohammad Kamrul Islam
Priority: Critical
 Fix For: 2.7.0

 Attachments: ContainerLogAppender.java, MAPREDUCE-6139-test.01.patch, 
 MAPREDUCE-6139.1.patch, MAPREDUCE-6139.2.patch, MAPREDUCE-6139.3.patch, 
 YARN-2857.3.patch


 Context:
 * Hadoop-2.3.0
 * Using Oozie 4.0.1
 * Pig version 0.11.x
 The job is submitted by Oozie to launch Pig script.
 The following exception traces were found on MR task log:
 In syslog:
 {noformat}
 2014-10-24 20:37:29,317 WARN [Thread-5] 
 org.apache.hadoop.util.ShutdownHookManager: ShutdownHook '' failed, 
 java.util.ConcurrentModificationException
 java.util.ConcurrentModificationException
   at 
 java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:966)
   at java.util.LinkedList$ListItr.next(LinkedList.java:888)
   at 
 org.apache.hadoop.yarn.ContainerLogAppender.close(ContainerLogAppender.java:94)
   at 
 org.apache.log4j.helpers.AppenderAttachableImpl.removeAllAppenders(AppenderAttachableImpl.java:141)
   at org.apache.log4j.Category.removeAllAppenders(Category.java:891)
   at org.apache.log4j.Hierarchy.shutdown(Hierarchy.java:471)
   at org.apache.log4j.LogManager.shutdown(LogManager.java:267)
   at org.apache.hadoop.mapred.TaskLog.syncLogsShutdown(TaskLog.java:286)
   at org.apache.hadoop.mapred.TaskLog$2.run(TaskLog.java:339)
   at 
 org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
 2014-10-24 20:37:29,395 INFO [main] 
 org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping MapTask metrics 
 system...
 {noformat}
 in stderr:
 {noformat}
 java.util.ConcurrentModificationException
   at 
 java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:966)
   at java.util.LinkedList$ListItr.next(LinkedList.java:888)
   at 
 org.apache.hadoop.yarn.ContainerLogAppender.close(ContainerLogAppender.java:94)
   at 
 org.apache.log4j.helpers.AppenderAttachableImpl.removeAllAppenders(AppenderAttachableImpl.java:141)
   at org.apache.log4j.Category.removeAllAppenders(Category.java:891)
   at 
 org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:759)
   at 
 org.apache.log4j.PropertyConfigurator.configureRootCategory(PropertyConfigurator.java:648)
   at 
 org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:514)
   at 
 org.apache.log4j.PropertyConfigurator.configure(PropertyConfigurator.java:440)
   at org.apache.pig.Main.configureLog4J(Main.java:740)
   at org.apache.pig.Main.run(Main.java:384)
   at org.apache.pig.PigRunner.run(PigRunner.java:49)
   at org.apache.oozie.action.hadoop.PigMain.runPigJob(PigMain.java:283)
   at org.apache.oozie.action.hadoop.PigMain.run(PigMain.java:223)
   at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:37)
   at org.apache.oozie.action.hadoop.PigMain.main(PigMain.java:76)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:483)
   at 
 org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:226)
   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:422)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
   at 

[jira] [Updated] (YARN-2056) Disable preemption at Queue level

2014-11-14 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-2056:
-
Attachment: YARN-2056.201411142002.txt

[~leftnoteasy], Thank you for all of your help. Uploading new patch.

bq. Instead of multiply you should use multiplyAndNormalizeUp here.
Using {{multiplyAndNormalizeUp}} helps. However, for the use case in 
{{testHierarchicalLarge}}, the rounding is still different with the new 
algorithm (7 and 5 instead of 9 and 4).

bq. Actually I think we should consider minimum_allocation in preemption 
policy, we can address that in a separated JIRA. 
Would you please create a new JIRA and elaborate on this further?
{quote}
bq. {{testDisablePreemptionOverCapPlusPending}}
Since the result is not changed before/after we set preemption queue, I think 
it is unnecessary, I would suggest to take it out.
{quote}
I removed this test.
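For readers following the rounding discussion, here is a plain-arithmetic illustration of 
the "multiply and normalize up" idea: scale an amount, then round the result up to the next 
multiple of a step such as the minimum allocation. This only illustrates the arithmetic and 
is not the actual Resources API.
{code}
public class NormalizeUpSketch {

  // Scale memoryMb by the given factor, then round up to the next multiple of stepMb.
  static long multiplyAndNormalizeUp(long memoryMb, double by, long stepMb) {
    double scaled = memoryMb * by;
    return (long) (Math.ceil(scaled / stepMb) * stepMb);
  }

  public static void main(String[] args) {
    // 6000 MB * 0.75 = 4500 MB; rounded up to the next 1024 MB step this becomes 5120 MB,
    // whereas a plain multiply would leave 4500 MB.
    System.out.println(multiplyAndNormalizeUp(6000, 0.75, 1024));
  }
}
{code}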

 Disable preemption at Queue level
 -

 Key: YARN-2056
 URL: https://issues.apache.org/jira/browse/YARN-2056
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Mayank Bansal
Assignee: Eric Payne
 Attachments: YARN-2056.201408202039.txt, YARN-2056.201408260128.txt, 
 YARN-2056.201408310117.txt, YARN-2056.201409022208.txt, 
 YARN-2056.201409181916.txt, YARN-2056.201409210049.txt, 
 YARN-2056.201409232329.txt, YARN-2056.201409242210.txt, 
 YARN-2056.201410132225.txt, YARN-2056.201410141330.txt, 
 YARN-2056.201410232244.txt, YARN-2056.201410311746.txt, 
 YARN-2056.201411041635.txt, YARN-2056.201411072153.txt, 
 YARN-2056.201411122305.txt, YARN-2056.201411132215.txt, 
 YARN-2056.201411142002.txt


 We need to be able to disable preemption at individual queue level



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2865) Application recovery continuously fails with Application with id already present. Cannot duplicate

2014-11-14 Thread Rohith (JIRA)
Rohith created YARN-2865:


 Summary: Application recovery continuously fails with Application 
with id already present. Cannot duplicate
 Key: YARN-2865
 URL: https://issues.apache.org/jira/browse/YARN-2865
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Rohith
Assignee: Rohith


YARN-2588 handles the exception thrown while transitioning to active and resets 
activeServices. But it misses clearing the RMContext apps/nodes details and the 
ClusterMetrics and QueueMetrics. This causes application recovery to fail.
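A rough sketch of the kind of cleanup being described, with hypothetical helper names; 
only the RMContext#getRMApps and RMContext#getRMNodes accessors are real, the rest is 
illustrative and not the actual ResourceManager code.
{code}
// Hypothetical reset performed after a failed transition to active, so that a
// later recovery attempt does not see stale entries and report duplicates.
void resetRecoveryState() {
  rmContext.getRMApps().clear();   // drop partially recovered applications
  rmContext.getRMNodes().clear();  // drop partially registered nodes
  resetClusterMetrics();           // hypothetical: reset ClusterMetrics counters
  resetQueueMetrics();             // hypothetical: unregister and re-register QueueMetrics sources
}
{code}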



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2588) Standby RM does not transitionToActive if previous transitionToActive is failed with ZK exception.

2014-11-14 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14212835#comment-14212835
 ] 

Rohith commented on YARN-2588:
--

Thanks Karthik!! I have raised YARN-2865

 Standby RM does not transitionToActive if previous transitionToActive is 
 failed with ZK exception.
 --

 Key: YARN-2588
 URL: https://issues.apache.org/jira/browse/YARN-2588
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 3.0.0, 2.6.0, 2.5.1
Reporter: Rohith
Assignee: Rohith
 Fix For: 2.6.0

 Attachments: YARN-2588.1.patch, YARN-2588.2.patch, YARN-2588.patch


 Consider a scenario where the standby RM fails to transition to Active because 
 of a ZK exception (connectionLoss or SessionExpired). Then any further 
 transition to Active for the same RM does not move the RM to the Active state.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2816) NM fail to start with NPE during container recovery

2014-11-14 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14212838#comment-14212838
 ] 

Jason Lowe commented on YARN-2816:
--

+1 lgtm.  Committing this.

 NM fail to start with NPE during container recovery
 ---

 Key: YARN-2816
 URL: https://issues.apache.org/jira/browse/YARN-2816
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.5.0
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: YARN-2816.000.patch, YARN-2816.001.patch, 
 YARN-2816.002.patch, leveldb_records.txt


 NM fails to start with an NPE during container recovery.
 We saw the following crash happen:
 2014-10-30 22:22:37,211 INFO org.apache.hadoop.service.AbstractService: 
 Service 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl
  failed in state INITED; cause: java.lang.NullPointerException
 java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recoverContainer(ContainerManagerImpl.java:289)
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:252)
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:235)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
   at 
 org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:250)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:445)
   at 
 org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:492)
 The reason is that some DB files used by NMLeveldbStateStoreService were 
 accidentally deleted to save disk space at 
 /tmp/hadoop-yarn/yarn-nm-recovery/yarn-nm-state. This leaves some incomplete 
 container records which don't have a CONTAINER_REQUEST_KEY_SUFFIX (startRequest) 
 entry in the DB. When a container is recovered in 
 ContainerManagerImpl#recoverContainer, 
 the NullPointerException at the following code causes NM shutdown.
 {code}
 StartContainerRequest req = rcs.getStartRequest();
 ContainerLaunchContext launchContext = req.getContainerLaunchContext();
 {code}
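 A minimal sketch of the kind of guard that would avoid the NPE for such incomplete 
 records; this extends the snippet above for illustration only and is not the committed change.
  {code}
  // Illustration: skip recovery of a container whose state store record lost its
  // start request (e.g. because the leveldb files were deleted), instead of
  // letting the null surface as an NPE that shuts down the NM.
  StartContainerRequest req = rcs.getStartRequest();
  if (req == null || req.getContainerLaunchContext() == null) {
    LOG.warn("Skipping recovery of container with incomplete state store record");
    return;
  }
  ContainerLaunchContext launchContext = req.getContainerLaunchContext();
  {code}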



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2862) RM might not start if the machine was hard shutdown and FileSystemRMStateStore was used

2014-11-14 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14212836#comment-14212836
 ] 

Jian He commented on YARN-2862:
---

YARN-2010 may not solve this. YARN-1185 might have fixed this.

 RM might not start if the machine was hard shutdown and 
 FileSystemRMStateStore was used
 ---

 Key: YARN-2862
 URL: https://issues.apache.org/jira/browse/YARN-2862
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Ming Ma

 This might be a known issue. Given FileSystemRMStateStore isn't used for the HA 
 scenario, it might not be that important, unless there is something we need 
 to fix at the RM layer to make it more tolerant to RMStateStore issues.
 When the RM was hard shutdown, the OS might not get a chance to persist blocks. Some 
 of the stored application data ends up with size zero after reboot, and the RM 
 didn't like that.
 {noformat}
 ls -al 
 /var/log/hadoop/rmstore/FSRMStateRoot/RMAppRoot/application_1412702189634_324351
 total 156
 drwxr-xr-x.2 x y   4096 Nov 13 16:45 .
 drwxr-xr-x. 1524 x y 151552 Nov 13 16:45 ..
 -rw-r--r--.1 x y  0 Nov 13 16:45 
 appattempt_1412702189634_324351_01
 -rw-r--r--.1 x y  0 Nov 13 16:45 
 .appattempt_1412702189634_324351_01.crc
 -rw-r--r--.1 x y  0 Nov 13 16:45 application_1412702189634_324351
 -rw-r--r--.1 x y  0 Nov 13 16:45 .application_1412702189634_324351.crc
 {noformat}
 When RM starts up
 {noformat}
 2014-11-13 16:55:25,844 WARN org.apache.hadoop.fs.FSInputChecker: Problem 
 opening checksum file: 
 file:/var/log/hadoop/rmstore/FSRMStateRoot/RMAppRoot/application_1412702189634_324351/application_1412702189634_324351.
   Ignoring exception:
 java.io.EOFException
 at java.io.DataInputStream.readFully(DataInputStream.java:197)
 at java.io.DataInputStream.readFully(DataInputStream.java:169)
 at 
 org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.init(ChecksumFileSystem.java:146)
 at 
 org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:339)
 at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:792)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.readFile(FileSystemRMStateStore.java:501)
 ...
 2014-11-13 17:40:48,876 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Failed to 
 load/recover state
 java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ApplicationState.getAppId(RMStateStore.java:184)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:306)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:425)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1027)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:484)
 at 
 org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:834)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2865) Application recovery continuously fails with Application with id already present. Cannot duplicate

2014-11-14 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14212842#comment-14212842
 ] 

Rohith commented on YARN-2865:
--

I encountered this scenario in my test cluster in a strange way, causing the 
exception below.
{code}
2014-11-14 04:11:33,433 INFO 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager: Recovering 2 
applications
2014-11-14 04:11:33,433 INFO 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager: Default priority 
level is set to application:application_1415591025732_0001
2014-11-14 04:11:33,433 WARN 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager: Application with id 
application_1415591025732_0001 is already present! Cannot add a duplicate!
2014-11-14 04:11:33,433 ERROR 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Failed to 
load/recover state
org.apache.hadoop.yarn.exceptions.YarnException: Application with id 
application_1415591025732_0001 is already present! Cannot add a duplicate!
at 
org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:45)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:364)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:332)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:436)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1146)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:521)
at 
org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:925)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:966)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:962)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1612)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:962)
at 
org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:281)
at 
org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126)
at 
org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:805)
at 
org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:416)
at 
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:602)
at 
org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
{code}

 Application recovery continuously fails with Application with id already 
 present. Cannot duplicate
 

 Key: YARN-2865
 URL: https://issues.apache.org/jira/browse/YARN-2865
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Rohith
Assignee: Rohith

 YARN-2588 handles exception thrown while transitioningToActive and reset 
 activeServices. But it misses out clearing RMcontext apps/nodes details and 
 ClusterMetrics and QueueMetrics. This causes application recovery to fail.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2865) Application recovery continuously fails with Application with id already present. Cannot duplicate

2014-11-14 Thread Rohith (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith updated YARN-2865:
-
Attachment: YARN-2865.patch

 Application recovery continuously fails with Application with id already 
 present. Cannot duplicate
 

 Key: YARN-2865
 URL: https://issues.apache.org/jira/browse/YARN-2865
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Rohith
Assignee: Rohith
 Attachments: YARN-2865.patch


 YARN-2588 handles exception thrown while transitioningToActive and reset 
 activeServices. But it misses out clearing RMcontext apps/nodes details and 
 ClusterMetrics and QueueMetrics. This causes application recovery to fail.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2865) Application recovery continuously fails with Application with id already present. Cannot duplicate

2014-11-14 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14212845#comment-14212845
 ] 

Rohith commented on YARN-2865:
--

Attaching a patch that clears the RMContext, the cluster metrics, and the queue 
metrics. I have also refactored the common methods called from 
transitionToActive and transitionToStandBy.
Please review the patch.
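For readers following along, a very rough sketch of the kind of reset being described; the helper names below (resetClusterAndQueueMetrics, resetActiveServices) are placeholders for illustration, not the contents of the attached patch.
{code}
// Sketch: if activating services fails, roll back what recovery populated so a
// later transitionToActive starts from a clean RMContext and clean metrics.
void transitionToActive() throws Exception {
  try {
    startActiveServices();
  } catch (Exception e) {
    rmContext.getRMApps().clear();    // drop half-recovered applications
    rmContext.getRMNodes().clear();   // drop half-registered nodes
    resetClusterAndQueueMetrics();    // placeholder: re-create ClusterMetrics/QueueMetrics sources
    resetActiveServices();            // what YARN-2588 already does
    throw e;
  }
}
{code}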


 Application recovery continuously fails with Application with id already 
 present. Cannot duplicate
 

 Key: YARN-2865
 URL: https://issues.apache.org/jira/browse/YARN-2865
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Rohith
Assignee: Rohith
 Attachments: YARN-2865.patch


 YARN-2588 handles exception thrown while transitioningToActive and reset 
 activeServices. But it misses out clearing RMcontext apps/nodes details and 
 ClusterMetrics and QueueMetrics. This causes application recovery to fail.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2816) NM fail to start with NPE during container recovery

2014-11-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14212860#comment-14212860
 ] 

Hudson commented on YARN-2816:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6549 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6549/])
YARN-2816. NM fail to start with NPE during container recovery. Contributed by 
Zhihai Xu (jlowe: rev 49c38898b0be64fc686d039ed2fb2dea1378df02)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMLeveldbStateStoreService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/TestNMLeveldbStateStoreService.java


 NM fail to start with NPE during container recovery
 ---

 Key: YARN-2816
 URL: https://issues.apache.org/jira/browse/YARN-2816
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.5.0
Reporter: zhihai xu
Assignee: zhihai xu
 Fix For: 2.7.0

 Attachments: YARN-2816.000.patch, YARN-2816.001.patch, 
 YARN-2816.002.patch, leveldb_records.txt


 NM fail to start with NPE during container recovery.
 We saw the following crash happen:
 2014-10-30 22:22:37,211 INFO org.apache.hadoop.service.AbstractService: 
 Service 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl
  failed in state INITED; cause: java.lang.NullPointerException
 java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recoverContainer(ContainerManagerImpl.java:289)
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:252)
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:235)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
   at 
 org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:250)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:445)
   at 
 org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:492)
 The reason is some DB files used in NMLeveldbStateStoreService are 
 accidentally deleted to save disk space at 
 /tmp/hadoop-yarn/yarn-nm-recovery/yarn-nm-state. This leaves some incomplete 
 container record which don't have CONTAINER_REQUEST_KEY_SUFFIX(startRequest) 
 entry in the DB. When container is recovered at 
 ContainerManagerImpl#recoverContainer, 
 The NullPointerException at the following code cause NM shutdown.
 {code}
 StartContainerRequest req = rcs.getStartRequest();
 ContainerLaunchContext launchContext = req.getContainerLaunchContext();
 {code}
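 As an illustration only, here is the kind of guard that avoids dereferencing a missing start request; the committed change (in NMLeveldbStateStoreService, per the commit above) may handle this differently, e.g. by skipping incomplete records at load time.
 {code}
 // Sketch: tolerate a recovered container record whose start request was lost
 // when the leveldb files were partially deleted.
 StartContainerRequest req = rcs.getStartRequest();
 if (req == null) {
   LOG.warn("Skipping recovery of a container with no start request in the state store");
   return;   // or mark the container as lost instead of hitting an NPE
 }
 ContainerLaunchContext launchContext = req.getContainerLaunchContext();
 {code}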



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2375) Allow enabling/disabling timeline server per framework

2014-11-14 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated YARN-2375:
--
Description: 
This JIRA is to remove the ats enabled flag check within the 
TimelineClientImpl. Example where this fails is below.
While running secure timeline server with ats flag set to disabled on resource 
manager, Timeline delegation token renewer throws an NPE. 

 Allow enabling/disabling timeline server per framework
 --

 Key: YARN-2375
 URL: https://issues.apache.org/jira/browse/YARN-2375
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jonathan Eagles
Assignee: Mit Desai

 This JIRA is to remove the ats enabled flag check within the 
 TimelineClientImpl. Example where this fails is below.
 While running secure timeline server with ats flag set to disabled on 
 resource manager, Timeline delegation token renewer throws an NPE. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2811) Fair Scheduler is violating max memory settings in 2.4

2014-11-14 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated YARN-2811:
--
Attachment: YARN-2811.v9.patch

 Fair Scheduler is violating max memory settings in 2.4
 --

 Key: YARN-2811
 URL: https://issues.apache.org/jira/browse/YARN-2811
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Siqi Li
Assignee: Siqi Li
 Attachments: YARN-2811.v1.patch, YARN-2811.v2.patch, 
 YARN-2811.v3.patch, YARN-2811.v4.patch, YARN-2811.v5.patch, 
 YARN-2811.v6.patch, YARN-2811.v7.patch, YARN-2811.v8.patch, YARN-2811.v9.patch


 This has been seen on several queues showing the allocated MB going 
 significantly above the max MB and it appears to have started with the 2.4 
 upgrade. It could be a regression bug from 2.0 to 2.4



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2866) Capacity scheduler preemption policy should respect yarn.scheduler.minimum-allocation-mb when computing resource of queues

2014-11-14 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-2866:


 Summary: Capacity scheduler preemption policy should respect 
yarn.scheduler.minimum-allocation-mb when computing resource of queues
 Key: YARN-2866
 URL: https://issues.apache.org/jira/browse/YARN-2866
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Wangda Tan


Currently, capacity scheduler preemption logic doesn't respect 
minimum_allocation when computing ideal_assign/guaranteed_resource, etc. We 
should respect it to avoid some potential rounding issues.
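
To make the rounding concrete, a tiny illustration assuming yarn.scheduler.minimum-allocation-mb is 1024; this is plain arithmetic, not the scheduler code.
{code}
// Round a memory demand up to the next multiple of the minimum allocation
// before it enters ideal_assign / guaranteed_resource computations.
static long roundUpToMinimumAllocation(long memoryMb, long minAllocationMb) {
  return ((memoryMb + minAllocationMb - 1) / minAllocationMb) * minAllocationMb;
}

// e.g. roundUpToMinimumAllocation(1500, 1024) == 2048, so the preemption math
// works in allocatable units rather than raw megabytes.
{code}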



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2056) Disable preemption at Queue level

2014-11-14 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14212899#comment-14212899
 ] 

Wangda Tan commented on YARN-2056:
--

[~eepayne],
Thanks for the update.

bq. Would you please create a new JIRA and elaborate on this further?
Created YARN-2866 to track this issue.

The latest patch LGTM, +1. 
Would you like to take a look, [~vinodkv], [~mayank_bansal]?

Wangda

 Disable preemption at Queue level
 -

 Key: YARN-2056
 URL: https://issues.apache.org/jira/browse/YARN-2056
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Mayank Bansal
Assignee: Eric Payne
 Attachments: YARN-2056.201408202039.txt, YARN-2056.201408260128.txt, 
 YARN-2056.201408310117.txt, YARN-2056.201409022208.txt, 
 YARN-2056.201409181916.txt, YARN-2056.201409210049.txt, 
 YARN-2056.201409232329.txt, YARN-2056.201409242210.txt, 
 YARN-2056.201410132225.txt, YARN-2056.201410141330.txt, 
 YARN-2056.201410232244.txt, YARN-2056.201410311746.txt, 
 YARN-2056.201411041635.txt, YARN-2056.201411072153.txt, 
 YARN-2056.201411122305.txt, YARN-2056.201411132215.txt, 
 YARN-2056.201411142002.txt


 We need to be able to disable preemption at individual queue level



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2865) Application recovery continuously fails with Application with id already present. Cannot duplicate

2014-11-14 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-2865:
--
Priority: Critical  (was: Major)
Target Version/s: 2.7.0

 Application recovery continuously fails with Application with id already 
 present. Cannot duplicate
 

 Key: YARN-2865
 URL: https://issues.apache.org/jira/browse/YARN-2865
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Rohith
Assignee: Rohith
Priority: Critical
 Attachments: YARN-2865.patch


 YARN-2588 handles exception thrown while transitioningToActive and reset 
 activeServices. But it misses out clearing RMcontext apps/nodes details and 
 ClusterMetrics and QueueMetrics. This causes application recovery to fail.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2414) RM web UI: app page will crash if app is failed before any attempt has been created

2014-11-14 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-2414:
--
Target Version/s: 2.7.0

 RM web UI: app page will crash if app is failed before any attempt has been 
 created
 ---

 Key: YARN-2414
 URL: https://issues.apache.org/jira/browse/YARN-2414
 Project: Hadoop YARN
  Issue Type: Bug
  Components: webapp
Reporter: Zhijie Shen
Assignee: Wangda Tan
 Attachments: YARN-2414.patch


 {code}
 2014-08-12 16:45:13,573 ERROR org.apache.hadoop.yarn.webapp.Dispatcher: error 
 handling URI: /cluster/app/application_1407887030038_0001
 java.lang.reflect.InvocationTargetException
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:153)
   at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
   at 
 com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263)
   at 
 com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178)
   at 
 com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
   at 
 com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebAppFilter.doFilter(RMWebAppFilter.java:84)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
   at 
 com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
   at 
 com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
   at 
 com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
   at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at 
 org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at 
 org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:460)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at 
 org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1191)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at 
 org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
   at 
 org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
   at 
 org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
   at 
 org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
   at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
   at 
 org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
   at 
 org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
   at org.mortbay.jetty.Server.handle(Server.java:326)
   at 
 org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
   at 
 org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
   at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
   at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
   at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
   at 
 org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
   at 
 org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
 Caused by: java.lang.NullPointerException
   at 
 

[jira] [Commented] (YARN-2375) Allow enabling/disabling timeline server per framework

2014-11-14 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14212928#comment-14212928
 ] 

Zhijie Shen commented on YARN-2375:
---

bq. While running secure timeline server with ats flag set to disabled on 
resource manager, Timeline delegation token renewer throws an NPE.

This is a bug. The DT-related API methods don't check whether isEnabled == true, while 
the internal state is only initialized when isEnabled == true. This is why the NPE 
happens. Will file a separate Jira for it.

As to removing the global flag, I'm not sure we should do that. Nowadays we still 
don't assume the timeline server is always up, the way we do for the other components 
of a YARN cluster, the RM and NM. If the timeline server is not set up but the YARN 
cluster assumes it is, problems result; for example, app 
submission fails at getting the timeline DT in a secure cluster.

Therefore, this config should be kept as the flag that indicates whether the timeline 
server has been set up for the YARN cluster, until we promote it to an always-on 
daemon like the RM and NM. Thoughts?
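
For illustration, the early check described above could look roughly like this; the field name isEnabled and the internal doGetDelegationToken call are assumptions, not the committed fix.
{code}
// Sketch inside TimelineClientImpl: DT-related methods bail out early when the
// timeline service is disabled, instead of touching uninitialized internals.
public Token<TimelineDelegationTokenIdentifier> getDelegationToken(String renewer)
    throws IOException, YarnException {
  if (!isEnabled) {   // isEnabled: assumed field mirroring yarn.timeline-service.enabled
    throw new YarnException(
        "Timeline service is not enabled; cannot get a delegation token");
  }
  return doGetDelegationToken(renewer);   // assumed existing internal call
}
{code}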

 Allow enabling/disabling timeline server per framework
 --

 Key: YARN-2375
 URL: https://issues.apache.org/jira/browse/YARN-2375
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jonathan Eagles
Assignee: Mit Desai

 This JIRA is to remove the ats enabled flag check within the 
 TimelineClientImpl. Example where this fails is below.
 While running secure timeline server with ats flag set to disabled on 
 resource manager, Timeline delegation token renewer throws an NPE. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2404) Remove ApplicationAttemptState and ApplicationState class in RMStateStore class

2014-11-14 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-2404:
--
Target Version/s: 2.7.0

 Remove ApplicationAttemptState and ApplicationState class in RMStateStore 
 class 
 

 Key: YARN-2404
 URL: https://issues.apache.org/jira/browse/YARN-2404
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jian He
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-2404.1.patch, YARN-2404.2.patch, YARN-2404.3.patch, 
 YARN-2404.4.patch


 We can remove ApplicationState and ApplicationAttemptState class in 
 RMStateStore, given that we already have ApplicationStateData and 
 ApplicationAttemptStateData records. we may just replace ApplicationState 
 with ApplicationStateData, similarly for ApplicationAttemptState.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2867) TimelineClient DT methods should check if the timeline service is enabled or not

2014-11-14 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-2867:
-

 Summary: TimelineClient DT methods should check if the timeline 
service is enabled or not
 Key: YARN-2867
 URL: https://issues.apache.org/jira/browse/YARN-2867
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Zhijie Shen


The DT-related methods don't check whether isEnabled == true, while the internal state 
is only initialized when isEnabled == true. An NPE happens if users call these methods 
when the timeline service is not enabled in the configuration.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2375) Allow enabling/disabling timeline server per framework

2014-11-14 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14212932#comment-14212932
 ] 

Zhijie Shen commented on YARN-2375:
---

Filed YARN-2867

 Allow enabling/disabling timeline server per framework
 --

 Key: YARN-2375
 URL: https://issues.apache.org/jira/browse/YARN-2375
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jonathan Eagles
Assignee: Mit Desai

 This JIRA is to remove the ats enabled flag check within the 
 TimelineClientImpl. Example where this fails is below.
 While running secure timeline server with ats flag set to disabled on 
 resource manager, Timeline delegation token renewer throws an NPE. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2403) TestNodeManagerResync fails occasionally in trunk

2014-11-14 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14212934#comment-14212934
 ] 

Jian He commented on YARN-2403:
---

Is this still happening? If not, we can close this. If it is still happening, 
the patch may not be enough.

 TestNodeManagerResync fails occasionally in trunk
 -

 Key: YARN-2403
 URL: https://issues.apache.org/jira/browse/YARN-2403
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Ted Yu
Priority: Minor
 Attachments: YARN-2403.patch


 From  https://builds.apache.org/job/Hadoop-Yarn-trunk/640/ :
 {code}
   
 TestNodeManagerResync.testKillContainersOnResync:112-testContainerPreservationOnResyncImpl:146
  expected:2 but was:1
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2862) RM might not start if the machine was hard shutdown and FileSystemRMStateStore was used

2014-11-14 Thread Gera Shegalov (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14212935#comment-14212935
 ] 

Gera Shegalov commented on YARN-2862:
-

[~jianhe], to add more details: we run 2.4 plus patches; YARN-1185 is in 2.3.

 RM might not start if the machine was hard shutdown and 
 FileSystemRMStateStore was used
 ---

 Key: YARN-2862
 URL: https://issues.apache.org/jira/browse/YARN-2862
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Ming Ma

 This might be a known issue. Given FileSystemRMStateStore isn't used for HA 
 scenario, it might not be that important, unless there is something we need 
 to fix at RM layer to make it more tolerant to RMStore issue.
 When RM was hard shutdown, OS might not get a chance to persist blocks. Some 
 of the stored application data end up with size zero after reboot. And RM 
 didn't like that.
 {noformat}
 ls -al 
 /var/log/hadoop/rmstore/FSRMStateRoot/RMAppRoot/application_1412702189634_324351
 total 156
 drwxr-xr-x.2 x y   4096 Nov 13 16:45 .
 drwxr-xr-x. 1524 x y 151552 Nov 13 16:45 ..
 -rw-r--r--.1 x y  0 Nov 13 16:45 
 appattempt_1412702189634_324351_01
 -rw-r--r--.1 x y  0 Nov 13 16:45 
 .appattempt_1412702189634_324351_01.crc
 -rw-r--r--.1 x y  0 Nov 13 16:45 application_1412702189634_324351
 -rw-r--r--.1 x y  0 Nov 13 16:45 .application_1412702189634_324351.crc
 {noformat}
 When RM starts up
 {noformat}
 2014-11-13 16:55:25,844 WARN org.apache.hadoop.fs.FSInputChecker: Problem 
 opening checksum file: 
 file:/var/log/hadoop/rmstore/FSRMStateRoot/RMAppRoot/application_1412702189634_324351/application_1412702189634_324351.
   Ignoring exception:
 java.io.EOFException
 at java.io.DataInputStream.readFully(DataInputStream.java:197)
 at java.io.DataInputStream.readFully(DataInputStream.java:169)
 at 
 org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.init(ChecksumFileSystem.java:146)
 at 
 org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:339)
 at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:792)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.readFile(FileSystemRMStateStore.java:501)
 ...
 2014-11-13 17:40:48,876 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Failed to 
 load/recover state
 java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ApplicationState.getAppId(RMStateStore.java:184)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:306)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:425)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1027)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:484)
 at 
 org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:834)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2868) Add metric for initial container launch time

2014-11-14 Thread Ray Chiang (JIRA)
Ray Chiang created YARN-2868:


 Summary: Add metric for initial container launch time
 Key: YARN-2868
 URL: https://issues.apache.org/jira/browse/YARN-2868
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Ray Chiang
Assignee: Ray Chiang


Add a metric to measure the latency between starting container allocation and 
first container actually allocated.
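
 A rough sketch of one way such a metric could be wired up, assuming a QueueMetrics-style MutableRate; the names below are placeholders, not the eventual patch.
 {code}
 // Sketch: measure the gap between when an application starts requesting
 // containers and when its first container is actually allocated.
 @Metric("Latency from allocation start to first container allocated (ms)")
 MutableRate firstContainerAllocationDelay;   // placeholder metric name

 private long allocationStartTime = -1;

 void onAllocationStarted() {          // e.g. when the app is added to the scheduler
   allocationStartTime = System.currentTimeMillis();
 }

 void onFirstContainerAllocated() {    // e.g. when the first container is assigned
   if (allocationStartTime >= 0) {
     firstContainerAllocationDelay.add(System.currentTimeMillis() - allocationStartTime);
     allocationStartTime = -1;         // record only the first allocation
   }
 }
 {code}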



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-2603) ApplicationConstants missing HADOOP_MAPRED_HOME

2014-11-14 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli resolved YARN-2603.
---
Resolution: Invalid

Thanks for the response, Ray. Closing this as invalid. Repeating my previous 
message in case there is more discussion:
{quote}
This is not correct. We deliberately avoided putting compile time references to 
MapReduce in all of YARN.

You should instead use yarn.nodemanager.env-whitelist and set 
HADOOP_MAPRED_HOME while starting nodemanager.

OTOH, we are moving away from cluster installs of MapReduce to instead use 
DistributedCache: See MAPREDUCE-4421.
{quote}

 ApplicationConstants missing HADOOP_MAPRED_HOME
 ---

 Key: YARN-2603
 URL: https://issues.apache.org/jira/browse/YARN-2603
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Allen Wittenauer
Assignee: Ray Chiang
  Labels: newbie
 Attachments: YARN-2603-01.patch


 The Environment enum should have HADOOP_MAPRED_HOME listed as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2868) Add metric for initial container launch time

2014-11-14 Thread Ray Chiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated YARN-2868:
-
Attachment: 20141114_FSQueueAllocationMetric-Up-04.patch

First attempt at implementation.

 Add metric for initial container launch time
 

 Key: YARN-2868
 URL: https://issues.apache.org/jira/browse/YARN-2868
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Ray Chiang
Assignee: Ray Chiang
  Labels: metrics, supportability

 Add a metric to measure the latency between starting container allocation 
 and first container actually allocated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2868) Add metric for initial container launch time

2014-11-14 Thread Ray Chiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated YARN-2868:
-
Attachment: (was: 20141114_FSQueueAllocationMetric-Up-04.patch)

 Add metric for initial container launch time
 

 Key: YARN-2868
 URL: https://issues.apache.org/jira/browse/YARN-2868
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Ray Chiang
Assignee: Ray Chiang
  Labels: metrics, supportability

 Add a metric to measure the latency between starting container allocation 
 and first container actually allocated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2868) Add metric for initial container launch time

2014-11-14 Thread Ray Chiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated YARN-2868:
-
Attachment: YARN-2868-01.patch

First attempt at implementation.  Second upload.

 Add metric for initial container launch time
 

 Key: YARN-2868
 URL: https://issues.apache.org/jira/browse/YARN-2868
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Ray Chiang
Assignee: Ray Chiang
  Labels: metrics, supportability
 Attachments: YARN-2868-01.patch


 Add a metric to measure the latency between starting container allocation 
 and first container actually allocated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2056) Disable preemption at Queue level

2014-11-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14212944#comment-14212944
 ] 

Hadoop QA commented on YARN-2056:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12681612/YARN-2056.201411142002.txt
  against trunk revision 10c98ae.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5845//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5845//console

This message is automatically generated.

 Disable preemption at Queue level
 -

 Key: YARN-2056
 URL: https://issues.apache.org/jira/browse/YARN-2056
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Mayank Bansal
Assignee: Eric Payne
 Attachments: YARN-2056.201408202039.txt, YARN-2056.201408260128.txt, 
 YARN-2056.201408310117.txt, YARN-2056.201409022208.txt, 
 YARN-2056.201409181916.txt, YARN-2056.201409210049.txt, 
 YARN-2056.201409232329.txt, YARN-2056.201409242210.txt, 
 YARN-2056.201410132225.txt, YARN-2056.201410141330.txt, 
 YARN-2056.201410232244.txt, YARN-2056.201410311746.txt, 
 YARN-2056.201411041635.txt, YARN-2056.201411072153.txt, 
 YARN-2056.201411122305.txt, YARN-2056.201411132215.txt, 
 YARN-2056.201411142002.txt


 We need to be able to disable preemption at individual queue level



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2375) Allow enabling/disabling timeline server per framework

2014-11-14 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14212947#comment-14212947
 ] 

Mit Desai commented on YARN-2375:
-

bq. DT related API methods doesn't check if isEnabled == true
Even if the timeline server is running, we cannot turn on the flag in yarn-site, 
because once the flag is turned on, all MapReduce applications will automatically 
try to connect to the timeline server, and that is not something we want at 
this time.

 Allow enabling/disabling timeline server per framework
 --

 Key: YARN-2375
 URL: https://issues.apache.org/jira/browse/YARN-2375
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jonathan Eagles
Assignee: Mit Desai

 This JIRA is to remove the ats enabled flag check within the 
 TimelineClientImpl. Example where this fails is below.
 While running secure timeline server with ats flag set to disabled on 
 resource manager, Timeline delegation token renewer throws an NPE. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2811) Fair Scheduler is violating max memory settings in 2.4

2014-11-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14212949#comment-14212949
 ] 

Hadoop QA commented on YARN-2811:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12681619/YARN-2811.v9.patch
  against trunk revision 49c3889.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:red}-1 eclipse:eclipse{color}.  The patch failed to build with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5847//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5847//console

This message is automatically generated.

 Fair Scheduler is violating max memory settings in 2.4
 --

 Key: YARN-2811
 URL: https://issues.apache.org/jira/browse/YARN-2811
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Siqi Li
Assignee: Siqi Li
 Attachments: YARN-2811.v1.patch, YARN-2811.v2.patch, 
 YARN-2811.v3.patch, YARN-2811.v4.patch, YARN-2811.v5.patch, 
 YARN-2811.v6.patch, YARN-2811.v7.patch, YARN-2811.v8.patch, YARN-2811.v9.patch


 This has been seen on several queues showing the allocated MB going 
 significantly above the max MB and it appears to have started with the 2.4 
 upgrade. It could be a regression bug from 2.0 to 2.4



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2375) Allow enabling/disabling timeline server per framework

2014-11-14 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14212951#comment-14212951
 ] 

Mit Desai commented on YARN-2375:
-

And if the flag is turned off in yarn-site, the DT API calls will get as far as that 
condition check and then do nothing.

 Allow enabling/disabling timeline server per framework
 --

 Key: YARN-2375
 URL: https://issues.apache.org/jira/browse/YARN-2375
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jonathan Eagles
Assignee: Mit Desai

 This JIRA is to remove the ats enabled flag check within the 
 TimelineClientImpl. Example where this fails is below.
 While running secure timeline server with ats flag set to disabled on 
 resource manager, Timeline delegation token renewer throws an NPE. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2396) RpcClientFactoryPBImpl.stopClient always throws due to missing close method

2014-11-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14212952#comment-14212952
 ] 

Hadoop QA commented on YARN-2396:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12660649/yarn2396.patch
  against trunk revision 49c3889.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5849//console

This message is automatically generated.

 RpcClientFactoryPBImpl.stopClient always throws due to missing close method
 ---

 Key: YARN-2396
 URL: https://issues.apache.org/jira/browse/YARN-2396
 Project: Hadoop YARN
  Issue Type: Bug
  Components: api
Affects Versions: 2.4.1
Reporter: Jason Lowe
Assignee: chang li
 Attachments: yarn2396.patch


 RpcClientFactoryPBImpl.stopClient will throw a YarnRuntimeException if the 
 protocol does not have a close method, despite the log message indicating it 
 is ignoring errors.  It's interesting to note that none of the YARN protocol 
 classes currently have a close method.
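
 For illustration, one way stopClient could actually ignore the missing method, matching its log message; this is an assumption about a possible fix, not the attached patch.
 {code}
 // Sketch for RpcClientFactoryPBImpl.stopClient: treat a missing close() method
 // as non-fatal, since none of the YARN protocol classes currently define one.
 public void stopClient(Object proxy) {
   try {
     Method closeMethod = proxy.getClass().getMethod("close");
     closeMethod.invoke(proxy);
   } catch (NoSuchMethodException e) {
     LOG.info("Cannot close proxy - no close method, ignoring", e);
     // fall through instead of wrapping this in a YarnRuntimeException
   } catch (Exception e) {
     LOG.error("Failed to close proxy", e);
     throw new YarnRuntimeException(e);
   }
 }
 {code}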



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2403) TestNodeManagerResync fails occasionally in trunk

2014-11-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14212961#comment-14212961
 ] 

Hadoop QA commented on YARN-2403:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12660964/YARN-2403.patch
  against trunk revision 49c3889.

{color:red}-1 patch{color}.  Trunk compilation may be broken.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5848//console

This message is automatically generated.

 TestNodeManagerResync fails occasionally in trunk
 -

 Key: YARN-2403
 URL: https://issues.apache.org/jira/browse/YARN-2403
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Ted Yu
Priority: Minor
 Attachments: YARN-2403.patch


 From  https://builds.apache.org/job/Hadoop-Yarn-trunk/640/ :
 {code}
   
 TestNodeManagerResync.testKillContainersOnResync:112-testContainerPreservationOnResyncImpl:146
  expected:2 but was:1
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2375) Allow enabling/disabling timeline server per framework

2014-11-14 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14212964#comment-14212964
 ] 

Jonathan Eagles commented on YARN-2375:
---

[~zjshen], you misunderstand my request. I am proposing to retain the flag. 
However, the responsibility for checking whether the ats is enabled needs to be 
outside of the TimelineClientImpl. In fact, the code in YARN already assumes the 
design I am proposing: YarnClient checks the value of ats.enabled and then 
creates the TimelineClientImpl, which currently re-checks ats.enabled. This is the 
preferred object design.

The issue lies in the fact that the timeline delegation token renewer creates a 
TimelineClient because it has a timeline server delegation token. That is proof 
enough that a TimelineClient needs to be created. This goes back to my original 
design constraint that ats.enabled must be able to be turned off globally and 
enabled at the per-job/framework level.
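
To make the proposed split of responsibility concrete, a rough sketch; the configuration constants are the usual YARN ones, but the surrounding code is only an assumption of how a caller-side check could look.
{code}
// Caller side (e.g. YarnClientImpl): decide once whether a TimelineClient is needed.
if (conf.getBoolean(YarnConfiguration.TIMELINE_SERVICE_ENABLED,
    YarnConfiguration.DEFAULT_TIMELINE_SERVICE_ENABLED)) {
  timelineClient = TimelineClient.createTimelineClient();
  timelineClient.init(conf);
  timelineClient.start();
}

// The delegation token renewer, which already holds a timeline DT, can then
// create a TimelineClient unconditionally: the token's existence is the evidence
// that a client is needed, regardless of the global flag.
{code}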

 Allow enabling/disabling timeline server per framework
 --

 Key: YARN-2375
 URL: https://issues.apache.org/jira/browse/YARN-2375
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jonathan Eagles
Assignee: Mit Desai

 This JIRA is to remove the ats enabled flag check within the 
 TimelineClientImpl. Example where this fails is below.
 While running secure timeline server with ats flag set to disabled on 
 resource manager, Timeline delegation token renewer throws an NPE. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2392) add more diags about app retry limits on AM failures

2014-11-14 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14212970#comment-14212970
 ] 

Jian He commented on YARN-2392:
---

Thanks Steve, the patch is not applying anymore; mind updating it?

 add more diags about app retry limits on AM failures
 

 Key: YARN-2392
 URL: https://issues.apache.org/jira/browse/YARN-2392
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Steve Loughran
 Attachments: YARN-2392-001.patch, YARN-2392-002.patch


 # when an app fails the failure count is shown, but not what the global + 
 local limits are. If the two are different, they should both be printed. 
 # the YARN-2242 strings don't have enough whitespace between text and the URL



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2432) RMStateStore should process the pending events before close

2014-11-14 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14212974#comment-14212974
 ] 

Jian He commented on YARN-2432:
---

Looks good; kicking Jenkins manually.

 RMStateStore should process the pending events before close
 ---

 Key: YARN-2432
 URL: https://issues.apache.org/jira/browse/YARN-2432
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Varun Saxena
Assignee: Varun Saxena
 Attachments: YARN-2432.patch


 Refer to discussion on YARN-2136 
 (https://issues.apache.org/jira/browse/YARN-2136?focusedCommentId=14097266page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14097266).
  
 As pointed out by [~jianhe], we should process the dispatcher event queue 
 before closing the state store by flipping over the following statements in 
 code.
 {code:title=RMStateStore.java|borderStyle=solid}
  protected void serviceStop() throws Exception {
 closeInternal();
 dispatcher.stop();
   }
 {code}
 Currently, if the state store is being stopped on events such as switching to 
 standby, it will first close the state store(in case of ZKRMStateStore, close 
 connection with ZK) and then process the pending events. Instead, we should 
 first process the pending events and then call close.
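
 For clarity, the flipped order would look roughly like this (same snippet as above, statements swapped, comments added):
 {code:title=RMStateStore.java|borderStyle=solid}
  protected void serviceStop() throws Exception {
     // Drain the dispatcher first so pending store events are handled while the
     // store (e.g. the ZK connection) is still open ...
     dispatcher.stop();
     // ... then close the underlying store.
     closeInternal();
   }
 {code}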



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2432) RMStateStore should process the pending events before close

2014-11-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14212984#comment-14212984
 ] 

Hadoop QA commented on YARN-2432:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12663208/YARN-2432.patch
  against trunk revision 49c3889.

{color:red}-1 patch{color}.  Trunk compilation may be broken.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5851//console

This message is automatically generated.

 RMStateStore should process the pending events before close
 ---

 Key: YARN-2432
 URL: https://issues.apache.org/jira/browse/YARN-2432
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Varun Saxena
Assignee: Varun Saxena
 Attachments: YARN-2432.patch


 Refer to discussion on YARN-2136 
 (https://issues.apache.org/jira/browse/YARN-2136?focusedCommentId=14097266page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14097266).
  
 As pointed out by [~jianhe], we should process the dispatcher event queue 
 before closing the state store by flipping over the following statements in 
 code.
 {code:title=RMStateStore.java|borderStyle=solid}
  protected void serviceStop() throws Exception {
 closeInternal();
 dispatcher.stop();
   }
 {code}
 Currently, if the state store is being stopped on events such as switching to 
 standby, it will first close the state store(in case of ZKRMStateStore, close 
 connection with ZK) and then process the pending events. Instead, we should 
 first process the pending events and then call close.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2862) RM might not start if the machine was hard shutdown and FileSystemRMStateStore was used

2014-11-14 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14212989#comment-14212989
 ] 

Zhijie Shen commented on YARN-2862:
---

It is likely that the assumption we made in 
[YARN-1776|https://issues.apache.org/jira/browse/YARN-1776?focusedCommentId=13942201page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13942201]
 is not fully correct.

When updating a state file, we (1) write the new content to a .new file, (2) delete the 
existing file, and (3) rename the .new file to the existing file name. If a crash 
happens before (2), we use the .new file to recover the state file when loading the 
state (see FileSystemRMStateStore#checkAndResumeUpdateOperation).

According to the description here, the RM can crash while (1) is in progress and 
leave a corrupted .new file. It seems that we have to do additional validation 
to check whether the .new file is corrupted, or simply ignore it.
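
A sketch of the kind of extra validation being suggested when resuming an interrupted update; the zero-length check is just one possible heuristic, not an actual patch.
{code}
// Sketch: before using <file>.new to resume an interrupted update, make sure it
// was written completely; otherwise drop it and keep whatever else is recoverable.
Path newFile = new Path(file.getParent(), file.getName() + ".new");
if (fs.exists(newFile)) {
  FileStatus status = fs.getFileStatus(newFile);
  if (status.getLen() == 0) {
    // crash during step (1): the new content never made it to disk
    fs.delete(newFile, false);
  } else {
    // crash after (1): finish the update as checkAndResumeUpdateOperation does today
    fs.delete(file, false);
    fs.rename(newFile, file);
  }
}
{code}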

 RM might not start if the machine was hard shutdown and 
 FileSystemRMStateStore was used
 ---

 Key: YARN-2862
 URL: https://issues.apache.org/jira/browse/YARN-2862
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Ming Ma

 This might be a known issue. Given FileSystemRMStateStore isn't used for HA 
 scenario, it might not be that important, unless there is something we need 
 to fix at RM layer to make it more tolerant to RMStore issue.
 When RM was hard shutdown, OS might not get a chance to persist blocks. Some 
 of the stored application data end up with size zero after reboot. And RM 
 didn't like that.
 {noformat}
 ls -al 
 /var/log/hadoop/rmstore/FSRMStateRoot/RMAppRoot/application_1412702189634_324351
 total 156
 drwxr-xr-x.2 x y   4096 Nov 13 16:45 .
 drwxr-xr-x. 1524 x y 151552 Nov 13 16:45 ..
 -rw-r--r--.1 x y  0 Nov 13 16:45 
 appattempt_1412702189634_324351_01
 -rw-r--r--.1 x y  0 Nov 13 16:45 
 .appattempt_1412702189634_324351_01.crc
 -rw-r--r--.1 x y  0 Nov 13 16:45 application_1412702189634_324351
 -rw-r--r--.1 x y  0 Nov 13 16:45 .application_1412702189634_324351.crc
 {noformat}
 When RM starts up
 {noformat}
 2014-11-13 16:55:25,844 WARN org.apache.hadoop.fs.FSInputChecker: Problem 
 opening checksum file: 
 file:/var/log/hadoop/rmstore/FSRMStateRoot/RMAppRoot/application_1412702189634_324351/application_1412702189634_324351.
   Ignoring exception:
 java.io.EOFException
 at java.io.DataInputStream.readFully(DataInputStream.java:197)
 at java.io.DataInputStream.readFully(DataInputStream.java:169)
 at 
 org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.init(ChecksumFileSystem.java:146)
 at 
 org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:339)
 at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:792)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore.readFile(FileSystemRMStateStore.java:501)
 ...
 2014-11-13 17:40:48,876 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Failed to 
 load/recover state
 java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ApplicationState.getAppId(RMStateStore.java:184)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:306)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:425)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1027)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:484)
 at 
 org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:834)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2392) add more diags about app retry limits on AM failures

2014-11-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14212991#comment-14212991
 ] 

Hadoop QA commented on YARN-2392:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12665901/YARN-2392-002.patch
  against trunk revision 49c3889.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5853//console

This message is automatically generated.

 add more diags about app retry limits on AM failures
 

 Key: YARN-2392
 URL: https://issues.apache.org/jira/browse/YARN-2392
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Steve Loughran
 Attachments: YARN-2392-001.patch, YARN-2392-002.patch


 # when an app fails the failure count is shown, but not what the global + 
 local limits are. If the two are different, they should both be printed. 
 # the YARN-2242 strings don't have enough whitespace between text and the URL



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2802) add AM container launch and register delay metrics in QueueMetrics to help diagnose performance issue.

2014-11-14 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14212992#comment-14212992
 ] 

zhihai xu commented on YARN-2802:
-

Hi [~jianhe] and [~vinodkv],

Could you review the patch, since it changes the Capacity Scheduler? It has 
passed Hadoop QA.

thanks
zhihai

 add AM container launch and register delay metrics in QueueMetrics to help 
 diagnose performance issue.
 --

 Key: YARN-2802
 URL: https://issues.apache.org/jira/browse/YARN-2802
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.5.0
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: YARN-2802.000.patch, YARN-2802.001.patch, 
 YARN-2802.002.patch


 add AM container launch and register delay metrics in QueueMetrics to help 
 diagnose performance issue.
 Added two metrics in QueueMetrics:
 aMLaunchDelay: the time spent from sending event AMLauncherEventType.LAUNCH 
 to receiving event RMAppAttemptEventType.LAUNCHED in RMAppAttemptImpl.
 aMRegisterDelay: the time waiting from receiving event 
 RMAppAttemptEventType.LAUNCHED to receiving event 
 RMAppAttemptEventType.REGISTERED(ApplicationMasterService#registerApplicationMaster)
  in RMAppAttemptImpl.
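
 For readers of the patch, a rough sketch of how the two timers could hang together; the metric names come from the description above, everything else is an assumption.
 {code}
 // Sketch: two QueueMetrics-style rates fed from RMAppAttemptImpl transitions.
 @Metric("AM container launch delay (ms)") MutableRate aMLaunchDelay;
 @Metric("AM register delay (ms)")         MutableRate aMRegisterDelay;

 // Where the samples would come from (pseudocode of the event flow):
 //   AMLauncherEventType.LAUNCH sent       -> launchStartTime = now()
 //   RMAppAttemptEventType.LAUNCHED seen   -> aMLaunchDelay.add(now() - launchStartTime);
 //                                            launchedTime = now()
 //   RMAppAttemptEventType.REGISTERED seen -> aMRegisterDelay.add(now() - launchedTime)
 {code}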



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2556) Tool to measure the performance of the timeline server

2014-11-14 Thread chang li (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14213006#comment-14213006
 ] 

chang li commented on YARN-2556:


I have run the failed tests on my local machine, and they all passed with my 
patch.

 Tool to measure the performance of the timeline server
 --

 Key: YARN-2556
 URL: https://issues.apache.org/jira/browse/YARN-2556
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Jonathan Eagles
Assignee: chang li
 Attachments: YARN-2556-WIP.patch, YARN-2556-WIP.patch, 
 yarn2556.patch, yarn2556.patch, yarn2556_wip.patch


 We need to be able to understand the capacity model for the timeline server 
 to give users the tools they need to deploy a timeline server with the 
 correct capacity.
 I propose we create a mapreduce job that can measure timeline server write 
 and read performance. Transactions per second, I/O for both read and write 
 would be a good start.
 This could be done as an example or test job that could be tied into gridmix.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2865) Application recovery continuously fails with Application with id already present. Cannot duplicate

2014-11-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14213001#comment-14213001
 ] 

Hadoop QA commented on YARN-2865:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12681615/YARN-2865.patch
  against trunk revision 49c3889.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5846//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5846//console

This message is automatically generated.

 Application recovery continuously fails with Application with id already 
 present. Cannot duplicate
 

 Key: YARN-2865
 URL: https://issues.apache.org/jira/browse/YARN-2865
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Rohith
Assignee: Rohith
Priority: Critical
 Attachments: YARN-2865.patch


 YARN-2588 handles exception thrown while transitioningToActive and reset 
 activeServices. But it misses out clearing RMcontext apps/nodes details and 
 ClusterMetrics and QueueMetrics. This causes application recovery to fail.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

