[jira] [Created] (YARN-447) applicationComparator improvement for CS

2013-03-04 Thread nemon lou (JIRA)
nemon lou created YARN-447:
--

 Summary: applicationComparator improvement for CS
 Key: YARN-447
 URL: https://issues.apache.org/jira/browse/YARN-447
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Affects Versions: 2.0.3-alpha
Reporter: nemon lou
Priority: Minor
 Attachments: YARN-447-trunk.patch

The current comparison code is:
return a1.getApplicationId().getId() - a2.getApplicationId().getId();

It will be replaced with:
return a1.getApplicationId().compareTo(a2.getApplicationId());

This brings some benefits:
1. The ApplicationId comparison logic stays in the ApplicationId class.
2. In a future HA mode the cluster timestamp may change, and the ApplicationId 
class already takes care of that case (see the sketch below).
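For illustration only (this is not the attached patch), a comparator along 
these lines delegates ordering to ApplicationId.compareTo(); the class name 
ApplicationIdComparator and the SchedulerApp element type are hypothetical 
placeholders:
{code}
import java.util.Comparator;

// Hedged sketch: order scheduler applications by delegating to
// ApplicationId.compareTo(), which also copes with differing cluster
// timestamps; SchedulerApp stands in for whatever type the CS compares.
public class ApplicationIdComparator implements Comparator<SchedulerApp> {
  @Override
  public int compare(SchedulerApp a1, SchedulerApp a2) {
    return a1.getApplicationId().compareTo(a2.getApplicationId());
  }
}
{code}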

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-447) applicationComparator improvement for CS

2013-03-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13592167#comment-13592167
 ] 

Hadoop QA commented on YARN-447:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12571874/YARN-447-trunk.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

  {color:red}-1 one of tests included doesn't have a timeout.{color}

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationLimits
  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestLeafQueue

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/457//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/457//console

This message is automatically generated.

 applicationComparator improvement for CS
 

 Key: YARN-447
 URL: https://issues.apache.org/jira/browse/YARN-447
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Affects Versions: 2.0.3-alpha
Reporter: nemon lou
Priority: Minor
 Attachments: YARN-447-trunk.patch


 The current comparison code is:
 return a1.getApplicationId().getId() - a2.getApplicationId().getId();
 It will be replaced with:
 return a1.getApplicationId().compareTo(a2.getApplicationId());
 This brings some benefits:
 1. The ApplicationId comparison logic stays in the ApplicationId class.
 2. In a future HA mode the cluster timestamp may change, and the ApplicationId 
 class already takes care of that case.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-447) applicationComparator improvement for CS

2013-03-04 Thread nemon lou (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nemon lou updated YARN-447:
---

Attachment: YARN-447-trunk.patch

Attaching a simple patch with a test case.

 applicationComparator improvement for CS
 

 Key: YARN-447
 URL: https://issues.apache.org/jira/browse/YARN-447
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Affects Versions: 2.0.3-alpha
Reporter: nemon lou
Priority: Minor
 Attachments: YARN-447-trunk.patch


 The current comparison code is:
 return a1.getApplicationId().getId() - a2.getApplicationId().getId();
 It will be replaced with:
 return a1.getApplicationId().compareTo(a2.getApplicationId());
 This brings some benefits:
 1. The ApplicationId comparison logic stays in the ApplicationId class.
 2. In a future HA mode the cluster timestamp may change, and the ApplicationId 
 class already takes care of that case.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-446) Container killed before hprof dumps profile.out

2013-03-04 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13592246#comment-13592246
 ] 

Jason Lowe commented on YARN-446:
-

IMO the AM should always allow the task attempt time to exit successfully on 
its own rather than sending it a kill signal that races with the normal 
shutdown of the task attempt.  This is very similar to the race between the AM 
shutting down after unregistering with the RM and the subsequent kill sent by 
the RM, which was mitigated by MAPREDUCE-4157.  This would also help eliminate 
the many confusing "Container killed by ApplicationMaster" messages that appear 
in task attempt diagnostics for tasks that are otherwise operating normally.

 Container killed before hprof dumps profile.out
 ---

 Key: YARN-446
 URL: https://issues.apache.org/jira/browse/YARN-446
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client
Affects Versions: 2.0.3-alpha
Reporter: Radim Kolar

 If profiling is enabled for a mapper or reducer, hprof dumps profile.out at 
 process exit. The dump happens after the task has signaled to the AM that its 
 work is finished.
 The AM then kills the container whose work is finished without waiting for 
 hprof to complete the dump. If hprof is writing a larger output (such as with 
 depth=4 where depth=3 works), it cannot finish the dump in time before being 
 killed, making the entire dump unusable because the CPU and heap stats are 
 missing.
 There needs to be a better delay before the container is killed when profiling 
 is enabled.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-448) Remove unnecessary hflush from log aggregation

2013-03-04 Thread Kihwal Lee (JIRA)
Kihwal Lee created YARN-448:
---

 Summary: Remove unnecessary hflush from log aggregation
 Key: YARN-448
 URL: https://issues.apache.org/jira/browse/YARN-448
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 0.23.7, 2.0.4-beta
Reporter: Kihwal Lee
Assignee: Kihwal Lee


AggregatedLogFormat#writeVersion() calls hflush() after writing the version. 
Calling hflush there does not seem to be necessary, and it can add a lot of 
load to HDFS in a big, busy cluster.
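A hedged sketch of the kind of change proposed; this is not the actual 
AggregatedLogFormat code, and the VERSION constant and method shape are 
assumptions made for illustration:
{code}
import java.io.IOException;
import org.apache.hadoop.fs.FSDataOutputStream;

class VersionWriterSketch {
  private static final int VERSION = 1; // assumed value, for illustration only

  // Write the version record but let normal stream flushing/closing persist
  // it, instead of forcing an extra hflush() on every aggregated log.
  static void writeVersion(FSDataOutputStream out) throws IOException {
    out.writeInt(VERSION);
    // out.hflush();  // removed: per-write hflush adds unnecessary HDFS load
  }
}
{code}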

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-417) Add a poller that allows the AM to receive notifications when it is assigned containers

2013-03-04 Thread Chris Riccomini (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13592371#comment-13592371
 ] 

Chris Riccomini commented on YARN-417:
--

Looks good to me!

 Add a poller that allows the AM to receive notifications when it is assigned 
 containers
 ---

 Key: YARN-417
 URL: https://issues.apache.org/jira/browse/YARN-417
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, applications
Affects Versions: 2.0.3-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: AMRMClientAsync-1.java, AMRMClientAsync.java, 
 YARN-417-1.patch, YARN-417-2.patch, YARN-417-3.patch, YARN-417.patch, 
 YarnAppMaster.java, YarnAppMasterListener.java


 Writing AMs would be easier for some if they did not have to handle 
 heartbeating to the RM on their own.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-18) Make locatlity in YARN's container assignment and task scheduling pluggable for other deployment topology

2013-03-04 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-18?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-18:
---

Attachment: YARN-18-v3.patch

Synced the patch with recent changes in YARN.

 Make locatlity in YARN's container assignment and task scheduling pluggable 
 for other deployment topology
 -

 Key: YARN-18
 URL: https://issues.apache.org/jira/browse/YARN-18
 Project: Hadoop YARN
  Issue Type: New Feature
Affects Versions: 2.0.3-alpha
Reporter: Junping Du
Assignee: Junping Du
  Labels: features
 Attachments: 
 HADOOP-8474-ContainerAssignmentTaskScheduling-pluggable.patch, 
 MAPREDUCE-4309.patch, MAPREDUCE-4309-v2.patch, MAPREDUCE-4309-v3.patch, 
 MAPREDUCE-4309-v4.patch, MAPREDUCE-4309-v5.patch, MAPREDUCE-4309-v6.patch, 
 MAPREDUCE-4309-v7.patch, YARN-18.patch, YARN-18-v2.patch, YARN-18-v3.patch


 There are several classes in YARN's container assignment and task scheduling 
 algorithms that relate to data locality; they were updated to give preference 
 to running a container at localities other than node-local and rack-local 
 (such as nodegroup-local). This proposes making those data structures and 
 algorithms pluggable, e.g. SchedulerNode, RMNodeImpl, etc. The inner class 
 ScheduledRequests was made a package-level class so it would be easier to 
 create a subclass, ScheduledRequestsWithNodeGroup.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-448) Remove unnecessary hflush from log aggregation

2013-03-04 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13592491#comment-13592491
 ] 

Kihwal Lee commented on YARN-448:
-

A test is not included since the change does not affect normal cases. Even in 
failure cases, no current error handling in log aggregation depends on the 
presence or absence of the version record in a log that failed during 
aggregation.

 Remove unnecessary hflush from log aggregation
 --

 Key: YARN-448
 URL: https://issues.apache.org/jira/browse/YARN-448
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 0.23.7, 2.0.4-beta
Reporter: Kihwal Lee
Assignee: Kihwal Lee
 Attachments: yarn-448.patch.txt


 AggregatedLogFormat#writeVersion() calls hflush() after writing the version. 
 Calling hflush there does not seem to be necessary, and it can add a lot of 
 load to HDFS in a big, busy cluster.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-196) Nodemanager if started before starting Resource manager is getting shutdown.But if both RM and NM are started and then after if RM is going down,NM is retrying for the RM.

2013-03-04 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-196:
---

Attachment: YARN-196.7.patch

 Nodemanager if started before starting Resource manager is getting 
 shutdown.But if both RM and NM are started and then after if RM is going 
 down,NM is retrying for the RM.
 ---

 Key: YARN-196
 URL: https://issues.apache.org/jira/browse/YARN-196
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 3.0.0, 2.0.0-alpha
Reporter: Ramgopal N
Assignee: Xuan Gong
 Attachments: MAPREDUCE-3676.patch, YARN-196.1.patch, 
 YARN-196.2.patch, YARN-196.3.patch, YARN-196.4.patch, YARN-196.5.patch, 
 YARN-196.6.patch, YARN-196.7.patch


 If NM is started before starting the RM ,NM is shutting down with the 
 following error
 {code}
 ERROR org.apache.hadoop.yarn.service.CompositeService: Error starting 
 services org.apache.hadoop.yarn.server.nodemanager.NodeManager
 org.apache.avro.AvroRuntimeException: 
 java.lang.reflect.UndeclaredThrowableException
   at 
 org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:149)
   at 
 org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:68)
   at 
 org.apache.hadoop.yarn.server.nodemanager.NodeManager.start(NodeManager.java:167)
   at 
 org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:242)
 Caused by: java.lang.reflect.UndeclaredThrowableException
   at 
 org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:66)
   at 
 org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:182)
   at 
 org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:145)
   ... 3 more
 Caused by: com.google.protobuf.ServiceException: java.net.ConnectException: 
 Call From HOST-10-18-52-230/10.18.52.230 to HOST-10-18-52-250:8025 failed on 
 connection exception: java.net.ConnectException: Connection refused; For more 
 details see:  http://wiki.apache.org/hadoop/ConnectionRefused
   at 
 org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:131)
   at $Proxy23.registerNodeManager(Unknown Source)
   at 
 org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:59)
   ... 5 more
 Caused by: java.net.ConnectException: Call From 
 HOST-10-18-52-230/10.18.52.230 to HOST-10-18-52-250:8025 failed on connection 
 exception: java.net.ConnectException: Connection refused; For more details 
 see:  http://wiki.apache.org/hadoop/ConnectionRefused
   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:857)
   at org.apache.hadoop.ipc.Client.call(Client.java:1141)
   at org.apache.hadoop.ipc.Client.call(Client.java:1100)
   at 
 org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:128)
   ... 7 more
 Caused by: java.net.ConnectException: Connection refused
   at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
   at 
 sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
   at 
 org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
   at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:659)
   at 
 org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:469)
   at 
 org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:563)
   at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:211)
   at org.apache.hadoop.ipc.Client.getConnection(Client.java:1247)
   at org.apache.hadoop.ipc.Client.call(Client.java:1117)
   ... 9 more
 2012-01-16 15:04:13,336 WARN org.apache.hadoop.yarn.event.AsyncDispatcher: 
 AsyncDispatcher thread interrupted
 java.lang.InterruptedException
   at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1899)
   at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1934)
   at 
 java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:76)
   at java.lang.Thread.run(Thread.java:619)
 2012-01-16 15:04:13,337 INFO org.apache.hadoop.yarn.service.AbstractService: 
 Service:Dispatcher is stopped.
 2012-01-16 15:04:13,392 INFO 

[jira] [Commented] (YARN-18) Make locatlity in YARN's container assignment and task scheduling pluggable for other deployment topology

2013-03-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-18?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13592517#comment-13592517
 ] 

Hadoop QA commented on YARN-18:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12571919/YARN-18-v3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

  {color:red}-1 one of tests included doesn't have a timeout.{color}

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/459//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/459//console

This message is automatically generated.

 Make locatlity in YARN's container assignment and task scheduling pluggable 
 for other deployment topology
 -

 Key: YARN-18
 URL: https://issues.apache.org/jira/browse/YARN-18
 Project: Hadoop YARN
  Issue Type: New Feature
Affects Versions: 2.0.3-alpha
Reporter: Junping Du
Assignee: Junping Du
  Labels: features
 Attachments: 
 HADOOP-8474-ContainerAssignmentTaskScheduling-pluggable.patch, 
 MAPREDUCE-4309.patch, MAPREDUCE-4309-v2.patch, MAPREDUCE-4309-v3.patch, 
 MAPREDUCE-4309-v4.patch, MAPREDUCE-4309-v5.patch, MAPREDUCE-4309-v6.patch, 
 MAPREDUCE-4309-v7.patch, YARN-18.patch, YARN-18-v2.patch, YARN-18-v3.patch


 There are several classes in YARN's container assignment and task scheduling 
 algorithms that relate to data locality; they were updated to give preference 
 to running a container at localities other than node-local and rack-local 
 (such as nodegroup-local). This proposes making those data structures and 
 algorithms pluggable, e.g. SchedulerNode, RMNodeImpl, etc. The inner class 
 ScheduledRequests was made a package-level class so it would be easier to 
 create a subclass, ScheduledRequestsWithNodeGroup.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-196) Nodemanager if started before starting Resource manager is getting shutdown.But if both RM and NM are started and then after if RM is going down,NM is retrying for the RM

2013-03-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13592533#comment-13592533
 ] 

Hadoop QA commented on YARN-196:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12571927/YARN-196.7.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 tests included appear to have a timeout.{color}

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:red}-1 eclipse:eclipse{color}.  The patch failed to build with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/460//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/460//console

This message is automatically generated.

 Nodemanager if started before starting Resource manager is getting 
 shutdown.But if both RM and NM are started and then after if RM is going 
 down,NM is retrying for the RM.
 ---

 Key: YARN-196
 URL: https://issues.apache.org/jira/browse/YARN-196
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 3.0.0, 2.0.0-alpha
Reporter: Ramgopal N
Assignee: Xuan Gong
 Attachments: MAPREDUCE-3676.patch, YARN-196.1.patch, 
 YARN-196.2.patch, YARN-196.3.patch, YARN-196.4.patch, YARN-196.5.patch, 
 YARN-196.6.patch, YARN-196.7.patch


 If NM is started before starting the RM ,NM is shutting down with the 
 following error
 {code}
 ERROR org.apache.hadoop.yarn.service.CompositeService: Error starting 
 services org.apache.hadoop.yarn.server.nodemanager.NodeManager
 org.apache.avro.AvroRuntimeException: 
 java.lang.reflect.UndeclaredThrowableException
   at 
 org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:149)
   at 
 org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:68)
   at 
 org.apache.hadoop.yarn.server.nodemanager.NodeManager.start(NodeManager.java:167)
   at 
 org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:242)
 Caused by: java.lang.reflect.UndeclaredThrowableException
   at 
 org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:66)
   at 
 org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:182)
   at 
 org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:145)
   ... 3 more
 Caused by: com.google.protobuf.ServiceException: java.net.ConnectException: 
 Call From HOST-10-18-52-230/10.18.52.230 to HOST-10-18-52-250:8025 failed on 
 connection exception: java.net.ConnectException: Connection refused; For more 
 details see:  http://wiki.apache.org/hadoop/ConnectionRefused
   at 
 org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:131)
   at $Proxy23.registerNodeManager(Unknown Source)
   at 
 org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:59)
   ... 5 more
 Caused by: java.net.ConnectException: Call From 
 HOST-10-18-52-230/10.18.52.230 to HOST-10-18-52-250:8025 failed on connection 
 exception: java.net.ConnectException: Connection refused; For more details 
 see:  http://wiki.apache.org/hadoop/ConnectionRefused
   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:857)
   at org.apache.hadoop.ipc.Client.call(Client.java:1141)
   at org.apache.hadoop.ipc.Client.call(Client.java:1100)
   at 
 org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:128)
   ... 7 more
 Caused by: 

[jira] [Commented] (YARN-196) Nodemanager if started before starting Resource manager is getting shutdown.But if both RM and NM are started and then after if RM is going down,NM is retrying for the RM

2013-03-04 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13592544#comment-13592544
 ] 

Hitesh Shah commented on YARN-196:
--

+  throw new YarnException("Invalid Configuration. " +
+      RESOURCEMANAGER_CONNECT_RETRY_INTERVAL_SECS +
+      " should not be negative.");

RESOURCEMANAGER_CONNECT_RETRY_INTERVAL_SECS should be replaced with 
YarnConfiguration.RESOURCEMANAGER_CONNECT_RETRY_INTERVAL_SECS, as a user will 
not understand anything if the log/exception message contains a variable name; 
we should use the property name defined in the configs, since that gives the 
user a clearer explanation. Likewise, fix the exception thrown later in the 
code as well as the log messages (a sketch of the suggested wording follows).
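A minimal sketch of the suggested wording, assuming 
YarnConfiguration.RESOURCEMANAGER_CONNECT_RETRY_INTERVAL_SECS holds the 
configuration property name and retryIntervalSecs is a local variable in the 
patch (both are assumptions, not quotes from the patch):
{code}
// Hedged sketch: put the configuration property name, not the Java constant
// name, into the message the user will see.
if (retryIntervalSecs < 0) {
  throw new YarnException("Invalid Configuration. "
      + YarnConfiguration.RESOURCEMANAGER_CONNECT_RETRY_INTERVAL_SECS
      + " should not be negative.");
}
{code}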





 Nodemanager if started before starting Resource manager is getting 
 shutdown.But if both RM and NM are started and then after if RM is going 
 down,NM is retrying for the RM.
 ---

 Key: YARN-196
 URL: https://issues.apache.org/jira/browse/YARN-196
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 3.0.0, 2.0.0-alpha
Reporter: Ramgopal N
Assignee: Xuan Gong
 Attachments: MAPREDUCE-3676.patch, YARN-196.1.patch, 
 YARN-196.2.patch, YARN-196.3.patch, YARN-196.4.patch, YARN-196.5.patch, 
 YARN-196.6.patch, YARN-196.7.patch


 If NM is started before starting the RM ,NM is shutting down with the 
 following error
 {code}
 ERROR org.apache.hadoop.yarn.service.CompositeService: Error starting 
 services org.apache.hadoop.yarn.server.nodemanager.NodeManager
 org.apache.avro.AvroRuntimeException: 
 java.lang.reflect.UndeclaredThrowableException
   at 
 org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:149)
   at 
 org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:68)
   at 
 org.apache.hadoop.yarn.server.nodemanager.NodeManager.start(NodeManager.java:167)
   at 
 org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:242)
 Caused by: java.lang.reflect.UndeclaredThrowableException
   at 
 org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:66)
   at 
 org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:182)
   at 
 org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:145)
   ... 3 more
 Caused by: com.google.protobuf.ServiceException: java.net.ConnectException: 
 Call From HOST-10-18-52-230/10.18.52.230 to HOST-10-18-52-250:8025 failed on 
 connection exception: java.net.ConnectException: Connection refused; For more 
 details see:  http://wiki.apache.org/hadoop/ConnectionRefused
   at 
 org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:131)
   at $Proxy23.registerNodeManager(Unknown Source)
   at 
 org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:59)
   ... 5 more
 Caused by: java.net.ConnectException: Call From 
 HOST-10-18-52-230/10.18.52.230 to HOST-10-18-52-250:8025 failed on connection 
 exception: java.net.ConnectException: Connection refused; For more details 
 see:  http://wiki.apache.org/hadoop/ConnectionRefused
   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:857)
   at org.apache.hadoop.ipc.Client.call(Client.java:1141)
   at org.apache.hadoop.ipc.Client.call(Client.java:1100)
   at 
 org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:128)
   ... 7 more
 Caused by: java.net.ConnectException: Connection refused
   at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
   at 
 sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
   at 
 org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
   at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:659)
   at 
 org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:469)
   at 
 org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:563)
   at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:211)
   at org.apache.hadoop.ipc.Client.getConnection(Client.java:1247)
   at org.apache.hadoop.ipc.Client.call(Client.java:1117)
   ... 9 more
 2012-01-16 15:04:13,336 WARN org.apache.hadoop.yarn.event.AsyncDispatcher: 
 AsyncDispatcher thread interrupted
 java.lang.InterruptedException
   at 
 

[jira] [Updated] (YARN-198) If we are navigating to Nodemanager UI from Resourcemanager,then there is not link to navigate back to Resource manager

2013-03-04 Thread jian he (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jian he updated YARN-198:
-

Attachment: YARN-198.patch

 If we are navigating to Nodemanager UI from Resourcemanager,then there is not 
 link to navigate back to Resource manager
 ---

 Key: YARN-198
 URL: https://issues.apache.org/jira/browse/YARN-198
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Reporter: Ramgopal N
Assignee: jian he
Priority: Minor
  Labels: usability
 Attachments: YARN-198.patch


 If we navigate to the Nodemanager by clicking on the node link in the RM, 
 there is no link provided on the NM to navigate back to the RM.
  It would be good if there were a link to navigate back to the RM.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-198) If we are navigating to Nodemanager UI from Resourcemanager,then there is not link to navigate back to Resource manager

2013-03-04 Thread jian he (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jian he updated YARN-198:
-

Attachment: (was: YARN-198.patch)

 If we are navigating to Nodemanager UI from Resourcemanager,then there is not 
 link to navigate back to Resource manager
 ---

 Key: YARN-198
 URL: https://issues.apache.org/jira/browse/YARN-198
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Reporter: Ramgopal N
Assignee: jian he
Priority: Minor
  Labels: usability
 Attachments: YARN-198.patch


 If we navigate to the Nodemanager by clicking on the node link in the RM, 
 there is no link provided on the NM to navigate back to the RM.
  It would be good if there were a link to navigate back to the RM.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-198) If we are navigating to Nodemanager UI from Resourcemanager,then there is not link to navigate back to Resource manager

2013-03-04 Thread jian he (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jian he updated YARN-198:
-

Attachment: (was: YARN-198.patch)

 If we are navigating to Nodemanager UI from Resourcemanager,then there is not 
 link to navigate back to Resource manager
 ---

 Key: YARN-198
 URL: https://issues.apache.org/jira/browse/YARN-198
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Reporter: Ramgopal N
Assignee: jian he
Priority: Minor
  Labels: usability
 Attachments: YARN-198.patch


 If we navigate to the Nodemanager by clicking on the node link in the RM, 
 there is no link provided on the NM to navigate back to the RM.
  It would be good if there were a link to navigate back to the RM.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-227) Application expiration difficult to debug for end-users

2013-03-04 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13592558#comment-13592558
 ] 

Jonathan Eagles commented on YARN-227:
--

+1, Jason. If you can provide a branch-0.23 patch, I can check the code in 
there too.

 Application expiration difficult to debug for end-users
 ---

 Key: YARN-227
 URL: https://issues.apache.org/jira/browse/YARN-227
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 0.23.3, 2.0.1-alpha
Reporter: Jason Lowe
Assignee: Jason Lowe
  Labels: usability
 Attachments: YARN-227.patch


 When an AM attempt expires the AMLivelinessMonitor in the RM will kill the 
 job and mark it as failed.  However there are no diagnostic messages set for 
 the application indicating that the application failed because of expiration. 
  Even if the AM logs are examined, it's often not obvious that the 
 application was externally killed.  The only evidence of what happened to the 
 application is currently in the RM logs, and those are often not accessible 
 by users.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-369) Handle ( or throw a proper error when receiving) status updates from application masters that have not registered

2013-03-04 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13592567#comment-13592567
 ] 

Bikas Saha commented on YARN-369:
-

The RM already verifies that the app attempt is valid. This is done via the 
responseMap, which sounds similar to the map you propose. This map gets 
populated when the attempt is created, so the RM's ApplicationMasterService is 
informed that the new app attempt is the official one; look at 
ApplicationMasterService.registerAppAttempt().
Given the current state of the code, the simplest solution would be to set the 
responseId in ApplicationMasterService.registerAppAttempt() to Integer.MIN_VALUE 
(a negative number), and then in registerApplicationMaster() set the responseId 
of lastResponse to 0, because only after that may the application start issuing 
allocate requests. If the app calls allocate() before registering, the existing 
checks in allocate() will fail and we will be safe.
It would be great to add a test for this basic functionality.
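A hedged sketch of that flow, assuming responseMap maps attempts to their last 
AMResponse (the method bodies below are illustrative, not the actual 
ApplicationMasterService code):
{code}
// registerAppAttempt(): the attempt is known to the RM but not yet registered.
void registerAppAttempt(ApplicationAttemptId attemptId) {
  AMResponse lastResponse = Records.newRecord(AMResponse.class);
  lastResponse.setResponseId(Integer.MIN_VALUE); // marker: not registered yet
  responseMap.put(attemptId, lastResponse);
}

// registerApplicationMaster(): only now may the AM start calling allocate().
void registerApplicationMaster(ApplicationAttemptId attemptId) {
  responseMap.get(attemptId).setResponseId(0);
}
{code}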

 Handle ( or throw a proper error when receiving) status updates from 
 application masters that have not registered
 -

 Key: YARN-369
 URL: https://issues.apache.org/jira/browse/YARN-369
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Hitesh Shah
Assignee: Abhishek Kapoor

 Currently, an allocate call from an unregistered application is allowed and 
 the status update for it throws a statemachine error that is silently dropped.
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 STATUS_UPDATE at LAUNCHED
at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:445)
at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:588)
at 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:99)
at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:471)
at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:452)
at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:130)
at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:77)
at java.lang.Thread.run(Thread.java:680)
 ApplicationMasterService should likely throw an appropriate error for 
 applications' requests that should not be handled in such cases.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-198) If we are navigating to Nodemanager UI from Resourcemanager,then there is not link to navigate back to Resource manager

2013-03-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13592575#comment-13592575
 ] 

Hadoop QA commented on YARN-198:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12571937/YARN-198.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:

  
org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServer
  
org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerShutdown
  
org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/461//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/461//console

This message is automatically generated.

 If we are navigating to Nodemanager UI from Resourcemanager,then there is not 
 link to navigate back to Resource manager
 ---

 Key: YARN-198
 URL: https://issues.apache.org/jira/browse/YARN-198
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Reporter: Ramgopal N
Assignee: jian he
Priority: Minor
  Labels: usability
 Attachments: YARN-198.patch


 If we are navigating to Nodemanager by clicking on the node link in RM,there 
 is no link provided on the NM to navigate back to RM.
  If there is a link to navigate back to RM it would be good

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-448) Remove unnecessary hflush from log aggregation

2013-03-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13592588#comment-13592588
 ] 

Hudson commented on YARN-448:
-

Integrated in Hadoop-trunk-Commit #3412 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/3412/])
YARN-448. Remove unnecessary hflush from log aggregation (Kihwal Lee via 
bobby) (Revision 1452475)

 Result = SUCCESS
bobby : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1452475
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/AggregatedLogFormat.java


 Remove unnecessary hflush from log aggregation
 --

 Key: YARN-448
 URL: https://issues.apache.org/jira/browse/YARN-448
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 0.23.7, 2.0.4-beta
Reporter: Kihwal Lee
Assignee: Kihwal Lee
 Fix For: 3.0.0, 0.23.7, 2.0.4-beta

 Attachments: yarn-448.patch.txt


 AggregatedLogFormat#writeVersion() calls hflush() after writing the version. 
 Calling hflush does not seem to be necessary. It can add a lot of load to 
 hdfs in a big busy cluster.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-227) Application expiration difficult to debug for end-users

2013-03-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13592645#comment-13592645
 ] 

Hadoop QA commented on YARN-227:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12571954/YARN-227-branch-0.23.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

  {color:red}-1 one of tests included doesn't have a timeout.{color}

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:red}-1 eclipse:eclipse{color}.  The patch failed to build with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/463//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/463//console

This message is automatically generated.

 Application expiration difficult to debug for end-users
 ---

 Key: YARN-227
 URL: https://issues.apache.org/jira/browse/YARN-227
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 0.23.3, 2.0.1-alpha
Reporter: Jason Lowe
Assignee: Jason Lowe
  Labels: usability
 Attachments: YARN-227-branch-0.23.patch, YARN-227.patch


 When an AM attempt expires the AMLivelinessMonitor in the RM will kill the 
 job and mark it as failed.  However there are no diagnostic messages set for 
 the application indicating that the application failed because of expiration. 
  Even if the AM logs are examined, it's often not obvious that the 
 application was externally killed.  The only evidence of what happened to the 
 application is currently in the RM logs, and those are often not accessible 
 by users.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-345) Many InvalidStateTransitonException errors for ApplicationImpl in Node Manager

2013-03-04 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13592654#comment-13592654
 ] 

Jason Lowe commented on YARN-345:
-

+1, lgtm.

 Many InvalidStateTransitonException errors for ApplicationImpl in Node Manager
 --

 Key: YARN-345
 URL: https://issues.apache.org/jira/browse/YARN-345
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.0.2-alpha, 2.0.1-alpha, 0.23.5
Reporter: Devaraj K
Assignee: Robert Parker
Priority: Critical
 Attachments: YARN-345.patch, YARN-354v2.patch


 {code:xml}
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 FINISH_APPLICATION at FINISHED
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:398)
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:58)
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:520)
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:512)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
   at java.lang.Thread.run(Thread.java:662)
 {code}
 {code:xml}
 2013-01-17 04:03:46,726 WARN 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
  Can't handle this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 FINISH_APPLICATION at APPLICATION_RESOURCES_CLEANINGUP
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:398)
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:58)
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:520)
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:512)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
   at java.lang.Thread.run(Thread.java:662)
 {code}
 {code:xml}
 2013-01-17 00:01:11,006 WARN 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
  Can't handle this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 FINISH_APPLICATION at FINISHING_CONTAINERS_WAIT
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:398)
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:58)
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:520)
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:512)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
   at java.lang.Thread.run(Thread.java:662)
 {code}
 

[jira] [Commented] (YARN-345) Many InvalidStateTransitonException errors for ApplicationImpl in Node Manager

2013-03-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13592684#comment-13592684
 ] 

Hudson commented on YARN-345:
-

Integrated in Hadoop-trunk-Commit #3413 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/3413/])
YARN-345. Many InvalidStateTransitonException errors for ApplicationImpl in 
Node Manager. Contributed by Robert Parker (Revision 1452548)

 Result = SUCCESS
jlowe : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1452548
Files : 
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/application/ApplicationEventType.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/application/ApplicationImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/LogAggregationService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java


 Many InvalidStateTransitonException errors for ApplicationImpl in Node Manager
 --

 Key: YARN-345
 URL: https://issues.apache.org/jira/browse/YARN-345
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.0.2-alpha, 2.0.1-alpha, 0.23.5
Reporter: Devaraj K
Assignee: Robert Parker
Priority: Critical
 Attachments: YARN-345.patch, YARN-354v2.patch


 {code:xml}
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 FINISH_APPLICATION at FINISHED
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:398)
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:58)
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:520)
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:512)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
   at java.lang.Thread.run(Thread.java:662)
 {code}
 {code:xml}
 2013-01-17 04:03:46,726 WARN 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
  Can't handle this event at current state
 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
 FINISH_APPLICATION at APPLICATION_RESOURCES_CLEANINGUP
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:398)
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:58)
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:520)
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:512)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
   at java.lang.Thread.run(Thread.java:662)
 {code}
 {code:xml}
 2013-01-17 00:01:11,006 WARN 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
  Can't handle this event at current state
 

[jira] [Commented] (YARN-417) Add a poller that allows the AM to receive notifications when it is assigned containers

2013-03-04 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13592777#comment-13592777
 ] 

Bikas Saha commented on YARN-417:
-

Calling client.registerApp() before client.start() and client.stop() before 
client.unregister() is not in line with the Service interface. Services need to 
be started before use and stopped after use. Also, adding a number of services 
as part of a composite service is a common pattern: all services are added, 
inited, started, used, and then stopped. The composite service takes care of 
ordering between services. In such use cases, it may not be possible to call 
interface methods out of order as is being done here. We could enhance the 
heartbeater to not heartbeat until register is called, or we could start the 
heartbeater after registration is complete. The latter approach makes more 
sense to me (a sketch follows).
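A hedged sketch of that second approach; the field names and the exact 
registerApplicationMaster() signature below are assumptions about the patch, 
not quotes from it:
{code}
// Hedged sketch: start() stays a plain service start, and heartbeating only
// begins once the AM has actually registered with the RM.
public RegisterApplicationMasterResponse registerApplicationMaster(
    String appHostName, int appHostPort, String appTrackingUrl)
    throws YarnRemoteException {
  RegisterApplicationMasterResponse response =
      client.registerApplicationMaster(appHostName, appHostPort, appTrackingUrl);
  heartbeatThread.start(); // heartbeats are meaningless before registration
  return response;
}
{code}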

I am surprised that the DistShell code is calling resourceManager.stop() and 
then resourceManager.unregister(), because stop() eventually calls 
AMRMClientImpl.stop(), which shuts down the proxy. After that, the unregister() 
call on AMRMClientImpl should fail.

Why are we calling client.start() in the init() method and not at the beginning 
of the start() method? Perhaps this is related to the above comment.
{code}
+  @Override
+  public void init(Configuration conf) {
+    super.init(conf);
+    client.init(conf);
+    client.start();
+  }
{code}

Why not wait for the handlerThread to join()? The comment does not match the 
code for the heartbeat thread.
{code}
+  /**
+   * Tells the heartbeat thread to stop, but does not wait for it to return.
+   */
+  @Override
+  public void stop() {
+    client.stop();
+    keepRunning = false;
+    try {
+      heartbeatThread.join();
+    } catch (InterruptedException ex) {
+      LOG.error("Error joining with heartbeat thread", ex);
+    }
+    handlerThread.interrupt();
+  }
{code}

In general, it would be good to spend some thought on the thread safety of the 
new class: both the external calls from the app and the internal 
producer/consumer race between the heartbeat and callback threads, during 
startup, execution, and shutdown. I haven't thought through them all, but the 
almost complete absence of any synchronization made me wonder whether it was by 
design.
I would prefer queue.put(), which blocks on capacity, over queue.add(), to 
mirror queue.take() (see the small illustration below).
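For reference, a small standalone illustration of the put()/take() pairing (not 
code from the patch; the queue element type and capacity are arbitrary):
{code}
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class PutVsAdd {
  public static void main(String[] args) throws InterruptedException {
    // put() blocks when the queue is full, mirroring take() which blocks when
    // the queue is empty; add() would instead throw IllegalStateException
    // once capacity is reached.
    BlockingQueue<String> responses = new ArrayBlockingQueue<String>(16);
    responses.put("allocate-response");   // producer side (heartbeat thread)
    System.out.println(responses.take()); // consumer side (callback handler)
  }
}
{code}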

Could we save some time using wait/notify? This is important for end-to-end 
test time.
{code}
+    while (!done) {
+      try {
+        Thread.sleep(1000);
+      } catch (InterruptedException ex) {}
+    }
{code}
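
As a hedged illustration of the wait/notify suggestion (names are hypothetical, 
not from the patch), the sleep poll above could be replaced by something like:
{code}
// Hypothetical sketch only: replacing the one-second sleep poll with
// wait()/notifyAll(), so the waiting thread wakes up as soon as done flips.
public class DoneLatchSketch {

  private final Object lock = new Object();
  private boolean done = false;

  public void awaitDone() throws InterruptedException {
    synchronized (lock) {
      while (!done) {
        lock.wait();  // no fixed sleep; woken promptly by markDone()
      }
    }
  }

  public void markDone() {
    synchronized (lock) {
      done = true;
      lock.notifyAll();
    }
  }
}
{code}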

Looks like this is only for tests. If yes, how about making it package-private 
and annotating it with @Private and @VisibleForTesting.
{code}
+  public AMRMClientAsync(AMRMClient client, int intervalMs,
+  CallbackHandler callbackHandler) {
{code}
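
A rough sketch of that suggestion, assuming Hadoop's InterfaceAudience.Private 
and Guava's VisibleForTesting annotations are on the classpath; the class and 
field names are placeholders, not the real AMRMClientAsync:
{code}
// Hypothetical sketch only: a package-private constructor marked for test use.
import org.apache.hadoop.classification.InterfaceAudience.Private;

import com.google.common.annotations.VisibleForTesting;

public class AsyncClientCtorSketch {

  private final Object client;   // stands in for the wrapped client
  private final int intervalMs;

  @Private
  @VisibleForTesting
  AsyncClientCtorSketch(Object client, int intervalMs) {  // package-private
    this.client = client;
    this.intervalMs = intervalMs;
  }
}
{code}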

A committer once told me that the philosophy behind BuilderUtils is to pass all 
members of the object being built and use it as a completely defined 
constructor, so that folks don't miss passing any member fields by accident. So 
I guess nodeUpdates and reboot should also be passed in as arguments.
{code}
+  public static AMResponse newAMResponse(
+      List<ContainerStatus> completedContainers,
+      List<Container> allocatedContainers) {
{code}
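
To make the BuilderUtils philosophy concrete, here is a purely hypothetical 
factory (not the real AMResponse or BuilderUtils API) that takes every field of 
the object it builds, so a caller cannot silently omit one:
{code}
// Purely hypothetical illustration: a factory method that takes every member
// of the object it builds, so callers cannot forget to pass one of them.
import java.util.List;

public class FullyDefinedFactorySketch {

  public static class Response {
    List<String> completedContainers;
    List<String> allocatedContainers;
    List<String> updatedNodes;
    boolean reboot;
  }

  public static Response newResponse(List<String> completedContainers,
      List<String> allocatedContainers, List<String> updatedNodes,
      boolean reboot) {
    Response r = new Response();
    r.completedContainers = completedContainers;
    r.allocatedContainers = allocatedContainers;
    r.updatedNodes = updatedNodes;
    r.reboot = reboot;
    return r;
  }
}
{code}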

I would like the test code to not exemplify incorrect use of the class. The 
test is calling allocate without calling register, and it all works. Maybe if 
we fix the first comment in this review then it won't allow such incorrect 
usage. Secondly, folks tend to look at test code to see the usage of a class, 
so showing incorrect usage is not a good idea IMO.
{code}
+    AMRMClientAsync asyncClient =
+        new AMRMClientAsync(client, 200, callbackHandler);
+    asyncClient.init(conf);
+    asyncClient.start();
+
+    while (callbackHandler.takeAllocatedContainers() == null) {
+
{code}

This code can lead to a flaky test. If I understand the flow correctly, the 
following can happen: the CallbackHandler populates allocatedContainers and the 
OS pauses it. In the meanwhile the heartbeater has already delivered 
completedContainers. The main thread then takes the allocated containers and it 
pauses. The CallbackHandler then returns and onCompletedContainers() is called, 
which populates the completed containers. Then it pauses. The main thread 
executes takeCompletedContainers(), which returns non-null, and the Assert 
fails. Is this a correct understanding? If yes, we should make sure that the 
test does not end up being flaky. In general sleep() should be avoided because 
it makes tests slow and tends to make them flaky. I agree that in some cases 
sleep is hard to avoid, when the test is running an inline service whose timing 
we cannot control or when the effort to do so is too large. But in this case, 
where all the code is test code or mock code, we could avoid sleeping.
{code}
+    while (callbackHandler.takeAllocatedContainers() == null) {
{code}

[jira] [Updated] (YARN-196) Nodemanager if started before starting Resource manager is getting shutdown.But if both RM and NM are started and then after if RM is going down,NM is retrying for the RM.

2013-03-04 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-196:
---

Attachment: YARN-196.8.patch

 Nodemanager if started before starting Resource manager is getting 
 shutdown.But if both RM and NM are started and then after if RM is going 
 down,NM is retrying for the RM.
 ---

 Key: YARN-196
 URL: https://issues.apache.org/jira/browse/YARN-196
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 3.0.0, 2.0.0-alpha
Reporter: Ramgopal N
Assignee: Xuan Gong
 Attachments: MAPREDUCE-3676.patch, YARN-196.1.patch, 
 YARN-196.2.patch, YARN-196.3.patch, YARN-196.4.patch, YARN-196.5.patch, 
 YARN-196.6.patch, YARN-196.7.patch, YARN-196.8.patch


 If the NM is started before starting the RM, the NM shuts down with the 
 following error
 {code}
 ERROR org.apache.hadoop.yarn.service.CompositeService: Error starting 
 services org.apache.hadoop.yarn.server.nodemanager.NodeManager
 org.apache.avro.AvroRuntimeException: 
 java.lang.reflect.UndeclaredThrowableException
   at 
 org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:149)
   at 
 org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:68)
   at 
 org.apache.hadoop.yarn.server.nodemanager.NodeManager.start(NodeManager.java:167)
   at 
 org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:242)
 Caused by: java.lang.reflect.UndeclaredThrowableException
   at 
 org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:66)
   at 
 org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:182)
   at 
 org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:145)
   ... 3 more
 Caused by: com.google.protobuf.ServiceException: java.net.ConnectException: 
 Call From HOST-10-18-52-230/10.18.52.230 to HOST-10-18-52-250:8025 failed on 
 connection exception: java.net.ConnectException: Connection refused; For more 
 details see:  http://wiki.apache.org/hadoop/ConnectionRefused
   at 
 org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:131)
   at $Proxy23.registerNodeManager(Unknown Source)
   at 
 org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:59)
   ... 5 more
 Caused by: java.net.ConnectException: Call From 
 HOST-10-18-52-230/10.18.52.230 to HOST-10-18-52-250:8025 failed on connection 
 exception: java.net.ConnectException: Connection refused; For more details 
 see:  http://wiki.apache.org/hadoop/ConnectionRefused
   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:857)
   at org.apache.hadoop.ipc.Client.call(Client.java:1141)
   at org.apache.hadoop.ipc.Client.call(Client.java:1100)
   at 
 org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:128)
   ... 7 more
 Caused by: java.net.ConnectException: Connection refused
   at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
   at 
 sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
   at 
 org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
   at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:659)
   at 
 org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:469)
   at 
 org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:563)
   at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:211)
   at org.apache.hadoop.ipc.Client.getConnection(Client.java:1247)
   at org.apache.hadoop.ipc.Client.call(Client.java:1117)
   ... 9 more
 2012-01-16 15:04:13,336 WARN org.apache.hadoop.yarn.event.AsyncDispatcher: 
 AsyncDispatcher thread interrupted
 java.lang.InterruptedException
   at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1899)
   at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1934)
   at 
 java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:76)
   at java.lang.Thread.run(Thread.java:619)
 2012-01-16 15:04:13,337 INFO org.apache.hadoop.yarn.service.AbstractService: 
 Service:Dispatcher is stopped.
 2012-01-16 

[jira] [Updated] (YARN-449) MRAppMaster classpath not set properly for unit tests in downstream projects

2013-03-04 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated YARN-449:


Summary: MRAppMaster classpath not set properly for unit tests in 
downstream projects  (was: MRAppMaster not set properly for unit tests in 
downstream projects)

 MRAppMaster classpath not set properly for unit tests in downstream projects
 

 Key: YARN-449
 URL: https://issues.apache.org/jira/browse/YARN-449
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.0.3-alpha
Reporter: Siddharth Seth
Priority: Blocker

 Post YARN-429, unit tests for HBase continue to fail since the classpath for 
 the MRAppMaster is not being set correctly.
 Reverting YARN-129 may fix this, but I'm not sure that's the correct 
 solution. My guess is, as Alexandro pointed out in YARN-129, maven 
 classloader magic is messing up java.class.path.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-227) Application expiration difficult to debug for end-users

2013-03-04 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13592823#comment-13592823
 ] 

Jonathan Eagles commented on YARN-227:
--

It looks like the eclipse:eclipse issue is spurious.

 Application expiration difficult to debug for end-users
 ---

 Key: YARN-227
 URL: https://issues.apache.org/jira/browse/YARN-227
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 0.23.3, 2.0.1-alpha
Reporter: Jason Lowe
Assignee: Jason Lowe
  Labels: usability
 Attachments: YARN-227-branch-0.23.patch, YARN-227.patch


 When an AM attempt expires the AMLivelinessMonitor in the RM will kill the 
 job and mark it as failed.  However there are no diagnostic messages set for 
 the application indicating that the application failed because of expiration. 
  Even if the AM logs are examined, it's often not obvious that the 
 application was externally killed.  The only evidence of what happened to the 
 application is currently in the RM logs, and those are often not accessible 
 by users.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-378) ApplicationMaster retry times should be set by Client

2013-03-04 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13592824#comment-13592824
 ] 

Zhijie Shen commented on YARN-378:
--

My strategy is that:

1. Create another Yarn property, yarn.application.am.max-retries, which is the 
name of the application-specific max retry number (no default value is 
required).

2. The number is passed from the client to the resourcemanager (set by the 
client and embedded in job.xml).

3. If yarn.application.am.max-retries is not set, the value of 
yarn.resourcemanager.am.max-retries is used. Otherwise, if 
yarn.application.am.max-retries <= yarn.resourcemanager.am.max-retries, the 
value of yarn.application.am.max-retries is used. In the remaining case, the 
value of yarn.resourcemanager.am.max-retries is used and a warning record is 
logged. A rough sketch of this resolution follows.
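
A rough sketch of the proposed resolution logic; the per-application key is the 
name proposed in this comment, not an existing YarnConfiguration constant, and 
the RM-wide default used here is only a placeholder:
{code}
// Hypothetical sketch of the proposed resolution only.
import org.apache.hadoop.conf.Configuration;

public class MaxRetriesResolutionSketch {

  static int resolveMaxRetries(Configuration conf) {
    // Placeholder default for the RM-wide limit.
    int rmMax = conf.getInt("yarn.resourcemanager.am.max-retries", 1);
    // If the application-specific value is not set, fall back to the RM value.
    int appMax = conf.getInt("yarn.application.am.max-retries", rmMax);
    if (appMax <= rmMax) {
      return appMax;  // application-specific value wins when within the limit
    }
    // Otherwise cap at the RM-wide limit and record a warning.
    System.err.println("Requested AM max retries " + appMax
        + " exceed the RM limit " + rmMax + "; using " + rmMax);
    return rmMax;
  }
}
{code}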

What do you think about the strategy?

 ApplicationMaster retry times should be set by Client
 -

 Key: YARN-378
 URL: https://issues.apache.org/jira/browse/YARN-378
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client, resourcemanager
 Environment: suse
Reporter: xieguiming
Assignee: Zhijie Shen
  Labels: usability

 We should support different clients or users having different 
 ApplicationMaster retry times. That is to say, yarn.resourcemanager.am.max-retries 
 should be settable by the client. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-198) If we are navigating to Nodemanager UI from Resourcemanager,then there is not link to navigate back to Resource manager

2013-03-04 Thread jian he (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jian he updated YARN-198:
-

Attachment: YARN-198.patch

 If we are navigating to Nodemanager UI from Resourcemanager,then there is not 
 link to navigate back to Resource manager
 ---

 Key: YARN-198
 URL: https://issues.apache.org/jira/browse/YARN-198
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Reporter: Ramgopal N
Assignee: jian he
Priority: Minor
  Labels: usability
 Attachments: YARN-198.patch


 If we are navigating to the Nodemanager by clicking on the node link in the 
 RM, there is no link provided on the NM to navigate back to the RM.
  It would be good if there were a link to navigate back to the RM.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-198) If we are navigating to Nodemanager UI from Resourcemanager,then there is not link to navigate back to Resource manager

2013-03-04 Thread jian he (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jian he updated YARN-198:
-

Attachment: (was: YARN-198.patch)

 If we are navigating to Nodemanager UI from Resourcemanager,then there is not 
 link to navigate back to Resource manager
 ---

 Key: YARN-198
 URL: https://issues.apache.org/jira/browse/YARN-198
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Reporter: Ramgopal N
Assignee: jian he
Priority: Minor
  Labels: usability
 Attachments: YARN-198.patch


 If we are navigating to the Nodemanager by clicking on the node link in the 
 RM, there is no link provided on the NM to navigate back to the RM.
  It would be good if there were a link to navigate back to the RM.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-198) If we are navigating to Nodemanager UI from Resourcemanager,then there is not link to navigate back to Resource manager

2013-03-04 Thread jian he (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jian he updated YARN-198:
-

Attachment: YARN-198.patch

 If we are navigating to Nodemanager UI from Resourcemanager,then there is not 
 link to navigate back to Resource manager
 ---

 Key: YARN-198
 URL: https://issues.apache.org/jira/browse/YARN-198
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Reporter: Ramgopal N
Assignee: jian he
Priority: Minor
  Labels: usability
 Attachments: YARN-198.patch


 If we are navigating to the Nodemanager by clicking on the node link in the 
 RM, there is no link provided on the NM to navigate back to the RM.
  It would be good if there were a link to navigate back to the RM.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-198) If we are navigating to Nodemanager UI from Resourcemanager,then there is not link to navigate back to Resource manager

2013-03-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13592875#comment-13592875
 ] 

Hadoop QA commented on YARN-198:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12572000/YARN-198.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/465//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/465//console

This message is automatically generated.

 If we are navigating to Nodemanager UI from Resourcemanager,then there is not 
 link to navigate back to Resource manager
 ---

 Key: YARN-198
 URL: https://issues.apache.org/jira/browse/YARN-198
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Reporter: Ramgopal N
Assignee: jian he
Priority: Minor
  Labels: usability
 Attachments: YARN-198.patch


 If we are navigating to the Nodemanager by clicking on the node link in the 
 RM, there is no link provided on the NM to navigate back to the RM.
  It would be good if there were a link to navigate back to the RM.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-449) MRAppMaster classpath not set properly for unit tests in downstream projects

2013-03-04 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated YARN-449:


Attachment: hbase-TestHFileOutputFormat-wip.txt

With this change, I was able to get TestHFileOutputFormat#testWritingPEData to 
pass.

 MRAppMaster classpath not set properly for unit tests in downstream projects
 

 Key: YARN-449
 URL: https://issues.apache.org/jira/browse/YARN-449
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.0.3-alpha
Reporter: Siddharth Seth
Priority: Blocker
 Attachments: hbase-TestHFileOutputFormat-wip.txt


 Post YARN-429, unit tests for HBase continue to fail since the classpath for 
 the MRAppMaster is not being set correctly.
 Reverting YARN-129 may fix this, but I'm not sure that's the correct 
 solution. My guess is, as Alexandro pointed out in YARN-129, maven 
 classloader magic is messing up java.class.path.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-449) MRAppMaster classpath not set properly for unit tests in downstream projects

2013-03-04 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13592915#comment-13592915
 ] 

Hitesh Shah commented on YARN-449:
--

This probably will work for the short term, until the internal implementation of 
MiniYarnCluster, or any other minicluster for that matter, introduces a new 
config property that it needs or refers to. 

Looking at the hbase tests, it seems that instead of using the config object 
returned by the MiniMRCluster and building on top of it, they try to do some 
form of a union between two confs. In such cases there is always a chance of 
missing some internal settings. I believe there was an earlier fix to set 
the framework.name to 'yarn' to solve something similar to the current problem 
when hbase started running tests against 0.23.

[~te...@apache.org], do you have any comments on the above? Is it possible to 
change the base test class for hbase unit tests to build upon the config 
provided by the mini cluster? Any reason for not doing so? 
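
For illustration, a minimal sketch of what building upon the config provided by 
the mini cluster could look like in a downstream test; the property override 
shown is only an example:
{code}
// Hypothetical sketch only: a downstream test deriving its job configuration
// from the config handed back by the mini cluster, instead of merging two
// independently created Configuration objects.
import org.apache.hadoop.conf.Configuration;

public class DownstreamTestConfSketch {

  static Configuration jobConf(Configuration miniClusterConf) {
    // Copying the mini cluster's config preserves settings such as
    // yarn.is.minicluster and the generated AM classpath entries.
    Configuration conf = new Configuration(miniClusterConf);
    // Test-specific overrides are then layered on the copy (example only).
    conf.set("mapreduce.task.timeout", "600000");
    return conf;
  }
}
{code}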

 MRAppMaster classpath not set properly for unit tests in downstream projects
 

 Key: YARN-449
 URL: https://issues.apache.org/jira/browse/YARN-449
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.0.3-alpha
Reporter: Siddharth Seth
Priority: Blocker
 Attachments: hbase-TestHFileOutputFormat-wip.txt


 Post YARN-429, unit tests for HBase continue to fail since the classpath for 
 the MRAppMaster is not being set correctly.
 Reverting YARN-129 may fix this, but I'm not sure that's the correct 
 solution. My guess is, as Alexandro pointed out in YARN-129, maven 
 classloader magic is messing up java.class.path.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-18) Make locatlity in YARN's container assignment and task scheduling pluggable for other deployment topology

2013-03-04 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-18?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-18:
---

Attachment: YARN-18-v3.1.patch

Added a timeout to the test in the v3.1 patch.

 Make locatlity in YARN's container assignment and task scheduling pluggable 
 for other deployment topology
 -

 Key: YARN-18
 URL: https://issues.apache.org/jira/browse/YARN-18
 Project: Hadoop YARN
  Issue Type: New Feature
Affects Versions: 2.0.3-alpha
Reporter: Junping Du
Assignee: Junping Du
  Labels: features
 Attachments: 
 HADOOP-8474-ContainerAssignmentTaskScheduling-pluggable.patch, 
 MAPREDUCE-4309.patch, MAPREDUCE-4309-v2.patch, MAPREDUCE-4309-v3.patch, 
 MAPREDUCE-4309-v4.patch, MAPREDUCE-4309-v5.patch, MAPREDUCE-4309-v6.patch, 
 MAPREDUCE-4309-v7.patch, YARN-18.patch, YARN-18-v2.patch, YARN-18-v3.1.patch, 
 YARN-18-v3.patch


 There are several classes in YARN’s container assignment and task scheduling 
 algorithms that relate to data locality which were updated to give preference 
 to running a container on localities other than node-local and rack-local 
 (like nodegroup-local). This proposes to make these data structures/algorithms 
 pluggable, like SchedulerNode, RMNodeImpl, etc. The inner class 
 ScheduledRequests was made a package-level class so it would be easier to 
 create a subclass, ScheduledRequestsWithNodeGroup.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-449) MRAppMaster classpath not set properly for unit tests in downstream projects

2013-03-04 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13592981#comment-13592981
 ] 

Ted Yu commented on YARN-449:
-

It is possible to change the HBase test class.

I will spend some time tomorrow understanding why the following code in 
MiniYARNCluster doesn't give us the expected effect:
{code}
public synchronized void start() {
  try {
    getConfig().setBoolean(YarnConfiguration.IS_MINI_YARN_CLUSTER, true);
{code}


 MRAppMaster classpath not set properly for unit tests in downstream projects
 

 Key: YARN-449
 URL: https://issues.apache.org/jira/browse/YARN-449
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.0.3-alpha
Reporter: Siddharth Seth
Priority: Blocker
 Attachments: hbase-TestHFileOutputFormat-wip.txt


 Post YARN-429, unit tests for HBase continue to fail since the classpath for 
 the MRAppMaster is not being set correctly.
 Reverting YARN-129 may fix this, but I'm not sure that's the correct 
 solution. My guess is, as Alexandro pointed out in YARN-129, maven 
 classloader magic is messing up java.class.path.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-18) Make locatlity in YARN's container assignment and task scheduling pluggable for other deployment topology

2013-03-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-18?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13592986#comment-13592986
 ] 

Hadoop QA commented on YARN-18:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12572014/YARN-18-v3.1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

  {color:red}-1 one of tests included doesn't have a timeout.{color}

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/466//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/466//console

This message is automatically generated.

 Make locatlity in YARN's container assignment and task scheduling pluggable 
 for other deployment topology
 -

 Key: YARN-18
 URL: https://issues.apache.org/jira/browse/YARN-18
 Project: Hadoop YARN
  Issue Type: New Feature
Affects Versions: 2.0.3-alpha
Reporter: Junping Du
Assignee: Junping Du
  Labels: features
 Attachments: 
 HADOOP-8474-ContainerAssignmentTaskScheduling-pluggable.patch, 
 MAPREDUCE-4309.patch, MAPREDUCE-4309-v2.patch, MAPREDUCE-4309-v3.patch, 
 MAPREDUCE-4309-v4.patch, MAPREDUCE-4309-v5.patch, MAPREDUCE-4309-v6.patch, 
 MAPREDUCE-4309-v7.patch, YARN-18.patch, YARN-18-v2.patch, YARN-18-v3.1.patch, 
 YARN-18-v3.patch


 There are several classes in YARN’s container assignment and task scheduling 
 algorithms that relate to data locality which were updated to give preference 
 to running a container on localities other than node-local and rack-local 
 (like nodegroup-local). This proposes to make these data structures/algorithms 
 pluggable, like SchedulerNode, RMNodeImpl, etc. The inner class 
 ScheduledRequests was made a package-level class so it would be easier to 
 create a subclass, ScheduledRequestsWithNodeGroup.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-417) Add a poller that allows the AM to receive notifications when it is assigned containers

2013-03-04 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13593023#comment-13593023
 ] 

Sandy Ryza commented on YARN-417:
-

Thanks for the detailed comments, Bikas.  Other than what's discussed below, 
I'll make the changes you suggest (switch to wait/notify, you're right about 
the race in the test, will have the heartbeater start in the register method, 
etc.)

bq. Why not wait for the handlerThread to join()?
My thought was that the user should be able to call stop() from the callback 
handler and not deadlock.  Even if we were to explicitly warn against this, 
users would be likely to try it anyway and encounter difficulties.

Regarding synchronization, I had put some thought into it, and my understanding 
is that it should work without synchronized methods.  A coarse version of the 
thinking behind this is:
* All the methods of AMRMClientAsync other than init(), start(), and stop() do 
not touch any variables in AMRMClientAsync and delegate to AMRMClient.  
AMRMClient handles the interleaving of any of these methods with each other, 
and interleaving them with start(), stop(), and init().
* If any of these methods are interleaved with stop(), there will be no problem.
* Calling any of these methods before or at the same time as init() or start() 
is incorrect use of the class, and can cause problems even if the methods are 
synchronized.  Additionally, after the start/register change you proposed, all 
that init() and start() will do is delegate to AMRMClient anyway.
Let me know if you see anything I'm missing.
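
For reference, the shape of that argument in code, with hypothetical names 
rather than the real AMRMClientAsync: the delegating methods hold no wrapper 
state of their own, so interleaving them gives the wrapper nothing to 
synchronize.
{code}
// Hypothetical names only: request methods on the wrapper simply forward to
// the wrapped client and touch no fields of the wrapper itself.
public class ForwardingWrapperSketch {

  public interface Client {
    void addContainerRequest(String request);
    void removeContainerRequest(String request);
  }

  private final Client client;

  public ForwardingWrapperSketch(Client client) {
    this.client = client;
  }

  public void addContainerRequest(String request) {
    client.addContainerRequest(request);     // no wrapper fields touched
  }

  public void removeContainerRequest(String request) {
    client.removeContainerRequest(request);  // no wrapper fields touched
  }
}
{code}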

 Add a poller that allows the AM to receive notifications when it is assigned 
 containers
 ---

 Key: YARN-417
 URL: https://issues.apache.org/jira/browse/YARN-417
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, applications
Affects Versions: 2.0.3-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: AMRMClientAsync-1.java, AMRMClientAsync.java, 
 YARN-417-1.patch, YARN-417-2.patch, YARN-417-3.patch, YARN-417.patch, 
 YarnAppMaster.java, YarnAppMasterListener.java


 Writing AMs would be easier for some if they did not have to handle 
 heartbeating to the RM on their own.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-18) Make locatlity in YARN's container assignment and task scheduling pluggable for other deployment topology

2013-03-04 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-18?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-18:
---

Attachment: YARN-18-v3.2.patch

Added timeouts to all related tests in the v3.2 patch.

 Make locatlity in YARN's container assignment and task scheduling pluggable 
 for other deployment topology
 -

 Key: YARN-18
 URL: https://issues.apache.org/jira/browse/YARN-18
 Project: Hadoop YARN
  Issue Type: New Feature
Affects Versions: 2.0.3-alpha
Reporter: Junping Du
Assignee: Junping Du
  Labels: features
 Attachments: 
 HADOOP-8474-ContainerAssignmentTaskScheduling-pluggable.patch, 
 MAPREDUCE-4309.patch, MAPREDUCE-4309-v2.patch, MAPREDUCE-4309-v3.patch, 
 MAPREDUCE-4309-v4.patch, MAPREDUCE-4309-v5.patch, MAPREDUCE-4309-v6.patch, 
 MAPREDUCE-4309-v7.patch, YARN-18.patch, YARN-18-v2.patch, YARN-18-v3.1.patch, 
 YARN-18-v3.2.patch, YARN-18-v3.patch


 There are several classes in YARN’s container assignment and task scheduling 
 algorithms that relate to data locality which were updated to give preference 
 to running a container on localities other than node-local and rack-local 
 (like nodegroup-local). This proposes to make these data structures/algorithms 
 pluggable, like SchedulerNode, RMNodeImpl, etc. The inner class 
 ScheduledRequests was made a package-level class so it would be easier to 
 create a subclass, ScheduledRequestsWithNodeGroup.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-449) MRAppMaster classpath not set properly for unit tests in downstream projects

2013-03-04 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated YARN-449:


Attachment: hbase-TestingUtility-wip.txt

Patch where I try to add yarn.is.minicluster at cluster startup

 MRAppMaster classpath not set properly for unit tests in downstream projects
 

 Key: YARN-449
 URL: https://issues.apache.org/jira/browse/YARN-449
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.0.3-alpha
Reporter: Siddharth Seth
Priority: Blocker
 Attachments: hbase-TestHFileOutputFormat-wip.txt, 
 hbase-TestingUtility-wip.txt


 Post YARN-429, unit tests for HBase continue to fail since the classpath for 
 the MRAppMaster is not being set correctly.
 Reverting YARN-129 may fix this, but I'm not sure that's the correct 
 solution. My guess is, as Alexandro pointed out in YARN-129, maven 
 classloader magic is messing up java.class.path.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-18) Make locatlity in YARN's container assignment and task scheduling pluggable for other deployment topology

2013-03-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-18?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13593056#comment-13593056
 ] 

Hadoop QA commented on YARN-18:
---

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12572023/YARN-18-v3.2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 tests included appear to have a timeout.{color}

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/467//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/467//console

This message is automatically generated.

 Make locatlity in YARN's container assignment and task scheduling pluggable 
 for other deployment topology
 -

 Key: YARN-18
 URL: https://issues.apache.org/jira/browse/YARN-18
 Project: Hadoop YARN
  Issue Type: New Feature
Affects Versions: 2.0.3-alpha
Reporter: Junping Du
Assignee: Junping Du
  Labels: features
 Attachments: 
 HADOOP-8474-ContainerAssignmentTaskScheduling-pluggable.patch, 
 MAPREDUCE-4309.patch, MAPREDUCE-4309-v2.patch, MAPREDUCE-4309-v3.patch, 
 MAPREDUCE-4309-v4.patch, MAPREDUCE-4309-v5.patch, MAPREDUCE-4309-v6.patch, 
 MAPREDUCE-4309-v7.patch, YARN-18.patch, YARN-18-v2.patch, YARN-18-v3.1.patch, 
 YARN-18-v3.2.patch, YARN-18-v3.patch


 There are several classes in YARN’s container assignment and task scheduling 
 algorithms that relate to data locality which were updated to give preference 
 to running a container on localities other than node-local and rack-local 
 (like nodegroup-local). This proposes to make these data structures/algorithms 
 pluggable, like SchedulerNode, RMNodeImpl, etc. The inner class 
 ScheduledRequests was made a package-level class so it would be easier to 
 create a subclass, ScheduledRequestsWithNodeGroup.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-449) MRAppMaster classpath not set properly for unit tests in downstream projects

2013-03-04 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13593080#comment-13593080
 ] 

Ted Yu commented on YARN-449:
-

The second patch made TestTableMapReduce pass based on 2.0.4-SNAPSHOT:
{code}
Running org.apache.hadoop.hbase.mapreduce.TestTableMapReduce
2013-03-04 21:09:47.027 java[28537:1203] Unable to load realm info from 
SCDynamicStore
2013-03-04 21:09:47.166 java[28537:1203] Unable to load realm info from 
SCDynamicStore
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 238.86 sec
{code}


 MRAppMaster classpath not set properly for unit tests in downstream projects
 

 Key: YARN-449
 URL: https://issues.apache.org/jira/browse/YARN-449
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.0.3-alpha
Reporter: Siddharth Seth
Priority: Blocker
 Attachments: hbase-TestHFileOutputFormat-wip.txt, 
 hbase-TestingUtility-wip.txt


 Post YARN-429, unit tests for HBase continue to fail since the classpath for 
 the MRAppMaster is not being set correctly.
 Reverting YARN-129 may fix this, but I'm not sure that's the correct 
 solution. My guess is, as Alexandro pointed out in YARN-129, maven 
 classloader magic is messing up java.class.path.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-447) applicationComparator improvement for CS

2013-03-04 Thread nemon lou (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nemon lou updated YARN-447:
---

Attachment: YARN-447-trunk.patch

Use a real applicationId instead of a mock one in TestUtil, so the 
applicationId's compareTo method will do its work.
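
A rough sketch of what the TestUtil change could look like, assuming the 
Records factory and the ApplicationId setters available in this code line; not 
taken from the attached patch:
{code}
// Rough sketch only, under the assumptions stated above.
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.util.Records;

public class RealAppIdSketch {

  static ApplicationId newAppId(long clusterTimestamp, int id) {
    // A real ApplicationId carries the cluster timestamp, so its compareTo()
    // is exercised instead of a mock's.
    ApplicationId appId = Records.newRecord(ApplicationId.class);
    appId.setClusterTimestamp(clusterTimestamp);
    appId.setId(id);
    return appId;
  }
}
{code}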

 applicationComparator improvement for CS
 

 Key: YARN-447
 URL: https://issues.apache.org/jira/browse/YARN-447
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Affects Versions: 2.0.3-alpha
Reporter: nemon lou
Priority: Minor
 Attachments: YARN-447-trunk.patch, YARN-447-trunk.patch


 Now the compare code is :
 return a1.getApplicationId().getId() - a2.getApplicationId().getId();
 Will be replaced with :
 return a1.getApplicationId().compareTo(a2.getApplicationId());
 This will bring some benefits:
 1,leave applicationId compare logic to ApplicationId class;
 2,In future's HA mode,cluster time stamp may change,ApplicationId class 
 already takes care of this condition.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-447) applicationComparator improvement for CS

2013-03-04 Thread nemon lou (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nemon lou updated YARN-447:
---

Attachment: YARN-447-trunk.patch

Adding a timeout

 applicationComparator improvement for CS
 

 Key: YARN-447
 URL: https://issues.apache.org/jira/browse/YARN-447
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Affects Versions: 2.0.3-alpha
Reporter: nemon lou
Priority: Minor
 Attachments: YARN-447-trunk.patch, YARN-447-trunk.patch, 
 YARN-447-trunk.patch


 Now the compare code is :
 return a1.getApplicationId().getId() - a2.getApplicationId().getId();
 Will be replaced with :
 return a1.getApplicationId().compareTo(a2.getApplicationId());
 This will bring some benefits:
 1,leave applicationId compare logic to ApplicationId class;
 2,In future's HA mode,cluster time stamp may change,ApplicationId class 
 already takes care of this condition.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-429) capacity-scheduler config missing from yarn-test artifact

2013-03-04 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13593132#comment-13593132
 ] 

stack commented on YARN-429:


Patch looks reasonable to me.

 capacity-scheduler config missing from yarn-test artifact
 -

 Key: YARN-429
 URL: https://issues.apache.org/jira/browse/YARN-429
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.0.3-alpha
Reporter: Siddharth Seth
Assignee: Siddharth Seth
Priority: Blocker
 Attachments: YARN-429.txt


 MiniYARNCluster and MiniMRCluster are unusable by downstream projects with 
 the 2.0.3-alpha release, since the capacity-scheduler configuration is 
 missing from the test artifact.
 hadoop-yarn-server-tests-3.0.0-SNAPSHOT-tests.jar should include the default 
 capacity-scheduler configuration. Also, this doesn't need to be part of the 
 default classpath - and should be moved out of the top level directory in the 
 dist package.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-447) applicationComparator improvement for CS

2013-03-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13593133#comment-13593133
 ] 

Hadoop QA commented on YARN-447:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12572040/YARN-447-trunk.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 tests included appear to have a timeout.{color}

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/468//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/468//console

This message is automatically generated.

 applicationComparator improvement for CS
 

 Key: YARN-447
 URL: https://issues.apache.org/jira/browse/YARN-447
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Affects Versions: 2.0.3-alpha
Reporter: nemon lou
Priority: Minor
 Attachments: YARN-447-trunk.patch, YARN-447-trunk.patch, 
 YARN-447-trunk.patch


 Now the compare code is :
 return a1.getApplicationId().getId() - a2.getApplicationId().getId();
 Will be replaced with :
 return a1.getApplicationId().compareTo(a2.getApplicationId());
 This will bring some benefits:
 1,leave applicationId compare logic to ApplicationId class;
 2,In future's HA mode,cluster time stamp may change,ApplicationId class 
 already takes care of this condition.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-447) applicationComparator improvement for CS

2013-03-04 Thread nemon lou (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13593135#comment-13593135
 ] 

nemon lou commented on YARN-447:


This patch is ready for review now.
Thank you.

 applicationComparator improvement for CS
 

 Key: YARN-447
 URL: https://issues.apache.org/jira/browse/YARN-447
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Affects Versions: 2.0.3-alpha
Reporter: nemon lou
Priority: Minor
 Attachments: YARN-447-trunk.patch, YARN-447-trunk.patch, 
 YARN-447-trunk.patch


 Now the compare code is :
 return a1.getApplicationId().getId() - a2.getApplicationId().getId();
 Will be replaced with :
 return a1.getApplicationId().compareTo(a2.getApplicationId());
 This will bring some benefits:
 1,leave applicationId compare logic to ApplicationId class;
 2,In future's HA mode,cluster time stamp may change,ApplicationId class 
 already takes care of this condition.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira