[jira] [Commented] (YARN-193) Scheduler.normalizeRequest does not account for allocation requests that exceed maximumAllocation limits

2013-02-22 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13584366#comment-13584366
 ] 

Hitesh Shah commented on YARN-193:
--

@Zhijie, distributedshell is an example application, and therefore demonstrates 
how to write a well-behaved application: one which checks what the limits are 
and changes its requests accordingly. 

IMO, for applications which do not respect the limits, instead of reducing 
their requested resources to the max value, we should throw an error, as we are 
not sure whether the app really needs that high an amount of resources and 
whether it will actually work if we reduce that amount to the max value.

Does that make sense? 
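
To make the proposed behavior concrete, here is a minimal sketch of a 
normalization step that rejects oversized requests rather than capping them; 
the method name and exception type are illustrative only, not the actual 
scheduler code:

{code}
// Illustrative sketch only -- not the actual Scheduler.normalizeRequest code.
// Proposed behavior: reject requests above maximumAllocation instead of
// silently capping them to the maximum.
public static void validateAndNormalizeRequest(ResourceRequest ask,
    Resource minimumAllocation, Resource maximumAllocation) {
  int requested = ask.getCapability().getMemory();
  int max = maximumAllocation.getMemory();
  if (requested > max) {
    // Fail fast: the app may not be able to function with less than it asked for.
    throw new IllegalArgumentException("Requested memory " + requested
        + " exceeds the maximum allocation " + max);
  }
  // Otherwise round the request up to a multiple of the minimum allocation,
  // as the existing normalization already does.
  int min = minimumAllocation.getMemory();
  int normalized = Math.max(((requested + min - 1) / min) * min, min);
  ask.getCapability().setMemory(normalized);
}
{code}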

 Scheduler.normalizeRequest does not account for allocation requests that 
 exceed maximumAllocation limits 
 -

 Key: YARN-193
 URL: https://issues.apache.org/jira/browse/YARN-193
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.0.2-alpha, 3.0.0
Reporter: Hitesh Shah
Assignee: Hitesh Shah
 Attachments: MR-3796.1.patch, MR-3796.2.patch, MR-3796.3.patch, 
 MR-3796.wip.patch, YARN-193.4.patch, YARN-193.5.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-415) Capture memory utilization at the app-level for chargeback

2013-02-22 Thread Kendall Thrapp (JIRA)
Kendall Thrapp created YARN-415:
---

 Summary: Capture memory utilization at the app-level for chargeback
 Key: YARN-415
 URL: https://issues.apache.org/jira/browse/YARN-415
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: resourcemanager
Affects Versions: 0.23.6
Reporter: Kendall Thrapp


For the purpose of chargeback, I'd like to be able to compute the cost of an
application in terms of cluster resource usage.  To start out, I'd like to get 
the memory utilization of an application.  The unit should be MB-seconds or 
something similar and, from a chargeback perspective, the memory amount should 
be the memory reserved for the application, as even if the app didn't use all 
that memory, no one else was able to use it.

(reserved ram for container 1 * lifetime of container 1) + (reserved ram for
container 2 * lifetime of container 2) + ... + (reserved ram for container n * 
lifetime of container n)

It'd be nice to have this at the app level instead of the job level because:
1. We'd still be able to get memory usage for jobs that crashed (and wouldn't 
appear on the job history server).
2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm).

This new metric should be available both through the RM UI and RM Web Services 
REST API.
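
To make the arithmetic above concrete, a minimal sketch of how the metric could 
be aggregated follows; the ContainerRecord type and its fields are hypothetical, 
not existing RM code:

{code}
// Hypothetical helper illustrating the memory-seconds formula above.
// ContainerRecord and its fields are illustrative, not part of the YARN API.
class ContainerRecord {
  long reservedMemoryMB;   // memory reserved for the container
  long startTimeMillis;    // container start time
  long finishTimeMillis;   // container finish time
}

static long computeMemoryMBSeconds(Iterable<ContainerRecord> containers) {
  long mbSeconds = 0;
  for (ContainerRecord c : containers) {
    long lifetimeSeconds = (c.finishTimeMillis - c.startTimeMillis) / 1000;
    // reserved RAM for the container multiplied by its lifetime
    mbSeconds += c.reservedMemoryMB * lifetimeSeconds;
  }
  return mbSeconds;
}
{code}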

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-412) FifoScheduler incorrectly checking for node locality

2013-02-22 Thread Roger Hoover (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roger Hoover updated YARN-412:
--

Attachment: YARN-412.patch

 FifoScheduler incorrectly checking for node locality
 

 Key: YARN-412
 URL: https://issues.apache.org/jira/browse/YARN-412
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Reporter: Roger Hoover
Priority: Minor
  Labels: patch
 Attachments: YARN-412.patch, YARN-412.patch


 In the FifoScheduler, the assignNodeLocalContainers method is checking if the 
 data is local to a node by searching for the nodeAddress of the node in the 
 set of outstanding requests for the app.  This seems to be incorrect, as it 
 should be checking the hostname instead.  The offending line of code is 455:
 application.getResourceRequest(priority, node.getRMNode().getNodeAddress());
 Requests are formatted by hostname (e.g. host1.foo.com) whereas node addresses 
 are a concatenation of hostname and command port (e.g. host1.foo.com:1234).
 In the CapacityScheduler, it's done using the hostname.  See 
 LeafQueue.assignNodeLocalContainers, line 1129:
 application.getResourceRequest(priority, node.getHostName());
 Note that this bug does not affect the actual scheduling decisions made by 
 the FifoScheduler because even though it incorrectly determines that a request 
 is not local to the node, it will still schedule the request immediately 
 because it's rack-local.  However, this bug may be adversely affecting the 
 reporting of job status by underreporting the number of tasks that were node 
 local.
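
 A minimal sketch of the change implied above; only the lookup key changes, and 
 whether the FifoScheduler's node object exposes getHostName() directly is an 
 assumption here:
 {code}
 // Before (FifoScheduler.assignNodeLocalContainers, line 455): the request map
 // is keyed by hostname, so a lookup by nodeAddress (hostname:port) never matches.
 //   application.getResourceRequest(priority, node.getRMNode().getNodeAddress());

 // After (sketch): look up by hostname, as the CapacityScheduler already does.
 ResourceRequest request =
     application.getResourceRequest(priority, node.getHostName());
 {code}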

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-412) FifoScheduler incorrectly checking for node locality

2013-02-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13584473#comment-13584473
 ] 

Hadoop QA commented on YARN-412:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12570495/YARN-412.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 tests included appear to have a timeout.{color}

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:red}-1 eclipse:eclipse{color}.  The patch failed to build with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/417//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/417//console

This message is automatically generated.

 FifoScheduler incorrectly checking for node locality
 

 Key: YARN-412
 URL: https://issues.apache.org/jira/browse/YARN-412
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Reporter: Roger Hoover
Priority: Minor
  Labels: patch
 Attachments: YARN-412.patch, YARN-412.patch


 In the FifoScheduler, the assignNodeLocalContainers method is checking if the 
 data is local to a node by searching for the nodeAddress of the node in the 
 set of outstanding requests for the app.  This seems to be incorrect, as it 
 should be checking the hostname instead.  The offending line of code is 455:
 application.getResourceRequest(priority, node.getRMNode().getNodeAddress());
 Requests are formatted by hostname (e.g. host1.foo.com) whereas node addresses 
 are a concatenation of hostname and command port (e.g. host1.foo.com:1234).
 In the CapacityScheduler, it's done using the hostname.  See 
 LeafQueue.assignNodeLocalContainers, line 1129:
 application.getResourceRequest(priority, node.getHostName());
 Note that this bug does not affect the actual scheduling decisions made by 
 the FifoScheduler because even though it incorrectly determines that a request 
 is not local to the node, it will still schedule the request immediately 
 because it's rack-local.  However, this bug may be adversely affecting the 
 reporting of job status by underreporting the number of tasks that were node 
 local.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-376) Apps that have completed can appear as RUNNING on the NM UI

2013-02-22 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-376:


Attachment: YARN-376.patch

Patch that adds a new interface to RMNode so the ResourceTrackerService can 
atomically get-and-clear the list of containers and applications to clean up.
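
As a rough illustration of the get-and-clear idea (method and field names here 
are illustrative, not the actual patch):

{code}
// Illustrative sketch only -- not the actual YARN-376 patch.
// The pending cleanup lists are returned and cleared in one atomic step, so a
// concurrent heartbeat cannot observe (and resend) the same entries twice.
public synchronized List<ApplicationId> pullAppsToCleanup() {
  List<ApplicationId> apps = new ArrayList<ApplicationId>(finishedApplications);
  finishedApplications.clear();
  return apps;
}

public synchronized List<ContainerId> pullContainersToCleanup() {
  List<ContainerId> containers = new ArrayList<ContainerId>(containersToClean);
  containersToClean.clear();
  return containers;
}
{code}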

 Apps that have completed can appear as RUNNING on the NM UI
 ---

 Key: YARN-376
 URL: https://issues.apache.org/jira/browse/YARN-376
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.0.3-alpha, 0.23.6
Reporter: Jason Lowe
 Attachments: YARN-376.patch


 On a busy cluster we've noticed a growing number of applications appear as 
 RUNNING on a nodemanager's web pages even though the applications have long 
 since finished.  Looking at the NM logs, it appears the RM never told the 
 nodemanager that the application had finished.  This is also reflected in a 
 jstack of the NM process, since many more log aggregation threads are running 
 than one would expect from the number of actively running applications.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-376) Apps that have completed can appear as RUNNING on the NM UI

2013-02-22 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-376:


Priority: Blocker  (was: Major)

Increasing to Blocker as this race can lead to lost logs, since the NM will not 
aggregate the logs until it thinks the application has completed.  In addition, 
each leaked application in the NM has a corresponding log aggregation thread in 
the NM, and eventually the NM will be unable to create new threads.

 Apps that have completed can appear as RUNNING on the NM UI
 ---

 Key: YARN-376
 URL: https://issues.apache.org/jira/browse/YARN-376
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.0.3-alpha, 0.23.6
Reporter: Jason Lowe
Assignee: Jason Lowe
Priority: Blocker
 Attachments: YARN-376.patch


 On a busy cluster we've noticed a growing number of applications appear as 
 RUNNING on a nodemanager's web pages even though the applications have long 
 since finished.  Looking at the NM logs, it appears the RM never told the 
 nodemanager that the application had finished.  This is also reflected in a 
 jstack of the NM process, since many more log aggregation threads are running 
 than one would expect from the number of actively running applications.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-415) Capture memory utilization at the app-level for chargeback

2013-02-22 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13584667#comment-13584667
 ] 

Arun C Murthy commented on YARN-415:


+1 for the thought!

 Capture memory utilization at the app-level for chargeback
 --

 Key: YARN-415
 URL: https://issues.apache.org/jira/browse/YARN-415
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: resourcemanager
Affects Versions: 0.23.6
Reporter: Kendall Thrapp

 For the purpose of chargeback, I'd like to be able to compute the cost of an
 application in terms of cluster resource usage.  To start out, I'd like to 
 get the memory utilization of an application.  The unit should be MB-seconds 
 or something similar and, from a chargeback perspective, the memory amount 
 should be the memory reserved for the application, as even if the app didn't 
 use all that memory, no one else was able to use it.
 (reserved ram for container 1 * lifetime of container 1) + (reserved ram for
 container 2 * lifetime of container 2) + ... + (reserved ram for container n 
 * lifetime of container n)
 It'd be nice to have this at the app level instead of the job level because:
 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't 
 appear on the job history server).
 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm).
 This new metric should be available both through the RM UI and RM Web 
 Services REST API.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-376) Apps that have completed can appear as RUNNING on the NM UI

2013-02-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13584679#comment-13584679
 ] 

Hadoop QA commented on YARN-376:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12570527/YARN-376.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

  {color:red}-1 one of tests included doesn't have a timeout.{color}

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/418//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/418//console

This message is automatically generated.

 Apps that have completed can appear as RUNNING on the NM UI
 ---

 Key: YARN-376
 URL: https://issues.apache.org/jira/browse/YARN-376
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.0.3-alpha, 0.23.6
Reporter: Jason Lowe
Assignee: Jason Lowe
Priority: Blocker
 Attachments: YARN-376.patch


 On a busy cluster we've noticed a growing number of applications appear as 
 RUNNING on a nodemanager's web pages even though the applications have long 
 since finished.  Looking at the NM logs, it appears the RM never told the 
 nodemanager that the application had finished.  This is also reflected in a 
 jstack of the NM process, since many more log aggregation threads are running 
 than one would expect from the number of actively running applications.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-392) Make it possible to schedule to specific nodes without dropping locality

2013-02-22 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13584703#comment-13584703
 ] 

Alejandro Abdelnur commented on YARN-392:
-

I'd like to restate the problem.

Taking a slightly higher-level view, the end goal is for an AM to give certain 
hints to the RM scheduler about how it plans to use the requested resources.

Hints are just that, 'hints'.  They may not be taken into consideration, and 
AMs must not rely on hints in order to work properly.  It is fine if the RM 
scheduler ignores hints completely (because it is too busy or because it does 
not understand them).  An RM scheduler that understands a hint may use it to 
make better allocation decisions and may give AMs a speed boost.

Another thing to keep in mind is that hints won't complicate the RM logic, as 
the RM's only involvement is passing them to the scheduler.

Examples of hints are: gang scheduling, desired locality, desired 
multi-locality, resource fulfillment timeout, and future resource allocation.

I can understand the worries about going task centric, but I think the hints 
approach is a bit different. Being able to specify hints will enable 
experimentation with scheduling features without requiring protocol changes. If 
we find that a hint is a good feature to support at the scheduler API level, we 
may eventually add it to the protocol/API.

The change in the protocol/API would be as simple as having an extra String 
field in resource requests and resource allocations to indicate hints (on 
requests) and to receive the hints taken into consideration (on allocations).
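
To make the shape of that change concrete, here is a purely illustrative sketch 
of the extra field on the request and allocation records; the names and layout 
are hypothetical, not the actual YARN protocol:

{code}
// Purely illustrative -- not the actual YARN ResourceRequest/allocation records.
// The only protocol change proposed is one opaque String per request/allocation.
class HintedResourceRequest {
  String resourceName;    // host, rack, or "*"
  int memoryMB;
  int numContainers;
  // Free-form hint the scheduler may ignore, e.g. "gang=true;locality=strict".
  String schedulerHints;
}

class HintedAllocation {
  String containerId;
  // Echoes back whichever hints the scheduler actually took into consideration.
  String honoredHints;
}
{code}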

Thoughts?

 Make it possible to schedule to specific nodes without dropping locality
 

 Key: YARN-392
 URL: https://issues.apache.org/jira/browse/YARN-392
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Sandy Ryza
 Attachments: YARN-392.patch


 Currently it's not possible to specify scheduling requests for specific nodes 
 and nowhere else.  The RM automatically relaxes locality to rack and * and 
 assigns non-specified machines to the app.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-24) Nodemanager fails to start if log aggregation enabled and namenode unavailable

2013-02-22 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-24?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13584711#comment-13584711
 ] 

Sandy Ryza commented on YARN-24:


I encountered this when trying to start an NM and a namenode at the same time.  
The NM shut down because the namenode was in safe mode.  Having the NM die in 
this way introduces a dependency on the order in which services are started.

Log aggregation is checked each time an app is run on a node, and the app is 
immediately killed if a log folder cannot be used for it.  Thus, merely 
removing the NM killing itself on startup doesn't introduce any correctness 
issues.  The worst that could happen is that time could be wasted by scheduling 
more containers on a node we already know has connection issues to the namenode.

I've attached a patch that removes the NM killing itself on startup.  At 
initApp time, if verifyAndCreateRemoteLogDir has not yet completed successfully, 
it is called again, and the app is failed if that call fails.  If initApp fails 
five consecutive times, the NM sets its status to unhealthy.
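
A minimal sketch of that flow follows; the helper methods failApplication and 
markNodeUnhealthy and the failure counter are placeholders here, and only 
verifyAndCreateRemoteLogDir is named in the description above:

{code}
// Illustrative sketch of the behavior described above -- not the actual patch.
private volatile boolean remoteLogDirVerified = false;
private final AtomicInteger consecutiveInitAppFailures = new AtomicInteger(0);
private static final int MAX_CONSECUTIVE_FAILURES = 5;

void initApp(ApplicationId appId) {
  try {
    if (!remoteLogDirVerified) {
      // Retried per app instead of failing NM startup when the NN is unreachable.
      verifyAndCreateRemoteLogDir();
      remoteLogDirVerified = true;
    }
    consecutiveInitAppFailures.set(0);
    // ... proceed with normal app initialization ...
  } catch (IOException e) {
    failApplication(appId, e);   // placeholder: fail only this app, not the NM
    if (consecutiveInitAppFailures.incrementAndGet() >= MAX_CONSECUTIVE_FAILURES) {
      markNodeUnhealthy("Cannot reach remote log dir: " + e.getMessage());
    }
  }
}
{code}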

I agree that if an NM loses its ability to connect to the namenode after an app 
has started, it would be good for the NM to report that it wasn't able to write 
its logs, but in my opinion that is a more difficult issue and does not need to 
be tied to this change. 

 Nodemanager fails to start if log aggregation enabled and namenode unavailable
 --

 Key: YARN-24
 URL: https://issues.apache.org/jira/browse/YARN-24
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 0.23.3, 2.0.0-alpha
Reporter: Jason Lowe
 Attachments: YARN-24.patch


 If log aggregation is enabled and the namenode is currently unavailable, the 
 nodemanager fails to start up.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-24) Nodemanager fails to start if log aggregation enabled and namenode unavailable

2013-02-22 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-24?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated YARN-24:
---

Attachment: YARN-24.patch

 Nodemanager fails to start if log aggregation enabled and namenode unavailable
 --

 Key: YARN-24
 URL: https://issues.apache.org/jira/browse/YARN-24
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 0.23.3, 2.0.0-alpha
Reporter: Jason Lowe
 Attachments: YARN-24.patch


 If log aggregation is enabled and the namenode is currently unavailable, the 
 nodemanager fails to start up.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-415) Capture memory utilization at the app-level for chargeback

2013-02-22 Thread Andy Rhee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13584738#comment-13584738
 ] 

Andy Rhee commented on YARN-415:


+1 This feature is way overdue!

 Capture memory utilization at the app-level for chargeback
 --

 Key: YARN-415
 URL: https://issues.apache.org/jira/browse/YARN-415
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: resourcemanager
Affects Versions: 0.23.6
Reporter: Kendall Thrapp

 For the purpose of chargeback, I'd like to be able to compute the cost of an
 application in terms of cluster resource usage.  To start out, I'd like to 
 get the memory utilization of an application.  The unit should be MB-seconds 
 or something similar and, from a chargeback perspective, the memory amount 
 should be the memory reserved for the application, as even if the app didn't 
 use all that memory, no one else was able to use it.
 (reserved ram for container 1 * lifetime of container 1) + (reserved ram for
 container 2 * lifetime of container 2) + ... + (reserved ram for container n 
 * lifetime of container n)
 It'd be nice to have this at the app level instead of the job level because:
 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't 
 appear on the job history server).
 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm).
 This new metric should be available both through the RM UI and RM Web 
 Services REST API.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-376) Apps that have completed can appear as RUNNING on the NM UI

2013-02-22 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-376:


Attachment: YARN-376.patch

Updated patch so the test has a timeout.

 Apps that have completed can appear as RUNNING on the NM UI
 ---

 Key: YARN-376
 URL: https://issues.apache.org/jira/browse/YARN-376
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.0.3-alpha, 0.23.6
Reporter: Jason Lowe
Assignee: Jason Lowe
Priority: Blocker
 Attachments: YARN-376.patch, YARN-376.patch


 On a busy cluster we've noticed a growing number of applications appear as 
 RUNNING on a nodemanager's web pages even though the applications have long 
 since finished.  Looking at the NM logs, it appears the RM never told the 
 nodemanager that the application had finished.  This is also reflected in a 
 jstack of the NM process, since many more log aggregation threads are running 
 than one would expect from the number of actively running applications.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-376) Apps that have completed can appear as RUNNING on the NM UI

2013-02-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13584770#comment-13584770
 ] 

Hadoop QA commented on YARN-376:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12570543/YARN-376.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 tests included appear to have a timeout.{color}

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/420//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/420//console

This message is automatically generated.

 Apps that have completed can appear as RUNNING on the NM UI
 ---

 Key: YARN-376
 URL: https://issues.apache.org/jira/browse/YARN-376
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.0.3-alpha, 0.23.6
Reporter: Jason Lowe
Assignee: Jason Lowe
Priority: Blocker
 Attachments: YARN-376.patch, YARN-376.patch


 On a busy cluster we've noticed a growing number of applications appear as 
 RUNNING on a nodemanager's web pages even though the applications have long 
 since finished.  Looking at the NM logs, it appears the RM never told the 
 nodemanager that the application had finished.  This is also reflected in a 
 jstack of the NM process, since many more log aggregation threads are running 
 than one would expect from the number of actively running applications.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-365) Each NM heartbeat should not generate an event for the Scheduler

2013-02-22 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-365:
---

Attachment: YARN-365.8.patch

 Each NM heartbeat should not generate an event for the Scheduler
 

 Key: YARN-365
 URL: https://issues.apache.org/jira/browse/YARN-365
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager, scheduler
Affects Versions: 0.23.5
Reporter: Siddharth Seth
Assignee: Xuan Gong
 Attachments: Prototype2.txt, Prototype3.txt, YARN-365.1.patch, 
 YARN-365.2.patch, YARN-365.3.patch, YARN-365.4.patch, YARN-365.5.patch, 
 YARN-365.6.patch, YARN-365.7.patch, YARN-365.8.patch


 Follow up from YARN-275
 https://issues.apache.org/jira/secure/attachment/12567075/Prototype.txt

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-365) Each NM heartbeat should not generate an event for the Scheduler

2013-02-22 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-365:
---

Attachment: (was: YARN-365.8.patch)

 Each NM heartbeat should not generate an event for the Scheduler
 

 Key: YARN-365
 URL: https://issues.apache.org/jira/browse/YARN-365
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager, scheduler
Affects Versions: 0.23.5
Reporter: Siddharth Seth
Assignee: Xuan Gong
 Attachments: Prototype2.txt, Prototype3.txt, YARN-365.1.patch, 
 YARN-365.2.patch, YARN-365.3.patch, YARN-365.4.patch, YARN-365.5.patch, 
 YARN-365.6.patch, YARN-365.7.patch


 Follow up from YARN-275
 https://issues.apache.org/jira/secure/attachment/12567075/Prototype.txt

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-365) Each NM heartbeat should not generate an event for the Scheduler

2013-02-22 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-365:
---

Attachment: YARN-365.8.patch

 Each NM heartbeat should not generate an event for the Scheduler
 

 Key: YARN-365
 URL: https://issues.apache.org/jira/browse/YARN-365
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager, scheduler
Affects Versions: 0.23.5
Reporter: Siddharth Seth
Assignee: Xuan Gong
 Attachments: Prototype2.txt, Prototype3.txt, YARN-365.1.patch, 
 YARN-365.2.patch, YARN-365.3.patch, YARN-365.4.patch, YARN-365.5.patch, 
 YARN-365.6.patch, YARN-365.7.patch, YARN-365.8.patch


 Follow up from YARN-275
 https://issues.apache.org/jira/secure/attachment/12567075/Prototype.txt

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-365) Each NM heartbeat should not generate an event for the Scheduler

2013-02-22 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-365:
---

Attachment: YARN-365.8.patch

 Each NM heartbeat should not generate an event for the Scheduler
 

 Key: YARN-365
 URL: https://issues.apache.org/jira/browse/YARN-365
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager, scheduler
Affects Versions: 0.23.5
Reporter: Siddharth Seth
Assignee: Xuan Gong
 Attachments: Prototype2.txt, Prototype3.txt, YARN-365.1.patch, 
 YARN-365.2.patch, YARN-365.3.patch, YARN-365.4.patch, YARN-365.5.patch, 
 YARN-365.6.patch, YARN-365.7.patch, YARN-365.8.patch


 Follow up from YARN-275
 https://issues.apache.org/jira/secure/attachment/12567075/Prototype.txt

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-365) Each NM heartbeat should not generate an event for the Scheduler

2013-02-22 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-365:
---

Attachment: (was: YARN-365.8.patch)

 Each NM heartbeat should not generate an event for the Scheduler
 

 Key: YARN-365
 URL: https://issues.apache.org/jira/browse/YARN-365
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager, scheduler
Affects Versions: 0.23.5
Reporter: Siddharth Seth
Assignee: Xuan Gong
 Attachments: Prototype2.txt, Prototype3.txt, YARN-365.1.patch, 
 YARN-365.2.patch, YARN-365.3.patch, YARN-365.4.patch, YARN-365.5.patch, 
 YARN-365.6.patch, YARN-365.7.patch, YARN-365.8.patch


 Follow up from YARN-275
 https://issues.apache.org/jira/secure/attachment/12567075/Prototype.txt

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-365) Each NM heartbeat should not generate an event for the Scheduler

2013-02-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13584896#comment-13584896
 ] 

Hadoop QA commented on YARN-365:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12570570/YARN-365.8.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

  {color:red}-1 one of tests included doesn't have a timeout.{color}

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/422//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/422//console

This message is automatically generated.

 Each NM heartbeat should not generate an event for the Scheduler
 

 Key: YARN-365
 URL: https://issues.apache.org/jira/browse/YARN-365
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager, scheduler
Affects Versions: 0.23.5
Reporter: Siddharth Seth
Assignee: Xuan Gong
 Attachments: Prototype2.txt, Prototype3.txt, YARN-365.1.patch, 
 YARN-365.2.patch, YARN-365.3.patch, YARN-365.4.patch, YARN-365.5.patch, 
 YARN-365.6.patch, YARN-365.7.patch, YARN-365.8.patch


 Follow up from YARN-275
 https://issues.apache.org/jira/secure/attachment/12567075/Prototype.txt

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-365) Each NM heartbeat should not generate an event for the Scheduler

2013-02-22 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13584906#comment-13584906
 ] 

Xuan Gong commented on YARN-365:


The one test file that does not include a timeout is MockNode.java, which is 
under the test directory.

 Each NM heartbeat should not generate an event for the Scheduler
 

 Key: YARN-365
 URL: https://issues.apache.org/jira/browse/YARN-365
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager, scheduler
Affects Versions: 0.23.5
Reporter: Siddharth Seth
Assignee: Xuan Gong
 Attachments: Prototype2.txt, Prototype3.txt, YARN-365.1.patch, 
 YARN-365.2.patch, YARN-365.3.patch, YARN-365.4.patch, YARN-365.5.patch, 
 YARN-365.6.patch, YARN-365.7.patch, YARN-365.8.patch


 Follow up from YARN-275
 https://issues.apache.org/jira/secure/attachment/12567075/Prototype.txt

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-416) Limiting the Diagnostic related data on Application Overview Page

2013-02-22 Thread omkar vinit joshi (JIRA)
omkar vinit joshi created YARN-416:
--

 Summary: Limiting the Diagnostic related data on Application 
Overview Page
 Key: YARN-416
 URL: https://issues.apache.org/jira/browse/YARN-416
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: omkar vinit joshi


On the Application Overview page, diagnostic data is printed as-is, without 
limiting the total content.  There should be some way to control (in terms of 
lines or total characters) the final diagnostic data.  One way this could be 
done is by adding configuration parameters or by adding some control on the 
front end.
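
As one possible shape for the configuration-parameter approach, a minimal sketch 
follows; the property name and helper are hypothetical, not existing YARN 
configuration or code:

{code}
// Hypothetical sketch only -- the config key and helper are illustrative.
static final String DIAG_MAX_CHARS_KEY = "yarn.webapp.diagnostics.max-chars";
static final int DIAG_MAX_CHARS_DEFAULT = 4096;

static String truncateDiagnostics(String diagnostics, Configuration conf) {
  int limit = conf.getInt(DIAG_MAX_CHARS_KEY, DIAG_MAX_CHARS_DEFAULT);
  if (diagnostics == null || diagnostics.length() <= limit) {
    return diagnostics;
  }
  // Keep the beginning of the message and note the truncation.
  return diagnostics.substring(0, limit) + " ... (truncated)";
}
{code}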

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-196) Nodemanager if started before starting Resource manager is getting shutdown. But if both RM and NM are started and then after if RM is going down, NM is retrying for the RM

2013-02-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13585000#comment-13585000
 ] 

Hadoop QA commented on YARN-196:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12570598/YARN-196.5.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 tests included appear to have a timeout.{color}

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/423//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/423//console

This message is automatically generated.

 Nodemanager if started before starting Resource manager is getting 
 shutdown. But if both RM and NM are started and then after if RM is going 
 down, NM is retrying for the RM.
 ---

 Key: YARN-196
 URL: https://issues.apache.org/jira/browse/YARN-196
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 3.0.0, 2.0.0-alpha
Reporter: Ramgopal N
Assignee: Xuan Gong
 Attachments: MAPREDUCE-3676.patch, YARN-196.1.patch, 
 YARN-196.2.patch, YARN-196.3.patch, YARN-196.4.patch, YARN-196.5.patch


 If the NM is started before starting the RM, the NM shuts down with the 
 following error:
 {code}
 ERROR org.apache.hadoop.yarn.service.CompositeService: Error starting 
 services org.apache.hadoop.yarn.server.nodemanager.NodeManager
 org.apache.avro.AvroRuntimeException: 
 java.lang.reflect.UndeclaredThrowableException
   at 
 org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:149)
   at 
 org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:68)
   at 
 org.apache.hadoop.yarn.server.nodemanager.NodeManager.start(NodeManager.java:167)
   at 
 org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:242)
 Caused by: java.lang.reflect.UndeclaredThrowableException
   at 
 org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:66)
   at 
 org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:182)
   at 
 org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:145)
   ... 3 more
 Caused by: com.google.protobuf.ServiceException: java.net.ConnectException: 
 Call From HOST-10-18-52-230/10.18.52.230 to HOST-10-18-52-250:8025 failed on 
 connection exception: java.net.ConnectException: Connection refused; For more 
 details see:  http://wiki.apache.org/hadoop/ConnectionRefused
   at 
 org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:131)
   at $Proxy23.registerNodeManager(Unknown Source)
   at 
 org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:59)
   ... 5 more
 Caused by: java.net.ConnectException: Call From 
 HOST-10-18-52-230/10.18.52.230 to HOST-10-18-52-250:8025 failed on connection 
 exception: java.net.ConnectException: Connection refused; For more details 
 see:  http://wiki.apache.org/hadoop/ConnectionRefused
   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:857)
   at org.apache.hadoop.ipc.Client.call(Client.java:1141)
   at org.apache.hadoop.ipc.Client.call(Client.java:1100)
   at 
 org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:128)
   ... 7 more
 Caused by: java.net.ConnectException: Connection refused
   at