[jira] [Commented] (YARN-628) Fix YarnException unwrapping

2013-05-15 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13659251#comment-13659251
 ] 

Vinod Kumar Vavilapalli commented on YARN-628:
--

This looks perfect. Will run tests and check this in.

> Fix YarnException unwrapping
> 
>
> Key: YARN-628
> URL: https://issues.apache.org/jira/browse/YARN-628
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.0.4-alpha
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: YARN-628.txt, YARN-628.txt, YARN-628.txt, YARN-628.txt.2
>
>
> Unwrapping of YarnRemoteExceptions (currently in YarnRemoteExceptionPBImpl, 
> RPCUtil post YARN-625) is broken, and often ends up throwing an 
> UndeclaredThrowableException. This needs to be fixed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-628) Fix YarnException unwrapping

2013-05-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13659225#comment-13659225
 ] 

Hadoop QA commented on YARN-628:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12583427/YARN-628.txt
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/940//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/940//console

This message is automatically generated.

> Fix YarnException unwrapping
> 
>
> Key: YARN-628
> URL: https://issues.apache.org/jira/browse/YARN-628
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.0.4-alpha
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: YARN-628.txt, YARN-628.txt, YARN-628.txt, YARN-628.txt.2
>
>
> Unwrapping of YarnRemoteExceptions (currently in YarnRemoteExceptionPBImpl, 
> RPCUtil post YARN-625) is broken, and often ends up throwing an 
> UndeclaredThrowableException. This needs to be fixed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-688) Containers not cleaned up when NM received SHUTDOWN event from NodeStatusUpdater

2013-05-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13659227#comment-13659227
 ] 

Hadoop QA commented on YARN-688:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12583416/YARN-688.1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/941//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/941//console

This message is automatically generated.

> Containers not cleaned up when NM received SHUTDOWN event from 
> NodeStatusUpdater
> 
>
> Key: YARN-688
> URL: https://issues.apache.org/jira/browse/YARN-688
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-688.1.patch
>
>
> Currently, both the SHUTDOWN event from nodeStatusUpdater and the 
> CleanupContainers event happen to be on the same dispatcher thread, so the 
> CleanupContainers event will not be processed until the SHUTDOWN event is 
> processed. See the similar problem in YARN-495.
> On normal NM shutdown, this is not a problem since normal stop happens on 
> shutdownHook thread.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-628) Fix YarnException unwrapping

2013-05-15 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated YARN-628:


Attachment: YARN-628.txt

Thanks! This patch did need an exhaustive review.

bq. TestClientRMTokens: Can we explicitly test for InvalidToken? That'll be 
great if possible.
Sure. Wasn't really trying to get all the exception verification in tests 
cleaned up in this patch. There's more in MR tests; I'll open a separate jira 
for this.
bq. TestClientTokens: The exception thrown should always be RemoteException, so 
no need for the if condition, we should simply assert so.
Done.
bq. RPCUtil.instantiateException -> instantiateRemoteException?
I've left this as is. It's not instantiating a remote exception. Maybe 
instantiateFromRemoteException, but I prefer the current name.

Have renamed some of the other methods, some as suggested, others with slightly 
clearer names, and have added some comments to make the tests easier to 
understand.

Also added documentation to YarnRemoteException stating that derived classes 
must include a String-only constructor.
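
For readers following along, here is a minimal sketch of the kind of 
reflection-based unwrapping being discussed. This is not the actual RPCUtil 
code; the class and method names are illustrative only. It shows why a 
String-only constructor is required: the exception type is re-created from 
the remote message via reflection, and a missing constructor is one way 
callers can end up with an UndeclaredThrowableException instead.
{code}
import java.lang.reflect.Constructor;

public class ExceptionUnwrapSketch {

  /**
   * Re-create an exception of the given class from the message carried by a
   * remote exception. The String-only constructor is the contract documented
   * in the patch; if it is missing, the reflective lookup fails and callers
   * may see an unexpected wrapper exception instead of the intended type.
   */
  public static <T extends Exception> T unwrap(Class<T> clazz, String message)
      throws Exception {
    Constructor<T> ctor = clazz.getConstructor(String.class);
    ctor.setAccessible(true);
    return ctor.newInstance(message);
  }
}
{code}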

> Fix YarnException unwrapping
> 
>
> Key: YARN-628
> URL: https://issues.apache.org/jira/browse/YARN-628
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.0.4-alpha
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: YARN-628.txt, YARN-628.txt, YARN-628.txt, YARN-628.txt.2
>
>
> Unwrapping of YarnRemoteExceptions (currently in YarnRemoteExceptionPBImpl, 
> RPCUtil post YARN-625) is broken, and often ends up throwing an 
> UndeclaredThrowableException. This needs to be fixed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-117) Enhance YARN service model

2013-05-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13659136#comment-13659136
 ] 

Hadoop QA commented on YARN-117:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12583405/YARN-117-008.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 38 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 10 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient
 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy:

  org.apache.hadoop.mapreduce.v2.app.TestStagingCleanup

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/939//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/939//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/939//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-app.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/939//console

This message is automatically generated.

> Enhance YARN service model
> --
>
> Key: YARN-117
> URL: https://issues.apache.org/jira/browse/YARN-117
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.0.4-alpha
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: YARN-117-007.patch, YARN-117-008.patch, 
> YARN-117-2.patch, YARN-117-3.patch, YARN-117.4.patch, YARN-117.5.patch, 
> YARN-117.6.patch, YARN-117.patch
>
>
> Having played with the YARN service model, there are some issues
> that I've identified based on past work and initial use.
> This JIRA issue is an overall one to cover the issues, with solutions pushed 
> out to separate JIRAs.
> h2. state model prevents stopped state being entered if you could not 
> successfully start the service.
> In the current lifecycle you cannot stop a service unless it was successfully 
> started, but
> * {{init()}} may acquire resources that need to be explicitly released
> * if the {{start()}} operation fails partway through, the {{stop()}} 
> operation may be needed to release resources.
> *Fix:* make {{stop()}} a valid state transition from all states and require 
> the implementations to be able to stop safely without requiring all fields to 
> be non null.
> Before anyone points out that the {{stop()}} operations assume that all 
> fields are valid, and that if called before {{start()}} they will NPE: 
> MAPREDUCE-3431 shows that this problem arises today, and MAPREDUCE-3502 is a 
> fix for it. It is independent of the rest of the issues in this doc, but it 
> will aid making {{stop()}} execute from all states other than "stopped".
> MAPREDUCE-3502 is too big a patch and needs to be broken down for easier 
> review and take up; this can be done with issues linked to this one.
> h2. AbstractService doesn't prevent duplicate state change requests.
> The {{ensureState()}} checks to verify whether or not a state transition is 
> allowed from the current state are performed in the base {{AbstractService}} 
> cla

[jira] [Updated] (YARN-688) Containers not cleaned up when NM received SHUTDOWN event from NodeStatusUpdater

2013-05-15 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-688:
-

Attachment: YARN-688.1.patch

This patch basically creates a new thread for handling the SHUTDOWN event from 
nodeStatusUpdater.
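
As a rough illustration only (not the actual NodeManager code; every name below 
is invented for the sketch), the idea is to take the SHUTDOWN handling off the 
shared dispatcher thread so the CleanupContainers event can still be delivered:
{code}
// Illustrative sketch: handle SHUTDOWN on its own thread so the single
// dispatcher thread stays free to deliver the container-cleanup event.
public class ShutdownEventSketch {

  interface Dispatcher {
    void dispatch(String event);   // stand-in for the real YARN event types
  }

  static void handleShutdown(final Dispatcher dispatcher) {
    Thread shutdownThread = new Thread(new Runnable() {
      @Override
      public void run() {
        // Ask for container cleanup first; the dispatcher thread can now
        // process CLEANUP_CONTAINERS because we are not blocking it here.
        dispatcher.dispatch("CLEANUP_CONTAINERS");
        // ... wait for containers to finish, then stop the services ...
      }
    });
    shutdownThread.setName("nm-shutdown");
    shutdownThread.start();
  }
}
{code}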

> Containers not cleaned up when NM received SHUTDOWN event from 
> NodeStatusUpdater
> 
>
> Key: YARN-688
> URL: https://issues.apache.org/jira/browse/YARN-688
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-688.1.patch
>
>
> Currently, both the SHUTDOWN event from nodeStatusUpdater and the 
> CleanupContainers event happen to be on the same dispatcher thread, so the 
> CleanupContainers event will not be processed until the SHUTDOWN event is 
> processed. See the similar problem in YARN-495.
> On normal NM shutdown, this is not a problem since normal stop happens on 
> shutdownHook thread.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-688) Containers not cleaned up when NM received SHUTDOWN event from NodeStatusUpdater

2013-05-15 Thread Jian He (JIRA)
Jian He created YARN-688:


 Summary: Containers not cleaned up when NM received SHUTDOWN event 
from NodeStatusUpdater
 Key: YARN-688
 URL: https://issues.apache.org/jira/browse/YARN-688
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He
Assignee: Jian He


Currently, both the SHUTDOWN event from nodeStatusUpdater and the 
CleanupContainers event happen to be on the same dispatcher thread, so the 
CleanupContainers event will not be processed until the SHUTDOWN event is 
processed. See the similar problem in YARN-495.
On normal NM shutdown, this is not a problem since normal stop happens on 
shutdownHook thread.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-624) Support gang scheduling in the AM RM protocol

2013-05-15 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13659091#comment-13659091
 ] 

Carlo Curino commented on YARN-624:
---

Alejandro, I completely agree gang scheduling is an important and missing use 
case.  As I told you in person, I spoke with various machine-learning guys and 
they are very interested in gang scheduling (they are working on their own AM 
for ML computations). From that conversation I am convinced their asks represent 
a rather common requirement for many ML-type applications. In particular, 
they were interested in the "or" use-case you mentioned. 

Specifically they want to be able to express this:
1) 1 container with 128GB of RAM and 16cores OR
2) 10 containers with 16GB of RAM and 2 cores OR 
3) 100 containers with 2GB of RAM and 1 core

In terms of locality, I can see three main scenarios:
1) absolute locality, i.e., I need a gang of N containers on this rack, or on 
this set of nodes, 
2) relative locality, i.e., I need a gang of N containers "close to each other" 
(this really captures more of a network property than anything else)
3) (no locality), i.e., I need a gang of N containers anywhere in the cluster
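
As a purely hypothetical encoding of the either/or gang request described 
above (no such API exists in the AM-RM protocol today; the class and field 
names are invented for illustration):
{code}
import java.util.Arrays;
import java.util.List;

public class GangRequestSketch {

  // One gang request expressed as a list of mutually exclusive alternatives,
  // any one of which would satisfy the application.
  static class Alternative {
    final int containers;
    final int memoryGB;
    final int vcores;
    Alternative(int containers, int memoryGB, int vcores) {
      this.containers = containers;
      this.memoryGB = memoryGB;
      this.vcores = vcores;
    }
  }

  public static void main(String[] args) {
    // The ML use case above: 1x(128GB,16c) OR 10x(16GB,2c) OR 100x(2GB,1c).
    List<Alternative> gang = Arrays.asList(
        new Alternative(1, 128, 16),
        new Alternative(10, 16, 2),
        new Alternative(100, 2, 1));
    System.out.println("alternatives: " + gang.size());
  }
}
{code}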
  

> Support gang scheduling in the AM RM protocol
> -
>
> Key: YARN-624
> URL: https://issues.apache.org/jira/browse/YARN-624
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, scheduler
>Affects Versions: 2.0.4-alpha
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
>
> Per discussion on YARN-392 and elsewhere, gang scheduling, in which a 
> scheduler runs a set of tasks when they can all be run at the same time, 
> would be a useful feature for YARN schedulers to support.
> Currently, AMs can approximate this by holding on to containers until they 
> get all the ones they need.  However, this lends itself to deadlocks when 
> different AMs are waiting on the same containers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-117) Enhance YARN service model

2013-05-15 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-117:


Attachment: YARN-117-008.patch

Diff built from the root of the repository, so the patch command can apply it.

> Enhance YARN service model
> --
>
> Key: YARN-117
> URL: https://issues.apache.org/jira/browse/YARN-117
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.0.4-alpha
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: YARN-117-007.patch, YARN-117-008.patch, 
> YARN-117-2.patch, YARN-117-3.patch, YARN-117.4.patch, YARN-117.5.patch, 
> YARN-117.6.patch, YARN-117.patch
>
>
> Having played with the YARN service model, there are some issues
> that I've identified based on past work and initial use.
> This JIRA issue is an overall one to cover the issues, with solutions pushed 
> out to separate JIRAs.
> h2. state model prevents stopped state being entered if you could not 
> successfully start the service.
> In the current lifecycle you cannot stop a service unless it was successfully 
> started, but
> * {{init()}} may acquire resources that need to be explicitly released
> * if the {{start()}} operation fails partway through, the {{stop()}} 
> operation may be needed to release resources.
> *Fix:* make {{stop()}} a valid state transition from all states and require 
> the implementations to be able to stop safely without requiring all fields to 
> be non null.
> Before anyone points out that the {{stop()}} operations assume that all 
> fields are valid, and that if called before {{start()}} they will NPE: 
> MAPREDUCE-3431 shows that this problem arises today, and MAPREDUCE-3502 is a 
> fix for it. It is independent of the rest of the issues in this doc, but it 
> will aid making {{stop()}} execute from all states other than "stopped".
> MAPREDUCE-3502 is too big a patch and needs to be broken down for easier 
> review and take up; this can be done with issues linked to this one.
> h2. AbstractService doesn't prevent duplicate state change requests.
> The {{ensureState()}} checks to verify whether or not a state transition is 
> allowed from the current state are performed in the base {{AbstractService}} 
> class -yet subclasses tend to call this *after* their own {{init()}}, 
> {{start()}} & {{stop()}} operations. This means that these operations can be 
> performed out of order, and even if the outcome of the call is an exception, 
> all actions performed by the subclasses will have taken place. MAPREDUCE-3877 
> demonstrates this.
> This is a tricky one to address. In HADOOP-3128 I used a base class instead 
> of an interface and made the {{init()}}, {{start()}} & {{stop()}} methods 
> {{final}}. These methods would do the checks, and then invoke protected inner 
> methods, {{innerStart()}}, {{innerStop()}}, etc. It should be possible to 
> retrofit the same behaviour to everything that extends {{AbstractService}} 
> -something that must be done before the class is considered stable (because 
> once the lifecycle methods are declared final, all subclasses that are out of 
> the source tree will need fixing by the respective developers).
> h2. AbstractService state change doesn't defend against race conditions.
> There's no concurrency locks on the state transitions. Whatever fix for wrong 
> state calls is added should correct this to prevent re-entrancy, such as 
> {{stop()}} being called from two threads.
> h2. Static methods to choreograph lifecycle operations
> Helper methods to move things through lifecycles. init->start is common, 
> stop-if-service!=null another. Some static methods can execute these, and 
> even call {{stop()}} if {{init()}} raises an exception. These could go into a 
> class {{ServiceOps}} in the same package. These can be used by those services 
> that wrap other services, and help manage more robust shutdowns.
> h2. state transition failures are something that registered service listeners 
> may wish to be informed of.
> When a state transition fails a {{RuntimeException}} can be thrown -and the 
> service listeners are not informed as the notification point isn't reached. 
> They may wish to know this, especially for management and diagnostics.
> *Fix:* extend {{ServiceStateChangeListener}} with a callback such as 
> {{stateChangeFailed(Service service,Service.State targeted-state, 
> RuntimeException e)}} that is invoked from the (final) state change methods 
> in the {{AbstractService}} class (once they delegate to their inner 
> {{innerStart()}}, {{innerStop()}} methods); make it a no-op in the existing 
> implementations of the interface.
> h2. Service listener failures not handled
> Is this an error or not? Log-and-ignore may not be what is desired.
> *Proposed:* during {{stop()}} any exception by a listener is caught and 
> discarded.

[jira] [Commented] (YARN-666) [Umbrella] Support rolling upgrades in YARN

2013-05-15 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13659024#comment-13659024
 ] 

Carlo Curino commented on YARN-666:
---

Hi Vinod, I will give you some numbers, but bear in mind that these results are 
very initial, based only on a handful of runs on a 9- or 10-machine cluster, and 
without serious tuning of terasort. 

The idea of the solution is for maps to write their output directly into HDFS 
(e.g., with replication turned down to 1). Reducers will be started only when 
maps complete and stream-merge straight out of HDFS (bypassing much of the 
partial merging logic). 

Key limitations of what we have for now:
1) if a map output is lost, all reducers will have to wait for it to be re-run
2) we have lots of DFSClients open, which might become a problem for HDFS if you 
have too many maps per node. 

We initially tried this as a way to make checkpointing cheaper (no need to save 
any state other than the last-processed key), and we were just hoping for it not 
to be too much worse than the regular shuffle. The surprise I mentioned above was 
that we actually observed a substantial speed-up on a simple sort 
job (on 9 nodes): 25% at 64GB scale and 31% at 1TB scale. 

This seems to indicate that the penalty of reading through HDFS is actually 
trumped by the benefits of doing a stream-merge (where data never touches disk on 
the reduce side, other than for the reducer output). Probably this reduces 
seeks and uses the drives we read from and write to more efficiently. 
You could imagine getting similar benefits by adding restartability to the HTTP 
client (plus the buffering done by the HDFS client, which was likely 
beneficial in our test). More sophisticated versions of this could also 
dynamically decide whether to stream-merge from a certain map or whether to 
copy the data (if, for example, it is small enough to fit in memory). 

Bottom line, I don't think we should read too much into these results (again, 
very initial), other than that using HDFS as the intermediate data layer is not 
completely infeasible. 
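
For illustration, a sketch of the map side of this approach under stated 
assumptions: output is written straight to HDFS with replication 1, as 
described above. The path layout and class name are made up; only the 
FileSystem calls are standard Hadoop APIs, and this is not the prototype's 
actual code.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class MapOutputToHdfsSketch {

  /** Open a per-map, per-partition output file in HDFS with replication 1. */
  public static FSDataOutputStream openMapOutput(Configuration conf,
      String jobId, int mapId, int partition) throws Exception {
    FileSystem fs = FileSystem.get(conf);
    // Hypothetical layout; the prototype's actual layout is not described here.
    Path out = new Path("/tmp/intermediate/" + jobId
        + "/map_" + mapId + "/part_" + partition);
    // Replication 1: map output is recomputable, so a single replica keeps the
    // write cheap; losing it just forces that map to be re-run (limitation 1
    // above). Reducers would later open these files and stream-merge them.
    short replication = 1;
    return fs.create(out, replication);
  }
}
{code}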


> [Umbrella] Support rolling upgrades in YARN
> ---
>
> Key: YARN-666
> URL: https://issues.apache.org/jira/browse/YARN-666
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.0.4-alpha
>Reporter: Siddharth Seth
> Attachments: YARN_Rolling_Upgrades.pdf, YARN_Rolling_Upgrades_v2.pdf
>
>
> Jira to track changes required in YARN to allow rolling upgrades, including 
> documentation and possible upgrade routes. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-530) Define Service model strictly, implement AbstractService for robust subclassing, migrate yarn-common services

2013-05-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13658997#comment-13658997
 ] 

Hadoop QA commented on YARN-530:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12583393/YARN-530-005.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified test files.

{color:red}-1 javac{color:red}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/937//console

This message is automatically generated.

> Define Service model strictly, implement AbstractService for robust 
> subclassing, migrate yarn-common services
> -
>
> Key: YARN-530
> URL: https://issues.apache.org/jira/browse/YARN-530
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: YARN-117changes.pdf, YARN-530-005.patch, 
> YARN-530-2.patch, YARN-530-3.patch, YARN-530.4.patch, YARN-530.patch
>
>
> # Extend the YARN {{Service}} interface as discussed in YARN-117
> # Implement the changes in {{AbstractService}} and {{FilterService}}.
> # Migrate all services in yarn-common to the more robust service model, test.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-117) Enhance YARN service model

2013-05-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13658993#comment-13658993
 ] 

Hadoop QA commented on YARN-117:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12583395/YARN-117-007.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/938//console

This message is automatically generated.

> Enhance YARN service model
> --
>
> Key: YARN-117
> URL: https://issues.apache.org/jira/browse/YARN-117
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.0.4-alpha
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: YARN-117-007.patch, YARN-117-2.patch, YARN-117-3.patch, 
> YARN-117.4.patch, YARN-117.5.patch, YARN-117.6.patch, YARN-117.patch
>
>
> Having played with the YARN service model, there are some issues
> that I've identified based on past work and initial use.
> This JIRA issue is an overall one to cover the issues, with solutions pushed 
> out to separate JIRAs.
> h2. state model prevents stopped state being entered if you could not 
> successfully start the service.
> In the current lifecycle you cannot stop a service unless it was successfully 
> started, but
> * {{init()}} may acquire resources that need to be explicitly released
> * if the {{start()}} operation fails partway through, the {{stop()}} 
> operation may be needed to release resources.
> *Fix:* make {{stop()}} a valid state transition from all states and require 
> the implementations to be able to stop safely without requiring all fields to 
> be non null.
> Before anyone points out that the {{stop()}} operations assume that all 
> fields are valid, and that if called before {{start()}} they will NPE: 
> MAPREDUCE-3431 shows that this problem arises today, and MAPREDUCE-3502 is a 
> fix for it. It is independent of the rest of the issues in this doc, but it 
> will aid making {{stop()}} execute from all states other than "stopped".
> MAPREDUCE-3502 is too big a patch and needs to be broken down for easier 
> review and take up; this can be done with issues linked to this one.
> h2. AbstractService doesn't prevent duplicate state change requests.
> The {{ensureState()}} checks to verify whether or not a state transition is 
> allowed from the current state are performed in the base {{AbstractService}} 
> class -yet subclasses tend to call this *after* their own {{init()}}, 
> {{start()}} & {{stop()}} operations. This means that these operations can be 
> performed out of order, and even if the outcome of the call is an exception, 
> all actions performed by the subclasses will have taken place. MAPREDUCE-3877 
> demonstrates this.
> This is a tricky one to address. In HADOOP-3128 I used a base class instead 
> of an interface and made the {{init()}}, {{start()}} & {{stop()}} methods 
> {{final}}. These methods would do the checks, and then invoke protected inner 
> methods, {{innerStart()}}, {{innerStop()}}, etc. It should be possible to 
> retrofit the same behaviour to everything that extends {{AbstractService}} 
> -something that must be done before the class is considered stable (because 
> once the lifecycle methods are declared final, all subclasses that are out of 
> the source tree will need fixing by the respective developers).
> h2. AbstractService state change doesn't defend against race conditions.
> There's no concurrency locks on the state transitions. Whatever fix for wrong 
> state calls is added should correct this to prevent re-entrancy, such as 
> {{stop()}} being called from two threads.
> h2. Static methods to choreograph lifecycle operations
> Helper methods to move things through lifecycles. init->start is common, 
> stop-if-service!=null another. Some static methods can execute these, and 
> even call {{stop()}} if {{init()}} raises an exception. These could go into a 
> class {{ServiceOps}} in the same package. These can be used by those services 
> that wrap other services, and help manage more robust shutdowns.
> h2. state transition failures are something that registered service listeners 
> may wish to be informed of.
> When a state transition fails a {{RuntimeException}} can be thrown -and the 
> service listeners are not informed as the notification point isn't reached. 
> They may wish to know this, especially for management and diagnostics.
> *Fix:* extend {{ServiceStateChangeListener}} with a callback such as 
> {{stateChangeFailed(Service service,Service.State targeted-state, 
> RuntimeException e)}} that is invoked from the (final) state change methods 
> in the {{AbstractService}} class (once they delegate to their in

[jira] [Commented] (YARN-638) Restore RMDelegationTokens after RM Restart

2013-05-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13658989#comment-13658989
 ] 

Hadoop QA commented on YARN-638:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12583374/YARN-638.11.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/934//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/934//console

This message is automatically generated.

> Restore RMDelegationTokens after RM Restart
> ---
>
> Key: YARN-638
> URL: https://issues.apache.org/jira/browse/YARN-638
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-638.10.patch, YARN-638.11.patch, YARN-638.1.patch, 
> YARN-638.2.patch, YARN-638.3.patch, YARN-638.4.patch, YARN-638.5.patch, 
> YARN-638.6.patch, YARN-638.7.patch, YARN-638.8.patch, YARN-638.9.patch
>
>
> This was missed in YARN-581. After RM restart, RMDelegationTokens need to be 
> added both to the DelegationTokenRenewer (addressed in YARN-581) and to the 
> delegationTokenSecretManager.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-117) Enhance YARN service model

2013-05-15 Thread Milind Bhandarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13658977#comment-13658977
 ] 

Milind Bhandarkar commented on YARN-117:


Hi,

This is an automated message. Please do not reply to this email.
If you are receiving this email, it must be because you sent an email to my old 
email address @EMC.com. Currently, all email sent to this address is being 
forwarded to my new email address:

mbhandar...@gopivotal.com

However, this forwarding will stop soon, and I will not be able to receive 
email sent to @EMC.com address. Please update your contacts DB with my new 
email address. Thank you.

- milind



> Enhance YARN service model
> --
>
> Key: YARN-117
> URL: https://issues.apache.org/jira/browse/YARN-117
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.0.4-alpha
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: YARN-117-007.patch, YARN-117-2.patch, YARN-117-3.patch, 
> YARN-117.4.patch, YARN-117.5.patch, YARN-117.6.patch, YARN-117.patch
>
>
> Having played with the YARN service model, there are some issues
> that I've identified based on past work and initial use.
> This JIRA issue is an overall one to cover the issues, with solutions pushed 
> out to separate JIRAs.
> h2. state model prevents stopped state being entered if you could not 
> successfully start the service.
> In the current lifecycle you cannot stop a service unless it was successfully 
> started, but
> * {{init()}} may acquire resources that need to be explicitly released
> * if the {{start()}} operation fails partway through, the {{stop()}} 
> operation may be needed to release resources.
> *Fix:* make {{stop()}} a valid state transition from all states and require 
> the implementations to be able to stop safely without requiring all fields to 
> be non null.
> Before anyone points out that the {{stop()}} operations assume that all 
> fields are valid, and that if called before {{start()}} they will NPE: 
> MAPREDUCE-3431 shows that this problem arises today, and MAPREDUCE-3502 is a 
> fix for it. It is independent of the rest of the issues in this doc, but it 
> will aid making {{stop()}} execute from all states other than "stopped".
> MAPREDUCE-3502 is too big a patch and needs to be broken down for easier 
> review and take up; this can be done with issues linked to this one.
> h2. AbstractService doesn't prevent duplicate state change requests.
> The {{ensureState()}} checks to verify whether or not a state transition is 
> allowed from the current state are performed in the base {{AbstractService}} 
> class -yet subclasses tend to call this *after* their own {{init()}}, 
> {{start()}} & {{stop()}} operations. This means that these operations can be 
> performed out of order, and even if the outcome of the call is an exception, 
> all actions performed by the subclasses will have taken place. MAPREDUCE-3877 
> demonstrates this.
> This is a tricky one to address. In HADOOP-3128 I used a base class instead 
> of an interface and made the {{init()}}, {{start()}} & {{stop()}} methods 
> {{final}}. These methods would do the checks, and then invoke protected inner 
> methods, {{innerStart()}}, {{innerStop()}}, etc. It should be possible to 
> retrofit the same behaviour to everything that extends {{AbstractService}} 
> -something that must be done before the class is considered stable (because 
> once the lifecycle methods are declared final, all subclasses that are out of 
> the source tree will need fixing by the respective developers).
> h2. AbstractService state change doesn't defend against race conditions.
> There's no concurrency locks on the state transitions. Whatever fix for wrong 
> state calls is added should correct this to prevent re-entrancy, such as 
> {{stop()}} being called from two threads.
> h2. Static methods to choreograph lifecycle operations
> Helper methods to move things through lifecycles. init->start is common, 
> stop-if-service!=null another. Some static methods can execute these, and 
> even call {{stop()}} if {{init()}} raises an exception. These could go into a 
> class {{ServiceOps}} in the same package. These can be used by those services 
> that wrap other services, and help manage more robust shutdowns.
> h2. state transition failures are something that registered service listeners 
> may wish to be informed of.
> When a state transition fails a {{RuntimeException}} can be thrown -and the 
> service listeners are not informed as the notification point isn't reached. 
> They may wish to know this, especially for management and diagnostics.
> *Fix:* extend {{ServiceStateChangeListener}} with a callback such as 
> {{stateChangeFailed(Service service,Service.State targeted-state, 
> RuntimeException e)}} that is invoked fro

[jira] [Updated] (YARN-117) Enhance YARN service model

2013-05-15 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-117:


Attachment: YARN-117-007.patch

patch in sync with {{YARN-530-005.patch}}

# adapts to the new {{serviceStart()}}, {{serviceStop()}}, {{serviceInit()}} 
names.
# {{NodeManager}} shutdown is hardened to work from INITED
# {{NodeStatusUpdater}} cross-thread stop flag marked as {{volatile}}
# Various tests more rigorous about stopping services on failure/exit
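
For readers unfamiliar with the renamed methods, a minimal sketch of the 
wrapper pattern they imply (not the actual AbstractService code; the class name 
and internals are illustrative): the public lifecycle method stays final, does 
the state bookkeeping, and delegates to an overridable serviceXxx() hook, with 
stop() safe to call from any state.
{code}
public abstract class ServiceSkeletonSketch {

  private boolean started;

  /** Subclasses override this instead of start() itself. */
  protected void serviceStart() throws Exception {
  }

  public final void start() {
    if (started) {
      return;                       // duplicate start requests become no-ops
    }
    try {
      serviceStart();
      started = true;
    } catch (Exception e) {
      stop();                       // best-effort cleanup on a failed start
      throw new RuntimeException(e);
    }
  }

  public void stop() {
    // Must be safe to call from any state, even if start() never completed.
    started = false;
  }
}
{code}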

> Enhance YARN service model
> --
>
> Key: YARN-117
> URL: https://issues.apache.org/jira/browse/YARN-117
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.0.4-alpha
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: YARN-117-007.patch, YARN-117-2.patch, YARN-117-3.patch, 
> YARN-117.4.patch, YARN-117.5.patch, YARN-117.6.patch, YARN-117.patch
>
>
> Having played with the YARN service model, there are some issues
> that I've identified based on past work and initial use.
> This JIRA issue is an overall one to cover the issues, with solutions pushed 
> out to separate JIRAs.
> h2. state model prevents stopped state being entered if you could not 
> successfully start the service.
> In the current lifecycle you cannot stop a service unless it was successfully 
> started, but
> * {{init()}} may acquire resources that need to be explicitly released
> * if the {{start()}} operation fails partway through, the {{stop()}} 
> operation may be needed to release resources.
> *Fix:* make {{stop()}} a valid state transition from all states and require 
> the implementations to be able to stop safely without requiring all fields to 
> be non null.
> Before anyone points out that the {{stop()}} operations assume that all 
> fields are valid, and that if called before {{start()}} they will NPE: 
> MAPREDUCE-3431 shows that this problem arises today, and MAPREDUCE-3502 is a 
> fix for it. It is independent of the rest of the issues in this doc, but it 
> will aid making {{stop()}} execute from all states other than "stopped".
> MAPREDUCE-3502 is too big a patch and needs to be broken down for easier 
> review and take up; this can be done with issues linked to this one.
> h2. AbstractService doesn't prevent duplicate state change requests.
> The {{ensureState()}} checks to verify whether or not a state transition is 
> allowed from the current state are performed in the base {{AbstractService}} 
> class -yet subclasses tend to call this *after* their own {{init()}}, 
> {{start()}} & {{stop()}} operations. This means that these operations can be 
> performed out of order, and even if the outcome of the call is an exception, 
> all actions performed by the subclasses will have taken place. MAPREDUCE-3877 
> demonstrates this.
> This is a tricky one to address. In HADOOP-3128 I used a base class instead 
> of an interface and made the {{init()}}, {{start()}} & {{stop()}} methods 
> {{final}}. These methods would do the checks, and then invoke protected inner 
> methods, {{innerStart()}}, {{innerStop()}}, etc. It should be possible to 
> retrofit the same behaviour to everything that extends {{AbstractService}} 
> -something that must be done before the class is considered stable (because 
> once the lifecycle methods are declared final, all subclasses that are out of 
> the source tree will need fixing by the respective developers).
> h2. AbstractService state change doesn't defend against race conditions.
> There's no concurrency locks on the state transitions. Whatever fix for wrong 
> state calls is added should correct this to prevent re-entrancy, such as 
> {{stop()}} being called from two threads.
> h2. Static methods to choreograph lifecycle operations
> Helper methods to move things through lifecycles. init->start is common, 
> stop-if-service!=null another. Some static methods can execute these, and 
> even call {{stop()}} if {{init()}} raises an exception. These could go into a 
> class {{ServiceOps}} in the same package. These can be used by those services 
> that wrap other services, and help manage more robust shutdowns.
> h2. state transition failures are something that registered service listeners 
> may wish to be informed of.
> When a state transition fails a {{RuntimeException}} can be thrown -and the 
> service listeners are not informed as the notification point isn't reached. 
> They may wish to know this, especially for management and diagnostics.
> *Fix:* extend {{ServiceStateChangeListener}} with a callback such as 
> {{stateChangeFailed(Service service,Service.State targeted-state, 
> RuntimeException e)}} that is invoked from the (final) state change methods 
> in the {{AbstractService}} class (once they delegate to their inner 
> {{innerStart()}}, {{innerStop()}} methods; make a no-op on the ex

[jira] [Commented] (YARN-530) Define Service model strictly, implement AbstractService for robust subclassing, migrate yarn-common services

2013-05-15 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13658961#comment-13658961
 ] 

Steve Loughran commented on YARN-530:
-

h3. Service

bq. {{start()}} doesn't use {{enterState()}} API, so we don't capture the 
life-cycle change events.

-fixed

bq. {{init()}}, {{start()}} and {{stop()}} aren't synchronized, so callers of 
{{getServiceState()}} will get incorrect information if, let's say, 
{{innerInit}} is still in progress.

This is interesting. I've realised they need to be synchronized, or you add a 
"stateChangeInProgress" marker, to eliminate the race of someone 
trying to {{stop()}} a service half-way through {{start()}}. Some of the more 
complex state models make the 'starting' and 'stopping' states explicit, which 
is another option. 

* I'm going to make the methods {{synchronized}} to remove the risk of any race 
conditions, while doing a best effort at keeping the notifications outside the 
{{synchronized}} block. The corner case here is that when an init or start 
fails and {{stop()}} is called automatically, it will notify its listeners 
inside the {{stop()}} call.

* I'm going to keep the state queries unsynchronized (reading a volatile), so 
that callers that only want to read service state, and not manipulate it, are 
never blocked.
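
A minimal sketch of that pattern (not the actual AbstractService code; names 
are illustrative): synchronized transitions, a volatile state field for 
lock-free reads, and listener notification after the synchronized block.
{code}
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class LifecycleSketch {

  enum State { NOTINITED, INITED, STARTED, STOPPED }

  interface Listener { void stateChanged(State newState); }

  private volatile State state = State.INITED;
  private final List<Listener> listeners = new CopyOnWriteArrayList<Listener>();

  public void registerServiceListener(Listener l) { listeners.add(l); }

  /** Unsynchronized read of a volatile field, as discussed above. */
  public State getServiceState() { return state; }

  public void start() {
    synchronized (this) {
      if (state != State.INITED) {
        throw new IllegalStateException("cannot start from " + state);
      }
      state = State.STARTED;
      // ... service-specific startup would run here ...
    }
    // Best effort: notify outside the lock so a slow or re-entrant listener
    // cannot hold up (or deadlock with) other state changes.
    for (Listener l : listeners) {
      l.stateChanged(State.STARTED);
    }
  }
}
{code}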

bq. Rename {{inState}} to {{isInState}}?

done

bq. {{waitForServiceToStop}} seems redundant because we also have listeners, 
right? Sure one is a blocking call while the other is async. I'd remove it 
unless there is some other strong reason. May be we can implement an async 
utility using {{getServiceState()}} and implement a generic 
{{waitForServiceState}} instead of just for stop?

-let me add something that isn't on the critical path for the next alpha, as it 
isn't an API change: an entry point to start a service by its name.
I've just added YARN-679 to show the use case here: an entry point to start any 
service by its classname. That isn't ready to be committed (where are the 
tests!), but it shows the vision. I'll see if I can implement it with the 
notifications.

bq. What is the use of 'blockers'?

An attempt to make the reason a service blocks visible, at least when it is 
consciously blocking (e.g. spin/sleep waiting for manager node). Unconscious 
blocking, by way of blocking API calls, will still happen. If a service can 
declare what it is blocked on, then other tooling can ask "what is 
this service waiting for?"
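
Purely as an illustration of the "blockers" idea (the method names here are 
invented for the sketch, not the patch's API), the registry could be as simple 
as a concurrent map keyed by the reason for the wait:
{code}
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class BlockersSketch {

  private final Map<String, String> blockers =
      new ConcurrentHashMap<String, String>();

  /** e.g. putBlocker("RM", "waiting for resourcemanager at rm:8031") */
  public void putBlocker(String name, String details) {
    blockers.put(name, details);
  }

  public void removeBlocker(String name) {
    blockers.remove(name);
  }

  /** Snapshot for monitoring tools: "what is this service waiting for?" */
  public Map<String, String> getBlockers() {
    return new HashMap<String, String>(blockers);
  }
}
{code}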

bq. May be LifecycleEvent move to top-level?
-done

bq. Not part of your patch, but we may as well take this opportunity to fix 
this: Rename register -> registerServiceListener? Similarly unregister.
-done

h3. AbstractService

bq. The inner* method-names don't look good when using the service stuff. Shall 
we rename them to, e.g., innerInit to initService?

bq. Mark all the super init, start, stop methods as final?

Gladly, though it does cause a mockito test to fail:

{code}
testResourceRelease(org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService)
  Time elapsed: 247 sec  <<< FAILURE!
java.lang.AssertionError: null state in null class 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker$$EnhancerByMockitoWithCGLIB$$221b8e7
at 
org.apache.hadoop.yarn.service.AbstractService.enterState(AbstractService.java:431)
at 
org.apache.hadoop.yarn.service.AbstractService.init(AbstractService.java:151)
at 
org.apache.hadoop.yarn.service.CompositeService.serviceInit(CompositeService.java:67)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.serviceInit(ResourceLocalizationService.java:240)
at 
org.apache.hadoop.yarn.service.AbstractService.init(AbstractService.java:154)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testResourceRelease(TestResourceLocalizationService.java:239)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
at 
org.junit.runners.BlockJUnit4ClassRunner.runNotIgnored(BlockJUnit4ClassRunner.java:79)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassR

[jira] [Updated] (YARN-530) Define Service model strictly, implement AbstractService for robust subclassing, migrate yarn-common services

2013-05-15 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-530:


Attachment: YARN-530-005.patch

> Define Service model strictly, implement AbstractService for robust 
> subclassing, migrate yarn-common services
> -
>
> Key: YARN-530
> URL: https://issues.apache.org/jira/browse/YARN-530
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: YARN-117changes.pdf, YARN-530-005.patch, 
> YARN-530-2.patch, YARN-530-3.patch, YARN-530.4.patch, YARN-530.patch
>
>
> # Extend the YARN {{Service}} interface as discussed in YARN-117
> # Implement the changes in {{AbstractService}} and {{FilterService}}.
> # Migrate all services in yarn-common to the more robust service model, test.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-366) Add a tracing async dispatcher to simplify debugging

2013-05-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13658867#comment-13658867
 ] 

Hadoop QA commented on YARN-366:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12583378/YARN-366-4.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 7 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/936//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/936//console

This message is automatically generated.

> Add a tracing async dispatcher to simplify debugging
> 
>
> Key: YARN-366
> URL: https://issues.apache.org/jira/browse/YARN-366
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, resourcemanager
>Affects Versions: 2.0.2-alpha
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Attachments: YARN-366-1.patch, YARN-366-2.patch, YARN-366-3.patch, 
> YARN-366-4.patch, YARN-366.patch
>
>
> Exceptions thrown in YARN/MR code with asynchronous event handling do not 
> contain informative stack traces, as all handle() methods sit directly under 
> the dispatcher thread's loop.
> This makes errors very difficult to debug for those who are not intimately 
> familiar with the code, as it is difficult to see which chain of events 
> caused a particular outcome.
> I propose adding an AsyncDispatcher that instruments events with tracing 
> information.  Whenever an event is dispatched during the handling of another 
> event, the dispatcher would annotate that event with a pointer to its parent. 
>  When the dispatcher catches an exception, it could reconstruct a "stack" 
> trace of the chain of events that led to it, and be able to log something 
> informative.
> This would be an experimental feature, off by default, unless extensive 
> testing showed that it did not have a significant performance impact.
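
A sketch of the proposed mechanism under stated assumptions (all names are 
invented; this is not the attached patch, and it assumes events are dispatched 
from the handler thread itself): each event records the event that was being 
handled when it was dispatched, so the chain can be reconstructed and logged 
when a handler throws.
{code}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class TracingDispatcherSketch {

  static class TracedEvent {
    final String type;
    final TracedEvent parent;   // event being handled when this was created
    TracedEvent(String type, TracedEvent parent) {
      this.type = type;
      this.parent = parent;
    }
  }

  private final BlockingQueue<TracedEvent> queue =
      new LinkedBlockingQueue<TracedEvent>();
  private TracedEvent current;    // event the dispatcher thread is handling

  /** Called by handlers running on the dispatcher thread. */
  public void dispatch(String type) {
    queue.add(new TracedEvent(type, current));
  }

  void handleLoop() throws InterruptedException {
    while (true) {
      current = queue.take();
      try {
        // ... deliver the event to its registered handler ...
      } catch (RuntimeException e) {
        // Reconstruct the "stack" of events that led here.
        StringBuilder chain = new StringBuilder();
        for (TracedEvent ev = current; ev != null; ev = ev.parent) {
          chain.append(ev.type).append(" <- ");
        }
        System.err.println("Event chain: " + chain + "root");
        throw e;
      }
    }
  }
}
{code}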

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-617) In unsecure mode, AM can fake resource requirements

2013-05-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13658858#comment-13658858
 ] 

Hadoop QA commented on YARN-617:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12583375/YARN-617-20130515.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 22 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/935//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/935//console

This message is automatically generated.

> In unsecure mode, AM can fake resource requirements 
> -
>
> Key: YARN-617
> URL: https://issues.apache.org/jira/browse/YARN-617
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Omkar Vinit Joshi
>Priority: Minor
> Attachments: YARN-617.20130501.1.patch, YARN-617.20130501.patch, 
> YARN-617.20130502.patch, YARN-617-20130507.patch, YARN-617.20130508.patch, 
> YARN-617-20130513.patch, YARN-617-20130515.patch
>
>
> Without security, it is impossible to completely avoid AMs faking resources. 
> We can at least make it as difficult as possible by using the same 
> container tokens and the RM-NM shared-key mechanism over the unauthenticated 
> RM-NM channel.
> At a minimum, this will avoid accidental bugs in AMs in unsecure mode.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-687) TestNMAuditLogger hang

2013-05-15 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13658843#comment-13658843
 ] 

Steve Loughran commented on YARN-687:
-

thread dump 
{code}
Running org.apache.hadoop.yarn.server.nodemanager.TestNMAuditLogger
2013-05-15 21:21:39
Full thread dump OpenJDK 64-Bit Server VM (20.0-b12 mixed mode):
"IPC Server handler 4 on 32868" daemon prio=10 tid=0x7fc9ec4da000 
nid=0x359e waiting on condition [0x7fc9e8af9000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0xebaf1720> (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
at 
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:386)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1817)
"IPC Server handler 3 on 32868" daemon prio=10 tid=0x7fc9ec4b8000 
nid=0x359d waiting on condition [0x7fc9e8bfa000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0xebaf1720> (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
at 
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:386)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1817)
"IPC Server handler 2 on 32868" daemon prio=10 tid=0x7fc9ec4b7000 
nid=0x359c waiting on condition [0x7fc9e8cfb000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0xebaf1720> (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
at 
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:386)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1817)
"IPC Server handler 1 on 32868" daemon prio=10 tid=0x7fc9ec41b800 
nid=0x359b waiting on condition [0x7fc9e8dfc000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0xebaf1720> (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
at 
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:386)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1817)
"IPC Server handler 0 on 32868" daemon prio=10 tid=0x7fc9ec414000 
nid=0x359a waiting on condition [0x7fc9e8efd000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0xebaf1720> (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
at 
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:386)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1817)
"IPC Server listener on 32868" daemon prio=10 tid=0x7fc9ec3eb000 nid=0x3599 
runnable [0x7fc9e8ffe000]
   java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:228)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:83)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
- locked <0xebaf2800> (a sun.nio.ch.Util$1)
- locked <0xebaf27f0> (a java.util.Collections$UnmodifiableSet)
- locked <0xebaf2380> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:102)
at org.apache.hadoop.ipc.Server$Listener.run(Server.java:678)
"IPC Server Responder" daemon prio=10 tid=0x7fc9ec4bb000 nid=0x3598 
runnable [0x7fc9f0172000]
   java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)

[jira] [Created] (YARN-687) TestNMAuditLogger hang

2013-05-15 Thread Steve Loughran (JIRA)
Steve Loughran created YARN-687:
---

 Summary: TestNMAuditLogger hang
 Key: YARN-687
 URL: https://issues.apache.org/jira/browse/YARN-687
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 3.0.0
 Environment: Linux stevel-dev 3.2.0-24-virtual #39-Ubuntu SMP Mon May 
21 18:44:18 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
java version "1.6.0_27"
OpenJDK Runtime Environment (IcedTea6 1.12.3) (6b27-1.12.3-0ubuntu1~12.04.1)
OpenJDK 64-Bit Server VM (build 20.0-b12, mixed mode)
Reporter: Steve Loughran
Priority: Minor


TestNMAuditLogger hanging repeatedly on a test VM

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-656) In scheduler UI, including reserved memory in "Memory Total" can make it exceed cluster capacity.

2013-05-15 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated YARN-656:


Summary: In scheduler UI, including reserved memory in "Memory Total" can 
make it exceed cluster capacity.  (was: In scheduler UI, including reserved 
memory "Memory Total" can make it exceed cluster capacity.)

> In scheduler UI, including reserved memory in "Memory Total" can make it 
> exceed cluster capacity.
> -
>
> Key: YARN-656
> URL: https://issues.apache.org/jira/browse/YARN-656
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, scheduler
>Affects Versions: 2.0.4-alpha
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
>
> "Memory Total" is currently a sum of availableMB, allocatedMB, and 
> reservedMB.  Including reservedMB in this sum can make the total exceed the 
> capacity of the cluster. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-366) Add a tracing async dispatcher to simplify debugging

2013-05-15 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated YARN-366:


Attachment: YARN-366-4.patch

> Add a tracing async dispatcher to simplify debugging
> 
>
> Key: YARN-366
> URL: https://issues.apache.org/jira/browse/YARN-366
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, resourcemanager
>Affects Versions: 2.0.2-alpha
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Attachments: YARN-366-1.patch, YARN-366-2.patch, YARN-366-3.patch, 
> YARN-366-4.patch, YARN-366.patch
>
>
> Exceptions thrown in YARN/MR code with asynchronous event handling do not 
> contain informative stack traces, as all handle() methods sit directly under 
> the dispatcher thread's loop.
> This makes errors very difficult to debug for those who are not intimately 
> familiar with the code, as it is difficult to see which chain of events 
> caused a particular outcome.
> I propose adding an AsyncDispatcher that instruments events with tracing 
> information.  Whenever an event is dispatched during the handling of another 
> event, the dispatcher would annotate that event with a pointer to its parent. 
>  When the dispatcher catches an exception, it could reconstruct a "stack" 
> trace of the chain of events that led to it, and be able to log something 
> informative.
> This would be an experimental feature, off by default, unless extensive 
> testing showed that it did not have a significant performance impact.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-366) Add a tracing async dispatcher to simplify debugging

2013-05-15 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13658836#comment-13658836
 ] 

Sandy Ryza commented on YARN-366:
-

Uploading a patch to address the findbugs warnings.

> Add a tracing async dispatcher to simplify debugging
> 
>
> Key: YARN-366
> URL: https://issues.apache.org/jira/browse/YARN-366
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, resourcemanager
>Affects Versions: 2.0.2-alpha
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Attachments: YARN-366-1.patch, YARN-366-2.patch, YARN-366-3.patch, 
> YARN-366-4.patch, YARN-366.patch
>
>
> Exceptions thrown in YARN/MR code with asynchronous event handling do not 
> contain informative stack traces, as all handle() methods sit directly under 
> the dispatcher thread's loop.
> This makes errors very difficult to debug for those who are not intimately 
> familiar with the code, as it is difficult to see which chain of events 
> caused a particular outcome.
> I propose adding an AsyncDispatcher that instruments events with tracing 
> information.  Whenever an event is dispatched during the handling of another 
> event, the dispatcher would annotate that event with a pointer to its parent. 
>  When the dispatcher catches an exception, it could reconstruct a "stack" 
> trace of the chain of events that led to it, and be able to log something 
> informative.
> This would be an experimental feature, off by default, unless extensive 
> testing showed that it did not have a significant performance impact.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-613) Create NM proxy per NM instead of per container

2013-05-15 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13658816#comment-13658816
 ] 

Bikas Saha commented on YARN-613:
-

To be clear, on the AM the behavior is always to take the tokens coming in the 
allocate response and set them in the UGI (overriding old values). They will be 
picked up from the UGI by NMClient during communication.
The behavior on the NM will be to always authenticate based on the current 
master key. This is always the latest correct value, and in the majority of use 
cases this master key will be identical to the cached appId-master-key. If the 
current master key validates the incoming token, then it is stored as the new 
value of the cached appId-master-key. If the current master key fails to 
validate the token (long-running apps), then the cached appId-master-key is 
used to validate the token.
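A hedged sketch of the NM-side check described above: try the current master key first and fall back to the cached per-app key for long-running apps. The class, field, and method names here are illustrative assumptions, not the actual YARN classes.

{code}
import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class AmNmTokenValidator {
  private volatile byte[] currentMasterKey;
  private final Map<String, byte[]> appIdToMasterKey =
      new ConcurrentHashMap<String, byte[]>();

  boolean validate(String appId, byte[] tokenPassword) {
    if (verify(tokenPassword, currentMasterKey)) {
      // latest key worked: refresh the cached per-app key
      appIdToMasterKey.put(appId, currentMasterKey);
      return true;
    }
    // long-running app whose token was minted with an older key
    byte[] cached = appIdToMasterKey.get(appId);
    return cached != null && verify(tokenPassword, cached);
  }

  private boolean verify(byte[] tokenPassword, byte[] key) {
    // stand-in for the real HMAC/password check
    return key != null && Arrays.equals(tokenPassword, key);
  }
}
{code}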

It would be great to take the solution and break the work into separate JIRAs, 
e.g. the AMRMProtocol addition, NM-RM protocol changes, RM server changes, NM 
server changes, AMRMClient changes, and NMClient changes.

bq. If we don't need to preserve the work (AM and containers will be killed 
after the RM restarts), there is no problem at all even with the above 
implementation: since the applications are already killed, we can simply clear 
the cache on the NM.
If this cache is per appId, then it cannot be removed when the appAttempt 
completes; it will be removed when the application completes. During NM resync 
we should not invalidate the cache. The cache is required for work-preserving 
restart and will automatically be refreshed by the above logic for 
non-work-preserving restart.

> Create NM proxy per NM instead of per container
> ---
>
> Key: YARN-613
> URL: https://issues.apache.org/jira/browse/YARN-613
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bikas Saha
>Assignee: Omkar Vinit Joshi
>
> Currently a new NM proxy has to be created per container since the secure 
> authentication is using a containertoken from the container.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-617) In unsecure mode, AM can fake resource requirements

2013-05-15 Thread Omkar Vinit Joshi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Vinit Joshi updated YARN-617:
---

Attachment: YARN-617-20130515.patch

> In unsecure mode, AM can fake resource requirements 
> -
>
> Key: YARN-617
> URL: https://issues.apache.org/jira/browse/YARN-617
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Omkar Vinit Joshi
>Priority: Minor
> Attachments: YARN-617.20130501.1.patch, YARN-617.20130501.patch, 
> YARN-617.20130502.patch, YARN-617-20130507.patch, YARN-617.20130508.patch, 
> YARN-617-20130513.patch, YARN-617-20130515.patch
>
>
> Without security, it is impossible to completely avoid AMs faking resources. 
> We can at the least make it as difficult as possible by using the same 
> container tokens and the RM-NM shared key mechanism over unauthenticated 
> RM-NM channel.
> In the minimum, this will avoid accidental bugs in AMs in unsecure mode.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-617) In unsecure mode, AM can fake resource requirements

2013-05-15 Thread Omkar Vinit Joshi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13658813#comment-13658813
 ] 

Omkar Vinit Joshi commented on YARN-617:


Thanks Vinod.

bq. ContainerManager.getContainerTokenIdentifier should be changed to throw 
only YarnRemoteException, we only throw that at the YARN layer
Fixed, using RPCUtil.getRemoteException.

bq. I still don't understand the DEL changes in TestNodeManagerReboot. You 
haven't given any explanation. Don't think they are needed.
My bad. I had reverted the change, but there were formatting issues that showed 
up in the diff. Fixed.
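For illustration, the pattern implied by the comment above: catching an internal failure and rethrowing it as a YarnRemoteException via RPCUtil. The surrounding class, the Object return type, and the package names are assumptions based on the YARN code of that era, not the actual patch.

{code}
import org.apache.hadoop.yarn.exceptions.YarnRemoteException;
import org.apache.hadoop.yarn.ipc.RPCUtil;

class ContainerTokenLookupExample {
  // hypothetical method: only YarnRemoteException escapes to the YARN layer
  Object getContainerTokenIdentifier(String containerId) throws YarnRemoteException {
    try {
      return lookup(containerId); // internal operation that may fail
    } catch (Exception e) {
      throw RPCUtil.getRemoteException(e);
    }
  }

  private Object lookup(String containerId) throws Exception {
    return null; // placeholder
  }
}
{code}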


> In unsecure mode, AM can fake resource requirements 
> -
>
> Key: YARN-617
> URL: https://issues.apache.org/jira/browse/YARN-617
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Omkar Vinit Joshi
>Priority: Minor
> Attachments: YARN-617.20130501.1.patch, YARN-617.20130501.patch, 
> YARN-617.20130502.patch, YARN-617-20130507.patch, YARN-617.20130508.patch, 
> YARN-617-20130513.patch, YARN-617-20130515.patch
>
>
> Without security, it is impossible to completely avoid AMs faking resources. 
> We can at the least make it as difficult as possible by using the same 
> container tokens and the RM-NM shared key mechanism over unauthenticated 
> RM-NM channel.
> In the minimum, this will avoid accidental bugs in AMs in unsecure mode.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-638) Restore RMDelegationTokens after RM Restart

2013-05-15 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-638:
-

Attachment: YARN-638.11.patch

The newest patch reverts all HDFS changes except moving the addPersistedToken 
method to the common secret manager.

> Restore RMDelegationTokens after RM Restart
> ---
>
> Key: YARN-638
> URL: https://issues.apache.org/jira/browse/YARN-638
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-638.10.patch, YARN-638.11.patch, YARN-638.1.patch, 
> YARN-638.2.patch, YARN-638.3.patch, YARN-638.4.patch, YARN-638.5.patch, 
> YARN-638.6.patch, YARN-638.7.patch, YARN-638.8.patch, YARN-638.9.patch
>
>
> This is missed in YARN-581. After RM restart, RMDelegationTokens need to be 
> added both in DelegationTokenRenewer (addressed in YARN-581), and 
> delegationTokenSecretManager

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-662) Enforce required parameters for all the protocols

2013-05-15 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13658795#comment-13658795
 ] 

Bikas Saha commented on YARN-662:
-

Does this include adding null checks, etc., on all incoming fields in the API 
handlers? Currently, in most places we simply access the fields assuming they 
will be valid.
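A small sketch of the kind of server-side null checks being asked about; the handler and request types here are hypothetical and only illustrate enforcing required fields on the server side.

{code}
// Hypothetical request type standing in for a generated protocol record.
interface AllocateRequestLike {
  Object getApplicationAttemptId();
  Object getProgress();
}

class AllocateHandler {
  void allocate(AllocateRequestLike request) {
    // reject the call before touching any field that might be absent
    requireField(request.getApplicationAttemptId(), "applicationAttemptId");
    // ... proceed once required fields are known to be present
  }

  private static void requireField(Object value, String name) {
    if (value == null) {
      throw new IllegalArgumentException("Missing required field: " + name);
    }
  }
}
{code}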

> Enforce required parameters for all the protocols
> -
>
> Key: YARN-662
> URL: https://issues.apache.org/jira/browse/YARN-662
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Siddharth Seth
>Assignee: Zhijie Shen
>
> All proto fields are marked as optional. We need to mark some of them as 
> required, or enforce these server side. Server side is likely better since 
> that's more flexible (e.g. deprecating a field type in favour of another - 
> either of the two must be present).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (YARN-686) Flatten NodeReport

2013-05-15 Thread Sandy Ryza (JIRA)
Sandy Ryza created YARN-686:
---

 Summary: Flatten NodeReport
 Key: YARN-686
 URL: https://issues.apache.org/jira/browse/YARN-686
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api
Affects Versions: 2.0.4-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza


The NodeReport returned by getClusterNodes or given to AMs in heartbeat 
responses includes both a NodeState (enum) and a NodeHealthStatus (object).  As 
UNHEALTHY is already a NodeState, a separate NodeHealthStatus doesn't seem 
necessary.  I propose eliminating NodeHealthStatus#getIsNodeHealthy and moving 
its two other methods, getHealthReport and getLastHealthReportTime, into 
NodeReport.
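A hedged sketch of what the flattened report could look like under this proposal; the interface name is hypothetical and only mirrors the description above, not a committed API.

{code}
import org.apache.hadoop.yarn.api.records.NodeId;
import org.apache.hadoop.yarn.api.records.NodeState;

// Hypothetical flattened shape: health fields folded into the report itself.
interface FlattenedNodeReport {
  NodeId getNodeId();
  NodeState getNodeState();       // UNHEALTHY is already expressed here
  String getHealthReport();       // moved from NodeHealthStatus
  long getLastHealthReportTime(); // moved from NodeHealthStatus
}
{code}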

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-686) Flatten NodeReport

2013-05-15 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated YARN-686:


Description: The NodeReport returned by getClusterNodes or given to AMs in 
heartbeat responses includes both a NodeState (enum) and a NodeHealthStatus 
(object).  As UNHEALTHY is already NodeState, a separate NodeHealthStatus 
doesn't seem necessary.  I propose eliminating 
NodeHealthStatus#getIsNodeHealthy and moving its two other methods, 
getHealthReport and getLastHealthReportTime, into NodeReport.  (was: The 
NodeReport returned by getClusterNodes or given to AMs in heartbeat responses 
includes both a NodeState (enum) and a NodeHealthStatus (object).  As UNHEALTHY 
is already NodeState, a separate NodeHealthStatus doesn't seem necessary.  I 
propose eliminating NodeHealthStatus#getIsNodeHealthy and moving
moving the its two other methods, getHealthReport and getLastHealthReportTime, 
into NodeReport.)

> Flatten NodeReport
> --
>
> Key: YARN-686
> URL: https://issues.apache.org/jira/browse/YARN-686
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api
>Affects Versions: 2.0.4-alpha
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
>
> The NodeReport returned by getClusterNodes or given to AMs in heartbeat 
> responses includes both a NodeState (enum) and a NodeHealthStatus (object).  
> As UNHEALTHY is already NodeState, a separate NodeHealthStatus doesn't seem 
> necessary.  I propose eliminating NodeHealthStatus#getIsNodeHealthy and 
> moving its two other methods, getHealthReport and getLastHealthReportTime, 
> into NodeReport.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-686) Flatten NodeReport

2013-05-15 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated YARN-686:


Description: 
The NodeReport returned by getClusterNodes or given to AMs in heartbeat 
responses includes both a NodeState (enum) and a NodeHealthStatus (object).  As 
UNHEALTHY is already NodeState, a separate NodeHealthStatus doesn't seem 
necessary.  I propose eliminating NodeHealthStatus#getIsNodeHealthy and moving
moving the its two other methods, getHealthReport and getLastHealthReportTime, 
into NodeReport.

  was:
The NodeReport returned by getClusterNodes or given to AMs in heartbeat 
responses includes both a NodeState (enum) and a NodeHealthStatus (object).  As 
UNHEALTHY is already NodeState, a separate NodeHealthStatus doesn't seem 
necessary.  I propose eliminating NodeHealthStatus#getIsNodeHealthy and moving
moving the its two other methods, getHealthReport and getLastHealthReportTime, 
into NodeReport


> Flatten NodeReport
> --
>
> Key: YARN-686
> URL: https://issues.apache.org/jira/browse/YARN-686
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api
>Affects Versions: 2.0.4-alpha
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
>
> The NodeReport returned by getClusterNodes or given to AMs in heartbeat 
> responses includes both a NodeState (enum) and a NodeHealthStatus (object).  
> As UNHEALTHY is already NodeState, a separate NodeHealthStatus doesn't seem 
> necessary.  I propose eliminating NodeHealthStatus#getIsNodeHealthy and moving
> moving the its two other methods, getHealthReport and 
> getLastHealthReportTime, into NodeReport.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-613) Create NM proxy per NM instead of per container

2013-05-15 Thread Omkar Vinit Joshi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13658763#comment-13658763
 ] 

Omkar Vinit Joshi commented on YARN-613:


One addition, following [~bikassaha]'s good suggestion.
If the RM restarts, there are two scenarios:
* If we need to preserve the work (AM and containers will continue to run), the 
AM should still be able to communicate with the NM using the older AMNMToken 
after the RM restart. If the AM then gets a new container on that NM after the 
RM reboot (the RM will send a new AMNMToken to the AM, since it has no 
knowledge of the previous AMNMToken - that information is not persisted), the 
AM should replace the existing token with the new one. If the NM then receives 
a token different from the older/stored one, it should validate the incoming 
token's master key against its current/previous master key; if valid, the NM 
replaces the older token (thereby we can even renew the token).
* If we don't need to preserve the work (AM and containers will be killed after 
the RM restarts), there is no problem at all even with the above 
implementation: since the applications are already killed, we can simply clear 
the cache on the NM.

> Create NM proxy per NM instead of per container
> ---
>
> Key: YARN-613
> URL: https://issues.apache.org/jira/browse/YARN-613
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bikas Saha
>Assignee: Omkar Vinit Joshi
>
> Currently a new NM proxy has to be created per container since the secure 
> authentication is using a containertoken from the container.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-638) Restore RMDelegationTokens after RM Restart

2013-05-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13658705#comment-13658705
 ] 

Hadoop QA commented on YARN-638:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12583340/YARN-638.10.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.hdfs.TestClientReportBadBlock

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/933//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/933//console

This message is automatically generated.

> Restore RMDelegationTokens after RM Restart
> ---
>
> Key: YARN-638
> URL: https://issues.apache.org/jira/browse/YARN-638
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-638.10.patch, YARN-638.1.patch, YARN-638.2.patch, 
> YARN-638.3.patch, YARN-638.4.patch, YARN-638.5.patch, YARN-638.6.patch, 
> YARN-638.7.patch, YARN-638.8.patch, YARN-638.9.patch
>
>
> This is missed in YARN-581. After RM restart, RMDelegationTokens need to be 
> added both in DelegationTokenRenewer (addressed in YARN-581), and 
> delegationTokenSecretManager

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-624) Support gang scheduling in the AM RM protocol

2013-05-15 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13658637#comment-13658637
 ] 

Alejandro Abdelnur commented on YARN-624:
-

As pointed out, supporting gang scheduling at the RM/scheduler level will allow 
detection/avoidance of deadlocks. This would not be trivial (nor efficient) to 
do if gang scheduling is done at the AM level.

Examples of gang request capabilities could be:

* express a set of containers on any nodes, e.g. 10 containers on any node of 
the cluster.
* express a set of containers on a specified set of nodes, e.g. 10 containers 
in rack1, or 10 containers, one on each of n1...n10.
* express different sets of possible gangs that would satisfy the request, e.g. 
10 containers in rack1 or in rack2; 10 containers on n1...n10 or on n11...n20.
* indicate a timeout/fallback-to-normal for gang requests.

We should decide on which gang capabilities we want/need to address in the 
short term.
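To make the capability list above concrete, a hypothetical shape for a gang request; none of these types or fields exist in the YARN API, they only restate the options listed.

{code}
import java.util.List;

// Purely illustrative: one possible encoding of a gang request.
class GangRequest {
  int numContainers;                  // e.g. 10 containers
  List<String> candidateNodesOrRacks; // empty = any node in the cluster
  List<GangRequest> alternatives;     // other gangs that would also satisfy the request
  long fallbackTimeoutMillis;         // after this, fall back to normal allocation
}
{code}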


> Support gang scheduling in the AM RM protocol
> -
>
> Key: YARN-624
> URL: https://issues.apache.org/jira/browse/YARN-624
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, scheduler
>Affects Versions: 2.0.4-alpha
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
>
> Per discussion on YARN-392 and elsewhere, gang scheduling, in which a 
> scheduler runs a set of tasks when they can all be run at the same time, 
> would be a useful feature for YARN schedulers to support.
> Currently, AMs can approximate this by holding on to containers until they 
> get all the ones they need.  However, this lends itself to deadlocks when 
> different AMs are waiting on the same containers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-613) Create NM proxy per NM instead of per container

2013-05-15 Thread Omkar Vinit Joshi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13658604#comment-13658604
 ] 

Omkar Vinit Joshi commented on YARN-613:


This was discussed offline with [~vinodkv], [~bikassaha] and [~sseth].

There were 2 viable solutions to the problem of sending an AMNMToken to the AM 
for authenticating with the NM.

The following problems need to be addressed:
* The token will be generated by the RM, but how long should the AMNMToken be 
kept alive? How long should the AM be able to talk to an NM on which it ever 
launched a container during the application life cycle?
* If the token doesn't have an expiry time, then who will renew it? The NM or 
the RM?
* If the NM reboots, can the old AMNMToken be reused? (Ideally, when an NM goes 
down today its containers are also lost, so there is nothing specific to that 
application left on the NM after reboot.)
* The AM might hand the AMNMToken over to some other external service (other 
than the AM, maybe another container) which should be able to communicate with 
the NM. (Problem: if renewal is implemented, how will it take place?)
* We need to support long-running services.
* When the key rolls over, there should be no spike in communicating renewed 
tokens, if renewal is implemented.

Proposed solutions:
* No AMNMToken renewal
** Here the RM will generate the token and hand it over to the AM only if the 
AM is getting a container on the underlying NM for the first time; otherwise it 
will not send one. The AM can use this token to talk to the NM as long as the 
application is alive. So this is upper-bounded by number of applications in the 
cluster <= number of nodes * number of containers per node. (A small RM-side 
sketch of this option follows after this list.)
*** The RM will have to remember the tokens given to each AM per NM.
*** The NM will have to remember tokens per AM.
*** The AM will have to remember a token per NM anyway.
*** Problems: if the NM reboots, the token is no longer valid, in which case 
the RM should reissue the AM a new token for the restarted NM.
*** Advantages:
**** The RM doesn't have to generate and send a token for every container.
**** No need to renew the token, so no added overhead; no need to remember past 
keys (other than the current and previous master key).
**** Even if the AM hands the token over to some other service, that service 
can keep using the same token.
* AMNMToken renewal
** Here the RM will generate and issue the token to the AM during container 
start. The RM also remembers which AM has which tokens, so when the key rolls 
over, the RM will redistribute renewed tokens to the AM for all NMs on which it 
ever started a container. If the AM receives an updated token, it will have to 
replace the older one with the new token.
*** The RM will have to remember all the NMs for which it handed a token to 
the AM.
*** The NM doesn't have to remember tokens per application; it only has to 
remember the current and previous key.
*** The AM will receive an AMNMToken per container request and/or all tokens 
during key renewal, and will have to refresh its internal tokens accordingly.
*** Advantages:
**** The NM doesn't need to remember the token, so there will be no problem 
across NM reboot. (Even though the token will be valid across NM reboot, there 
will be nothing on the NM for the AM before a new container starts.)
*** Problems:
**** The RM has to either remember or regenerate and send tokens to the AM for 
the container start call. This can be avoided by sending them only when the key 
rolls over.
**** The AM has to refresh tokens that may have been given to some other 
service for monitoring container progress.
**** There will be a spike at key rollover.
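A small RM-side sketch of the "no renewal" option above: a token is handed to the AM only the first time that AM gets a container on a given NM. All names are hypothetical and token creation is a placeholder.

{code}
import java.util.HashSet;
import java.util.Set;

class AmNmTokenIssuer {
  // (appId, nodeId) pairs for which a token has already been sent to the AM
  private final Set<String> issued = new HashSet<String>();

  synchronized byte[] maybeIssueToken(String appId, String nodeId) {
    if (!issued.add(appId + "/" + nodeId)) {
      return null; // the AM already holds a token for this NM
    }
    return createToken(appId, nodeId);
  }

  private byte[] createToken(String appId, String nodeId) {
    return new byte[0]; // placeholder for real token creation with the master key
  }
}
{code}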


> Create NM proxy per NM instead of per container
> ---
>
> Key: YARN-613
> URL: https://issues.apache.org/jira/browse/YARN-613
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bikas Saha
>Assignee: Omkar Vinit Joshi
>
> Currently a new NM proxy has to be created per container since the secure 
> authentication is using a containertoken from the container.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-638) Restore RMDelegationTokens after RM Restart

2013-05-15 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-638:
-

Attachment: YARN-638.10.patch

The newest patch adds several APIs for the RM to persist delegation tokens, 
renames logUpdateMasterKey to storeNewMasterKey so it can be used by both the 
RM and HDFS, and moves addPersistedDelegationToken & addPersistedMasterKey to 
hadoop-common.

> Restore RMDelegationTokens after RM Restart
> ---
>
> Key: YARN-638
> URL: https://issues.apache.org/jira/browse/YARN-638
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-638.10.patch, YARN-638.1.patch, YARN-638.2.patch, 
> YARN-638.3.patch, YARN-638.4.patch, YARN-638.5.patch, YARN-638.6.patch, 
> YARN-638.7.patch, YARN-638.8.patch, YARN-638.9.patch
>
>
> This is missed in YARN-581. After RM restart, RMDelegationTokens need to be 
> added both in DelegationTokenRenewer (addressed in YARN-581), and 
> delegationTokenSecretManager

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-684) ContainerManager.startContainer needs to only have ContainerTokenIdentifier instead of the whole Container

2013-05-15 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13658558#comment-13658558
 ] 

Bikas Saha commented on YARN-684:
-

I would suggest continuing to pass a Container to the startContainer API of the 
NMClient being added in YARN-422. Internally, NMClient can pick the right 
fields, and the user's responsibility continues to be simply taking the 
Container obtained from AMRMClient and passing it to the NMClient. This will 
also future-proof against other changes like these.
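A hypothetical usage sketch of the API shape suggested above, where the AM just forwards the Container it got from AMRMClient; the NmClientLike signature here is an assumption mirroring the suggestion, not the committed YARN-422 API.

{code}
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;

// Stand-in for the NMClient being built in YARN-422.
interface NmClientLike {
  void startContainer(Container container, ContainerLaunchContext ctx);
}

class AmLaunchExample {
  void launch(NmClientLike nmClient, Container containerFromAmRmClient,
      ContainerLaunchContext ctx) {
    // the AM never touches the container token directly;
    // the client extracts whatever fields it needs internally
    nmClient.startContainer(containerFromAmRmClient, ctx);
  }
}
{code}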

> ContainerManager.startContainer needs to only have ContainerTokenIdentifier 
> instead of the whole Container
> --
>
> Key: YARN-684
> URL: https://issues.apache.org/jira/browse/YARN-684
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>
> The NM only needs the token, the whole Container is unnecessary.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-366) Add a tracing async dispatcher to simplify debugging

2013-05-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13658480#comment-13658480
 ] 

Hadoop QA commented on YARN-366:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12583270/YARN-366-3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 7 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 2 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/932//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/932//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/932//console

This message is automatically generated.

> Add a tracing async dispatcher to simplify debugging
> 
>
> Key: YARN-366
> URL: https://issues.apache.org/jira/browse/YARN-366
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, resourcemanager
>Affects Versions: 2.0.2-alpha
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Attachments: YARN-366-1.patch, YARN-366-2.patch, YARN-366-3.patch, 
> YARN-366.patch
>
>
> Exceptions thrown in YARN/MR code with asynchronous event handling do not 
> contain informative stack traces, as all handle() methods sit directly under 
> the dispatcher thread's loop.
> This makes errors very difficult to debug for those who are not intimately 
> familiar with the code, as it is difficult to see which chain of events 
> caused a particular outcome.
> I propose adding an AsyncDispatcher that instruments events with tracing 
> information.  Whenever an event is dispatched during the handling of another 
> event, the dispatcher would annotate that event with a pointer to its parent. 
>  When the dispatcher catches an exception, it could reconstruct a "stack" 
> trace of the chain of events that led to it, and be able to log something 
> informative.
> This would be an experimental feature, off by default, unless extensive 
> testing showed that it did not have a significant performance impact.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-379) yarn [node,application] command print logger info messages

2013-05-15 Thread Ravi Prakash (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13658403#comment-13658403
 ] 

Ravi Prakash commented on YARN-379:
---

I recant my recant. So many options. We probably don't want to turn off ALL 
logging via YARN_CLIENT_OPTS either; we just want to set the log level to WARN.
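One way to get that effect, shown only as a sketch and not as the proposed patch: raise the noisy client-side logger to WARN programmatically via the log4j 1.x API. The logger name is assumed from the INFO lines quoted in the issue description below.

{code}
import org.apache.log4j.Level;
import org.apache.log4j.Logger;

class QuietYarnClientLogging {
  static void quietServiceLogs() {
    // the INFO lines in the description come from AbstractService
    Logger.getLogger("org.apache.hadoop.yarn.service.AbstractService")
        .setLevel(Level.WARN);
  }
}
{code}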

> yarn [node,application] command print logger info messages
> --
>
> Key: YARN-379
> URL: https://issues.apache.org/jira/browse/YARN-379
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client
>Affects Versions: 2.0.3-alpha
>Reporter: Thomas Graves
>Assignee: Abhishek Kapoor
>  Labels: usability
> Attachments: YARN-379.patch
>
>
> Running the yarn node and yarn applications command results in annoying log 
> info messages being printed:
> $ yarn node -list
> 13/02/06 02:36:50 INFO service.AbstractService: 
> Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
> 13/02/06 02:36:50 INFO service.AbstractService: 
> Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
> Total Nodes:1
>  Node-IdNode-State  Node-Http-Address   
> Health-Status(isNodeHealthy)Running-Containers
> foo:8041RUNNING  foo:8042   true  
>  0
> 13/02/06 02:36:50 INFO service.AbstractService: 
> Service:org.apache.hadoop.yarn.client.YarnClientImpl is stopped.
> $ yarn application
> 13/02/06 02:38:47 INFO service.AbstractService: 
> Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
> 13/02/06 02:38:47 INFO service.AbstractService: 
> Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
> Invalid Command Usage : 
> usage: application
>  -kill  Kills the application.
>  -list   Lists all the Applications from RM.
>  -statusPrints the status of the application.
> 13/02/06 02:38:47 INFO service.AbstractService: 
> Service:org.apache.hadoop.yarn.client.YarnClientImpl is stopped.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira