[jira] [Created] (YARN-1078) TestNodeManagerResync, TestNodeManagerShutdown, and TestNodeStatusUpdater fail on Windows
Chuan Liu created YARN-1078: --- Summary: TestNodeManagerResync, TestNodeManagerShutdown, and TestNodeStatusUpdater fail on Windows Key: YARN-1078 URL: https://issues.apache.org/jira/browse/YARN-1078 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.3.0 Reporter: Chuan Liu Assignee: Chuan Liu Priority: Minor The three unit tests fail on Windows due to host name resolution differences on Windows, i.e. 127.0.0.1 does not resolve to host name localhost. {noformat} org.apache.hadoop.security.token.SecretManager$InvalidToken: Given Container container_0__01_00 identifier is not valid for current Node manager. Expected : 127.0.0.1:12345 Found : localhost:12345 {noformat} {noformat} testNMConnectionToRM(org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater) Time elapsed: 8343 sec FAILURE! org.junit.ComparisonFailure: expected:[localhost]:12345 but was:[127.0.0.1]:12345 at org.junit.Assert.assertEquals(Assert.java:125) at org.junit.Assert.assertEquals(Assert.java:147) at org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater$MyResourceTracker6.registerNodeManager(TestNodeStatusUpdater.java:712) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:101) at $Proxy26.registerNodeManager(Unknown Source) at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:212) at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.serviceStart(NodeStatusUpdaterImpl.java:149) at org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater$MyNodeStatusUpdater4.serviceStart(TestNodeStatusUpdater.java:369) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:101) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStart(NodeManager.java:213) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater.testNMConnectionToRM(TestNodeStatusUpdater.java:985) {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
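The failures come down to whether the loopback address reverse-resolves to the literal name localhost. As a minimal illustration (a hypothetical test helper, not part of the YARN-1078 patch), a test could derive the expected node address from whatever the local resolver actually returns for 127.0.0.1 instead of hard-coding "localhost":
{code}
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical test helper: build the expected "host:port" string from the
// name the local resolver returns for 127.0.0.1. On most Linux hosts this
// yields "localhost:12345"; on the Windows hosts described above it yields
// "127.0.0.1:12345", so an assertion built from it passes on both.
public final class TestHostNames {
  private TestHostNames() {}

  public static String expectedNodeAddress(int port) throws UnknownHostException {
    // getHostName() performs the reverse lookup whose result differs by platform.
    String host = InetAddress.getByName("127.0.0.1").getHostName();
    return host + ":" + port;
  }
}
{code}
A test could then compare the registered node-ID string against expectedNodeAddress(12345) rather than against a hard-coded "localhost:12345".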
[jira] [Updated] (YARN-1078) TestNodeManagerResync, TestNodeManagerShutdown, and TestNodeStatusUpdater fail on Windows
[ https://issues.apache.org/jira/browse/YARN-1078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chuan Liu updated YARN-1078: Attachment: YARN-1078.patch Attach a patch. The fixes are quite straight forward. TestNodeManagerResync, TestNodeManagerShutdown, and TestNodeStatusUpdater fail on Windows - Key: YARN-1078 URL: https://issues.apache.org/jira/browse/YARN-1078 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.3.0 Reporter: Chuan Liu Assignee: Chuan Liu Priority: Minor Attachments: YARN-1078.patch The three unit tests fail on Windows due to host name resolution differences on Windows, i.e. 127.0.0.1 does not resolve to host name localhost. {noformat} org.apache.hadoop.security.token.SecretManager$InvalidToken: Given Container container_0__01_00 identifier is not valid for current Node manager. Expected : 127.0.0.1:12345 Found : localhost:12345 {noformat} {noformat} testNMConnectionToRM(org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater) Time elapsed: 8343 sec FAILURE! org.junit.ComparisonFailure: expected:[localhost]:12345 but was:[127.0.0.1]:12345 at org.junit.Assert.assertEquals(Assert.java:125) at org.junit.Assert.assertEquals(Assert.java:147) at org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater$MyResourceTracker6.registerNodeManager(TestNodeStatusUpdater.java:712) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:101) at $Proxy26.registerNodeManager(Unknown Source) at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:212) at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.serviceStart(NodeStatusUpdaterImpl.java:149) at org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater$MyNodeStatusUpdater4.serviceStart(TestNodeStatusUpdater.java:369) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:101) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStart(NodeManager.java:213) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater.testNMConnectionToRM(TestNodeStatusUpdater.java:985) {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1078) TestNodeManagerResync, TestNodeManagerShutdown, and TestNodeStatusUpdater fail on Windows
[ https://issues.apache.org/jira/browse/YARN-1078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13743619#comment-13743619 ] Hadoop QA commented on YARN-1078: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12598714/YARN-1078.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerResync org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerShutdown org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1738//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1738//console This message is automatically generated. TestNodeManagerResync, TestNodeManagerShutdown, and TestNodeStatusUpdater fail on Windows - Key: YARN-1078 URL: https://issues.apache.org/jira/browse/YARN-1078 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.3.0 Reporter: Chuan Liu Assignee: Chuan Liu Priority: Minor Attachments: YARN-1078.patch The three unit tests fail on Windows due to host name resolution differences on Windows, i.e. 127.0.0.1 does not resolve to host name localhost. {noformat} org.apache.hadoop.security.token.SecretManager$InvalidToken: Given Container container_0__01_00 identifier is not valid for current Node manager. Expected : 127.0.0.1:12345 Found : localhost:12345 {noformat} {noformat} testNMConnectionToRM(org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater) Time elapsed: 8343 sec FAILURE! 
org.junit.ComparisonFailure: expected:[localhost]:12345 but was:[127.0.0.1]:12345 at org.junit.Assert.assertEquals(Assert.java:125) at org.junit.Assert.assertEquals(Assert.java:147) at org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater$MyResourceTracker6.registerNodeManager(TestNodeStatusUpdater.java:712) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:101) at $Proxy26.registerNodeManager(Unknown Source) at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:212) at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.serviceStart(NodeStatusUpdaterImpl.java:149) at org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater$MyNodeStatusUpdater4.serviceStart(TestNodeStatusUpdater.java:369) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:101) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStart(NodeManager.java:213) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater.testNMConnectionToRM(TestNodeStatusUpdater.java:985) {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-643) WHY appToken is removed both in BaseFinalTransition and AMUnregisteredTransition AND clientToken is removed in FinalTransition and not BaseFinalTransition
[ https://issues.apache.org/jira/browse/YARN-643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13743717#comment-13743717 ] Hudson commented on YARN-643: - SUCCESS: Integrated in Hadoop-Yarn-trunk #306 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/306/]) YARN-643. Fixed ResourceManager to remove all tokens consistently on app finish. Contributed by Xuan Gong. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1515256) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestAMRMTokens.java WHY appToken is removed both in BaseFinalTransition and AMUnregisteredTransition AND clientToken is removed in FinalTransition and not BaseFinalTransition -- Key: YARN-643 URL: https://issues.apache.org/jira/browse/YARN-643 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Xuan Gong Fix For: 2.1.1-beta Attachments: YARN-643.1.patch, YARN-643.2.patch, YARN-643.3.patch, YARN-643.4.patch, YARN-643.5.patch The jira is tracking why appToken and clientToAMToken is removed separately, and why they are distributed in different transitions, ideally there may be a common place where these two tokens can be removed at the same time. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-643) WHY appToken is removed both in BaseFinalTransition and AMUnregisteredTransition AND clientToken is removed in FinalTransition and not BaseFinalTransition
[ https://issues.apache.org/jira/browse/YARN-643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13743795#comment-13743795 ] Hudson commented on YARN-643: - FAILURE: Integrated in Hadoop-Hdfs-trunk #1496 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1496/]) YARN-643. Fixed ResourceManager to remove all tokens consistently on app finish. Contributed by Xuan Gong. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1515256) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestAMRMTokens.java WHY appToken is removed both in BaseFinalTransition and AMUnregisteredTransition AND clientToken is removed in FinalTransition and not BaseFinalTransition -- Key: YARN-643 URL: https://issues.apache.org/jira/browse/YARN-643 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Xuan Gong Fix For: 2.1.1-beta Attachments: YARN-643.1.patch, YARN-643.2.patch, YARN-643.3.patch, YARN-643.4.patch, YARN-643.5.patch The jira is tracking why appToken and clientToAMToken is removed separately, and why they are distributed in different transitions, ideally there may be a common place where these two tokens can be removed at the same time. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-896) Roll up for long lived YARN
[ https://issues.apache.org/jira/browse/YARN-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13743819#comment-13743819 ] Robert Joseph Evans commented on YARN-896: -- [~criccomini], That is a great point. To do this we need the application to somehow inform YARN that it is a long lived application. We could do this either through some sort of metadata that is submitted with the application to YARN, possibly through the service registry, or even perhaps just setting the progress to a special value like -1. I think I would prefer the first one, because then YARN could use that metadata later on for other things. After that the UI change should not be too difficult. If you want to file a JIRA for it, either as a sub task or just link it in, that would be great. Roll up for long lived YARN --- Key: YARN-896 URL: https://issues.apache.org/jira/browse/YARN-896 Project: Hadoop YARN Issue Type: New Feature Reporter: Robert Joseph Evans YARN is intended to be general purpose, but it is missing some features to be able to truly support long lived applications and long lived containers. This ticket is intended to # discuss what is needed to support long lived processes # track the resulting JIRA. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
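To make the special-progress-value idea concrete, a minimal sketch follows (an assumption about how an AM might use it, not an agreed YARN contract): a long-lived application master could return a sentinel from the existing progress callback, and the RM web UI would then need to recognize that sentinel instead of rendering a percentage bar.
{code}
import org.apache.hadoop.yarn.client.api.async.AMRMClientAsync;

// Sketch only: report a sentinel progress value over the AM-RM heartbeat.
// The -1 convention is the idea floated in this thread, not an existing API.
public abstract class LongLivedCallbackHandler
    implements AMRMClientAsync.CallbackHandler {

  // Hypothetical sentinel meaning "progress is not meaningful for this app".
  public static final float LONG_LIVED_PROGRESS = -1.0f;

  @Override
  public float getProgress() {
    return LONG_LIVED_PROGRESS;
  }
}
{code}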
[jira] [Commented] (YARN-643) WHY appToken is removed both in BaseFinalTransition and AMUnregisteredTransition AND clientToken is removed in FinalTransition and not BaseFinalTransition
[ https://issues.apache.org/jira/browse/YARN-643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13743868#comment-13743868 ] Hudson commented on YARN-643: - FAILURE: Integrated in Hadoop-Mapreduce-trunk #1523 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1523/]) YARN-643. Fixed ResourceManager to remove all tokens consistently on app finish. Contributed by Xuan Gong. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1515256) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestAMRMTokens.java WHY appToken is removed both in BaseFinalTransition and AMUnregisteredTransition AND clientToken is removed in FinalTransition and not BaseFinalTransition -- Key: YARN-643 URL: https://issues.apache.org/jira/browse/YARN-643 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Xuan Gong Fix For: 2.1.1-beta Attachments: YARN-643.1.patch, YARN-643.2.patch, YARN-643.3.patch, YARN-643.4.patch, YARN-643.5.patch The jira is tracking why appToken and clientToAMToken is removed separately, and why they are distributed in different transitions, ideally there may be a common place where these two tokens can be removed at the same time. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-194) Log handling in case of NM restart.
[ https://issues.apache.org/jira/browse/YARN-194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13743902#comment-13743902 ] Jason Lowe commented on YARN-194: - The NM waits not only for the container to complete but for the entire application to complete -- see YARN-219. Holding long-lived leases on many files in HDFS puts a lot of load on the namenode. It also cannot append on the fly since all the logs for all containers for an application on the node are in a single file in HDFS with the data for each log being contiguous within that file. Adding the ability to append to multiple log streams simultaneously is not possible in the current aggregated log format. It would be nice to have some mechanism to get the NM to clean up logs, as currently each time the NM restarts log files are being leaked. This has been fixed for container local directories and the distributed cache via YARN-71, but logs have been ignored. Seems like we should be consistent about these two. If the application is still running, isn't YARN-71 already deleting the app's current working directory and distcache files out from underneath it? Log handling in case of NM restart. --- Key: YARN-194 URL: https://issues.apache.org/jira/browse/YARN-194 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 0.23.4 Reporter: Siddharth Seth Assignee: Omkar Vinit Joshi Currently, if an NM restarts - existing logs will be left around till they're manually cleaned up. The NM could be improved to handle these files. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1020) Resource Localization using Groups as a new Localization Type
[ https://issues.apache.org/jira/browse/YARN-1020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13743946#comment-13743946 ] Sangjin Lee commented on YARN-1020: --- This is an interesting problem/challenge. I kind of like [~jlowe]'s idea to make these files owned by the NM user. To me it seems consistent with the fact that these files are really owned and manipulated by the NM user. Resource Localization using Groups as a new Localization Type - Key: YARN-1020 URL: https://issues.apache.org/jira/browse/YARN-1020 Project: Hadoop YARN Issue Type: Improvement Reporter: Omkar Vinit Joshi The scenario is as follows.. * We definitely will have multiple applications running on top of yarn. These applications whenever run by users will need resources to be localized. Now the options what application-users will have for localizing resources are:- ** APPLICATION ... these files will be available for only that instance of the application and only for that single user. If we talk in terms of MR then for single job. ** PRIVATE ... available only for that user only for multiple runs of that application. Other users clearly will not be able to take advantage of that. So ideally will be wasting space (local resource cache) by replicating the same file again and again. ** PUBLIC... there will be only one copy of individual files of the application say APP_1..GOOD ..in the sense it will be accessible to all the users...But for secured clusters; users of different application (say APP_2) containers can then gain easy access to this applications (APP_1) private files and potentially may modify it. So clearly we don't have any solution today to solve the above problem with existing RESOURCE_LOCALIZATION_TYPES without effectively using space. Therefore we need something like GROUP to address this scenario. Thoughts?? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-1079) Fix progress bar for long-lived services in YARN
Chris Riccomini created YARN-1079: - Summary: Fix progress bar for long-lived services in YARN Key: YARN-1079 URL: https://issues.apache.org/jira/browse/YARN-1079 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Chris Riccomini YARN currently shows a progress bar for jobs in its web UI. This is non-sensical for long-lived services, which have no concept of progress. For example, with Samza, we have stream processors which run for an indefinite amount of time (sometimes forever). YARN should support jobs without a concept of progress. Some discussion about this is on YARN-896. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-466) Slave hostname mismatches in ResourceManager/Scheduler
[ https://issues.apache.org/jira/browse/YARN-466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13743986#comment-13743986 ] Roger Hoover commented on YARN-466: --- @[~zjshen], yes, I'm referring to the MapReduce Application Master. Thanks for looking into this and for sharing what you've found so far. Slave hostname mismatches in ResourceManager/Scheduler -- Key: YARN-466 URL: https://issues.apache.org/jira/browse/YARN-466 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Reporter: Roger Hoover Assignee: Zhijie Shen The problem is that the ResourceManager learns the hostname of a slave node when the NodeManager registers itself, and it seems the node manager is getting the hostname by asking the OS. When a job is submitted, I think the ApplicationMaster learns the hostname by doing a reverse DNS lookup based on the slaves file. Therefore, the ApplicationMaster submits requests for containers using the fully qualified domain name (node1.foo.com) but the scheduler uses the OS hostname (node1) when checking to see if any requests are node-local. The result is that node-local requests are never found using this method of searching for node-local requests: ResourceRequest request = application.getResourceRequest(priority, node.getHostName()); I think it's unfriendly to ask users to make sure they configure hostnames to match fully qualified domain names. There should be a way for the ApplicationMaster and NodeManager to agree on the hostname. Steps to Reproduce: 1) Configure the OS hostname on slaves to differ from the fully qualified domain name. For example, if the FQDN for the slave is node1.foo.com, set the hostname on the node to be just node1. 2) On submitting a job, observe that the AM submits resource requests using the FQDN (e.g. node1.foo.com). You can add logging to the allocate() method of whatever scheduler you're using: for (ResourceRequest req: ask) { LOG.debug(String.format("Request %s for %d containers on %s", req, req.getNumContainers(), req.getHostName())); } 3) Observe that when the scheduler checks for node locality (in the handle() method) using FiCaSchedulerNode.getHostName(), the hostname it uses is the one set in the host OS (e.g. node1). NOTE: if you're using FifoScheduler, this bug needs to be fixed first (https://issues.apache.org/jira/browse/YARN-412). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
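For illustration only (this is not the YARN-466 patch), the mismatch disappears if both sides normalize names to a canonical form before comparing; the hypothetical helper below shows the idea using standard Java resolution.
{code}
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical normalizer: if both the NodeManager registration path and the
// scheduler's locality lookup canonicalized names this way, "node1" and
// "node1.foo.com" would compare equal when DNS is configured for the host.
public final class HostNameNormalizer {
  private HostNameNormalizer() {}

  public static String canonicalize(String host) {
    try {
      // Forward + reverse lookup; returns the fully qualified name,
      // e.g. "node1.foo.com", when DNS is set up correctly.
      return InetAddress.getByName(host).getCanonicalHostName();
    } catch (UnknownHostException e) {
      return host; // fall back to the name we were given
    }
  }
}
{code}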
[jira] [Commented] (YARN-1078) TestNodeManagerResync, TestNodeManagerShutdown, and TestNodeStatusUpdater fail on Windows
[ https://issues.apache.org/jira/browse/YARN-1078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13744003#comment-13744003 ] Chuan Liu commented on YARN-1078: - bq. -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch seems regressing on Linux. I will investigate the failure. TestNodeManagerResync, TestNodeManagerShutdown, and TestNodeStatusUpdater fail on Windows - Key: YARN-1078 URL: https://issues.apache.org/jira/browse/YARN-1078 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.3.0 Reporter: Chuan Liu Assignee: Chuan Liu Priority: Minor Attachments: YARN-1078.patch The three unit tests fail on Windows due to host name resolution differences on Windows, i.e. 127.0.0.1 does not resolve to host name localhost. {noformat} org.apache.hadoop.security.token.SecretManager$InvalidToken: Given Container container_0__01_00 identifier is not valid for current Node manager. Expected : 127.0.0.1:12345 Found : localhost:12345 {noformat} {noformat} testNMConnectionToRM(org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater) Time elapsed: 8343 sec FAILURE! org.junit.ComparisonFailure: expected:[localhost]:12345 but was:[127.0.0.1]:12345 at org.junit.Assert.assertEquals(Assert.java:125) at org.junit.Assert.assertEquals(Assert.java:147) at org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater$MyResourceTracker6.registerNodeManager(TestNodeStatusUpdater.java:712) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:101) at $Proxy26.registerNodeManager(Unknown Source) at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:212) at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.serviceStart(NodeStatusUpdaterImpl.java:149) at org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater$MyNodeStatusUpdater4.serviceStart(TestNodeStatusUpdater.java:369) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:101) at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStart(NodeManager.java:213) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater.testNMConnectionToRM(TestNodeStatusUpdater.java:985) {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1077) TestContainerLaunch fails on Windows
[ https://issues.apache.org/jira/browse/YARN-1077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13744024#comment-13744024 ] Hadoop QA commented on YARN-1077: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12598710/YARN-1077.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1739//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1739//console This message is automatically generated. TestContainerLaunch fails on Windows Key: YARN-1077 URL: https://issues.apache.org/jira/browse/YARN-1077 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.3.0 Reporter: Chuan Liu Assignee: Chuan Liu Priority: Minor Attachments: YARN-1077.2.patch, YARN-1077.patch Several cases in this unit tests fail on Windows. (Append error log at the end.) testInvalidEnvSyntaxDiagnostics fails because the difference between cmd and bash script error handling. If some command fails in the cmd script, cmd will continue execute the the rest of the script command. Error handling needs to be explicitly carried out in the script file. The error code of the last command will be returned as the error code of the whole script. In this test, some error happened in the middle of the cmd script, the test expect an exception and non-zero error code. In the cmd script, the intermediate errors are ignored. The last command call succeeded and there is no exception. testContainerLaunchStdoutAndStderrDiagnostics fails due to wrong cmd commands used by the test. testContainerEnvVariables and testDelayedKill fail due to a regression from YARN-906. {noformat} --- Test set: org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch --- Tests run: 7, Failures: 4, Errors: 0, Skipped: 0, Time elapsed: 11.526 sec FAILURE! testInvalidEnvSyntaxDiagnostics(org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch) Time elapsed: 583 sec FAILURE! junit.framework.AssertionFailedError: Should catch exception at junit.framework.Assert.fail(Assert.java:50) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testInvalidEnvSyntaxDiagnostics(TestContainerLaunch.java:269) ... testContainerLaunchStdoutAndStderrDiagnostics(org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch) Time elapsed: 561 sec FAILURE! 
junit.framework.AssertionFailedError: Should catch exception at junit.framework.Assert.fail(Assert.java:50) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testContainerLaunchStdoutAndStderrDiagnostics(TestContainerLaunch.java:314) ... testContainerEnvVariables(org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch) Time elapsed: 4136 sec FAILURE! junit.framework.AssertionFailedError: expected:137 but was:143 at junit.framework.Assert.fail(Assert.java:50) at junit.framework.Assert.failNotEquals(Assert.java:287) at junit.framework.Assert.assertEquals(Assert.java:67) at junit.framework.Assert.assertEquals(Assert.java:199) at junit.framework.Assert.assertEquals(Assert.java:205) at
[jira] [Commented] (YARN-881) Priority#compareTo method seems to be wrong.
[ https://issues.apache.org/jira/browse/YARN-881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13744025#comment-13744025 ] Jian He commented on YARN-881: -- Hi [~sandyr], do you have more comments ? Priority#compareTo method seems to be wrong. Key: YARN-881 URL: https://issues.apache.org/jira/browse/YARN-881 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-881.1.patch, YARN-881.patch if lower int value means higher priority, shouldn't we return other.getPriority() - this.getPriority() -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
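For reference, the ordering the description asks about would look like the sketch below, a simplified stand-in class rather than the actual org.apache.hadoop.yarn.api.records.Priority or the committed patch, assuming a numerically smaller value means a higher priority:
{code}
// Simplified stand-in for illustration; the real Priority is an abstract YARN record.
public class SimplePriority implements Comparable<SimplePriority> {
  private final int priority;

  public SimplePriority(int priority) {
    this.priority = priority;
  }

  public int getPriority() {
    return priority;
  }

  @Override
  public int compareTo(SimplePriority other) {
    // Reversed subtraction: a smaller number (higher priority) compares as greater.
    return other.getPriority() - this.getPriority();
  }
}
{code}
Plain subtraction is fine for the small non-negative values priorities take; a general-purpose comparator would guard against integer overflow instead.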
[jira] [Commented] (YARN-896) Roll up for long lived YARN
[ https://issues.apache.org/jira/browse/YARN-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13744029#comment-13744029 ] Steve Loughran commented on YARN-896: - Chris, I use the bar today as a measure of expected nodes vs. actual, i.e. what percentage of the goal of work has been met, which is free to vary up and down with node failures; the percent bar is free to go in both directions. YARN-1039 already proposes adding a flag to say long-lived, so that future versions of YARN can behave differently. This could do more than the GUI: in particular, YARN-3 cgroup limits would be something you may want to turn on for services, to limit their RAM and CPU to exactly what they asked for. If a long-lived service underestimates its requirements, the impact on the node is worse than if a short-lived container does it; for the latter you may want to be more forgiving. Roll up for long lived YARN --- Key: YARN-896 URL: https://issues.apache.org/jira/browse/YARN-896 Project: Hadoop YARN Issue Type: New Feature Reporter: Robert Joseph Evans YARN is intended to be general purpose, but it is missing some features to be able to truly support long lived applications and long lived containers. This ticket is intended to # discuss what is needed to support long lived processes # track the resulting JIRA. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-1080) Standardize help message for required parameter of $ yarn logs
Tassapol Athiapinya created YARN-1080: - Summary: Standardize help message for required parameter of $ yarn logs Key: YARN-1080 URL: https://issues.apache.org/jira/browse/YARN-1080 Project: Hadoop YARN Issue Type: Improvement Components: client Reporter: Tassapol Athiapinya Fix For: 2.1.0-beta YARN CLI has a command logs ($ yarn logs). The command always requires a parameter of -applicationId arg. However, help message of the command does not make it clear. It lists -applicationId as optional parameter. If I don't set it, YARN CLI will complain this is missing. It is better to use standard required notation used in other Linux command for help message. Any user familiar to the command can understand that this parameter is needed more easily. {code:title=current help message} -bash-4.1$ yarn logs usage: general options are: -applicationId arg ApplicationId (required) -appOwner argAppOwner (assumed to be current user if not specified) -containerId arg ContainerId (must be specified if node address is specified) -nodeAddress arg NodeAddress in the format nodename:port (must be specified if container id is specified) {code} {code:title=proposed help message} -bash-4.1$ yarn logs usage: yarn logs -applicationId application ID [OPTIONS] general options are: -appOwner argAppOwner (assumed to be current user if not specified) -containerId arg ContainerId (must be specified if node address is specified) -nodeAddress arg NodeAddress in the format nodename:port (must be specified if container id is specified) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
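Assuming the CLI builds its options with Apache Commons CLI, one hedged sketch of the proposed behavior (hypothetical class and strings, not the actual LogsCLI change) is to keep -applicationId out of the general options and show it in the usage syntax line instead:
{code}
import org.apache.commons.cli.HelpFormatter;
import org.apache.commons.cli.Option;
import org.apache.commons.cli.Options;

// Hypothetical sketch: print the usage proposed above, with the required
// -applicationId in the syntax line rather than listed as an optional flag.
public final class LogsCliHelp {
  private LogsCliHelp() {}

  public static void printUsage() {
    Options opts = new Options();
    opts.addOption(new Option("appOwner", true,
        "AppOwner (assumed to be current user if not specified)"));
    opts.addOption(new Option("containerId", true,
        "ContainerId (must be specified if node address is specified)"));
    opts.addOption(new Option("nodeAddress", true,
        "NodeAddress in the format nodename:port (must be specified if container id is specified)"));
    new HelpFormatter().printHelp(
        "yarn logs -applicationId <application ID> [OPTIONS]", opts);
  }
}
{code}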
[jira] [Commented] (YARN-1077) TestContainerLaunch fails on Windows
[ https://issues.apache.org/jira/browse/YARN-1077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13744049#comment-13744049 ] Chuan Liu commented on YARN-1077: - bq. -1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: I will take a look at the failure. TestContainerLaunch fails on Windows Key: YARN-1077 URL: https://issues.apache.org/jira/browse/YARN-1077 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.3.0 Reporter: Chuan Liu Assignee: Chuan Liu Priority: Minor Attachments: YARN-1077.2.patch, YARN-1077.patch Several cases in this unit tests fail on Windows. (Append error log at the end.) testInvalidEnvSyntaxDiagnostics fails because the difference between cmd and bash script error handling. If some command fails in the cmd script, cmd will continue execute the the rest of the script command. Error handling needs to be explicitly carried out in the script file. The error code of the last command will be returned as the error code of the whole script. In this test, some error happened in the middle of the cmd script, the test expect an exception and non-zero error code. In the cmd script, the intermediate errors are ignored. The last command call succeeded and there is no exception. testContainerLaunchStdoutAndStderrDiagnostics fails due to wrong cmd commands used by the test. testContainerEnvVariables and testDelayedKill fail due to a regression from YARN-906. {noformat} --- Test set: org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch --- Tests run: 7, Failures: 4, Errors: 0, Skipped: 0, Time elapsed: 11.526 sec FAILURE! testInvalidEnvSyntaxDiagnostics(org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch) Time elapsed: 583 sec FAILURE! junit.framework.AssertionFailedError: Should catch exception at junit.framework.Assert.fail(Assert.java:50) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testInvalidEnvSyntaxDiagnostics(TestContainerLaunch.java:269) ... testContainerLaunchStdoutAndStderrDiagnostics(org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch) Time elapsed: 561 sec FAILURE! junit.framework.AssertionFailedError: Should catch exception at junit.framework.Assert.fail(Assert.java:50) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testContainerLaunchStdoutAndStderrDiagnostics(TestContainerLaunch.java:314) ... testContainerEnvVariables(org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch) Time elapsed: 4136 sec FAILURE! junit.framework.AssertionFailedError: expected:137 but was:143 at junit.framework.Assert.fail(Assert.java:50) at junit.framework.Assert.failNotEquals(Assert.java:287) at junit.framework.Assert.assertEquals(Assert.java:67) at junit.framework.Assert.assertEquals(Assert.java:199) at junit.framework.Assert.assertEquals(Assert.java:205) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testContainerEnvVariables(TestContainerLaunch.java:500) ... testDelayedKill(org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch) Time elapsed: 2744 sec FAILURE! 
junit.framework.AssertionFailedError: expected:137 but was:143 at junit.framework.Assert.fail(Assert.java:50) at junit.framework.Assert.failNotEquals(Assert.java:287) at junit.framework.Assert.assertEquals(Assert.java:67) at junit.framework.Assert.assertEquals(Assert.java:199) at junit.framework.Assert.assertEquals(Assert.java:205) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch.testDelayedKill(TestContainerLaunch.java:601) ... {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1080) Improve help message for $ yarn logs
[ https://issues.apache.org/jira/browse/YARN-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tassapol Athiapinya updated YARN-1080: -- Description: There are 2 parts I am proposing in this jira. They can be fixed together in one patch. 1. Standardize help message for required parameter of $ yarn logs YARN CLI has a command logs ($ yarn logs). The command always requires a parameter of -applicationId arg. However, help message of the command does not make it clear. It lists -applicationId as optional parameter. If I don't set it, YARN CLI will complain this is missing. It is better to use standard required notation used in other Linux command for help message. Any user familiar to the command can understand that this parameter is needed more easily. {code:title=current help message} -bash-4.1$ yarn logs usage: general options are: -applicationId arg ApplicationId (required) -appOwner argAppOwner (assumed to be current user if not specified) -containerId arg ContainerId (must be specified if node address is specified) -nodeAddress arg NodeAddress in the format nodename:port (must be specified if container id is specified) {code} {code:title=proposed help message} -bash-4.1$ yarn logs usage: yarn logs -applicationId application ID [OPTIONS] general options are: -appOwner argAppOwner (assumed to be current user if not specified) -containerId arg ContainerId (must be specified if node address is specified) -nodeAddress arg NodeAddress in the format nodename:port (must be specified if container id is specified) {code} 2. Add description for help command. As far as I know, a user cannot get logs for running job. Since I spent some time trying to get logs of running applications, it should be nice to say this in command description. {code:title=proposed help} Retrieve logs for completed/killed YARN application usage: general options are... {code} was: YARN CLI has a command logs ($ yarn logs). The command always requires a parameter of -applicationId arg. However, help message of the command does not make it clear. It lists -applicationId as optional parameter. If I don't set it, YARN CLI will complain this is missing. It is better to use standard required notation used in other Linux command for help message. Any user familiar to the command can understand that this parameter is needed more easily. {code:title=current help message} -bash-4.1$ yarn logs usage: general options are: -applicationId arg ApplicationId (required) -appOwner argAppOwner (assumed to be current user if not specified) -containerId arg ContainerId (must be specified if node address is specified) -nodeAddress arg NodeAddress in the format nodename:port (must be specified if container id is specified) {code} {code:title=proposed help message} -bash-4.1$ yarn logs usage: yarn logs -applicationId application ID [OPTIONS] general options are: -appOwner argAppOwner (assumed to be current user if not specified) -containerId arg ContainerId (must be specified if node address is specified) -nodeAddress arg NodeAddress in the format nodename:port (must be specified if container id is specified) {code} Summary: Improve help message for $ yarn logs (was: Standardize help message for required parameter of $ yarn logs) Improve help message for $ yarn logs Key: YARN-1080 URL: https://issues.apache.org/jira/browse/YARN-1080 Project: Hadoop YARN Issue Type: Improvement Components: client Reporter: Tassapol Athiapinya Fix For: 2.1.0-beta There are 2 parts I am proposing in this jira. 
They can be fixed together in one patch. 1. Standardize help message for required parameter of $ yarn logs YARN CLI has a command logs ($ yarn logs). The command always requires a parameter of -applicationId arg. However, help message of the command does not make it clear. It lists -applicationId as optional parameter. If I don't set it, YARN CLI will complain this is missing. It is better to use standard required notation used in other Linux command for help message. Any user familiar to the command can understand that this parameter is needed more easily. {code:title=current help message} -bash-4.1$ yarn logs usage: general options are: -applicationId arg ApplicationId (required) -appOwner argAppOwner (assumed to be current user if not specified) -containerId arg ContainerId (must be specified if node
[jira] [Commented] (YARN-49) Improve distributed shell application to work on a secure cluster
[ https://issues.apache.org/jira/browse/YARN-49?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13744054#comment-13744054 ] Omkar Vinit Joshi commented on YARN-49: --- Yes, it is not working because of the missing token propagation. I thought it had been fixed, but it has not. Improve distributed shell application to work on a secure cluster - Key: YARN-49 URL: https://issues.apache.org/jira/browse/YARN-49 Project: Hadoop YARN Issue Type: Sub-task Components: applications/distributed-shell Reporter: Hitesh Shah Assignee: Omkar Vinit Joshi -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
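The usual shape of that propagation on a secure cluster looks roughly like the sketch below (an assumed pattern, not taken from a YARN-49 patch): serialize the submitter's credentials and attach them to each ContainerLaunchContext so the launched containers can authenticate to HDFS.
{code}
import java.nio.ByteBuffer;

import org.apache.hadoop.io.DataOutputBuffer;
import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;

// Sketch of delegation-token propagation: copy the current user's tokens into
// the launch context so the launched container inherits them.
public final class TokenPropagation {
  private TokenPropagation() {}

  public static void attachTokens(ContainerLaunchContext ctx) throws Exception {
    Credentials credentials =
        UserGroupInformation.getCurrentUser().getCredentials();
    DataOutputBuffer dob = new DataOutputBuffer();
    credentials.writeTokenStorageToStream(dob);
    ctx.setTokens(ByteBuffer.wrap(dob.getData(), 0, dob.getLength()));
  }
}
{code}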
[jira] [Commented] (YARN-896) Roll up for long lived YARN
[ https://issues.apache.org/jira/browse/YARN-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13744061#comment-13744061 ] Chris Riccomini commented on YARN-896: -- [~stev...@iseran.com] I've linked the JIRAs as relates to. The progress behavior you're describing is somewhat reasonable, but a bit unintuitive. Still feels like a hack. If that's the route we want to go, we should change the UI accordingly. If you think YARN-1079 is a dupe, feel free to close and update YARN-1039 with UI notes. Regarding CGroup limits, have a look at YARN-810. Might be related to what you're saying. Roll up for long lived YARN --- Key: YARN-896 URL: https://issues.apache.org/jira/browse/YARN-896 Project: Hadoop YARN Issue Type: New Feature Reporter: Robert Joseph Evans YARN is intended to be general purpose, but it is missing some features to be able to truly support long lived applications and long lived containers. This ticket is intended to # discuss what is needed to support long lived processes # track the resulting JIRA. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-1081) Minor improvement to output header for $ yarn node -list
Tassapol Athiapinya created YARN-1081: - Summary: Minor improvement to output header for $ yarn node -list Key: YARN-1081 URL: https://issues.apache.org/jira/browse/YARN-1081 Project: Hadoop YARN Issue Type: Improvement Components: client Reporter: Tassapol Athiapinya Fix For: 2.1.0-beta The output of $ yarn node -list shows the number of running containers at each node. I found a case where a new user of YARN thought this was a container ID, used it later in other YARN commands, and got an error due to the misunderstanding. {code:title=current output} 2013-07-31 04:00:37,814|beaver.machine|INFO|RUNNING: /usr/bin/yarn node -list 2013-07-31 04:00:38,746|beaver.machine|INFO|Total Nodes:1 2013-07-31 04:00:38,747|beaver.machine|INFO|Node-Id Node-State Node-Http-Address Running-Containers 2013-07-31 04:00:38,747|beaver.machine|INFO|myhost:45454 RUNNING myhost:50060 2 {code} {code:title=proposed output} 2013-07-31 04:00:37,814|beaver.machine|INFO|RUNNING: /usr/bin/yarn node -list 2013-07-31 04:00:38,746|beaver.machine|INFO|Total Nodes:1 2013-07-31 04:00:38,747|beaver.machine|INFO|Node-Id Node-State Node-Http-Address Number-of-Running-Containers 2013-07-31 04:00:38,747|beaver.machine|INFO|myhost:45454 RUNNING myhost:50060 2 {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-881) Priority#compareTo method seems to be wrong.
[ https://issues.apache.org/jira/browse/YARN-881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13744084#comment-13744084 ] Sandy Ryza commented on YARN-881: - Lgtm, +1 Priority#compareTo method seems to be wrong. Key: YARN-881 URL: https://issues.apache.org/jira/browse/YARN-881 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-881.1.patch, YARN-881.patch if lower int value means higher priority, shouldn't we return other.getPriority() - this.getPriority() -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-881) Priority#compareTo method seems to be wrong.
[ https://issues.apache.org/jira/browse/YARN-881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13744114#comment-13744114 ] Jian He commented on YARN-881: -- [~sandyr], can you commit this also ? thanks! Priority#compareTo method seems to be wrong. Key: YARN-881 URL: https://issues.apache.org/jira/browse/YARN-881 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-881.1.patch, YARN-881.patch if lower int value means higher priority, shouldn't we return other.getPriority() - this.getPriority() -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-1082) Secure RM with recovery enabled and rm state store on hdfs fails with gss exception
Arpit Gupta created YARN-1082: - Summary: Secure RM with recovery enabled and rm state store on hdfs fails with gss exception Key: YARN-1082 URL: https://issues.apache.org/jira/browse/YARN-1082 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Arpit Gupta Assignee: Jian He -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1082) Secure RM with recovery enabled and rm state store on hdfs fails with gss exception
[ https://issues.apache.org/jira/browse/YARN-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13744121#comment-13744121 ] Arpit Gupta commented on YARN-1082: --- Here are the logs {code} 2013-08-17 17:32:08,272 INFO resourcemanager.ResourceManager (SignalLogger.java:register(91)) - registered UNIX signal handlers for [TERM, HUP, INT] 2013-08-17 17:32:08,544 DEBUG service.AbstractService (AbstractService.java:enterState(452)) - Service: ResourceManager entered state INITED 2013-08-17 17:32:08,683 DEBUG service.CompositeService (CompositeService.java:addService(69)) - Adding service Dispatcher 2013-08-17 17:32:08,685 INFO security.AMRMTokenSecretManager (AMRMTokenSecretManager.java:rollMasterKey(105)) - Rolling master-key for amrm-tokens 2013-08-17 17:32:08,690 DEBUG service.CompositeService (CompositeService.java:addService(69)) - Adding service org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.ContainerAllocationExpirer 2013-08-17 17:32:08,691 DEBUG service.CompositeService (CompositeService.java:addService(69)) - Adding service AMLivelinessMonitor 2013-08-17 17:32:08,691 DEBUG service.CompositeService (CompositeService.java:addService(69)) - Adding service AMLivelinessMonitor 2013-08-17 17:32:08,694 DEBUG service.CompositeService (CompositeService.java:addService(69)) - Adding service org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer 2013-08-17 17:32:08,699 INFO security.RMContainerTokenSecretManager (RMContainerTokenSecretManager.java:init(75)) - ContainerTokenKeyRollingInterval: 8640ms and ContainerTokenKeyActivationDelay: 90ms 2013-08-17 17:32:08,704 INFO security.NMTokenSecretManagerInRM (NMTokenSecretManagerInRM.java:init(77)) - NMTokenKeyRollingInterval: 8640ms and NMTokenKeyActivationDelay: 90ms 2013-08-17 17:32:08,738 DEBUG service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state INITED 2013-08-17 17:32:08,847 INFO event.AsyncDispatcher (AsyncDispatcher.java:register(157)) - Registering class org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStoreEventType for class org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler 2013-08-17 17:32:08,848 DEBUG service.AbstractService (AbstractService.java:start(197)) - Service Dispatcher is started 2013-08-17 17:32:09,084 DEBUG security.Groups (Groups.java:getUserToGroupsMappingService(180)) - Creating new Groups object 2013-08-17 17:32:09,088 DEBUG util.NativeCodeLoader (NativeCodeLoader.java:clinit(46)) - Trying to load the custom-built native-hadoop library... 
2013-08-17 17:32:09,089 DEBUG util.NativeCodeLoader (NativeCodeLoader.java:clinit(50)) - Loaded the native-hadoop library 2013-08-17 17:32:09,089 DEBUG security.JniBasedUnixGroupsMapping (JniBasedUnixGroupsMapping.java:clinit(50)) - Using JniBasedUnixGroupsMapping for Group resolution 2013-08-17 17:32:09,090 DEBUG security.JniBasedUnixGroupsMappingWithFallback (JniBasedUnixGroupsMappingWithFallback.java:init(44)) - Group mapping impl=org.apache.hadoop.security.JniBasedUnixGroupsMapping 2013-08-17 17:32:09,090 DEBUG security.Groups (Groups.java:init(66)) - Group mapping impl=org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback; cacheTimeout=30 2013-08-17 17:32:09,097 DEBUG security.UserGroupInformation (UserGroupInformation.java:login(176)) - hadoop login 2013-08-17 17:32:09,097 DEBUG security.UserGroupInformation (UserGroupInformation.java:commit(125)) - hadoop login commit 2013-08-17 17:32:09,098 DEBUG security.UserGroupInformation (UserGroupInformation.java:commit(139)) - using kerberos user:null 2013-08-17 17:32:09,099 DEBUG security.UserGroupInformation (UserGroupInformation.java:commit(155)) - using local user:UnixPrincipal: yarn 2013-08-17 17:32:09,101 DEBUG security.UserGroupInformation (UserGroupInformation.java:getLoginUser(696)) - UGI loginUser:yarn (auth:KERBEROS) 2013-08-17 17:32:09,216 DEBUG hdfs.BlockReaderLocal (DFSClient.java:init(326)) - dfs.client.use.legacy.blockreader.local = false 2013-08-17 17:32:09,217 DEBUG hdfs.BlockReaderLocal (DFSClient.java:init(329)) - dfs.client.read.shortcircuit = true 2013-08-17 17:32:09,217 DEBUG hdfs.BlockReaderLocal (DFSClient.java:init(332)) - dfs.client.domain.socket.data.traffic = false 2013-08-17 17:32:09,217 DEBUG hdfs.BlockReaderLocal (DFSClient.java:init(335)) - dfs.domain.socket.path = /var/lib/hadoop-hdfs/dn_socket 2013-08-17 17:32:09,234 DEBUG hdfs.HAUtil (HAUtil.java:cloneDelegationTokenForLogicalUri(276)) - No HA service delegation token found for logical URI hdfs://host/apps/yarn/recovery 2013-08-17 17:32:09,235 DEBUG hdfs.BlockReaderLocal (DFSClient.java:init(326)) - dfs.client.use.legacy.blockreader.local = false 2013-08-17 17:32:09,235 DEBUG hdfs.BlockReaderLocal (DFSClient.java:init(329)) - dfs.client.read.shortcircuit = true 2013-08-17
[jira] [Created] (YARN-1083) ResourceManager should fail when yarn.nm.liveness-monitor.expiry-interval-ms is set less than heartbeat interval
yeshavora created YARN-1083: --- Summary: ResourceManager should fail when yarn.nm.liveness-monitor.expiry-interval-ms is set less than heartbeat interval Key: YARN-1083 URL: https://issues.apache.org/jira/browse/YARN-1083 Project: Hadoop YARN Issue Type: Bug Reporter: yeshavora If 'yarn.nm.liveness-monitor.expiry-interval-ms' is set to less than the heartbeat interval, all the node managers will be added to 'Lost Nodes'. Instead, the ResourceManager should validate these properties and fail to start if the combination of such properties is invalid. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
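As a hedged illustration of the fail-fast validation proposed above (a minimal sketch, assuming the check runs while the ResourceManager initializes; the class name, the heartbeat property key, and the default values are illustrative assumptions, not code from YARN):
{code:java}
import org.apache.hadoop.conf.Configuration;

// Hypothetical helper; a real check would live inside the RM init path.
public class NodeLivenessConfigValidator {
  public static void validate(Configuration conf) {
    // Property names follow this issue; the heartbeat key and defaults are assumed.
    long expiryMs = conf.getLong("yarn.nm.liveness-monitor.expiry-interval-ms", 600000L);
    long heartbeatMs = conf.getLong("yarn.resourcemanager.nodemanagers.heartbeat-interval-ms", 1000L);
    if (expiryMs < heartbeatMs) {
      // Fail fast at startup instead of silently marking every NM as a lost node.
      throw new IllegalArgumentException("yarn.nm.liveness-monitor.expiry-interval-ms ("
          + expiryMs + " ms) must not be smaller than the NM heartbeat interval ("
          + heartbeatMs + " ms)");
    }
  }
}
{code}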
[jira] [Updated] (YARN-1083) ResourceManager should fail when yarn.nm.liveness-monitor.expiry-interval-ms is set less than heartbeat interval
[ https://issues.apache.org/jira/browse/YARN-1083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yeshavora updated YARN-1083: Affects Version/s: 2.1.0-beta ResourceManager should fail when yarn.nm.liveness-monitor.expiry-interval-ms is set less than heartbeat interval Key: YARN-1083 URL: https://issues.apache.org/jira/browse/YARN-1083 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: yeshavora If 'yarn.nm.liveness-monitor.expiry-interval-ms' is set to less than the heartbeat interval, all the node managers will be added to 'Lost Nodes'. Instead, the ResourceManager should validate these properties and fail to start if the combination of such properties is invalid. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1082) Secure RM with recovery enabled and rm state store on hdfs fails with gss exception
[ https://issues.apache.org/jira/browse/YARN-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13744154#comment-13744154 ] Arpit Gupta commented on YARN-1082: --- It looks like we try to interact with hdfs before the rm has logged in using the keytab. Secure RM with recovery enabled and rm state store on hdfs fails with gss exception --- Key: YARN-1082 URL: https://issues.apache.org/jira/browse/YARN-1082 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Arpit Gupta Assignee: Jian He -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1083) ResourceManager should fail when yarn.nm.liveness-monitor.expiry-interval-ms is set less than heartbeat interval
[ https://issues.apache.org/jira/browse/YARN-1083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-1083: Labels: newbie (was: ) ResourceManager should fail when yarn.nm.liveness-monitor.expiry-interval-ms is set less than heartbeat interval Key: YARN-1083 URL: https://issues.apache.org/jira/browse/YARN-1083 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: yeshavora Labels: newbie If 'yarn.nm.liveness-monitor.expiry-interval-ms' is set to less than the heartbeat interval, all the node managers will be added to 'Lost Nodes'. Instead, the ResourceManager should validate these properties and fail to start if the combination of such properties is invalid. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1083) ResourceManager should fail when yarn.nm.liveness-monitor.expiry-interval-ms is set less than heartbeat interval
[ https://issues.apache.org/jira/browse/YARN-1083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-1083: Component/s: resourcemanager ResourceManager should fail when yarn.nm.liveness-monitor.expiry-interval-ms is set less than heartbeat interval Key: YARN-1083 URL: https://issues.apache.org/jira/browse/YARN-1083 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: yeshavora Labels: newbie If 'yarn.nm.liveness-monitor.expiry-interval-ms' is set to less than the heartbeat interval, all the node managers will be added to 'Lost Nodes'. Instead, the ResourceManager should validate these properties and fail to start if the combination of such properties is invalid. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1082) Secure RM with recovery enabled and rm state store on hdfs fails with gss exception
[ https://issues.apache.org/jira/browse/YARN-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13744252#comment-13744252 ] Jian He commented on YARN-1082: --- Yes, specifically, we should create the state store base directories after doSecureLogin() inside ResourceManager.serviceStart() has been called. So I propose to augment RMStateStore to extend the service model, where creating the base dirs can be performed inside serviceStart(). Secure RM with recovery enabled and rm state store on hdfs fails with gss exception --- Key: YARN-1082 URL: https://issues.apache.org/jira/browse/YARN-1082 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Arpit Gupta Assignee: Jian He -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
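As a hedged sketch of the direction described in this comment (not the committed patch), the store could become a Service whose serviceStart() creates the base directories, i.e. after the RM has already performed doSecureLogin(); the class name and constructor are assumptions for illustration:
{code:java}
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.service.AbstractService;

// Hypothetical Service-backed store; only the lifecycle wiring is shown.
public abstract class ServiceBackedStateStore extends AbstractService {
  private final Path baseDir; // e.g. a path under hdfs://host/apps/yarn/recovery

  protected ServiceBackedStateStore(String name, Path baseDir) {
    super(name);
    this.baseDir = baseDir;
  }

  @Override
  protected void serviceStart() throws Exception {
    // Runs after ResourceManager.serviceStart() has called doSecureLogin(),
    // so these HDFS calls carry the RM's Kerberos credentials.
    FileSystem fs = baseDir.getFileSystem(getConfig());
    fs.mkdirs(baseDir);
    super.serviceStart();
  }
}
{code}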
[jira] [Commented] (YARN-1006) Nodes list web page on the RM web UI is broken
[ https://issues.apache.org/jira/browse/YARN-1006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13744306#comment-13744306 ] Vinod Kumar Vavilapalli commented on YARN-1006: --- +1, the patch looks good to me. Tested this on a single node and the bug's gone. Checking this in. Nodes list web page on the RM web UI is broken -- Key: YARN-1006 URL: https://issues.apache.org/jira/browse/YARN-1006 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Xuan Gong Attachments: YARN-1006.1.patch The nodes web page, which lists all the connected nodes of the cluster, is broken. 1. The page is not showing in the correct format/style. 2. If we restart the NM, the node list is not refreshed; it just adds the newly started NM to the list. The old NMs' information still remains. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-905) Add state filters to nodes CLI
[ https://issues.apache.org/jira/browse/YARN-905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan updated YARN-905: - Attachment: YARN-905.patch retrigger the QA server Add state filters to nodes CLI -- Key: YARN-905 URL: https://issues.apache.org/jira/browse/YARN-905 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.0.4-alpha Reporter: Sandy Ryza Assignee: Wei Yan Attachments: Yarn-905.patch, YARN-905.patch, YARN-905.patch It would be helpful for the nodes CLI to have a node-states option that allows it to return nodes that are not just in the RUNNING state. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1006) Nodes list web page on the RM web UI is broken
[ https://issues.apache.org/jira/browse/YARN-1006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13744346#comment-13744346 ] Hudson commented on YARN-1006: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4292 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4292/]) YARN-1006. Fixed broken rendering in the Nodes list web page on the RM web UI. Contributed by Xuan Gong. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1515629) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/NodesPage.java Nodes list web page on the RM web UI is broken -- Key: YARN-1006 URL: https://issues.apache.org/jira/browse/YARN-1006 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Xuan Gong Fix For: 2.1.1-beta Attachments: YARN-1006.1.patch The nodes web page, which lists all the connected nodes of the cluster, is broken. 1. The page is not showing in the correct format/style. 2. If we restart the NM, the node list is not refreshed; it just adds the newly started NM to the list. The old NMs' information still remains. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-905) Add state filters to nodes CLI
[ https://issues.apache.org/jira/browse/YARN-905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13744391#comment-13744391 ] Hadoop QA commented on YARN-905: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12598850/YARN-905.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1740//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1740//console This message is automatically generated. Add state filters to nodes CLI -- Key: YARN-905 URL: https://issues.apache.org/jira/browse/YARN-905 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.0.4-alpha Reporter: Sandy Ryza Assignee: Wei Yan Attachments: Yarn-905.patch, YARN-905.patch, YARN-905.patch It would be helpful for the nodes CLI to have a node-states option that allows it to return nodes that are not just in the RUNNING state. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1021) Yarn Scheduler Load Simulator
[ https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13744493#comment-13744493 ] Carlo Curino commented on YARN-1021: Sorry for the delay. I went over the patch today together with Chris Douglas and here is some input from both of us. I generally like the effort, and the live visualization is really neat. Also, making it into a completely separate tool is convenient/safe. The main limitations I see in this simulator are:
* it only simulates the Scheduler code, mocking out most of the RM, and all AM and NM communication, submissions...
* if I am not mistaken, it runs at wall-clock time (not faster)
* it does not run the monitors which are needed for simulating preemption in the CapacityScheduler
An alternative approach that we explored was to hijack the Clocks around the RM and drive them using a discrete event simulation, thus exercising more of the RM code, protocols, etc., and enabling faster-than-wall-clock speeds (though not trivial to achieve). We have some working but not polished code in this space, which we could probably provide if you think it might be integrated/leveraged. Ignoring the alternative approaches and broader spectrum we mentioned above, there are a few issues with the current patch:
* It should be possible to consistently replay (seed RANDOM)
* Using the Rumen reader (JobProducer, etc.) instead of parsing json directly seems cleaner. Also, we have a synth load generator which we will release soon that implements the JobProducer/JobStory interface (might be nice to use that to drive your simulations)
* LICENSE/NOTICE should be updated to include the BSD-like licenses you bring in with the new libraries
* It seems somewhat hard to detect regressions w/ trunk since it:
** mocks away much of the AM/NM/RM
** has few unit tests
** does not simulate important behaviors in the AM (no slow start, headroom, etc.)
** does not exercise failures, timeouts, etc.
Smaller issues:
* some javadoc @param unpopulated
* why a dependency on another metrics package, instead of Hadoop's?
* why NodeUpdateSchedulerEventWrapper? Doesn't seem to add anything...
* use ResourceCalculator instead of manually adjusting Resources from RR
* initMetrics is a very large method...
* SLSWebApp is a wall of string appends. I am not very web savvy, but I believe there should be cleaner ways to generate this. This seems hard to maintain/evolve.
I hope this helps. I will be traveling abroad for a couple of weeks so I might be slow/unresponsive. Altogether, since it is rather on the side, I am not too concerned about it; the suggestions are mostly to make sure it is really useful and that people can use it / maintain it over time. If committed as is it will do no harm, but I think it risks being dropped in, used twice for FairScheduler work, and then losing relevance and getting out of sync with trunk.
Yarn Scheduler Load Simulator - Key: YARN-1021 URL: https://issues.apache.org/jira/browse/YARN-1021 Project: Hadoop YARN Issue Type: New Feature Components: scheduler Reporter: Wei Yan Assignee: Wei Yan Attachments: YARN-1021-demo.tar.gz, YARN-1021-images.tar.gz, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.pdf The Yarn Scheduler is a fertile area of interest with different implementations, e.g., Fifo, Capacity and Fair schedulers. Meanwhile, several optimizations are also made to improve scheduler performance for different scenarios and workloads.
Each scheduler algorithm has its own set of features, and drives scheduling decisions by many factors, such as fairness, capacity guarantee, resource availability, etc. It is very important to evaluate a scheduler algorithm very well before we deploy it in a production cluster. Unfortunately, currently it is non-trivial to evaluate a scheduling algorithm. Evaluating it in a real cluster is always time- and cost-consuming, and it is also very hard to find a large-enough cluster. Hence, a simulator which can predict how well a scheduler algorithm works for some specific workload would be quite useful. We want to build a Scheduler Load Simulator to simulate large-scale Yarn clusters and application loads in a single machine. This would be invaluable in furthering Yarn by providing a tool for researchers and developers to prototype new scheduler features and predict their behavior and performance with a reasonable amount of confidence, thereby aiding rapid innovation. The simulator will exercise the real Yarn ResourceManager, removing the network factor by simulating NodeManagers and ApplicationMasters via handling and dispatching NM/AM heartbeat events from within the same JVM. To keep tracking of scheduler
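As a hedged illustration of the clock-hijacking alternative mentioned in the review above, here is a minimal sketch of a manually driven Clock that a discrete-event harness could advance instead of running at wall-clock speed; the class name and the advancing method are assumptions, not code from the simulator patch or from YARN:
{code:java}
import org.apache.hadoop.yarn.util.Clock;

// Hypothetical controllable clock for driving RM components in simulated time.
public class SteppableClock implements Clock {
  private volatile long now;

  public SteppableClock(long startMillis) {
    this.now = startMillis;
  }

  @Override
  public long getTime() {
    return now; // callers see simulated time, not System.currentTimeMillis()
  }

  // The simulation loop calls this to jump straight to the next scheduled event.
  public void advance(long deltaMillis) {
    now += deltaMillis;
  }
}
{code}
Components that accept a Clock could then be driven as fast as events can be processed, which is the faster-than-wall-clock behavior the comment refers to.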
[jira] [Updated] (YARN-1082) Secure RM with recovery enabled and rm state store on hdfs fails with gss exception
[ https://issues.apache.org/jira/browse/YARN-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1082: Priority: Blocker (was: Major) Secure RM with recovery enabled and rm state store on hdfs fails with gss exception --- Key: YARN-1082 URL: https://issues.apache.org/jira/browse/YARN-1082 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Arpit Gupta Assignee: Jian He Priority: Blocker -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1082) Secure RM with recovery enabled and rm state store on hdfs fails with gss exception
[ https://issues.apache.org/jira/browse/YARN-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated YARN-1082: Target Version/s: 2.1.1-beta Secure RM with recovery enabled and rm state store on hdfs fails with gss exception --- Key: YARN-1082 URL: https://issues.apache.org/jira/browse/YARN-1082 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Arpit Gupta Assignee: Jian He Priority: Blocker -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-1081) Minor improvement to output header for $ yarn node -list
[ https://issues.apache.org/jira/browse/YARN-1081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-1081: Labels: newbie (was: ) Minor improvement to output header for $ yarn node -list Key: YARN-1081 URL: https://issues.apache.org/jira/browse/YARN-1081 Project: Hadoop YARN Issue Type: Improvement Components: client Reporter: Tassapol Athiapinya Priority: Minor Labels: newbie Fix For: 2.1.0-beta Output of $ yarn node -list shows the number of running containers at each node. I found a case where a new user of YARN thought that this was a container ID, used it later in other YARN commands, and got an error due to the misunderstanding.
{code:title=current output}
2013-07-31 04:00:37,814|beaver.machine|INFO|RUNNING: /usr/bin/yarn node -list
2013-07-31 04:00:38,746|beaver.machine|INFO|Total Nodes:1
2013-07-31 04:00:38,747|beaver.machine|INFO|Node-Id Node-State Node-Http-Address Running-Containers
2013-07-31 04:00:38,747|beaver.machine|INFO|myhost:45454 RUNNING myhost:50060 2
{code}
{code:title=proposed output}
2013-07-31 04:00:37,814|beaver.machine|INFO|RUNNING: /usr/bin/yarn node -list
2013-07-31 04:00:38,746|beaver.machine|INFO|Total Nodes:1
2013-07-31 04:00:38,747|beaver.machine|INFO|Node-Id Node-State Node-Http-Address Number-of-Running-Containers
2013-07-31 04:00:38,747|beaver.machine|INFO|myhost:45454 RUNNING myhost:50060 2
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
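A hedged sketch of what the proposed header change could look like in the CLI code (illustrative only; this is not the actual NodeCLI implementation, and the column widths are made up):
{code:java}
import java.io.PrintWriter;

// Hypothetical snippet showing only the header line being discussed.
public class NodeListHeaderExample {
  public static void printHeader(PrintWriter out) {
    // "Number-of-Running-Containers" makes it clear the column is a count,
    // not a container ID that could be reused in other YARN commands.
    out.printf("%16s\t%15s\t%17s\t%28s%n",
        "Node-Id", "Node-State", "Node-Http-Address", "Number-of-Running-Containers");
  }
}
{code}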
[jira] [Updated] (YARN-1081) Minor improvement to output header for $ yarn node -list
[ https://issues.apache.org/jira/browse/YARN-1081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-1081: Priority: Minor (was: Major) Minor improvement to output header for $ yarn node -list Key: YARN-1081 URL: https://issues.apache.org/jira/browse/YARN-1081 Project: Hadoop YARN Issue Type: Improvement Components: client Reporter: Tassapol Athiapinya Priority: Minor Fix For: 2.1.0-beta Output of $ yarn node -list shows the number of running containers at each node. I found a case where a new user of YARN thought that this was a container ID, used it later in other YARN commands, and got an error due to the misunderstanding.
{code:title=current output}
2013-07-31 04:00:37,814|beaver.machine|INFO|RUNNING: /usr/bin/yarn node -list
2013-07-31 04:00:38,746|beaver.machine|INFO|Total Nodes:1
2013-07-31 04:00:38,747|beaver.machine|INFO|Node-Id Node-State Node-Http-Address Running-Containers
2013-07-31 04:00:38,747|beaver.machine|INFO|myhost:45454 RUNNING myhost:50060 2
{code}
{code:title=proposed output}
2013-07-31 04:00:37,814|beaver.machine|INFO|RUNNING: /usr/bin/yarn node -list
2013-07-31 04:00:38,746|beaver.machine|INFO|Total Nodes:1
2013-07-31 04:00:38,747|beaver.machine|INFO|Node-Id Node-State Node-Http-Address Number-of-Running-Containers
2013-07-31 04:00:38,747|beaver.machine|INFO|myhost:45454 RUNNING myhost:50060 2
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (YARN-1082) Secure RM with recovery enabled and rm state store on hdfs fails with gss exception
[ https://issues.apache.org/jira/browse/YARN-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli reassigned YARN-1082: - Assignee: Vinod Kumar Vavilapalli (was: Jian He) Taking this over for a quick fix. Secure RM with recovery enabled and rm state store on hdfs fails with gss exception --- Key: YARN-1082 URL: https://issues.apache.org/jira/browse/YARN-1082 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Arpit Gupta Assignee: Vinod Kumar Vavilapalli Priority: Blocker -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1076) RM gets stuck with a reservation, ignoring new containers
[ https://issues.apache.org/jira/browse/YARN-1076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13744583#comment-13744583 ] Omkar Vinit Joshi commented on YARN-1076: - Hi [~maysamyabandeh] did you find this issue by code walk-through, or did you face it in your cluster? Related: YARN-957? RM gets stuck with a reservation, ignoring new containers - Key: YARN-1076 URL: https://issues.apache.org/jira/browse/YARN-1076 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Maysam Yabandeh Priority: Minor LeafQueue#assignContainers rejects newly available containers if #needContainers returns false: {code:java} if (!needContainers(application, priority, required)) { continue; } {code} When the application has already reserved all the required containers, #needContainers returns false as long as no starvation is reported: {code:java} return (((starvation + requiredContainers) - reservedContainers) > 0); {code} where starvation is computed based on the attempts on re-reserving a resource. On the other hand, a resource is re-reserved via #assignContainersOnNode only if it passed the #needContainers precondition: {code:java} // Do we need containers at this 'priority'? if (!needContainers(application, priority, required)) { continue; } //. //. //. // Try to schedule CSAssignment assignment = assignContainersOnNode(clusterResource, node, application, priority, null); {code} In other words, once needContainers returns false due to a reservation, it keeps rejecting newly available resources, since no reservation is ever attempted. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
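To make the stuck condition concrete, here is a hedged, self-contained illustration of the predicate quoted in the description (the numbers are made up; this is not the LeafQueue code itself):
{code:java}
// With all 3 required containers already reserved and no re-reservation
// attempts (so starvation stays 0), the predicate is false and remains false,
// which is the stuck state this issue describes.
public class NeedContainersExample {
  static boolean needContainers(int starvation, int requiredContainers, int reservedContainers) {
    return ((starvation + requiredContainers) - reservedContainers) > 0;
  }

  public static void main(String[] args) {
    System.out.println(needContainers(0, 3, 3)); // false -> newly free nodes keep being skipped
  }
}
{code}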
[jira] [Created] (YARN-1084) RM restart does not work for map only job
yeshavora created YARN-1084: --- Summary: RM restart does not work for map only job Key: YARN-1084 URL: https://issues.apache.org/jira/browse/YARN-1084 Project: Hadoop YARN Issue Type: Bug Reporter: yeshavora A map-only job (randomwriter, randomtextwriter) restarts from scratch [0% map 0% reduce] after an RM restart. It should resume from its last state when the RM restarts. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (YARN-1085) Yarn and MRv2 should do HTTP client authentication in kerberos setup.
Jaimin D Jetly created YARN-1085: Summary: Yarn and MRv2 should do HTTP client authentication in kerberos setup. Key: YARN-1085 URL: https://issues.apache.org/jira/browse/YARN-1085 Project: Hadoop YARN Issue Type: Task Components: nodemanager, resourcemanager Reporter: Jaimin D Jetly In a kerberos setup, an HTTP client is expected to authenticate to kerberos before the user is allowed to browse any information. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (YARN-1085) Yarn and MRv2 should do HTTP client authentication in kerberos setup.
[ https://issues.apache.org/jira/browse/YARN-1085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi reassigned YARN-1085: --- Assignee: Omkar Vinit Joshi Yarn and MRv2 should do HTTP client authentication in kerberos setup. - Key: YARN-1085 URL: https://issues.apache.org/jira/browse/YARN-1085 Project: Hadoop YARN Issue Type: Task Components: nodemanager, resourcemanager Reporter: Jaimin D Jetly Assignee: Omkar Vinit Joshi Labels: security In a kerberos setup, an HTTP client is expected to authenticate to kerberos before the user is allowed to browse any information. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1076) RM gets stuck with a reservation, ignoring new containers
[ https://issues.apache.org/jira/browse/YARN-1076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13744683#comment-13744683 ] Maysam Yabandeh commented on YARN-1076: --- Hi [~ojoshi]. I am observing the problem with a unit test using MiniYarnCluster. The explanation, however, is based solely on a code walk-through. I did not submit the test case since the problem did not always show up, due to the non-determinism in MiniYarnCluster. Anyway, I see that you have already covered that in the objectives of YARN-957: | Say 2048MB is reserved on nm1 but nm2 comes back with 2048MB available memory. In this case if the original request was made without any locality then scheduler should unreserve memory on nm1 and allocate requested 2048MB container on nm2. RM gets stuck with a reservation, ignoring new containers - Key: YARN-1076 URL: https://issues.apache.org/jira/browse/YARN-1076 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Maysam Yabandeh Priority: Minor LeafQueue#assignContainers rejects newly available containers if #needContainers returns false: {code:java} if (!needContainers(application, priority, required)) { continue; } {code} When the application has already reserved all the required containers, #needContainers returns false as long as no starvation is reported: {code:java} return (((starvation + requiredContainers) - reservedContainers) > 0); {code} where starvation is computed based on the attempts on re-reserving a resource. On the other hand, a resource is re-reserved via #assignContainersOnNode only if it passed the #needContainers precondition: {code:java} // Do we need containers at this 'priority'? if (!needContainers(application, priority, required)) { continue; } //. //. //. // Try to schedule CSAssignment assignment = assignContainersOnNode(clusterResource, node, application, priority, null); {code} In other words, once needContainers returns false due to a reservation, it keeps rejecting newly available resources, since no reservation is ever attempted. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-1076) RM gets stuck with a reservation, ignoring new containers
[ https://issues.apache.org/jira/browse/YARN-1076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13744690#comment-13744690 ] Omkar Vinit Joshi commented on YARN-1076: - If it is similar then we can close this as a duplicate. Can you try the YARN-957 patch locally and see if it fixes your problem too? Thanks. RM gets stuck with a reservation, ignoring new containers - Key: YARN-1076 URL: https://issues.apache.org/jira/browse/YARN-1076 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Maysam Yabandeh Priority: Minor LeafQueue#assignContainers rejects newly available containers if #needContainers returns false: {code:java} if (!needContainers(application, priority, required)) { continue; } {code} When the application has already reserved all the required containers, #needContainers returns false as long as no starvation is reported: {code:java} return (((starvation + requiredContainers) - reservedContainers) > 0); {code} where starvation is computed based on the attempts on re-reserving a resource. On the other hand, a resource is re-reserved via #assignContainersOnNode only if it passed the #needContainers precondition: {code:java} // Do we need containers at this 'priority'? if (!needContainers(application, priority, required)) { continue; } //. //. //. // Try to schedule CSAssignment assignment = assignContainersOnNode(clusterResource, node, application, priority, null); {code} In other words, once needContainers returns false due to a reservation, it keeps rejecting newly available resources, since no reservation is ever attempted. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-879) Fix NPE in test/o.a.h.y.server.resourcemanager.Application.getResources()
[ https://issues.apache.org/jira/browse/YARN-879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-879: Attachment: YARN-879-v2.patch The v2 patch fixes the unit tests in TestCapacityScheduler and TestResourceManager and illustrates how test/resourcemanager.Application works. Fix NPE in test/o.a.h.y.server.resourcemanager.Application.getResources() - Key: YARN-879 URL: https://issues.apache.org/jira/browse/YARN-879 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.1.0-beta Reporter: Junping Du Assignee: Junping Du Attachments: YARN-879.patch, YARN-879-v2.patch getResources() is supposed to return the list of containers allocated by the RM. However, it currently returns null directly. Worse, if LOG.debug is enabled, it will definitely cause an NPE. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
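As a hedged illustration of the failure mode described above (the names and the logging flag are made up; this is not the actual test code): returning null and then dereferencing the result in a debug-only path fails only when debug logging is on, which is why the NPE is easy to miss.
{code:java}
import java.util.List;

public class GetResourcesNpeExample {
  // Stand-in for a getResources() that returns null instead of the allocated containers.
  static List<String> getResources() {
    return null; // should be the containers allocated by the RM
  }

  public static void main(String[] args) {
    boolean debugEnabled = true; // stands in for LOG.isDebugEnabled()
    List<String> resources = getResources();
    if (debugEnabled) {
      // NullPointerException here, but only when debug logging is enabled.
      System.out.println("Got " + resources.size() + " resources");
    }
  }
}
{code}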