[jira] [Updated] (YARN-1260) RM_HOME link breaks when webapp.https.address related properties are not specified
[ https://issues.apache.org/jira/browse/YARN-1260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-1260: -- Priority: Major (was: Blocker) I agree it breaks setup, but it is no blocker as there is a clear work around. RM_HOME link breaks when webapp.https.address related properties are not specified -- Key: YARN-1260 URL: https://issues.apache.org/jira/browse/YARN-1260 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.1-beta, 2.1.2-beta Reporter: Yesha Vora Assignee: Omkar Vinit Joshi Attachments: YARN-1260.20131030.1.patch This issue happens in multiple node cluster where resource manager and node manager are running on different machines. Steps to reproduce: 1) set yarn.resourcemanager.hostname = resourcemanager host in yarn-site.xml 2) set hadoop.ssl.enabled = true in core-site.xml 3) Do not specify below property in yarn-site.xml yarn.nodemanager.webapp.https.address and yarn.resourcemanager.webapp.https.address Here, the default value of above two property will be considered. 4) Go to nodemanager web UI https://nodemanager host:8044/node 5) Click on RM_HOME link This link redirects to https://nodemanager host:8090/cluster instead https://resourcemanager host:8090/cluster -- This message was sent by Atlassian JIRA (v6.1#6144)
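For reference, a minimal sketch (not part of the attached patch) of the workaround: explicitly setting the two https webapp addresses instead of relying on defaults. It assumes the standard YarnConfiguration class is on the classpath; "rmhost" and "nmhost" are placeholder hostnames, and the ports are the ones quoted in the report.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class HttpsWebappAddressCheck {
  public static void main(String[] args) {
    // Loads yarn-default.xml and yarn-site.xml from the classpath.
    Configuration conf = new YarnConfiguration();
    // Workaround: set the https webapp addresses explicitly so the RM_HOME
    // link is not built from the nodemanager's own default address.
    // "rmhost" and "nmhost" are placeholder hostnames.
    conf.set("yarn.resourcemanager.webapp.https.address", "rmhost:8090");
    conf.set("yarn.nodemanager.webapp.https.address", "nmhost:8044");
    System.out.println(conf.get("yarn.resourcemanager.webapp.https.address"));
    System.out.println(conf.get("yarn.nodemanager.webapp.https.address"));
  }
}
{code}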
[jira] [Commented] (YARN-1260) RM_HOME link breaks when webapp.https.address related properties are not specified
[ https://issues.apache.org/jira/browse/YARN-1260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13783689#comment-13783689 ] Vinod Kumar Vavilapalli commented on YARN-1260: --- Patch is trivial and looks good anyways. Checking this in. RM_HOME link breaks when webapp.https.address related properties are not specified -- Key: YARN-1260 URL: https://issues.apache.org/jira/browse/YARN-1260 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.1-beta, 2.1.2-beta Reporter: Yesha Vora Assignee: Omkar Vinit Joshi Attachments: YARN-1260.20131030.1.patch This issue happens in multiple node cluster where resource manager and node manager are running on different machines. Steps to reproduce: 1) set yarn.resourcemanager.hostname = resourcemanager host in yarn-site.xml 2) set hadoop.ssl.enabled = true in core-site.xml 3) Do not specify below property in yarn-site.xml yarn.nodemanager.webapp.https.address and yarn.resourcemanager.webapp.https.address Here, the default value of above two property will be considered. 4) Go to nodemanager web UI https://nodemanager host:8044/node 5) Click on RM_HOME link This link redirects to https://nodemanager host:8090/cluster instead https://resourcemanager host:8090/cluster -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-445) Ability to signal containers
[ https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13783702#comment-13783702 ] Bikas Saha commented on YARN-445: - How does the Windows JVM handle ctrl-break? How would we emulate a ctrl-c signal that would trigger the JVM shutdown hook? Ability to signal containers Key: YARN-445 URL: https://issues.apache.org/jira/browse/YARN-445 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Jason Lowe Attachments: YARN-445.patch It would be nice if an ApplicationMaster could send signals to contaniers such as SIGQUIT, SIGUSR1, etc. For example, in order to replicate the jstack-on-task-timeout feature implemented by MAPREDUCE-1119 in Hadoop 0.21 the NodeManager needs an interface for sending SIGQUIT to a container. For that specific feature we could implement it as an additional field in the StopContainerRequest. However that would not address other potential features like the ability for an AM to trigger jstacks on arbitrary tasks *without* killing them. The latter feature would be a very useful debugging tool for users who do not have shell access to the nodes. -- This message was sent by Atlassian JIRA (v6.1#6144)
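To illustrate the shutdown-hook part of the question, a small self-contained sketch (unrelated to the attached patch): a registered hook runs when the JVM receives SIGINT (ctrl-c) or SIGTERM, but not on SIGKILL, which is why the choice of emulated signal matters.
{code}
public class ShutdownHookDemo {
  public static void main(String[] args) throws InterruptedException {
    // Hooks registered here run on SIGINT (ctrl-c) or SIGTERM, giving the
    // process a chance to clean up; a SIGKILL bypasses them entirely.
    Runtime.getRuntime().addShutdownHook(
        new Thread(() -> System.out.println("shutdown hook invoked")));
    // Send ctrl-c or "kill -TERM <pid>" while this sleeps to see the hook fire.
    Thread.sleep(60_000);
  }
}
{code}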
[jira] [Commented] (YARN-1197) Support changing resources of an allocated container
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13783708#comment-13783708 ] Bikas Saha commented on YARN-1197: -- Still thinking through the RM-NM interactions. The request for change should probably be a new object that is basically a map of (containerId, Resource) where Resource is the new value for the existing containerId. Not quite sure how we would use the new container token for a running container since it's only used in start container. If we wait for RM to sync with NM about the increased resources then it might be too slow since this happens on a heartbeat and the heartbeat interval can be in the order of seconds. An alternative would be a new NM API to allow AMs to increase resources and this would be signed with the new container token. But this would burden the AMs by requiring them to make that additional call. There could be a race between a new container token coming in with increased resources for an acquired container and the old container token being used by the NMClient to launch the container (in case the AM decides to launch the smaller container while it was waiting for an increase). Support changing resources of an allocated container Key: YARN-1197 URL: https://issues.apache.org/jira/browse/YARN-1197 Project: Hadoop YARN Issue Type: Task Components: api, nodemanager, resourcemanager Affects Versions: 2.1.0-beta Reporter: Wangda Tan Attachments: yarn-1197.pdf Currently, YARN cannot support merge several containers in one node to a big container, which can make us incrementally ask resources, merge them to a bigger one, and launch our processes. The user scenario is described in the comments. -- This message was sent by Atlassian JIRA (v6.1#6144)
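As an illustration of the request object described above, a hypothetical sketch (the class name and methods are not an actual YARN API, just a stand-in for the (containerId, Resource) map Bikas mentions):
{code}
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.yarn.api.records.ContainerId;
import org.apache.hadoop.yarn.api.records.Resource;

// Hypothetical request object: a map of containerId -> new Resource value.
public class ContainerResourceChangeRequest {
  private final Map<ContainerId, Resource> targetResources = new HashMap<>();

  // Record the desired new size for an existing container.
  public void addChange(ContainerId containerId, Resource newResource) {
    targetResources.put(containerId, newResource);
  }

  public Map<ContainerId, Resource> getTargetResources() {
    return targetResources;
  }
}
{code}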
[jira] [Updated] (YARN-1232) Configuration to support multiple RMs
[ https://issues.apache.org/jira/browse/YARN-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1232: --- Attachment: yarn-1232-6.patch Configuration to support multiple RMs - Key: YARN-1232 URL: https://issues.apache.org/jira/browse/YARN-1232 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Karthik Kambatla Assignee: Karthik Kambatla Labels: ha Attachments: yarn-1232-1.patch, yarn-1232-2.patch, yarn-1232-3.patch, yarn-1232-4.patch, yarn-1232-5.patch, yarn-1232-6.patch We should augment the configuration to allow users specify two RMs and the individual RPC addresses for them. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-658) Command to kill a YARN application does not work with newer Ubuntu versions
[ https://issues.apache.org/jira/browse/YARN-658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-658: - Target Version/s: 2.1.2-beta Command to kill a YARN application does not work with newer Ubuntu versions --- Key: YARN-658 URL: https://issues.apache.org/jira/browse/YARN-658 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.0.3-alpha, 2.0.4-alpha Reporter: David Yan Attachments: AppMaster.stderr, yarn-david-nodemanager-david-ubuntu.out, yarn-david-resourcemanager-david-ubuntu.out After issuing a KillApplicationRequest, the application keeps running on the system even though the state is changed to KILLED. It happens on both Ubuntu 12.10 and 13.04, but works fine on Ubuntu 12.04. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1232) Configuration to support multiple RMs
[ https://issues.apache.org/jira/browse/YARN-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13783714#comment-13783714 ] Karthik Kambatla commented on YARN-1232: [~bikassaha], thanks. Updated the patch to address all your comments except one. Changed the two configs to {{yarn.resourcemanager.ha.rm-ids}} and {{yarn.resourcemanager.ha.id}} - the reason for including ha in the latter is because the RM's id is relevant only when HA is enabled. For the tokens themselves, a logical-name is more apt and is best added by the JIRA that handles the token-related logic (may be, YARN-986). Configuration to support multiple RMs - Key: YARN-1232 URL: https://issues.apache.org/jira/browse/YARN-1232 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Karthik Kambatla Assignee: Karthik Kambatla Labels: ha Attachments: yarn-1232-1.patch, yarn-1232-2.patch, yarn-1232-3.patch, yarn-1232-4.patch, yarn-1232-5.patch, yarn-1232-6.patch We should augment the configuration to allow users specify two RMs and the individual RPC addresses for them. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1232) Configuration to support multiple RMs
[ https://issues.apache.org/jira/browse/YARN-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13783715#comment-13783715 ] Karthik Kambatla commented on YARN-1232: bq. In HA enabled scenarios we need the explicit rm ids. How are we handling rm-id in non-HA setups or in existing clusters where no rm-id is being set currently? This RM id is used only when the HA is enabled. Configuration to support multiple RMs - Key: YARN-1232 URL: https://issues.apache.org/jira/browse/YARN-1232 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Karthik Kambatla Assignee: Karthik Kambatla Labels: ha Attachments: yarn-1232-1.patch, yarn-1232-2.patch, yarn-1232-3.patch, yarn-1232-4.patch, yarn-1232-5.patch, yarn-1232-6.patch We should augment the configuration to allow users specify two RMs and the individual RPC addresses for them. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1232) Configuration to support multiple RMs
[ https://issues.apache.org/jira/browse/YARN-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13783724#comment-13783724 ] Hadoop QA commented on YARN-1232: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12606267/yarn-1232-6.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2060//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2060//console This message is automatically generated. Configuration to support multiple RMs - Key: YARN-1232 URL: https://issues.apache.org/jira/browse/YARN-1232 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Karthik Kambatla Assignee: Karthik Kambatla Labels: ha Attachments: yarn-1232-1.patch, yarn-1232-2.patch, yarn-1232-3.patch, yarn-1232-4.patch, yarn-1232-5.patch, yarn-1232-6.patch We should augment the configuration to allow users specify two RMs and the individual RPC addresses for them. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1010) FairScheduler: decouple container scheduling from nodemanager heartbeats
[ https://issues.apache.org/jira/browse/YARN-1010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13783813#comment-13783813 ] Hudson commented on YARN-1010: -- FAILURE: Integrated in Hadoop-Yarn-trunk #350 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/350/]) YARN-1010. FairScheduler: decouple container scheduling from nodemanager heartbeats. (Wei Yan via Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1528192) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AppSchedulable.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSSchedulerApp.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerConfiguration.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFSSchedulerApp.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java FairScheduler: decouple container scheduling from nodemanager heartbeats Key: YARN-1010 URL: https://issues.apache.org/jira/browse/YARN-1010 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Assignee: Wei Yan Priority: Critical Fix For: 2.3.0 Attachments: YARN-1010.patch, YARN-1010.patch Currently scheduling for a node is done when a node heartbeats. For large cluster where the heartbeat interval is set to several seconds this delays scheduling of incoming allocations significantly. We could have a continuous loop scanning all nodes and doing scheduling. If there is availability AMs will get the allocation in the next heartbeat after the one that placed the request. -- This message was sent by Atlassian JIRA (v6.1#6144)
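The "continuous loop scanning all nodes" idea reads roughly like the following sketch (purely illustrative; the node representation and attemptScheduling are stand-ins, not the actual FairScheduler code from the patch):
{code}
import java.util.List;

// Illustrative continuous-scheduling loop decoupled from NM heartbeats.
public abstract class ContinuousSchedulingLoop implements Runnable {
  private final List<String> nodeIds;  // stand-in for the scheduler's node set
  private final long intervalMs;       // loop interval, independent of heartbeats
  private volatile boolean running = true;

  protected ContinuousSchedulingLoop(List<String> nodeIds, long intervalMs) {
    this.nodeIds = nodeIds;
    this.intervalMs = intervalMs;
  }

  @Override
  public void run() {
    while (running) {
      for (String nodeId : nodeIds) {
        attemptScheduling(nodeId);     // assign containers if the node has headroom
      }
      try {
        Thread.sleep(intervalMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        running = false;
      }
    }
  }

  public void stop() {
    running = false;
  }

  // Stand-in for the per-node scheduling attempt.
  protected abstract void attemptScheduling(String nodeId);
}
{code}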
[jira] [Commented] (YARN-1215) Yarn URL should include userinfo
[ https://issues.apache.org/jira/browse/YARN-1215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13783815#comment-13783815 ] Hudson commented on YARN-1215: -- FAILURE: Integrated in Hadoop-Yarn-trunk #350 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/350/]) YARN-1215. Correct CHANGES.txt. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1528239) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt YARN-1215. Yarn URL should include userinfo. Contributed by Chuan Liu. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1528233) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/URL.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/URLPBImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ConverterUtils.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestConverterUtils.java Yarn URL should include userinfo Key: YARN-1215 URL: https://issues.apache.org/jira/browse/YARN-1215 Project: Hadoop YARN Issue Type: Bug Components: api Affects Versions: 3.0.0 Reporter: Chuan Liu Assignee: Chuan Liu Fix For: 3.0.0, 2.1.2-beta Attachments: YARN-1215-trunk.2.patch, YARN-1215-trunk.patch In the {{org.apache.hadoop.yarn.api.records.URL}} class, we don't have an userinfo as part of the URL. When converting a {{java.net.URI}} object into the YARN URL object in {{ConverterUtils.getYarnUrlFromURI()}} method, we will set uri host as the url host. If the uri has a userinfo part, the userinfo is discarded. This will lead to information loss if the original uri has the userinfo, e.g. foo://username:passw...@example.com will be converted to foo://example.com and username/password information is lost during the conversion. -- This message was sent by Atlassian JIRA (v6.1#6144)
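The userinfo component being dropped is the standard java.net.URI field shown below (a minimal sketch with placeholder credentials, unrelated to the patch):
{code}
import java.net.URI;

public class UserInfoDemo {
  public static void main(String[] args) {
    // Placeholder credentials for illustration only.
    URI uri = URI.create("foo://username:password@example.com/path");
    // Copying only the host into the YARN URL record loses the userinfo part.
    System.out.println("host:     " + uri.getHost());      // example.com
    System.out.println("userinfo: " + uri.getUserInfo());  // username:password
  }
}
{code}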
[jira] [Commented] (YARN-1228) Clean up Fair Scheduler configuration loading
[ https://issues.apache.org/jira/browse/YARN-1228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13783816#comment-13783816 ] Hudson commented on YARN-1228: -- FAILURE: Integrated in Hadoop-Yarn-trunk #350 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/350/]) YARN-1228. Clean up Fair Scheduler configuration loading. (Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1528201) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerConfiguration.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairSchedulerConfiguration.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/test-fair-scheduler.xml * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/FairScheduler.apt.vm Clean up Fair Scheduler configuration loading - Key: YARN-1228 URL: https://issues.apache.org/jira/browse/YARN-1228 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.1.1-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 2.1.2-beta Attachments: YARN-1228-1.patch, YARN-1228-2.patch, YARN-1228.patch Currently the Fair Scheduler is configured in two ways * An allocations file that has a different format than the standard Hadoop configuration file, which makes it easier to specify hierarchical objects like queues and their properties. * With properties like yarn.scheduler.fair.max.assign that are specified in the standard Hadoop configuration format. The standard and default way of configuring it is to use fair-scheduler.xml as the allocations file and to put the yarn.scheduler properties in yarn-site.xml. It is also possible to specify a different file as the allocations file, and to place the yarn.scheduler properties in fair-scheduler.xml, which will be interpreted as in the standard Hadoop configuration format. This flexibility is both confusing and unnecessary. Additionally, the allocation file is loaded as fair-scheduler.xml from the classpath if it is not specified, but is loaded as a File if it is. This causes two problems 1. We see different behavior when not setting the yarn.scheduler.fair.allocation.file, and setting it to fair-scheduler.xml, which is its default. 2. Classloaders may choose to cache resources, which can break the reload logic when yarn.scheduler.fair.allocation.file is not specified. We should never allow the yarn.scheduler properties to go into fair-scheduler.xml. And we should always load the allocations file as a file, not as a resource on the classpath. To preserve existing behavior and allow loading files from the classpath, we can look for files on the classpath, but strip of their scheme and interpret them as Files. -- This message was sent by Atlassian JIRA (v6.1#6144)
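The "look for files on the classpath, but strip off their scheme and interpret them as Files" idea could look roughly like this sketch (the helper name and fallback behavior are assumptions, not the committed code):
{code}
import java.io.File;
import java.net.URL;

public class AllocationFileResolver {
  // Sketch: absolute paths are used directly; otherwise the name is looked up
  // on the classpath, its URL scheme is stripped, and the result is treated as
  // a plain File so modification-based reloading keeps working.
  public static File resolveAllocationFile(String configuredValue) {
    File file = new File(configuredValue);
    if (file.isAbsolute()) {
      return file;
    }
    URL url = Thread.currentThread().getContextClassLoader()
        .getResource(configuredValue);
    if (url == null || !"file".equals(url.getProtocol())) {
      return null; // not found, or not a plain file:// resource
    }
    return new File(url.getPath());
  }

  public static void main(String[] args) {
    System.out.println(resolveAllocationFile("fair-scheduler.xml"));
  }
}
{code}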
[jira] [Commented] (YARN-1262) TestApplicationCleanup relies on all containers assigned in a single heartbeat
[ https://issues.apache.org/jira/browse/YARN-1262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13783828#comment-13783828 ] Hudson commented on YARN-1262: -- FAILURE: Integrated in Hadoop-Yarn-trunk #350 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/350/]) YARN-1262. TestApplicationCleanup relies on all containers assigned in a single heartbeat (Karthik Kambatla via Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1528243) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestApplicationCleanup.java TestApplicationCleanup relies on all containers assigned in a single heartbeat -- Key: YARN-1262 URL: https://issues.apache.org/jira/browse/YARN-1262 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.1-beta Reporter: Sandy Ryza Assignee: Karthik Kambatla Fix For: 2.1.2-beta Attachments: yarn-1262-1.patch TestApplicationCleanup submits container requests and waits for allocations to come in. It only sends a single node heartbeat to the node, expecting multiple containers to be assigned on this heartbeat, which not all schedulers do by default. This is causing the test to fail when run with the Fair Scheduler. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1260) RM_HOME link breaks when webapp.https.address related properties are not specified
[ https://issues.apache.org/jira/browse/YARN-1260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13783830#comment-13783830 ] Hudson commented on YARN-1260: -- FAILURE: Integrated in Hadoop-Yarn-trunk #350 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/350/]) YARN-1260. Added webapp.http.address to yarn-default.xml so that default install with https enabled doesn't have broken link on NM UI. Contributed by Omkar Vinit Joshi. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1528312) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml RM_HOME link breaks when webapp.https.address related properties are not specified -- Key: YARN-1260 URL: https://issues.apache.org/jira/browse/YARN-1260 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.1-beta, 2.1.2-beta Reporter: Yesha Vora Assignee: Omkar Vinit Joshi Fix For: 2.1.2-beta Attachments: YARN-1260.20131030.1.patch This issue happens in multiple node cluster where resource manager and node manager are running on different machines. Steps to reproduce: 1) set yarn.resourcemanager.hostname = resourcemanager host in yarn-site.xml 2) set hadoop.ssl.enabled = true in core-site.xml 3) Do not specify below property in yarn-site.xml yarn.nodemanager.webapp.https.address and yarn.resourcemanager.webapp.https.address Here, the default value of above two property will be considered. 4) Go to nodemanager web UI https://nodemanager host:8044/node 5) Click on RM_HOME link This link redirects to https://nodemanager host:8090/cluster instead https://resourcemanager host:8090/cluster -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1197) Support changing resources of an allocated container
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13784012#comment-13784012 ] Wangda Tan commented on YARN-1197: -- To [~tucu00], I think the heap size change (Xmx, etc.) of a running JVM-based container is not directly related to this topic. If a user wants to change a JVM-based container's size, he/she may use a watcher process to launch the worker process in a container, and relaunch the worker process with different JVM parameters if needed. In short, if we cannot solve this on the language side, we can solve it on the application side. Support changing resources of an allocated container Key: YARN-1197 URL: https://issues.apache.org/jira/browse/YARN-1197 Project: Hadoop YARN Issue Type: Task Components: api, nodemanager, resourcemanager Affects Versions: 2.1.0-beta Reporter: Wangda Tan Attachments: yarn-1197.pdf Currently, YARN cannot support merge several containers in one node to a big container, which can make us incrementally ask resources, merge them to a bigger one, and launch our processes. The user scenario is described in the comments. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1197) Support changing resources of an allocated container
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13784061#comment-13784061 ] Wangda Tan commented on YARN-1197: -- To [~bikassaha], Thanks for your comments; see my thoughts below. {quote} Still thinking through the RM-NM interactions. The request for change should probably be a new object that is basically a map of (containerId, Resource) where Resource is the new value for the existing containerId. Not quite sure how we would use the new container token for a running container since it's only used in start container. {quote} Agreed, we need to update the interface of YarnScheduler.allocate to accept this as a parameter if we make the change request independent. And as you mentioned below, we can use the new token to update the NM's resource monitoring limits for containers. {quote} If we wait for RM to sync with NM about the increased resources then it might be too slow since this happens on a heartbeat and the heartbeat interval can be in the order of seconds. An alternative would be a new NM API to allow AMs to increase resources and this would be signed with the new container token. But this would burden the AMs by requiring them to make that additional call. {quote} Agreed, this is much more timely than going through RM-NM communication. Yes, changing container size has a cost for both the AM and the NM, but the AM should be disciplined enough not to do this too frequently. {quote} There could be a race between a new container token coming in with increased resources for an acquired container and the old container token being used by the NMClient to launch the container (in case the AM decides to launch the smaller container while it was waiting for an increase). {quote} Hmmm... thanks for the reminder, this is really a problem. Another issue I found is that the AM may lie to the RM/NM about resource usage: the AM can 1) allocate a big container and launch it, 2) ask to decrease the container, so the RM releases the resource in the corresponding node/application, 3) but not tell the NM about the decrease, so the container can still use the resource that was just released. I don't have a good idea for solving this problem yet. Hope to get more ideas from you about this; I will think it through as well. Support changing resources of an allocated container Key: YARN-1197 URL: https://issues.apache.org/jira/browse/YARN-1197 Project: Hadoop YARN Issue Type: Task Components: api, nodemanager, resourcemanager Affects Versions: 2.1.0-beta Reporter: Wangda Tan Attachments: yarn-1197.pdf Currently, YARN cannot support merge several containers in one node to a big container, which can make us incrementally ask resources, merge them to a bigger one, and launch our processes. The user scenario is described in the comments. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-677) Increase coverage to FairScheduler
[ https://issues.apache.org/jira/browse/YARN-677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles updated YARN-677: - Summary: Increase coverage to FairScheduler (was: Add test methods in org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler) Increase coverage to FairScheduler -- Key: YARN-677 URL: https://issues.apache.org/jira/browse/YARN-677 Project: Hadoop YARN Issue Type: Test Affects Versions: 3.0.0, 2.0.3-alpha, 0.23.6 Reporter: Vadim Bondarev Attachments: HADOOP-4536-branch-2-a.patch, HADOOP-4536-branch-2c.patch, HADOOP-4536-trunk-a.patch, HADOOP-4536-trunk-c.patch, HDFS-4536-branch-2--N7.patch, HDFS-4536-branch-2--N8.patch, HDFS-4536-branch-2-N9.patch, HDFS-4536-trunk--N6.patch, HDFS-4536-trunk--N7.patch, HDFS-4536-trunk--N8.patch, HDFS-4536-trunk-N9.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-677) Increase coverage to FairScheduler
[ https://issues.apache.org/jira/browse/YARN-677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784086#comment-13784086 ] Jonathan Eagles commented on YARN-677: -- +1. Thanks for the coverage addition for this component. Increase coverage to FairScheduler -- Key: YARN-677 URL: https://issues.apache.org/jira/browse/YARN-677 Project: Hadoop YARN Issue Type: Test Affects Versions: 3.0.0, 2.0.3-alpha, 0.23.6 Reporter: Vadim Bondarev Attachments: HADOOP-4536-branch-2-a.patch, HADOOP-4536-branch-2c.patch, HADOOP-4536-trunk-a.patch, HADOOP-4536-trunk-c.patch, HDFS-4536-branch-2--N7.patch, HDFS-4536-branch-2--N8.patch, HDFS-4536-branch-2-N9.patch, HDFS-4536-trunk--N6.patch, HDFS-4536-trunk--N7.patch, HDFS-4536-trunk--N8.patch, HDFS-4536-trunk-N9.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-677) Increase coverage to FairScheduler
[ https://issues.apache.org/jira/browse/YARN-677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784104#comment-13784104 ] Hudson commented on YARN-677: - SUCCESS: Integrated in Hadoop-trunk-Commit #4516 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4516/]) YARN-677. Increase coverage to FairScheduler (Vadim Bondarev and Dennis Y via jeagles) (jeagles: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1528524) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java Increase coverage to FairScheduler -- Key: YARN-677 URL: https://issues.apache.org/jira/browse/YARN-677 Project: Hadoop YARN Issue Type: Test Affects Versions: 3.0.0, 2.0.3-alpha, 0.23.6 Reporter: Vadim Bondarev Fix For: 3.0.0, 2.3.0 Attachments: HADOOP-4536-branch-2-a.patch, HADOOP-4536-branch-2c.patch, HADOOP-4536-trunk-a.patch, HADOOP-4536-trunk-c.patch, HDFS-4536-branch-2--N7.patch, HDFS-4536-branch-2--N8.patch, HDFS-4536-branch-2-N9.patch, HDFS-4536-trunk--N6.patch, HDFS-4536-trunk--N7.patch, HDFS-4536-trunk--N8.patch, HDFS-4536-trunk-N9.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1232) Configuration to support multiple RMs
[ https://issues.apache.org/jira/browse/YARN-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784134#comment-13784134 ] Bikas Saha commented on YARN-1232: -- Will there be different rm id's for ha and non-ha cases? Configuration to support multiple RMs - Key: YARN-1232 URL: https://issues.apache.org/jira/browse/YARN-1232 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Karthik Kambatla Assignee: Karthik Kambatla Labels: ha Attachments: yarn-1232-1.patch, yarn-1232-2.patch, yarn-1232-3.patch, yarn-1232-4.patch, yarn-1232-5.patch, yarn-1232-6.patch We should augment the configuration to allow users specify two RMs and the individual RPC addresses for them. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-465) fix coverage org.apache.hadoop.yarn.server.webproxy
[ https://issues.apache.org/jira/browse/YARN-465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13784149#comment-13784149 ] Jonathan Eagles commented on YARN-465: -- I haven't looked too closely at this, but I see a setAccessible call. This is the same technique that powermock uses to access fields, which has been a disallowed testing technique in the Hadoop stack. The reason is that it usually points to an improvement that should be made to the class under test. fix coverage org.apache.hadoop.yarn.server.webproxy Key: YARN-465 URL: https://issues.apache.org/jira/browse/YARN-465 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 3.0.0, 0.23.7, 2.0.4-alpha Reporter: Aleksey Gorshkov Assignee: Aleksey Gorshkov Attachments: YARN-465-branch-0.23-a.patch, YARN-465-branch-0.23.patch, YARN-465-branch-2-a.patch, YARN-465-branch-2.patch, YARN-465-trunk-a.patch, YARN-465-trunk.patch fix coverage org.apache.hadoop.yarn.server.webproxy patch YARN-465-trunk.patch for trunk patch YARN-465-branch-2.patch for branch-2 patch YARN-465-branch-0.23.patch for branch-0.23 There is an issue in branch-0.23. The patch does not create the .keep file. To fix it, run the commands: mkdir yhadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/proxy touch yhadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/proxy/.keep -- This message was sent by Atlassian JIRA (v6.1#6144)
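For readers unfamiliar with the pattern being flagged, a minimal example of reflective field access via setAccessible (a generic demo class, not code from the attached patches):
{code}
import java.lang.reflect.Field;

public class SetAccessibleExample {
  private String internalState = "original";

  public static void main(String[] args) throws Exception {
    SetAccessibleExample target = new SetAccessibleExample();
    // The discouraged pattern: a test reaching into a private field instead of
    // the class under test exposing a proper seam (getter, package-private hook).
    Field field = SetAccessibleExample.class.getDeclaredField("internalState");
    field.setAccessible(true);
    field.set(target, "overridden");
    System.out.println(field.get(target)); // prints "overridden"
  }
}
{code}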
[jira] [Updated] (YARN-1167) Submitted distributed shell application shows appMasterHost = empty
[ https://issues.apache.org/jira/browse/YARN-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1167: Attachment: YARN-1167.2.patch Added a test case Submitted distributed shell application shows appMasterHost = empty --- Key: YARN-1167 URL: https://issues.apache.org/jira/browse/YARN-1167 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Tassapol Athiapinya Assignee: Xuan Gong Fix For: 2.1.2-beta Attachments: YARN-1167.1.patch, YARN-1167.2.patch Submit distributed shell application. Once the application turns to be RUNNING state, app master host should not be empty. In reality, it is empty. ==console logs== distributedshell.Client: Got application report from ASM for, appId=12, clientToAMToken=null, appDiagnostics=, appMasterHost=, appQueue=default, appMasterRpcPort=0, appStartTime=1378505161360, yarnAppState=RUNNING, distributedFinalState=UNDEFINED, -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-658) Command to kill a YARN application does not work with newer Ubuntu versions
[ https://issues.apache.org/jira/browse/YARN-658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784186#comment-13784186 ] David Yan commented on YARN-658: Vinod, Thanks again for looking into this. My custom AM is not trapping any system signal, and I don't have yarn.nodemanager.sleep-delay-before-sigkill.ms set. The thing is, the exact same code works with Ubuntu 12.04, but not 12.10 or 13.04. Command to kill a YARN application does not work with newer Ubuntu versions --- Key: YARN-658 URL: https://issues.apache.org/jira/browse/YARN-658 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.0.3-alpha, 2.0.4-alpha Reporter: David Yan Attachments: AppMaster.stderr, yarn-david-nodemanager-david-ubuntu.out, yarn-david-resourcemanager-david-ubuntu.out After issuing a KillApplicationRequest, the application keeps running on the system even though the state is changed to KILLED. It happens on both Ubuntu 12.10 and 13.04, but works fine on Ubuntu 12.04. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-445) Ability to signal containers
[ https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784199#comment-13784199 ] Andrey Klochkov commented on YARN-445: -- Bikas, on Windows JVM prints full thread dump on ctrl+break. I think ctrl+c may be emulated in the same way and used in place of TERM on Windows, via the same signalContainers API. Ability to signal containers Key: YARN-445 URL: https://issues.apache.org/jira/browse/YARN-445 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Jason Lowe Attachments: YARN-445.patch It would be nice if an ApplicationMaster could send signals to contaniers such as SIGQUIT, SIGUSR1, etc. For example, in order to replicate the jstack-on-task-timeout feature implemented by MAPREDUCE-1119 in Hadoop 0.21 the NodeManager needs an interface for sending SIGQUIT to a container. For that specific feature we could implement it as an additional field in the StopContainerRequest. However that would not address other potential features like the ability for an AM to trigger jstacks on arbitrary tasks *without* killing them. The latter feature would be a very useful debugging tool for users who do not have shell access to the nodes. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1167) Submitted distributed shell application shows appMasterHost = empty
[ https://issues.apache.org/jira/browse/YARN-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784194#comment-13784194 ] Hadoop QA commented on YARN-1167: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12606390/YARN-1167.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client: org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2061//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2061//console This message is automatically generated. Submitted distributed shell application shows appMasterHost = empty --- Key: YARN-1167 URL: https://issues.apache.org/jira/browse/YARN-1167 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Tassapol Athiapinya Assignee: Xuan Gong Fix For: 2.1.2-beta Attachments: YARN-1167.1.patch, YARN-1167.2.patch Submit distributed shell application. Once the application turns to be RUNNING state, app master host should not be empty. In reality, it is empty. ==console logs== distributedshell.Client: Got application report from ASM for, appId=12, clientToAMToken=null, appDiagnostics=, appMasterHost=, appQueue=default, appMasterRpcPort=0, appStartTime=1378505161360, yarnAppState=RUNNING, distributedFinalState=UNDEFINED, -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-445) Ability to signal containers
[ https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrey Klochkov updated YARN-445: - Attachment: YARN-445--n2.patch fixing javadoc warnings and the failed test Ability to signal containers Key: YARN-445 URL: https://issues.apache.org/jira/browse/YARN-445 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Jason Lowe Attachments: YARN-445--n2.patch, YARN-445.patch It would be nice if an ApplicationMaster could send signals to contaniers such as SIGQUIT, SIGUSR1, etc. For example, in order to replicate the jstack-on-task-timeout feature implemented by MAPREDUCE-1119 in Hadoop 0.21 the NodeManager needs an interface for sending SIGQUIT to a container. For that specific feature we could implement it as an additional field in the StopContainerRequest. However that would not address other potential features like the ability for an AM to trigger jstacks on arbitrary tasks *without* killing them. The latter feature would be a very useful debugging tool for users who do not have shell access to the nodes. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1197) Support changing resources of an allocated container
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784222#comment-13784222 ] Sandy Ryza commented on YARN-1197: -- It seems to me that for the reasons Bikas mentioned and for consistency with the way container launch is done, the AM should be the one who tells the NM to do the resize. If resources are released, then the NM would tell the RM about the newly free space on its next heartbeat after the resize has completed. Only then would the scheduler consider those resources available. Support changing resources of an allocated container Key: YARN-1197 URL: https://issues.apache.org/jira/browse/YARN-1197 Project: Hadoop YARN Issue Type: Task Components: api, nodemanager, resourcemanager Affects Versions: 2.1.0-beta Reporter: Wangda Tan Attachments: yarn-1197.pdf Currently, YARN cannot support merge several containers in one node to a big container, which can make us incrementally ask resources, merge them to a bigger one, and launch our processes. The user scenario is described in the comments. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-658) Command to kill a YARN application does not work with newer Ubuntu versions
[ https://issues.apache.org/jira/browse/YARN-658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784282#comment-13784282 ] Robert Parker commented on YARN-658: looking at https://launchpad.net/ubuntu/+source/procps it appears that 13.10 will still deploy procps-3.3.3 and the /bin/kill parameter parsing bug is not fixed until 3.3.4. Command to kill a YARN application does not work with newer Ubuntu versions --- Key: YARN-658 URL: https://issues.apache.org/jira/browse/YARN-658 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.0.3-alpha, 2.0.4-alpha Reporter: David Yan Attachments: AppMaster.stderr, yarn-david-nodemanager-david-ubuntu.out, yarn-david-resourcemanager-david-ubuntu.out After issuing a KillApplicationRequest, the application keeps running on the system even though the state is changed to KILLED. It happens on both Ubuntu 12.10 and 13.04, but works fine on Ubuntu 12.04. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1232) Configuration to support multiple RMs
[ https://issues.apache.org/jira/browse/YARN-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13784285#comment-13784285 ] Bikas Saha commented on YARN-1232: -- Correct. My question is whether there will be 2 different config items to specify the logical name for the RM: in the HA case it's ha.id and in the non-HA case it's rm.id? Or should this jira just use rm.id and not ha.id? Configuration to support multiple RMs - Key: YARN-1232 URL: https://issues.apache.org/jira/browse/YARN-1232 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Karthik Kambatla Assignee: Karthik Kambatla Labels: ha Attachments: yarn-1232-1.patch, yarn-1232-2.patch, yarn-1232-3.patch, yarn-1232-4.patch, yarn-1232-5.patch, yarn-1232-6.patch We should augment the configuration to allow users specify two RMs and the individual RPC addresses for them. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-445) Ability to signal containers
[ https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784284#comment-13784284 ] Hadoop QA commented on YARN-445: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12606399/YARN-445--n2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 11 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2062//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/2062//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2062//console This message is automatically generated. Ability to signal containers Key: YARN-445 URL: https://issues.apache.org/jira/browse/YARN-445 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Jason Lowe Attachments: YARN-445--n2.patch, YARN-445.patch It would be nice if an ApplicationMaster could send signals to contaniers such as SIGQUIT, SIGUSR1, etc. For example, in order to replicate the jstack-on-task-timeout feature implemented by MAPREDUCE-1119 in Hadoop 0.21 the NodeManager needs an interface for sending SIGQUIT to a container. For that specific feature we could implement it as an additional field in the StopContainerRequest. However that would not address other potential features like the ability for an AM to trigger jstacks on arbitrary tasks *without* killing them. The latter feature would be a very useful debugging tool for users who do not have shell access to the nodes. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-445) Ability to signal containers
[ https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784292#comment-13784292 ] Andrey Klochkov commented on YARN-445: -- As I understand this Findbugs warning should be ignored as it's complaining about a valid type cast. Ability to signal containers Key: YARN-445 URL: https://issues.apache.org/jira/browse/YARN-445 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Jason Lowe Attachments: YARN-445--n2.patch, YARN-445.patch It would be nice if an ApplicationMaster could send signals to contaniers such as SIGQUIT, SIGUSR1, etc. For example, in order to replicate the jstack-on-task-timeout feature implemented by MAPREDUCE-1119 in Hadoop 0.21 the NodeManager needs an interface for sending SIGQUIT to a container. For that specific feature we could implement it as an additional field in the StopContainerRequest. However that would not address other potential features like the ability for an AM to trigger jstacks on arbitrary tasks *without* killing them. The latter feature would be a very useful debugging tool for users who do not have shell access to the nodes. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-621) RM triggers web auth failure before first job
[ https://issues.apache.org/jira/browse/YARN-621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-621: --- Attachment: YARN-621.20131001.1.patch RM triggers web auth failure before first job - Key: YARN-621 URL: https://issues.apache.org/jira/browse/YARN-621 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.0.4-alpha Reporter: Allen Wittenauer Assignee: Omkar Vinit Joshi Priority: Critical Attachments: YARN-621.20131001.1.patch On a secure YARN setup, before the first job is executed, going to the web interface of the resource manager triggers authentication errors. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-621) RM triggers web auth failure before first job
[ https://issues.apache.org/jira/browse/YARN-621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784307#comment-13784307 ] Omkar Vinit Joshi commented on YARN-621: attaching patch...fixed path specs.. RM triggers web auth failure before first job - Key: YARN-621 URL: https://issues.apache.org/jira/browse/YARN-621 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.0.4-alpha Reporter: Allen Wittenauer Assignee: Omkar Vinit Joshi Priority: Critical Attachments: YARN-621.20131001.1.patch On a secure YARN setup, before the first job is executed, going to the web interface of the resource manager triggers authentication errors. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-621) RM triggers web auth failure before first job
[ https://issues.apache.org/jira/browse/YARN-621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784337#comment-13784337 ] Hadoop QA commented on YARN-621: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12606419/YARN-621.20131001.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2063//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2063//console This message is automatically generated. RM triggers web auth failure before first job - Key: YARN-621 URL: https://issues.apache.org/jira/browse/YARN-621 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.0.4-alpha Reporter: Allen Wittenauer Assignee: Omkar Vinit Joshi Priority: Critical Attachments: YARN-621.20131001.1.patch On a secure YARN setup, before the first job is executed, going to the web interface of the resource manager triggers authentication errors. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1232) Configuration to support multiple RMs
[ https://issues.apache.org/jira/browse/YARN-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13784366#comment-13784366 ] Karthik Kambatla commented on YARN-1232: I am confused. For Case 1, we need the RMs to have separate ids. For Case 2, it looks like we need the RMs to have the *same* logical name. If this is correct, we need two configs, no? e.g. RM1 and RM2 have ha.id set to rm1, rm2 respectively, and logical name yarn.resourcemanager.clusterid set to yarn-cluster-foo-bar for both RMs. Configuration to support multiple RMs - Key: YARN-1232 URL: https://issues.apache.org/jira/browse/YARN-1232 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Karthik Kambatla Assignee: Karthik Kambatla Labels: ha Attachments: yarn-1232-1.patch, yarn-1232-2.patch, yarn-1232-3.patch, yarn-1232-4.patch, yarn-1232-5.patch, yarn-1232-6.patch We should augment the configuration to allow users specify two RMs and the individual RPC addresses for them. -- This message was sent by Atlassian JIRA (v6.1#6144)
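A small sketch of how the two kinds of keys discussed here would be read (the key names are the ones proposed in this thread and the values are placeholders; nothing here is a finalized API):
{code}
import org.apache.hadoop.conf.Configuration;

public class RmHaConfigSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration(false);
    // Proposed keys from this discussion, with placeholder values.
    conf.set("yarn.resourcemanager.ha.rm-ids", "rm1,rm2");  // all RMs in the cluster
    conf.set("yarn.resourcemanager.ha.id", "rm1");          // which id "this" RM is
    conf.set("yarn.resourcemanager.clusterid", "yarn-cluster-foo-bar");

    for (String rmId : conf.getStrings("yarn.resourcemanager.ha.rm-ids")) {
      // Per-RM RPC addresses would be looked up by suffixing the id
      // (illustrative pattern only, e.g. yarn.resourcemanager.address.<rm-id>).
      System.out.println("configured RM id: " + rmId);
    }
    System.out.println("cluster id: " + conf.get("yarn.resourcemanager.clusterid"));
  }
}
{code}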
[jira] [Commented] (YARN-1141) Updating resource requests should be decoupled with updating blacklist
[ https://issues.apache.org/jira/browse/YARN-1141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13784404#comment-13784404 ] Bikas Saha commented on YARN-1141: -- +1 Looks good except for the following. This test probably needs to be reversed right?
{code}
+// Verify the blacklist can be updated independent of requesting containers
+cs.allocate(appAttemptId, Collections.<ResourceRequest>emptyList(),
+Collections.<ContainerId>emptyList(), null,
+Collections.singletonList(host));
+Assert.assertFalse(cs.getApplication(appAttemptId).isBlacklisted(host));
+cs.allocate(appAttemptId, Collections.<ResourceRequest>emptyList(),
+Collections.<ContainerId>emptyList(),
+Collections.singletonList(host), null);
+Assert.assertTrue(cs.getApplication(appAttemptId).isBlacklisted(host));
{code}
Updating resource requests should be decoupled with updating blacklist -- Key: YARN-1141 URL: https://issues.apache.org/jira/browse/YARN-1141 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-1141.1.patch, YARN-1141.2.patch Currently, in CapacityScheduler and FifoScheduler, blacklist is updated together with resource requests, only when the incoming resource requests are not empty. Therefore, when the incoming resource requests are empty, the blacklist will not be updated even when blacklist additions and removals are not empty. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1141) Updating resource requests should be decoupled with updating blacklist
[ https://issues.apache.org/jira/browse/YARN-1141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-1141: -- Attachment: YARN-1141.3.patch Updated the test according to Bikas' comment Updating resource requests should be decoupled with updating blacklist -- Key: YARN-1141 URL: https://issues.apache.org/jira/browse/YARN-1141 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: YARN-1141.1.patch, YARN-1141.2.patch, YARN-1141.3.patch Currently, in CapacityScheduler and FifoScheduler, blacklist is updated together with resource requests, only when the incoming resource requests are not empty. Therefore, when the incoming resource requests are empty, the blacklist will not be updated even when blacklist additions and removals are not empty. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1232) Configuration to support multiple RMs
[ https://issues.apache.org/jira/browse/YARN-1232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13784465#comment-13784465 ] Alejandro Abdelnur commented on YARN-1232: -- IMO they are different: * the HA ids are to differentiate the RM instances within the same cluster. This is used by the failover logic only; a user does not need to be aware of them when creating applications. * the clusterID is to differentiate different clusters. This is used by the user to indicate against which cluster they want to work. Because of all the configurations a client needs to be aware of (as Bikas, Karthik and I discussed during the YARN meetup last Friday), an easy way to handle this would be for a client to specify the yarn-site.xml of the cluster he wants to connect to, or to have nested configurations in a single yarn-site.xml, one per cluster. If we do something like that, then the ids being introduced here will only be used for HA. Configuration to support multiple RMs - Key: YARN-1232 URL: https://issues.apache.org/jira/browse/YARN-1232 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Karthik Kambatla Assignee: Karthik Kambatla Labels: ha Attachments: yarn-1232-1.patch, yarn-1232-2.patch, yarn-1232-3.patch, yarn-1232-4.patch, yarn-1232-5.patch, yarn-1232-6.patch We should augment the configuration to allow users specify two RMs and the individual RPC addresses for them. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1141) Updating resource requests should be decoupled with updating blacklist
[ https://issues.apache.org/jira/browse/YARN-1141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784466#comment-13784466 ] Hadoop QA commented on YARN-1141: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12606443/YARN-1141.3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2064//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2064//console This message is automatically generated. Updating resource requests should be decoupled with updating blacklist -- Key: YARN-1141 URL: https://issues.apache.org/jira/browse/YARN-1141 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen Fix For: 2.1.2-beta Attachments: YARN-1141.1.patch, YARN-1141.2.patch, YARN-1141.3.patch Currently, in CapacityScheduler and FifoScheduler, blacklist is updated together with resource requests, only when the incoming resource requests are not empty. Therefore, when the incoming resource requests are empty, the blacklist will not be updated even when blacklist additions and removals are not empty. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1141) Updating resource requests should be decoupled with updating blacklist
[ https://issues.apache.org/jira/browse/YARN-1141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784480#comment-13784480 ] Hudson commented on YARN-1141: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4520 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4520/]) YARN-1141. Updating resource requests should be decoupled with updating blacklist (Zhijie Shen via bikas) (bikas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1528632) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSSchedulerApp.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java Updating resource requests should be decoupled with updating blacklist -- Key: YARN-1141 URL: https://issues.apache.org/jira/browse/YARN-1141 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen Fix For: 2.1.2-beta Attachments: YARN-1141.1.patch, YARN-1141.2.patch, YARN-1141.3.patch Currently, in CapacityScheduler and FifoScheduler, blacklist is updated together with resource requests, only when the incoming resource requests are not empty. Therefore, when the incoming resource requests are empty, the blacklist will not be updated even when blacklist additions and removals are not empty. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (YARN-1265) Fair Scheduler chokes on unhealthy node reconnect
Sandy Ryza created YARN-1265: Summary: Fair Scheduler chokes on unhealthy node reconnect Key: YARN-1265 URL: https://issues.apache.org/jira/browse/YARN-1265 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Affects Versions: 2.1.1-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Only nodes in the RUNNING state are tracked by schedulers. When a node reconnects, RMNodeImpl.ReconnectNodeTransition tries to remove it, even if it's in the RUNNING state. The FairScheduler doesn't guard against this. I think the best way to fix this is to check to see whether a node is RUNNING before telling the scheduler to remove it. -- This message was sent by Atlassian JIRA (v6.1#6144)
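A minimal sketch of the guard described above (an assumed shape, not the patch that gets attached later): in RMNodeImpl.ReconnectNodeTransition, only tell the scheduler to remove the node if the node was actually RUNNING, since schedulers only track RUNNING nodes.
{code}
// Hedged sketch only: skip the scheduler-side removal for nodes that were not
// RUNNING when they reconnected.
if (rmNode.getState() == NodeState.RUNNING) {
  rmNode.context.getDispatcher().getEventHandler().handle(
      new NodeRemovedSchedulerEvent(rmNode));
}
{code}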
[jira] [Commented] (YARN-465) fix coverage org.apache.hadoop.yarn.server.webproxy
[ https://issues.apache.org/jira/browse/YARN-465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784489#comment-13784489 ] Ravi Prakash commented on YARN-465: --- Thanks Aleksey for your contribution. Could you please also update the patch? TestWebAppProxyServlet.java could not be compiled with this patch. fix coverage org.apache.hadoop.yarn.server.webproxy Key: YARN-465 URL: https://issues.apache.org/jira/browse/YARN-465 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 3.0.0, 0.23.7, 2.0.4-alpha Reporter: Aleksey Gorshkov Assignee: Aleksey Gorshkov Attachments: YARN-465-branch-0.23-a.patch, YARN-465-branch-0.23.patch, YARN-465-branch-2-a.patch, YARN-465-branch-2.patch, YARN-465-trunk-a.patch, YARN-465-trunk.patch fix coverage org.apache.hadoop.yarn.server.webproxy patch YARN-465-trunk.patch for trunk patch YARN-465-branch-2.patch for branch-2 patch YARN-465-branch-0.23.patch for branch-0.23 There is an issue in branch-0.23: the patch does not create the .keep file. To fix it, run these commands: mkdir yhadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/proxy touch yhadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/proxy/.keep -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-658) Command to kill a YARN application does not work with newer Ubuntu versions
[ https://issues.apache.org/jira/browse/YARN-658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784485#comment-13784485 ] David Yan commented on YARN-658: I see. Will there be a workaround fix in YARN to get around this problem? I would imagine more and more users will try to run YARN on Ubuntu 12.10, 13.04 and 13.10. Command to kill a YARN application does not work with newer Ubuntu versions --- Key: YARN-658 URL: https://issues.apache.org/jira/browse/YARN-658 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.0.3-alpha, 2.0.4-alpha Reporter: David Yan Attachments: AppMaster.stderr, yarn-david-nodemanager-david-ubuntu.out, yarn-david-resourcemanager-david-ubuntu.out After issuing a KillApplicationRequest, the application keeps running on the system even though the state is changed to KILLED. It happens on both Ubuntu 12.10 and 13.04, but works fine on Ubuntu 12.04. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-867) Isolation of failures in aux services
[ https://issues.apache.org/jira/browse/YARN-867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784503#comment-13784503 ] Bikas Saha commented on YARN-867: - Probably we can ignore the error here since the container has already failed.
{code}
// From LOCALIZATION_FAILED State
.addTransition(ContainerState.LOCALIZATION_FAILED,
@@ -180,6 +184,9 @@ public ContainerImpl(Configuration conf, Dispatcher dispatcher,
.addTransition(ContainerState.LOCALIZATION_FAILED,
ContainerState.LOCALIZATION_FAILED,
ContainerEventType.RESOURCE_FAILED)
+.addTransition(ContainerState.LOCALIZATION_FAILED, ContainerState.EXITED_WITH_FAILURE,
+ContainerEventType.CONTAINER_EXITED_WITH_FAILURE,
+new ExitedWithFailureTransition(false))
{code}
Probably have 1 try/catch instead of multiple. Can we rename AUXSERVICE_FAIL to AUXSERVICE_ERROR since the service probably hasn't failed? TestAuxService needs an addition for the new code. TestContainer - the new test can be made simpler by not mocking AuxServiceHandler and instead sending the failed event directly, like it's done for other tests there. In AuxService.handle(APPLICATION_INIT) and other places like that, where the service does not exist, we should fail too. Zhijie, we should err on the side of caution here and fail the container. If we see real use cases where failure can be ignored then we can make that improvement. Isolation of failures in aux services -- Key: YARN-867 URL: https://issues.apache.org/jira/browse/YARN-867 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Hitesh Shah Assignee: Xuan Gong Priority: Critical Attachments: YARN-867.1.sampleCode.patch, YARN-867.3.patch, YARN-867.4.patch, YARN-867.sampleCode.2.patch Today, a malicious application can bring down the NM by sending bad data to a service. For example, sending data to the ShuffleService such that it results any non-IOException will cause the NM's async dispatcher to exit as the service's INIT APP event is not handled properly. -- This message was sent by Atlassian JIRA (v6.1#6144)
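For context on the isolation being reviewed, a hedged sketch of the general idea (a sketch under assumptions, not the attached YARN-867 patches): contain exceptions thrown by an aux service inside a single try/catch so they never escape into the NM's async dispatcher.
{code}
// Hedged sketch: one try/catch around the per-service callback; report the
// failure for that service only instead of letting it kill the NodeManager.
try {
  service.initializeApplication(initData);   // e.g. handling APPLICATION_INIT
} catch (Throwable t) {
  LOG.error("Aux service threw while handling APPLICATION_INIT", t);
  // signal the failure back (e.g. fail the container) rather than rethrowing
}
{code}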
[jira] [Commented] (YARN-425) coverage fix for yarn api
[ https://issues.apache.org/jira/browse/YARN-425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784517#comment-13784517 ] Hudson commented on YARN-425: - SUCCESS: Integrated in Hadoop-trunk-Commit #4521 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4521/]) YARN-425. coverage fix for yarn api (Aleksey Gorshkov via jeagles) (jeagles: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1528641) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestResourceManagerAdministrationProtocolPBClientImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestYarnApiClasses.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/resources * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/resources/config-with-security.xml coverage fix for yarn api - Key: YARN-425 URL: https://issues.apache.org/jira/browse/YARN-425 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 3.0.0, 0.23.7, 2.1.0-beta Reporter: Aleksey Gorshkov Assignee: Aleksey Gorshkov Fix For: 3.0.0, 2.3.0 Attachments: YARN-425-branch-0.23-d.patch, YARN-425-branch-0.23.patch, YARN-425-branch-0.23-v1.patch, YARN-425-branch-2-b.patch, YARN-425-branch-2-c.patch, YARN-425-branch-2.patch, YARN-425-branch-2-v1.patch, YARN-425-trunk-a.patch, YARN-425-trunk-b.patch, YARN-425-trunk-c.patch, YARN-425-trunk-d.patch, YARN-425-trunk.patch, YARN-425-trunk-v1.patch, YARN-425-trunk-v2.patch coverage fix for yarn api patch YARN-425-trunk-a.patch for trunk patch YARN-425-branch-2.patch for branch-2 patch YARN-425-branch-0.23.patch for branch-0.23 -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (YARN-1266) Adding ApplicationHistoryProtocolPBService to make web apps to work
Mayank Bansal created YARN-1266: --- Summary: Adding ApplicationHistoryProtocolPBService to make web apps to work Key: YARN-1266 URL: https://issues.apache.org/jira/browse/YARN-1266 Project: Hadoop YARN Issue Type: Sub-task Reporter: Mayank Bansal Assignee: Mayank Bansal Maybe we should include AHS classes as well (for developer usage) in yarn and yarn.cmd -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1266) Adding ApplicationHistoryProtocolPBService to make web apps to work
[ https://issues.apache.org/jira/browse/YARN-1266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-1266: Description: Adding ApplicationHistoryProtocolPBService to make web apps to work and changing yarn to run AHS as a separate process Adding ApplicationHistoryProtocolPBService to make web apps to work --- Key: YARN-1266 URL: https://issues.apache.org/jira/browse/YARN-1266 Project: Hadoop YARN Issue Type: Sub-task Reporter: Mayank Bansal Assignee: Mayank Bansal Adding ApplicationHistoryProtocolPBService to make web apps to work and changing yarn to run AHS as a separate process -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1266) Adding ApplicationHistoryProtocolPBService to make web apps to work
[ https://issues.apache.org/jira/browse/YARN-1266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-1266: Description: (was: Maybe we should include AHS classes as well (for developer usage) in yarn and yarn.cmd) Adding ApplicationHistoryProtocolPBService to make web apps to work --- Key: YARN-1266 URL: https://issues.apache.org/jira/browse/YARN-1266 Project: Hadoop YARN Issue Type: Sub-task Reporter: Mayank Bansal Assignee: Mayank Bansal -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1266) Adding ApplicationHistoryProtocolPBService to make web apps to work
[ https://issues.apache.org/jira/browse/YARN-1266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-1266: Attachment: YARN-1266-1.patch Attaching patch. Thanks, Mayank Adding ApplicationHistoryProtocolPBService to make web apps to work --- Key: YARN-1266 URL: https://issues.apache.org/jira/browse/YARN-1266 Project: Hadoop YARN Issue Type: Sub-task Reporter: Mayank Bansal Assignee: Mayank Bansal Attachments: YARN-1266-1.patch Adding ApplicationHistoryProtocolPBService to make web apps to work and changing yarn to run AHS as a separate process -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-658) Command to kill a YARN application does not work with newer Ubuntu versions
[ https://issues.apache.org/jira/browse/YARN-658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784550#comment-13784550 ] Robert Parker commented on YARN-658: The problem is reproducible in mvn test -Dtest=org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerShutdown I have confirmed that kill version 3.3.8 does not have this problem by installing it ahead of /bin/kill and running the above test (no code change required). I had a half-baked patch that I will dig up, clean up and post. The patch precedes the -PID with a ' -- ' . I had hoped Ubuntu would fix this sooner but that sadly does not appear to be the case. Command to kill a YARN application does not work with newer Ubuntu versions --- Key: YARN-658 URL: https://issues.apache.org/jira/browse/YARN-658 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.0.3-alpha, 2.0.4-alpha Reporter: David Yan Attachments: AppMaster.stderr, yarn-david-nodemanager-david-ubuntu.out, yarn-david-resourcemanager-david-ubuntu.out After issuing a KillApplicationRequest, the application keeps running on the system even though the state is changed to KILLED. It happens on both Ubuntu 12.10 and 13.04, but works fine on Ubuntu 12.04. -- This message was sent by Atlassian JIRA (v6.1#6144)
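A hedged illustration of the fix Robert describes (not the committed patch): newer procps builds of kill parse a leading "-PID" argument (a process group) as an option unless it is preceded by "--", so the signal command needs the separator.
{code}
// Sketch only: build the command as "kill -<signal> -- -<pid>" so the process
// group argument is not misread as an option by newer kill implementations.
String[] signalProcessGroupCmd = { "kill", "-" + signal, "--", "-" + pid };
{code}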
[jira] [Updated] (YARN-1149) NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING
[ https://issues.apache.org/jira/browse/YARN-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1149: Attachment: YARN-1149.5.patch Use the readLock and WriteLock to solve the synchronization issue. NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING - Key: YARN-1149 URL: https://issues.apache.org/jira/browse/YARN-1149 Project: Hadoop YARN Issue Type: Bug Reporter: Ramya Sunil Assignee: Xuan Gong Fix For: 2.1.2-beta Attachments: YARN-1149.1.patch, YARN-1149.2.patch, YARN-1149.3.patch, YARN-1149.4.patch, YARN-1149.5.patch When nodemanager receives a kill signal when an application has finished execution but log aggregation has not kicked in, InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING is thrown {noformat} 2013-08-25 20:45:00,875 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:finishLogAggregation(254)) - Application just finished : application_1377459190746_0118 2013-08-25 20:45:00,876 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:uploadLogsForContainer(105)) - Starting aggregate log-file for app application_1377459190746_0118 at /app-logs/foo/logs/application_1377459190746_0118/host_45454.tmp 2013-08-25 20:45:00,876 INFO logaggregation.LogAggregationService (LogAggregationService.java:stopAggregators(151)) - Waiting for aggregation to complete for application_1377459190746_0118 2013-08-25 20:45:00,891 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:uploadLogsForContainer(122)) - Uploading logs for container container_1377459190746_0118_01_04. Current good log dirs are /tmp/yarn/local 2013-08-25 20:45:00,915 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:doAppLogAggregation(182)) - Finished aggregate log-file for app application_1377459190746_0118 2013-08-25 20:45:00,925 WARN application.Application (ApplicationImpl.java:handle(427)) - Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:425) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:59) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:697) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:689) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:81) at java.lang.Thread.run(Thread.java:662) 2013-08-25 20:45:00,926 INFO application.Application (ApplicationImpl.java:handle(430)) - Application application_1377459190746_0118 transitioned from RUNNING to null 2013-08-25 20:45:00,927 WARN monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(463)) - org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl 
is interrupted. Exiting. 2013-08-25 20:45:00,938 INFO ipc.Server (Server.java:stop(2437)) - Stopping server on 8040 {noformat} -- This message was sent by Atlassian JIRA (v6.1#6144)
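For context, a minimal sketch of the read/write-lock pattern the comment above refers to (a sketch under assumptions, not the YARN-1149.5 patch itself): state reads take the read lock and state-machine transitions take the write lock, so a late APPLICATION_LOG_HANDLING_FINISHED cannot race with concurrent readers of the application state.
{code}
// Hedged sketch: guard the ApplicationImpl state machine with a
// ReentrantReadWriteLock instead of leaving reads and transitions unsynchronized.
private final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();

public ApplicationState getApplicationState() {
  rwLock.readLock().lock();
  try {
    return stateMachine.getCurrentState();
  } finally {
    rwLock.readLock().unlock();
  }
}

public void handle(ApplicationEvent event) {
  rwLock.writeLock().lock();
  try {
    stateMachine.doTransition(event.getType(), event);
  } catch (InvalidStateTransitonException e) {
    LOG.warn("Can't handle this event at current state", e);
  } finally {
    rwLock.writeLock().unlock();
  }
}
{code}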
[jira] [Commented] (YARN-1266) Adding ApplicationHistoryProtocolPBService to make web apps to work
[ https://issues.apache.org/jira/browse/YARN-1266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784559#comment-13784559 ] Hadoop QA commented on YARN-1266: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12606466/YARN-1266-1.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2067//console This message is automatically generated. Adding ApplicationHistoryProtocolPBService to make web apps to work --- Key: YARN-1266 URL: https://issues.apache.org/jira/browse/YARN-1266 Project: Hadoop YARN Issue Type: Sub-task Reporter: Mayank Bansal Assignee: Mayank Bansal Attachments: YARN-1266-1.patch Adding ApplicationHistoryProtocolPBService to make web apps to work and changing yarn to run AHS as a separate process -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-876) Node resource is added twice when node comes back from unhealthy to healthy
[ https://issues.apache.org/jira/browse/YARN-876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-876: Component/s: (was: nodemanager) resourcemanager Node resource is added twice when node comes back from unhealthy to healthy --- Key: YARN-876 URL: https://issues.apache.org/jira/browse/YARN-876 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: PengZhang Assignee: PengZhang Fix For: 2.1.2-beta Attachments: YARN-876.patch When an unhealthy node restarts, its resource may be added twice in the scheduler. The first time is at the node's reconnection, while the node's final state is still UNHEALTHY. The second time is at the node's update, when the node's state changes from UNHEALTHY to HEALTHY. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-876) Node resource is added twice when node comes back from unhealthy to healthy
[ https://issues.apache.org/jira/browse/YARN-876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784586#comment-13784586 ] Hudson commented on YARN-876: - SUCCESS: Integrated in Hadoop-trunk-Commit #4522 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4522/]) YARN-876. Node resource is added twice when node comes back from unhealthy. (Peng Zhang via Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1528660) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java Node resource is added twice when node comes back from unhealthy to healthy --- Key: YARN-876 URL: https://issues.apache.org/jira/browse/YARN-876 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: PengZhang Assignee: PengZhang Fix For: 2.1.2-beta Attachments: YARN-876.patch When an unhealthy restarts, its resource maybe added twice in scheduler. First time is at node's reconnection, while node's final state is still UNHEALTHY. And second time is at node's update, while node's state changing from UNHEALTHY to HEALTHY. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1149) NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING
[ https://issues.apache.org/jira/browse/YARN-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784589#comment-13784589 ] Hadoop QA commented on YARN-1149: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12606467/YARN-1149.5.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 9 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2066//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2066//console This message is automatically generated. NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING - Key: YARN-1149 URL: https://issues.apache.org/jira/browse/YARN-1149 Project: Hadoop YARN Issue Type: Bug Reporter: Ramya Sunil Assignee: Xuan Gong Fix For: 2.1.2-beta Attachments: YARN-1149.1.patch, YARN-1149.2.patch, YARN-1149.3.patch, YARN-1149.4.patch, YARN-1149.5.patch When nodemanager receives a kill signal when an application has finished execution but log aggregation has not kicked in, InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING is thrown {noformat} 2013-08-25 20:45:00,875 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:finishLogAggregation(254)) - Application just finished : application_1377459190746_0118 2013-08-25 20:45:00,876 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:uploadLogsForContainer(105)) - Starting aggregate log-file for app application_1377459190746_0118 at /app-logs/foo/logs/application_1377459190746_0118/host_45454.tmp 2013-08-25 20:45:00,876 INFO logaggregation.LogAggregationService (LogAggregationService.java:stopAggregators(151)) - Waiting for aggregation to complete for application_1377459190746_0118 2013-08-25 20:45:00,891 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:uploadLogsForContainer(122)) - Uploading logs for container container_1377459190746_0118_01_04. 
Current good log dirs are /tmp/yarn/local 2013-08-25 20:45:00,915 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:doAppLogAggregation(182)) - Finished aggregate log-file for app application_1377459190746_0118 2013-08-25 20:45:00,925 WARN application.Application (ApplicationImpl.java:handle(427)) - Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:425) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:59) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:697) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:689) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:81) at java.lang.Thread.run(Thread.java:662)
[jira] [Commented] (YARN-1131) $ yarn logs should return a message log aggregation is during progress if YARN application is running
[ https://issues.apache.org/jira/browse/YARN-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784613#comment-13784613 ] Siddharth Seth commented on YARN-1131: -- [~djp], if you don't mind, I'd like to take this over - would be good to get it into the next release. $ yarn logs should return a message log aggregation is during progress if YARN application is running - Key: YARN-1131 URL: https://issues.apache.org/jira/browse/YARN-1131 Project: Hadoop YARN Issue Type: Improvement Components: client Reporter: Tassapol Athiapinya Assignee: Junping Du Priority: Minor Fix For: 2.1.2-beta In the case when log aggregation is enabled, if a user submits MapReduce job and runs $ yarn logs -applicationId app ID while the YARN application is running, the command will return no message and return user back to shell. It is nice to tell the user that log aggregation is in progress. {code} -bash-4.1$ /usr/bin/yarn logs -applicationId application_1377900193583_0002 -bash-4.1$ {code} At the same time, if invalid application ID is given, YARN CLI should say that the application ID is incorrect rather than throwing NoSuchElementException. {code} $ /usr/bin/yarn logs -applicationId application_0 Exception in thread main java.util.NoSuchElementException at com.google.common.base.AbstractIterator.next(AbstractIterator.java:75) at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:124) at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:119) at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:110) at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:255) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
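On the second part of the report, a hedged sketch of the CLI-side validation being asked for (an assumption about the shape of the fix, not Siddharth's eventual patch): catch the parse failure and print a readable message instead of letting the NoSuchElementException escape from ConverterUtils.
{code}
// Sketch only: validate the application id before doing any log lookups.
ApplicationId appId;
try {
  appId = ConverterUtils.toApplicationId(appIdStr);
} catch (Exception e) {
  System.err.println("Invalid ApplicationId specified: " + appIdStr);
  return -1;
}
{code}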
[jira] [Commented] (YARN-1131) $ yarn logs should return a message log aggregation is during progress if YARN application is running
[ https://issues.apache.org/jira/browse/YARN-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784620#comment-13784620 ] Junping Du commented on YARN-1131: -- Sure. Sid, I just reassign this JIRA to you. Please feel free to start the work. Thanks! $ yarn logs should return a message log aggregation is during progress if YARN application is running - Key: YARN-1131 URL: https://issues.apache.org/jira/browse/YARN-1131 Project: Hadoop YARN Issue Type: Improvement Components: client Reporter: Tassapol Athiapinya Assignee: Siddharth Seth Priority: Minor Fix For: 2.1.2-beta In the case when log aggregation is enabled, if a user submits MapReduce job and runs $ yarn logs -applicationId app ID while the YARN application is running, the command will return no message and return user back to shell. It is nice to tell the user that log aggregation is in progress. {code} -bash-4.1$ /usr/bin/yarn logs -applicationId application_1377900193583_0002 -bash-4.1$ {code} At the same time, if invalid application ID is given, YARN CLI should say that the application ID is incorrect rather than throwing NoSuchElementException. {code} $ /usr/bin/yarn logs -applicationId application_0 Exception in thread main java.util.NoSuchElementException at com.google.common.base.AbstractIterator.next(AbstractIterator.java:75) at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:124) at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:119) at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:110) at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:255) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1131) $ yarn logs should return a message log aggregation is during progress if YARN application is running
[ https://issues.apache.org/jira/browse/YARN-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-1131: - Assignee: Siddharth Seth (was: Junping Du) $ yarn logs should return a message log aggregation is during progress if YARN application is running - Key: YARN-1131 URL: https://issues.apache.org/jira/browse/YARN-1131 Project: Hadoop YARN Issue Type: Improvement Components: client Reporter: Tassapol Athiapinya Assignee: Siddharth Seth Priority: Minor Fix For: 2.1.2-beta In the case when log aggregation is enabled, if a user submits MapReduce job and runs $ yarn logs -applicationId app ID while the YARN application is running, the command will return no message and return user back to shell. It is nice to tell the user that log aggregation is in progress. {code} -bash-4.1$ /usr/bin/yarn logs -applicationId application_1377900193583_0002 -bash-4.1$ {code} At the same time, if invalid application ID is given, YARN CLI should say that the application ID is incorrect rather than throwing NoSuchElementException. {code} $ /usr/bin/yarn logs -applicationId application_0 Exception in thread main java.util.NoSuchElementException at com.google.common.base.AbstractIterator.next(AbstractIterator.java:75) at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:124) at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:119) at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:110) at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:255) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-890) The roundup for memory values on resource manager UI is misleading
[ https://issues.apache.org/jira/browse/YARN-890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated YARN-890: - Attachment: YARN-890.2.patch [~zjshen] [~xgong] Mind taking a look? I had a different approach in mind to fixing the issue. The roundup for memory values on resource manager UI is misleading -- Key: YARN-890 URL: https://issues.apache.org/jira/browse/YARN-890 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Trupti Dhavle Assignee: Xuan Gong Attachments: Screen Shot 2013-07-10 at 10.43.34 AM.png, YARN-890.1.patch, YARN-890.2.patch From the yarn-site.xml, I see following values-
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>4192</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>4192</value>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1024</value>
</property>
However the resourcemanager UI shows total memory as 5MB -- This message was sent by Atlassian JIRA (v6.1#6144)
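For what it's worth, the arithmetic that presumably produces the misleading "5" (my assumption from the values above, not taken from either patch): 4192 MB is not a whole number of GB, so a display that rounds up to whole GB shows 5.
{code}
// Arithmetic only: ceil(4192 / 1024) = 5, even though 4192 MB is ~4.09 GB.
long mb = 4192;
long displayedGb = (mb + 1023) / 1024;   // = 5
{code}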
[jira] [Reopened] (YARN-677) Increase coverage to FairScheduler
[ https://issues.apache.org/jira/browse/YARN-677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza reopened YARN-677: - Increase coverage to FairScheduler -- Key: YARN-677 URL: https://issues.apache.org/jira/browse/YARN-677 Project: Hadoop YARN Issue Type: Test Affects Versions: 3.0.0, 2.0.3-alpha, 0.23.6 Reporter: Vadim Bondarev Fix For: 3.0.0, 2.3.0 Attachments: HADOOP-4536-branch-2-a.patch, HADOOP-4536-branch-2c.patch, HADOOP-4536-trunk-a.patch, HADOOP-4536-trunk-c.patch, HDFS-4536-branch-2--N7.patch, HDFS-4536-branch-2--N8.patch, HDFS-4536-branch-2-N9.patch, HDFS-4536-trunk--N6.patch, HDFS-4536-trunk--N7.patch, HDFS-4536-trunk--N8.patch, HDFS-4536-trunk-N9.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1199) Make NM/RM Versions Available
[ https://issues.apache.org/jira/browse/YARN-1199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784661#comment-13784661 ] Hadoop QA commented on YARN-1199: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12606437/YARN-1199.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-tools/hadoop-sls hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.mapred.TestJobCleanup org.apache.hadoop.yarn.sls.TestSLSRunner The following test timeouts occurred in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-tools/hadoop-sls hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.mapreduce.v2.TestUberAM {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2065//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2065//console This message is automatically generated. Make NM/RM Versions Available - Key: YARN-1199 URL: https://issues.apache.org/jira/browse/YARN-1199 Project: Hadoop YARN Issue Type: Improvement Reporter: Mit Desai Assignee: Mit Desai Attachments: YARN-1199.patch, YARN-1199.patch, YARN-1199.patch, YARN-1199.patch Now as we have the NM and RM Versions available, we can display the YARN version of nodes running in the cluster. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1213) Add an equivalent of mapred.fairscheduler.allow.undeclared.pools to the Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-1213: - Attachment: YARN-1213.patch Add an equivalent of mapred.fairscheduler.allow.undeclared.pools to the Fair Scheduler -- Key: YARN-1213 URL: https://issues.apache.org/jira/browse/YARN-1213 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Attachments: YARN-1213.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1265) Fair Scheduler chokes on unhealthy node reconnect
[ https://issues.apache.org/jira/browse/YARN-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-1265: - Attachment: YARN-1265.patch Fair Scheduler chokes on unhealthy node reconnect - Key: YARN-1265 URL: https://issues.apache.org/jira/browse/YARN-1265 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Affects Versions: 2.1.1-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1265.patch Only nodes in the RUNNING state are tracked by schedulers. When a node reconnects, RMNodeImpl.ReconnectNodeTransition tries to remove it, even if it's in the RUNNING state. The FairScheduler doesn't guard against this. I think the best way to fix this is to check to see whether a node is RUNNING before telling the scheduler to remove it. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (YARN-1267) Refactor cgroup logic out of LCE into a standalone binary
Alejandro Abdelnur created YARN-1267: Summary: Refactor cgroup logic out of LCE into a standalone binary Key: YARN-1267 URL: https://issues.apache.org/jira/browse/YARN-1267 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.2-beta Reporter: Alejandro Abdelnur Assignee: Roman Shaposhnik Fix For: 2.3.0 As discussed in YARN-1253, we should consider decoupling cgroups handling from the LCE. YARN-3 initially had a proposal on how this could be done; we should see if any of that makes sense in the current state of things. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1253) Changes to LinuxContainerExecutor to run containers as a single dedicated user in non-secure mode
[ https://issues.apache.org/jira/browse/YARN-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784667#comment-13784667 ] Alejandro Abdelnur commented on YARN-1253: -- Create YARN-1267 to refactor and decouple cgroups from LCE. Thinking a bit, I agree with Arun about leaving this JIRA out of branch-2.1-beta, only trunk/branch-2. I've reviewed and tested the patch already, I'll wait till Friday noon for comments from other reviewers before committing. Changes to LinuxContainerExecutor to run containers as a single dedicated user in non-secure mode - Key: YARN-1253 URL: https://issues.apache.org/jira/browse/YARN-1253 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Assignee: Roman Shaposhnik Priority: Blocker Attachments: YARN-1253.patch.txt When using cgroups we require LCE to be configured in the cluster to start containers. When LCE starts containers as the user that submitted the job. While this works correctly in a secure setup, in an un-secure setup this presents a couple issues: * LCE requires all Hadoop users submitting jobs to be Unix users in all nodes * Because users can impersonate other users, any user would have access to any local file of other users Particularly, the second issue is not desirable as a user could get access to ssh keys of other users in the nodes or if there are NFS mounts, get to other users data outside of the cluster. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1265) Fair Scheduler chokes on unhealthy node reconnect
[ https://issues.apache.org/jira/browse/YARN-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784669#comment-13784669 ] Sandy Ryza commented on YARN-1265: -- Attached patch removes the guard against nodes not being in the nodes map in CapacityScheduler.removeNode. With the guard removed and without the other changes, TestResourceTrackerService.testReconnect fails. It also fails without the changes when setting the Fair Scheduler as the default scheduler. With the changes, it passes. Fair Scheduler chokes on unhealthy node reconnect - Key: YARN-1265 URL: https://issues.apache.org/jira/browse/YARN-1265 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Affects Versions: 2.1.1-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1265.patch Only nodes in the RUNNING state are tracked by schedulers. When a node reconnects, RMNodeImpl.ReconnectNodeTransition tries to remove it, even if it's in the RUNNING state. The FairScheduler doesn't guard against this. I think the best way to fix this is to check to see whether a node is RUNNING before telling the scheduler to remove it. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1213) Add an equivalent of mapred.fairscheduler.allow.undeclared.pools to the Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784671#comment-13784671 ] Alejandro Abdelnur commented on YARN-1213: -- +1 pending jenkins Add an equivalent of mapred.fairscheduler.allow.undeclared.pools to the Fair Scheduler -- Key: YARN-1213 URL: https://issues.apache.org/jira/browse/YARN-1213 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1213.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Comment Edited] (YARN-1253) Changes to LinuxContainerExecutor to run containers as a single dedicated user in non-secure mode
[ https://issues.apache.org/jira/browse/YARN-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784667#comment-13784667 ] Alejandro Abdelnur edited comment on YARN-1253 at 10/3/13 12:18 AM: Created YARN-1267 to refactor and decouple cgroups from LCE. Thinking a bit, I agree with Arun about leaving this JIRA out of branch-2.1-beta, only trunk/branch-2. I've reviewed and tested the patch already, I'll wait till Friday noon for comments from other reviewers before committing. was (Author: tucu00): Create YARN-1267 to refactor and decouple cgroups from LCE. Thinking a bit, I agree with Arun about leaving this JIRA out of branch-2.1-beta, only trunk/branch-2. I've reviewed and tested the patch already, I'll wait till Friday noon for comments from other reviewers before committing. Changes to LinuxContainerExecutor to run containers as a single dedicated user in non-secure mode - Key: YARN-1253 URL: https://issues.apache.org/jira/browse/YARN-1253 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Assignee: Roman Shaposhnik Priority: Blocker Attachments: YARN-1253.patch.txt When using cgroups we require LCE to be configured in the cluster to start containers. When LCE starts containers as the user that submitted the job. While this works correctly in a secure setup, in an un-secure setup this presents a couple issues: * LCE requires all Hadoop users submitting jobs to be Unix users in all nodes * Because users can impersonate other users, any user would have access to any local file of other users Particularly, the second issue is not desirable as a user could get access to ssh keys of other users in the nodes or if there are NFS mounts, get to other users data outside of the cluster. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-890) The roundup for memory values on resource manager UI is misleading
[ https://issues.apache.org/jira/browse/YARN-890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784684#comment-13784684 ] Hadoop QA commented on YARN-890: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12606479/YARN-890.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2068//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2068//console This message is automatically generated. The roundup for memory values on resource manager UI is misleading -- Key: YARN-890 URL: https://issues.apache.org/jira/browse/YARN-890 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Trupti Dhavle Assignee: Xuan Gong Attachments: Screen Shot 2013-07-10 at 10.43.34 AM.png, YARN-890.1.patch, YARN-890.2.patch From the yarn-site.xml, I see following values- property nameyarn.nodemanager.resource.memory-mb/name value4192/value /property property nameyarn.scheduler.maximum-allocation-mb/name value4192/value /property property nameyarn.scheduler.minimum-allocation-mb/name value1024/value /property However the resourcemanager UI shows total memory as 5MB -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1213) Add an equivalent of mapred.fairscheduler.allow.undeclared.pools to the Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784691#comment-13784691 ] Hadoop QA commented on YARN-1213: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12606494/YARN-1213.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2070//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2070//console This message is automatically generated. Add an equivalent of mapred.fairscheduler.allow.undeclared.pools to the Fair Scheduler -- Key: YARN-1213 URL: https://issues.apache.org/jira/browse/YARN-1213 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1213.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1265) Fair Scheduler chokes on unhealthy node reconnect
[ https://issues.apache.org/jira/browse/YARN-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784692#comment-13784692 ] Hadoop QA commented on YARN-1265: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12606495/YARN-1265.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2069//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2069//console This message is automatically generated. Fair Scheduler chokes on unhealthy node reconnect - Key: YARN-1265 URL: https://issues.apache.org/jira/browse/YARN-1265 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Affects Versions: 2.1.1-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1265.patch Only nodes in the RUNNING state are tracked by schedulers. When a node reconnects, RMNodeImpl.ReconnectNodeTransition tries to remove it, even if it's in the RUNNING state. The FairScheduler doesn't guard against this. I think the best way to fix this is to check to see whether a node is RUNNING before telling the scheduler to remove it. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-621) RM triggers web auth failure before first job
[ https://issues.apache.org/jira/browse/YARN-621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784694#comment-13784694 ] Omkar Vinit Joshi commented on YARN-621: Today the filter gets called twice every time because we have defined the AuthenticationFilter filtering URLs as:
* /cluster
* /cluster/*
* /ws
* /ws/*
So if we specify http://localhost:8088/cluster then the AuthenticationFilter will be called twice. However if we request http://localhost:8088/cluster/cluster then it will be called once. We know the related ticket HADOOP-8830 (error due to the AuthenticationFilter getting called twice when the request does not contain the hadoop.auth cookie). Also we cannot remove the code below because it is required inside WebApp.serverPathSpecs - WebApp.configureServlets() {code}webapp.addServePathSpec(basePath);{code} All of these calls really matter only for the first request; after that, once the cookie is set, it doesn't matter. We should definitely fix HADOOP-8830. RM triggers web auth failure before first job - Key: YARN-621 URL: https://issues.apache.org/jira/browse/YARN-621 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.0.4-alpha Reporter: Allen Wittenauer Assignee: Omkar Vinit Joshi Priority: Critical Attachments: YARN-621.20131001.1.patch On a secure YARN setup, before the first job is executed, going to the web interface of the resource manager triggers authentication errors. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (YARN-1268) TestFairScheduler.testContinuousScheduling is flaky
Sandy Ryza created YARN-1268: Summary: TestFairScheduler.testContinuousScheduling is flaky Key: YARN-1268 URL: https://issues.apache.org/jira/browse/YARN-1268 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Sandy Ryza It looks like there's a timeout in it that's causing it to be flaky. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1213) Add an equivalent of mapred.fairscheduler.allow.undeclared.pools to the Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784695#comment-13784695 ] Sandy Ryza commented on YARN-1213: -- Test failure is unrelated - TestFairScheduler.testContinuousScheduling appears to be flaky. Filed YARN-1268 for this. Add an equivalent of mapred.fairscheduler.allow.undeclared.pools to the Fair Scheduler -- Key: YARN-1213 URL: https://issues.apache.org/jira/browse/YARN-1213 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1213.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1219) FSDownload changes file suffix making FileUtil.unTar() throw exception
[ https://issues.apache.org/jira/browse/YARN-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784708#comment-13784708 ] Omkar Vinit Joshi commented on YARN-1219: - bq. I didn't see anywhere in code to treat the .tmp file differently. If you know please let me know. If the original author only used a suffix to make sure the name is different than the original file name, it doesn't seem to be worth it to add unnecessary and error-prone rename operations just to keep the temporary file name suffix.
No, we are not adding new rename operations, just moving them around, from unpack() to here. Ideally that rename code should have been present here only. I remember we had a bug to remove that .tmp file. But I think it is fine, we can go ahead with this patch, as it will not break anything else. FSDownload changes file suffix making FileUtil.unTar() throw exception -- Key: YARN-1219 URL: https://issues.apache.org/jira/browse/YARN-1219 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 3.0.0, 2.1.1-beta, 2.1.2-beta Reporter: shanyu zhao Assignee: shanyu zhao Fix For: 2.1.2-beta Attachments: YARN-1219.patch While running a Hive join operation on Yarn, I saw the exception described below. This is caused by FSDownload copying the files into a temp file and changing the suffix to .tmp before unpacking it. In unpack(), it uses FileUtil.unTar() which will determine if the file is gzipped by looking at the file suffix:
{code}
boolean gzipped = inFile.toString().endsWith("gz");
{code}
To fix this problem, we can remove the .tmp in the temp file name. Here is the detailed exception:
org.apache.commons.compress.archivers.tar.TarArchiveInputStream.getNextTarEntry(TarArchiveInputStream.java:240)
at org.apache.hadoop.fs.FileUtil.unTarUsingJava(FileUtil.java:676)
at org.apache.hadoop.fs.FileUtil.unTar(FileUtil.java:625)
at org.apache.hadoop.yarn.util.FSDownload.unpack(FSDownload.java:203)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:287)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:50)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
-- This message was sent by Atlassian JIRA (v6.1#6144)
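To spell out the failure mode in the description above (a simplified illustration under my own assumptions, not code from the patch): the gzip decision is made from the file name, so renaming the downloaded archive with a .tmp suffix defeats it.
{code}
// Illustration only: the suffix check that drives unTar()'s behavior.
boolean gzipped    = "foo.tar.gz".endsWith("gz");      // true  -> gunzip, then untar
boolean gzippedTmp = "foo.tar.gz.tmp".endsWith("gz");  // false -> plain untar fails
{code}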
[jira] [Commented] (YARN-867) Isolation of failures in aux services
[ https://issues.apache.org/jira/browse/YARN-867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784717#comment-13784717 ] Xuan Gong commented on YARN-867:
bq. Probably have 1 try catch instead of multiple.
Fixed. Used only one big try-catch block.
bq. Can we rename AUXSERVICE_FAIL to AUXSERVICE_ERROR since the service probably hasnt failed.
Done.
bq. TestAuxService needs an addition for the new code.
Added a new test case in TestAuxService.
bq. TestContainer - new test can be made simpler by not mocking AuxServiceHandler and instead sending the failed event directly like its done for other tests there.
Fixed.
bq. In AuxService.handle(APPLICATION_INIT) and other places like that, where the service does not exist then we should fail too.
Done.
bq. Probably we can ignore the error here since the container has already failed.
I think we still need this transition. The container can go to ContainerState.LOCALIZATION_FAILED from the NEW state, and AuxService is triggered to do the APPLICATION_INIT at that time. If there is any exception, we will send a ContainerExitEvent with ContainerEventType.CONTAINER_EXITED_WITH_FAILURE to the Container, and it is very possible that the container will start to process this event when it is in the LOCALIZATION_FAILED state. So we should handle it.
Isolation of failures in aux services -- Key: YARN-867 URL: https://issues.apache.org/jira/browse/YARN-867 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Hitesh Shah Assignee: Xuan Gong Priority: Critical Attachments: YARN-867.1.sampleCode.patch, YARN-867.3.patch, YARN-867.4.patch, YARN-867.sampleCode.2.patch Today, a malicious application can bring down the NM by sending bad data to a service. For example, sending data to the ShuffleService such that it results any non-IOException will cause the NM's async dispatcher to exit as the service's INIT APP event is not handled properly. -- This message was sent by Atlassian JIRA (v6.1#6144)
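For readers following the review, the isolation being discussed amounts to wrapping each aux-service callback in a single catch-all and turning failures into an error notification instead of letting them reach the NM dispatcher. A minimal sketch with simplified stand-in types (the real patch works against AuxServices, AuxServicesEvent and the Container state machine, so the names and shapes here are illustrative only):
{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class AuxServiceIsolationSketch {

  /** Stand-in for the NM aux-service callback interface. */
  interface AuxiliaryService {
    void initializeApplication(String appId, byte[] serviceData) throws Exception;
  }

  private final Map<String, AuxiliaryService> services = new ConcurrentHashMap<>();

  /** Handles an APPLICATION_INIT-style event without letting a bad service kill the caller. */
  public void handleApplicationInit(String serviceName, String appId, byte[] data) {
    try {
      AuxiliaryService service = services.get(serviceName);
      if (service == null) {
        // Unknown service: fail this init explicitly instead of silently ignoring it.
        throw new IllegalStateException("No aux service registered under " + serviceName);
      }
      service.initializeApplication(appId, data);
    } catch (Throwable t) {
      // One big try-catch: translate any failure (not just IOException) into an error
      // notification for the container/application rather than crashing the dispatcher.
      notifyAuxServiceError(serviceName, appId, t);
    }
  }

  private void notifyAuxServiceError(String serviceName, String appId, Throwable cause) {
    System.err.println("Aux service " + serviceName + " failed for " + appId + ": " + cause);
  }
}
{code}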
[jira] [Updated] (YARN-867) Isolation of failures in aux services
[ https://issues.apache.org/jira/browse/YARN-867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-867: --- Attachment: YARN-867.5.patch Isolation of failures in aux services -- Key: YARN-867 URL: https://issues.apache.org/jira/browse/YARN-867 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Hitesh Shah Assignee: Xuan Gong Priority: Critical Attachments: YARN-867.1.sampleCode.patch, YARN-867.3.patch, YARN-867.4.patch, YARN-867.5.patch, YARN-867.sampleCode.2.patch Today, a malicious application can bring down the NM by sending bad data to a service. For example, sending data to the ShuffleService such that it results any non-IOException will cause the NM's async dispatcher to exit as the service's INIT APP event is not handled properly. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1213) Restore banning submitting to undeclared pools in the Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-1213: - Summary: Restore banning submitting to undeclared pools in the Fair Scheduler (was: Add an equivalent of mapred.fairscheduler.allow.undeclared.pools to the Fair Scheduler) Restore banning submitting to undeclared pools in the Fair Scheduler Key: YARN-1213 URL: https://issues.apache.org/jira/browse/YARN-1213 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1213.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1213) Restore config to bad submitting to undeclared pools in the Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-1213: - Summary: Restore config to bad submitting to undeclared pools in the Fair Scheduler (was: Restore banning submitting to undeclared pools in the Fair Scheduler) Restore config to bad submitting to undeclared pools in the Fair Scheduler -- Key: YARN-1213 URL: https://issues.apache.org/jira/browse/YARN-1213 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1213.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1213) Restore config to ban submitting to undeclared pools in the Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-1213: - Summary: Restore config to ban submitting to undeclared pools in the Fair Scheduler (was: Restore config to bad submitting to undeclared pools in the Fair Scheduler) Restore config to ban submitting to undeclared pools in the Fair Scheduler -- Key: YARN-1213 URL: https://issues.apache.org/jira/browse/YARN-1213 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1213.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1213) Restore config to ban submitting to undeclared pools in the Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784726#comment-13784726 ] Sandy Ryza commented on YARN-1213: -- I just committed this to trunk, branch-2, and branch-2.1-beta Restore config to ban submitting to undeclared pools in the Fair Scheduler -- Key: YARN-1213 URL: https://issues.apache.org/jira/browse/YARN-1213 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 2.1.2-beta Attachments: YARN-1213.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
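As context for the change, the restored behavior boils down to a single boolean switch consulted when an app is placed into a queue. A rough sketch of that check, assuming the property is named yarn.scheduler.fair.allow-undeclared-pools and defaults to true (the actual logic lives in FairSchedulerConfiguration and QueueManager; this is not that code):
{code}
import org.apache.hadoop.conf.Configuration;

public class UndeclaredPoolCheckSketch {
  // Assumed property name and default; adjust to whatever the committed patch uses.
  static final String ALLOW_UNDECLARED_POOLS = "yarn.scheduler.fair.allow-undeclared-pools";

  /** May an app be placed into a queue that is not declared in the allocations file? */
  static boolean mayUseUndeclaredQueue(Configuration conf, boolean queueIsDeclared) {
    return queueIsDeclared || conf.getBoolean(ALLOW_UNDECLARED_POOLS, true);
  }

  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.setBoolean(ALLOW_UNDECLARED_POOLS, false);
    // With the switch off, a submission to an undeclared pool would be rejected
    // (or redirected to the default queue) instead of creating the pool on the fly.
    System.out.println(mayUseUndeclaredQueue(conf, false)); // false
  }
}
{code}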
[jira] [Updated] (YARN-1131) $ yarn logs should return a message log aggregation is during progress if YARN application is running
[ https://issues.apache.org/jira/browse/YARN-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated YARN-1131: - Attachment: YARN-1131.1.txt Changes in the patch Adds a YARN application status check based on the ApplicationId, to log a correct message if the application is running. If an application is not found in the RM - the CLI tool will continue to search for the files on hdfs (RM not running, or RM restarted). Fixes the exception in case of an invalid applicationId. There's still a case, right after an app completes, but before aggregation is complete where an empty output is returned. That should be a separate jira though. $ yarn logs should return a message log aggregation is during progress if YARN application is running - Key: YARN-1131 URL: https://issues.apache.org/jira/browse/YARN-1131 Project: Hadoop YARN Issue Type: Improvement Components: client Affects Versions: 2.1.1-beta Reporter: Tassapol Athiapinya Assignee: Siddharth Seth Priority: Minor Fix For: 2.1.2-beta Attachments: YARN-1131.1.txt In the case when log aggregation is enabled, if a user submits MapReduce job and runs $ yarn logs -applicationId app ID while the YARN application is running, the command will return no message and return user back to shell. It is nice to tell the user that log aggregation is in progress. {code} -bash-4.1$ /usr/bin/yarn logs -applicationId application_1377900193583_0002 -bash-4.1$ {code} At the same time, if invalid application ID is given, YARN CLI should say that the application ID is incorrect rather than throwing NoSuchElementException. {code} $ /usr/bin/yarn logs -applicationId application_0 Exception in thread main java.util.NoSuchElementException at com.google.common.base.AbstractIterator.next(AbstractIterator.java:75) at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:124) at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:119) at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:110) at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:255) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
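The application-status check described in the patch notes can be approximated with the public YarnClient API: look the application up first and only read aggregated logs once it has reached a terminal state. A sketch along those lines (an illustration of the idea, not the attached patch; an application missing from the RM would surface as an exception here, at which point the tool can fall back to searching HDFS):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.api.records.YarnApplicationState;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.util.ConverterUtils;

public class LogsCliStateCheckSketch {
  public static void main(String[] args) throws Exception {
    YarnClient client = YarnClient.createYarnClient();
    client.init(new Configuration());
    client.start();
    try {
      ApplicationId appId = ConverterUtils.toApplicationId(args[0]);
      ApplicationReport report = client.getApplicationReport(appId);
      YarnApplicationState state = report.getYarnApplicationState();
      if (state != YarnApplicationState.FINISHED
          && state != YarnApplicationState.FAILED
          && state != YarnApplicationState.KILLED) {
        System.err.println("Application " + args[0]
            + " is still running; aggregated logs are not available yet.");
        return;
      }
      // ...fall through to the existing aggregated-log reader here...
    } finally {
      client.stop();
    }
  }
}
{code}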
[jira] [Updated] (YARN-1149) NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING
[ https://issues.apache.org/jira/browse/YARN-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1149: Attachment: YARN-1149.6.patch add container into failedContainers if try to launch it in the NM shut down process NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING - Key: YARN-1149 URL: https://issues.apache.org/jira/browse/YARN-1149 Project: Hadoop YARN Issue Type: Bug Reporter: Ramya Sunil Assignee: Xuan Gong Fix For: 2.1.2-beta Attachments: YARN-1149.1.patch, YARN-1149.2.patch, YARN-1149.3.patch, YARN-1149.4.patch, YARN-1149.5.patch, YARN-1149.6.patch When nodemanager receives a kill signal when an application has finished execution but log aggregation has not kicked in, InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING is thrown {noformat} 2013-08-25 20:45:00,875 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:finishLogAggregation(254)) - Application just finished : application_1377459190746_0118 2013-08-25 20:45:00,876 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:uploadLogsForContainer(105)) - Starting aggregate log-file for app application_1377459190746_0118 at /app-logs/foo/logs/application_1377459190746_0118/host_45454.tmp 2013-08-25 20:45:00,876 INFO logaggregation.LogAggregationService (LogAggregationService.java:stopAggregators(151)) - Waiting for aggregation to complete for application_1377459190746_0118 2013-08-25 20:45:00,891 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:uploadLogsForContainer(122)) - Uploading logs for container container_1377459190746_0118_01_04. Current good log dirs are /tmp/yarn/local 2013-08-25 20:45:00,915 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:doAppLogAggregation(182)) - Finished aggregate log-file for app application_1377459190746_0118 2013-08-25 20:45:00,925 WARN application.Application (ApplicationImpl.java:handle(427)) - Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:425) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:59) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:697) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:689) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:81) at java.lang.Thread.run(Thread.java:662) 2013-08-25 20:45:00,926 INFO application.Application (ApplicationImpl.java:handle(430)) - Application application_1377459190746_0118 transitioned from RUNNING to null 2013-08-25 20:45:00,927 WARN monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(463)) - 
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl is interrupted. Exiting. 2013-08-25 20:45:00,938 INFO ipc.Server (Server.java:stop(2437)) - Stopping server on 8040 {noformat} -- This message was sent by Atlassian JIRA (v6.1#6144)
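The stack trace above is YARN's generic state machine rejecting an event for which the current state has no registered transition. The toy example below shows the pattern with made-up states and events (it is not the YARN-1149 fix itself): registering an explicit transition for the late "log handling finished" style event makes it harmless instead of fatal.
{code}
import org.apache.hadoop.yarn.state.StateMachine;
import org.apache.hadoop.yarn.state.StateMachineFactory;

public class LateEventSketch {
  enum MyState { RUNNING, FINISHED }
  enum MyEvent { FINISH, LOG_HANDLING_FINISHED }

  // Without the second transition, LOG_HANDLING_FINISHED at RUNNING would throw
  // InvalidStateTransitonException, exactly like the ApplicationImpl trace above.
  private static final StateMachineFactory<LateEventSketch, MyState, MyEvent, MyEvent> FACTORY =
      new StateMachineFactory<LateEventSketch, MyState, MyEvent, MyEvent>(MyState.RUNNING)
          .addTransition(MyState.RUNNING, MyState.FINISHED, MyEvent.FINISH)
          // Explicitly tolerate the late event instead of letting it escape as an error.
          .addTransition(MyState.RUNNING, MyState.RUNNING, MyEvent.LOG_HANDLING_FINISHED)
          .installTopology();

  public static void main(String[] args) throws Exception {
    StateMachine<MyState, MyEvent, MyEvent> sm = FACTORY.make(new LateEventSketch());
    sm.doTransition(MyEvent.LOG_HANDLING_FINISHED, MyEvent.LOG_HANDLING_FINISHED);
    System.out.println(sm.getCurrentState()); // still RUNNING, no exception
  }
}
{code}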
[jira] [Commented] (YARN-1213) Restore config to ban submitting to undeclared pools in the Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784730#comment-13784730 ] Hudson commented on YARN-1213: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4524 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4524/]) YARN-1213. Restore config to ban submitting to undeclared pools in the Fair Scheduler. (Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1528696) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerConfiguration.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java Restore config to ban submitting to undeclared pools in the Fair Scheduler -- Key: YARN-1213 URL: https://issues.apache.org/jira/browse/YARN-1213 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 2.1.2-beta Attachments: YARN-1213.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-867) Isolation of failures in aux services
[ https://issues.apache.org/jira/browse/YARN-867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784732#comment-13784732 ] Hadoop QA commented on YARN-867: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12606498/YARN-867.5.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2071//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2071//console This message is automatically generated. Isolation of failures in aux services -- Key: YARN-867 URL: https://issues.apache.org/jira/browse/YARN-867 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Hitesh Shah Assignee: Xuan Gong Priority: Critical Attachments: YARN-867.1.sampleCode.patch, YARN-867.3.patch, YARN-867.4.patch, YARN-867.5.patch, YARN-867.sampleCode.2.patch Today, a malicious application can bring down the NM by sending bad data to a service. For example, sending data to the ShuffleService such that it results any non-IOException will cause the NM's async dispatcher to exit as the service's INIT APP event is not handled properly. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Assigned] (YARN-1268) TestFairScheduler.testContinuousScheduling is flaky
[ https://issues.apache.org/jira/browse/YARN-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan reassigned YARN-1268: - Assignee: Wei Yan TestFairScheduler.testContinuousScheduling is flaky --- Key: YARN-1268 URL: https://issues.apache.org/jira/browse/YARN-1268 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Sandy Ryza Assignee: Wei Yan It looks like there's a timeout in it that's causing it to be flaky. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1131) $ yarn logs should return a message log aggregation is during progress if YARN application is running
[ https://issues.apache.org/jira/browse/YARN-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-1131: -- Issue Type: Sub-task (was: Improvement) Parent: YARN-431 $ yarn logs should return a message log aggregation is during progress if YARN application is running - Key: YARN-1131 URL: https://issues.apache.org/jira/browse/YARN-1131 Project: Hadoop YARN Issue Type: Sub-task Components: client Affects Versions: 2.1.1-beta Reporter: Tassapol Athiapinya Assignee: Siddharth Seth Priority: Minor Attachments: YARN-1131.1.txt In the case when log aggregation is enabled, if a user submits MapReduce job and runs $ yarn logs -applicationId app ID while the YARN application is running, the command will return no message and return user back to shell. It is nice to tell the user that log aggregation is in progress. {code} -bash-4.1$ /usr/bin/yarn logs -applicationId application_1377900193583_0002 -bash-4.1$ {code} At the same time, if invalid application ID is given, YARN CLI should say that the application ID is incorrect rather than throwing NoSuchElementException. {code} $ /usr/bin/yarn logs -applicationId application_0 Exception in thread main java.util.NoSuchElementException at com.google.common.base.AbstractIterator.next(AbstractIterator.java:75) at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:124) at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:119) at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:110) at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:255) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1131) $ yarn logs should return a message log aggregation is during progress if YARN application is running
[ https://issues.apache.org/jira/browse/YARN-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-1131: -- Fix Version/s: (was: 2.1.2-beta) $ yarn logs should return a message log aggregation is during progress if YARN application is running - Key: YARN-1131 URL: https://issues.apache.org/jira/browse/YARN-1131 Project: Hadoop YARN Issue Type: Sub-task Components: client Affects Versions: 2.1.1-beta Reporter: Tassapol Athiapinya Assignee: Siddharth Seth Priority: Minor Attachments: YARN-1131.1.txt In the case when log aggregation is enabled, if a user submits MapReduce job and runs $ yarn logs -applicationId app ID while the YARN application is running, the command will return no message and return user back to shell. It is nice to tell the user that log aggregation is in progress. {code} -bash-4.1$ /usr/bin/yarn logs -applicationId application_1377900193583_0002 -bash-4.1$ {code} At the same time, if invalid application ID is given, YARN CLI should say that the application ID is incorrect rather than throwing NoSuchElementException. {code} $ /usr/bin/yarn logs -applicationId application_0 Exception in thread main java.util.NoSuchElementException at com.google.common.base.AbstractIterator.next(AbstractIterator.java:75) at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:124) at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:119) at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:110) at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:255) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1149) NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING
[ https://issues.apache.org/jira/browse/YARN-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784742#comment-13784742 ] Hadoop QA commented on YARN-1149: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12606502/YARN-1149.6.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 9 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2073//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2073//console This message is automatically generated. NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING - Key: YARN-1149 URL: https://issues.apache.org/jira/browse/YARN-1149 Project: Hadoop YARN Issue Type: Bug Reporter: Ramya Sunil Assignee: Xuan Gong Fix For: 2.1.2-beta Attachments: YARN-1149.1.patch, YARN-1149.2.patch, YARN-1149.3.patch, YARN-1149.4.patch, YARN-1149.5.patch, YARN-1149.6.patch When nodemanager receives a kill signal when an application has finished execution but log aggregation has not kicked in, InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING is thrown {noformat} 2013-08-25 20:45:00,875 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:finishLogAggregation(254)) - Application just finished : application_1377459190746_0118 2013-08-25 20:45:00,876 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:uploadLogsForContainer(105)) - Starting aggregate log-file for app application_1377459190746_0118 at /app-logs/foo/logs/application_1377459190746_0118/host_45454.tmp 2013-08-25 20:45:00,876 INFO logaggregation.LogAggregationService (LogAggregationService.java:stopAggregators(151)) - Waiting for aggregation to complete for application_1377459190746_0118 2013-08-25 20:45:00,891 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:uploadLogsForContainer(122)) - Uploading logs for container container_1377459190746_0118_01_04. 
Current good log dirs are /tmp/yarn/local 2013-08-25 20:45:00,915 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:doAppLogAggregation(182)) - Finished aggregate log-file for app application_1377459190746_0118 2013-08-25 20:45:00,925 WARN application.Application (ApplicationImpl.java:handle(427)) - Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:425) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:59) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:697) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:689) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:81) at
[jira] [Commented] (YARN-867) Isolation of failures in aux services
[ https://issues.apache.org/jira/browse/YARN-867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784745#comment-13784745 ] Bikas Saha commented on YARN-867: - Why is this check needed?
{code}
+  private void handleAuxServiceFail(AuxServicesEvent event, Throwable th) {
+    if (event.getType() instanceof AuxServicesEventType) {
+      Container container = event.getContainer();
{code}
If container has already failed then why do we need to change state again? the container has already failed.
{code}
+.addTransition(ContainerState.LOCALIZATION_FAILED, ContainerState.EXITED_WITH_FAILURE,
+    ContainerEventType.CONTAINER_EXITED_WITH_FAILURE,
+    new ExitedWithFailureTransition(false))
{code}
{code}
+.addTransition(ContainerState.CONTAINER_CLEANEDUP_AFTER_KILL,
+    ContainerState.EXITED_WITH_FAILURE,
+    ContainerEventType.CONTAINER_EXITED_WITH_FAILURE,
+    new ExitedWithFailureTransition(false))
{code}
Why is CONTAINER_EXITED_WITH_FAILURE not being handled while container state is localized/running? Why are extra events being ignored in addition to ContainerEventType.CONTAINER_EXITED_WITH_FAILURE?
{code}
+    ContainerState.EXITED_WITH_FAILURE,
+    EnumSet.of(
+        ContainerEventType.KILL_CONTAINER,
+        ContainerEventType.CONTAINER_EXITED_WITH_FAILURE,
+        ContainerEventType.RESOURCE_LOCALIZED,
+        ContainerEventType.RESOURCE_FAILED,
+        ContainerEventType.CONTAINER_LAUNCHED,
+        ContainerEventType.CONTAINER_EXITED_WITH_SUCCESS,
+        ContainerEventType.CONTAINER_KILLED_ON_REQUEST))
{code}
{code}
+.addTransition(ContainerState.DONE, ContainerState.DONE,
+    EnumSet.of(
+        ContainerEventType.RESOURCE_LOCALIZED,
+        ContainerEventType.CONTAINER_LAUNCHED,
+        ContainerEventType.CONTAINER_EXITED_WITH_FAILURE,
+        ContainerEventType.CONTAINER_RESOURCES_CLEANEDUP,
+        ContainerEventType.CONTAINER_EXITED_WITH_SUCCESS,
+        ContainerEventType.CONTAINER_KILLED_ON_REQUEST))
{code}
Can you please check if ExitedWithFailureTransition(true) needs to be called in places where the patch is adding ExitedWithFailureTransition(false). Is cleanup required? Do the new tests fail without the changes?
Isolation of failures in aux services -- Key: YARN-867 URL: https://issues.apache.org/jira/browse/YARN-867 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Hitesh Shah Assignee: Xuan Gong Priority: Critical Attachments: YARN-867.1.sampleCode.patch, YARN-867.3.patch, YARN-867.4.patch, YARN-867.5.patch, YARN-867.sampleCode.2.patch Today, a malicious application can bring down the NM by sending bad data to a service. For example, sending data to the ShuffleService such that it results any non-IOException will cause the NM's async dispatcher to exit as the service's INIT APP event is not handled properly. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1131) $ yarn logs should return a message log aggregation is during progress if YARN application is running
[ https://issues.apache.org/jira/browse/YARN-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784748#comment-13784748 ] Hadoop QA commented on YARN-1131: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12606503/YARN-1131.1.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The following test timeouts occurred in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common: org.apache.hadoop.yarn.client.cli.TestLogsCLI {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2072//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2072//console This message is automatically generated. $ yarn logs should return a message log aggregation is during progress if YARN application is running - Key: YARN-1131 URL: https://issues.apache.org/jira/browse/YARN-1131 Project: Hadoop YARN Issue Type: Sub-task Components: client Affects Versions: 2.1.1-beta Reporter: Tassapol Athiapinya Assignee: Siddharth Seth Priority: Minor Attachments: YARN-1131.1.txt In the case when log aggregation is enabled, if a user submits MapReduce job and runs $ yarn logs -applicationId app ID while the YARN application is running, the command will return no message and return user back to shell. It is nice to tell the user that log aggregation is in progress. {code} -bash-4.1$ /usr/bin/yarn logs -applicationId application_1377900193583_0002 -bash-4.1$ {code} At the same time, if invalid application ID is given, YARN CLI should say that the application ID is incorrect rather than throwing NoSuchElementException. {code} $ /usr/bin/yarn logs -applicationId application_0 Exception in thread main java.util.NoSuchElementException at com.google.common.base.AbstractIterator.next(AbstractIterator.java:75) at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:124) at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:119) at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:110) at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:255) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-465) fix coverage org.apache.hadoop.yarn.server.webproxy
[ https://issues.apache.org/jira/browse/YARN-465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrey Klochkov updated YARN-465: - Attachment: YARN-465-branch-2--n3.patch YARN-465-trunk--n3.patch Attaching updated patches. setAccessible usage is removed. fix coverage org.apache.hadoop.yarn.server.webproxy Key: YARN-465 URL: https://issues.apache.org/jira/browse/YARN-465 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 3.0.0, 0.23.7, 2.0.4-alpha Reporter: Aleksey Gorshkov Assignee: Aleksey Gorshkov Attachments: YARN-465-branch-0.23-a.patch, YARN-465-branch-0.23.patch, YARN-465-branch-2-a.patch, YARN-465-branch-2--n3.patch, YARN-465-branch-2.patch, YARN-465-trunk-a.patch, YARN-465-trunk--n3.patch, YARN-465-trunk.patch fix coverage org.apache.hadoop.yarn.server.webproxy patch YARN-465-trunk.patch for trunk patch YARN-465-branch-2.patch for branch-2 patch YARN-465-branch-0.23.patch for branch-0.23 There is issue in branch-0.23 . Patch does not creating .keep file. For fix it need to run commands: mkdir yhadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/proxy touch yhadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/proxy/.keep -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-465) fix coverage org.apache.hadoop.yarn.server.webproxy
[ https://issues.apache.org/jira/browse/YARN-465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784752#comment-13784752 ] Hadoop QA commented on YARN-465: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12606508/YARN-465-branch-2--n3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2074//console This message is automatically generated. fix coverage org.apache.hadoop.yarn.server.webproxy Key: YARN-465 URL: https://issues.apache.org/jira/browse/YARN-465 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 3.0.0, 0.23.7, 2.0.4-alpha Reporter: Aleksey Gorshkov Assignee: Aleksey Gorshkov Attachments: YARN-465-branch-0.23-a.patch, YARN-465-branch-0.23.patch, YARN-465-branch-2-a.patch, YARN-465-branch-2--n3.patch, YARN-465-branch-2.patch, YARN-465-trunk-a.patch, YARN-465-trunk--n3.patch, YARN-465-trunk.patch fix coverage org.apache.hadoop.yarn.server.webproxy patch YARN-465-trunk.patch for trunk patch YARN-465-branch-2.patch for branch-2 patch YARN-465-branch-0.23.patch for branch-0.23 There is issue in branch-0.23 . Patch does not creating .keep file. For fix it need to run commands: mkdir yhadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/proxy touch yhadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/proxy/.keep -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1149) NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING
[ https://issues.apache.org/jira/browse/YARN-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784785#comment-13784785 ] Hitesh Shah commented on YARN-1149: --- Comments: - use proper javadoc notation for Reason enum - CMgrCompletedContainersEvent should have final member vars - NodeStatusUpdaterImpl.java: use !appsToCleanup.isEmpty() instead appsToCleanup.size() != 0 - ContainerManagerImpl#cleanUpApplications - shouldn't an invalid event type trigger a fatal error? NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING - Key: YARN-1149 URL: https://issues.apache.org/jira/browse/YARN-1149 Project: Hadoop YARN Issue Type: Bug Reporter: Ramya Sunil Assignee: Xuan Gong Fix For: 2.1.2-beta Attachments: YARN-1149.1.patch, YARN-1149.2.patch, YARN-1149.3.patch, YARN-1149.4.patch, YARN-1149.5.patch, YARN-1149.6.patch When nodemanager receives a kill signal when an application has finished execution but log aggregation has not kicked in, InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING is thrown {noformat} 2013-08-25 20:45:00,875 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:finishLogAggregation(254)) - Application just finished : application_1377459190746_0118 2013-08-25 20:45:00,876 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:uploadLogsForContainer(105)) - Starting aggregate log-file for app application_1377459190746_0118 at /app-logs/foo/logs/application_1377459190746_0118/host_45454.tmp 2013-08-25 20:45:00,876 INFO logaggregation.LogAggregationService (LogAggregationService.java:stopAggregators(151)) - Waiting for aggregation to complete for application_1377459190746_0118 2013-08-25 20:45:00,891 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:uploadLogsForContainer(122)) - Uploading logs for container container_1377459190746_0118_01_04. 
Current good log dirs are /tmp/yarn/local 2013-08-25 20:45:00,915 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:doAppLogAggregation(182)) - Finished aggregate log-file for app application_1377459190746_0118 2013-08-25 20:45:00,925 WARN application.Application (ApplicationImpl.java:handle(427)) - Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:425) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:59) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:697) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:689) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:81) at java.lang.Thread.run(Thread.java:662) 2013-08-25 20:45:00,926 INFO application.Application (ApplicationImpl.java:handle(430)) - Application application_1377459190746_0118 transitioned from RUNNING to null 2013-08-25 20:45:00,927 WARN monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(463)) - org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl is interrupted. Exiting. 2013-08-25 20:45:00,938 INFO ipc.Server (Server.java:stop(2437)) - Stopping server on 8040 {noformat} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1197) Support changing resources of an allocated container
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784803#comment-13784803 ] Wangda Tan commented on YARN-1197: -- The method mentioned by [~sandyr] can solve the cheating problem on the AM side. The AM doesn't need to tell the RM to decrease the container size at all; it just tells the NM and lets the NM report the decrease to the RM via heartbeat. The only cost is an order-of-seconds latency before the decrease is reflected on the scheduler side, which shouldn't be a big problem. As for the race problem mentioned by [~bikassaha], I think the behavior is reasonable: when an AM requests more resources for a container, it never knows when the RM will grant them, so the AM may need to launch the container with the smaller resource first. That is not harmful to either the scheduler or the NM (using less resource is not a problem). Once the AM gets the allocated resource, it can tell the NM and the child process to increase the memory quota. Do you agree with this? Support changing resources of an allocated container Key: YARN-1197 URL: https://issues.apache.org/jira/browse/YARN-1197 Project: Hadoop YARN Issue Type: Task Components: api, nodemanager, resourcemanager Affects Versions: 2.1.0-beta Reporter: Wangda Tan Attachments: yarn-1197.pdf Currently, YARN cannot merge several containers on one node into a bigger container, which would let us incrementally ask for resources, merge them into a bigger one, and launch our processes. The user scenario is described in the comments. -- This message was sent by Atlassian JIRA (v6.1#6144)
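Purely as an illustration of the ordering being proposed (none of these types or methods exist in YARN; they only encode the sequence under discussion): decreases go AM to NM immediately, with the RM learning from the NM heartbeat, while increases wait for the RM grant before the NM and the child process are told.
{code}
/** Hypothetical stubs that only illustrate the ordering discussed above. */
public class ContainerResizeFlowSketch {

  interface NodeManagerStub { void applyNewLimit(String containerId, int memoryMb); }
  interface ResourceManagerStub { int requestIncrease(String containerId, int memoryMb); }

  private final NodeManagerStub nm;
  private final ResourceManagerStub rm;

  ContainerResizeFlowSketch(NodeManagerStub nm, ResourceManagerStub rm) {
    this.nm = nm;
    this.rm = rm;
  }

  /** Decrease: the AM talks to the NM only; the RM learns the smaller size from the next NM heartbeat. */
  void decrease(String containerId, int newMemoryMb) {
    nm.applyNewLimit(containerId, newMemoryMb);
    // Order-of-seconds lag before the scheduler sees the freed resource is accepted as harmless.
  }

  /** Increase: the AM runs with the smaller allocation until the RM grant arrives. */
  void increase(String containerId, int requestedMemoryMb) {
    int grantedMb = rm.requestIncrease(containerId, requestedMemoryMb); // may take arbitrarily long
    nm.applyNewLimit(containerId, grantedMb);                           // only applied after the grant
  }
}
{code}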
[jira] [Commented] (YARN-1149) NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING
[ https://issues.apache.org/jira/browse/YARN-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784808#comment-13784808 ] Xuan Gong commented on YARN-1149: - bq.ContainerManagerImpl#cleanUpApplications - shouldn't an invalid event type trigger a fatal error? Why ? NodeManagerEventType is the enum type. We can not subclass it. How can the event type be invalid ? If in the future, we add more event type into NodeManagerEventType, we should also add the method to handle it. NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING - Key: YARN-1149 URL: https://issues.apache.org/jira/browse/YARN-1149 Project: Hadoop YARN Issue Type: Bug Reporter: Ramya Sunil Assignee: Xuan Gong Fix For: 2.1.2-beta Attachments: YARN-1149.1.patch, YARN-1149.2.patch, YARN-1149.3.patch, YARN-1149.4.patch, YARN-1149.5.patch, YARN-1149.6.patch When nodemanager receives a kill signal when an application has finished execution but log aggregation has not kicked in, InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING is thrown {noformat} 2013-08-25 20:45:00,875 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:finishLogAggregation(254)) - Application just finished : application_1377459190746_0118 2013-08-25 20:45:00,876 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:uploadLogsForContainer(105)) - Starting aggregate log-file for app application_1377459190746_0118 at /app-logs/foo/logs/application_1377459190746_0118/host_45454.tmp 2013-08-25 20:45:00,876 INFO logaggregation.LogAggregationService (LogAggregationService.java:stopAggregators(151)) - Waiting for aggregation to complete for application_1377459190746_0118 2013-08-25 20:45:00,891 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:uploadLogsForContainer(122)) - Uploading logs for container container_1377459190746_0118_01_04. 
Current good log dirs are /tmp/yarn/local 2013-08-25 20:45:00,915 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:doAppLogAggregation(182)) - Finished aggregate log-file for app application_1377459190746_0118 2013-08-25 20:45:00,925 WARN application.Application (ApplicationImpl.java:handle(427)) - Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:425) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:59) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:697) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:689) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:81) at java.lang.Thread.run(Thread.java:662) 2013-08-25 20:45:00,926 INFO application.Application (ApplicationImpl.java:handle(430)) - Application application_1377459190746_0118 transitioned from RUNNING to null 2013-08-25 20:45:00,927 WARN monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(463)) - org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl is interrupted. Exiting. 2013-08-25 20:45:00,938 INFO ipc.Server (Server.java:stop(2437)) - Stopping server on 8040 {noformat} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-890) The roundup for memory values on resource manager UI is misleading
[ https://issues.apache.org/jira/browse/YARN-890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784811#comment-13784811 ] Zhijie Shen commented on YARN-890: -- +1 for the new approach. We shouldn't round up the available resource The roundup for memory values on resource manager UI is misleading -- Key: YARN-890 URL: https://issues.apache.org/jira/browse/YARN-890 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Trupti Dhavle Assignee: Xuan Gong Attachments: Screen Shot 2013-07-10 at 10.43.34 AM.png, YARN-890.1.patch, YARN-890.2.patch From the yarn-site.xml, I see following values- property nameyarn.nodemanager.resource.memory-mb/name value4192/value /property property nameyarn.scheduler.maximum-allocation-mb/name value4192/value /property property nameyarn.scheduler.minimum-allocation-mb/name value1024/value /property However the resourcemanager UI shows total memory as 5MB -- This message was sent by Atlassian JIRA (v6.1#6144)
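The number in the report is consistent with allocation-increment rounding: 4192 MB rounded up to the next multiple of the 1024 MB minimum allocation is 5120 MB, i.e. the 5 shown on the UI (presumably GB, despite the MB wording in the description). A quick worked check of that arithmetic:
{code}
public class MemoryRoundupDemo {
  public static void main(String[] args) {
    int nodeMemoryMb = 4192;    // yarn.nodemanager.resource.memory-mb from the report
    int minAllocMb = 1024;      // yarn.scheduler.minimum-allocation-mb

    // Round up to the next multiple of the minimum allocation.
    int roundedMb = ((nodeMemoryMb + minAllocMb - 1) / minAllocMb) * minAllocMb;

    System.out.println(roundedMb);          // 5120
    System.out.println(roundedMb / 1024.0); // 5.0 -> the "5" shown on the RM UI
  }
}
{code}
Reporting the configured value, rather than rounding the available resource up, avoids the misleading display, which is the direction the new approach takes.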
[jira] [Commented] (YARN-1149) NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING
[ https://issues.apache.org/jira/browse/YARN-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784823#comment-13784823 ] Hitesh Shah commented on YARN-1149: --- [~xgong] My comment was in reference to: + default: +LOG.warn(Invalid eventType: + eventType); +} NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING - Key: YARN-1149 URL: https://issues.apache.org/jira/browse/YARN-1149 Project: Hadoop YARN Issue Type: Bug Reporter: Ramya Sunil Assignee: Xuan Gong Fix For: 2.1.2-beta Attachments: YARN-1149.1.patch, YARN-1149.2.patch, YARN-1149.3.patch, YARN-1149.4.patch, YARN-1149.5.patch, YARN-1149.6.patch When nodemanager receives a kill signal when an application has finished execution but log aggregation has not kicked in, InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING is thrown {noformat} 2013-08-25 20:45:00,875 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:finishLogAggregation(254)) - Application just finished : application_1377459190746_0118 2013-08-25 20:45:00,876 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:uploadLogsForContainer(105)) - Starting aggregate log-file for app application_1377459190746_0118 at /app-logs/foo/logs/application_1377459190746_0118/host_45454.tmp 2013-08-25 20:45:00,876 INFO logaggregation.LogAggregationService (LogAggregationService.java:stopAggregators(151)) - Waiting for aggregation to complete for application_1377459190746_0118 2013-08-25 20:45:00,891 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:uploadLogsForContainer(122)) - Uploading logs for container container_1377459190746_0118_01_04. Current good log dirs are /tmp/yarn/local 2013-08-25 20:45:00,915 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:doAppLogAggregation(182)) - Finished aggregate log-file for app application_1377459190746_0118 2013-08-25 20:45:00,925 WARN application.Application (ApplicationImpl.java:handle(427)) - Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:425) at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:59) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:697) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:689) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:81) at java.lang.Thread.run(Thread.java:662) 2013-08-25 20:45:00,926 INFO application.Application (ApplicationImpl.java:handle(430)) - Application application_1377459190746_0118 transitioned from RUNNING to null 2013-08-25 20:45:00,927 WARN monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(463)) - 
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl is interrupted. Exiting. 2013-08-25 20:45:00,938 INFO ipc.Server (Server.java:stop(2437)) - Stopping server on 8040 {noformat} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1131) $ yarn logs should return a message log aggregation is during progress if YARN application is running
[ https://issues.apache.org/jira/browse/YARN-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784836#comment-13784836 ] Hitesh Shah commented on YARN-1131: --- Comments: - why not use Option.setRequired for the applicationId param - this will allow removal of the appIdStr == null check. - typo in function name dumpAContainersLogs or is it meant to read dump a container's logs? Maybe just dumpContainerLogs? - containerIdStr and nodeAddressStr could be parsed for correct format to error out earlier before invoking the actual log reader functionality. - is a YarnApplicationState check enough to guarantee that the user receives the correct error message in case logs are tried to be retrieved when log aggregration is still in process just after the app completes? - missing test for when container id specified but node address is not ( and vice versa ) ? $ yarn logs should return a message log aggregation is during progress if YARN application is running - Key: YARN-1131 URL: https://issues.apache.org/jira/browse/YARN-1131 Project: Hadoop YARN Issue Type: Sub-task Components: client Affects Versions: 2.1.1-beta Reporter: Tassapol Athiapinya Assignee: Siddharth Seth Priority: Minor Attachments: YARN-1131.1.txt In the case when log aggregation is enabled, if a user submits MapReduce job and runs $ yarn logs -applicationId app ID while the YARN application is running, the command will return no message and return user back to shell. It is nice to tell the user that log aggregation is in progress. {code} -bash-4.1$ /usr/bin/yarn logs -applicationId application_1377900193583_0002 -bash-4.1$ {code} At the same time, if invalid application ID is given, YARN CLI should say that the application ID is incorrect rather than throwing NoSuchElementException. {code} $ /usr/bin/yarn logs -applicationId application_0 Exception in thread main java.util.NoSuchElementException at com.google.common.base.AbstractIterator.next(AbstractIterator.java:75) at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:124) at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:119) at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:110) at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:255) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1131) $ yarn logs should return a message log aggregation is during progress if YARN application is running
[ https://issues.apache.org/jira/browse/YARN-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13784841#comment-13784841 ] Siddharth Seth commented on YARN-1131: -- Thanks for the review. bq. why not use Option.setRequired for the applicationId param - this will allow removal of the appIdStr == null check. Will look into using this. bq. is a YarnApplicationState check enough to guarantee that the user receives the correct error message in case logs are tried to be retrieved when log aggregration is still in process just after the app completes? Had mentioned this in my last comment. Not targeting for this jira. bq. There's still a case, right after an app completes, but before aggregation is complete where an empty output is returned. That should be a separate jira though. bq. typo in function name dumpAContainersLogs or is it meant to read dump a container's logs? Maybe just dumpContainerLogs? I believe it was meant to be this. The diff, unfortunately, is a lot bigger than it should be, since the files had to be moved between packages. bq. containerIdStr and nodeAddressStr could be parsed for correct format to error out earlier before invoking the actual log reader functionality. bq. missing test for when container id specified but node address is not ( and vice versa ) ? Only targeting the specific issue mentioned in the jira. I'm sure there's more - but applicationId is likely to be the most common case. The rest can be a single or multiple separate jiras. $ yarn logs should return a message log aggregation is during progress if YARN application is running - Key: YARN-1131 URL: https://issues.apache.org/jira/browse/YARN-1131 Project: Hadoop YARN Issue Type: Sub-task Components: client Affects Versions: 2.1.1-beta Reporter: Tassapol Athiapinya Assignee: Siddharth Seth Priority: Minor Attachments: YARN-1131.1.txt In the case when log aggregation is enabled, if a user submits MapReduce job and runs $ yarn logs -applicationId app ID while the YARN application is running, the command will return no message and return user back to shell. It is nice to tell the user that log aggregation is in progress. {code} -bash-4.1$ /usr/bin/yarn logs -applicationId application_1377900193583_0002 -bash-4.1$ {code} At the same time, if invalid application ID is given, YARN CLI should say that the application ID is incorrect rather than throwing NoSuchElementException. {code} $ /usr/bin/yarn logs -applicationId application_0 Exception in thread main java.util.NoSuchElementException at com.google.common.base.AbstractIterator.next(AbstractIterator.java:75) at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:124) at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationId(ConverterUtils.java:119) at org.apache.hadoop.yarn.logaggregation.LogDumper.run(LogDumper.java:110) at org.apache.hadoop.yarn.logaggregation.LogDumper.main(LogDumper.java:255) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
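On the Option.setRequired suggestion: with Apache Commons CLI (which the log CLI already uses for argument parsing), marking the option required makes the parser itself reject a missing -applicationId, so the manual appIdStr == null check becomes unnecessary. A small standalone sketch, not the actual LogDumper/LogsCLI code:
{code}
import org.apache.commons.cli.CommandLine;
import org.apache.commons.cli.GnuParser;
import org.apache.commons.cli.Option;
import org.apache.commons.cli.Options;
import org.apache.commons.cli.ParseException;

public class RequiredOptionSketch {
  public static void main(String[] args) {
    Option appIdOpt = new Option("applicationId", true, "ApplicationId of the application whose logs to fetch");
    appIdOpt.setRequired(true);          // the parser now enforces presence; no null check needed
    Options opts = new Options();
    opts.addOption(appIdOpt);
    try {
      CommandLine cmd = new GnuParser().parse(opts, args);
      System.out.println("applicationId = " + cmd.getOptionValue("applicationId"));
    } catch (ParseException e) {
      // e.g. a "Missing required option" message instead of a null value propagating further down
      System.err.println(e.getMessage());
    }
  }
}
{code}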