[jira] [Commented] (YARN-1578) Fix how to handle ApplicationHistory about the container
[ https://issues.apache.org/jira/browse/YARN-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13866471#comment-13866471 ] Hadoop QA commented on YARN-1578: -

{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12622140/YARN-1578.patch against trunk revision .

{color:red}-1 patch{color}. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2837//console

This message is automatically generated.

Fix how to handle ApplicationHistory about the container Key: YARN-1578 URL: https://issues.apache.org/jira/browse/YARN-1578 Project: Hadoop YARN Issue Type: Bug Affects Versions: YARN-321 Reporter: Shinichi Yamashita Assignee: Shinichi Yamashita Attachments: YARN-1578.patch, screenshot.png

I ran a PiEstimator job on a Hadoop cluster with YARN-321 applied. After the job ended, accessing the HistoryServer web UI returned a 500 error, and the HistoryServer daemon log showed the following:
{code}
2014-01-09 13:31:12,227 ERROR org.apache.hadoop.yarn.webapp.Dispatcher: error handling URI: /applicationhistory/appattempt/appattempt_1389146249925_0008_01
java.lang.reflect.InvocationTargetException
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
  at java.lang.reflect.Method.invoke(Method.java:597)
  at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:153)
  at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
(snip...)
Caused by: java.lang.NullPointerException
  at org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore.mergeContainerHistoryData(FileSystemApplicationHistoryStore.java:696)
  at org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore.getContainers(FileSystemApplicationHistoryStore.java:429)
  at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getContainers(ApplicationHistoryManagerImpl.java:201)
  at org.apache.hadoop.yarn.server.webapp.AppAttemptBlock.render(AppAttemptBlock.java:110)
(snip...)
{code}
From the ApplicationHistory file, I confirmed there was a container that had never finished. According to the ResourceManager daemon log, the ResourceManager reserved this container but did not allocate it. Therefore, ApplicationHistory needs to change how it handles containers that are reserved but never allocated. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
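The NPE above fires when the history store merges container data with a finish record that was never written. Below is a minimal sketch of the kind of null guard that would avoid the crash; the method name comes from the stack trace, but the data classes and fields are simplified assumptions, not the actual YARN-1578 patch.
{code}
// Sketch only: guard the merge against a missing finish record. The data
// classes here are simplified stand-ins for the YARN-321 branch records.
final class MergeSketch {
  static final class ContainerFinishData {
    long finishTime;
    String diagnostics;
    int exitStatus;
  }

  static final class ContainerHistoryData {
    long finishTime;
    String diagnostics;
    int exitStatus;
  }

  static void mergeContainerHistoryData(ContainerHistoryData history,
      ContainerFinishData finish) {
    if (finish == null) {
      // A container that was reserved but never allocated has no finish
      // record; keep the entry partially filled instead of throwing an NPE
      // that turns into a 500 on the web UI.
      return;
    }
    history.finishTime = finish.finishTime;
    history.diagnostics = finish.diagnostics;
    history.exitStatus = finish.exitStatus;
  }
}
{code}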
[jira] [Updated] (YARN-1577) Unmanaged AM is broken because of YARN-1493
[ https://issues.apache.org/jira/browse/YARN-1577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-1577: - Target Version/s: 2.4.0 Unmanaged AM is broken because of YARN-1493 --- Key: YARN-1577 URL: https://issues.apache.org/jira/browse/YARN-1577 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.4.0 Reporter: Jian He Assignee: Jian He Priority: Blocker Today the unmanaged AM client waits for the app state to be Accepted before launching the AM. This is broken since YARN-1493 changed the RM to start the attempt only after the application is Accepted. We may need to introduce an attempt state report that the client can rely on to query the attempt state and decide when to launch the unmanaged AM. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
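To make the breakage concrete, here is an illustrative sketch (not code from the JIRA) of the wait loop an unmanaged AM client performs with the stock YarnClient API; after YARN-1493, reaching ACCEPTED no longer guarantees that an attempt, and therefore its AM credentials, exists.
{code}
import java.util.EnumSet;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.api.records.YarnApplicationState;
import org.apache.hadoop.yarn.client.api.YarnClient;

// Illustrative sketch of the assumption that YARN-1493 broke: the client
// polls the app state and treats ACCEPTED as "the attempt exists, launch
// the unmanaged AM now".
final class UnmanagedAMWait {
  static ApplicationReport waitForAccepted(YarnClient client,
      ApplicationId appId) throws Exception {
    EnumSet<YarnApplicationState> submissionStates = EnumSet.of(
        YarnApplicationState.NEW, YarnApplicationState.NEW_SAVING,
        YarnApplicationState.SUBMITTED);
    ApplicationReport report = client.getApplicationReport(appId);
    while (submissionStates.contains(report.getYarnApplicationState())) {
      Thread.sleep(100);
      report = client.getApplicationReport(appId);
    }
    // ACCEPTED here no longer implies an attempt exists; an attempt-level
    // report (as proposed above) would be the reliable signal.
    return report;
  }
}
{code}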
[jira] [Updated] (YARN-1470) Add audience annotation to MiniYARNCluster
[ https://issues.apache.org/jira/browse/YARN-1470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He updated YARN-1470: -- Assignee: (was: Chen He) Add audience annotation to MiniYARNCluster -- Key: YARN-1470 URL: https://issues.apache.org/jira/browse/YARN-1470 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.2.0 Reporter: Sandy Ryza Labels: newbie We should make it clear whether this is a public interface. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1575) Public localizer crashes with Localized unkown resource
[ https://issues.apache.org/jira/browse/YARN-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-1575: - Attachment: YARN-1575.patch YARN-1575.branch-0.23.patch Attaching a blunt way to solve the race condition, which is to synchronize the queueing and update of {{pending}}. This basically defeats the point of {{pending}} being a ConcurrentHashMap, so I updated it to a synchronized map since some unit tests are accessing it asynchronously. For 0.23 we are already synchronizing {{attempts}}, so I piggy-backed the synchronization on that variable. Public localizer crashes with Localized unkown resource - Key: YARN-1575 URL: https://issues.apache.org/jira/browse/YARN-1575 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 0.23.10, 2.2.0 Reporter: Jason Lowe Priority: Critical Attachments: YARN-1575.branch-0.23.patch, YARN-1575.patch The public localizer can crash with the error:
{noformat}
2014-01-08 14:11:43,212 [Thread-467] ERROR org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Localized unkonwn resource to java.util.concurrent.FutureTask@852e26
2014-01-08 14:11:43,212 [Thread-467] INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Public cache exiting
{noformat}
-- This message was sent by Atlassian JIRA (v6.1.5#6160)
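For readers not following the patch, here is a compressed sketch of the race and the fix described above, with simplified types; it is not the patch verbatim, and the real code lives in ResourceLocalizationService's public localizer.
{code}
import java.nio.file.Path;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Sketch of the approach described above: queue the download and record it
// in `pending` under one lock, so the thread that processes completed
// Futures can never see a Future that is missing from the map.
class PublicLocalizerSketch {
  private final ExecutorService threadPool;
  // A synchronized map instead of a ConcurrentHashMap, since unit tests (and
  // the completion thread) access it while requests are being queued.
  private final Map<Future<Path>, String> pending =
      Collections.synchronizedMap(new HashMap<Future<Path>, String>());

  PublicLocalizerSketch(ExecutorService threadPool) {
    this.threadPool = threadPool;
  }

  void addResource(String resource, Callable<Path> download) {
    synchronized (pending) {
      // Submit and record atomically; without this, the completion handler
      // can win the race and report the "Localized unkonwn resource" error.
      pending.put(threadPool.submit(download), resource);
    }
  }

  void handleCompleted(Future<Path> done) {
    String resource = pending.remove(done);
    if (resource == null) {
      throw new IllegalStateException("Localized unknown resource to " + done);
    }
  }
}
{code}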
[jira] [Created] (YARN-1580) Documentation error regarding container-allocation.expiry-interval-ms
German Florez-Larrahondo created YARN-1580: -- Summary: Documentation error regarding container-allocation.expiry-interval-ms Key: YARN-1580 URL: https://issues.apache.org/jira/browse/YARN-1580 Project: Hadoop YARN Issue Type: Bug Components: documentation Affects Versions: 2.2.0 Environment: CentOS 6.4 Reporter: German Florez-Larrahondo Priority: Trivial While trying to control settings related to the expiration of tokens for long-running jobs, based on the documentation (http://hadoop.apache.org/docs/r2.2.0/hadoop-yarn/hadoop-yarn-common/yarn-default.xml) I attempted to increase values for yarn.rm.container-allocation.expiry-interval-ms without luck. Looking at code like YarnConfiguration.java, I noticed that in recent versions all these kinds of settings now have the prefix yarn.resourcemanager.rm as opposed to yarn.rm. So for this specific case the setting of interest is yarn.resourcemanager.rm.container-allocation.expiry-interval-ms. I suppose there are other documentation errors similar to this. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
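Assuming the reporter's reading of YarnConfiguration.java is right, the working override would look like this in yarn-site.xml (the value shown is an arbitrary example, not a recommendation):
{code}
<!-- yarn-site.xml: example override using the renamed prefix; 900000 ms is
     an arbitrary example value. -->
<property>
  <name>yarn.resourcemanager.rm.container-allocation.expiry-interval-ms</name>
  <value>900000</value>
</property>
{code}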
[jira] [Created] (YARN-1581) Fair Scheduler: containers that get reserved create container token too early
German Florez-Larrahondo created YARN-1581: -- Summary: Fair Scheduler: containers that get reserved create container token too early Key: YARN-1581 URL: https://issues.apache.org/jira/browse/YARN-1581 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.2.0 Environment: CentOS 6.4 Reporter: German Florez-Larrahondo Priority: Minor When using the FairScheduler with some slow servers and long running jobs I hit (what I believe is) the same issue reported on https://issues.apache.org/jira/browse/YARN-180 , which talked specifically about the CapacityScheduler: In my case, with the FairScheduler I am seeing those corner cases when the NodeManager is finally ready to start a container that was reserved but the token had already expired. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1575) Public localizer crashes with Localized unkown resource
[ https://issues.apache.org/jira/browse/YARN-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13866767#comment-13866767 ] Hadoop QA commented on YARN-1575: -

{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12622200/YARN-1575.patch against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.

{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2838//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2838//console

This message is automatically generated.

Public localizer crashes with Localized unkown resource - Key: YARN-1575 URL: https://issues.apache.org/jira/browse/YARN-1575 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 0.23.10, 2.2.0 Reporter: Jason Lowe Assignee: Jason Lowe Priority: Critical Attachments: YARN-1575.branch-0.23.patch, YARN-1575.patch The public localizer can crash with the error:
{noformat}
2014-01-08 14:11:43,212 [Thread-467] ERROR org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Localized unkonwn resource to java.util.concurrent.FutureTask@852e26
2014-01-08 14:11:43,212 [Thread-467] INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Public cache exiting
{noformat}
-- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1581) Fair Scheduler: containers that get reserved create container token too early
[ https://issues.apache.org/jira/browse/YARN-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] German Florez-Larrahondo updated YARN-1581: --- Description: When using the FairScheduler with long running jobs and over-subscribed servers I hit (what I believe is) the same issue reported on https://issues.apache.org/jira/browse/YARN-180 , which talked specifically about the CapacityScheduler: In my case, with the FairScheduler I am seeing those corner cases when the NodeManager is finally ready to start a container that was reserved but the token had already expired. was: When using the FairScheduler with some slow servers and long running jobs I hit (what I believe is) the same issue reported on https://issues.apache.org/jira/browse/YARN-180 , which talked specifically about the CapacityScheduler: In my case, with the FairScheduler I am seeing those corner cases when the NodeManager is finally ready to start a container that was reserved but the token had already expired. Fair Scheduler: containers that get reserved create container token too early Key: YARN-1581 URL: https://issues.apache.org/jira/browse/YARN-1581 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.2.0 Environment: CentOS 6.4 Reporter: German Florez-Larrahondo Priority: Minor When using the FairScheduler with long running jobs and over-subscribed servers I hit (what I believe is) the same issue reported on https://issues.apache.org/jira/browse/YARN-180 , which talked specifically about the CapacityScheduler: In my case, with the FairScheduler I am seeing those corner cases when the NodeManager is finally ready to start a container that was reserved but the token had already expired. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1581) Fair Scheduler: containers that get reserved create container token too early
[ https://issues.apache.org/jira/browse/YARN-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] German Florez-Larrahondo updated YARN-1581: --- Attachment: ProcessInfo construction failing.jpg Fair Scheduler: containers that get reserved create container token too early Key: YARN-1581 URL: https://issues.apache.org/jira/browse/YARN-1581 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.2.0 Environment: CentOS 6.4 Reporter: German Florez-Larrahondo Priority: Minor Attachments: ProcessInfo construction failing.jpg When using the FairScheduler with long running jobs and over-subscribed servers I hit (what I believe is) the same issue reported on https://issues.apache.org/jira/browse/YARN-180 , which talked specifically about the CapacityScheduler: In my case, with the FairScheduler I am seeing those corner cases when the NodeManager is finally ready to start a container that was reserved but the token had already expired. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1581) Fair Scheduler: containers that get reserved create container token too early
[ https://issues.apache.org/jira/browse/YARN-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] German Florez-Larrahondo updated YARN-1581: --- Attachment: (was: ProcessInfo construction failing.jpg) Fair Scheduler: containers that get reserved create container token too early Key: YARN-1581 URL: https://issues.apache.org/jira/browse/YARN-1581 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.2.0 Environment: CentOS 6.4 Reporter: German Florez-Larrahondo Priority: Minor When using the FairScheduler with long running jobs and over-subscribed servers I hit (what I believe is) the same issue reported on https://issues.apache.org/jira/browse/YARN-180 , which talked specifically about the CapacityScheduler: In my case, with the FairScheduler I am seeing those corner cases when the NodeManager is finally ready to start a container that was reserved but the token had already expired. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1413) [YARN-321] AHS WebUI should serve aggregated logs as well
[ https://issues.apache.org/jira/browse/YARN-1413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13866844#comment-13866844 ] Zhijie Shen commented on YARN-1413: --- Checked RMContainerImpl again. It's wrong to set logURL to the aggregated log link in LaunchedTransition. It should be kept unchanged, pointing to the NM webpage that shows the log of the running container. The logURL should be updated to the aggregated log link only when the container is finished. See my //TODO comments in FinishedTransition. [YARN-321] AHS WebUI should serve aggregated logs as well -- Key: YARN-1413 URL: https://issues.apache.org/jira/browse/YARN-1413 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Mayank Bansal Fix For: YARN-321 Attachments: YARN-1413-1.patch, YARN-1413-2.patch, YARN-1413-3.patch, YARN-1413-4.patch, YARN-1413-5.patch, YARN-1413-6.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
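A standalone sketch of the rule described in the comment above: which log URL a container record should expose, depending on whether the container has finished. The URL formats are illustrative placeholders, not the exact paths used by the NM or AHS web apps.
{code}
import org.apache.hadoop.yarn.api.records.ContainerState;

// Sketch: running containers point at the NM web page; finished containers
// point at the aggregated logs. URL shapes are illustrative only.
final class ContainerLogUrlSketch {
  static String logUrlFor(ContainerState state, String nmHttpAddress,
      String containerId, String user) {
    if (state == ContainerState.COMPLETE) {
      // Only after the container finishes is the aggregated log the right
      // target (see the //TODO in FinishedTransition mentioned above).
      return "http://<ahs-address>/applicationhistory/logs/" + containerId;
    }
    // While running, keep pointing at the NM page for the live container.
    return "http://" + nmHttpAddress + "/node/containerlogs/" + containerId
        + "/" + user;
  }
}
{code}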
[jira] [Commented] (YARN-1138) yarn.application.classpath is set to point to $HADOOP_CONF_DIR etc., which does not work on Windows
[ https://issues.apache.org/jira/browse/YARN-1138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13866848#comment-13866848 ] Chris Nauroth commented on YARN-1138: - Hi, Chuan. The patch looks good. It's a similar approach to MAPREDUCE-5442. Two minor things:
# There is some indentation by 3 spaces instead of 2 around the property in yarn-default.xml. The indentation was already off in the current code, but would you mind fixing it as part of this patch?
# It's no longer possible for someone to look at yarn-default.xml and see the default classpath. In MAPREDUCE-5442, we worked around this by adding extra documentation in the description field. Would you mind doing the same here? Here is what we ended up with for {{mapreduce.application.classpath}} in mapred-default.xml:
{code}
<property>
  <description>CLASSPATH for MR applications. A comma-separated list of
  CLASSPATH entries. If mapreduce.application.framework is set then this
  must specify the appropriate classpath for that archive, and the name of
  the archive must be present in the classpath.
  When this value is empty, the following default CLASSPATH for MR
  applications would be used.
  For Linux:
  $HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*,
  $HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*.
  For Windows:
  %HADOOP_MAPRED_HOME%/share/hadoop/mapreduce/*,
  %HADOOP_MAPRED_HOME%/share/hadoop/mapreduce/lib/*.
  </description>
  <name>mapreduce.application.classpath</name>
  <value></value>
</property>
{code}
yarn.application.classpath is set to point to $HADOOP_CONF_DIR etc., which does not work on Windows --- Key: YARN-1138 URL: https://issues.apache.org/jira/browse/YARN-1138 Project: Hadoop YARN Issue Type: Bug Reporter: Yingda Chen Assignee: Chuan Liu Fix For: 3.0.0 Attachments: YARN-1138.patch yarn-default.xml has the yarn.application.classpath entry set to $HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/share/hadoop/common/*,$HADOOP_COMMON_HOME/share/hadoop/common/lib/*,$HADOOP_HDFS_HOME/share/hadoop/hdfs/*,$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*,$HADOOP_YARN_HOME/share/hadoop/yarn/*,$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*. It does not work on Windows, which needs to be fixed. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
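For YARN, the analogous yarn-default.xml entry could read along these lines; this is only a sketch of the requested documentation, and the final wording and default list are whatever the committed patch uses:
{code}
<property>
  <description>CLASSPATH for YARN applications. A comma-separated list of
  CLASSPATH entries. When this value is empty, the following default
  CLASSPATH for YARN applications would be used.
  For Linux:
  $HADOOP_CONF_DIR, $HADOOP_COMMON_HOME/share/hadoop/common/*, ...
  For Windows:
  %HADOOP_CONF_DIR%, %HADOOP_COMMON_HOME%/share/hadoop/common/*, ...
  </description>
  <name>yarn.application.classpath</name>
  <value></value>
</property>
{code}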
[jira] [Commented] (YARN-888) clean up POM dependencies
[ https://issues.apache.org/jira/browse/YARN-888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13866845#comment-13866845 ] Alejandro Abdelnur commented on YARN-888: - [~vinodvk], While having dependencies in non-leaf POMs may reduce the size of leaf POMs, it drags in non-required dependencies (unless you only put in non-leaf POMs dependencies that are common to all the leaf modules). Yes, IntelliJ seems to get funny with dependencies in non-leaf modules. That is one of the motivations (agreed it is an IntelliJ issue; on the other hand the change does not affect the project build at all and allows IntelliJ users to build/debug from the IDE out of the box without doing funny voodoo). The other motivation, and IMO the more important one, is to clean up the dependencies that modules like yarn-api and yarn-client have, restricting them to what is used on the client side. Using the dependency:tree and dependency:analyze plugins I’ve reduced the 3rd party JARs required by the clients significantly. As [~ste...@apache.org] pointed out there is much more work we should do in this direction; this is a first non-intrusive baby step. To give you an idea, before this patch *hadoop-yarn-api* reports as required dependencies by itself:
{code}
+- org.slf4j:slf4j-api:jar:1.7.5:compile
+- org.slf4j:slf4j-log4j12:jar:1.7.5:compile
|  \- log4j:log4j:jar:1.2.17:compile
+- org.apache.hadoop:hadoop-annotations:jar:3.0.0-SNAPSHOT:compile
|  +- tomcat:jasper-compiler:jar:5.5.23:test
+- com.google.inject.extensions:guice-servlet:jar:3.0:compile
+- io.netty:netty:jar:3.6.2.Final:compile
+- com.google.protobuf:protobuf-java:jar:2.5.0:compile
+- commons-io:commons-io:jar:2.4:compile
+- com.google.inject:guice:jar:3.0:compile
|  +- javax.inject:javax.inject:jar:1:compile
|  \- aopalliance:aopalliance:jar:1.0:compile
+- com.sun.jersey:jersey-server:jar:1.9:compile
|  +- asm:asm:jar:3.2:compile
|  \- com.sun.jersey:jersey-core:jar:1.9:compile
+- com.sun.jersey:jersey-json:jar:1.9:compile
|  +- org.codehaus.jettison:jettison:jar:1.1:compile
|  |  \- stax:stax-api:jar:1.0.1:compile
|  +- com.sun.xml.bind:jaxb-impl:jar:2.2.3-1:compile
|  |  \- javax.xml.bind:jaxb-api:jar:2.2.2:compile
|  |     \- javax.activation:activation:jar:1.1:compile
|  +- org.codehaus.jackson:jackson-core-asl:jar:1.8.8:compile
|  +- org.codehaus.jackson:jackson-mapper-asl:jar:1.8.8:compile
|  +- org.codehaus.jackson:jackson-jaxrs:jar:1.8.8:compile (version managed from 1.8.3)
|  \- org.codehaus.jackson:jackson-xc:jar:1.8.8:compile (version managed from 1.8.3)
\- com.sun.jersey.contribs:jersey-guice:jar:1.9:compile
{code}
With the patch, the required dependencies by itself are down to:
{code}
+- commons-lang:commons-lang:jar:2.6:compile
+- com.google.guava:guava:jar:11.0.2:compile
|  \- com.google.code.findbugs:jsr305:jar:1.3.9:compile
+- commons-logging:commons-logging:jar:1.1.3:compile
+- org.apache.hadoop:hadoop-annotations:jar:3.0.0-SNAPSHOT:compile
\- com.google.protobuf:protobuf-java:jar:2.5.0:compile
{code}
Does this address your concerns? clean up POM dependencies - Key: YARN-888 URL: https://issues.apache.org/jira/browse/YARN-888 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: YARN-888.patch, YARN-888.patch, yarn-888-2.patch Intermediate 'pom' modules define dependencies inherited by leaf modules. This is causing issues in the IntelliJ IDE. We should normalize the leaf modules like in common, hdfs and tools, where all dependencies are defined in each leaf module and the intermediate 'pom' modules do not define any dependency. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
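For concreteness, the convention being proposed looks roughly like this (artifact list illustrative; versions stay managed by the parent's dependencyManagement): the intermediate 'pom' module declares no dependencies, and each leaf module lists exactly what it uses.
{code}
<!-- Leaf module (e.g., hadoop-yarn-api) declares its own minimal set;
     the intermediate 'pom' module has no <dependencies> section at all. -->
<dependencies>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-annotations</artifactId>
  </dependency>
  <dependency>
    <groupId>com.google.protobuf</groupId>
    <artifactId>protobuf-java</artifactId>
  </dependency>
  <dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
  </dependency>
</dependencies>
{code}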
[jira] [Comment Edited] (YARN-888) clean up POM dependencies
[ https://issues.apache.org/jira/browse/YARN-888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13866845#comment-13866845 ] Alejandro Abdelnur edited comment on YARN-888 at 1/9/14 5:50 PM: - [~vinodkv], While having dependencies in non-leaf POMs may reduce the size of leaf POMs, it drags in non-required dependencies (unless you only put in non-leaf POMs dependencies that are common to all the leaf modules). Yes, IntelliJ seems to get funny with dependencies in non-leaf modules. That is one of the motivations (agreed it is an IntelliJ issue; on the other hand the change does not affect the project build at all and allows IntelliJ users to build/debug from the IDE out of the box without doing funny voodoo). The other motivation, and IMO the more important one, is to clean up the dependencies that modules like yarn-api and yarn-client have, restricting them to what is used on the client side. Using the dependency:tree and dependency:analyze plugins I’ve reduced the 3rd party JARs required by the clients significantly. As [~ste...@apache.org] pointed out there is much more work we should do in this direction; this is a first non-intrusive baby step. To give you an idea, before this patch *hadoop-yarn-api* reports as required dependencies by itself:
{code}
+- org.slf4j:slf4j-api:jar:1.7.5:compile
+- org.slf4j:slf4j-log4j12:jar:1.7.5:compile
|  \- log4j:log4j:jar:1.2.17:compile
+- org.apache.hadoop:hadoop-annotations:jar:3.0.0-SNAPSHOT:compile
|  +- tomcat:jasper-compiler:jar:5.5.23:test
+- com.google.inject.extensions:guice-servlet:jar:3.0:compile
+- io.netty:netty:jar:3.6.2.Final:compile
+- com.google.protobuf:protobuf-java:jar:2.5.0:compile
+- commons-io:commons-io:jar:2.4:compile
+- com.google.inject:guice:jar:3.0:compile
|  +- javax.inject:javax.inject:jar:1:compile
|  \- aopalliance:aopalliance:jar:1.0:compile
+- com.sun.jersey:jersey-server:jar:1.9:compile
|  +- asm:asm:jar:3.2:compile
|  \- com.sun.jersey:jersey-core:jar:1.9:compile
+- com.sun.jersey:jersey-json:jar:1.9:compile
|  +- org.codehaus.jettison:jettison:jar:1.1:compile
|  |  \- stax:stax-api:jar:1.0.1:compile
|  +- com.sun.xml.bind:jaxb-impl:jar:2.2.3-1:compile
|  |  \- javax.xml.bind:jaxb-api:jar:2.2.2:compile
|  |     \- javax.activation:activation:jar:1.1:compile
|  +- org.codehaus.jackson:jackson-core-asl:jar:1.8.8:compile
|  +- org.codehaus.jackson:jackson-mapper-asl:jar:1.8.8:compile
|  +- org.codehaus.jackson:jackson-jaxrs:jar:1.8.8:compile (version managed from 1.8.3)
|  \- org.codehaus.jackson:jackson-xc:jar:1.8.8:compile (version managed from 1.8.3)
\- com.sun.jersey.contribs:jersey-guice:jar:1.9:compile
{code}
With the patch, the required dependencies by itself are down to:
{code}
+- commons-lang:commons-lang:jar:2.6:compile
+- com.google.guava:guava:jar:11.0.2:compile
|  \- com.google.code.findbugs:jsr305:jar:1.3.9:compile
+- commons-logging:commons-logging:jar:1.1.3:compile
+- org.apache.hadoop:hadoop-annotations:jar:3.0.0-SNAPSHOT:compile
\- com.google.protobuf:protobuf-java:jar:2.5.0:compile
{code}
Does this address your concerns?

was (Author: tucu00): [~vinodvk], While having dependencies in non-leaf POMs may reduce the size of leaf POMs, it drags in non-required dependencies (unless you only put in non-leaf POMs dependencies that are common to all the leaf modules). Yes, IntelliJ seems to get funny with dependencies in non-leaf modules. That is one of the motivations (agreed it is an IntelliJ issue; on the other hand the change does not affect the project build at all and allows IntelliJ users to build/debug from the IDE out of the box without doing funny voodoo). The other motivation, and IMO the more important one, is to clean up the dependencies that modules like yarn-api and yarn-client have, restricting them to what is used on the client side. Using the dependency:tree and dependency:analyze plugins I’ve reduced the 3rd party JARs required by the clients significantly. As [~ste...@apache.org] pointed out there is much more work we should do in this direction; this is a first non-intrusive baby step. To give you an idea, before this patch *hadoop-yarn-api* reports as required dependencies by itself:
{code}
+- org.slf4j:slf4j-api:jar:1.7.5:compile
+- org.slf4j:slf4j-log4j12:jar:1.7.5:compile
|  \- log4j:log4j:jar:1.2.17:compile
+- org.apache.hadoop:hadoop-annotations:jar:3.0.0-SNAPSHOT:compile
|  +- tomcat:jasper-compiler:jar:5.5.23:test
+- com.google.inject.extensions:guice-servlet:jar:3.0:compile
+- io.netty:netty:jar:3.6.2.Final:compile
+- com.google.protobuf:protobuf-java:jar:2.5.0:compile
+- commons-io:commons-io:jar:2.4:compile
+- com.google.inject:guice:jar:3.0:compile
|  +- javax.inject:javax.inject:jar:1:compile
|  \- aopalliance:aopalliance:jar:1.0:compile
+- com.sun.jersey:jersey-server:jar:1.9:compile
|  +- asm:asm:jar:3.2:compile
|  \- com.sun.jersey:jersey-core:jar:1.9:compile
+-
[jira] [Updated] (YARN-1574) When RM transits from Active to Standby, the same eventDispatcher should not be registered more than once
[ https://issues.apache.org/jira/browse/YARN-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1574: Attachment: YARN-1574.4.patch When RM transits from Active to Standby, the same eventDispatcher should not be registered more than once Key: YARN-1574 URL: https://issues.apache.org/jira/browse/YARN-1574 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Priority: Blocker Attachments: YARN-1574.1.patch, YARN-1574.1.patch, YARN-1574.2.patch, YARN-1574.2.patch, YARN-1574.3.patch, YARN-1574.4.patch Currently, we have moved the rmDispatcher out of ActiveService, but we still register the event dispatchers, such as schedulerDispatcher and RMAppEventDispatcher, when we initialize the ActiveService. Almost every time we transition the RM from Active to Standby, we need to initialize the ActiveService. That means we register the same event dispatchers again, which causes the same event to be handled several times. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1574) When RM transits from Active to Standby, the same eventDispatcher should not be registered more than once
[ https://issues.apache.org/jira/browse/YARN-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13866855#comment-13866855 ] Xuan Gong commented on YARN-1574: - Thanks for the review. The new patch addresses all the latest comments. When RM transits from Active to Standby, the same eventDispatcher should not be registered more than once Key: YARN-1574 URL: https://issues.apache.org/jira/browse/YARN-1574 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Priority: Blocker Attachments: YARN-1574.1.patch, YARN-1574.1.patch, YARN-1574.2.patch, YARN-1574.2.patch, YARN-1574.3.patch, YARN-1574.4.patch Currently, we have moved the rmDispatcher out of ActiveService, but we still register the event dispatchers, such as schedulerDispatcher and RMAppEventDispatcher, when we initialize the ActiveService. Almost every time we transition the RM from Active to Standby, we need to initialize the ActiveService. That means we register the same event dispatchers again, which causes the same event to be handled several times. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
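The invariant the patch needs to enforce can be shown with a simplified dispatcher; this is a sketch, not YARN's AsyncDispatcher: re-initializing the active services must not register a second handler for an event type that already has one.
{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the required invariant: one handler per event type, no matter
// how many Active<->Standby transitions re-run service initialization.
final class OnceOnlyDispatcher {
  interface EventHandler<T> {
    void handle(T event);
  }

  private final Map<Class<?>, EventHandler<?>> handlers =
      new ConcurrentHashMap<Class<?>, EventHandler<?>>();

  <T> void register(Class<T> eventType, EventHandler<T> handler) {
    // putIfAbsent keeps the first registration; a second call from a later
    // transition is a no-op instead of a duplicate handler.
    handlers.putIfAbsent(eventType, handler);
  }

  @SuppressWarnings("unchecked")
  <T> void dispatch(Class<T> eventType, T event) {
    EventHandler<T> handler = (EventHandler<T>) handlers.get(eventType);
    if (handler != null) {
      handler.handle(event); // exactly one handler, so one handling per event
    }
  }
}
{code}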
[jira] [Commented] (YARN-1567) In Fair Scheduler, allow empty queues to change between leaf and parent on allocation file reload
[ https://issues.apache.org/jira/browse/YARN-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13866877#comment-13866877 ] Sandy Ryza commented on YARN-1567: -- The Fair Scheduler does not have a notion of stopped queues. While there is a lot in queue definitions and behavior that we can consolidate, I don't think that we should use consolidation as the sole basis for stopping new features. Is there a reason that the way the Capacity Scheduler works is fundamentally incompatible with changing queues from leaf to parent? Queues have a lot in common once they exist, but the way queues are configured, loaded, and managed may differ to the point of being irreconcilable between the Capacity Scheduler and the Fair Scheduler. Also, I realized I left out the motivation for this. The use case behind this is the following: Sally gets a leaf queue created for her division, SallysQueue. As usage starts to ramp up, her teams start complaining that their workloads are stepping on each other's toes. She would like to be able to divide up the resources allocated to her division between her teams without restarting the RM. In Fair Scheduler, allow empty queues to change between leaf and parent on allocation file reload - Key: YARN-1567 URL: https://issues.apache.org/jira/browse/YARN-1567 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1567-1.patch, YARN-1567.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
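In allocation-file terms, the reload being asked for is just this (queue names from the example above; standard Fair Scheduler nested-queue syntax): SallysQueue starts as a leaf, then the file is edited and reloaded, without an RM restart, to make it a parent.
{code}
<!-- Before: SallysQueue is a leaf. After editing and reload, it becomes a
     parent of per-team subqueues. -->
<allocations>
  <queue name="SallysQueue">
    <queue name="teamA"/>
    <queue name="teamB"/>
  </queue>
</allocations>
{code}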
[jira] [Commented] (YARN-1576) Allow setting RPC timeout in ApplicationClientProtocolPBClientImpl
[ https://issues.apache.org/jira/browse/YARN-1576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13866889#comment-13866889 ] Sandy Ryza commented on YARN-1576: -- This tracks the YARN changes that will be necessary for MAPREDUCE-5707. Allow setting RPC timeout in ApplicationClientProtocolPBClientImpl -- Key: YARN-1576 URL: https://issues.apache.org/jira/browse/YARN-1576 Project: Hadoop YARN Issue Type: Improvement Components: client Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Sandy Ryza -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-888) clean up POM dependencies
[ https://issues.apache.org/jira/browse/YARN-888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13866923#comment-13866923 ] Vinod Kumar Vavilapalli commented on YARN-888: -- Makes sense now. I think I understand the main trouble point. I think this is the one:
{quote}
unless you only put in non-leaf POMs dependencies that are common to all the leaf modules.
{quote}
Today we put any dependency needed by *any* leaf module in the non-leaf module. That clearly adds a lot of burden on leaves that don't need those libs. Am I understanding that correctly? If so, yeah, let's go for it. At the end of the day, the deps that users see need to be as clean as possible. clean up POM dependencies - Key: YARN-888 URL: https://issues.apache.org/jira/browse/YARN-888 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: YARN-888.patch, YARN-888.patch, yarn-888-2.patch Intermediate 'pom' modules define dependencies inherited by leaf modules. This is causing issues in the IntelliJ IDE. We should normalize the leaf modules like in common, hdfs and tools, where all dependencies are defined in each leaf module and the intermediate 'pom' modules do not define any dependency. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-888) clean up POM dependencies
[ https://issues.apache.org/jira/browse/YARN-888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13866925#comment-13866925 ] Vinod Kumar Vavilapalli commented on YARN-888: -- Further, maybe we should also do a hybrid by putting *only* common stuff in the non-leaf modules and anything else in the leaves? clean up POM dependencies - Key: YARN-888 URL: https://issues.apache.org/jira/browse/YARN-888 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: YARN-888.patch, YARN-888.patch, yarn-888-2.patch Intermediate 'pom' modules define dependencies inherited by leaf modules. This is causing issues in the IntelliJ IDE. We should normalize the leaf modules like in common, hdfs and tools, where all dependencies are defined in each leaf module and the intermediate 'pom' modules do not define any dependency. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-888) clean up POM dependencies
[ https://issues.apache.org/jira/browse/YARN-888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13866945#comment-13866945 ] Alejandro Abdelnur commented on YARN-888: - You got it. On the hybrid approach, it is quite cumbersome as you would have to verify that all child modules use the common dependency being added. IMO, leaving the non-leaf modules slim will be much easier to handle. Plus, we solve the problem for IntelliJ IDE users. clean up POM dependencies - Key: YARN-888 URL: https://issues.apache.org/jira/browse/YARN-888 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: YARN-888.patch, YARN-888.patch, yarn-888-2.patch Intermediate 'pom' modules define dependencies inherited by leaf modules. This is causing issues in the IntelliJ IDE. We should normalize the leaf modules like in common, hdfs and tools, where all dependencies are defined in each leaf module and the intermediate 'pom' modules do not define any dependency. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Resolved] (YARN-915) Apps Completed metrics on web UI is not correct after RM restart
[ https://issues.apache.org/jira/browse/YARN-915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He resolved YARN-915. -- Resolution: Duplicate Apps Completed metrics on web UI is not correct after RM restart Key: YARN-915 URL: https://issues.apache.org/jira/browse/YARN-915 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: screen shot.png Negative Apps Completed metrics can show on the web UI. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-915) Apps Completed metrics on web UI is not correct after RM restart
[ https://issues.apache.org/jira/browse/YARN-915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13866972#comment-13866972 ] Jian He commented on YARN-915: -- Closed as a duplicate of YARN-1166. Apps Completed metrics on web UI is not correct after RM restart Key: YARN-915 URL: https://issues.apache.org/jira/browse/YARN-915 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: screen shot.png Negative Apps Completed metrics can show on the web UI. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1138) yarn.application.classpath is set to point to $HADOOP_CONF_DIR etc., which does not work on Windows
[ https://issues.apache.org/jira/browse/YARN-1138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chuan Liu updated YARN-1138: Attachment: YARN-1138.2.patch Sounds good! Here is a new patch that addresses the two issues. yarn.application.classpath is set to point to $HADOOP_CONF_DIR etc., which does not work on Windows --- Key: YARN-1138 URL: https://issues.apache.org/jira/browse/YARN-1138 Project: Hadoop YARN Issue Type: Bug Reporter: Yingda Chen Assignee: Chuan Liu Fix For: 3.0.0 Attachments: YARN-1138.2.patch, YARN-1138.patch yarn-default.xml has the yarn.application.classpath entry set to $HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/share/hadoop/common/*,$HADOOP_COMMON_HOME/share/hadoop/common/lib/*,$HADOOP_HDFS_HOME/share/hadoop/hdfs/*,$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*,$HADOOP_YARN_HOME/share/hadoop/yarn/*,$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*. It does not work on Windows, which needs to be fixed. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1138) yarn.application.classpath is set to point to $HADOOP_CONF_DIR etc., which does not work on Windows
[ https://issues.apache.org/jira/browse/YARN-1138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated YARN-1138: Component/s: api Target Version/s: 3.0.0, 2.3.0 Affects Version/s: 2.2.0 Fix Version/s: (was: 3.0.0) Hadoop Flags: Reviewed +1 for the patch, pending a Jenkins run with the new version. I plan to commit this later today. yarn.application.classpath is set to point to $HADOOP_CONF_DIR etc., which does not work on Windows --- Key: YARN-1138 URL: https://issues.apache.org/jira/browse/YARN-1138 Project: Hadoop YARN Issue Type: Bug Components: api Affects Versions: 2.2.0 Reporter: Yingda Chen Assignee: Chuan Liu Attachments: YARN-1138.2.patch, YARN-1138.patch yarn-default.xml has the yarn.application.classpath entry set to $HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/share/hadoop/common/*,$HADOOP_COMMON_HOME/share/hadoop/common/lib/*,$HADOOP_HDFS_HOME/share/hadoop/hdfs/*,$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*,$HADOOP_YARN_HOME/share/hadoop/yarn/*,$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*. It does not work on Windows, which needs to be fixed. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-45) Scheduler feedback to AM to release containers
[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-45: Issue Type: Bug (was: Sub-task) Parent: (was: YARN-397) Scheduler feedback to AM to release containers -- Key: YARN-45 URL: https://issues.apache.org/jira/browse/YARN-45 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Chris Douglas Assignee: Carlo Curino Fix For: 2.1.0-beta Attachments: YARN-45.1.patch, YARN-45.patch, YARN-45.patch, YARN-45.patch, YARN-45.patch, YARN-45.patch, YARN-45.patch, YARN-45_design_thoughts.pdf The ResourceManager strikes a balance between cluster utilization and strict enforcement of resource invariants in the cluster. Individual allocations of containers must be reclaimed, or reserved, to restore the global invariants when cluster load shifts. In some cases, the ApplicationMaster can respond to fluctuations in resource availability without losing the work already completed by that task (MAPREDUCE-4584). Supplying it with this information would be helpful for overall cluster utilization [1]. To this end, we want to establish a protocol for the RM to ask the AM to release containers. [1] http://research.yahoo.com/files/yl-2012-003.pdf -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-567) RM changes to support preemption for FairScheduler and CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-567: - Issue Type: Bug (was: Sub-task) Parent: (was: YARN-397) RM changes to support preemption for FairScheduler and CapacityScheduler Key: YARN-567 URL: https://issues.apache.org/jira/browse/YARN-567 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Carlo Curino Assignee: Carlo Curino Fix For: 2.1.0-beta Attachments: YARN-567.patch, YARN-567.patch A common tradeoff in scheduling jobs is between keeping the cluster busy and enforcing capacity/fairness properties. FairScheduler and CapacityScheduler take opposite stances on how to achieve this. The FairScheduler leverages task-killing to quickly reclaim resources from currently running jobs and redistribute them among new jobs, thus keeping the cluster busy but wasting useful work. The CapacityScheduler is typically tuned to limit the portion of the cluster used by each queue so that the likelihood of violating capacity is low, thus never wasting work but risking keeping the cluster underutilized or having jobs wait to obtain their rightful capacity. By introducing the notion of work-preserving preemption we can remove this tradeoff. This requires a protocol for preemption (YARN-45), ApplicationMasters that can respond to preemption efficiently (e.g., by saving their intermediate state; this will be posted for MapReduce in a separate JIRA soon), and a scheduler that can issue preemption requests (discussed in the separate JIRAs YARN-568 and YARN-569). The changes we track with this JIRA are common to FairScheduler and CapacityScheduler, and are mostly the propagation of preemption decisions through the ApplicationMastersService. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-567) RM changes to support preemption for FairScheduler and CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-567: - Issue Type: Sub-task (was: Bug) Parent: YARN-45 RM changes to support preemption for FairScheduler and CapacityScheduler Key: YARN-567 URL: https://issues.apache.org/jira/browse/YARN-567 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Carlo Curino Assignee: Carlo Curino Fix For: 2.1.0-beta Attachments: YARN-567.patch, YARN-567.patch A common tradeoff in scheduling jobs is between keeping the cluster busy and enforcing capacity/fairness properties. FairScheduler and CapacityScheduler take opposite stances on how to achieve this. The FairScheduler leverages task-killing to quickly reclaim resources from currently running jobs and redistribute them among new jobs, thus keeping the cluster busy but wasting useful work. The CapacityScheduler is typically tuned to limit the portion of the cluster used by each queue so that the likelihood of violating capacity is low, thus never wasting work but risking keeping the cluster underutilized or having jobs wait to obtain their rightful capacity. By introducing the notion of work-preserving preemption we can remove this tradeoff. This requires a protocol for preemption (YARN-45), ApplicationMasters that can respond to preemption efficiently (e.g., by saving their intermediate state; this will be posted for MapReduce in a separate JIRA soon), and a scheduler that can issue preemption requests (discussed in the separate JIRAs YARN-568 and YARN-569). The changes we track with this JIRA are common to FairScheduler and CapacityScheduler, and are mostly the propagation of preemption decisions through the ApplicationMastersService. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-569) CapacityScheduler: support for preemption (using a capacity monitor)
[ https://issues.apache.org/jira/browse/YARN-569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-569: - Issue Type: Bug (was: Sub-task) Parent: (was: YARN-397) CapacityScheduler: support for preemption (using a capacity monitor) Key: YARN-569 URL: https://issues.apache.org/jira/browse/YARN-569 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Reporter: Carlo Curino Assignee: Carlo Curino Fix For: 2.1.0-beta Attachments: 3queues.pdf, CapScheduler_with_preemption.pdf, YARN-569.1.patch, YARN-569.10.patch, YARN-569.11.patch, YARN-569.2.patch, YARN-569.3.patch, YARN-569.4.patch, YARN-569.5.patch, YARN-569.6.patch, YARN-569.8.patch, YARN-569.9.patch, YARN-569.patch, YARN-569.patch, preemption.2.patch There is a tension between the fast-paced, reactive role of the CapacityScheduler, which needs to respond quickly to applications' resource requests and node updates, and the more introspective, time-based considerations needed to observe and correct capacity balance. To this purpose, instead of hacking the delicate mechanisms of the CapacityScheduler directly, we opted to add support for preemption by means of a Capacity Monitor, which can be run optionally as a separate service (much like the NMLivelinessMonitor). The capacity monitor (similarly to equivalent functionality in the fair scheduler) runs on intervals (e.g., every 3 seconds), observes the state of the assignment of resources to queues from the capacity scheduler, performs off-line computation to determine if preemption is needed and how best to edit the current schedule to improve capacity, and generates events that produce four possible actions:
# Container de-reservations
# Resource-based preemptions
# Container-based preemptions
# Container killing
The actions listed above are progressively more costly, and it is up to the policy to use them as desired to achieve the rebalancing goals. Note that due to the lag in the effect of these actions, the policy should operate at the macroscopic level (e.g., preempt tens of containers from a queue) and not try to tightly and consistently micromanage container allocations.

- Preemption policy (ProportionalCapacityPreemptionPolicy): -

Preemption policies are by design pluggable; in the following we present an initial policy (ProportionalCapacityPreemptionPolicy) we have been experimenting with. The ProportionalCapacityPreemptionPolicy behaves as follows:
# it gathers from the scheduler the state of the queues, in particular their current capacity, guaranteed capacity and pending requests (*)
# if there are pending requests from queues that are under capacity, it computes a new ideal balanced state (**)
# it computes the set of preemptions needed to repair the current schedule and achieve capacity balance (accounting for natural completion rates, and respecting bounds on the amount of preemption we allow for each round)
# it selects which applications to preempt from each over-capacity queue (the last one in the FIFO order)
# it removes reservations from the most recently assigned app until the amount of resources to reclaim is obtained, or until no more reservations exist
# (if not enough) it issues preemptions for containers from the same applications (reverse chronological order, last assigned container first), again as needed, or until no containers except the AM container are left
# (if not enough) it moves on to unreserve and preempt from the next application.
# containers that have been asked to preempt are tracked across executions. If a container has been among the ones to be preempted for more than a certain time, it is moved to a list of containers to be forcibly killed.

Notes: (*) at the moment, in order to avoid double-counting of the requests, we only look at the ANY part of pending resource requests, which means we might not preempt on behalf of AMs that ask only for specific locations but not ANY. (**) The ideal balance state is one in which each queue has at least its guaranteed capacity, and the spare capacity is distributed among queues (that want some) as a weighted fair share, where the weighting is based on the guaranteed capacity of a queue and the function runs to a fixed point.

Tunables of the ProportionalCapacityPreemptionPolicy:
# observe-only mode (i.e., log the actions it would take, but behave as read-only)
# how frequently to run the policy
# how long to wait between preemption and kill of a container
# which fraction of the containers I would like to obtain should I preempt
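A compressed sketch of the policy loop described above, with the scheduler types simplified away (the real policy operates on the CapacityScheduler's queue and application structures):
{code}
import java.util.List;

// Sketch of the ProportionalCapacityPreemptionPolicy run loop: snapshot the
// queues, compute the ideal balanced assignment, then reclaim from
// over-capacity queues with per-round bounds.
interface QueueSnapshot {
  double current();    // currently assigned capacity
  double guaranteed(); // configured guaranteed capacity
  double pending();    // pending requests (ANY part only, per note (*) above)
}

abstract class PreemptionPolicySketch {
  void editSchedule(List<QueueSnapshot> queues, double maxPerRound) {
    double[] ideal = computeIdealAssignment(queues); // step 2 above
    for (int i = 0; i < queues.size(); i++) {
      double over = queues.get(i).current() - ideal[i];
      if (over <= 0) {
        continue; // at or below its ideal share, nothing to reclaim
      }
      // Bound how much we reclaim per round (step 3), then apply steps 4-8:
      // last FIFO app first, unreserve before preempting, newest containers
      // first, spare the AM container, escalate to kill only after a wait.
      reclaimFromQueue(i, Math.min(over, maxPerRound));
    }
  }

  abstract double[] computeIdealAssignment(List<QueueSnapshot> queues);

  abstract void reclaimFromQueue(int queueIndex, double amount);
}
{code}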
[jira] [Updated] (YARN-569) CapacityScheduler: support for preemption (using a capacity monitor)
[ https://issues.apache.org/jira/browse/YARN-569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-569: - Issue Type: Sub-task (was: Bug) Parent: YARN-45 CapacityScheduler: support for preemption (using a capacity monitor) Key: YARN-569 URL: https://issues.apache.org/jira/browse/YARN-569 Project: Hadoop YARN Issue Type: Sub-task Components: capacityscheduler Reporter: Carlo Curino Assignee: Carlo Curino Fix For: 2.1.0-beta Attachments: 3queues.pdf, CapScheduler_with_preemption.pdf, YARN-569.1.patch, YARN-569.10.patch, YARN-569.11.patch, YARN-569.2.patch, YARN-569.3.patch, YARN-569.4.patch, YARN-569.5.patch, YARN-569.6.patch, YARN-569.8.patch, YARN-569.9.patch, YARN-569.patch, YARN-569.patch, preemption.2.patch There is a tension between the fast-paced, reactive role of the CapacityScheduler, which needs to respond quickly to applications' resource requests and node updates, and the more introspective, time-based considerations needed to observe and correct capacity balance. To this purpose, instead of hacking the delicate mechanisms of the CapacityScheduler directly, we opted to add support for preemption by means of a Capacity Monitor, which can be run optionally as a separate service (much like the NMLivelinessMonitor). The capacity monitor (similarly to equivalent functionality in the fair scheduler) runs on intervals (e.g., every 3 seconds), observes the state of the assignment of resources to queues from the capacity scheduler, performs off-line computation to determine if preemption is needed and how best to edit the current schedule to improve capacity, and generates events that produce four possible actions:
# Container de-reservations
# Resource-based preemptions
# Container-based preemptions
# Container killing
The actions listed above are progressively more costly, and it is up to the policy to use them as desired to achieve the rebalancing goals. Note that due to the lag in the effect of these actions, the policy should operate at the macroscopic level (e.g., preempt tens of containers from a queue) and not try to tightly and consistently micromanage container allocations.

- Preemption policy (ProportionalCapacityPreemptionPolicy): -

Preemption policies are by design pluggable; in the following we present an initial policy (ProportionalCapacityPreemptionPolicy) we have been experimenting with. The ProportionalCapacityPreemptionPolicy behaves as follows:
# it gathers from the scheduler the state of the queues, in particular their current capacity, guaranteed capacity and pending requests (*)
# if there are pending requests from queues that are under capacity, it computes a new ideal balanced state (**)
# it computes the set of preemptions needed to repair the current schedule and achieve capacity balance (accounting for natural completion rates, and respecting bounds on the amount of preemption we allow for each round)
# it selects which applications to preempt from each over-capacity queue (the last one in the FIFO order)
# it removes reservations from the most recently assigned app until the amount of resources to reclaim is obtained, or until no more reservations exist
# (if not enough) it issues preemptions for containers from the same applications (reverse chronological order, last assigned container first), again as needed, or until no containers except the AM container are left
# (if not enough) it moves on to unreserve and preempt from the next application.
# containers that have been asked to preempt are tracked across executions. If a container has been among the ones to be preempted for more than a certain time, it is moved to a list of containers to be forcibly killed.

Notes: (*) at the moment, in order to avoid double-counting of the requests, we only look at the ANY part of pending resource requests, which means we might not preempt on behalf of AMs that ask only for specific locations but not ANY. (**) The ideal balance state is one in which each queue has at least its guaranteed capacity, and the spare capacity is distributed among queues (that want some) as a weighted fair share, where the weighting is based on the guaranteed capacity of a queue and the function runs to a fixed point.

Tunables of the ProportionalCapacityPreemptionPolicy:
# observe-only mode (i.e., log the actions it would take, but behave as read-only)
# how frequently to run the policy
# how long to wait between preemption and kill of a container
# which fraction of the containers I would like to obtain should I preempt (has to
[jira] [Updated] (YARN-568) FairScheduler: support for work-preserving preemption
[ https://issues.apache.org/jira/browse/YARN-568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-568: - Issue Type: Sub-task (was: Improvement) Parent: YARN-45 FairScheduler: support for work-preserving preemption -- Key: YARN-568 URL: https://issues.apache.org/jira/browse/YARN-568 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Carlo Curino Assignee: Carlo Curino Fix For: 2.1.0-beta Attachments: YARN-568-1.patch, YARN-568-2.patch, YARN-568-2.patch, YARN-568.patch, YARN-568.patch In the attached patch, we modified the FairScheduler to substitute its preemption-by-killing with a work-preserving version of preemption (followed by killing if the AMs do not respond quickly enough). This should allow us to run preemption checking more often, but kill less often (proper tuning to be investigated). Depends on YARN-567 and YARN-45; related to YARN-569. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-780) Expose preemption warnings in AMRMClient
[ https://issues.apache.org/jira/browse/YARN-780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-780: - Issue Type: Sub-task (was: Improvement) Parent: YARN-45 Expose preemption warnings in AMRMClient Key: YARN-780 URL: https://issues.apache.org/jira/browse/YARN-780 Project: Hadoop YARN Issue Type: Sub-task Components: api, client Affects Versions: 2.0.4-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza When the scheduler gives feedback on containers that need to be released/will be preempted, this should be passed on to users of AMRMClient and AMRMClientAsync -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-650) User guide for preemption
[ https://issues.apache.org/jira/browse/YARN-650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-650: - Issue Type: Sub-task (was: Task) Parent: YARN-45 User guide for preemption - Key: YARN-650 URL: https://issues.apache.org/jira/browse/YARN-650 Project: Hadoop YARN Issue Type: Sub-task Components: documentation Reporter: Chris Douglas Priority: Minor Fix For: 2.4.0 Attachments: Y650-0.patch YARN-45 added a protocol for the RM to ask for resources back. The docs on writing YARN applications should include a section on how to interpret this message. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1525) Web UI should redirect to active RM when HA is enabled.
[ https://issues.apache.org/jira/browse/YARN-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cindy Li updated YARN-1525: --- Assignee: Cindy Li (was: Xuan Gong) Web UI should redirect to active RM when HA is enabled. --- Key: YARN-1525 URL: https://issues.apache.org/jira/browse/YARN-1525 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Cindy Li When failover happens, web UI should redirect to the current active rm. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1490) RM should optionally not kill all containers when an ApplicationMaster exits
[ https://issues.apache.org/jira/browse/YARN-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-1490: -- Attachment: YARN-1490.6.patch The new patch adds a test for a reserved container being killed by the previous attempt. RM should optionally not kill all containers when an ApplicationMaster exits Key: YARN-1490 URL: https://issues.apache.org/jira/browse/YARN-1490 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Jian He Attachments: YARN-1490.1.patch, YARN-1490.2.patch, YARN-1490.3.patch, YARN-1490.4.patch, YARN-1490.5.patch, YARN-1490.6.patch This is needed to enable work-preserving AM restart. Some apps can choose to reconnect with old running containers; some may not want to. This should be an option. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1490) RM should optionally not kill all containers when an ApplicationMaster exits
[ https://issues.apache.org/jira/browse/YARN-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13866999#comment-13866999 ] Hadoop QA commented on YARN-1490: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12622248/YARN-1490.6.patch against trunk revision . {color:red}-1 patch{color}. Trunk compilation may be broken. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2841//console This message is automatically generated. RM should optionally not kill all containers when an ApplicationMaster exits Key: YARN-1490 URL: https://issues.apache.org/jira/browse/YARN-1490 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Jian He Attachments: YARN-1490.1.patch, YARN-1490.2.patch, YARN-1490.3.patch, YARN-1490.4.patch, YARN-1490.5.patch, YARN-1490.6.patch This is needed to enable work-preserving AM restart. Some apps can choose to reconnect with old running containers; some may not want to. This should be an option. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1525) Web UI should redirect to active RM when HA is enabled.
[ https://issues.apache.org/jira/browse/YARN-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867004#comment-13867004 ] Cindy Li commented on YARN-1525: I've been thinking about the best way to find the current active RM for a standby RM. Considering two options now: one is to establish a connection to ZooKeeper; the other is to query the active/standby state exposed by YARN-1033. Web UI should redirect to active RM when HA is enabled. --- Key: YARN-1525 URL: https://issues.apache.org/jira/browse/YARN-1525 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Cindy Li When failover happens, the web UI should redirect to the current active RM. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1525) Web UI should redirect to active RM when HA is enabled.
[ https://issues.apache.org/jira/browse/YARN-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867002#comment-13867002 ] Karthik Kambatla commented on YARN-1525: Thinking out loud: Given that YARN-1482 enables the webapp even in Standby mode, I think there is merit in not always redirecting to the Active. How about we redirect only when querying applications/nodes etc., and have the About section not redirect? Also, if it is not too much trouble, I would prefer getting YARN-1033 in first so we can verify we aren't messing with that functionality either. I should have a patch later today. Web UI should redirect to active RM when HA is enabled. --- Key: YARN-1525 URL: https://issues.apache.org/jira/browse/YARN-1525 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Cindy Li When failover happens, the web UI should redirect to the current active RM. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1525) Web UI should redirect to active RM when HA is enabled.
[ https://issues.apache.org/jira/browse/YARN-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867022#comment-13867022 ] Cindy Li commented on YARN-1525: @Karthik Kambatla, I'm proposing the following. From a user's perspective, it would be convenient to be able to access the current active RM's URL. If a user types in the URL of a standby RM (which may have been the active one before failover), YARN-1033 will show that it is a standby node. This JIRA (YARN-1525) will then use that information to redirect to the current active RM's webpage, showing a message such as "redirecting to the current active RM: active RM's URL", so the user can go to that URL directly next time. This seems more convenient for users. Web UI should redirect to active RM when HA is enabled. --- Key: YARN-1525 URL: https://issues.apache.org/jira/browse/YARN-1525 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Cindy Li When failover happens, the web UI should redirect to the current active RM. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
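A rough sketch of the redirect idea under discussion, keeping state-exposing pages local per Karthik's point. StandbyRedirectFilter, isStandby(), and activeRMWebAddress() are hypothetical names; the real lookup would come from ZooKeeper or the YARN-1033 state, and the actual patch may be structured differently:
{code}
import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class StandbyRedirectFilter implements Filter {
  @Override public void init(FilterConfig conf) {}
  @Override public void destroy() {}

  @Override
  public void doFilter(ServletRequest req, ServletResponse res,
      FilterChain chain) throws IOException, ServletException {
    HttpServletRequest httpReq = (HttpServletRequest) req;
    HttpServletResponse httpRes = (HttpServletResponse) res;
    // Keep the About page local so the standby still answers HA-state
    // queries (YARN-1033) instead of bouncing everything to the Active.
    if (isStandby() && !httpReq.getRequestURI().startsWith("/cluster/cluster")) {
      httpRes.sendRedirect(activeRMWebAddress() + httpReq.getRequestURI());
      return;
    }
    chain.doFilter(req, res);
  }

  private boolean isStandby() { /* query the RM HA state */ return false; }
  private String activeRMWebAddress() { return "http://active-rm:8088"; }
}
{code}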
[jira] [Commented] (YARN-650) User guide for preemption
[ https://issues.apache.org/jira/browse/YARN-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867023#comment-13867023 ] Hadoop QA commented on YARN-650: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12582234/Y650-0.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+0 tests included{color}. The patch appears to be a documentation patch that doesn't require tests. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2842//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2842//console This message is automatically generated. User guide for preemption - Key: YARN-650 URL: https://issues.apache.org/jira/browse/YARN-650 Project: Hadoop YARN Issue Type: Sub-task Components: documentation Reporter: Chris Douglas Priority: Minor Fix For: 2.4.0 Attachments: Y650-0.patch YARN-45 added a protocol for the RM to ask for resources back. The docs on writing YARN applications should include a section on how to interpret this message. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1138) yarn.application.classpath is set to point to $HADOOP_CONF_DIR etc., which does not work on Windows
[ https://issues.apache.org/jira/browse/YARN-1138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867108#comment-13867108 ] Hadoop QA commented on YARN-1138: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12622244/YARN-1138.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2840//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2840//console This message is automatically generated. yarn.application.classpath is set to point to $HADOOP_CONF_DIR etc., which does not work on Windows --- Key: YARN-1138 URL: https://issues.apache.org/jira/browse/YARN-1138 Project: Hadoop YARN Issue Type: Bug Components: api Affects Versions: 2.2.0 Reporter: Yingda Chen Assignee: Chuan Liu Attachments: YARN-1138.2.patch, YARN-1138.patch yarn-default.xml has the yarn.application.classpath entry set to $HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/share/hadoop/common/*,$HADOOP_COMMON_HOME/share/hadoop/common/lib/*,$HADOOP_HDFS_HOME/share/hadoop/hdfs/*,$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*,$HADOOP_YARN_HOME/share/hadoop/yarn/*,$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*. It does not work on Windows, which needs to be fixed. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1574) When RM transits from Active to Standby, the same eventDispatcher should not be registered more than once
[ https://issues.apache.org/jira/browse/YARN-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867169#comment-13867169 ] Karthik Kambatla commented on YARN-1574: The test looks much cleaner now. I should have thought of this earlier - we should remove the dispatcher added to the RM's serviceList (through addIfService) and add the new one as well. Otherwise, we could be creating a memory leak. Not sure if we could add a unit test for this, but it would be nice to make sure there is no such leak using jmap or something. I also noticed CompositeService#removeService is broken. I am okay with fixing that too in the same JIRA or a different JIRA. Either way, our tests should probably cover that too. When RM transits from Active to Standby, the same eventDispatcher should not be registered more than once Key: YARN-1574 URL: https://issues.apache.org/jira/browse/YARN-1574 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Priority: Blocker Attachments: YARN-1574.1.patch, YARN-1574.1.patch, YARN-1574.2.patch, YARN-1574.2.patch, YARN-1574.3.patch, YARN-1574.4.patch Currently, we move rmDispatcher out of ActiveService. But we still register the event dispatchers, such as schedulerDispatcher and RMAppEventDispatcher, when we initiate the ActiveService. Almost every time we transition the RM from Active to Standby, we need to re-initiate the ActiveService. That means we will register the same event dispatcher again, which will cause the same event to be handled several times. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
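The double-registration problem can be illustrated with a toy dispatcher. This is only a sketch of the invariant being discussed (at most one handler per event type across transitions), not the actual AsyncDispatcher or the patch:
{code}
import java.util.HashMap;
import java.util.Map;

class DispatcherSketch {
  interface Handler { void handle(Object event); }
  private final Map<Class<?>, Handler> handlers = new HashMap<Class<?>, Handler>();

  // Registering twice for the same event type replaces the old handler
  // instead of adding a second one, avoiding duplicate event delivery
  // after an Active -> Standby -> Active cycle.
  void register(Class<?> eventType, Handler handler) {
    handlers.put(eventType, handler);
  }

  void dispatch(Object event) {
    Handler h = handlers.get(event.getClass());
    if (h != null) {
      h.handle(event); // each event is handled exactly once
    }
  }
}
{code}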
[jira] [Commented] (YARN-1041) Protocol changes for RM to bind and notify a restarted AM of existing containers
[ https://issues.apache.org/jira/browse/YARN-1041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867184#comment-13867184 ] Vinod Kumar Vavilapalli commented on YARN-1041: --- Knowing well this is an early patch, quick review comments: - Document RegisterApplicationMasterResponse.getRunningContainers() with its semantics. Particularly about allocated, acquired, and reserved containers from the previous AM being killed. - yarn_service_protos.proto: running_containers -> containers_from_previous_attempt? Running or not may (or may not) change depending on the implementation. If we do this, we should change the name everywhere. - YarnScheduler.getAllRunningContainers(): Rename and also deal with an app-attempt instead of an application? Because that is how the scheduler looks at containers - per attempt. - Per [YARN-1490 comment|https://issues.apache.org/jira/browse/YARN-1490?focusedCommentId=13866267page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13866267], we need to take care of NMTokens too. The code duplication in schedulers is a bigger problem that needs to be addressed. But it has to start somewhere. I am +1 for starting with an AbstractYarnScheduler. Protocol changes for RM to bind and notify a restarted AM of existing containers Key: YARN-1041 URL: https://issues.apache.org/jira/browse/YARN-1041 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 3.0.0 Reporter: Steve Loughran Assignee: Jian He Attachments: YARN-1041.1.patch For long-lived containers we don't want the AM to be a SPOF. When the RM restarts a (failed) AM, it should be given the list of containers it had already been allocated. The AM should then be able to contact the NMs to get details on them. NMs would also need to do any binding of the containers needed to handle a moved/restarted AM. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
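For reference, here is roughly how an AM might consume the field under the rename suggested above. getContainersFromPreviousAttempts() is the name floated in the comment and rebind() is a hypothetical application hook, so the committed API may well differ:
{code}
import java.util.List;
import org.apache.hadoop.yarn.api.protocolrecords.RegisterApplicationMasterResponse;
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.ContainerId;
import org.apache.hadoop.yarn.api.records.NodeId;

public class AmReconnect {
  /** Re-adopt containers surviving from the previous attempt. */
  public void reconnect(RegisterApplicationMasterResponse resp) {
    List<Container> previous = resp.getContainersFromPreviousAttempts();
    for (Container container : previous) {
      // Per the review note above, the AM also needs fresh NMTokens
      // before it can talk to the NodeManagers hosting these containers.
      rebind(container.getNodeId(), container.getId());
    }
  }

  private void rebind(NodeId node, ContainerId id) { /* app-specific */ }
}
{code}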
[jira] [Created] (YARN-1582) Capacity Scheduler: add a maximum-allocation-mb setting per queue
Thomas Graves created YARN-1582: --- Summary: Capacity Scheduler: add a maximum-allocation-mb setting per queue Key: YARN-1582 URL: https://issues.apache.org/jira/browse/YARN-1582 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Affects Versions: 2.2.0, 0.23.10, 3.0.0 Reporter: Thomas Graves Assignee: Thomas Graves We want to allow certain queues to use larger container sizes while limiting other queues to smaller container sizes. Setting it per queue will help prevent abuse, help limit the impact of reservations, and allow changes in the maximum container size to be rolled out more easily. One reason this is needed is that more application types are becoming available on YARN, and certain applications require more memory to run efficiently. While we want to allow for that, we don't want other applications to abuse it and start requesting bigger containers than they really need. Note that we could have this based on application type, but that might not be totally accurate either since, for example, you might want to allow certain users on MapReduce to use larger containers, while limiting other users of MapReduce to smaller containers. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
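Assuming the Capacity Scheduler's usual per-queue property convention, the setting might be used as below. The per-queue key shown is hypothetical until the patch settles it:
{code}
import org.apache.hadoop.conf.Configuration;

public class QueueMaxAllocationExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // The cluster-wide ceiling stays in place...
    conf.setInt("yarn.scheduler.maximum-allocation-mb", 16384);
    // ...while individual queues can be capped lower. These per-queue keys
    // are illustrative, following the root.<queue> naming convention.
    conf.setInt("yarn.scheduler.capacity.root.default.maximum-allocation-mb", 4096);
    conf.setInt("yarn.scheduler.capacity.root.largemem.maximum-allocation-mb", 16384);
    System.out.println("default queue cap: "
        + conf.getInt("yarn.scheduler.capacity.root.default.maximum-allocation-mb", -1));
  }
}
{code}
With such a layout, a memory-hungry application type can be steered to a queue that permits large containers, while the rest of the cluster keeps a tighter per-container cap.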
[jira] [Updated] (YARN-1033) Expose RM active/standby state to Web UI and REST API
[ https://issues.apache.org/jira/browse/YARN-1033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1033: --- Summary: Expose RM active/standby state to Web UI and REST API (was: Expose RM active/standby state to web UI and metrics) Expose RM active/standby state to Web UI and REST API - Key: YARN-1033 URL: https://issues.apache.org/jira/browse/YARN-1033 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.1.0-beta Reporter: Nemon Lou Assignee: Karthik Kambatla Both active and standby RMs shall expose their web server and show their current state (active or standby) on the web page. Cluster metrics also need this state for monitoring. Standby RM web services shall refuse client requests unless querying for RM state. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Resolved] (YARN-1581) Fair Scheduler: containers that get reserved create container token too early
[ https://issues.apache.org/jira/browse/YARN-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli resolved YARN-1581. --- Resolution: Duplicate Duplicate of YARN-1417. Fair Scheduler: containers that get reserved create container token too early Key: YARN-1581 URL: https://issues.apache.org/jira/browse/YARN-1581 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.2.0 Environment: CentOS 6.4 Reporter: German Florez-Larrahondo Priority: Minor When using the FairScheduler with long-running jobs and over-subscribed servers, I hit (what I believe is) the same issue reported in https://issues.apache.org/jira/browse/YARN-180 , which talked specifically about the CapacityScheduler. In my case, with the FairScheduler, I am seeing those corner cases where the NodeManager is finally ready to start a container that was reserved but the token had already expired. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1033) Expose RM active/standby state to Web UI and REST API
[ https://issues.apache.org/jira/browse/YARN-1033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1033: --- Attachment: yarn-1033-1.patch Straightforward patch that adds the HA state information to the ClusterInfo REST response and the Web UI. Expose RM active/standby state to Web UI and REST API - Key: YARN-1033 URL: https://issues.apache.org/jira/browse/YARN-1033 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.1.0-beta Reporter: Nemon Lou Assignee: Karthik Kambatla Attachments: yarn-1033-1.patch Both active and standby RMs shall expose their web server and show their current state (active or standby) on the web page. Users should be able to access this information through the REST API as well. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1033) Expose RM active/standby state to Web UI and REST API
[ https://issues.apache.org/jira/browse/YARN-1033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1033: --- Description: Both active and standby RMs shall expose their web server and show their current state (active or standby) on the web page. Users should be able to access this information through the REST API as well. (was: Both active and standby RMs shall expose their web server and show their current state (active or standby) on the web page. Cluster metrics also need this state for monitoring. Standby RM web services shall refuse client requests unless querying for RM state.) Expose RM active/standby state to Web UI and REST API - Key: YARN-1033 URL: https://issues.apache.org/jira/browse/YARN-1033 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.1.0-beta Reporter: Nemon Lou Assignee: Karthik Kambatla Attachments: yarn-1033-1.patch Both active and standby RMs shall expose their web server and show their current state (active or standby) on the web page. Users should be able to access this information through the REST API as well. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1567) In Fair Scheduler, allow empty queues to change between leaf and parent on allocation file reload
[ https://issues.apache.org/jira/browse/YARN-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867219#comment-13867219 ] Vinod Kumar Vavilapalli commented on YARN-1567: --- bq. I don't think that we should use conciliation as a sole basis for stopping new features I get the use case. I didn't say that we should stop this. If we are doing this for FS, I was mentioning that we should do it for CS too, assuming the validity of the use-case. bq. The Fair Scheduler does not have a notion of stopped queues. bq. Queues have a lot in common once they exist, but the way queues are configured, loaded, and managed may be different to the point of irreconcilable between the Capacity Scheduler and Fair Scheduler. We've done the reconciliation in Hadoop-1, though not in the 1.x line; it happened in 0.21/0.22. So I can see how it can be done in Hadoop-2 also. In Fair Scheduler, allow empty queues to change between leaf and parent on allocation file reload - Key: YARN-1567 URL: https://issues.apache.org/jira/browse/YARN-1567 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1567-1.patch, YARN-1567.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1579) ActiveRMInfoProto fields should be optional
[ https://issues.apache.org/jira/browse/YARN-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1579: --- Attachment: yarn-1579-1.patch Trivial patch that makes the fields optional. ActiveRMInfoProto fields should be optional --- Key: YARN-1579 URL: https://issues.apache.org/jira/browse/YARN-1579 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: yarn-1579-1.patch Per discussion on YARN-1568, ActiveRMInfoProto should have optional fields instead of required fields. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-888) clean up POM dependencies
[ https://issues.apache.org/jira/browse/YARN-888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867248#comment-13867248 ] Vinod Kumar Vavilapalli commented on YARN-888: -- Yeah. I can see that. Either way, it is a convention. Both are equally hard/easy to enforce. Maybe we should put comments in the pom files about the agreed-upon convention - the one that the last patch implements. I can give it a try on my setup in the meanwhile. Tx. clean up POM dependencies - Key: YARN-888 URL: https://issues.apache.org/jira/browse/YARN-888 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: YARN-888.patch, YARN-888.patch, yarn-888-2.patch Intermediate 'pom' modules define dependencies inherited by leaf modules. This is causing issues in the IntelliJ IDE. We should normalize the leaf modules like in common, hdfs, and tools, where all dependencies are defined in each leaf module and the intermediate 'pom' modules do not define any dependency. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1033) Expose RM active/standby state to Web UI and REST API
[ https://issues.apache.org/jira/browse/YARN-1033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867254#comment-13867254 ] Karthik Kambatla commented on YARN-1033: The posted patch doesn't add the information to JMX. It is slightly more involved, and I thought we could do it when we need it. Expose RM active/standby state to Web UI and REST API - Key: YARN-1033 URL: https://issues.apache.org/jira/browse/YARN-1033 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.1.0-beta Reporter: Nemon Lou Assignee: Karthik Kambatla Attachments: yarn-1033-1.patch Both active and standby RMs shall expose their web server and show their current state (active or standby) on the web page. Users should be able to access this information through the REST API as well. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1567) In Fair Scheduler, allow empty queues to change between leaf and parent on allocation file reload
[ https://issues.apache.org/jira/browse/YARN-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867262#comment-13867262 ] Sandy Ryza commented on YARN-1567: -- bq. Didn't say that we should stop this. If we are doing this for FS, I was mentioning that we should do it for CS too assuming the validity of the use-case. Ah, sorry, misunderstood. In that case, I totally agree we should add this in a way that keeps the semantics consistent across the schedulers. I just read up on the meaning of QueueState in the Capacity Scheduler. Looks like we should be able to add the same to the Fair Scheduler. I think queues should need to be empty, not necessarily stopped, to do a leaf-parent change. Like the leaf-parent change, changing a queue from empty to stopped occurs on scheduler configuration reload. I see no reason that we should require two reloads to turn an empty leaf queue into a parent. We should of course ensure that we are synchronizing properly so that an app does not end up getting submitted to the queue that is now a parent. In Fair Scheduler, allow empty queues to change between leaf and parent on allocation file reload - Key: YARN-1567 URL: https://issues.apache.org/jira/browse/YARN-1567 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1567-1.patch, YARN-1567.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
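As a concrete illustration of the reload scenario (allocation-file contents shown as Java strings for brevity; the queue names are made up):
{code}
public class AllocFileExample {
  // Before the reload: "research" is an empty leaf queue.
  static final String BEFORE =
      "<allocations>\n"
      + "  <queue name=\"research\"/>\n"
      + "</allocations>\n";

  // After a single reload: "research" becomes a parent with a child leaf.
  static final String AFTER =
      "<allocations>\n"
      + "  <queue name=\"research\">\n"
      + "    <queue name=\"genomics\"/>\n"
      + "  </queue>\n"
      + "</allocations>\n";
}
{code}
The synchronization point in the comment above is exactly the window between the reload taking effect and an in-flight submission still targeting "research" as a leaf.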
[jira] [Commented] (YARN-1579) ActiveRMInfoProto fields should be optional
[ https://issues.apache.org/jira/browse/YARN-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867265#comment-13867265 ] Sandy Ryza commented on YARN-1579: -- +1 ActiveRMInfoProto fields should be optional --- Key: YARN-1579 URL: https://issues.apache.org/jira/browse/YARN-1579 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: yarn-1579-1.patch Per discussion on YARN-1568, ActiveRMInfoProto should have optional fields instead of required fields. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1579) ActiveRMInfoProto fields should be optional
[ https://issues.apache.org/jira/browse/YARN-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867267#comment-13867267 ] Hadoop QA commented on YARN-1579: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12622280/yarn-1579-1.patch against trunk revision . {color:red}-1 patch{color}. Trunk compilation may be broken. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2844//console This message is automatically generated. ActiveRMInfoProto fields should be optional --- Key: YARN-1579 URL: https://issues.apache.org/jira/browse/YARN-1579 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: yarn-1579-1.patch Per discussion on YARN-1568, ActiveRMInfoProto should have optional fields instead of required fields. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1033) Expose RM active/standby state to Web UI and REST API
[ https://issues.apache.org/jira/browse/YARN-1033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867266#comment-13867266 ] Hadoop QA commented on YARN-1033: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12622276/yarn-1033-1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2843//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2843//console This message is automatically generated. Expose RM active/standby state to Web UI and REST API - Key: YARN-1033 URL: https://issues.apache.org/jira/browse/YARN-1033 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.1.0-beta Reporter: Nemon Lou Assignee: Karthik Kambatla Attachments: yarn-1033-1.patch Both active and standby RMs shall expose their web server and show their current state (active or standby) on the web page. Users should be able to access this information through the REST API as well. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (YARN-1583) Ineffective state check in FairSchedulerAppsBlock#render()
Ted Yu created YARN-1583: Summary: Ineffective state check in FairSchedulerAppsBlock#render() Key: YARN-1583 URL: https://issues.apache.org/jira/browse/YARN-1583 Project: Hadoop YARN Issue Type: Bug Reporter: Ted Yu Priority: Minor Starting line 90: {code} for (RMApp app : apps.values()) { if (reqAppStates != null && !reqAppStates.contains(app.getState())) { {code} reqAppStates is of type YarnApplicationState. app.getState() returns RMAppState. These are two different enum types. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
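The comparison needs to happen in a single enum domain. A minimal sketch of a fix, assuming the RMApp#createApplicationState() helper (which maps the internal RMAppState to the public YarnApplicationState) is usable at this point in the block:
{code}
for (RMApp app : apps.values()) {
  if (reqAppStates != null
      && !reqAppStates.contains(app.createApplicationState())) {
    continue; // the app's public-facing state was not requested
  }
  // ... render the table row for this app ...
}
{code}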
[jira] [Commented] (YARN-1033) Expose RM active/standby state to Web UI and REST API
[ https://issues.apache.org/jira/browse/YARN-1033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867280#comment-13867280 ] Karthik Kambatla commented on YARN-1033: Deployed a pseudo-dist cluster, verified on rm-address/cluster/cluster (About page) and rm-address/ws/v1/cluster/info (REST API) that the haState is reflected as I toggle between Active and Standby states. Expose RM active/standby state to Web UI and REST API - Key: YARN-1033 URL: https://issues.apache.org/jira/browse/YARN-1033 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.1.0-beta Reporter: Nemon Lou Assignee: Karthik Kambatla Attachments: yarn-1033-1.patch Both active and standby RMs shall expose their web server and show their current state (active or standby) on the web page. Users should be able to access this information through the REST API as well. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
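The REST half of that check can be scripted. A small sketch, assuming a pseudo-distributed RM on localhost:8088; the haState field name follows the patch, and the exact JSON shape may differ:
{code}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class ClusterInfoCheck {
  public static void main(String[] args) throws Exception {
    URL url = new URL("http://localhost:8088/ws/v1/cluster/info");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestProperty("Accept", "application/json");
    BufferedReader in = new BufferedReader(
        new InputStreamReader(conn.getInputStream()));
    String line;
    while ((line = in.readLine()) != null) {
      // Expect something like: {"clusterInfo":{...,"haState":"ACTIVE",...}}
      System.out.println(line);
    }
    in.close();
  }
}
{code}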
[jira] [Updated] (YARN-1579) ActiveRMInfoProto fields should be optional
[ https://issues.apache.org/jira/browse/YARN-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1579: --- Attachment: yarn-1579-1.patch Uploading the same patch again. ActiveRMInfoProto fields should be optional --- Key: YARN-1579 URL: https://issues.apache.org/jira/browse/YARN-1579 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: yarn-1579-1.patch, yarn-1579-1.patch Per discussion on YARN-1568, ActiveRMInfoProto should have optional fields instead of required fields. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1033) Expose RM active/standby state to Web UI and REST API
[ https://issues.apache.org/jira/browse/YARN-1033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867281#comment-13867281 ] Sandy Ryza commented on YARN-1033: -- +1 Expose RM active/standby state to Web UI and REST API - Key: YARN-1033 URL: https://issues.apache.org/jira/browse/YARN-1033 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.1.0-beta Reporter: Nemon Lou Assignee: Karthik Kambatla Attachments: yarn-1033-1.patch Both active and standby RMs shall expose their web server and show their current state (active or standby) on the web page. Users should be able to access this information through the REST API as well. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1033) Expose RM active/standby state to Web UI and REST API
[ https://issues.apache.org/jira/browse/YARN-1033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867283#comment-13867283 ] Karthik Kambatla commented on YARN-1033: Thanks Sandy. I'll commit this tomorrow morning PT if there are no objections. Expose RM active/standby state to Web UI and REST API - Key: YARN-1033 URL: https://issues.apache.org/jira/browse/YARN-1033 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.1.0-beta Reporter: Nemon Lou Assignee: Karthik Kambatla Attachments: yarn-1033-1.patch Both active and standby RMs shall expose their web server and show their current state (active or standby) on the web page. Users should be able to access this information through the REST API as well. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1496) Protocol additions to allow moving apps between queues
[ https://issues.apache.org/jira/browse/YARN-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867293#comment-13867293 ] Sandy Ryza commented on YARN-1496: -- Thought ChangeApplicationQueue would be better because it's shorter, but I see what you're saying. MoveApplicationAcrossQueues sounds fine to me. Uploaded a new patch with MoveApplicationAcrossQueues and marked the APIs as Unstable. Protocol additions to allow moving apps between queues -- Key: YARN-1496 URL: https://issues.apache.org/jira/browse/YARN-1496 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1496-1.patch, YARN-1496-2.patch, YARN-1496-3.patch, YARN-1496-4.patch, YARN-1496-5.patch, YARN-1496.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1496) Protocol additions to allow moving apps between queues
[ https://issues.apache.org/jira/browse/YARN-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-1496: - Attachment: YARN-1496-5.patch Protocol additions to allow moving apps between queues -- Key: YARN-1496 URL: https://issues.apache.org/jira/browse/YARN-1496 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1496-1.patch, YARN-1496-2.patch, YARN-1496-3.patch, YARN-1496-4.patch, YARN-1496-5.patch, YARN-1496.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1525) Web UI should redirect to active RM when HA is enabled.
[ https://issues.apache.org/jira/browse/YARN-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867294#comment-13867294 ] Karthik Kambatla commented on YARN-1525: I am sorry if I came across wrong; I do see merit in redirecting to the Active. I think we should redirect in a way that the HA state information exposed by YARN-1033 continues to be available on both Active and Standby RMs. Web UI should redirect to active RM when HA is enabled. --- Key: YARN-1525 URL: https://issues.apache.org/jira/browse/YARN-1525 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Cindy Li When failover happens, the web UI should redirect to the current active RM. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1490) RM should optionally not kill all containers when an ApplicationMaster exits
[ https://issues.apache.org/jira/browse/YARN-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-1490: -- Attachment: YARN-1490.7.patch RM should optionally not kill all containers when an ApplicationMaster exits Key: YARN-1490 URL: https://issues.apache.org/jira/browse/YARN-1490 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Jian He Attachments: YARN-1490.1.patch, YARN-1490.2.patch, YARN-1490.3.patch, YARN-1490.4.patch, YARN-1490.5.patch, YARN-1490.6.patch, YARN-1490.7.patch This is needed to enable work-preserving AM restart. Some apps can choose to reconnect with old running containers; some may not want to. This should be an option. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1506) Replace set resource change on RMNode/SchedulerNode directly with event notification.
[ https://issues.apache.org/jira/browse/YARN-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867315#comment-13867315 ] Junping Du commented on YARN-1506: -- bq. Patch looks good to me mostly, Bikas Saha/ Vinod Kumar Vavilapalli you may also want to take a look. [~bikassaha] and [~vinodkv], would you help to review it? Thanks! Replace set resource change on RMNode/SchedulerNode directly with event notification. - Key: YARN-1506 URL: https://issues.apache.org/jira/browse/YARN-1506 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, scheduler Reporter: Junping Du Assignee: Junping Du Priority: Blocker Attachments: YARN-1506-v1.patch, YARN-1506-v2.patch, YARN-1506-v3.patch, YARN-1506-v4.patch, YARN-1506-v5.patch, YARN-1506-v6.patch According to Vinod's comments on YARN-312 (https://issues.apache.org/jira/browse/YARN-312?focusedCommentId=13846087page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13846087), we should replace RMNode.setResourceOption() with some resource change event. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1490) RM should optionally not kill all containers when an ApplicationMaster exits
[ https://issues.apache.org/jira/browse/YARN-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867323#comment-13867323 ] Hadoop QA commented on YARN-1490: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12622297/YARN-1490.7.patch against trunk revision . {color:red}-1 patch{color}. Trunk compilation may be broken. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2845//console This message is automatically generated. RM should optionally not kill all containers when an ApplicationMaster exits Key: YARN-1490 URL: https://issues.apache.org/jira/browse/YARN-1490 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Jian He Attachments: YARN-1490.1.patch, YARN-1490.2.patch, YARN-1490.3.patch, YARN-1490.4.patch, YARN-1490.5.patch, YARN-1490.6.patch, YARN-1490.7.patch This is needed to enable work-preserving AM restart. Some apps can choose to reconnect with old running containers; some may not want to. This should be an option. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1496) Protocol additions to allow moving apps between queues
[ https://issues.apache.org/jira/browse/YARN-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867329#comment-13867329 ] Hadoop QA commented on YARN-1496: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12622295/YARN-1496-5.patch against trunk revision . {color:red}-1 patch{color}. Trunk compilation may be broken. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2847//console This message is automatically generated. Protocol additions to allow moving apps between queues -- Key: YARN-1496 URL: https://issues.apache.org/jira/browse/YARN-1496 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1496-1.patch, YARN-1496-2.patch, YARN-1496-3.patch, YARN-1496-4.patch, YARN-1496-5.patch, YARN-1496.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1579) ActiveRMInfoProto fields should be optional
[ https://issues.apache.org/jira/browse/YARN-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867336#comment-13867336 ] Hadoop QA commented on YARN-1579: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12622293/yarn-1579-1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2846//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2846//console This message is automatically generated. ActiveRMInfoProto fields should be optional --- Key: YARN-1579 URL: https://issues.apache.org/jira/browse/YARN-1579 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: yarn-1579-1.patch, yarn-1579-1.patch Per discussion on YARN-1568, ActiveRMInfoProto should have optional fields instead of required fields. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1506) Replace set resource change on RMNode/SchedulerNode directly with event notification.
[ https://issues.apache.org/jira/browse/YARN-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867355#comment-13867355 ] Bikas Saha commented on YARN-1506: -- I will get to this by the weekend. Sorry for the delay Replace set resource change on RMNode/SchedulerNode directly with event notification. - Key: YARN-1506 URL: https://issues.apache.org/jira/browse/YARN-1506 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, scheduler Reporter: Junping Du Assignee: Junping Du Priority: Blocker Attachments: YARN-1506-v1.patch, YARN-1506-v2.patch, YARN-1506-v3.patch, YARN-1506-v4.patch, YARN-1506-v5.patch, YARN-1506-v6.patch According to Vinod's comments on YARN-312 (https://issues.apache.org/jira/browse/YARN-312?focusedCommentId=13846087page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13846087), we should replace RMNode.setResourceOption() with some resource change event. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1579) ActiveRMInfoProto fields should be optional
[ https://issues.apache.org/jira/browse/YARN-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-1579: --- Priority: Trivial (was: Major) ActiveRMInfoProto fields should be optional --- Key: YARN-1579 URL: https://issues.apache.org/jira/browse/YARN-1579 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Trivial Attachments: yarn-1579-1.patch, yarn-1579-1.patch Per discussion on YARN-1568, ActiveRMInfoProto should have optional fields instead of required fields. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1399) Allow users to annotate an application with multiple tags
[ https://issues.apache.org/jira/browse/YARN-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867362#comment-13867362 ] Karthik Kambatla commented on YARN-1399: [~vinodkv], [~sandyr] - do we agree on setting the default scope to ALL and having Oozie set it explicitly to OWN? Allow users to annotate an application with multiple tags - Key: YARN-1399 URL: https://issues.apache.org/jira/browse/YARN-1399 Project: Hadoop YARN Issue Type: Improvement Reporter: Zhijie Shen Nowadays, when submitting an application, users can fill the applicationType field to facilitate searching for it later. IMHO, it's good to accept multiple tags to allow users to describe their applications in multiple aspects, including the application type. Then, searching by tags may be more efficient for users to reach their desired application collection. It's pretty much like the tag system of online photo/video/music services, etc. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1579) ActiveRMInfoProto fields should be optional
[ https://issues.apache.org/jira/browse/YARN-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867376#comment-13867376 ] Hudson commented on YARN-1579: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4982 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4982/]) YARN-1579. ActiveRMInfoProto fields should be optional (kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1557001) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/server/yarn_server_resourcemanager_service_protos.proto ActiveRMInfoProto fields should be optional --- Key: YARN-1579 URL: https://issues.apache.org/jira/browse/YARN-1579 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Trivial Fix For: 2.4.0 Attachments: yarn-1579-1.patch, yarn-1579-1.patch Per discussion on YARN-1568, ActiveRMInfoProto should have optional fields instead of required fields. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1490) RM should optionally not kill all containers when an ApplicationMaster exits
[ https://issues.apache.org/jira/browse/YARN-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867386#comment-13867386 ] Jian He commented on YARN-1490: --- - The new patch gets rid of the local flag transferStateFromPreviousAttempt inside RMAppAttemptImpl and SchedulerApplication, and notifies transferContainersFromPreviousAttempt through AppAddedSchedulerEvent and keepContainersAcrossAttempts through AppRemovedSchedulerEvent. - Similarly, RMAppAttempt notifies RMApp to transfer the state through an event. - Fixed a bug in RMAppAttemptImpl.BaseFinalTransition: it missed setting the flag to false in RMAppFailedAttemptEvent if this is the last attempt or an unmanaged AM. RM should optionally not kill all containers when an ApplicationMaster exits Key: YARN-1490 URL: https://issues.apache.org/jira/browse/YARN-1490 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Jian He Attachments: YARN-1490.1.patch, YARN-1490.2.patch, YARN-1490.3.patch, YARN-1490.4.patch, YARN-1490.5.patch, YARN-1490.6.patch, YARN-1490.7.patch, YARN-1490.8.patch This is needed to enable work-preserving AM restart. Some apps can choose to reconnect with old running containers; some may not want to. This should be an option. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
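The shape of carrying that decision on the scheduler event itself, rather than as mutable state on the attempt, is roughly the following. The names mirror the comment, but this is an illustrative sketch, not the patch:
{code}
class AppAddedSchedulerEventSketch {
  private final String applicationId;
  private final boolean transferContainersFromPreviousAttempt;

  AppAddedSchedulerEventSketch(String applicationId,
      boolean transferContainersFromPreviousAttempt) {
    this.applicationId = applicationId;
    this.transferContainersFromPreviousAttempt =
        transferContainersFromPreviousAttempt;
  }

  // The scheduler reads the flag while handling the event, so no separate
  // mutable flag on RMAppAttemptImpl/SchedulerApplication is needed.
  boolean shouldTransferContainers() {
    return transferContainersFromPreviousAttempt;
  }
}
{code}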
[jira] [Updated] (YARN-1490) RM should optionally not kill all containers when an ApplicationMaster exits
[ https://issues.apache.org/jira/browse/YARN-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-1490: -- Attachment: YARN-1490.9.patch Fixed one test name in TestRMAppAttemptTransitions. RM should optionally not kill all containers when an ApplicationMaster exits Key: YARN-1490 URL: https://issues.apache.org/jira/browse/YARN-1490 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Jian He Attachments: YARN-1490.1.patch, YARN-1490.2.patch, YARN-1490.3.patch, YARN-1490.4.patch, YARN-1490.5.patch, YARN-1490.6.patch, YARN-1490.7.patch, YARN-1490.8.patch, YARN-1490.9.patch This is needed to enable work-preserving AM restart. Some apps can choose to reconnect with old running containers; some may not want to. This should be an option. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1496) Protocol additions to allow moving apps between queues
[ https://issues.apache.org/jira/browse/YARN-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-1496: - Attachment: YARN-1496-6.patch Protocol additions to allow moving apps between queues -- Key: YARN-1496 URL: https://issues.apache.org/jira/browse/YARN-1496 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1496-1.patch, YARN-1496-2.patch, YARN-1496-3.patch, YARN-1496-4.patch, YARN-1496-5.patch, YARN-1496-6.patch, YARN-1496.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1490) RM should optionally not kill all containers when an ApplicationMaster exits
[ https://issues.apache.org/jira/browse/YARN-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867408#comment-13867408 ] Hadoop QA commented on YARN-1490: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12622310/YARN-1490.8.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 15 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-tools/hadoop-sls hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2848//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/2848//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2848//console This message is automatically generated. RM should optionally not kill all containers when an ApplicationMaster exits Key: YARN-1490 URL: https://issues.apache.org/jira/browse/YARN-1490 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Jian He Attachments: YARN-1490.1.patch, YARN-1490.2.patch, YARN-1490.3.patch, YARN-1490.4.patch, YARN-1490.5.patch, YARN-1490.6.patch, YARN-1490.7.patch, YARN-1490.8.patch, YARN-1490.9.patch This is needed to enable work-preserving AM restart. Some apps can choose to reconnect with old running containers; some may not want to. This should be an option. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-888) clean up POM dependencies
[ https://issues.apache.org/jira/browse/YARN-888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867412#comment-13867412 ] Vinod Kumar Vavilapalli commented on YARN-888: -- Playing with the patch. hadoop-yarn-project's pom.xml has some deps. So this indeed looks like the hybrid approach? I ran dependency:analyze, seeing this for hadoop-yarn-api {code} [INFO] --- maven-dependency-plugin:2.2:analyze (default-cli) @ hadoop-yarn-api --- [WARNING] Unused declared dependencies found: [WARNING]org.apache.hadoop:hadoop-common:jar:3.0.0-SNAPSHOT:provided [WARNING]org.apache.hadoop:hadoop-annotations:jar:3.0.0-SNAPSHOT:compile {code} I guess this is what you meant by the following in the pom files: {code} <!-- 'mvn dependency:analyze' fails to detect use of this dependency --> {code} If the dependency plugin is broken like this, we will have to depend (no pun intended) on something else for correctness. We should set up a single-node cluster at least to ensure that all is well. clean up POM dependencies - Key: YARN-888 URL: https://issues.apache.org/jira/browse/YARN-888 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: YARN-888.patch, YARN-888.patch, yarn-888-2.patch Intermediate 'pom' modules define dependencies inherited by leaf modules. This is causing issues in the IntelliJ IDE. We should normalize the leaf modules like in common, hdfs, and tools, where all dependencies are defined in each leaf module and the intermediate 'pom' modules do not define any dependency. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1033) Expose RM active/standby state to Web UI and REST API
[ https://issues.apache.org/jira/browse/YARN-1033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867413#comment-13867413 ] Nemon Lou commented on YARN-1033: - Thanks, Karthik Kambatla. You are really efficient. +1 (non-binding). Agree that the HA state in JMX can be added later in another JIRA when needed. Expose RM active/standby state to Web UI and REST API - Key: YARN-1033 URL: https://issues.apache.org/jira/browse/YARN-1033 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.1.0-beta Reporter: Nemon Lou Assignee: Karthik Kambatla Attachments: yarn-1033-1.patch Both active and standby RMs shall expose their web server and show their current state (active or standby) on the web page. Users should be able to access this information through the REST API as well. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1490) RM should optionally not kill all containers when an ApplicationMaster exits
[ https://issues.apache.org/jira/browse/YARN-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-1490: -- Attachment: YARN-1490.10.patch Fixed the findbugs warning. RM should optionally not kill all containers when an ApplicationMaster exits Key: YARN-1490 URL: https://issues.apache.org/jira/browse/YARN-1490 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Jian He Attachments: YARN-1490.1.patch, YARN-1490.10.patch, YARN-1490.2.patch, YARN-1490.3.patch, YARN-1490.4.patch, YARN-1490.5.patch, YARN-1490.6.patch, YARN-1490.7.patch, YARN-1490.8.patch, YARN-1490.9.patch This is needed to enable work-preserving AM restart. Some apps can choose to reconnect with old running containers; some may not want to. This should be an option. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1525) Web UI should redirect to active RM when HA is enabled.
[ https://issues.apache.org/jira/browse/YARN-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867420#comment-13867420 ] Cindy Li commented on YARN-1525: Ok. I'll try to accommodate the change in YARN-1033. Web UI should redirect to active RM when HA is enabled. --- Key: YARN-1525 URL: https://issues.apache.org/jira/browse/YARN-1525 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Cindy Li When failover happens, the web UI should redirect to the current active RM. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1506) Replace set resource change on RMNode/SchedulerNode directly with event notification.
[ https://issues.apache.org/jira/browse/YARN-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867424#comment-13867424 ] Junping Du commented on YARN-1506: -- Thank you, Bikas! Replace set resource change on RMNode/SchedulerNode directly with event notification. - Key: YARN-1506 URL: https://issues.apache.org/jira/browse/YARN-1506 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, scheduler Reporter: Junping Du Assignee: Junping Du Priority: Blocker Attachments: YARN-1506-v1.patch, YARN-1506-v2.patch, YARN-1506-v3.patch, YARN-1506-v4.patch, YARN-1506-v5.patch, YARN-1506-v6.patch According to Vinod's comments on YARN-312 (https://issues.apache.org/jira/browse/YARN-312?focusedCommentId=13846087page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13846087), we should replace RMNode.setResourceOption() with some resource change event. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1490) RM should optionally not kill all containers when an ApplicationMaster exits
[ https://issues.apache.org/jira/browse/YARN-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-1490: -- Attachment: YARN-1490.11.patch RM should optionally not kill all containers when an ApplicationMaster exits Key: YARN-1490 URL: https://issues.apache.org/jira/browse/YARN-1490 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Jian He Attachments: YARN-1490.1.patch, YARN-1490.10.patch, YARN-1490.11.patch, YARN-1490.2.patch, YARN-1490.3.patch, YARN-1490.4.patch, YARN-1490.5.patch, YARN-1490.6.patch, YARN-1490.7.patch, YARN-1490.8.patch, YARN-1490.9.patch This is needed to enable work-preserving AM restart. Some apps can choose to reconnect with old running containers; some may not want to. This should be an option. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1490) RM should optionally not kill all containers when an ApplicationMaster exits
[ https://issues.apache.org/jira/browse/YARN-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13867434#comment-13867434 ] Hadoop QA commented on YARN-1490: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12622324/YARN-1490.11.patch against trunk revision . {color:red}-1 patch{color}. Trunk compilation may be broken. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2852//console This message is automatically generated. RM should optionally not kill all containers when an ApplicationMaster exits Key: YARN-1490 URL: https://issues.apache.org/jira/browse/YARN-1490 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Jian He Attachments: YARN-1490.1.patch, YARN-1490.10.patch, YARN-1490.11.patch, YARN-1490.2.patch, YARN-1490.3.patch, YARN-1490.4.patch, YARN-1490.5.patch, YARN-1490.6.patch, YARN-1490.7.patch, YARN-1490.8.patch, YARN-1490.9.patch This is needed to enable work-preserving AM restart. Some apps can choose to reconnect with old running containers, some may not want to. This should be an option. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1490) RM should optionally not kill all containers when an ApplicationMaster exits
[ https://issues.apache.org/jira/browse/YARN-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13867440#comment-13867440 ] Hadoop QA commented on YARN-1490: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12622320/YARN-1490.10.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 15 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-tools/hadoop-sls hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2851//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2851//console This message is automatically generated. RM should optionally not kill all containers when an ApplicationMaster exits Key: YARN-1490 URL: https://issues.apache.org/jira/browse/YARN-1490 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Jian He Attachments: YARN-1490.1.patch, YARN-1490.10.patch, YARN-1490.11.patch, YARN-1490.2.patch, YARN-1490.3.patch, YARN-1490.4.patch, YARN-1490.5.patch, YARN-1490.6.patch, YARN-1490.7.patch, YARN-1490.8.patch, YARN-1490.9.patch This is needed to enable work-preserving AM restart. Some apps can choose to reconnect with old running containers, some may not want to. This should be an option. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1583) Ineffective state check in FairSchedulerAppsBlock#render()
[ https://issues.apache.org/jira/browse/YARN-1583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13867466#comment-13867466 ] Bangtao Zhou commented on YARN-1583: Line #84 {code}reqAppStates = new HashSet<RMAppState>(appStateStrings.length);{code} so what do you mean by "reqAppStates is of type YarnApplicationState"? Ineffective state check in FairSchedulerAppsBlock#render() -- Key: YARN-1583 URL: https://issues.apache.org/jira/browse/YARN-1583 Project: Hadoop YARN Issue Type: Bug Reporter: Ted Yu Priority: Minor Starting line 90: {code} for (RMApp app : apps.values()) { if (reqAppStates != null && !reqAppStates.contains(app.getState())) { {code} reqAppStates is of type YarnApplicationState. app.getState() returns RMAppState. These are two different enum types. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
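The type mismatch under discussion is easy to reproduce in isolation: Set.contains accepts any Object, so the check compiles, but equality between constants of two different enum classes is always false. A self-contained toy with stand-in enums (not the real YarnApplicationState/RMAppState) shows this:
{code}
import java.util.EnumSet;
import java.util.Set;

public class CrossEnumContains {
  enum YarnState { RUNNING, FINISHED } // stand-in for YarnApplicationState
  enum RMState   { RUNNING, FINISHED } // stand-in for RMAppState

  public static void main(String[] args) {
    Set<YarnState> reqAppStates = EnumSet.of(YarnState.RUNNING);
    // Compiles without error because contains() takes Object, but it can
    // never be true: an RMState constant is never equal to a YarnState
    // constant, even when the two have the same name.
    System.out.println(reqAppStates.contains(RMState.RUNNING)); // always false
    // A working filter has to convert one enum to the other before the
    // contains() check.
  }
}
{code}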
[jira] [Commented] (YARN-1496) Protocol additions to allow moving apps between queues
[ https://issues.apache.org/jira/browse/YARN-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13867465#comment-13867465 ] Hadoop QA commented on YARN-1496: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12622315/YARN-1496-6.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2850//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2850//console This message is automatically generated. Protocol additions to allow moving apps between queues -- Key: YARN-1496 URL: https://issues.apache.org/jira/browse/YARN-1496 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1496-1.patch, YARN-1496-2.patch, YARN-1496-3.patch, YARN-1496-4.patch, YARN-1496-5.patch, YARN-1496-6.patch, YARN-1496.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1583) Ineffective state check in FairSchedulerAppsBlock#render()
[ https://issues.apache.org/jira/browse/YARN-1583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13867472#comment-13867472 ] Ted Yu commented on YARN-1583: -- I was looking at this class in trunk. I don't see the above code snippet. Which branch are you checking? Ineffective state check in FairSchedulerAppsBlock#render() -- Key: YARN-1583 URL: https://issues.apache.org/jira/browse/YARN-1583 Project: Hadoop YARN Issue Type: Bug Reporter: Ted Yu Priority: Minor Starting line 90: {code} for (RMApp app : apps.values()) { if (reqAppStates != null && !reqAppStates.contains(app.getState())) { {code} reqAppStates is of type YarnApplicationState. app.getState() returns RMAppState. These are two different enum types. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1574) When RM transits from Active to Standby, the same eventDispatcher should not be registered more than once
[ https://issues.apache.org/jira/browse/YARN-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13867473#comment-13867473 ] Xuan Gong commented on YARN-1574: - bq. I should have thought of this earlier - we should remove the dispatcher added to the RM's serviceList (through addIfService) and add the new one as well. Otherwise, we could be creating a memory leak. Not sure if we could add a unit test for this, but would be nice to make sure there is no such leak using jmap or something. Fixed. bq. I also noticed CompositeService#removeService is broken. I am okay with fixing that too in the same JIRA or a different JIRA. Either way, our tests should probably cover that too. Fixed. When RM transits from Active to Standby, the same eventDispatcher should not be registered more than once Key: YARN-1574 URL: https://issues.apache.org/jira/browse/YARN-1574 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Priority: Blocker Attachments: YARN-1574.1.patch, YARN-1574.1.patch, YARN-1574.2.patch, YARN-1574.2.patch, YARN-1574.3.patch, YARN-1574.4.patch, YARN-1574.5.patch Currently, we move rmDispatcher out of ActiveService, but we still register the event dispatchers, such as schedulerDispatcher and RMAppEventDispatcher, when we initiate the ActiveService. Almost every time we transition the RM from Active to Standby, we need to re-initiate the ActiveService. That means we register the same event dispatcher again, which causes the same event to be handled several times. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
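To see why the remove step matters, consider a toy composite service (invented names, not the RM's actual classes): if each Active-to-Standby cycle only adds a freshly created dispatcher, the old dispatcher and its registered handlers stay in the service list forever.
{code}
import java.util.ArrayList;
import java.util.List;

public class ToyCompositeService {
  private final List<Object> services = new ArrayList<>();
  private Object dispatcher;

  void addService(Object s) {
    services.add(s);
  }

  boolean removeService(Object s) {
    return services.remove(s);
  }

  // Called on every Active -> Standby -> Active cycle.
  void resetDispatcher() {
    if (dispatcher != null) {
      // Without this line, the services list grows by one dispatcher per
      // cycle and the stale dispatcher keeps handling (duplicating) events.
      removeService(dispatcher);
    }
    dispatcher = new Object(); // stand-in for a fresh AsyncDispatcher
    addService(dispatcher);
  }
}
{code}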
[jira] [Updated] (YARN-1574) When RM transits from Active to Standby, the same eventDispatcher should not be registered more than once
[ https://issues.apache.org/jira/browse/YARN-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1574: Attachment: YARN-1574.5.patch When RM transits from Active to Standby, the same eventDispatcher should not be registered more than once Key: YARN-1574 URL: https://issues.apache.org/jira/browse/YARN-1574 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Priority: Blocker Attachments: YARN-1574.1.patch, YARN-1574.1.patch, YARN-1574.2.patch, YARN-1574.2.patch, YARN-1574.3.patch, YARN-1574.4.patch, YARN-1574.5.patch Currently, we move rmDispatcher out of ActiveService, but we still register the event dispatchers, such as schedulerDispatcher and RMAppEventDispatcher, when we initiate the ActiveService. Almost every time we transition the RM from Active to Standby, we need to re-initiate the ActiveService. That means we register the same event dispatcher again, which causes the same event to be handled several times. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1033) Expose RM active/standby state to Web UI and REST API
[ https://issues.apache.org/jira/browse/YARN-1033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13867484#comment-13867484 ] Xuan Gong commented on YARN-1033: - +1 LGTM Expose RM active/standby state to Web UI and REST API - Key: YARN-1033 URL: https://issues.apache.org/jira/browse/YARN-1033 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.1.0-beta Reporter: Nemon Lou Assignee: Karthik Kambatla Attachments: yarn-1033-1.patch Both the active and standby RM shall expose their web server and show their current state (active or standby) on the web page. Users should be able to access this information through the REST API as well. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1166) YARN 'appsFailed' metric should be of type 'counter'
[ https://issues.apache.org/jira/browse/YARN-1166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-1166: -- Attachment: YARN-1166.8.patch Thanks Vinod and Jian for your review. I uploaded a new patch. bq. One comment other than Jian's. runAppAttempt API can just take in an ApplicationID? Similarly, finishAppAttempt can just take appId and user. Refactored the code accordingly. bq. @zhijie, last time I checked YARN-915 should be caused by this. If so, can you add a unit test for the restart scenario. thanks! A test case for RM restart has been added. YARN 'appsFailed' metric should be of type 'counter' Key: YARN-1166 URL: https://issues.apache.org/jira/browse/YARN-1166 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Srimanth Gunturi Assignee: Zhijie Shen Priority: Blocker Attachments: YARN-1166.2.patch, YARN-1166.3.patch, YARN-1166.4.patch, YARN-1166.5.patch, YARN-1166.6.patch, YARN-1166.7.patch, YARN-1166.8.patch, YARN-1166.patch Currently in YARN's queue metrics, the cumulative metric 'appsFailed' is of type 'gauge' - which means the exact value will be reported. All other cumulative queue metrics (AppsSubmitted, AppsCompleted, AppsKilled) are of type 'counter' - meaning Ganglia will use the slope to provide deltas between time points. To be consistent, the AppsFailed metric should also be of type 'counter'. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
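For context, the gauge-versus-counter distinction maps onto Hadoop's metrics2 library roughly as follows. The class below mirrors the @Metric field pattern used by QueueMetrics but is only an illustrative sketch, not the attached patch:
{code}
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.MutableCounterInt;

@Metrics(context = "yarn")
public class ExampleQueueMetrics {

  // Before: a MutableGaugeInt, which reports the raw point-in-time value and
  // is inconsistent with the other cumulative app metrics:
  // @Metric("# of apps failed") MutableGaugeInt appsFailed;

  // After: a counter, matching AppsSubmitted/AppsCompleted/AppsKilled, so
  // monitoring tools like Ganglia can compute deltas from the slope.
  @Metric("# of apps failed") MutableCounterInt appsFailed;

  // Note: in a real metrics source, the metrics system instantiates the
  // annotated fields when the source is registered with
  // DefaultMetricsSystem; this sketch only shows the field-type change.
  public void appFailed() {
    appsFailed.incr(); // counters only ever increase
  }
}
{code}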
[jira] [Commented] (YARN-888) clean up POM dependencies
[ https://issues.apache.org/jira/browse/YARN-888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13867488#comment-13867488 ] Alejandro Abdelnur commented on YARN-888: - [~vinodkv], thx for taking the time to play with the patch. bq. hadoop-yarn-project's pom.xml has some deps ... This was an oversight on my end, as I traversed the parent poms starting from the leaves, and the yarn modules do not have hadoop-yarn-project as their parent. This means the dependencies there were not being used. I'm attaching a new patch removing the dependencies section from hadoop-yarn-project. Thanks for catching that. bq. I guess this was what you meant by Correct. However, I wouldn't say the plugin is broken, but it has limitations (it cannot detect usage of classes loaded via reflection, it cannot detect use of constants for primitive types and Strings, etc.). bq. We should set up a single node cluster at least to ensure that all is well. The produced TARBALL has the exact same set of JAR files, so I would not expect this to be an issue. However, just to be safe, I just did a build with the patch, started a minicluster, and ran a couple of example jobs. clean up POM dependencies - Key: YARN-888 URL: https://issues.apache.org/jira/browse/YARN-888 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: YARN-888.patch, YARN-888.patch, YARN-888.patch, yarn-888-2.patch Intermediate 'pom' modules define dependencies inherited by leaf modules. This is causing issues in the IntelliJ IDE. We should normalize the leaf modules as in common, hdfs, and tools, where all dependencies are defined in each leaf module and the intermediate 'pom' modules do not define any dependencies. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
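The constants limitation mentioned above comes from the Java language itself: javac inlines compile-time constants (primitives and Strings), so a bytecode-based dependency analyzer never sees a reference to the defining class. A minimal demonstration:
{code}
// javac copies the value 8080 into User.class at compile time; a tool that
// inspects User's bytecode finds no reference to the Constants class at all,
// so it cannot tell that User depends on the module defining Constants.
public class User {
  public static void main(String[] args) {
    System.out.println("port=" + Constants.DEFAULT_PORT);
  }
}

class Constants {
  // A compile-time constant: static final primitive with a constant initializer.
  public static final int DEFAULT_PORT = 8080;
}
{code}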
[jira] [Commented] (YARN-1574) When RM transits from Active to Standby, the same eventDispatcher should not be registered more than once
[ https://issues.apache.org/jira/browse/YARN-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13867497#comment-13867497 ] Hadoop QA commented on YARN-1574: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12622337/YARN-1574.5.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2853//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2853//console This message is automatically generated. When RM transits from Active to Standby, the same eventDispatcher should not be registered more than once Key: YARN-1574 URL: https://issues.apache.org/jira/browse/YARN-1574 Project: Hadoop YARN Issue Type: Sub-task Reporter: Xuan Gong Assignee: Xuan Gong Priority: Blocker Attachments: YARN-1574.1.patch, YARN-1574.1.patch, YARN-1574.2.patch, YARN-1574.2.patch, YARN-1574.3.patch, YARN-1574.4.patch, YARN-1574.5.patch Currently, we move rmDispatcher out of ActiveService, but we still register the event dispatchers, such as schedulerDispatcher and RMAppEventDispatcher, when we initiate the ActiveService. Almost every time we transition the RM from Active to Standby, we need to re-initiate the ActiveService. That means we register the same event dispatcher again, which causes the same event to be handled several times. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-888) clean up POM dependencies
[ https://issues.apache.org/jira/browse/YARN-888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13867514#comment-13867514 ] Hadoop QA commented on YARN-888: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12622339/YARN-888.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2855//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2855//console This message is automatically generated. clean up POM dependencies - Key: YARN-888 URL: https://issues.apache.org/jira/browse/YARN-888 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: YARN-888.patch, YARN-888.patch, YARN-888.patch, yarn-888-2.patch Intermediate 'pom' modules define dependencies inherited by leaf modules. This is causing issues in the IntelliJ IDE. We should normalize the leaf modules as in common, hdfs, and tools, where all dependencies are defined in each leaf module and the intermediate 'pom' modules do not define any dependencies. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1166) YARN 'appsFailed' metric should be of type 'counter'
[ https://issues.apache.org/jira/browse/YARN-1166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13867502#comment-13867502 ] Hadoop QA commented on YARN-1166: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12622338/YARN-1166.8.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2854//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2854//console This message is automatically generated. YARN 'appsFailed' metric should be of type 'counter' Key: YARN-1166 URL: https://issues.apache.org/jira/browse/YARN-1166 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Srimanth Gunturi Assignee: Zhijie Shen Priority: Blocker Attachments: YARN-1166.2.patch, YARN-1166.3.patch, YARN-1166.4.patch, YARN-1166.5.patch, YARN-1166.6.patch, YARN-1166.7.patch, YARN-1166.8.patch, YARN-1166.patch Currently in YARN's queue metrics, the cumulative metric 'appsFailed' is of type 'gauge' - which means the exact value will be reported. All other cumulative queue metrics (AppsSubmitted, AppsCompleted, AppsKilled) are of type 'counter' - meaning Ganglia will use the slope to provide deltas between time points. To be consistent, the AppsFailed metric should also be of type 'counter'. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1583) Ineffective state check in FairSchedulerAppsBlock#render()
[ https://issues.apache.org/jira/browse/YARN-1583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13867556#comment-13867556 ] Bangtao Zhou commented on YARN-1583: I was checking **branch-2.2.0**, and I just checked **trunk** a moment ago; the problem you mentioned does indeed exist. It was my carelessness. Ineffective state check in FairSchedulerAppsBlock#render() -- Key: YARN-1583 URL: https://issues.apache.org/jira/browse/YARN-1583 Project: Hadoop YARN Issue Type: Bug Reporter: Ted Yu Priority: Minor Starting line 90: {code} for (RMApp app : apps.values()) { if (reqAppStates != null && !reqAppStates.contains(app.getState())) { {code} reqAppStates is of type YarnApplicationState. app.getState() returns RMAppState. These are two different enum types. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (YARN-1490) RM should optionally not kill all containers when an ApplicationMaster exits
[ https://issues.apache.org/jira/browse/YARN-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-1490: -- Attachment: YARN-1490.11.patch Submitting the same patch again to kick Jenkins. RM should optionally not kill all containers when an ApplicationMaster exits Key: YARN-1490 URL: https://issues.apache.org/jira/browse/YARN-1490 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Jian He Attachments: YARN-1490.1.patch, YARN-1490.10.patch, YARN-1490.11.patch, YARN-1490.11.patch, YARN-1490.2.patch, YARN-1490.3.patch, YARN-1490.4.patch, YARN-1490.5.patch, YARN-1490.6.patch, YARN-1490.7.patch, YARN-1490.8.patch, YARN-1490.9.patch This is needed to enable work-preserving AM restart. Some apps can choose to reconnect with old running containers, some may not want to. This should be an option. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (YARN-1410) Handle client failover during 2-step client APIs like app submission
[ https://issues.apache.org/jira/browse/YARN-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13867591#comment-13867591 ] Xuan Gong commented on YARN-1410: - Posting a patch that outlines the approach. Thanks to the AtMostOnce and Idempotent annotations, we only need a few code changes.
* As in my previous comment, we will make the RM accept the appId in the context. When failover happens, we re-use the old applicationId (assigned by the previous active RM) to submit the application to the current active RM.
* Use the AtMostOnce and Idempotent annotations.
** As we discussed in YARN-1521, submitApplication and getNewApplication cannot be idempotent. To make those functions retriable, we can add the AtMostOnce annotation. And getApplicationReport can be marked as idempotent.
** I would like to add the related annotations for these three APIs here, because I think this is part of the work for this ticket.
* This is how application submission works: YarnClient#SubmitApplication +call+ ClientRMService#SubmitApplication +call+ RMAppManager#SubmitApplication +create+ RMApp and submit the START event +ReturnBack+ ClientRMService#SubmitApplication +ReturnBack+ YarnClient#SubmitApplication +CheckingAppStatus+ getApplicationReport +END+
** The failover may happen in any step or between any steps.
** If failover happens:
*** between the time that YarnClient#SubmitApplication starts and the time that ClientRMService#SubmitApplication is called: the YarnClient will find the next active RM and continue from where it left off.
*** between the time that ClientRMService#SubmitApplication starts and the time that RMAppManager#SubmitApplication is called: we will restart ClientRMService#SubmitApplication (re-run it from the first line). At this point the application has not been saved in ZooKeeper yet, so it is safe to restart ClientRMService#SubmitApplication.
*** between the time that RMAppManager#SubmitApplication starts and the time that the RMApp has been created and the START event has been submitted: we do the same thing as in the previous case.
*** after the RMApp has been created and the START event has been sent out: if failover happens here, there are several different cases (see the sketch following this message):
**** after the YarnClient got the SubmitApplicationResponse, but the state of the RMApp has not been saved in ZooKeeper yet: when we try getApplicationReport, we will get an ApplicationNotFoundException. What I am doing here is catching this exception and calling YarnClient#SubmitApplication again.
**** after the YarnClient got the SubmitApplicationResponse, and the state of the RMApp has been saved in ZooKeeper: we do not need to do anything.
**** before the YarnClient got the SubmitApplicationResponse, and the state of the RMApp has not been saved in ZooKeeper yet: we restart ClientRMService#SubmitApplication from the very beginning.
**** before the YarnClient got the SubmitApplicationResponse, but the state of the RMApp has been saved in ZooKeeper: this is the trickiest case. If failover happens here, we will re-run ClientRMService#SubmitApplication from the very beginning. It will try to re-submit the application with the old applicationId, but since we have already saved this application in ZooKeeper, we will get an "application with id already exists" exception, which is *not* what we want.
For the last corner case, [~bikassaha], [~kkambatl], any suggestions?
Handle client failover during 2-step client APIs like app submission - Key: YARN-1410 URL: https://issues.apache.org/jira/browse/YARN-1410 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-1410-outline.patch, YARN-1410.1.patch Original Estimate: 48h Remaining Estimate: 48h App submission involves 1) creating an appId and 2) using that appId to submit an ApplicationSubmissionContext to the RM. The client may have obtained an appId from an RM, the RM may have failed over, and the client may submit the app to the new RM. Since the new RM has a different notion of cluster timestamp (used to create the app id), the new RM may reject the app submission, resulting in unexpected failure on the client side. The same may happen for other 2-step client API operations. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
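A rough sketch of the retry loop the comment above describes, using placeholder types rather than the real YarnClient API: submit, confirm via the application report, and resubmit with the same id if the new active RM answers with ApplicationNotFoundException.
{code}
public class SubmitWithFailoverRetry {

  // Placeholder client interface; the real YarnClient methods have different
  // signatures and types.
  interface YarnClientStub {
    void submitApplication(String appId) throws Exception;
    String getApplicationReport(String appId) throws Exception; // returns app state
  }

  // Placeholder for the exception the RM throws when it has never seen the app.
  static class ApplicationNotFoundException extends Exception {
  }

  static String submitAndConfirm(YarnClientStub client, String appId) throws Exception {
    while (true) {
      // Resubmitting with the same appId is safe precisely in the case the
      // comment describes: the new active RM never saw the app, so there is
      // no duplicate to collide with.
      client.submitApplication(appId);
      try {
        // If the new active RM recovered the app from the state store, this
        // succeeds and the submission is confirmed.
        return client.getApplicationReport(appId);
      } catch (ApplicationNotFoundException e) {
        // Failover happened before the app reached the state store: the new
        // RM has never heard of appId, so loop and resubmit with the same id.
      }
    }
  }
}
{code}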
[jira] [Updated] (YARN-1410) Handle client failover during 2-step client APIs like app submission
[ https://issues.apache.org/jira/browse/YARN-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1410: Attachment: YARN-1410-outline.patch Handle client failover during 2-step client APIs like app submission - Key: YARN-1410 URL: https://issues.apache.org/jira/browse/YARN-1410 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-1410-outline.patch, YARN-1410.1.patch Original Estimate: 48h Remaining Estimate: 48h App submission involves 1) creating an appId and 2) using that appId to submit an ApplicationSubmissionContext to the RM. The client may have obtained an appId from an RM, the RM may have failed over, and the client may submit the app to the new RM. Since the new RM has a different notion of cluster timestamp (used to create the app id), the new RM may reject the app submission, resulting in unexpected failure on the client side. The same may happen for other 2-step client API operations. -- This message was sent by Atlassian JIRA (v6.1.5#6160)