[jira] [Updated] (YARN-796) Allow for (admin) labels on nodes and resource-requests

2014-10-06 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-796:

Attachment: YARN-796.node-label.consolidate.11.patch

Attached ver.11 fixed javac warnings, findbug warnings and test failures.

 Allow for (admin) labels on nodes and resource-requests
 ---

 Key: YARN-796
 URL: https://issues.apache.org/jira/browse/YARN-796
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.4.1
Reporter: Arun C Murthy
Assignee: Wangda Tan
 Attachments: LabelBasedScheduling.pdf, 
 Node-labels-Requirements-Design-doc-V1.pdf, 
 Node-labels-Requirements-Design-doc-V2.pdf, YARN-796-Diagram.pdf, 
 YARN-796.node-label.consolidate.1.patch, 
 YARN-796.node-label.consolidate.10.patch, 
 YARN-796.node-label.consolidate.11.patch, 
 YARN-796.node-label.consolidate.2.patch, 
 YARN-796.node-label.consolidate.3.patch, 
 YARN-796.node-label.consolidate.4.patch, 
 YARN-796.node-label.consolidate.5.patch, 
 YARN-796.node-label.consolidate.6.patch, 
 YARN-796.node-label.consolidate.7.patch, 
 YARN-796.node-label.consolidate.8.patch, YARN-796.node-label.demo.patch.1, 
 YARN-796.patch, YARN-796.patch4


 It will be useful for admins to specify labels for nodes. Examples of labels 
 are OS, processor architecture etc.
 We should expose these labels and allow applications to specify labels on 
 resource-requests.
 Obviously we need to support admin operations on adding/removing node labels.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests

2014-10-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14160141#comment-14160141
 ] 

Hadoop QA commented on YARN-796:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12673059/YARN-796.node-label.consolidate.11.patch
  against trunk revision 16333b4.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 42 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 5 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient
 hadoop-tools/hadoop-sls hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.mapred.pipes.TestPipeApplication
  org.apache.hadoop.yarn.api.TestPBImplRecords
  
org.apache.hadoop.yarn.nodelabels.TestFileSystemNodeLabelsStore
  
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodeLabels
  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestReservationQueue
  
org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicy
  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestReservations

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5272//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5272//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5272//console

This message is automatically generated.

 Allow for (admin) labels on nodes and resource-requests
 ---

 Key: YARN-796
 URL: https://issues.apache.org/jira/browse/YARN-796
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.4.1
Reporter: Arun C Murthy
Assignee: Wangda Tan
 Attachments: LabelBasedScheduling.pdf, 
 Node-labels-Requirements-Design-doc-V1.pdf, 
 Node-labels-Requirements-Design-doc-V2.pdf, YARN-796-Diagram.pdf, 
 YARN-796.node-label.consolidate.1.patch, 
 YARN-796.node-label.consolidate.10.patch, 
 YARN-796.node-label.consolidate.11.patch, 
 YARN-796.node-label.consolidate.2.patch, 
 YARN-796.node-label.consolidate.3.patch, 
 YARN-796.node-label.consolidate.4.patch, 
 YARN-796.node-label.consolidate.5.patch, 
 YARN-796.node-label.consolidate.6.patch, 
 YARN-796.node-label.consolidate.7.patch, 
 YARN-796.node-label.consolidate.8.patch, YARN-796.node-label.demo.patch.1, 
 YARN-796.patch, YARN-796.patch4


 It will be useful for admins to specify labels for nodes. Examples of labels 
 are OS, processor architecture etc.
 We should expose these labels and allow applications to specify labels on 
 resource-requests.
 Obviously we need to support admin operations on adding/removing node labels.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2625) Problems with CLASSPATH in Job Submission REST API

2014-10-06 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14160170#comment-14160170
 ] 

Rohith commented on YARN-2625:
--

While submitting appliclaiton from REST api, it expects CLASSPATH to be added 
by application client. As a support, 
ContainerLaunchContextInfo#setEnvironment() provide to add classpath  
variables. I think adding default values at server side may not feasible.

bq. 2) If any environment variables are used in the CLASSPATH environment 
'value' field, they are evaluated when the values are NULL resulting in bad 
values in the CLASSPATH
This is because cilent JVM does not resolve $HADOOP_COMMON_HOME environment 
variable.It is expected to provide full path when submitting from client.

 Problems with CLASSPATH in Job Submission REST API
 --

 Key: YARN-2625
 URL: https://issues.apache.org/jira/browse/YARN-2625
 Project: Hadoop YARN
  Issue Type: Bug
  Components: api
Affects Versions: 2.5.1
Reporter: Doug Haigh

 There are a couple of issues I have found specifying the CLASSPATH 
 environment variable using the REST API.
 1) In the Java client, the CLASSPATH environment is usually made up of either 
 the value of the yarn.application.classpath in yarn-site.xml value or the 
 default YARN classpath value as defined by 
 YarnConfiguration.DEFAULT_YARN_CROSS_PLATFORM_APPLICATION_CLASSPATH. REST API 
 consumers have no method of telling the resource manager to use the default 
 unless they hardcode the default value themselves. If the default ever 
 changes, the code would need to change. 
 2) If any environment variables are used in the CLASSPATH environment 'value' 
 field, they are evaluated when the values are NULL resulting in bad values in 
 the CLASSPATH. For example, if I had hardcoded the CLASSPATH value to the 
 default of $HADOOP_CONF_DIR, $HADOOP_COMMON_HOME/share/hadoop/common/*, 
 $HADOOP_COMMON_HOME/share/hadoop/common/lib/*, 
 $HADOOP_HDFS_HOME/share/hadoop/hdfs/*, 
 $HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*, 
 $HADOOP_YARN_HOME/share/hadoop/yarn/*, 
 $HADOOP_YARN_HOME/share/hadoop/yarn/lib/* the classpath passed to the 
 application master is 
 :/share/hadoop/common/*:/share/hadoop/common/lib/*:/share/hadoop/hdfs/*:/share/hadoop/hdfs/lib/*:/share/hadoop/yarn/*:/share/hadoop/yarn/lib/*
 These two problems require REST API consumers to always have the fully 
 resolved path defined in the yarn.application.classpath value. If the 
 property is missing or contains environment varaibles, the application 
 created by the REST API will fail due to the CLASSPATH being incorrect.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2625) Problems with CLASSPATH in Job Submission REST API

2014-10-06 Thread Doug Haigh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14160255#comment-14160255
 ] 

Doug Haigh commented on YARN-2625:
--

Yes, I agree the classpath should be set by the client, but the REST client 
should not have to know the default classpath just as the Java client does not 
need to know it. Just as the REST API resolves {{CPS}} to either {{:}} or 
{{;}} based on the underlying operating system, the REST API could look for 
{{DEFAULT_YARN_CROSS_PLATFORM_APPLICATION_CLASSPATH}} and resolve it to 
{{YarnConfiguration.DEFAULT_YARN_CROSS_PLATFORM_APPLICATION_CLASSPATH}}.


As for environment variables being resolved, when running a Java client against 
a CDH 5.0.0 cluster, I am able to set the environment to 

{{./*:$HADOOP_CLIENT_CONF_DIR:$HADOOP_CONF_DIR:$HADOOP_COMMON_HOME/*:$HADOOP_COMMON_HOME/lib/*:$HADOOP_HDFS_HOME/*:$HADOOP_HDFS_HOME/lib/*:$HADOOP_YARN_HOME/*:$HADOOP_YARN_HOME/lib/*}}

and it works - the environment variables are resolved. Maybe it is the way CDH 
is setting things up, but the path is not fully resolved when the client 
specifies it.

 Problems with CLASSPATH in Job Submission REST API
 --

 Key: YARN-2625
 URL: https://issues.apache.org/jira/browse/YARN-2625
 Project: Hadoop YARN
  Issue Type: Bug
  Components: api
Affects Versions: 2.5.1
Reporter: Doug Haigh

 There are a couple of issues I have found specifying the CLASSPATH 
 environment variable using the REST API.
 1) In the Java client, the CLASSPATH environment is usually made up of either 
 the value of the yarn.application.classpath in yarn-site.xml value or the 
 default YARN classpath value as defined by 
 YarnConfiguration.DEFAULT_YARN_CROSS_PLATFORM_APPLICATION_CLASSPATH. REST API 
 consumers have no method of telling the resource manager to use the default 
 unless they hardcode the default value themselves. If the default ever 
 changes, the code would need to change. 
 2) If any environment variables are used in the CLASSPATH environment 'value' 
 field, they are evaluated when the values are NULL resulting in bad values in 
 the CLASSPATH. For example, if I had hardcoded the CLASSPATH value to the 
 default of $HADOOP_CONF_DIR, $HADOOP_COMMON_HOME/share/hadoop/common/*, 
 $HADOOP_COMMON_HOME/share/hadoop/common/lib/*, 
 $HADOOP_HDFS_HOME/share/hadoop/hdfs/*, 
 $HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*, 
 $HADOOP_YARN_HOME/share/hadoop/yarn/*, 
 $HADOOP_YARN_HOME/share/hadoop/yarn/lib/* the classpath passed to the 
 application master is 
 :/share/hadoop/common/*:/share/hadoop/common/lib/*:/share/hadoop/hdfs/*:/share/hadoop/hdfs/lib/*:/share/hadoop/yarn/*:/share/hadoop/yarn/lib/*
 These two problems require REST API consumers to always have the fully 
 resolved path defined in the yarn.application.classpath value. If the 
 property is missing or contains environment varaibles, the application 
 created by the REST API will fail due to the CLASSPATH being incorrect.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol

2014-10-06 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-1879:
-
Attachment: YARN-1879.22.patch

Made registerApplicationMaster Idempotent and marked 
registerApplicationMaser/finishApplicationMaster Idempotent.

 Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol
 ---

 Key: YARN-1879
 URL: https://issues.apache.org/jira/browse/YARN-1879
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Jian He
Assignee: Tsuyoshi OZAWA
Priority: Critical
 Attachments: YARN-1879.1.patch, YARN-1879.1.patch, 
 YARN-1879.11.patch, YARN-1879.12.patch, YARN-1879.13.patch, 
 YARN-1879.14.patch, YARN-1879.15.patch, YARN-1879.16.patch, 
 YARN-1879.17.patch, YARN-1879.18.patch, YARN-1879.19.patch, 
 YARN-1879.2-wip.patch, YARN-1879.2.patch, YARN-1879.20.patch, 
 YARN-1879.21.patch, YARN-1879.22.patch, YARN-1879.3.patch, YARN-1879.4.patch, 
 YARN-1879.5.patch, YARN-1879.6.patch, YARN-1879.7.patch, YARN-1879.8.patch, 
 YARN-1879.9.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol

2014-10-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14160419#comment-14160419
 ] 

Hadoop QA commented on YARN-1879:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673093/YARN-1879.22.patch
  against trunk revision ed841dd.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5273//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5273//console

This message is automatically generated.

 Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol
 ---

 Key: YARN-1879
 URL: https://issues.apache.org/jira/browse/YARN-1879
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Jian He
Assignee: Tsuyoshi OZAWA
Priority: Critical
 Attachments: YARN-1879.1.patch, YARN-1879.1.patch, 
 YARN-1879.11.patch, YARN-1879.12.patch, YARN-1879.13.patch, 
 YARN-1879.14.patch, YARN-1879.15.patch, YARN-1879.16.patch, 
 YARN-1879.17.patch, YARN-1879.18.patch, YARN-1879.19.patch, 
 YARN-1879.2-wip.patch, YARN-1879.2.patch, YARN-1879.20.patch, 
 YARN-1879.21.patch, YARN-1879.22.patch, YARN-1879.3.patch, YARN-1879.4.patch, 
 YARN-1879.5.patch, YARN-1879.6.patch, YARN-1879.7.patch, YARN-1879.8.patch, 
 YARN-1879.9.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2625) Problems with CLASSPATH in Job Submission REST API

2014-10-06 Thread Doug Haigh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14160455#comment-14160455
 ] 

Doug Haigh commented on YARN-2625:
--

To be honest, I never knew about that option - but that does not get me 
anything more than what I can read from the yarn-site.xml file (although I like 
not having to have those files around). 

It still has the two problems described above because

1) If the {{yarn.application.classpath}} value is not specified, I still have 
no way to know the default classpath
2) If the {{yarn.application.classpath}} value has environment variables in it, 
they still need to be resolved somehow.

If the value returned by that URL was the *resolved* classpath, that would be 
the work.

 Problems with CLASSPATH in Job Submission REST API
 --

 Key: YARN-2625
 URL: https://issues.apache.org/jira/browse/YARN-2625
 Project: Hadoop YARN
  Issue Type: Bug
  Components: api
Affects Versions: 2.5.1
Reporter: Doug Haigh

 There are a couple of issues I have found specifying the CLASSPATH 
 environment variable using the REST API.
 1) In the Java client, the CLASSPATH environment is usually made up of either 
 the value of the yarn.application.classpath in yarn-site.xml value or the 
 default YARN classpath value as defined by 
 YarnConfiguration.DEFAULT_YARN_CROSS_PLATFORM_APPLICATION_CLASSPATH. REST API 
 consumers have no method of telling the resource manager to use the default 
 unless they hardcode the default value themselves. If the default ever 
 changes, the code would need to change. 
 2) If any environment variables are used in the CLASSPATH environment 'value' 
 field, they are evaluated when the values are NULL resulting in bad values in 
 the CLASSPATH. For example, if I had hardcoded the CLASSPATH value to the 
 default of $HADOOP_CONF_DIR, $HADOOP_COMMON_HOME/share/hadoop/common/*, 
 $HADOOP_COMMON_HOME/share/hadoop/common/lib/*, 
 $HADOOP_HDFS_HOME/share/hadoop/hdfs/*, 
 $HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*, 
 $HADOOP_YARN_HOME/share/hadoop/yarn/*, 
 $HADOOP_YARN_HOME/share/hadoop/yarn/lib/* the classpath passed to the 
 application master is 
 :/share/hadoop/common/*:/share/hadoop/common/lib/*:/share/hadoop/hdfs/*:/share/hadoop/hdfs/lib/*:/share/hadoop/yarn/*:/share/hadoop/yarn/lib/*
 These two problems require REST API consumers to always have the fully 
 resolved path defined in the yarn.application.classpath value. If the 
 property is missing or contains environment varaibles, the application 
 created by the REST API will fail due to the CLASSPATH being incorrect.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2312) Marking ContainerId#getId as deprecated

2014-10-06 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14160481#comment-14160481
 ] 

Jason Lowe commented on YARN-2312:
--

Patch looks good overall.  Just two minor nits:

The containerId.getContainerId()  0xffL construct shows up a number of 
times.  Wondering if there should be a utility method on ContainerId to provide 
this value or if the masking constant should be obtainable from ContainerId.

TestTaskAttemptListenerImpl.java has no changes releated to this JIRA, 
including wrapping of lines that are already less than 80 columns.  Not a 
must-fix just seemed like unnecessary extra changes.

[~jianhe] do you have any additional comments?

 Marking ContainerId#getId as deprecated
 ---

 Key: YARN-2312
 URL: https://issues.apache.org/jira/browse/YARN-2312
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-2312-wip.patch, YARN-2312.1.patch, 
 YARN-2312.2-2.patch, YARN-2312.2-3.patch, YARN-2312.2.patch, 
 YARN-2312.4.patch, YARN-2312.5.patch


 {{ContainerId#getId}} will only return partial value of containerId, only 
 sequence number of container id without epoch, after YARN-2229. We should 
 mark {{ContainerId#getId}} as deprecated and use 
 {{ContainerId#getContainerId}} instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (YARN-2646) distributed shell tests to use registry

2014-10-06 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14159835#comment-14159835
 ] 

Karthik Kambatla edited comment on YARN-2646 at 10/6/14 4:48 PM:
-

This is the part of the YARN-913 patch which modified distributed shell to 
(optionally) register itself, and tests to verify that it does this, and that 
app-attempt-id persistent entries are purged when the app finishes


was (Author: ste...@apache.org):
This is the part of the YARn-913 patch which modified distributed shell to 
(optionally) register itself, and tests to verify that it does this, and that 
app-attempt-id persistent entries are purged when the app finishes

 distributed shell  tests to use registry
 -

 Key: YARN-2646
 URL: https://issues.apache.org/jira/browse/YARN-2646
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, resourcemanager
Reporter: Steve Loughran
Assignee: Steve Loughran
 Attachments: YARN-2646-001.patch


 for testing and for an example, the Distributed Shell should create a record 
 for itself in the service registry ... the tests can look for this. This will 
 act as a test for the RM integration.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2644) Recalculate headroom more frequently to keep it accurate

2014-10-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14160492#comment-14160492
 ] 

Hadoop QA commented on YARN-2644:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12672890/YARN-2644.14.patch
  against trunk revision ed841dd.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5274//console

This message is automatically generated.

 Recalculate headroom more frequently to keep it accurate
 

 Key: YARN-2644
 URL: https://issues.apache.org/jira/browse/YARN-2644
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Craig Welch
Assignee: Craig Welch
 Attachments: YARN-2644.11.patch, YARN-2644.14.patch


 See parent (1198) for more detail - this specifically covers calculating the 
 headroom more frequently, to cover the cases where changes have occurred 
 which impact headroom but which are not reflected due to an application not 
 being updated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-796) Allow for (admin) labels on nodes and resource-requests

2014-10-06 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-796:

Attachment: YARN-796.node-label.consolidate.12.patch

 Allow for (admin) labels on nodes and resource-requests
 ---

 Key: YARN-796
 URL: https://issues.apache.org/jira/browse/YARN-796
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.4.1
Reporter: Arun C Murthy
Assignee: Wangda Tan
 Attachments: LabelBasedScheduling.pdf, 
 Node-labels-Requirements-Design-doc-V1.pdf, 
 Node-labels-Requirements-Design-doc-V2.pdf, YARN-796-Diagram.pdf, 
 YARN-796.node-label.consolidate.1.patch, 
 YARN-796.node-label.consolidate.10.patch, 
 YARN-796.node-label.consolidate.11.patch, 
 YARN-796.node-label.consolidate.2.patch, 
 YARN-796.node-label.consolidate.3.patch, 
 YARN-796.node-label.consolidate.4.patch, 
 YARN-796.node-label.consolidate.5.patch, 
 YARN-796.node-label.consolidate.6.patch, 
 YARN-796.node-label.consolidate.7.patch, 
 YARN-796.node-label.consolidate.8.patch, YARN-796.node-label.demo.patch.1, 
 YARN-796.patch, YARN-796.patch4


 It will be useful for admins to specify labels for nodes. Examples of labels 
 are OS, processor architecture etc.
 We should expose these labels and allow applications to specify labels on 
 resource-requests.
 Obviously we need to support admin operations on adding/removing node labels.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-796) Allow for (admin) labels on nodes and resource-requests

2014-10-06 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-796:

Attachment: (was: YARN-796.node-label.consolidate.12.patch)

 Allow for (admin) labels on nodes and resource-requests
 ---

 Key: YARN-796
 URL: https://issues.apache.org/jira/browse/YARN-796
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.4.1
Reporter: Arun C Murthy
Assignee: Wangda Tan
 Attachments: LabelBasedScheduling.pdf, 
 Node-labels-Requirements-Design-doc-V1.pdf, 
 Node-labels-Requirements-Design-doc-V2.pdf, YARN-796-Diagram.pdf, 
 YARN-796.node-label.consolidate.1.patch, 
 YARN-796.node-label.consolidate.10.patch, 
 YARN-796.node-label.consolidate.11.patch, 
 YARN-796.node-label.consolidate.2.patch, 
 YARN-796.node-label.consolidate.3.patch, 
 YARN-796.node-label.consolidate.4.patch, 
 YARN-796.node-label.consolidate.5.patch, 
 YARN-796.node-label.consolidate.6.patch, 
 YARN-796.node-label.consolidate.7.patch, 
 YARN-796.node-label.consolidate.8.patch, YARN-796.node-label.demo.patch.1, 
 YARN-796.patch, YARN-796.patch4


 It will be useful for admins to specify labels for nodes. Examples of labels 
 are OS, processor architecture etc.
 We should expose these labels and allow applications to specify labels on 
 resource-requests.
 Obviously we need to support admin operations on adding/removing node labels.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1051) YARN Admission Control/Planner: enhancing the resource allocation model with time.

2014-10-06 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated YARN-1051:

Fix Version/s: (was: 3.0.0)
   2.6.0

 YARN Admission Control/Planner: enhancing the resource allocation model with 
 time.
 --

 Key: YARN-1051
 URL: https://issues.apache.org/jira/browse/YARN-1051
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler, resourcemanager, scheduler
Reporter: Carlo Curino
Assignee: Carlo Curino
 Fix For: 2.6.0

 Attachments: YARN-1051-design.pdf, YARN-1051.1.patch, 
 YARN-1051.patch, curino_MSR-TR-2013-108.pdf, socc14-paper15.pdf, 
 techreport.pdf


 In this umbrella JIRA we propose to extend the YARN RM to handle time 
 explicitly, allowing users to reserve capacity over time. This is an 
 important step towards SLAs, long-running services, workflows, and helps for 
 gang scheduling.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1051) YARN Admission Control/Planner: enhancing the resource allocation model with time.

2014-10-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14160565#comment-14160565
 ] 

Hudson commented on YARN-1051:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6197 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6197/])
Move YARN-1051 to 2.6 (cdouglas: rev 8380ca37237a21638e1bcad0dd0e4c7e9f1a1786)
* hadoop-yarn-project/CHANGES.txt


 YARN Admission Control/Planner: enhancing the resource allocation model with 
 time.
 --

 Key: YARN-1051
 URL: https://issues.apache.org/jira/browse/YARN-1051
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler, resourcemanager, scheduler
Reporter: Carlo Curino
Assignee: Carlo Curino
 Fix For: 2.6.0

 Attachments: YARN-1051-design.pdf, YARN-1051.1.patch, 
 YARN-1051.patch, curino_MSR-TR-2013-108.pdf, socc14-paper15.pdf, 
 techreport.pdf


 In this umbrella JIRA we propose to extend the YARN RM to handle time 
 explicitly, allowing users to reserve capacity over time. This is an 
 important step towards SLAs, long-running services, workflows, and helps for 
 gang scheduling.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2615) ClientToAMTokenIdentifier and DelegationTokenIdentifier should allow extended fields

2014-10-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14160576#comment-14160576
 ] 

Hudson commented on YARN-2615:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6198 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6198/])
YARN-2615. Changed 
ClientToAMTokenIdentifier/RM(Timeline)DelegationTokenIdentifier to use protobuf 
as payload. Contributed by Junping Du (jianhe: rev 
ea26cc0b4ac02b8af686dfda80f540e5aa70c358)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/AMRMTokenIdentifier.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/proto/test_client_tokens.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/client/YARNDelegationTokenIdentifier.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/FileSystemRMStateStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestClientRMTokens.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/RMDelegationTokenIdentifierForTest.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/ClientToAMTokenIdentifierForTest.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/client/RMDelegationTokenIdentifier.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/client/TimelineDelegationTokenIdentifier.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/client/ClientToAMTokenIdentifier.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/ContainerTokenIdentifier.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/NMTokenIdentifier.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestClientToAMTokens.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/security/TestYARNTokenIdentifier.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/proto/server/yarn_security_token.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/pom.xml


 ClientToAMTokenIdentifier and DelegationTokenIdentifier should allow extended 
 fields
 

 Key: YARN-2615
 URL: https://issues.apache.org/jira/browse/YARN-2615
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Junping Du
Assignee: Junping Du
Priority: Blocker
 Attachments: YARN-2615-v2.patch, YARN-2615-v3.patch, 
 YARN-2615-v4.patch, YARN-2615-v5.patch, YARN-2615.patch


 As three TokenIdentifiers get updated in YARN-668, ClientToAMTokenIdentifier 
 and DelegationTokenIdentifier should also be updated in the same way to allow 
 fields get extended in future.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2544) [YARN-796] Common server side PB changes (not include user API PB changes)

2014-10-06 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-2544:
-
Attachment: YARN-2544.patch

 [YARN-796] Common server side PB changes (not include user API PB changes)
 --

 Key: YARN-2544
 URL: https://issues.apache.org/jira/browse/YARN-2544
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, client, resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-2544.patch, YARN-2544.patch, YARN-2544.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2494) [YARN-796] Node label manager API and storage implementations

2014-10-06 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-2494:
-
Attachment: YARN-2494.patch

 [YARN-796] Node label manager API and storage implementations
 -

 Key: YARN-2494
 URL: https://issues.apache.org/jira/browse/YARN-2494
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-2494.patch, YARN-2494.patch, YARN-2494.patch, 
 YARN-2494.patch, YARN-2494.patch, YARN-2494.patch, YARN-2494.patch, 
 YARN-2494.patch


 This JIRA includes APIs and storage implementations of node label manager,
 NodeLabelManager is an abstract class used to manage labels of nodes in the 
 cluster, it has APIs to query/modify
 - Nodes according to given label
 - Labels according to given hostname
 - Add/remove labels
 - Set labels of nodes in the cluster
 - Persist/recover changes of labels/labels-on-nodes to/from storage
 And it has two implementations to store modifications
 - Memory based storage: It will not persist changes, so all labels will be 
 lost when RM restart
 - FileSystem based storage: It will persist/recover to/from FileSystem (like 
 HDFS), and all labels and labels-on-nodes will be recovered upon RM restart



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2647) [YARN-796] Add yarn queue CLI to get queue info including labels of such queue

2014-10-06 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-2647:


 Summary: [YARN-796] Add yarn queue CLI to get queue info including 
labels of such queue
 Key: YARN-2647
 URL: https://issues.apache.org/jira/browse/YARN-2647
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client
Reporter: Wangda Tan






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2544) [YARN-796] Common server side PB changes (not include user API PB changes)

2014-10-06 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14160603#comment-14160603
 ] 

Wangda Tan commented on YARN-2544:
--

bq. We will also need a GetLabelsOfQueueRequest?
I think we don't need this. Instead, we need add a yarn queue cli to get all 
related info of queue. Open up YARN-2647 to track that changes.

 [YARN-796] Common server side PB changes (not include user API PB changes)
 --

 Key: YARN-2544
 URL: https://issues.apache.org/jira/browse/YARN-2544
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, client, resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-2544.patch, YARN-2544.patch, YARN-2544.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1051) YARN Admission Control/Planner: enhancing the resource allocation model with time.

2014-10-06 Thread Subru Krishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14160613#comment-14160613
 ] 

Subru Krishnan commented on YARN-1051:
--

Thanks [~chris.douglas] for shepherding us all the way through. Thanks to all 
others (you know who you are :)) who took the time to review and whose 
insightful feedback helped us get this into a much better shape.

 YARN Admission Control/Planner: enhancing the resource allocation model with 
 time.
 --

 Key: YARN-1051
 URL: https://issues.apache.org/jira/browse/YARN-1051
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler, resourcemanager, scheduler
Reporter: Carlo Curino
Assignee: Carlo Curino
 Fix For: 2.6.0

 Attachments: YARN-1051-design.pdf, YARN-1051.1.patch, 
 YARN-1051.patch, curino_MSR-TR-2013-108.pdf, socc14-paper15.pdf, 
 techreport.pdf


 In this umbrella JIRA we propose to extend the YARN RM to handle time 
 explicitly, allowing users to reserve capacity over time. This is an 
 important step towards SLAs, long-running services, workflows, and helps for 
 gang scheduling.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1707) Making the CapacityScheduler more dynamic

2014-10-06 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan updated YARN-1707:
-
Fix Version/s: 2.6.0

 Making the CapacityScheduler more dynamic
 -

 Key: YARN-1707
 URL: https://issues.apache.org/jira/browse/YARN-1707
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Carlo Curino
Assignee: Carlo Curino
  Labels: capacity-scheduler
 Fix For: 2.6.0

 Attachments: YARN-1707.10.patch, YARN-1707.2.patch, 
 YARN-1707.3.patch, YARN-1707.4.patch, YARN-1707.5.patch, YARN-1707.6.patch, 
 YARN-1707.7.patch, YARN-1707.8.patch, YARN-1707.9.patch, YARN-1707.patch


 The CapacityScheduler is a rather static at the moment, and refreshqueue 
 provides a rather heavy-handed way to reconfigure it. Moving towards 
 long-running services (tracked in YARN-896) and to enable more advanced 
 admission control and resource parcelling we need to make the 
 CapacityScheduler more dynamic. This is instrumental to the umbrella jira 
 YARN-1051.
 Concretely this require the following changes:
 * create queues dynamically
 * destroy queues dynamically
 * dynamically change queue parameters (e.g., capacity) 
 * modify refreshqueue validation to enforce sum(child.getCapacity())= 100% 
 instead of ==100%
 We limit this to LeafQueues. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2496) [YARN-796] Changes for capacity scheduler to support allocate resource respect labels

2014-10-06 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-2496:
-
Attachment: YARN-2496.patch

Attached updated patch

 [YARN-796] Changes for capacity scheduler to support allocate resource 
 respect labels
 -

 Key: YARN-2496
 URL: https://issues.apache.org/jira/browse/YARN-2496
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-2496.patch, YARN-2496.patch, YARN-2496.patch, 
 YARN-2496.patch, YARN-2496.patch


 This JIRA Includes:
 - Add/parse labels option to {{capacity-scheduler.xml}} similar to other 
 options of queue like capacity/maximum-capacity, etc.
 - Include a default-label-expression option in queue config, if an app 
 doesn't specify label-expression, default-label-expression of queue will be 
 used.
 - Check if labels can be accessed by the queue when submit an app with 
 labels-expression to queue or update ResourceRequest with label-expression
 - Check labels on NM when trying to allocate ResourceRequest on the NM with 
 label-expression
 - Respect  labels when calculate headroom/user-limit



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2566) IOException happen in startLocalizer of DefaultContainerExecutor due to not enough disk space for the first localDir.

2014-10-06 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-2566:

Attachment: (was: YARN-2566.003.patch)

 IOException happen in startLocalizer of DefaultContainerExecutor due to not 
 enough disk space for the first localDir.
 -

 Key: YARN-2566
 URL: https://issues.apache.org/jira/browse/YARN-2566
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.5.0
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: YARN-2566.000.patch, YARN-2566.001.patch, 
 YARN-2566.002.patch, YARN-2566.003.patch


 startLocalizer in DefaultContainerExecutor will only use the first localDir 
 to copy the token file, if the copy is failed for first localDir due to not 
 enough disk space in the first localDir, the localization will be failed even 
 there are plenty of disk space in other localDirs. We see the following error 
 for this case:
 {code}
 2014-09-13 23:33:25,171 WARN 
 org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Unable to 
 create app directory 
 /hadoop/d1/usercache/cloudera/appcache/application_1410663092546_0004
 java.io.IOException: mkdir of 
 /hadoop/d1/usercache/cloudera/appcache/application_1410663092546_0004 failed
   at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:1062)
   at 
 org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:157)
   at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189)
   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:721)
   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:717)
   at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
   at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:717)
   at 
 org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.createDir(DefaultContainerExecutor.java:426)
   at 
 org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.createAppDirs(DefaultContainerExecutor.java:522)
   at 
 org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:94)
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:987)
 2014-09-13 23:33:25,185 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
  Localizer failed
 java.io.FileNotFoundException: File 
 file:/hadoop/d1/usercache/cloudera/appcache/application_1410663092546_0004 
 does not exist
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:511)
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:724)
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:501)
   at 
 org.apache.hadoop.fs.DelegateToFileSystem.getFileStatus(DelegateToFileSystem.java:111)
   at 
 org.apache.hadoop.fs.DelegateToFileSystem.createInternal(DelegateToFileSystem.java:76)
   at 
 org.apache.hadoop.fs.ChecksumFs$ChecksumFSOutputSummer.init(ChecksumFs.java:344)
   at org.apache.hadoop.fs.ChecksumFs.createInternal(ChecksumFs.java:390)
   at 
 org.apache.hadoop.fs.AbstractFileSystem.create(AbstractFileSystem.java:577)
   at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:677)
   at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:673)
   at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
   at org.apache.hadoop.fs.FileContext.create(FileContext.java:673)
   at org.apache.hadoop.fs.FileContext$Util.copy(FileContext.java:2021)
   at org.apache.hadoop.fs.FileContext$Util.copy(FileContext.java:1963)
   at 
 org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:102)
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:987)
 2014-09-13 23:33:25,186 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
  Container container_1410663092546_0004_01_01 transitioned from 
 LOCALIZING to LOCALIZATION_FAILED
 2014-09-13 23:33:25,187 WARN 
 org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=cloudera   
 OPERATION=Container Finished - Failed   TARGET=ContainerImpl
 RESULT=FAILURE  DESCRIPTION=Container failed with state: LOCALIZATION_FAILED  
   APPID=application_1410663092546_0004
 CONTAINERID=container_1410663092546_0004_01_01
 2014-09-13 23:33:25,187 INFO 
 

[jira] [Updated] (YARN-2566) IOException happen in startLocalizer of DefaultContainerExecutor due to not enough disk space for the first localDir.

2014-10-06 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-2566:

Attachment: YARN-2566.003.patch

 IOException happen in startLocalizer of DefaultContainerExecutor due to not 
 enough disk space for the first localDir.
 -

 Key: YARN-2566
 URL: https://issues.apache.org/jira/browse/YARN-2566
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.5.0
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: YARN-2566.000.patch, YARN-2566.001.patch, 
 YARN-2566.002.patch, YARN-2566.003.patch


 startLocalizer in DefaultContainerExecutor will only use the first localDir 
 to copy the token file, if the copy is failed for first localDir due to not 
 enough disk space in the first localDir, the localization will be failed even 
 there are plenty of disk space in other localDirs. We see the following error 
 for this case:
 {code}
 2014-09-13 23:33:25,171 WARN 
 org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Unable to 
 create app directory 
 /hadoop/d1/usercache/cloudera/appcache/application_1410663092546_0004
 java.io.IOException: mkdir of 
 /hadoop/d1/usercache/cloudera/appcache/application_1410663092546_0004 failed
   at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:1062)
   at 
 org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:157)
   at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189)
   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:721)
   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:717)
   at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
   at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:717)
   at 
 org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.createDir(DefaultContainerExecutor.java:426)
   at 
 org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.createAppDirs(DefaultContainerExecutor.java:522)
   at 
 org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:94)
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:987)
 2014-09-13 23:33:25,185 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
  Localizer failed
 java.io.FileNotFoundException: File 
 file:/hadoop/d1/usercache/cloudera/appcache/application_1410663092546_0004 
 does not exist
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:511)
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:724)
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:501)
   at 
 org.apache.hadoop.fs.DelegateToFileSystem.getFileStatus(DelegateToFileSystem.java:111)
   at 
 org.apache.hadoop.fs.DelegateToFileSystem.createInternal(DelegateToFileSystem.java:76)
   at 
 org.apache.hadoop.fs.ChecksumFs$ChecksumFSOutputSummer.init(ChecksumFs.java:344)
   at org.apache.hadoop.fs.ChecksumFs.createInternal(ChecksumFs.java:390)
   at 
 org.apache.hadoop.fs.AbstractFileSystem.create(AbstractFileSystem.java:577)
   at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:677)
   at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:673)
   at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
   at org.apache.hadoop.fs.FileContext.create(FileContext.java:673)
   at org.apache.hadoop.fs.FileContext$Util.copy(FileContext.java:2021)
   at org.apache.hadoop.fs.FileContext$Util.copy(FileContext.java:1963)
   at 
 org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:102)
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:987)
 2014-09-13 23:33:25,186 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
  Container container_1410663092546_0004_01_01 transitioned from 
 LOCALIZING to LOCALIZATION_FAILED
 2014-09-13 23:33:25,187 WARN 
 org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=cloudera   
 OPERATION=Container Finished - Failed   TARGET=ContainerImpl
 RESULT=FAILURE  DESCRIPTION=Container failed with state: LOCALIZATION_FAILED  
   APPID=application_1410663092546_0004
 CONTAINERID=container_1410663092546_0004_01_01
 2014-09-13 23:33:25,187 INFO 
 

[jira] [Updated] (YARN-1707) Making the CapacityScheduler more dynamic

2014-10-06 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan updated YARN-1707:
-
Target Version/s:   (was: 3.0.0)

 Making the CapacityScheduler more dynamic
 -

 Key: YARN-1707
 URL: https://issues.apache.org/jira/browse/YARN-1707
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Carlo Curino
Assignee: Carlo Curino
  Labels: capacity-scheduler
 Fix For: 2.6.0

 Attachments: YARN-1707.10.patch, YARN-1707.2.patch, 
 YARN-1707.3.patch, YARN-1707.4.patch, YARN-1707.5.patch, YARN-1707.6.patch, 
 YARN-1707.7.patch, YARN-1707.8.patch, YARN-1707.9.patch, YARN-1707.patch


 The CapacityScheduler is a rather static at the moment, and refreshqueue 
 provides a rather heavy-handed way to reconfigure it. Moving towards 
 long-running services (tracked in YARN-896) and to enable more advanced 
 admission control and resource parcelling we need to make the 
 CapacityScheduler more dynamic. This is instrumental to the umbrella jira 
 YARN-1051.
 Concretely this require the following changes:
 * create queues dynamically
 * destroy queues dynamically
 * dynamically change queue parameters (e.g., capacity) 
 * modify refreshqueue validation to enforce sum(child.getCapacity())= 100% 
 instead of ==100%
 We limit this to LeafQueues. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2312) Marking ContainerId#getId as deprecated

2014-10-06 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14160653#comment-14160653
 ] 

Jian He commented on YARN-2312:
---

Patch looks good to me too, thanks Jason for reviewing the patch.
bq. Wondering if there should be a utility method on ContainerId to provide 
this value or if the masking constant should be obtainable from ContainerId.
I prefer exposing the constant

 Marking ContainerId#getId as deprecated
 ---

 Key: YARN-2312
 URL: https://issues.apache.org/jira/browse/YARN-2312
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Attachments: YARN-2312-wip.patch, YARN-2312.1.patch, 
 YARN-2312.2-2.patch, YARN-2312.2-3.patch, YARN-2312.2.patch, 
 YARN-2312.4.patch, YARN-2312.5.patch


 {{ContainerId#getId}} will only return partial value of containerId, only 
 sequence number of container id without epoch, after YARN-2229. We should 
 mark {{ContainerId#getId}} as deprecated and use 
 {{ContainerId#getContainerId}} instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2566) IOException happen in startLocalizer of DefaultContainerExecutor due to not enough disk space for the first localDir.

2014-10-06 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14160654#comment-14160654
 ] 

zhihai xu commented on YARN-2566:
-

I don't see the problem -1 javac. The patch appears to cause the build to 
fail. in my local build. Restart the Jenkins test.

 IOException happen in startLocalizer of DefaultContainerExecutor due to not 
 enough disk space for the first localDir.
 -

 Key: YARN-2566
 URL: https://issues.apache.org/jira/browse/YARN-2566
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.5.0
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: YARN-2566.000.patch, YARN-2566.001.patch, 
 YARN-2566.002.patch, YARN-2566.003.patch


 startLocalizer in DefaultContainerExecutor will only use the first localDir 
 to copy the token file, if the copy is failed for first localDir due to not 
 enough disk space in the first localDir, the localization will be failed even 
 there are plenty of disk space in other localDirs. We see the following error 
 for this case:
 {code}
 2014-09-13 23:33:25,171 WARN 
 org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Unable to 
 create app directory 
 /hadoop/d1/usercache/cloudera/appcache/application_1410663092546_0004
 java.io.IOException: mkdir of 
 /hadoop/d1/usercache/cloudera/appcache/application_1410663092546_0004 failed
   at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:1062)
   at 
 org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:157)
   at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189)
   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:721)
   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:717)
   at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
   at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:717)
   at 
 org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.createDir(DefaultContainerExecutor.java:426)
   at 
 org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.createAppDirs(DefaultContainerExecutor.java:522)
   at 
 org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:94)
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:987)
 2014-09-13 23:33:25,185 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
  Localizer failed
 java.io.FileNotFoundException: File 
 file:/hadoop/d1/usercache/cloudera/appcache/application_1410663092546_0004 
 does not exist
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:511)
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:724)
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:501)
   at 
 org.apache.hadoop.fs.DelegateToFileSystem.getFileStatus(DelegateToFileSystem.java:111)
   at 
 org.apache.hadoop.fs.DelegateToFileSystem.createInternal(DelegateToFileSystem.java:76)
   at 
 org.apache.hadoop.fs.ChecksumFs$ChecksumFSOutputSummer.init(ChecksumFs.java:344)
   at org.apache.hadoop.fs.ChecksumFs.createInternal(ChecksumFs.java:390)
   at 
 org.apache.hadoop.fs.AbstractFileSystem.create(AbstractFileSystem.java:577)
   at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:677)
   at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:673)
   at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
   at org.apache.hadoop.fs.FileContext.create(FileContext.java:673)
   at org.apache.hadoop.fs.FileContext$Util.copy(FileContext.java:2021)
   at org.apache.hadoop.fs.FileContext$Util.copy(FileContext.java:1963)
   at 
 org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:102)
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:987)
 2014-09-13 23:33:25,186 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
  Container container_1410663092546_0004_01_01 transitioned from 
 LOCALIZING to LOCALIZATION_FAILED
 2014-09-13 23:33:25,187 WARN 
 org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=cloudera   
 OPERATION=Container Finished - Failed   TARGET=ContainerImpl
 RESULT=FAILURE  DESCRIPTION=Container failed with state: LOCALIZATION_FAILED  
   

[jira] [Updated] (YARN-1708) Add a public API to reserve resources (part of YARN-1051)

2014-10-06 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan updated YARN-1708:
-
Fix Version/s: 2.6.0

 Add a public API to reserve resources (part of YARN-1051)
 -

 Key: YARN-1708
 URL: https://issues.apache.org/jira/browse/YARN-1708
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Carlo Curino
Assignee: Subru Krishnan
 Fix For: 2.6.0

 Attachments: YARN-1708.patch, YARN-1708.patch, YARN-1708.patch, 
 YARN-1708.patch


 This JIRA tracks the definition of a new public API for YARN, which allows 
 users to reserve resources (think of time-bounded queues). This is part of 
 the admission control enhancement proposed in YARN-1051.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1709) Admission Control: Reservation subsystem

2014-10-06 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan updated YARN-1709:
-
Fix Version/s: 2.6.0

 Admission Control: Reservation subsystem
 

 Key: YARN-1709
 URL: https://issues.apache.org/jira/browse/YARN-1709
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Carlo Curino
Assignee: Subru Krishnan
 Fix For: 2.6.0

 Attachments: YARN-1709.patch, YARN-1709.patch, YARN-1709.patch, 
 YARN-1709.patch, YARN-1709.patch, YARN-1709.patch, YARN-1709.patch


 This JIRA is about the key data structure used to track resources over time 
 to enable YARN-1051. The Reservation subsystem is conceptually a plan of 
 how the scheduler will allocate resources over-time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1709) Admission Control: Reservation subsystem

2014-10-06 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan updated YARN-1709:
-
Target Version/s:   (was: 3.0.0)

 Admission Control: Reservation subsystem
 

 Key: YARN-1709
 URL: https://issues.apache.org/jira/browse/YARN-1709
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Carlo Curino
Assignee: Subru Krishnan
 Fix For: 2.6.0

 Attachments: YARN-1709.patch, YARN-1709.patch, YARN-1709.patch, 
 YARN-1709.patch, YARN-1709.patch, YARN-1709.patch, YARN-1709.patch


 This JIRA is about the key data structure used to track resources over time 
 to enable YARN-1051. The Reservation subsystem is conceptually a plan of 
 how the scheduler will allocate resources over-time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1710) Admission Control: agents to allocate reservation

2014-10-06 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan updated YARN-1710:
-
Fix Version/s: 2.6.0

 Admission Control: agents to allocate reservation
 -

 Key: YARN-1710
 URL: https://issues.apache.org/jira/browse/YARN-1710
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Carlo Curino
Assignee: Carlo Curino
 Fix For: 2.6.0

 Attachments: YARN-1710.1.patch, YARN-1710.2.patch, YARN-1710.3.patch, 
 YARN-1710.4.patch, YARN-1710.patch


 This JIRA tracks the algorithms used to allocate a user ReservationRequest 
 coming in from the new reservation API (YARN-1708), in the inventory 
 subsystem (YARN-1709) maintaining the current plan for the cluster. The focus 
 of this agents is to quickly find a solution for the set of contraints 
 provided by the user, and the physical constraints of the plan.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1711) CapacityOverTimePolicy: a policy to enforce quotas over time for YARN-1709

2014-10-06 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan updated YARN-1711:
-
Fix Version/s: 2.6.0

 CapacityOverTimePolicy: a policy to enforce quotas over time for YARN-1709
 --

 Key: YARN-1711
 URL: https://issues.apache.org/jira/browse/YARN-1711
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Carlo Curino
Assignee: Carlo Curino
  Labels: reservations
 Fix For: 2.6.0

 Attachments: YARN-1711.1.patch, YARN-1711.2.patch, YARN-1711.3.patch, 
 YARN-1711.4.patch, YARN-1711.5.patch, YARN-1711.patch


 This JIRA tracks the development of a policy that enforces user quotas (a 
 time-extension of the notion of capacity) in the inventory subsystem 
 discussed in YARN-1709.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1711) CapacityOverTimePolicy: a policy to enforce quotas over time for YARN-1709

2014-10-06 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan updated YARN-1711:
-
Target Version/s:   (was: 3.0.0)

 CapacityOverTimePolicy: a policy to enforce quotas over time for YARN-1709
 --

 Key: YARN-1711
 URL: https://issues.apache.org/jira/browse/YARN-1711
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Carlo Curino
Assignee: Carlo Curino
  Labels: reservations
 Fix For: 2.6.0

 Attachments: YARN-1711.1.patch, YARN-1711.2.patch, YARN-1711.3.patch, 
 YARN-1711.4.patch, YARN-1711.5.patch, YARN-1711.patch


 This JIRA tracks the development of a policy that enforces user quotas (a 
 time-extension of the notion of capacity) in the inventory subsystem 
 discussed in YARN-1709.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2080) Admission Control: Integrate Reservation subsystem with ResourceManager

2014-10-06 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan updated YARN-2080:
-
Fix Version/s: 2.6.0

 Admission Control: Integrate Reservation subsystem with ResourceManager
 ---

 Key: YARN-2080
 URL: https://issues.apache.org/jira/browse/YARN-2080
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Subru Krishnan
Assignee: Subru Krishnan
 Fix For: 2.6.0

 Attachments: YARN-2080.patch, YARN-2080.patch, YARN-2080.patch, 
 YARN-2080.patch, YARN-2080.patch, YARN-2080.patch, YARN-2080.patch


 This JIRA tracks the integration of Reservation subsystem data structures 
 introduced in YARN-1709 with the YARN RM. This is essentially end2end wiring 
 of YARN-1051.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1712) Admission Control: plan follower

2014-10-06 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan updated YARN-1712:
-
Fix Version/s: 2.6.0

 Admission Control: plan follower
 

 Key: YARN-1712
 URL: https://issues.apache.org/jira/browse/YARN-1712
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler, resourcemanager
Reporter: Carlo Curino
Assignee: Carlo Curino
  Labels: reservations, scheduler
 Fix For: 2.6.0

 Attachments: YARN-1712.1.patch, YARN-1712.2.patch, YARN-1712.3.patch, 
 YARN-1712.4.patch, YARN-1712.5.patch, YARN-1712.patch


 This JIRA tracks a thread that continuously propagates the current state of 
 an reservation subsystem to the scheduler. As the inventory subsystem store 
 the plan of how the resources should be subdivided, the work we propose in 
 this JIRA realizes such plan by dynamically instructing the CapacityScheduler 
 to add/remove/resize queues to follow the plan.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1712) Admission Control: plan follower

2014-10-06 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan updated YARN-1712:
-
Target Version/s:   (was: 3.0.0)

 Admission Control: plan follower
 

 Key: YARN-1712
 URL: https://issues.apache.org/jira/browse/YARN-1712
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler, resourcemanager
Reporter: Carlo Curino
Assignee: Carlo Curino
  Labels: reservations, scheduler
 Fix For: 2.6.0

 Attachments: YARN-1712.1.patch, YARN-1712.2.patch, YARN-1712.3.patch, 
 YARN-1712.4.patch, YARN-1712.5.patch, YARN-1712.patch


 This JIRA tracks a thread that continuously propagates the current state of 
 an reservation subsystem to the scheduler. As the inventory subsystem store 
 the plan of how the resources should be subdivided, the work we propose in 
 this JIRA realizes such plan by dynamically instructing the CapacityScheduler 
 to add/remove/resize queues to follow the plan.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2080) Admission Control: Integrate Reservation subsystem with ResourceManager

2014-10-06 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan updated YARN-2080:
-
Target Version/s:   (was: 3.0.0)

 Admission Control: Integrate Reservation subsystem with ResourceManager
 ---

 Key: YARN-2080
 URL: https://issues.apache.org/jira/browse/YARN-2080
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Subru Krishnan
Assignee: Subru Krishnan
 Fix For: 2.6.0

 Attachments: YARN-2080.patch, YARN-2080.patch, YARN-2080.patch, 
 YARN-2080.patch, YARN-2080.patch, YARN-2080.patch, YARN-2080.patch


 This JIRA tracks the integration of Reservation subsystem data structures 
 introduced in YARN-1709 with the YARN RM. This is essentially end2end wiring 
 of YARN-1051.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2378) Adding support for moving apps between queues in Capacity Scheduler

2014-10-06 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan updated YARN-2378:
-
Target Version/s:   (was: 2.6.0)

 Adding support for moving apps between queues in Capacity Scheduler
 ---

 Key: YARN-2378
 URL: https://issues.apache.org/jira/browse/YARN-2378
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Reporter: Subru Krishnan
Assignee: Subru Krishnan
  Labels: capacity-scheduler
 Fix For: 2.6.0

 Attachments: YARN-2378-1.patch, YARN-2378.patch, YARN-2378.patch, 
 YARN-2378.patch, YARN-2378.patch


 As discussed with [~leftnoteasy] and [~jianhe], we are breaking up YARN-1707 
 to smaller patches for manageability. This JIRA will address adding support 
 for moving apps between queues in Capacity Scheduler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2385) Consider splitting getAppsinQueue to getRunningAppsInQueue + getPendingAppsInQueue

2014-10-06 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan updated YARN-2385:
-
Fix Version/s: 2.6.0

 Consider splitting getAppsinQueue to getRunningAppsInQueue + 
 getPendingAppsInQueue
 --

 Key: YARN-2385
 URL: https://issues.apache.org/jira/browse/YARN-2385
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler, fairscheduler
Reporter: Subru Krishnan
Assignee: Subru Krishnan
  Labels: abstractyarnscheduler
 Fix For: 2.6.0


 Currently getAppsinQueue returns both pending  running apps. The purpose of 
 the JIRA is to explore splitting it to getRunningAppsInQueue + 
 getPendingAppsInQueue that will provide more flexibility to callers



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2385) Consider splitting getAppsinQueue to getRunningAppsInQueue + getPendingAppsInQueue

2014-10-06 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan updated YARN-2385:
-
Fix Version/s: (was: 2.6.0)

 Consider splitting getAppsinQueue to getRunningAppsInQueue + 
 getPendingAppsInQueue
 --

 Key: YARN-2385
 URL: https://issues.apache.org/jira/browse/YARN-2385
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler, fairscheduler
Reporter: Subru Krishnan
Assignee: Subru Krishnan
  Labels: abstractyarnscheduler

 Currently getAppsinQueue returns both pending  running apps. The purpose of 
 the JIRA is to explore splitting it to getRunningAppsInQueue + 
 getPendingAppsInQueue that will provide more flexibility to callers



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2611) Fix jenkins findbugs warning and test case failures for trunk merge patch

2014-10-06 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan updated YARN-2611:
-
Fix Version/s: 2.6.0

 Fix jenkins findbugs warning and test case failures for trunk merge patch
 -

 Key: YARN-2611
 URL: https://issues.apache.org/jira/browse/YARN-2611
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler, resourcemanager, scheduler
Reporter: Subru Krishnan
Assignee: Subru Krishnan
 Fix For: 2.6.0

 Attachments: YARN-2611.patch


 This JIRA is to fix jenkins findbugs warnings and test case failures for 
 trunk merge patch  as [reported | 
 https://issues.apache.org/jira/browse/YARN-1051?focusedCommentId=14148506] in 
 YARN-1051



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2611) Fix jenkins findbugs warning and test case failures for trunk merge patch

2014-10-06 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan updated YARN-2611:
-
Target Version/s:   (was: 3.0.0)

 Fix jenkins findbugs warning and test case failures for trunk merge patch
 -

 Key: YARN-2611
 URL: https://issues.apache.org/jira/browse/YARN-2611
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler, resourcemanager, scheduler
Reporter: Subru Krishnan
Assignee: Subru Krishnan
 Fix For: 2.6.0

 Attachments: YARN-2611.patch


 This JIRA is to fix jenkins findbugs warnings and test case failures for 
 trunk merge patch  as [reported | 
 https://issues.apache.org/jira/browse/YARN-1051?focusedCommentId=14148506] in 
 YARN-1051



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2576) Prepare yarn-1051 branch for merging with trunk

2014-10-06 Thread Subru Krishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subru Krishnan updated YARN-2576:
-
Target Version/s:   (was: 3.0.0)

 Prepare yarn-1051 branch for merging with trunk
 ---

 Key: YARN-2576
 URL: https://issues.apache.org/jira/browse/YARN-2576
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler, resourcemanager, scheduler
Reporter: Subru Krishnan
Assignee: Subru Krishnan
 Fix For: 2.6.0

 Attachments: YARN-2576.patch, YARN-2576.patch


 This JIRA is to track the changes required to ensure branch yarn-1051 is 
 ready to be merged with trunk. This includes fixing any compilation issues, 
 findbug and/or javadoc warning, test cases failures, etc if any.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2544) [YARN-796] Common server side PB changes (not include user API PB changes)

2014-10-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14160673#comment-14160673
 ] 

Hadoop QA commented on YARN-2544:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673138/YARN-2544.patch
  against trunk revision ea26cc0.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5275//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5275//console

This message is automatically generated.

 [YARN-796] Common server side PB changes (not include user API PB changes)
 --

 Key: YARN-2544
 URL: https://issues.apache.org/jira/browse/YARN-2544
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, client, resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-2544.patch, YARN-2544.patch, YARN-2544.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2566) IOException happen in startLocalizer of DefaultContainerExecutor due to not enough disk space for the first localDir.

2014-10-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14160687#comment-14160687
 ] 

Hadoop QA commented on YARN-2566:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673150/YARN-2566.003.patch
  against trunk revision ea26cc0.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:red}-1 javac{color:red}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5276//console

This message is automatically generated.

 IOException happen in startLocalizer of DefaultContainerExecutor due to not 
 enough disk space for the first localDir.
 -

 Key: YARN-2566
 URL: https://issues.apache.org/jira/browse/YARN-2566
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.5.0
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: YARN-2566.000.patch, YARN-2566.001.patch, 
 YARN-2566.002.patch, YARN-2566.003.patch


 startLocalizer in DefaultContainerExecutor will only use the first localDir 
 to copy the token file, if the copy is failed for first localDir due to not 
 enough disk space in the first localDir, the localization will be failed even 
 there are plenty of disk space in other localDirs. We see the following error 
 for this case:
 {code}
 2014-09-13 23:33:25,171 WARN 
 org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Unable to 
 create app directory 
 /hadoop/d1/usercache/cloudera/appcache/application_1410663092546_0004
 java.io.IOException: mkdir of 
 /hadoop/d1/usercache/cloudera/appcache/application_1410663092546_0004 failed
   at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:1062)
   at 
 org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:157)
   at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189)
   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:721)
   at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:717)
   at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
   at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:717)
   at 
 org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.createDir(DefaultContainerExecutor.java:426)
   at 
 org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.createAppDirs(DefaultContainerExecutor.java:522)
   at 
 org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:94)
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:987)
 2014-09-13 23:33:25,185 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
  Localizer failed
 java.io.FileNotFoundException: File 
 file:/hadoop/d1/usercache/cloudera/appcache/application_1410663092546_0004 
 does not exist
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:511)
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:724)
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:501)
   at 
 org.apache.hadoop.fs.DelegateToFileSystem.getFileStatus(DelegateToFileSystem.java:111)
   at 
 org.apache.hadoop.fs.DelegateToFileSystem.createInternal(DelegateToFileSystem.java:76)
   at 
 org.apache.hadoop.fs.ChecksumFs$ChecksumFSOutputSummer.init(ChecksumFs.java:344)
   at org.apache.hadoop.fs.ChecksumFs.createInternal(ChecksumFs.java:390)
   at 
 org.apache.hadoop.fs.AbstractFileSystem.create(AbstractFileSystem.java:577)
   at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:677)
   at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:673)
   at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
   at org.apache.hadoop.fs.FileContext.create(FileContext.java:673)
   at org.apache.hadoop.fs.FileContext$Util.copy(FileContext.java:2021)
   at org.apache.hadoop.fs.FileContext$Util.copy(FileContext.java:1963)
   at 
 org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:102)
   at 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:987)
 

[jira] [Created] (YARN-2648) need mechanism for updating HDFS delegation tokens associated with container launch contexts

2014-10-06 Thread Jonathan Maron (JIRA)
Jonathan Maron created YARN-2648:


 Summary: need mechanism for updating HDFS delegation tokens 
associated with container launch contexts
 Key: YARN-2648
 URL: https://issues.apache.org/jira/browse/YARN-2648
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager, resourcemanager
Reporter: Jonathan Maron


During the launch of a container, the required delegation tokens (e.g. HDFS) 
are passed to the launch context.  If those tokens expire and the container 
requires a restart the restart attempt will fail.  Sample log output:

2014-10-06 18:37:28,609 WARN  ipc.Client (Client.java:run(675)) - Exception 
encountered while connecting to the server : 
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
 token (HDFS_DELEGATION_TOKEN token 124 for hbase) can't be found in cache

 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2644) Recalculate headroom more frequently to keep it accurate

2014-10-06 Thread Craig Welch (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch updated YARN-2644:
--
Attachment: YARN-2644.15.patch

Update patch against latest trunk in new(er) git repo

 Recalculate headroom more frequently to keep it accurate
 

 Key: YARN-2644
 URL: https://issues.apache.org/jira/browse/YARN-2644
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Craig Welch
Assignee: Craig Welch
 Attachments: YARN-2644.11.patch, YARN-2644.14.patch, 
 YARN-2644.15.patch


 See parent (1198) for more detail - this specifically covers calculating the 
 headroom more frequently, to cover the cases where changes have occurred 
 which impact headroom but which are not reflected due to an application not 
 being updated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running

2014-10-06 Thread Craig Welch (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch updated YARN-1857:
--
Attachment: YARN-1857.5.patch

Updating to current trunk on new(er) repo

 CapacityScheduler headroom doesn't account for other AM's running
 -

 Key: YARN-1857
 URL: https://issues.apache.org/jira/browse/YARN-1857
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Affects Versions: 2.3.0
Reporter: Thomas Graves
Assignee: Chen He
Priority: Critical
 Attachments: YARN-1857.1.patch, YARN-1857.2.patch, YARN-1857.3.patch, 
 YARN-1857.4.patch, YARN-1857.5.patch, YARN-1857.patch, YARN-1857.patch, 
 YARN-1857.patch


 Its possible to get an application to hang forever (or a long time) in a 
 cluster with multiple users.  The reason why is that the headroom sent to the 
 application is based on the user limit but it doesn't account for other 
 Application masters using space in that queue.  So the headroom (user limit - 
 user consumed) can be  0 even though the cluster is 100% full because the 
 other space is being used by application masters from other users.  
 For instance if you have a cluster with 1 queue, user limit is 100%, you have 
 multiple users submitting applications.  One very large application by user 1 
 starts up, runs most of its maps and starts running reducers. other users try 
 to start applications and get their application masters started but not 
 tasks.  The very large application then gets to the point where it has 
 consumed the rest of the cluster resources with all reduces.  But at this 
 point it needs to still finish a few maps.  The headroom being sent to this 
 application is only based on the user limit (which is 100% of the cluster 
 capacity) its using lets say 95% of the cluster for reduces and then other 5% 
 is being used by other users running application masters.  The MRAppMaster 
 thinks it still has 5% so it doesn't know that it should kill a reduce in 
 order to run a map.  
 This can happen in other scenarios also.  Generally in a large cluster with 
 multiple queues this shouldn't cause a hang forever but it could cause the 
 application to take much longer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2633) TestContainerLauncherImpl sometimes fails

2014-10-06 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai updated YARN-2633:

Attachment: YARN-2633.patch

Attaching the patch.
The caught exception was to be ignored. Still throwing YarnRuntimeException 
from the catch clause did not make sense. deleting the line that throws the 
exception.

 TestContainerLauncherImpl sometimes fails
 -

 Key: YARN-2633
 URL: https://issues.apache.org/jira/browse/YARN-2633
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Mit Desai
Assignee: Mit Desai
 Attachments: YARN-2633.patch


 {noformat}
 org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
 java.lang.NoSuchMethodException: 
 org.apache.hadoop.yarn.api.ContainerManagementProtocol$$EnhancerByMockitoWithCGLIB$$25708415.close()
   at java.lang.Class.getMethod(Class.java:1665)
   at 
 org.apache.hadoop.yarn.factories.impl.pb.RpcClientFactoryPBImpl.stopClient(RpcClientFactoryPBImpl.java:90)
   at 
 org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC.stopProxy(HadoopYarnProtoRPC.java:54)
   at 
 org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy.mayBeCloseProxy(ContainerManagementProtocolProxy.java:79)
   at 
 org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.kill(ContainerLauncherImpl.java:225)
   at 
 org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.shutdownAllContainers(ContainerLauncherImpl.java:320)
   at 
 org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.serviceStop(ContainerLauncherImpl.java:331)
   at 
 org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
   at 
 org.apache.hadoop.mapreduce.v2.app.launcher.TestContainerLauncherImpl.testMyShutdown(TestContainerLauncherImpl.java:315)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running

2014-10-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14160819#comment-14160819
 ] 

Hadoop QA commented on YARN-1857:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673170/YARN-1857.5.patch
  against trunk revision ea26cc0.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:red}-1 javac{color:red}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5279//console

This message is automatically generated.

 CapacityScheduler headroom doesn't account for other AM's running
 -

 Key: YARN-1857
 URL: https://issues.apache.org/jira/browse/YARN-1857
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Affects Versions: 2.3.0
Reporter: Thomas Graves
Assignee: Chen He
Priority: Critical
 Attachments: YARN-1857.1.patch, YARN-1857.2.patch, YARN-1857.3.patch, 
 YARN-1857.4.patch, YARN-1857.5.patch, YARN-1857.patch, YARN-1857.patch, 
 YARN-1857.patch


 Its possible to get an application to hang forever (or a long time) in a 
 cluster with multiple users.  The reason why is that the headroom sent to the 
 application is based on the user limit but it doesn't account for other 
 Application masters using space in that queue.  So the headroom (user limit - 
 user consumed) can be  0 even though the cluster is 100% full because the 
 other space is being used by application masters from other users.  
 For instance if you have a cluster with 1 queue, user limit is 100%, you have 
 multiple users submitting applications.  One very large application by user 1 
 starts up, runs most of its maps and starts running reducers. other users try 
 to start applications and get their application masters started but not 
 tasks.  The very large application then gets to the point where it has 
 consumed the rest of the cluster resources with all reduces.  But at this 
 point it needs to still finish a few maps.  The headroom being sent to this 
 application is only based on the user limit (which is 100% of the cluster 
 capacity) its using lets say 95% of the cluster for reduces and then other 5% 
 is being used by other users running application masters.  The MRAppMaster 
 thinks it still has 5% so it doesn't know that it should kill a reduce in 
 order to run a map.  
 This can happen in other scenarios also.  Generally in a large cluster with 
 multiple queues this shouldn't cause a hang forever but it could cause the 
 application to take much longer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2500) [YARN-796] Miscellaneous changes in ResourceManager to support labels

2014-10-06 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-2500:
-
Attachment: YARN-2500.patch

 [YARN-796] Miscellaneous changes in ResourceManager to support labels
 -

 Key: YARN-2500
 URL: https://issues.apache.org/jira/browse/YARN-2500
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-2500.patch, YARN-2500.patch, YARN-2500.patch, 
 YARN-2500.patch, YARN-2500.patch


 This patches contains changes in ResourceManager to support labels



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2500) [YARN-796] Miscellaneous changes in ResourceManager to support labels

2014-10-06 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14160858#comment-14160858
 ] 

Wangda Tan commented on YARN-2500:
--

Regarding comments from [~vinodkv],
bq. As with other patches, Labels - NodeLabels. You'll need to change all of 
the following:...
Addressed
bq. ApplicationMasterService: There are multiple 
this.rmContext.getRMApps().get(appAttemptId.getApplicationId() calls in the the 
allocate method. Refactor to avoid dup calls.
Addressed
bq. TestSchedulerUtils: testValidateResourceRequestWithErrorLabelsPermission: 
Why are  and   accepted when only x and y are recognized labels?
Empty label expression  should be accept by any queue, and   will be 
trimmed to empty.
bq. Given we don't support yet other features in ResourceRequest for the AM 
container like priority, locality, shall we also hard-code them to 
AM_CONTAINER_PRIORITY, ResourceRequest.ANY respectively too?
Agree, now set values to default for 
priority/#container/resource-name/relax-locality
bq. Can we add test-case for num-containers, priority, locality for AM 
container?
Added test testScheduleTransitionReplaceAMContainerRequestWithDefaults in 
RMAppAttemptImpl.

Please kindly review,

Thanks,
Wangda

 [YARN-796] Miscellaneous changes in ResourceManager to support labels
 -

 Key: YARN-2500
 URL: https://issues.apache.org/jira/browse/YARN-2500
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-2500.patch, YARN-2500.patch, YARN-2500.patch, 
 YARN-2500.patch, YARN-2500.patch


 This patches contains changes in ResourceManager to support labels



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2629) Make distributed shell use the domain-based timeline ACLs

2014-10-06 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2629:
--
Attachment: YARN-2629.2.patch

Add one more option -create, which make client only try to create a new 
domain once this flag is set.

In addition, fix an existing problem in DS AM. AM should use the submitter UGI 
to put the entities.

Create a patch of these changes.

 Make distributed shell use the domain-based timeline ACLs
 -

 Key: YARN-2629
 URL: https://issues.apache.org/jira/browse/YARN-2629
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-2629.1.patch, YARN-2629.2.patch


 For demonstration the usage of this feature (YARN-2102), it's good to make 
 the distributed shell create the domain, and post its timeline entities into 
 this private space.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-796) Allow for (admin) labels on nodes and resource-requests

2014-10-06 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-796:

Attachment: YARN-796.node-label.consolidate.12.patch

 Allow for (admin) labels on nodes and resource-requests
 ---

 Key: YARN-796
 URL: https://issues.apache.org/jira/browse/YARN-796
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.4.1
Reporter: Arun C Murthy
Assignee: Wangda Tan
 Attachments: LabelBasedScheduling.pdf, 
 Node-labels-Requirements-Design-doc-V1.pdf, 
 Node-labels-Requirements-Design-doc-V2.pdf, YARN-796-Diagram.pdf, 
 YARN-796.node-label.consolidate.1.patch, 
 YARN-796.node-label.consolidate.10.patch, 
 YARN-796.node-label.consolidate.11.patch, 
 YARN-796.node-label.consolidate.12.patch, 
 YARN-796.node-label.consolidate.2.patch, 
 YARN-796.node-label.consolidate.3.patch, 
 YARN-796.node-label.consolidate.4.patch, 
 YARN-796.node-label.consolidate.5.patch, 
 YARN-796.node-label.consolidate.6.patch, 
 YARN-796.node-label.consolidate.7.patch, 
 YARN-796.node-label.consolidate.8.patch, YARN-796.node-label.demo.patch.1, 
 YARN-796.patch, YARN-796.patch4


 It will be useful for admins to specify labels for nodes. Examples of labels 
 are OS, processor architecture etc.
 We should expose these labels and allow applications to specify labels on 
 resource-requests.
 Obviously we need to support admin operations on adding/removing node labels.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2583) Modify the LogDeletionService to support Log aggregation for LRS

2014-10-06 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-2583:

Attachment: YARN-2583.2.patch

 Modify the LogDeletionService to support Log aggregation for LRS
 

 Key: YARN-2583
 URL: https://issues.apache.org/jira/browse/YARN-2583
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-2583.1.patch, YARN-2583.2.patch


 Currently, AggregatedLogDeletionService will delete old logs from HDFS. It 
 will check the cut-off-time, if all logs for this application is older than 
 this cut-off-time. The app-log-dir from HDFS will be deleted. This will not 
 work for LRS. We expect a LRS application can keep running for a long time. 
 Two different scenarios: 
 1) If we configured the rollingIntervalSeconds, the new log file will be 
 always uploaded to HDFS. The number of log files for this application will 
 become larger and larger. And there is no log files will be deleted.
 2) If we did not configure the rollingIntervalSeconds, the log file can only 
 be uploaded to HDFS after the application is finished. It is very possible 
 that the logs are uploaded after the cut-off-time. It will cause problem 
 because at that time the app-log-dir for this application in HDFS has been 
 deleted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2629) Make distributed shell use the domain-based timeline ACLs

2014-10-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14160963#comment-14160963
 ] 

Hadoop QA commented on YARN-2629:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673184/YARN-2629.2.patch
  against trunk revision 3affad9.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell:

  
org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5281//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5281//console

This message is automatically generated.

 Make distributed shell use the domain-based timeline ACLs
 -

 Key: YARN-2629
 URL: https://issues.apache.org/jira/browse/YARN-2629
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-2629.1.patch, YARN-2629.2.patch


 For demonstration the usage of this feature (YARN-2102), it's good to make 
 the distributed shell create the domain, and post its timeline entities into 
 this private space.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2583) Modify the LogDeletionService to support Log aggregation for LRS

2014-10-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14160989#comment-14160989
 ] 

Hadoop QA commented on YARN-2583:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673189/YARN-2583.2.patch
  against trunk revision 3affad9.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 1268 javac 
compiler warnings (more than the trunk's current 1267 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:

  
org.apache.hadoop.yarn.logaggregation.TestAggregatedLogDeletionService
  
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5283//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5283//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html
Javac warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5283//artifact/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5283//console

This message is automatically generated.

 Modify the LogDeletionService to support Log aggregation for LRS
 

 Key: YARN-2583
 URL: https://issues.apache.org/jira/browse/YARN-2583
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-2583.1.patch, YARN-2583.2.patch


 Currently, AggregatedLogDeletionService will delete old logs from HDFS. It 
 will check the cut-off-time, if all logs for this application is older than 
 this cut-off-time. The app-log-dir from HDFS will be deleted. This will not 
 work for LRS. We expect a LRS application can keep running for a long time. 
 Two different scenarios: 
 1) If we configured the rollingIntervalSeconds, the new log file will be 
 always uploaded to HDFS. The number of log files for this application will 
 become larger and larger. And there is no log files will be deleted.
 2) If we did not configure the rollingIntervalSeconds, the log file can only 
 be uploaded to HDFS after the application is finished. It is very possible 
 that the logs are uploaded after the cut-off-time. It will cause problem 
 because at that time the app-log-dir for this application in HDFS has been 
 deleted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2644) Recalculate headroom more frequently to keep it accurate

2014-10-06 Thread Craig Welch (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch updated YARN-2644:
--
Attachment: YARN-2644.15.patch

Reupload to see if jenkins can apply the patch now

 Recalculate headroom more frequently to keep it accurate
 

 Key: YARN-2644
 URL: https://issues.apache.org/jira/browse/YARN-2644
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Craig Welch
Assignee: Craig Welch
 Attachments: YARN-2644.11.patch, YARN-2644.14.patch, 
 YARN-2644.15.patch, YARN-2644.15.patch


 See parent (1198) for more detail - this specifically covers calculating the 
 headroom more frequently, to cover the cases where changes have occurred 
 which impact headroom but which are not reflected due to an application not 
 being updated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2633) TestContainerLauncherImpl sometimes fails

2014-10-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14161038#comment-14161038
 ] 

Hadoop QA commented on YARN-2633:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673174/YARN-2633.patch
  against trunk revision 687d83c.

{color:red}-1 patch{color}.  Trunk compilation may be broken.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5286//console

This message is automatically generated.

 TestContainerLauncherImpl sometimes fails
 -

 Key: YARN-2633
 URL: https://issues.apache.org/jira/browse/YARN-2633
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Mit Desai
Assignee: Mit Desai
 Attachments: YARN-2633.patch


 {noformat}
 org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
 java.lang.NoSuchMethodException: 
 org.apache.hadoop.yarn.api.ContainerManagementProtocol$$EnhancerByMockitoWithCGLIB$$25708415.close()
   at java.lang.Class.getMethod(Class.java:1665)
   at 
 org.apache.hadoop.yarn.factories.impl.pb.RpcClientFactoryPBImpl.stopClient(RpcClientFactoryPBImpl.java:90)
   at 
 org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC.stopProxy(HadoopYarnProtoRPC.java:54)
   at 
 org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy.mayBeCloseProxy(ContainerManagementProtocolProxy.java:79)
   at 
 org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.kill(ContainerLauncherImpl.java:225)
   at 
 org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.shutdownAllContainers(ContainerLauncherImpl.java:320)
   at 
 org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.serviceStop(ContainerLauncherImpl.java:331)
   at 
 org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
   at 
 org.apache.hadoop.mapreduce.v2.app.launcher.TestContainerLauncherImpl.testMyShutdown(TestContainerLauncherImpl.java:315)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2644) Recalculate headroom more frequently to keep it accurate

2014-10-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14161148#comment-14161148
 ] 

Hadoop QA commented on YARN-2644:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673200/YARN-2644.15.patch
  against trunk revision 3affad9.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5285//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5285//console

This message is automatically generated.

 Recalculate headroom more frequently to keep it accurate
 

 Key: YARN-2644
 URL: https://issues.apache.org/jira/browse/YARN-2644
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Craig Welch
Assignee: Craig Welch
 Attachments: YARN-2644.11.patch, YARN-2644.14.patch, 
 YARN-2644.15.patch, YARN-2644.15.patch


 See parent (1198) for more detail - this specifically covers calculating the 
 headroom more frequently, to cover the cases where changes have occurred 
 which impact headroom but which are not reflected due to an application not 
 being updated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2644) Recalculate headroom more frequently to keep it accurate

2014-10-06 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14161152#comment-14161152
 ] 

Jian He commented on YARN-2644:
---

looks good, committing 

 Recalculate headroom more frequently to keep it accurate
 

 Key: YARN-2644
 URL: https://issues.apache.org/jira/browse/YARN-2644
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Craig Welch
Assignee: Craig Welch
 Attachments: YARN-2644.11.patch, YARN-2644.14.patch, 
 YARN-2644.15.patch, YARN-2644.15.patch


 See parent (1198) for more detail - this specifically covers calculating the 
 headroom more frequently, to cover the cases where changes have occurred 
 which impact headroom but which are not reflected due to an application not 
 being updated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2496) [YARN-796] Changes for capacity scheduler to support allocate resource respect labels

2014-10-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14161155#comment-14161155
 ] 

Hadoop QA commented on YARN-2496:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673224/YARN-2496.patch
  against trunk revision 8dc6abf.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 12 new 
or modified test files.

{color:red}-1 javac{color:red}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5287//console

This message is automatically generated.

 [YARN-796] Changes for capacity scheduler to support allocate resource 
 respect labels
 -

 Key: YARN-2496
 URL: https://issues.apache.org/jira/browse/YARN-2496
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-2496.patch, YARN-2496.patch, YARN-2496.patch, 
 YARN-2496.patch, YARN-2496.patch, YARN-2496.patch


 This JIRA Includes:
 - Add/parse labels option to {{capacity-scheduler.xml}} similar to other 
 options of queue like capacity/maximum-capacity, etc.
 - Include a default-label-expression option in queue config, if an app 
 doesn't specify label-expression, default-label-expression of queue will be 
 used.
 - Check if labels can be accessed by the queue when submit an app with 
 labels-expression to queue or update ResourceRequest with label-expression
 - Check labels on NM when trying to allocate ResourceRequest on the NM with 
 label-expression
 - Respect  labels when calculate headroom/user-limit



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1061) NodeManager is indefinitely waiting for nodeHeartBeat() response from ResouceManager.

2014-10-06 Thread Wilfred Spiegelenburg (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14161160#comment-14161160
 ] 

Wilfred Spiegelenburg commented on YARN-1061:
-

This is a dupe from YARN-2578. Writes do not time out and they should.

 NodeManager is indefinitely waiting for nodeHeartBeat() response from 
 ResouceManager.
 -

 Key: YARN-1061
 URL: https://issues.apache.org/jira/browse/YARN-1061
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.0.5-alpha
Reporter: Rohith

 It is observed that in one of the scenario, NodeManger is indefinetly waiting 
 for nodeHeartbeat response from ResouceManger where ResouceManger is in 
 hanged up state.
 NodeManager should get timeout exception instead of waiting indefinetly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running

2014-10-06 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14161164#comment-14161164
 ] 

Jian He commented on YARN-1857:
---

could you please update the patch on top of YARN-2644 ? comments in the 
meanwhile: 
- update the code comments about the new calculation of headroom 
{code}
/** 
 * Headroom is min((userLimit, queue-max-cap) - consumed)
 */
{code}
- indentation of this line {{Resources.subtract(queueMaxCap, usedResources));}}

 CapacityScheduler headroom doesn't account for other AM's running
 -

 Key: YARN-1857
 URL: https://issues.apache.org/jira/browse/YARN-1857
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Affects Versions: 2.3.0
Reporter: Thomas Graves
Assignee: Chen He
Priority: Critical
 Attachments: YARN-1857.1.patch, YARN-1857.2.patch, YARN-1857.3.patch, 
 YARN-1857.4.patch, YARN-1857.5.patch, YARN-1857.patch, YARN-1857.patch, 
 YARN-1857.patch


 Its possible to get an application to hang forever (or a long time) in a 
 cluster with multiple users.  The reason why is that the headroom sent to the 
 application is based on the user limit but it doesn't account for other 
 Application masters using space in that queue.  So the headroom (user limit - 
 user consumed) can be  0 even though the cluster is 100% full because the 
 other space is being used by application masters from other users.  
 For instance if you have a cluster with 1 queue, user limit is 100%, you have 
 multiple users submitting applications.  One very large application by user 1 
 starts up, runs most of its maps and starts running reducers. other users try 
 to start applications and get their application masters started but not 
 tasks.  The very large application then gets to the point where it has 
 consumed the rest of the cluster resources with all reduces.  But at this 
 point it needs to still finish a few maps.  The headroom being sent to this 
 application is only based on the user limit (which is 100% of the cluster 
 capacity) its using lets say 95% of the cluster for reduces and then other 5% 
 is being used by other users running application masters.  The MRAppMaster 
 thinks it still has 5% so it doesn't know that it should kill a reduce in 
 order to run a map.  
 This can happen in other scenarios also.  Generally in a large cluster with 
 multiple queues this shouldn't cause a hang forever but it could cause the 
 application to take much longer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2649) Flaky test TestAMRMRPCNodeUpdates

2014-10-06 Thread Ming Ma (JIRA)
Ming Ma created YARN-2649:
-

 Summary: Flaky test TestAMRMRPCNodeUpdates
 Key: YARN-2649
 URL: https://issues.apache.org/jira/browse/YARN-2649
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Ming Ma


Sometimes the test fails with the following error:

testAMRMUnusableNodes(org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates)
  Time elapsed: 41.73 sec   FAILURE!
junit.framework.AssertionFailedError: AppAttempt state is not correct 
(timedout) expected:ALLOCATED but was:SCHEDULED
at junit.framework.Assert.fail(Assert.java:50)
at junit.framework.Assert.failNotEquals(Assert.java:287)
at junit.framework.Assert.assertEquals(Assert.java:67)
at 
org.apache.hadoop.yarn.server.resourcemanager.MockAM.waitForState(MockAM.java:82)
at 
org.apache.hadoop.yarn.server.resourcemanager.MockRM.sendAMLaunched(MockRM.java:382)
at 
org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates.testAMRMUnusableNodes(TestAMRMRPCNodeUpdates.java:125)



When this happens, SchedulerEventType.NODE_UPDATE was processed before 
RMAppAttemptEvent.ATTEMPT_ADDED was processed. That is possible, given the test 
only waits for RMAppState.ACCEPTED before having NM sending heartbeat. This can 
be reproduced using custom AsyncDispatcher with CountDownLatch. Here is the log 
when this happens.

{noformat}
App State is : ACCEPTED
2014-10-05 21:25:07,305 INFO  [AsyncDispatcher event handler] 
attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(670)) - 
appattempt_1412569506932_0001_01 State change from NEW to SUBMITTED
2014-10-05 21:25:07,305 DEBUG [AsyncDispatcher event handler] 
event.AsyncDispatcher (AsyncDispatcher.java:dispatch(164)) - Dispatching the 
event 
org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeStatusEvent.EventType:
 STATUS_UPDATE
2014-10-05 21:25:07,305 DEBUG [AsyncDispatcher event handler] rmnode.RMNodeImpl 
(RMNodeImpl.java:handle(384)) - Processing 127.0.0.1:1234 of type STATUS_UPDATE
AppAttempt : appattempt_1412569506932_0001_01 State is : SUBMITTED Waiting 
for state : ALLOCATED
2014-10-05 21:25:07,306 DEBUG [AsyncDispatcher event handler] 
event.AsyncDispatcher (AsyncDispatcher.java:dispatch(164)) - Dispatching the 
event 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.AppAttemptAddedSchedulerEvent.EventType:
 APP_ATTEMPT_ADDED

2014-10-05 21:25:07,328 DEBUG [AsyncDispatcher event handler] 
event.AsyncDispatcher (AsyncDispatcher.java:dispatch(164)) - Dispatching the 
event 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.NodeUpdateSchedulerEvent.EventType:
 NODE_UPDATE

2014-10-05 21:25:07,330 DEBUG [AsyncDispatcher event handler] 
event.AsyncDispatcher (AsyncDispatcher.java:dispatch(164)) - Dispatching the 
event 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptEvent.EventType:
 ATTEMPT_ADDED
2014-10-05 21:25:07,331 DEBUG [AsyncDispatcher event handler] 
attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(658)) - Processing event 
for appattempt_1412569506932_0001_000
001 of type ATTEMPT_ADDED

2014-10-05 21:25:07,333 INFO  [AsyncDispatcher event handler] 
attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(670)) - 
appattempt_1412569506932_0001_01 State change from SUBMITTED to SCHEDULED

{noformat}






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2649) Flaky test TestAMRMRPCNodeUpdates

2014-10-06 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated YARN-2649:
--
Attachment: YARN-2649.patch

Fix the test code to wait until RMAppAttemptImpl gets to 
RMAppAttemptState.SCHEDULED state before having the nm heartbeat.

Another way to fix it is to change MockRM.submitApp to waitForState on 
RMAppAttempt. That might address other test cases that use MockRM.submitApp.

 Flaky test TestAMRMRPCNodeUpdates
 -

 Key: YARN-2649
 URL: https://issues.apache.org/jira/browse/YARN-2649
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Ming Ma
 Attachments: YARN-2649.patch


 Sometimes the test fails with the following error:
 testAMRMUnusableNodes(org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates)
   Time elapsed: 41.73 sec   FAILURE!
 junit.framework.AssertionFailedError: AppAttempt state is not correct 
 (timedout) expected:ALLOCATED but was:SCHEDULED
   at junit.framework.Assert.fail(Assert.java:50)
   at junit.framework.Assert.failNotEquals(Assert.java:287)
   at junit.framework.Assert.assertEquals(Assert.java:67)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.MockAM.waitForState(MockAM.java:82)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.MockRM.sendAMLaunched(MockRM.java:382)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates.testAMRMUnusableNodes(TestAMRMRPCNodeUpdates.java:125)
 When this happens, SchedulerEventType.NODE_UPDATE was processed before 
 RMAppAttemptEvent.ATTEMPT_ADDED was processed. That is possible, given the 
 test only waits for RMAppState.ACCEPTED before having NM sending heartbeat. 
 This can be reproduced using custom AsyncDispatcher with CountDownLatch. Here 
 is the log when this happens.
 {noformat}
 App State is : ACCEPTED
 2014-10-05 21:25:07,305 INFO  [AsyncDispatcher event handler] 
 attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(670)) - 
 appattempt_1412569506932_0001_01 State change from NEW to SUBMITTED
 2014-10-05 21:25:07,305 DEBUG [AsyncDispatcher event handler] 
 event.AsyncDispatcher (AsyncDispatcher.java:dispatch(164)) - Dispatching the 
 event 
 org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeStatusEvent.EventType:
  STATUS_UPDATE
 2014-10-05 21:25:07,305 DEBUG [AsyncDispatcher event handler] 
 rmnode.RMNodeImpl (RMNodeImpl.java:handle(384)) - Processing 127.0.0.1:1234 
 of type STATUS_UPDATE
 AppAttempt : appattempt_1412569506932_0001_01 State is : SUBMITTED 
 Waiting for state : ALLOCATED
 2014-10-05 21:25:07,306 DEBUG [AsyncDispatcher event handler] 
 event.AsyncDispatcher (AsyncDispatcher.java:dispatch(164)) - Dispatching the 
 event 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.AppAttemptAddedSchedulerEvent.EventType:
  APP_ATTEMPT_ADDED
 2014-10-05 21:25:07,328 DEBUG [AsyncDispatcher event handler] 
 event.AsyncDispatcher (AsyncDispatcher.java:dispatch(164)) - Dispatching the 
 event 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.NodeUpdateSchedulerEvent.EventType:
  NODE_UPDATE
 2014-10-05 21:25:07,330 DEBUG [AsyncDispatcher event handler] 
 event.AsyncDispatcher (AsyncDispatcher.java:dispatch(164)) - Dispatching the 
 event 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptEvent.EventType:
  ATTEMPT_ADDED
 2014-10-05 21:25:07,331 DEBUG [AsyncDispatcher event handler] 
 attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(658)) - Processing 
 event for appattempt_1412569506932_0001_000
 001 of type ATTEMPT_ADDED
 2014-10-05 21:25:07,333 INFO  [AsyncDispatcher event handler] 
 attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(670)) - 
 appattempt_1412569506932_0001_01 State change from SUBMITTED to SCHEDULED
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2629) Make distributed shell use the domain-based timeline ACLs

2014-10-06 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2629:
--
Attachment: YARN-2629.3.patch

Upload a new patch:

1. Fix the test failure
2. Remove two lines of unnecessary code in TimelineClientImpl
3. Improve the code of publishing entities in DS AM

 Make distributed shell use the domain-based timeline ACLs
 -

 Key: YARN-2629
 URL: https://issues.apache.org/jira/browse/YARN-2629
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-2629.1.patch, YARN-2629.2.patch, YARN-2629.3.patch


 For demonstration the usage of this feature (YARN-2102), it's good to make 
 the distributed shell create the domain, and post its timeline entities into 
 this private space.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2649) Flaky test TestAMRMRPCNodeUpdates

2014-10-06 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14161210#comment-14161210
 ] 

Jian He commented on YARN-2649:
---

[~mingma], thanks for working on this !
bq. Another way to fix it is to change MockRM.submitApp to waitForState on 
RMAppAttempt. That might address other test cases that use MockRM.submitApp.
I recently saw some other similar test failure e.g. YARN-2483.  maybe this is 
what we should do.  could you also run all tests locally, in case we don't 
introduce regression failure? thx 

 Flaky test TestAMRMRPCNodeUpdates
 -

 Key: YARN-2649
 URL: https://issues.apache.org/jira/browse/YARN-2649
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Ming Ma
 Attachments: YARN-2649.patch


 Sometimes the test fails with the following error:
 testAMRMUnusableNodes(org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates)
   Time elapsed: 41.73 sec   FAILURE!
 junit.framework.AssertionFailedError: AppAttempt state is not correct 
 (timedout) expected:ALLOCATED but was:SCHEDULED
   at junit.framework.Assert.fail(Assert.java:50)
   at junit.framework.Assert.failNotEquals(Assert.java:287)
   at junit.framework.Assert.assertEquals(Assert.java:67)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.MockAM.waitForState(MockAM.java:82)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.MockRM.sendAMLaunched(MockRM.java:382)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates.testAMRMUnusableNodes(TestAMRMRPCNodeUpdates.java:125)
 When this happens, SchedulerEventType.NODE_UPDATE was processed before 
 RMAppAttemptEvent.ATTEMPT_ADDED was processed. That is possible, given the 
 test only waits for RMAppState.ACCEPTED before having NM sending heartbeat. 
 This can be reproduced using custom AsyncDispatcher with CountDownLatch. Here 
 is the log when this happens.
 {noformat}
 App State is : ACCEPTED
 2014-10-05 21:25:07,305 INFO  [AsyncDispatcher event handler] 
 attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(670)) - 
 appattempt_1412569506932_0001_01 State change from NEW to SUBMITTED
 2014-10-05 21:25:07,305 DEBUG [AsyncDispatcher event handler] 
 event.AsyncDispatcher (AsyncDispatcher.java:dispatch(164)) - Dispatching the 
 event 
 org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeStatusEvent.EventType:
  STATUS_UPDATE
 2014-10-05 21:25:07,305 DEBUG [AsyncDispatcher event handler] 
 rmnode.RMNodeImpl (RMNodeImpl.java:handle(384)) - Processing 127.0.0.1:1234 
 of type STATUS_UPDATE
 AppAttempt : appattempt_1412569506932_0001_01 State is : SUBMITTED 
 Waiting for state : ALLOCATED
 2014-10-05 21:25:07,306 DEBUG [AsyncDispatcher event handler] 
 event.AsyncDispatcher (AsyncDispatcher.java:dispatch(164)) - Dispatching the 
 event 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.AppAttemptAddedSchedulerEvent.EventType:
  APP_ATTEMPT_ADDED
 2014-10-05 21:25:07,328 DEBUG [AsyncDispatcher event handler] 
 event.AsyncDispatcher (AsyncDispatcher.java:dispatch(164)) - Dispatching the 
 event 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.NodeUpdateSchedulerEvent.EventType:
  NODE_UPDATE
 2014-10-05 21:25:07,330 DEBUG [AsyncDispatcher event handler] 
 event.AsyncDispatcher (AsyncDispatcher.java:dispatch(164)) - Dispatching the 
 event 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptEvent.EventType:
  ATTEMPT_ADDED
 2014-10-05 21:25:07,331 DEBUG [AsyncDispatcher event handler] 
 attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(658)) - Processing 
 event for appattempt_1412569506932_0001_000
 001 of type ATTEMPT_ADDED
 2014-10-05 21:25:07,333 INFO  [AsyncDispatcher event handler] 
 attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(670)) - 
 appattempt_1412569506932_0001_01 State change from SUBMITTED to SCHEDULED
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2641) improve node decommission latency in RM.

2014-10-06 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-2641:

Attachment: YARN-2641.000.patch

 improve node decommission latency in RM.
 

 Key: YARN-2641
 URL: https://issues.apache.org/jira/browse/YARN-2641
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.5.0
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: YARN-2641.000.patch


 improve node decommission latency in RM. 
 Currently the node decommission only happened after RM received nodeHeartbeat 
 from the Node Manager. The node heartbeat interval is configurable. The 
 default value is 1 second.
 It will be better to do the decommission during RM Refresh(NodesListManager) 
 instead of nodeHeartbeat(ResourceTrackerService).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2583) Modify the LogDeletionService to support Log aggregation for LRS

2014-10-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14161234#comment-14161234
 ] 

Hadoop QA commented on YARN-2583:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673234/YARN-2583.3.patch
  against trunk revision 519e5a7.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 1268 javac 
compiler warnings (more than the trunk's current 1267 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5289//testReport/
Javac warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5289//artifact/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5289//console

This message is automatically generated.

 Modify the LogDeletionService to support Log aggregation for LRS
 

 Key: YARN-2583
 URL: https://issues.apache.org/jira/browse/YARN-2583
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-2583.1.patch, YARN-2583.2.patch, YARN-2583.3.patch


 Currently, AggregatedLogDeletionService will delete old logs from HDFS. It 
 will check the cut-off-time, if all logs for this application is older than 
 this cut-off-time. The app-log-dir from HDFS will be deleted. This will not 
 work for LRS. We expect a LRS application can keep running for a long time. 
 Two different scenarios: 
 1) If we configured the rollingIntervalSeconds, the new log file will be 
 always uploaded to HDFS. The number of log files for this application will 
 become larger and larger. And there is no log files will be deleted.
 2) If we did not configure the rollingIntervalSeconds, the log file can only 
 be uploaded to HDFS after the application is finished. It is very possible 
 that the logs are uploaded after the cut-off-time. It will cause problem 
 because at that time the app-log-dir for this application in HDFS has been 
 deleted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2583) Modify the LogDeletionService to support Log aggregation for LRS

2014-10-06 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-2583:

Attachment: YARN-2583.3.1.patch

 Modify the LogDeletionService to support Log aggregation for LRS
 

 Key: YARN-2583
 URL: https://issues.apache.org/jira/browse/YARN-2583
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-2583.1.patch, YARN-2583.2.patch, 
 YARN-2583.3.1.patch, YARN-2583.3.patch


 Currently, AggregatedLogDeletionService will delete old logs from HDFS. It 
 will check the cut-off-time, if all logs for this application is older than 
 this cut-off-time. The app-log-dir from HDFS will be deleted. This will not 
 work for LRS. We expect a LRS application can keep running for a long time. 
 Two different scenarios: 
 1) If we configured the rollingIntervalSeconds, the new log file will be 
 always uploaded to HDFS. The number of log files for this application will 
 become larger and larger. And there is no log files will be deleted.
 2) If we did not configure the rollingIntervalSeconds, the log file can only 
 be uploaded to HDFS after the application is finished. It is very possible 
 that the logs are uploaded after the cut-off-time. It will cause problem 
 because at that time the app-log-dir for this application in HDFS has been 
 deleted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2583) Modify the LogDeletionService to support Log aggregation for LRS

2014-10-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14161238#comment-14161238
 ] 

Hadoop QA commented on YARN-2583:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673234/YARN-2583.3.patch
  against trunk revision 519e5a7.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 1268 javac 
compiler warnings (more than the trunk's current 1267 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5290//testReport/
Javac warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5290//artifact/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5290//console

This message is automatically generated.

 Modify the LogDeletionService to support Log aggregation for LRS
 

 Key: YARN-2583
 URL: https://issues.apache.org/jira/browse/YARN-2583
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-2583.1.patch, YARN-2583.2.patch, 
 YARN-2583.3.1.patch, YARN-2583.3.patch


 Currently, AggregatedLogDeletionService will delete old logs from HDFS. It 
 will check the cut-off-time, if all logs for this application is older than 
 this cut-off-time. The app-log-dir from HDFS will be deleted. This will not 
 work for LRS. We expect a LRS application can keep running for a long time. 
 Two different scenarios: 
 1) If we configured the rollingIntervalSeconds, the new log file will be 
 always uploaded to HDFS. The number of log files for this application will 
 become larger and larger. And there is no log files will be deleted.
 2) If we did not configure the rollingIntervalSeconds, the log file can only 
 be uploaded to HDFS after the application is finished. It is very possible 
 that the logs are uploaded after the cut-off-time. It will cause problem 
 because at that time the app-log-dir for this application in HDFS has been 
 deleted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol

2014-10-06 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14161241#comment-14161241
 ] 

Tsuyoshi OZAWA commented on YARN-1879:
--

For now, I have no idea to reconstruct same response after failover. Currently 
latest patch only return empty response. This is one discussion point of this 
design.

 Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol
 ---

 Key: YARN-1879
 URL: https://issues.apache.org/jira/browse/YARN-1879
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Jian He
Assignee: Tsuyoshi OZAWA
Priority: Critical
 Attachments: YARN-1879.1.patch, YARN-1879.1.patch, 
 YARN-1879.11.patch, YARN-1879.12.patch, YARN-1879.13.patch, 
 YARN-1879.14.patch, YARN-1879.15.patch, YARN-1879.16.patch, 
 YARN-1879.17.patch, YARN-1879.18.patch, YARN-1879.19.patch, 
 YARN-1879.2-wip.patch, YARN-1879.2.patch, YARN-1879.20.patch, 
 YARN-1879.21.patch, YARN-1879.22.patch, YARN-1879.3.patch, YARN-1879.4.patch, 
 YARN-1879.5.patch, YARN-1879.6.patch, YARN-1879.7.patch, YARN-1879.8.patch, 
 YARN-1879.9.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2629) Make distributed shell use the domain-based timeline ACLs

2014-10-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14161246#comment-14161246
 ] 

Hadoop QA commented on YARN-2629:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673235/YARN-2629.3.patch
  against trunk revision 519e5a7.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5291//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5291//console

This message is automatically generated.

 Make distributed shell use the domain-based timeline ACLs
 -

 Key: YARN-2629
 URL: https://issues.apache.org/jira/browse/YARN-2629
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: YARN-2629.1.patch, YARN-2629.2.patch, YARN-2629.3.patch


 For demonstration the usage of this feature (YARN-2102), it's good to make 
 the distributed shell create the domain, and post its timeline entities into 
 this private space.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2649) Flaky test TestAMRMRPCNodeUpdates

2014-10-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14161262#comment-14161262
 ] 

Hadoop QA commented on YARN-2649:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673231/YARN-2649.patch
  against trunk revision 519e5a7.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5288//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5288//console

This message is automatically generated.

 Flaky test TestAMRMRPCNodeUpdates
 -

 Key: YARN-2649
 URL: https://issues.apache.org/jira/browse/YARN-2649
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Ming Ma
 Attachments: YARN-2649.patch


 Sometimes the test fails with the following error:
 testAMRMUnusableNodes(org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates)
   Time elapsed: 41.73 sec   FAILURE!
 junit.framework.AssertionFailedError: AppAttempt state is not correct 
 (timedout) expected:ALLOCATED but was:SCHEDULED
   at junit.framework.Assert.fail(Assert.java:50)
   at junit.framework.Assert.failNotEquals(Assert.java:287)
   at junit.framework.Assert.assertEquals(Assert.java:67)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.MockAM.waitForState(MockAM.java:82)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.MockRM.sendAMLaunched(MockRM.java:382)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates.testAMRMUnusableNodes(TestAMRMRPCNodeUpdates.java:125)
 When this happens, SchedulerEventType.NODE_UPDATE was processed before 
 RMAppAttemptEvent.ATTEMPT_ADDED was processed. That is possible, given the 
 test only waits for RMAppState.ACCEPTED before having NM sending heartbeat. 
 This can be reproduced using custom AsyncDispatcher with CountDownLatch. Here 
 is the log when this happens.
 {noformat}
 App State is : ACCEPTED
 2014-10-05 21:25:07,305 INFO  [AsyncDispatcher event handler] 
 attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(670)) - 
 appattempt_1412569506932_0001_01 State change from NEW to SUBMITTED
 2014-10-05 21:25:07,305 DEBUG [AsyncDispatcher event handler] 
 event.AsyncDispatcher (AsyncDispatcher.java:dispatch(164)) - Dispatching the 
 event 
 org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeStatusEvent.EventType:
  STATUS_UPDATE
 2014-10-05 21:25:07,305 DEBUG [AsyncDispatcher event handler] 
 rmnode.RMNodeImpl (RMNodeImpl.java:handle(384)) - Processing 127.0.0.1:1234 
 of type STATUS_UPDATE
 AppAttempt : appattempt_1412569506932_0001_01 State is : SUBMITTED 
 Waiting for state : ALLOCATED
 2014-10-05 21:25:07,306 DEBUG [AsyncDispatcher event handler] 
 event.AsyncDispatcher (AsyncDispatcher.java:dispatch(164)) - Dispatching the 
 event 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.AppAttemptAddedSchedulerEvent.EventType:
  APP_ATTEMPT_ADDED
 2014-10-05 21:25:07,328 DEBUG [AsyncDispatcher event handler] 
 event.AsyncDispatcher (AsyncDispatcher.java:dispatch(164)) - Dispatching the 
 event 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.NodeUpdateSchedulerEvent.EventType:
  NODE_UPDATE
 2014-10-05 21:25:07,330 DEBUG [AsyncDispatcher event handler] 
 event.AsyncDispatcher (AsyncDispatcher.java:dispatch(164)) - Dispatching the 
 event 
 org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptEvent.EventType:
  ATTEMPT_ADDED
 2014-10-05 21:25:07,331 DEBUG [AsyncDispatcher event handler] 
 attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(658)) - Processing 
 event for appattempt_1412569506932_0001_000
 001 of type ATTEMPT_ADDED
 2014-10-05 21:25:07,333 INFO  [AsyncDispatcher event handler] 
 attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(670)) - 
 

[jira] [Commented] (YARN-2583) Modify the LogDeletionService to support Log aggregation for LRS

2014-10-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14161278#comment-14161278
 ] 

Hadoop QA commented on YARN-2583:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673242/YARN-2583.3.1.patch
  against trunk revision 519e5a7.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:

  
org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater
  
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch

  The following test timeouts occurred in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:

org.apache.hadoop.yarn.server.nodemanager.securTests

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5294//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5294//console

This message is automatically generated.

 Modify the LogDeletionService to support Log aggregation for LRS
 

 Key: YARN-2583
 URL: https://issues.apache.org/jira/browse/YARN-2583
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-2583.1.patch, YARN-2583.2.patch, 
 YARN-2583.3.1.patch, YARN-2583.3.patch


 Currently, AggregatedLogDeletionService will delete old logs from HDFS. It 
 will check the cut-off-time, if all logs for this application is older than 
 this cut-off-time. The app-log-dir from HDFS will be deleted. This will not 
 work for LRS. We expect a LRS application can keep running for a long time. 
 Two different scenarios: 
 1) If we configured the rollingIntervalSeconds, the new log file will be 
 always uploaded to HDFS. The number of log files for this application will 
 become larger and larger. And there is no log files will be deleted.
 2) If we did not configure the rollingIntervalSeconds, the log file can only 
 be uploaded to HDFS after the application is finished. It is very possible 
 that the logs are uploaded after the cut-off-time. It will cause problem 
 because at that time the app-log-dir for this application in HDFS has been 
 deleted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2583) Modify the LogDeletionService to support Log aggregation for LRS

2014-10-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14161279#comment-14161279
 ] 

Hadoop QA commented on YARN-2583:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673242/YARN-2583.3.1.patch
  against trunk revision 519e5a7.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:

  
org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater
  
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService
  
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch

  The following test timeouts occurred in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:

org.apache.hadoop.yarn.server.nodemanager.securTests

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5295//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5295//console

This message is automatically generated.

 Modify the LogDeletionService to support Log aggregation for LRS
 

 Key: YARN-2583
 URL: https://issues.apache.org/jira/browse/YARN-2583
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-2583.1.patch, YARN-2583.2.patch, 
 YARN-2583.3.1.patch, YARN-2583.3.patch


 Currently, AggregatedLogDeletionService will delete old logs from HDFS. It 
 will check the cut-off-time, if all logs for this application is older than 
 this cut-off-time. The app-log-dir from HDFS will be deleted. This will not 
 work for LRS. We expect a LRS application can keep running for a long time. 
 Two different scenarios: 
 1) If we configured the rollingIntervalSeconds, the new log file will be 
 always uploaded to HDFS. The number of log files for this application will 
 become larger and larger. And there is no log files will be deleted.
 2) If we did not configure the rollingIntervalSeconds, the log file can only 
 be uploaded to HDFS after the application is finished. It is very possible 
 that the logs are uploaded after the cut-off-time. It will cause problem 
 because at that time the app-log-dir for this application in HDFS has been 
 deleted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol

2014-10-06 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14161280#comment-14161280
 ] 

Jian He commented on YARN-1879:
---

I think we are mixing two issues in this jira:
1. Mark annotation on protocol for failover. (RM work-preserving failover won’t 
work without proper protocol annotations. RetryCache won’t help in this 
scenario, as the cache simply gets cleaned-up after failover/restart)
2. Change the API to return same response for duplicate requests.
I propose let’s do 1) first which is what really affects work-preserving RM 
failover, and do 2) separately.

 Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol
 ---

 Key: YARN-1879
 URL: https://issues.apache.org/jira/browse/YARN-1879
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Jian He
Assignee: Tsuyoshi OZAWA
Priority: Critical
 Attachments: YARN-1879.1.patch, YARN-1879.1.patch, 
 YARN-1879.11.patch, YARN-1879.12.patch, YARN-1879.13.patch, 
 YARN-1879.14.patch, YARN-1879.15.patch, YARN-1879.16.patch, 
 YARN-1879.17.patch, YARN-1879.18.patch, YARN-1879.19.patch, 
 YARN-1879.2-wip.patch, YARN-1879.2.patch, YARN-1879.20.patch, 
 YARN-1879.21.patch, YARN-1879.22.patch, YARN-1879.3.patch, YARN-1879.4.patch, 
 YARN-1879.5.patch, YARN-1879.6.patch, YARN-1879.7.patch, YARN-1879.8.patch, 
 YARN-1879.9.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2583) Modify the LogDeletionService to support Log aggregation for LRS

2014-10-06 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14161282#comment-14161282
 ] 

Xuan Gong commented on YARN-2583:
-

Those testcases fail because of binding exception. I do not think they are 
related.

 Modify the LogDeletionService to support Log aggregation for LRS
 

 Key: YARN-2583
 URL: https://issues.apache.org/jira/browse/YARN-2583
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
 Attachments: YARN-2583.1.patch, YARN-2583.2.patch, 
 YARN-2583.3.1.patch, YARN-2583.3.patch


 Currently, AggregatedLogDeletionService will delete old logs from HDFS. It 
 will check the cut-off-time, if all logs for this application is older than 
 this cut-off-time. The app-log-dir from HDFS will be deleted. This will not 
 work for LRS. We expect a LRS application can keep running for a long time. 
 Two different scenarios: 
 1) If we configured the rollingIntervalSeconds, the new log file will be 
 always uploaded to HDFS. The number of log files for this application will 
 become larger and larger. And there is no log files will be deleted.
 2) If we did not configure the rollingIntervalSeconds, the log file can only 
 be uploaded to HDFS after the application is finished. It is very possible 
 that the logs are uploaded after the cut-off-time. It will cause problem 
 because at that time the app-log-dir for this application in HDFS has been 
 deleted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol for RM fail over

2014-10-06 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-1879:
--
Summary: Mark Idempotent/AtMostOnce annotations to 
ApplicationMasterProtocol for RM fail over  (was: Mark Idempotent/AtMostOnce 
annotations to ApplicationMasterProtocol)

 Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol for RM 
 fail over
 

 Key: YARN-1879
 URL: https://issues.apache.org/jira/browse/YARN-1879
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Jian He
Assignee: Tsuyoshi OZAWA
Priority: Critical
 Attachments: YARN-1879.1.patch, YARN-1879.1.patch, 
 YARN-1879.11.patch, YARN-1879.12.patch, YARN-1879.13.patch, 
 YARN-1879.14.patch, YARN-1879.15.patch, YARN-1879.16.patch, 
 YARN-1879.17.patch, YARN-1879.18.patch, YARN-1879.19.patch, 
 YARN-1879.2-wip.patch, YARN-1879.2.patch, YARN-1879.20.patch, 
 YARN-1879.21.patch, YARN-1879.22.patch, YARN-1879.3.patch, YARN-1879.4.patch, 
 YARN-1879.5.patch, YARN-1879.6.patch, YARN-1879.7.patch, YARN-1879.8.patch, 
 YARN-1879.9.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running

2014-10-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14161300#comment-14161300
 ] 

Hadoop QA commented on YARN-1857:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673238/YARN-1857.6.patch
  against trunk revision 519e5a7.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5292//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5292//console

This message is automatically generated.

 CapacityScheduler headroom doesn't account for other AM's running
 -

 Key: YARN-1857
 URL: https://issues.apache.org/jira/browse/YARN-1857
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Affects Versions: 2.3.0
Reporter: Thomas Graves
Assignee: Chen He
Priority: Critical
 Attachments: YARN-1857.1.patch, YARN-1857.2.patch, YARN-1857.3.patch, 
 YARN-1857.4.patch, YARN-1857.5.patch, YARN-1857.6.patch, YARN-1857.patch, 
 YARN-1857.patch, YARN-1857.patch


 Its possible to get an application to hang forever (or a long time) in a 
 cluster with multiple users.  The reason why is that the headroom sent to the 
 application is based on the user limit but it doesn't account for other 
 Application masters using space in that queue.  So the headroom (user limit - 
 user consumed) can be  0 even though the cluster is 100% full because the 
 other space is being used by application masters from other users.  
 For instance if you have a cluster with 1 queue, user limit is 100%, you have 
 multiple users submitting applications.  One very large application by user 1 
 starts up, runs most of its maps and starts running reducers. other users try 
 to start applications and get their application masters started but not 
 tasks.  The very large application then gets to the point where it has 
 consumed the rest of the cluster resources with all reduces.  But at this 
 point it needs to still finish a few maps.  The headroom being sent to this 
 application is only based on the user limit (which is 100% of the cluster 
 capacity) its using lets say 95% of the cluster for reduces and then other 5% 
 is being used by other users running application masters.  The MRAppMaster 
 thinks it still has 5% so it doesn't know that it should kill a reduce in 
 order to run a map.  
 This can happen in other scenarios also.  Generally in a large cluster with 
 multiple queues this shouldn't cause a hang forever but it could cause the 
 application to take much longer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2641) improve node decommission latency in RM.

2014-10-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14161299#comment-14161299
 ] 

Hadoop QA commented on YARN-2641:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673239/YARN-2641.000.patch
  against trunk revision 519e5a7.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5293//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5293//console

This message is automatically generated.

 improve node decommission latency in RM.
 

 Key: YARN-2641
 URL: https://issues.apache.org/jira/browse/YARN-2641
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.5.0
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: YARN-2641.000.patch


 improve node decommission latency in RM. 
 Currently the node decommission only happened after RM received nodeHeartbeat 
 from the Node Manager. The node heartbeat interval is configurable. The 
 default value is 1 second.
 It will be better to do the decommission during RM Refresh(NodesListManager) 
 instead of nodeHeartbeat(ResourceTrackerService).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-913) Add a way to register long-lived services in a YARN cluster

2014-10-06 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14161353#comment-14161353
 ] 

Sanjay Radia commented on YARN-913:
---

Some feedback:
# rename {{RegistryOperations.create()}} to {{bind()}}
# rename {{org/apache/hadoop/yarn/registry/client/services}} to
{{org/apache/hadoop/yarn/registry/client/impl}}
# move all ZK classes under
{{org/apache/hadoop/yarn/registry/client/impl/zk}}, i.e. the current
implementations of the registry client
# {{RegistryOperations}} implementations to remove declaration of
exceptions other than IOE.
# {{RegistryOperations.resolve()} implementation should not mention
record headers in exception text: that's an implementation detail
# Add README under {{org.apache.hadoop.yarn.registry.server}} to
emphasize this is server-side code
# Allow {{ServiceRecord}} to support arbitrary key-values
# remove {{yarn_id}}  {{yarn_persistence}} ffields rom
{{ServiceRecord}}, moving them to the set of arbitrary key-values This
ensures that there isn't explicit hard-coding of the assumption these
are YARN apps from the records.

 Add a way to register long-lived services in a YARN cluster
 ---

 Key: YARN-913
 URL: https://issues.apache.org/jira/browse/YARN-913
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: api, resourcemanager
Affects Versions: 2.5.0, 2.4.1
Reporter: Steve Loughran
Assignee: Steve Loughran
 Attachments: 2014-09-03_Proposed_YARN_Service_Registry.pdf, 
 2014-09-08_YARN_Service_Registry.pdf, RegistrationServiceDetails.txt, 
 YARN-913-001.patch, YARN-913-002.patch, YARN-913-003.patch, 
 YARN-913-003.patch, YARN-913-004.patch, YARN-913-006.patch, 
 YARN-913-007.patch, YARN-913-008.patch, YARN-913-009.patch, 
 YARN-913-010.patch, YARN-913-011.patch, YARN-913-012.patch, 
 YARN-913-013.patch, YARN-913-014.patch, YARN-913-015.patch, 
 YARN-913-016.patch, YARN-913-017.patch, YARN-913-018.patch, yarnregistry.pdf, 
 yarnregistry.pdf, yarnregistry.tla


 In a YARN cluster you can't predict where services will come up -or on what 
 ports. The services need to work those things out as they come up and then 
 publish them somewhere.
 Applications need to be able to find the service instance they are to bond to 
 -and not any others in the cluster.
 Some kind of service registry -in the RM, in ZK, could do this. If the RM 
 held the write access to the ZK nodes, it would be more secure than having 
 apps register with ZK themselves.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol for RM fail over

2014-10-06 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14161368#comment-14161368
 ] 

Karthik Kambatla commented on YARN-1879:


Thanks for looking at it closely, Jian and Xuan. I have missed some of these 
points. Spent a little more time thinking through.

bq. I think we are mixing two issues in this jira
When we mark an API Idempotent or AtMostOnce, the retry-policies will end up 
re-invoking the API on the other RM in case of a failover. This is okay only if 
the RM handles these duplicate requests. Further, my understanding is that the 
behavior of Idempotent APIs should be the same on each invocation; i.e., the 
client should receive the exact same response too. 

If we handle duplicate requests but return a different response to the client 
on duplicate calls, we can mark it AtMostOnce. If we return the same response, 
we can go ahead and mark it Idempotent. Needless to say, the RM should 
definitely duplicate requests gracefully. Does that sound reasonable? 

 Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol for RM 
 fail over
 

 Key: YARN-1879
 URL: https://issues.apache.org/jira/browse/YARN-1879
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Jian He
Assignee: Tsuyoshi OZAWA
Priority: Critical
 Attachments: YARN-1879.1.patch, YARN-1879.1.patch, 
 YARN-1879.11.patch, YARN-1879.12.patch, YARN-1879.13.patch, 
 YARN-1879.14.patch, YARN-1879.15.patch, YARN-1879.16.patch, 
 YARN-1879.17.patch, YARN-1879.18.patch, YARN-1879.19.patch, 
 YARN-1879.2-wip.patch, YARN-1879.2.patch, YARN-1879.20.patch, 
 YARN-1879.21.patch, YARN-1879.22.patch, YARN-1879.3.patch, YARN-1879.4.patch, 
 YARN-1879.5.patch, YARN-1879.6.patch, YARN-1879.7.patch, YARN-1879.8.patch, 
 YARN-1879.9.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running

2014-10-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14161373#comment-14161373
 ] 

Hadoop QA commented on YARN-1857:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673238/YARN-1857.6.patch
  against trunk revision 519e5a7.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5296//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5296//console

This message is automatically generated.

 CapacityScheduler headroom doesn't account for other AM's running
 -

 Key: YARN-1857
 URL: https://issues.apache.org/jira/browse/YARN-1857
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Affects Versions: 2.3.0
Reporter: Thomas Graves
Assignee: Chen He
Priority: Critical
 Attachments: YARN-1857.1.patch, YARN-1857.2.patch, YARN-1857.3.patch, 
 YARN-1857.4.patch, YARN-1857.5.patch, YARN-1857.6.patch, YARN-1857.patch, 
 YARN-1857.patch, YARN-1857.patch


 Its possible to get an application to hang forever (or a long time) in a 
 cluster with multiple users.  The reason why is that the headroom sent to the 
 application is based on the user limit but it doesn't account for other 
 Application masters using space in that queue.  So the headroom (user limit - 
 user consumed) can be  0 even though the cluster is 100% full because the 
 other space is being used by application masters from other users.  
 For instance if you have a cluster with 1 queue, user limit is 100%, you have 
 multiple users submitting applications.  One very large application by user 1 
 starts up, runs most of its maps and starts running reducers. other users try 
 to start applications and get their application masters started but not 
 tasks.  The very large application then gets to the point where it has 
 consumed the rest of the cluster resources with all reduces.  But at this 
 point it needs to still finish a few maps.  The headroom being sent to this 
 application is only based on the user limit (which is 100% of the cluster 
 capacity) its using lets say 95% of the cluster for reduces and then other 5% 
 is being used by other users running application masters.  The MRAppMaster 
 thinks it still has 5% so it doesn't know that it should kill a reduce in 
 order to run a map.  
 This can happen in other scenarios also.  Generally in a large cluster with 
 multiple queues this shouldn't cause a hang forever but it could cause the 
 application to take much longer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2647) [YARN-796] Add yarn queue CLI to get queue info including labels of such queue

2014-10-06 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14161395#comment-14161395
 ] 

Sunil G commented on YARN-2647:
---

Hi [~gp.leftnoteasy], I would like to take this up. 
Thank you.

 [YARN-796] Add yarn queue CLI to get queue info including labels of such queue
 --

 Key: YARN-2647
 URL: https://issues.apache.org/jira/browse/YARN-2647
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client
Reporter: Wangda Tan





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-2647) [YARN-796] Add yarn queue CLI to get queue info including labels of such queue

2014-10-06 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G reassigned YARN-2647:
-

Assignee: Sunil G

 [YARN-796] Add yarn queue CLI to get queue info including labels of such queue
 --

 Key: YARN-2647
 URL: https://issues.apache.org/jira/browse/YARN-2647
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: client
Reporter: Wangda Tan
Assignee: Sunil G





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1879) Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol for RM fail over

2014-10-06 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14161396#comment-14161396
 ] 

Jian He commented on YARN-1879:
---

bq. This is okay only if the RM handles these duplicate requests.
In RM failover, even if the request  from client’s perspective is duplicate; 
from RM’s perspective, these are just new requests, as the new RM doesn’t have 
any cache for previous requests from client.  Just to unblock this, I suggest 
marking the annotation now so that the operation can be retried in failover.  
And discuss the internal implementation separately.

 Mark Idempotent/AtMostOnce annotations to ApplicationMasterProtocol for RM 
 fail over
 

 Key: YARN-1879
 URL: https://issues.apache.org/jira/browse/YARN-1879
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Jian He
Assignee: Tsuyoshi OZAWA
Priority: Critical
 Attachments: YARN-1879.1.patch, YARN-1879.1.patch, 
 YARN-1879.11.patch, YARN-1879.12.patch, YARN-1879.13.patch, 
 YARN-1879.14.patch, YARN-1879.15.patch, YARN-1879.16.patch, 
 YARN-1879.17.patch, YARN-1879.18.patch, YARN-1879.19.patch, 
 YARN-1879.2-wip.patch, YARN-1879.2.patch, YARN-1879.20.patch, 
 YARN-1879.21.patch, YARN-1879.22.patch, YARN-1879.3.patch, YARN-1879.4.patch, 
 YARN-1879.5.patch, YARN-1879.6.patch, YARN-1879.7.patch, YARN-1879.8.patch, 
 YARN-1879.9.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2496) [YARN-796] Changes for capacity scheduler to support allocate resource respect labels

2014-10-06 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-2496:
-
Attachment: YARN-2496.patch

 [YARN-796] Changes for capacity scheduler to support allocate resource 
 respect labels
 -

 Key: YARN-2496
 URL: https://issues.apache.org/jira/browse/YARN-2496
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-2496.patch, YARN-2496.patch, YARN-2496.patch, 
 YARN-2496.patch, YARN-2496.patch, YARN-2496.patch, YARN-2496.patch


 This JIRA Includes:
 - Add/parse labels option to {{capacity-scheduler.xml}} similar to other 
 options of queue like capacity/maximum-capacity, etc.
 - Include a default-label-expression option in queue config, if an app 
 doesn't specify label-expression, default-label-expression of queue will be 
 used.
 - Check if labels can be accessed by the queue when submit an app with 
 labels-expression to queue or update ResourceRequest with label-expression
 - Check labels on NM when trying to allocate ResourceRequest on the NM with 
 label-expression
 - Respect  labels when calculate headroom/user-limit



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-796) Allow for (admin) labels on nodes and resource-requests

2014-10-06 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-796:

Attachment: YARN-796.node-label.consolidate.13.patch

 Allow for (admin) labels on nodes and resource-requests
 ---

 Key: YARN-796
 URL: https://issues.apache.org/jira/browse/YARN-796
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.4.1
Reporter: Arun C Murthy
Assignee: Wangda Tan
 Attachments: LabelBasedScheduling.pdf, 
 Node-labels-Requirements-Design-doc-V1.pdf, 
 Node-labels-Requirements-Design-doc-V2.pdf, YARN-796-Diagram.pdf, 
 YARN-796.node-label.consolidate.1.patch, 
 YARN-796.node-label.consolidate.10.patch, 
 YARN-796.node-label.consolidate.11.patch, 
 YARN-796.node-label.consolidate.12.patch, 
 YARN-796.node-label.consolidate.13.patch, 
 YARN-796.node-label.consolidate.2.patch, 
 YARN-796.node-label.consolidate.3.patch, 
 YARN-796.node-label.consolidate.4.patch, 
 YARN-796.node-label.consolidate.5.patch, 
 YARN-796.node-label.consolidate.6.patch, 
 YARN-796.node-label.consolidate.7.patch, 
 YARN-796.node-label.consolidate.8.patch, YARN-796.node-label.demo.patch.1, 
 YARN-796.patch, YARN-796.patch4


 It will be useful for admins to specify labels for nodes. Examples of labels 
 are OS, processor architecture etc.
 We should expose these labels and allow applications to specify labels on 
 resource-requests.
 Obviously we need to support admin operations on adding/removing node labels.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests

2014-10-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14161403#comment-14161403
 ] 

Hadoop QA commented on YARN-796:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12673284/YARN-796.node-label.consolidate.13.patch
  against trunk revision 519e5a7.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5298//console

This message is automatically generated.

 Allow for (admin) labels on nodes and resource-requests
 ---

 Key: YARN-796
 URL: https://issues.apache.org/jira/browse/YARN-796
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.4.1
Reporter: Arun C Murthy
Assignee: Wangda Tan
 Attachments: LabelBasedScheduling.pdf, 
 Node-labels-Requirements-Design-doc-V1.pdf, 
 Node-labels-Requirements-Design-doc-V2.pdf, YARN-796-Diagram.pdf, 
 YARN-796.node-label.consolidate.1.patch, 
 YARN-796.node-label.consolidate.10.patch, 
 YARN-796.node-label.consolidate.11.patch, 
 YARN-796.node-label.consolidate.12.patch, 
 YARN-796.node-label.consolidate.13.patch, 
 YARN-796.node-label.consolidate.2.patch, 
 YARN-796.node-label.consolidate.3.patch, 
 YARN-796.node-label.consolidate.4.patch, 
 YARN-796.node-label.consolidate.5.patch, 
 YARN-796.node-label.consolidate.6.patch, 
 YARN-796.node-label.consolidate.7.patch, 
 YARN-796.node-label.consolidate.8.patch, YARN-796.node-label.demo.patch.1, 
 YARN-796.patch, YARN-796.patch4


 It will be useful for admins to specify labels for nodes. Examples of labels 
 are OS, processor architecture etc.
 We should expose these labels and allow applications to specify labels on 
 resource-requests.
 Obviously we need to support admin operations on adding/removing node labels.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2496) [YARN-796] Changes for capacity scheduler to support allocate resource respect labels

2014-10-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14161402#comment-14161402
 ] 

Hadoop QA commented on YARN-2496:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673283/YARN-2496.patch
  against trunk revision 519e5a7.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5297//console

This message is automatically generated.

 [YARN-796] Changes for capacity scheduler to support allocate resource 
 respect labels
 -

 Key: YARN-2496
 URL: https://issues.apache.org/jira/browse/YARN-2496
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-2496.patch, YARN-2496.patch, YARN-2496.patch, 
 YARN-2496.patch, YARN-2496.patch, YARN-2496.patch, YARN-2496.patch


 This JIRA Includes:
 - Add/parse labels option to {{capacity-scheduler.xml}} similar to other 
 options of queue like capacity/maximum-capacity, etc.
 - Include a default-label-expression option in queue config, if an app 
 doesn't specify label-expression, default-label-expression of queue will be 
 used.
 - Check if labels can be accessed by the queue when submit an app with 
 labels-expression to queue or update ResourceRequest with label-expression
 - Check labels on NM when trying to allocate ResourceRequest on the NM with 
 label-expression
 - Respect  labels when calculate headroom/user-limit



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-796) Allow for (admin) labels on nodes and resource-requests

2014-10-06 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-796:

Attachment: YARN-796.node-label.consolidate.13.patch

Updated to trunk

 Allow for (admin) labels on nodes and resource-requests
 ---

 Key: YARN-796
 URL: https://issues.apache.org/jira/browse/YARN-796
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.4.1
Reporter: Arun C Murthy
Assignee: Wangda Tan
 Attachments: LabelBasedScheduling.pdf, 
 Node-labels-Requirements-Design-doc-V1.pdf, 
 Node-labels-Requirements-Design-doc-V2.pdf, YARN-796-Diagram.pdf, 
 YARN-796.node-label.consolidate.1.patch, 
 YARN-796.node-label.consolidate.10.patch, 
 YARN-796.node-label.consolidate.11.patch, 
 YARN-796.node-label.consolidate.12.patch, 
 YARN-796.node-label.consolidate.2.patch, 
 YARN-796.node-label.consolidate.3.patch, 
 YARN-796.node-label.consolidate.4.patch, 
 YARN-796.node-label.consolidate.5.patch, 
 YARN-796.node-label.consolidate.6.patch, 
 YARN-796.node-label.consolidate.7.patch, 
 YARN-796.node-label.consolidate.8.patch, YARN-796.node-label.demo.patch.1, 
 YARN-796.patch, YARN-796.patch4


 It will be useful for admins to specify labels for nodes. Examples of labels 
 are OS, processor architecture etc.
 We should expose these labels and allow applications to specify labels on 
 resource-requests.
 Obviously we need to support admin operations on adding/removing node labels.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-796) Allow for (admin) labels on nodes and resource-requests

2014-10-06 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-796:

Attachment: (was: YARN-796.node-label.consolidate.13.patch)

 Allow for (admin) labels on nodes and resource-requests
 ---

 Key: YARN-796
 URL: https://issues.apache.org/jira/browse/YARN-796
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.4.1
Reporter: Arun C Murthy
Assignee: Wangda Tan
 Attachments: LabelBasedScheduling.pdf, 
 Node-labels-Requirements-Design-doc-V1.pdf, 
 Node-labels-Requirements-Design-doc-V2.pdf, YARN-796-Diagram.pdf, 
 YARN-796.node-label.consolidate.1.patch, 
 YARN-796.node-label.consolidate.10.patch, 
 YARN-796.node-label.consolidate.11.patch, 
 YARN-796.node-label.consolidate.12.patch, 
 YARN-796.node-label.consolidate.2.patch, 
 YARN-796.node-label.consolidate.3.patch, 
 YARN-796.node-label.consolidate.4.patch, 
 YARN-796.node-label.consolidate.5.patch, 
 YARN-796.node-label.consolidate.6.patch, 
 YARN-796.node-label.consolidate.7.patch, 
 YARN-796.node-label.consolidate.8.patch, YARN-796.node-label.demo.patch.1, 
 YARN-796.patch, YARN-796.patch4


 It will be useful for admins to specify labels for nodes. Examples of labels 
 are OS, processor architecture etc.
 We should expose these labels and allow applications to specify labels on 
 resource-requests.
 Obviously we need to support admin operations on adding/removing node labels.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-796) Allow for (admin) labels on nodes and resource-requests

2014-10-06 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-796:

Attachment: YARN-796.node-label.consolidate.13.patch

 Allow for (admin) labels on nodes and resource-requests
 ---

 Key: YARN-796
 URL: https://issues.apache.org/jira/browse/YARN-796
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.4.1
Reporter: Arun C Murthy
Assignee: Wangda Tan
 Attachments: LabelBasedScheduling.pdf, 
 Node-labels-Requirements-Design-doc-V1.pdf, 
 Node-labels-Requirements-Design-doc-V2.pdf, YARN-796-Diagram.pdf, 
 YARN-796.node-label.consolidate.1.patch, 
 YARN-796.node-label.consolidate.10.patch, 
 YARN-796.node-label.consolidate.11.patch, 
 YARN-796.node-label.consolidate.12.patch, 
 YARN-796.node-label.consolidate.13.patch, 
 YARN-796.node-label.consolidate.2.patch, 
 YARN-796.node-label.consolidate.3.patch, 
 YARN-796.node-label.consolidate.4.patch, 
 YARN-796.node-label.consolidate.5.patch, 
 YARN-796.node-label.consolidate.6.patch, 
 YARN-796.node-label.consolidate.7.patch, 
 YARN-796.node-label.consolidate.8.patch, YARN-796.node-label.demo.patch.1, 
 YARN-796.patch, YARN-796.patch4


 It will be useful for admins to specify labels for nodes. Examples of labels 
 are OS, processor architecture etc.
 We should expose these labels and allow applications to specify labels on 
 resource-requests.
 Obviously we need to support admin operations on adding/removing node labels.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-796) Allow for (admin) labels on nodes and resource-requests

2014-10-06 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-796:

Attachment: (was: YARN-796.node-label.consolidate.13.patch)

 Allow for (admin) labels on nodes and resource-requests
 ---

 Key: YARN-796
 URL: https://issues.apache.org/jira/browse/YARN-796
 Project: Hadoop YARN
  Issue Type: Sub-task
Affects Versions: 2.4.1
Reporter: Arun C Murthy
Assignee: Wangda Tan
 Attachments: LabelBasedScheduling.pdf, 
 Node-labels-Requirements-Design-doc-V1.pdf, 
 Node-labels-Requirements-Design-doc-V2.pdf, YARN-796-Diagram.pdf, 
 YARN-796.node-label.consolidate.1.patch, 
 YARN-796.node-label.consolidate.10.patch, 
 YARN-796.node-label.consolidate.11.patch, 
 YARN-796.node-label.consolidate.12.patch, 
 YARN-796.node-label.consolidate.2.patch, 
 YARN-796.node-label.consolidate.3.patch, 
 YARN-796.node-label.consolidate.4.patch, 
 YARN-796.node-label.consolidate.5.patch, 
 YARN-796.node-label.consolidate.6.patch, 
 YARN-796.node-label.consolidate.7.patch, 
 YARN-796.node-label.consolidate.8.patch, YARN-796.node-label.demo.patch.1, 
 YARN-796.patch, YARN-796.patch4


 It will be useful for admins to specify labels for nodes. Examples of labels 
 are OS, processor architecture etc.
 We should expose these labels and allow applications to specify labels on 
 resource-requests.
 Obviously we need to support admin operations on adding/removing node labels.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-913) Add a way to register long-lived services in a YARN cluster

2014-10-06 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14161465#comment-14161465
 ] 

Steve Loughran commented on YARN-913:
-

Sanjay, I can do most of these, 

* w.r.t. the README, we have a javadoc {{package-info.java}}, that's enough.
* I propose restricting the custom values that a service record to have to 
string attributes. support arbitrary JSON opens things up to people embedding 
entire custom JSON docs in there, which could kill the notion of having 
semi-standardised records that other apps can work with + published API 
endpoints for any extra stuff *outside the registry*
* I'm going to rename the yarn fields back to {{yarn:id}} and 
{{yarn:persistence}} if Jersey+jackson marshalls them reliably once they aren't 
introspection-driven. It makes the yarn-nature of them clearer.

 Add a way to register long-lived services in a YARN cluster
 ---

 Key: YARN-913
 URL: https://issues.apache.org/jira/browse/YARN-913
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: api, resourcemanager
Affects Versions: 2.5.0, 2.4.1
Reporter: Steve Loughran
Assignee: Steve Loughran
 Attachments: 2014-09-03_Proposed_YARN_Service_Registry.pdf, 
 2014-09-08_YARN_Service_Registry.pdf, RegistrationServiceDetails.txt, 
 YARN-913-001.patch, YARN-913-002.patch, YARN-913-003.patch, 
 YARN-913-003.patch, YARN-913-004.patch, YARN-913-006.patch, 
 YARN-913-007.patch, YARN-913-008.patch, YARN-913-009.patch, 
 YARN-913-010.patch, YARN-913-011.patch, YARN-913-012.patch, 
 YARN-913-013.patch, YARN-913-014.patch, YARN-913-015.patch, 
 YARN-913-016.patch, YARN-913-017.patch, YARN-913-018.patch, yarnregistry.pdf, 
 yarnregistry.pdf, yarnregistry.tla


 In a YARN cluster you can't predict where services will come up -or on what 
 ports. The services need to work those things out as they come up and then 
 publish them somewhere.
 Applications need to be able to find the service instance they are to bond to 
 -and not any others in the cluster.
 Some kind of service registry -in the RM, in ZK, could do this. If the RM 
 held the write access to the ZK nodes, it would be more secure than having 
 apps register with ZK themselves.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2056) Disable preemption at Queue level

2014-10-06 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14161487#comment-14161487
 ] 

Wangda Tan commented on YARN-2056:
--

Hi [~eepayne],
Sorry to response so late,

I've carefully read and thought about what you mentioned, especially the 
algorithm. I think it can resolve most issues, but it cannot guarante all case 
will be resolved. I think in your algorithm, following case will be not correct.

{code}
total = 100
qA: used = 10, guaranteed = 10, pending = 100
qB: used = 25, guaranteed = 10, pending = 100 (non-preemptable)
qC: used = 0, guaranteed = 80, pending = 0

1. In the start, unassign = 100, It will first exclude qB, unassigned = 80
2. It will try to fill qA, qA = qA + 80 * (10/(10 + 80) = 18, unassigned = 72. 
qC will be removed this turn
3. qB will not be added back here, becaused ideal_assign(qA) = 18, 
ideal_assign(qB) = 25.
4. All resource will be used by qA. The result should be
   ideal(qA) = 75, ideal(qB) = 25
{code}

And in addition, the remove-then-add-back algorithm seems not very 
straight-forward to me.

In my mind, this problem is like fulfilling water to a water tank like 
following, some of the tank has stones, make some of them higher than others. 
Because of water flows, so the result is most equilibrized (which water surface 
has same height, and some stone can be higher than water surface).
{code}
   _
  | |   
__|  X  |  
|X  |__   
|X|
|  X X   X|
|X X X   X|
---
 1 2 3 4 5
{code}

The algorithm may look like,
{code}
At the beginning, all queue will set ideal_assign = non-preemptable resource, 
and deduct non-preemptable resource from total-remained resource (stones is 
here). All queue will keep in qAlloc

In each turn,
- All queues not completely satisfied ideal_guaranteed = min(maximum_capacity, 
used + pending) will NOT be removed. Like what we have today. (water hasn't 
reached ceiling of the tank)
- Get normalized weight of each queue
- Get the queue with the minimum {ideal_assigned % guarantee}, say Q_min
- Get the target_height is = (Q_min.ideal_assigned + remained * 
Q_min.normalized_gurantee)
- For each queue, do TempQueue.offer like today
- The TempQueue.offer method looks like
  * If (q.ideal_assigned  target_height): skip
  * If (q.ideal_assigned = target_height): accepted = min(q.maximum, q.used + 
q.pending, target_height * q.guaranteed) - q.ideal_assigned
- If accepted becomes zero, remove the queue from qAlloc like today. 

The loop will exit until total-remained become zero (resource are exhausted) or 
qAlloc becomes empty (all queue get satisfied). 
{code}

I think this algorithm can get a more balanced result. Does this make sense to 
you?

Thanks,
Wangda

 Disable preemption at Queue level
 -

 Key: YARN-2056
 URL: https://issues.apache.org/jira/browse/YARN-2056
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Mayank Bansal
Assignee: Eric Payne
 Attachments: YARN-2056.201408202039.txt, YARN-2056.201408260128.txt, 
 YARN-2056.201408310117.txt, YARN-2056.201409022208.txt, 
 YARN-2056.201409181916.txt, YARN-2056.201409210049.txt, 
 YARN-2056.201409232329.txt, YARN-2056.201409242210.txt


 We need to be able to disable preemption at individual queue level



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2496) [YARN-796] Changes for capacity scheduler to support allocate resource respect labels

2014-10-06 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-2496:
-
Attachment: YARN-2496.patch

Attached new patch against latest trunk

 [YARN-796] Changes for capacity scheduler to support allocate resource 
 respect labels
 -

 Key: YARN-2496
 URL: https://issues.apache.org/jira/browse/YARN-2496
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
 Attachments: YARN-2496.patch, YARN-2496.patch, YARN-2496.patch, 
 YARN-2496.patch, YARN-2496.patch, YARN-2496.patch, YARN-2496.patch, 
 YARN-2496.patch


 This JIRA Includes:
 - Add/parse labels option to {{capacity-scheduler.xml}} similar to other 
 options of queue like capacity/maximum-capacity, etc.
 - Include a default-label-expression option in queue config, if an app 
 doesn't specify label-expression, default-label-expression of queue will be 
 used.
 - Check if labels can be accessed by the queue when submit an app with 
 labels-expression to queue or update ResourceRequest with label-expression
 - Check labels on NM when trying to allocate ResourceRequest on the NM with 
 label-expression
 - Respect  labels when calculate headroom/user-limit



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running

2014-10-06 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14161496#comment-14161496
 ] 

Jian He commented on YARN-1857:
---

I found given that  {{queueUsedResources = userConsumed}}, we can simplify the 
formula to {code} min (userlimit - userConsumed,   queueMaxCap- 
queueUsedResources) {code}, does this make sense ?

 CapacityScheduler headroom doesn't account for other AM's running
 -

 Key: YARN-1857
 URL: https://issues.apache.org/jira/browse/YARN-1857
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Affects Versions: 2.3.0
Reporter: Thomas Graves
Assignee: Chen He
Priority: Critical
 Attachments: YARN-1857.1.patch, YARN-1857.2.patch, YARN-1857.3.patch, 
 YARN-1857.4.patch, YARN-1857.5.patch, YARN-1857.6.patch, YARN-1857.patch, 
 YARN-1857.patch, YARN-1857.patch


 Its possible to get an application to hang forever (or a long time) in a 
 cluster with multiple users.  The reason why is that the headroom sent to the 
 application is based on the user limit but it doesn't account for other 
 Application masters using space in that queue.  So the headroom (user limit - 
 user consumed) can be  0 even though the cluster is 100% full because the 
 other space is being used by application masters from other users.  
 For instance if you have a cluster with 1 queue, user limit is 100%, you have 
 multiple users submitting applications.  One very large application by user 1 
 starts up, runs most of its maps and starts running reducers. other users try 
 to start applications and get their application masters started but not 
 tasks.  The very large application then gets to the point where it has 
 consumed the rest of the cluster resources with all reduces.  But at this 
 point it needs to still finish a few maps.  The headroom being sent to this 
 application is only based on the user limit (which is 100% of the cluster 
 capacity) its using lets say 95% of the cluster for reduces and then other 5% 
 is being used by other users running application masters.  The MRAppMaster 
 thinks it still has 5% so it doesn't know that it should kill a reduce in 
 order to run a map.  
 This can happen in other scenarios also.  Generally in a large cluster with 
 multiple queues this shouldn't cause a hang forever but it could cause the 
 application to take much longer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


  1   2   >