[jira] [Commented] (YARN-611) Add an AM retry count reset window to YARN RM
[ https://issues.apache.org/jira/browse/YARN-611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077443#comment-14077443 ] Hadoop QA commented on YARN-611: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12658363/YARN-611.4.rebase.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4466//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4466//console This message is automatically generated. > Add an AM retry count reset window to YARN RM > - > > Key: YARN-611 > URL: https://issues.apache.org/jira/browse/YARN-611 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.0.3-alpha >Reporter: Chris Riccomini >Assignee: Xuan Gong > Attachments: YARN-611.1.patch, YARN-611.2.patch, YARN-611.3.patch, > YARN-611.4.patch, YARN-611.4.rebase.patch > > > YARN currently has the following config: > yarn.resourcemanager.am.max-retries > This config defaults to 2, and defines how many times to retry a "failed" AM > before failing the whole YARN job. YARN counts an AM as failed if the node > that it was running on dies (the NM will timeout, which counts as a failure > for the AM), or if the AM dies. > This configuration is insufficient for long running (or infinitely running) > YARN jobs, since the machine (or NM) that the AM is running on will > eventually need to be restarted (or the machine/NM will fail). In such an > event, the AM has not done anything wrong, but this is counted as a "failure" > by the RM. Since the retry count for the AM is never reset, eventually, at > some point, the number of machine/NM failures will result in the AM failure > count going above the configured value for > yarn.resourcemanager.am.max-retries. Once this happens, the RM will mark the > job as failed, and shut it down. This behavior is not ideal. > I propose that we add a second configuration: > yarn.resourcemanager.am.retry-count-window-ms > This configuration would define a window of time that would define when an AM > is "well behaved", and it's safe to reset its failure count back to zero. > Every time an AM fails the RmAppImpl would check the last time that the AM > failed. If the last failure was less than retry-count-window-ms ago, and the > new failure count is > max-retries, then the job should fail. 
If the AM has > never failed, the retry count is < max-retries, or if the last failure was > OUTSIDE the retry-count-window-ms, then the job should be restarted. > Additionally, if the last failure was outside the retry-count-window-ms, then > the failure count should be set back to 0. > This would give developers a way to have well-behaved AMs run forever, while > still failing mis-behaving AMs after a short period of time. > I think the work to be done here is to change the RmAppImpl to actually look > at app.attempts, and see if there have been more than max-retries failures in > the last retry-count-window-ms milliseconds. If there have, then the job > should fail, if not, then the job should go forward. Additionally, we might > also need to add an endTime in either RMAppAttemptImpl or > RMAppFailedAttemptEvent, so that the RmAppImpl can check the time of the > failure. > Thoughts? -- This message was sent by Atlassian JIRA (v6.2#6252)
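The sliding-window logic proposed above can be summarized in a short sketch. This is an illustration only, assuming hypothetical names (attemptEndTimes, failureWindowMs, maxAttempts) rather than the actual RMAppImpl fields:
{code}
import java.util.List;

public class AmFailureWindow {
  /**
   * Illustration only: an attempt failure counts against the limit only if it
   * ended within the configured window. A non-positive window means "count
   * every failure", i.e. today's behavior, so the new config stays
   * backward-compatible.
   */
  public static boolean shouldFailApp(List<Long> attemptEndTimes,
      long failureWindowMs, int maxAttempts, long now) {
    int failuresInWindow = 0;
    for (long endTime : attemptEndTimes) {
      if (failureWindowMs <= 0 || now - endTime < failureWindowMs) {
        failuresInWindow++;
      }
    }
    return failuresInWindow >= maxAttempts;
  }
}
{code}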
[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests
[ https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077440#comment-14077440 ] Hadoop QA commented on YARN-796: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12658367/YARN-796.patch.1 against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4468//console This message is automatically generated. > Allow for (admin) labels on nodes and resource-requests > --- > > Key: YARN-796 > URL: https://issues.apache.org/jira/browse/YARN-796 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.4.1 >Reporter: Arun C Murthy >Assignee: Wangda Tan > Attachments: LabelBasedScheduling.pdf, > Node-labels-Requirements-Design-doc-V1.pdf, YARN-796.patch, YARN-796.patch.1 > > > It will be useful for admins to specify labels for nodes. Examples of labels > are OS, processor architecture etc. > We should expose these labels and allow applications to specify labels on > resource-requests. > Obviously we need to support admin operations on adding/removing node labels. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests
[ https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077435#comment-14077435 ] Hadoop QA commented on YARN-796: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12658367/YARN-796.patch.1 against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4467//console This message is automatically generated. > Allow for (admin) labels on nodes and resource-requests > --- > > Key: YARN-796 > URL: https://issues.apache.org/jira/browse/YARN-796 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.4.1 >Reporter: Arun C Murthy >Assignee: Wangda Tan > Attachments: LabelBasedScheduling.pdf, > Node-labels-Requirements-Design-doc-V1.pdf, YARN-796.patch, YARN-796.patch.1 > > > It will be useful for admins to specify labels for nodes. Examples of labels > are OS, processor architecture etc. > We should expose these labels and allow applications to specify labels on > resource-requests. > Obviously we need to support admin operations on adding/removing node labels. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-796) Allow for (admin) labels on nodes and resource-requests
[ https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuliya Feldman updated YARN-796: Labels: (was: patch) > Allow for (admin) labels on nodes and resource-requests > --- > > Key: YARN-796 > URL: https://issues.apache.org/jira/browse/YARN-796 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.4.1 >Reporter: Arun C Murthy >Assignee: Wangda Tan > Attachments: LabelBasedScheduling.pdf, > Node-labels-Requirements-Design-doc-V1.pdf, YARN-796.patch, YARN-796.patch.1 > > > It will be useful for admins to specify labels for nodes. Examples of labels > are OS, processor architecture etc. > We should expose these labels and allow applications to specify labels on > resource-requests. > Obviously we need to support admin operations on adding/removing node labels. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-796) Allow for (admin) labels on nodes and resource-requests
[ https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuliya Feldman updated YARN-796: Attachment: YARN-796.patch.1 First patch based on "LabelBasedScheduling" design document > Allow for (admin) labels on nodes and resource-requests > --- > > Key: YARN-796 > URL: https://issues.apache.org/jira/browse/YARN-796 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.4.1 >Reporter: Arun C Murthy >Assignee: Wangda Tan > Attachments: LabelBasedScheduling.pdf, > Node-labels-Requirements-Design-doc-V1.pdf, YARN-796.patch, YARN-796.patch.1 > > > It will be useful for admins to specify labels for nodes. Examples of labels > are OS, processor architecture etc. > We should expose these labels and allow applications to specify labels on > resource-requests. > Obviously we need to support admin operations on adding/removing node labels. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1994) Expose YARN/MR endpoints on multiple interfaces
[ https://issues.apache.org/jira/browse/YARN-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077416#comment-14077416 ] Xuan Gong commented on YARN-1994: - I think it is because connectAddress is needed for generating the nodeId. With this patch, we will bind the NM Server with the NM_BIND address. We need the real nm_address to generate the nodeId. [~cwelch] Could you confirm whether it is the reason ? > Expose YARN/MR endpoints on multiple interfaces > --- > > Key: YARN-1994 > URL: https://issues.apache.org/jira/browse/YARN-1994 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager, resourcemanager, webapp >Affects Versions: 2.4.0 >Reporter: Arpit Agarwal >Assignee: Craig Welch > Attachments: YARN-1994.0.patch, YARN-1994.1.patch, > YARN-1994.11.patch, YARN-1994.11.patch, YARN-1994.2.patch, YARN-1994.3.patch, > YARN-1994.4.patch, YARN-1994.5.patch, YARN-1994.6.patch, YARN-1994.7.patch > > > YARN and MapReduce daemons currently do not support specifying a wildcard > address for the server endpoints. This prevents the endpoints from being > accessible from all interfaces on a multihomed machine. > Note that if we do specify INADDR_ANY for any of the options, it will break > clients as they will attempt to connect to 0.0.0.0. We need a solution that > allows specifying a hostname or IP-address for clients while requesting > wildcard bind for the servers. > (List of endpoints is in a comment below) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-611) Add an AM retry count reset window to YARN RM
[ https://issues.apache.org/jira/browse/YARN-611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-611: --- Attachment: YARN-611.4.rebase.patch rebased on the latest trunk > Add an AM retry count reset window to YARN RM > - > > Key: YARN-611 > URL: https://issues.apache.org/jira/browse/YARN-611 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.0.3-alpha >Reporter: Chris Riccomini >Assignee: Xuan Gong > Attachments: YARN-611.1.patch, YARN-611.2.patch, YARN-611.3.patch, > YARN-611.4.patch, YARN-611.4.rebase.patch > > > YARN currently has the following config: > yarn.resourcemanager.am.max-retries > This config defaults to 2, and defines how many times to retry a "failed" AM > before failing the whole YARN job. YARN counts an AM as failed if the node > that it was running on dies (the NM will timeout, which counts as a failure > for the AM), or if the AM dies. > This configuration is insufficient for long running (or infinitely running) > YARN jobs, since the machine (or NM) that the AM is running on will > eventually need to be restarted (or the machine/NM will fail). In such an > event, the AM has not done anything wrong, but this is counted as a "failure" > by the RM. Since the retry count for the AM is never reset, eventually, at > some point, the number of machine/NM failures will result in the AM failure > count going above the configured value for > yarn.resourcemanager.am.max-retries. Once this happens, the RM will mark the > job as failed, and shut it down. This behavior is not ideal. > I propose that we add a second configuration: > yarn.resourcemanager.am.retry-count-window-ms > This configuration would define a window of time that would define when an AM > is "well behaved", and it's safe to reset its failure count back to zero. > Every time an AM fails the RmAppImpl would check the last time that the AM > failed. If the last failure was less than retry-count-window-ms ago, and the > new failure count is > max-retries, then the job should fail. If the AM has > never failed, the retry count is < max-retries, or if the last failure was > OUTSIDE the retry-count-window-ms, then the job should be restarted. > Additionally, if the last failure was outside the retry-count-window-ms, then > the failure count should be set back to 0. > This would give developers a way to have well-behaved AMs run forever, while > still failing mis-behaving AMs after a short period of time. > I think the work to be done here is to change the RmAppImpl to actually look > at app.attempts, and see if there have been more than max-retries failures in > the last retry-count-window-ms milliseconds. If there have, then the job > should fail, if not, then the job should go forward. Additionally, we might > also need to add an endTime in either RMAppAttemptImpl or > RMAppFailedAttemptEvent, so that the RmAppImpl can check the time of the > failure. > Thoughts? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1994) Expose YARN/MR endpoints on multiple interfaces
[ https://issues.apache.org/jira/browse/YARN-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077401#comment-14077401 ] Arpit Agarwal commented on YARN-1994: - +1 from me modulo one question. Why is the following logic only needed for ContainerManagerImpl.java? I probably knew this but can't recall now. {code} InetSocketAddress connectAddress; String connectHost = conf.getTrimmed(YarnConfiguration.NM_ADDRESS); if (connectHost == null || connectHost.isEmpty()) { // Get hostname and port from the listening endpoint. connectAddress = NetUtils.getConnectAddress(server); } else { // Combine the configured hostname with the port from the listening // endpoint. This gets the correct port number if the configuration // specifies an ephemeral port (port number 0). connectAddress = NetUtils.getConnectAddress( new InetSocketAddress(connectHost.split(":")[0], server.getListenerAddress().getPort())); } {code} Thanks. > Expose YARN/MR endpoints on multiple interfaces > --- > > Key: YARN-1994 > URL: https://issues.apache.org/jira/browse/YARN-1994 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager, resourcemanager, webapp >Affects Versions: 2.4.0 >Reporter: Arpit Agarwal >Assignee: Craig Welch > Attachments: YARN-1994.0.patch, YARN-1994.1.patch, > YARN-1994.11.patch, YARN-1994.11.patch, YARN-1994.2.patch, YARN-1994.3.patch, > YARN-1994.4.patch, YARN-1994.5.patch, YARN-1994.6.patch, YARN-1994.7.patch > > > YARN and MapReduce daemons currently do not support specifying a wildcard > address for the server endpoints. This prevents the endpoints from being > accessible from all interfaces on a multihomed machine. > Note that if we do specify INADDR_ANY for any of the options, it will break > clients as they will attempt to connect to 0.0.0.0. We need a solution that > allows specifying a hostname or IP-address for clients while requesting > wildcard bind for the servers. > (List of endpoints is in a comment below) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1979) TestDirectoryCollection fails when the umask is unusual
[ https://issues.apache.org/jira/browse/YARN-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077319#comment-14077319 ] Hadoop QA commented on YARN-1979: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12658331/YARN-1979.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4465//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4465//console This message is automatically generated. > TestDirectoryCollection fails when the umask is unusual > --- > > Key: YARN-1979 > URL: https://issues.apache.org/jira/browse/YARN-1979 > Project: Hadoop YARN > Issue Type: Test >Reporter: Vinod Kumar Vavilapalli >Assignee: Vinod Kumar Vavilapalli > Attachments: YARN-1979.2.patch, YARN-1979.txt > > > I've seen this fail in Windows where the default permissions are matching up > to 700. > {code} > --- > Test set: org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection > --- > Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.015 sec <<< > FAILURE! - in > org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection > testCreateDirectories(org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection) > Time elapsed: 0.422 sec <<< FAILURE! > java.lang.AssertionError: local dir parent > Y:\hadoop-yarn-project\hadoop-yarn\hadoop-yarn-server\hadoop-yarn-server-nodemanager\target\org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection\dirA > not created with proper permissions expected: but was: > at org.junit.Assert.fail(Assert.java:93) > at org.junit.Assert.failNotEquals(Assert.java:647) > at org.junit.Assert.assertEquals(Assert.java:128) > at > org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection.testCreateDirectories(TestDirectoryCollection.java:106) > {code} > The clash is between testDiskSpaceUtilizationLimit() and > testCreateDirectories(). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2215) Add preemption info to REST/CLI
[ https://issues.apache.org/jira/browse/YARN-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077303#comment-14077303 ] Wangda Tan commented on YARN-2215: -- Hi [~kj-ki], Thanks for working on this, I've assigned this JIRA to you. I think the fields you added should be fine. Within the scope of this JIRA, I think it's better to add CLI support as well. Please submit the patch to kick off Jenkins when you have completed it. Wangda > Add preemption info to REST/CLI > --- > > Key: YARN-2215 > URL: https://issues.apache.org/jira/browse/YARN-2215 > Project: Hadoop YARN > Issue Type: Bug > Components: client, resourcemanager >Reporter: Wangda Tan >Assignee: Kenji Kikushima > Attachments: YARN-2215.patch > > > As discussed in YARN-2181, we'd better to add preemption info to RM RESTful > API/CLI to make administrator/user get more understanding about preemption > happened on app/queue, etc. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2215) Add preemption info to REST/CLI
[ https://issues.apache.org/jira/browse/YARN-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2215: - Assignee: Kenji Kikushima > Add preemption info to REST/CLI > --- > > Key: YARN-2215 > URL: https://issues.apache.org/jira/browse/YARN-2215 > Project: Hadoop YARN > Issue Type: Bug > Components: client, resourcemanager >Reporter: Wangda Tan >Assignee: Kenji Kikushima > Attachments: YARN-2215.patch > > > As discussed in YARN-2181, we'd better to add preemption info to RM RESTful > API/CLI to make administrator/user get more understanding about preemption > happened on app/queue, etc. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1979) TestDirectoryCollection fails when the umask is unusual
[ https://issues.apache.org/jira/browse/YARN-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-1979: - Attachment: YARN-1979.2.patch This JIRA seems to be forgotten, so let me update the patch. Just removed the lines [~djp] mentioned. > TestDirectoryCollection fails when the umask is unusual > --- > > Key: YARN-1979 > URL: https://issues.apache.org/jira/browse/YARN-1979 > Project: Hadoop YARN > Issue Type: Test >Reporter: Vinod Kumar Vavilapalli >Assignee: Vinod Kumar Vavilapalli > Attachments: YARN-1979.2.patch, YARN-1979.txt > > > I've seen this fail in Windows where the default permissions are matching up > to 700. > {code} > --- > Test set: org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection > --- > Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.015 sec <<< > FAILURE! - in > org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection > testCreateDirectories(org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection) > Time elapsed: 0.422 sec <<< FAILURE! > java.lang.AssertionError: local dir parent > Y:\hadoop-yarn-project\hadoop-yarn\hadoop-yarn-server\hadoop-yarn-server-nodemanager\target\org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection\dirA > not created with proper permissions expected: but was: > at org.junit.Assert.fail(Assert.java:93) > at org.junit.Assert.failNotEquals(Assert.java:647) > at org.junit.Assert.assertEquals(Assert.java:128) > at > org.apache.hadoop.yarn.server.nodemanager.TestDirectoryCollection.testCreateDirectories(TestDirectoryCollection.java:106) > {code} > The clash is between testDiskSpaceUtilizationLimit() and > testCreateDirectories(). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1707) Making the CapacityScheduler more dynamic
[ https://issues.apache.org/jira/browse/YARN-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077296#comment-14077296 ] Wangda Tan commented on YARN-1707: -- Hi [~curino], Thanks for your reply. Regarding how the patch matches the JIRA: since I don't have other solid use cases in mind where anything besides {{ReservationSystem}} could leverage these features, I don't have a strong opinion on merging such dynamic behaviors into {{ParentQueue}} and {{LeafQueue}}. Let's wait for more feedback. I agree that we can consider queue capacity as a "weight"; it will be easier for users to configure, and it's also a backward-compatible change (except that it will no longer throw an exception when the sum of a {{ParentQueue}}'s children doesn't equal 100). bq. As I was mentioning in my previous comment, this is likely fine for the limited usage we will make of this from ReservationSystem I think moving an application across queues is not a ReservationSystem-specific change. I would suggest checking that the move will not violate restrictions in the target queue before performing it. Thanks, Wangda > Making the CapacityScheduler more dynamic > - > > Key: YARN-1707 > URL: https://issues.apache.org/jira/browse/YARN-1707 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Carlo Curino >Assignee: Carlo Curino > Labels: capacity-scheduler > Attachments: YARN-1707.patch > > > The CapacityScheduler is a rather static at the moment, and refreshqueue > provides a rather heavy-handed way to reconfigure it. Moving towards > long-running services (tracked in YARN-896) and to enable more advanced > admission control and resource parcelling we need to make the > CapacityScheduler more dynamic. This is instrumental to the umbrella jira > YARN-1051. > Concretely this require the following changes: > * create queues dynamically > * destroy queues dynamically > * dynamically change queue parameters (e.g., capacity) > * modify refreshqueue validation to enforce sum(child.getCapacity())<= 100% > instead of ==100% > We limit this to LeafQueues. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (YARN-1826) TestDirectoryCollection intermittent failures
[ https://issues.apache.org/jira/browse/YARN-1826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA resolved YARN-1826. -- Resolution: Duplicate > TestDirectoryCollection intermittent failures > - > > Key: YARN-1826 > URL: https://issues.apache.org/jira/browse/YARN-1826 > Project: Hadoop YARN > Issue Type: Test >Reporter: Tsuyoshi OZAWA > > testCreateDirectories fails intermittently. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1994) Expose YARN/MR endpoints on multiple interfaces
[ https://issues.apache.org/jira/browse/YARN-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077285#comment-14077285 ] Xuan Gong commented on YARN-1994: - +1 LGTM > Expose YARN/MR endpoints on multiple interfaces > --- > > Key: YARN-1994 > URL: https://issues.apache.org/jira/browse/YARN-1994 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager, resourcemanager, webapp >Affects Versions: 2.4.0 >Reporter: Arpit Agarwal >Assignee: Craig Welch > Attachments: YARN-1994.0.patch, YARN-1994.1.patch, > YARN-1994.11.patch, YARN-1994.11.patch, YARN-1994.2.patch, YARN-1994.3.patch, > YARN-1994.4.patch, YARN-1994.5.patch, YARN-1994.6.patch, YARN-1994.7.patch > > > YARN and MapReduce daemons currently do not support specifying a wildcard > address for the server endpoints. This prevents the endpoints from being > accessible from all interfaces on a multihomed machine. > Note that if we do specify INADDR_ANY for any of the options, it will break > clients as they will attempt to connect to 0.0.0.0. We need a solution that > allows specifying a hostname or IP-address for clients while requesting > wildcard bind for the servers. > (List of endpoints is in a comment below) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1826) TestDirectoryCollection intermittent failures
[ https://issues.apache.org/jira/browse/YARN-1826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077282#comment-14077282 ] Tsuyoshi OZAWA commented on YARN-1826: -- Thank you for commenting, Wangda. Vinod is fixing this problem in YARN-1979. Closing this as a duplicate. > TestDirectoryCollection intermittent failures > - > > Key: YARN-1826 > URL: https://issues.apache.org/jira/browse/YARN-1826 > Project: Hadoop YARN > Issue Type: Test >Reporter: Tsuyoshi OZAWA > > testCreateDirectories fails intermittently. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2367) Make ResourceCalculator configurable for FairScheduler and FifoScheduler like CapacityScheduler
Swapnil Daingade created YARN-2367: -- Summary: Make ResourceCalculator configurable for FairScheduler and FifoScheduler like CapacityScheduler Key: YARN-2367 URL: https://issues.apache.org/jira/browse/YARN-2367 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.4.1, 2.3.0, 2.2.0 Reporter: Swapnil Daingade Priority: Minor The ResourceCalculator used by CapacityScheduler is read from the yarn.scheduler.capacity.resource-calculator entry in capacity-scheduler.xml. This allows custom implementations of the ResourceCalculator interface to be plugged in. It would be nice to have the same functionality in FairScheduler and FifoScheduler. -- This message was sent by Atlassian JIRA (v6.2#6252)
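For reference, the CapacityScheduler-style pluggability boils down to instantiating the configured class reflectively. A minimal sketch of what the same hook could look like for another scheduler; the property name yarn.scheduler.fair.resource-calculator used here is only an assumed example, not an existing key:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.ReflectionUtils;
import org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator;
import org.apache.hadoop.yarn.util.resource.ResourceCalculator;

public class ResourceCalculatorLoader {
  // Assumed example key; FairScheduler/FifoScheduler do not define this today.
  private static final String RESOURCE_CALCULATOR_CLASS =
      "yarn.scheduler.fair.resource-calculator";

  /** Instantiate the configured ResourceCalculator, defaulting to memory-only. */
  public static ResourceCalculator getResourceCalculator(Configuration conf) {
    Class<? extends ResourceCalculator> clazz = conf.getClass(
        RESOURCE_CALCULATOR_CLASS,
        DefaultResourceCalculator.class,
        ResourceCalculator.class);
    return ReflectionUtils.newInstance(clazz, conf);
  }
}
{code}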
[jira] [Commented] (YARN-2354) DistributedShell may allocate more containers than client specified after it restarts
[ https://issues.apache.org/jira/browse/YARN-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077279#comment-14077279 ] Hadoop QA commented on YARN-2354: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12658299/YARN-2354-072814.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell: org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4464//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4464//console This message is automatically generated. > DistributedShell may allocate more containers than client specified after it > restarts > - > > Key: YARN-2354 > URL: https://issues.apache.org/jira/browse/YARN-2354 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Li Lu > Attachments: YARN-2354-072514.patch, YARN-2354-072814.patch > > > To reproduce, run distributed shell with -num_containers option, > In ApplicationMaster.java, the following code has some issue. > {code} > int numTotalContainersToRequest = > numTotalContainers - previousAMRunningContainers.size(); > for (int i = 0; i < numTotalContainersToRequest; ++i) { > ContainerRequest containerAsk = setupContainerAskForRM(); > amRMClient.addContainerRequest(containerAsk); > } > numRequestedContainers.set(numTotalContainersToRequest); > {code} > numRequestedContainers doesn't account for previous AM's requested > containers. so numRequestedContainers should be set to numTotalContainers -- This message was sent by Atlassian JIRA (v6.2#6252)
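A sketch of the fix suggested in the description, reusing the variable names from the quoted snippet: keep requesting only the delta, but record the full total so that containers recovered from the previous attempt are counted as already requested:
{code}
// Sketch of the suggested fix, following the variable names quoted above.
// Only ask the RM for the containers that are still missing ...
int numTotalContainersToRequest =
    numTotalContainers - previousAMRunningContainers.size();
for (int i = 0; i < numTotalContainersToRequest; ++i) {
  ContainerRequest containerAsk = setupContainerAskForRM();
  amRMClient.addContainerRequest(containerAsk);
}
// ... but count recovered containers as already requested, so a restarted
// AM never allocates more than the client originally specified.
numRequestedContainers.set(numTotalContainers);
{code}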
[jira] [Commented] (YARN-415) Capture memory utilization at the app-level for chargeback
[ https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077262#comment-14077262 ] Wangda Tan commented on YARN-415: - Hi [~eepayne], Thanks for updating your patch. For the e2e test, I think we can do it this way: refer to the tests in TestRMRestart. Using MockRM/MockAM can exercise such a test; even though it's not a complete e2e test, most of the logic is covered by it. I suggest we cover the following cases: {code} * Create an app; before the AM is submitted, resource utilization should be 0 * Submit the AM; while the AM is running, we can get its resource utilization > 0 * Allocate some containers and finish them, then check total resource utilization * Finish the application attempt and check total resource utilization * Start a new application attempt and check that the resource utilization of the previous attempt is added to the total. * Check that resource utilization can be persisted/read across RM restart {code} Do you have any comments on this? Thanks, Wangda > Capture memory utilization at the app-level for chargeback > -- > > Key: YARN-415 > URL: https://issues.apache.org/jira/browse/YARN-415 > Project: Hadoop YARN > Issue Type: New Feature > Components: resourcemanager >Affects Versions: 0.23.6 >Reporter: Kendall Thrapp >Assignee: Andrey Klochkov > Attachments: YARN-415--n10.patch, YARN-415--n2.patch, > YARN-415--n3.patch, YARN-415--n4.patch, YARN-415--n5.patch, > YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, > YARN-415--n9.patch, YARN-415.201405311749.txt, YARN-415.201406031616.txt, > YARN-415.201406262136.txt, YARN-415.201407042037.txt, > YARN-415.201407071542.txt, YARN-415.201407171553.txt, > YARN-415.201407172144.txt, YARN-415.201407232237.txt, > YARN-415.201407242148.txt, YARN-415.201407281816.txt, YARN-415.patch > > > For the purpose of chargeback, I'd like to be able to compute the cost of an > application in terms of cluster resource usage. To start out, I'd like to > get the memory utilization of an application. The unit should be MB-seconds > or something similar and, from a chargeback perspective, the memory amount > should be the memory reserved for the application, as even if the app didn't > use all that memory, no one else was able to use it. > (reserved ram for container 1 * lifetime of container 1) + (reserved ram for > container 2 * lifetime of container 2) + ... + (reserved ram for container n > * lifetime of container n) > It'd be nice to have this at the app level instead of the job level because: > 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't > appear on the job history server). > 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm). > This new metric should be available both through the RM UI and RM Web > Services REST API. -- This message was sent by Atlassian JIRA (v6.2#6252)
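To make the MB-seconds formula from the description concrete, here is a small illustrative helper; the types and method names are hypothetical and not part of any attached patch:
{code}
import java.util.List;
import java.util.concurrent.TimeUnit;

public class MemorySecondsExample {
  /** Reserved memory and lifetime for one container (hypothetical holder). */
  public static class ContainerUsage {
    final long reservedMB;
    final long lifetimeMs;
    public ContainerUsage(long reservedMB, long lifetimeMs) {
      this.reservedMB = reservedMB;
      this.lifetimeMs = lifetimeMs;
    }
  }

  /** memorySeconds = sum(reservedMB_i * lifetimeSeconds_i) over all containers. */
  public static long memorySeconds(List<ContainerUsage> containers) {
    long total = 0;
    for (ContainerUsage c : containers) {
      total += c.reservedMB * TimeUnit.MILLISECONDS.toSeconds(c.lifetimeMs);
    }
    return total;
  }
}
{code}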
[jira] [Commented] (YARN-1707) Making the CapacityScheduler more dynamic
[ https://issues.apache.org/jira/browse/YARN-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077256#comment-14077256 ] Carlo Curino commented on YARN-1707: Thanks again for the fast and insightful feedback. *Regarding how the patch matches the JIRA:* Our initial implementation did indeed make the changes (i.e., the dynamic behaviors) in ParentQueue and LeafQueue themselves. Previous feedback pushed us to use subclasses to, in a sense, isolate the changes in dynamic subclasses. I think we can go back to the version modifying ParentQueue and LeafQueue directly if there is consensus. #4 is required because we cannot transactionally “add Q1, resize Q2” so that the invariant “sum of children is == 100%” is maintained. As a consequence we must relax the constraint (either in ParentQueue if we remove the hierarchy, or as it is today in PlanQueue). The good news is that the percentages from the configuration are not interpreted as actual percentages, but rather used as relative "weights" (ranking queues by used_resources / guaranteed_resources). This means that even a careless admin will not leave resources unused. For example, if we set two queues to 10,40 (i.e., something that doesn't add up to 100), the behavior is equivalent to setting them to 20,80 (as they are used only for relative ranking of siblings). I think this is also ok for hierarchies (worth double-checking this part). So all in all we can pull all the dynamic behavior up to {{ParentQueue}} and {{LeafQueue}} if there is consensus that this is the right path. *Regarding move:* 1) Good catch... We will wait for feedback from Jian on this. 2) I think we had that at some point and it did not work correctly. We will try again. 3) There are a few invariants we do not check. {{MaxApplicationsPerUser}} is one of them, but also how many applications can be active in the target queue, etc... As I was mentioning in my previous comment, this is likely fine for the limited usage we will make of this from {{ReservationSystem}}, but it is worth expanding the checks we make (see {{FairScheduler.verifyMoveDoesNotViolateConstraints(..)}}) before exposing move to users via CLI. > Making the CapacityScheduler more dynamic > - > > Key: YARN-1707 > URL: https://issues.apache.org/jira/browse/YARN-1707 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Carlo Curino >Assignee: Carlo Curino > Labels: capacity-scheduler > Attachments: YARN-1707.patch > > > The CapacityScheduler is a rather static at the moment, and refreshqueue > provides a rather heavy-handed way to reconfigure it. Moving towards > long-running services (tracked in YARN-896) and to enable more advanced > admission control and resource parcelling we need to make the > CapacityScheduler more dynamic. This is instrumental to the umbrella jira > YARN-1051. > Concretely this require the following changes: > * create queues dynamically > * destroy queues dynamically > * dynamically change queue parameters (e.g., capacity) > * modify refreshqueue validation to enforce sum(child.getCapacity())<= 100% > instead of ==100% > We limit this to LeafQueues. -- This message was sent by Atlassian JIRA (v6.2#6252)
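A tiny sketch of the relative-weight interpretation described above (illustrative only, not CapacityScheduler code): configured capacities 10 and 40 normalize to the same 20%/80% split as 20 and 80:
{code}
// Illustration only: sibling capacities used as relative weights.
public class WeightNormalization {
  public static float[] normalize(float[] configuredCapacities) {
    float sum = 0f;
    for (float c : configuredCapacities) {
      sum += c;
    }
    float[] effective = new float[configuredCapacities.length];
    for (int i = 0; i < configuredCapacities.length; i++) {
      effective[i] = configuredCapacities[i] / sum;  // fraction of the parent
    }
    return effective;
  }
}
{code}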
[jira] [Commented] (YARN-1707) Making the CapacityScheduler more dynamic
[ https://issues.apache.org/jira/browse/YARN-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077254#comment-14077254 ] Wangda Tan commented on YARN-1707: -- Hi [~subru], Thanks for your elaboration, it is very helpful for me to understand the background. Regards, Wangda > Making the CapacityScheduler more dynamic > - > > Key: YARN-1707 > URL: https://issues.apache.org/jira/browse/YARN-1707 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Carlo Curino >Assignee: Carlo Curino > Labels: capacity-scheduler > Attachments: YARN-1707.patch > > > The CapacityScheduler is a rather static at the moment, and refreshqueue > provides a rather heavy-handed way to reconfigure it. Moving towards > long-running services (tracked in YARN-896) and to enable more advanced > admission control and resource parcelling we need to make the > CapacityScheduler more dynamic. This is instrumental to the umbrella jira > YARN-1051. > Concretely this require the following changes: > * create queues dynamically > * destroy queues dynamically > * dynamically change queue parameters (e.g., capacity) > * modify refreshqueue validation to enforce sum(child.getCapacity())<= 100% > instead of ==100% > We limit this to LeafQueues. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1707) Making the CapacityScheduler more dynamic
[ https://issues.apache.org/jira/browse/YARN-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077242#comment-14077242 ] Subramaniam Venkatraman Krishnan commented on YARN-1707: [~wangda] Thanks for the very detailed comments. I agree that understanding the context is essential & glad to help with that. Overall your understanding is spot on; please find answers to your questions below: 1) Yes, it is possible to have multiple PlanQueues (e.g., if two organizations want to dynamically allocate their resources, but not share them with each other). This is also a good way to "try" reservations on a small scale and slowly ramp up at each org's pace. 2) The extra confs are needed to automate the initialization of key parameters of the dynamic ReservationQueues (without requiring full specification of each of those). 3) Correct 4) Correct 5) First: the Plan guarantees that the sum of reservations never exceeds the available resources (replanning if needed to maintain this invariant in the face of failures). On the other hand, as happens for the normal scheduler, we can leverage "overcapacity" to guarantee high cluster utilization. More precisely, depending on the configuration (or dynamically on whether reservations have gang semantics or not) we can allow resources allocated to PlanQueue and ReservationQueue to exceed their guaranteed capacity (i.e., set the dynamic max-capacity above the guaranteed one). In this case preemption might kick in if other apps with more rights to resources have pending asks. Part of the changes in YARN-1957 was driven by this. 6) To limit the scope of changes, we agreed to have a follow-up JIRA to address HA. The intuition we have is that it is sufficient to persist the Plan alone. During recovery, the _Plan Follower_ will resync the Plan with the scheduler by creating the dynamic queues for currently active reservations. We will be happy to have your input when we work on the HA JIRA. [~curino] will answer your questions specific to this JIRA. > Making the CapacityScheduler more dynamic > - > > Key: YARN-1707 > URL: https://issues.apache.org/jira/browse/YARN-1707 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Carlo Curino >Assignee: Carlo Curino > Labels: capacity-scheduler > Attachments: YARN-1707.patch > > > The CapacityScheduler is a rather static at the moment, and refreshqueue > provides a rather heavy-handed way to reconfigure it. Moving towards > long-running services (tracked in YARN-896) and to enable more advanced > admission control and resource parcelling we need to make the > CapacityScheduler more dynamic. This is instrumental to the umbrella jira > YARN-1051. > Concretely this require the following changes: > * create queues dynamically > * destroy queues dynamically > * dynamically change queue parameters (e.g., capacity) > * modify refreshqueue validation to enforce sum(child.getCapacity())<= 100% > instead of ==100% > We limit this to LeafQueues. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2209) Replace AM resync/shutdown command with corresponding exceptions
[ https://issues.apache.org/jira/browse/YARN-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077235#comment-14077235 ] Junping Du commented on YARN-2209: -- bq. Previously, AM doesn't do re-register. Re-register on RM restart is a new requirement coming out from YARN-556. Was RESYNC also added in YARN-556? If so, I think this is a reasonable change, and I suggest removing RESYNC completely (not just deprecating it) before this feature gets released. > Replace AM resync/shutdown command with corresponding exceptions > > > Key: YARN-2209 > URL: https://issues.apache.org/jira/browse/YARN-2209 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-2209.1.patch, YARN-2209.2.patch, YARN-2209.3.patch, > YARN-2209.4.patch, YARN-2209.5.patch > > > YARN-1365 introduced an ApplicationMasterNotRegisteredException to indicate > application to re-register on RM restart. we should do the same for > AMS#allocate call also. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2026) Fair scheduler : Fair share for inactive queues causes unfair allocation in some scenarios
[ https://issues.apache.org/jira/browse/YARN-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077218#comment-14077218 ] Ashwin Shankar commented on YARN-2026: -- [~kasha],[~sandyr] , did you have any comments on the latest patch ? I also made UI changes and attached screenshot which shows static/dynamic fair share in YARN-2360. Can you please take a look at that also ? > Fair scheduler : Fair share for inactive queues causes unfair allocation in > some scenarios > -- > > Key: YARN-2026 > URL: https://issues.apache.org/jira/browse/YARN-2026 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Reporter: Ashwin Shankar >Assignee: Ashwin Shankar > Labels: scheduler > Attachments: YARN-2026-v1.txt, YARN-2026-v2.txt, YARN-2026-v3.txt > > > Problem1- While using hierarchical queues in fair scheduler,there are few > scenarios where we have seen a leaf queue with least fair share can take > majority of the cluster and starve a sibling parent queue which has greater > weight/fair share and preemption doesn’t kick in to reclaim resources. > The root cause seems to be that fair share of a parent queue is distributed > to all its children irrespective of whether its an active or an inactive(no > apps running) queue. Preemption based on fair share kicks in only if the > usage of a queue is less than 50% of its fair share and if it has demands > greater than that. When there are many queues under a parent queue(with high > fair share),the child queue’s fair share becomes really low. As a result when > only few of these child queues have apps running,they reach their *tiny* fair > share quickly and preemption doesn’t happen even if other leaf > queues(non-sibling) are hogging the cluster. > This can be solved by dividing fair share of parent queue only to active > child queues. > Here is an example describing the problem and proposed solution: > root.lowPriorityQueue is a leaf queue with weight 2 > root.HighPriorityQueue is parent queue with weight 8 > root.HighPriorityQueue has 10 child leaf queues : > root.HighPriorityQueue.childQ(1..10) > Above config,results in root.HighPriorityQueue having 80% fair share > and each of its ten child queue would have 8% fair share. Preemption would > happen only if the child queue is <4% (0.5*8=4). > Lets say at the moment no apps are running in any of the > root.HighPriorityQueue.childQ(1..10) and few apps are running in > root.lowPriorityQueue which is taking up 95% of the cluster. > Up till this point,the behavior of FS is correct. > Now,lets say root.HighPriorityQueue.childQ1 got a big job which requires 30% > of the cluster. It would get only the available 5% in the cluster and > preemption wouldn't kick in since its above 4%(half fair share).This is bad > considering childQ1 is under a highPriority parent queue which has *80% fair > share*. > Until root.lowPriorityQueue starts relinquishing containers,we would see the > following allocation on the scheduler page: > *root.lowPriorityQueue = 95%* > *root.HighPriorityQueue.childQ1=5%* > This can be solved by distributing a parent’s fair share only to active > queues. > So in the example above,since childQ1 is the only active queue > under root.HighPriorityQueue, it would get all its parent’s fair share i.e. > 80%. > This would cause preemption to reclaim the 30% needed by childQ1 from > root.lowPriorityQueue after fairSharePreemptionTimeout seconds. 
> Problem2 - Also note that similar situation can happen between > root.HighPriorityQueue.childQ1 and root.HighPriorityQueue.childQ2,if childQ2 > hogs the cluster. childQ2 can take up 95% cluster and childQ1 would be stuck > at 5%,until childQ2 starts relinquishing containers. We would like each of > childQ1 and childQ2 to get half of root.HighPriorityQueue fair share ie > 40%,which would ensure childQ1 gets upto 40% resource if needed through > preemption. -- This message was sent by Atlassian JIRA (v6.2#6252)
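A minimal sketch of the proposed behavior, distributing a parent's fair share only among its active children; the types and weights here are illustrative, not the actual FairScheduler computation:
{code}
import java.util.List;

public class ActiveFairShareExample {
  /** Hypothetical view of a child queue: its weight and whether it has apps. */
  public static class Child {
    final String name;
    final double weight;
    final boolean active;   // active == has at least one runnable app
    double fairShare;
    public Child(String name, double weight, boolean active) {
      this.name = name;
      this.weight = weight;
      this.active = active;
    }
  }

  /** Split parentFairShare across active children only, by weight. */
  public static void distribute(double parentFairShare, List<Child> children) {
    double activeWeight = 0;
    for (Child c : children) {
      if (c.active) {
        activeWeight += c.weight;
      }
    }
    for (Child c : children) {
      // Inactive queues get no share, so an active sibling (e.g. childQ1 in
      // the example above) can receive the full parent share.
      c.fairShare = (c.active && activeWeight > 0)
          ? parentFairShare * c.weight / activeWeight : 0;
    }
  }
}
{code}
In the example above, with only childQ1 active under root.HighPriorityQueue, childQ1 would receive the full 80% parent share, so preemption can reclaim resources from root.lowPriorityQueue.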
[jira] [Commented] (YARN-2360) Fair Scheduler : Display dynamic fair share for queues on the scheduler page
[ https://issues.apache.org/jira/browse/YARN-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077211#comment-14077211 ] Ashwin Shankar commented on YARN-2360: -- Expected -1 from Jenkins since patch depends on unresolved YARN-2026. > Fair Scheduler : Display dynamic fair share for queues on the scheduler page > > > Key: YARN-2360 > URL: https://issues.apache.org/jira/browse/YARN-2360 > Project: Hadoop YARN > Issue Type: New Feature > Components: fairscheduler >Reporter: Ashwin Shankar >Assignee: Ashwin Shankar > Attachments: Screen Shot 2014-07-28 at 1.12.19 PM.png, > YARN-2360-v1.txt > > > Based on the discussion in YARN-2026, we'd like to display dynamic fair > share for queues on the scheduler page. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2209) Replace AM resync/shutdown command with corresponding exceptions
[ https://issues.apache.org/jira/browse/YARN-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077204#comment-14077204 ] Jian He commented on YARN-2209: --- bq. The customized AM code could get RESYNC from response previously (like what we original do in AMRMClient) to handle AM re-registering case. Previously, AM doesn't do re-register. Re-register on RM restart is a new requirement coming out from YARN-556 > Replace AM resync/shutdown command with corresponding exceptions > > > Key: YARN-2209 > URL: https://issues.apache.org/jira/browse/YARN-2209 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-2209.1.patch, YARN-2209.2.patch, YARN-2209.3.patch, > YARN-2209.4.patch, YARN-2209.5.patch > > > YARN-1365 introduced an ApplicationMasterNotRegisteredException to indicate > application to re-register on RM restart. we should do the same for > AMS#allocate call also. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2347) Consolidate RMStateVersion and NMDBSchemaVersion into StateVersion in yarn-server-common
[ https://issues.apache.org/jira/browse/YARN-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077198#comment-14077198 ] Junping Du commented on YARN-2347: -- [~zjshen], can you help to review it again? Thx! > Consolidate RMStateVersion and NMDBSchemaVersion into StateVersion in > yarn-server-common > > > Key: YARN-2347 > URL: https://issues.apache.org/jira/browse/YARN-2347 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Junping Du >Assignee: Junping Du > Attachments: YARN-2347-v2.patch, YARN-2347-v3.patch, > YARN-2347-v4.patch, YARN-2347-v5.patch, YARN-2347.patch > > > We have similar things for version state for RM, NM, TS (TimelineServer), > etc. I think we should consolidate them into a common object. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2209) Replace AM resync/shutdown command with corresponding exceptions
[ https://issues.apache.org/jira/browse/YARN-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077192#comment-14077192 ] Junping Du commented on YARN-2209: -- bq. I think users are expected to handle two types of exceptions YarnException and IOException. In that sense, this is equivalent to throwing a new type of exception which should be fine? No. The customized AM code could get RESYNC from response previously (like what we original do in AMRMClient) to handle AM re-registering case. Now, it cannot get this RESYNC, so it could fail to re-register with the restarted RM. Do I miss anything here? > Replace AM resync/shutdown command with corresponding exceptions > > > Key: YARN-2209 > URL: https://issues.apache.org/jira/browse/YARN-2209 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-2209.1.patch, YARN-2209.2.patch, YARN-2209.3.patch, > YARN-2209.4.patch, YARN-2209.5.patch > > > YARN-1365 introduced an ApplicationMasterNotRegisteredException to indicate > application to re-register on RM restart. we should do the same for > AMS#allocate call also. -- This message was sent by Atlassian JIRA (v6.2#6252)
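For a custom AM that calls the ApplicationMasterProtocol directly, the pattern implied by this change would presumably look like the sketch below. This is an illustration only; the assumed fields are noted in the comments:
{code}
// Illustration only (not the patch): how a hand-written AM might react once
// the resync command is replaced by an exception. "amsProtocol" (an
// ApplicationMasterProtocol) and "registerRequest" are assumed fields of the
// surrounding AM class.
public AllocateResponse allocateWithResync(AllocateRequest request)
    throws YarnException, IOException {
  try {
    return amsProtocol.allocate(request);
  } catch (ApplicationMasterNotRegisteredException e) {
    // The RM restarted and lost this AM's registration: register again,
    // then retry the allocate call.
    amsProtocol.registerApplicationMaster(registerRequest);
    return amsProtocol.allocate(request);
  }
}
{code}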
[jira] [Commented] (YARN-1354) Recover applications upon nodemanager restart
[ https://issues.apache.org/jira/browse/YARN-1354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077178#comment-14077178 ] Jason Lowe commented on YARN-1354: -- Thanks for taking a look, Junping! bq. what would happen if storeApplication(), finishApplication(), removeApplication() failed with application related information get inconsistent after restart? If storeApplication fails then it will throw an IOException which will bubble up and fail the container start request on the client. As long as we're unable to store a new application, containers for that application will not start, which I believe is the desired behavior. That prevents the state store from being inconsistent in this particular scenario. If finishApplication fails then the NM will proceed as if it did succeed but the state store will still have the application present. This should be corrected when the NM restarts and registers with the RM with those applications still running. The RM should correct the situation by telling the NM that the application has finished (see YARN-1885), and the NM will proceed to perform application finish processing (e.g.: log aggregation, etc.). I think worst-case it will upload all of the app container logs again, but when it goes to rename to the final destination name that will fail because the name already exists. Thus there could be some wasted work, but it should sort itself out and not do something catastrophic. If removeApplication fails then the NM will proceed as if it did succeed but the state store will still have the application present. This should be corrected when the NM finishes application processing (per above or if it was already recorded as finished) and it will again try to remove it from the state store. As above I think there could be some unnecessary work performed, but I think in the end the application should eventually be removed from the NM on restart. It could still remain in the state store if the second removal also fails, but a subsequent restart should behave the same. bq. Do we need special warning if get failed on deserializing credential here? I'm not sure how credential processing is fundamentally all that different from protocol buffer parsing which could also fail. If the credentials can't be read then we can't recover the application. Currently recovery errors are fatal to NM startup. Do you have something specific in mind for handling the credentials if the writable changes (e.g.: some pseudo code to show the approach)? > Recover applications upon nodemanager restart > - > > Key: YARN-1354 > URL: https://issues.apache.org/jira/browse/YARN-1354 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.3.0 >Reporter: Jason Lowe >Assignee: Jason Lowe > Attachments: YARN-1354-v1.patch, > YARN-1354-v2-and-YARN-1987-and-YARN-1362.patch, YARN-1354-v3.patch, > YARN-1354-v4.patch, YARN-1354-v5.patch > > > The set of active applications in the nodemanager context need to be > recovered for work-preserving nodemanager restart -- This message was sent by Atlassian JIRA (v6.2#6252)
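A condensed sketch of the failure semantics described in the comment above; the method names follow the comment, while stateStore, launchContainer, and LOG are assumed placeholders rather than actual NM code:
{code}
// Sketch only: how the state-store operations are expected to fail.
void startContainerForApp(Application app) throws IOException {
  // Must succeed before any container of a new application is started;
  // an IOException propagates and fails the startContainer request.
  stateStore.storeApplication(app.getAppId(), app.getProto());
  launchContainer(app);
}

void onApplicationFinished(Application app) {
  try {
    stateStore.removeApplication(app.getAppId());
  } catch (IOException e) {
    // Non-fatal: the app stays in the store and is cleaned up again after the
    // next NM restart, once the RM reports the application as finished.
    LOG.warn("Failed to remove application " + app.getAppId(), e);
  }
}
{code}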
[jira] [Commented] (YARN-2360) Fair Scheduler : Display dynamic fair share for queues on the scheduler page
[ https://issues.apache.org/jira/browse/YARN-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077144#comment-14077144 ] Hadoop QA commented on YARN-2360: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12658291/YARN-2360-v1.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4463//console This message is automatically generated. > Fair Scheduler : Display dynamic fair share for queues on the scheduler page > > > Key: YARN-2360 > URL: https://issues.apache.org/jira/browse/YARN-2360 > Project: Hadoop YARN > Issue Type: New Feature > Components: fairscheduler >Reporter: Ashwin Shankar >Assignee: Ashwin Shankar > Attachments: Screen Shot 2014-07-28 at 1.12.19 PM.png, > YARN-2360-v1.txt > > > Based on the discussion in YARN-2026, we'd like to display dynamic fair > share for queues on the scheduler page. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2360) Fair Scheduler : Display dynamic fair share for queues on the scheduler page
[ https://issues.apache.org/jira/browse/YARN-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077125#comment-14077125 ] Ashwin Shankar commented on YARN-2360: -- Attached a screenshot and patch for the UI changes to display dynamic fair share. Some comments on the UI changes: 1. I'm calling dynamic fair share "Current Fair Share" and static fair share "Guaranteed Fair Share". 2. Since dynamic fair share is a "temporary fair share", I've represented it with a "dashed" border. 3. Changed the static fair share border to be "solid" rather than "dashed". 4. Added Dynamic Fair Share/Current Fair Share to the tooltip. 5. Usage changes to orange when it goes above the dynamic/current fair share rather than the static fair share. > Fair Scheduler : Display dynamic fair share for queues on the scheduler page > > > Key: YARN-2360 > URL: https://issues.apache.org/jira/browse/YARN-2360 > Project: Hadoop YARN > Issue Type: New Feature > Components: fairscheduler >Reporter: Ashwin Shankar >Assignee: Ashwin Shankar > Attachments: Screen Shot 2014-07-28 at 1.12.19 PM.png, > YARN-2360-v1.txt > > > Based on the discussion in YARN-2026, we'd like to display dynamic fair > share for queues on the scheduler page. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2354) DistributedShell may allocate more containers than client specified after it restarts
[ https://issues.apache.org/jira/browse/YARN-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-2354: Attachment: YARN-2354-072814.patch New patch, added log information. > DistributedShell may allocate more containers than client specified after it > restarts > - > > Key: YARN-2354 > URL: https://issues.apache.org/jira/browse/YARN-2354 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Li Lu > Attachments: YARN-2354-072514.patch, YARN-2354-072814.patch > > > To reproduce, run distributed shell with -num_containers option, > In ApplicationMaster.java, the following code has some issue. > {code} > int numTotalContainersToRequest = > numTotalContainers - previousAMRunningContainers.size(); > for (int i = 0; i < numTotalContainersToRequest; ++i) { > ContainerRequest containerAsk = setupContainerAskForRM(); > amRMClient.addContainerRequest(containerAsk); > } > numRequestedContainers.set(numTotalContainersToRequest); > {code} > numRequestedContainers doesn't account for previous AM's requested > containers. so numRequestedContainers should be set to numTotalContainers -- This message was sent by Atlassian JIRA (v6.2#6252)
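For reference, a hedged sketch of the accounting change implied by the issue description: request only the delta of containers, but record the full target in numRequestedContainers so a restarted AM does not over-allocate. This fragment reuses the field names from the snippet quoted in the description and is not the attached patch itself.
{code}
// Sketch of the intended accounting (based on the description above, not the actual patch):
int numTotalContainersToRequest =
    numTotalContainers - previousAMRunningContainers.size();
for (int i = 0; i < numTotalContainersToRequest; ++i) {
  ContainerRequest containerAsk = setupContainerAskForRM();
  amRMClient.addContainerRequest(containerAsk);
}
// Count the previous attempt's still-running containers as already requested,
// so the restarted AM never asks for more than numTotalContainers overall.
numRequestedContainers.set(numTotalContainers);
{code}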
[jira] [Updated] (YARN-2360) Fair Scheduler : Display dynamic fair share for queues on the scheduler page
[ https://issues.apache.org/jira/browse/YARN-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashwin Shankar updated YARN-2360: - Attachment: Screen Shot 2014-07-28 at 1.12.19 PM.png > Fair Scheduler : Display dynamic fair share for queues on the scheduler page > > > Key: YARN-2360 > URL: https://issues.apache.org/jira/browse/YARN-2360 > Project: Hadoop YARN > Issue Type: New Feature > Components: fairscheduler >Reporter: Ashwin Shankar >Assignee: Ashwin Shankar > Attachments: Screen Shot 2014-07-28 at 1.12.19 PM.png, > YARN-2360-v1.txt > > > Based on the discussion in YARN-2026, we'd like to display dynamic fair > share for queues on the scheduler page. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2360) Fair Scheduler : Display dynamic fair share for queues on the scheduler page
[ https://issues.apache.org/jira/browse/YARN-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashwin Shankar updated YARN-2360: - Attachment: YARN-2360-v1.txt > Fair Scheduler : Display dynamic fair share for queues on the scheduler page > > > Key: YARN-2360 > URL: https://issues.apache.org/jira/browse/YARN-2360 > Project: Hadoop YARN > Issue Type: New Feature > Components: fairscheduler >Reporter: Ashwin Shankar >Assignee: Ashwin Shankar > Attachments: Screen Shot 2014-07-28 at 1.12.19 PM.png, > YARN-2360-v1.txt > > > Based on the discussion in YARN-2026, we'd like to display dynamic fair > share for queues on the scheduler page. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2363) Submitted applications occasionally lack a tracking URL
[ https://issues.apache.org/jira/browse/YARN-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077087#comment-14077087 ] Hadoop QA commented on YARN-2363: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12658270/YARN-2363.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4462//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4462//console This message is automatically generated. > Submitted applications occasionally lack a tracking URL > --- > > Key: YARN-2363 > URL: https://issues.apache.org/jira/browse/YARN-2363 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.4.0 >Reporter: Jason Lowe >Assignee: Jason Lowe > Attachments: YARN-2363.patch > > > Sometimes when an application is submitted the client receives no tracking > URL. More details in the first comment. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (YARN-2360) Fair Scheduler : Display dynamic fair share for queues on the scheduler page
[ https://issues.apache.org/jira/browse/YARN-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashwin Shankar reassigned YARN-2360: Assignee: Ashwin Shankar > Fair Scheduler : Display dynamic fair share for queues on the scheduler page > > > Key: YARN-2360 > URL: https://issues.apache.org/jira/browse/YARN-2360 > Project: Hadoop YARN > Issue Type: New Feature > Components: fairscheduler >Reporter: Ashwin Shankar >Assignee: Ashwin Shankar > > Based on the discussion in YARN-2026, we'd like to display dynamic fair > share for queues on the scheduler page. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2209) Replace AM resync/shutdown command with corresponding exceptions
[ https://issues.apache.org/jira/browse/YARN-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077038#comment-14077038 ] Jian He commented on YARN-2209: --- Hi Zhijie, thanks for the review. Here are some responses: bq. Why is it necessary to use the exception instead of the flag to indicate the RM restarting? Because, as you can see, it's not just the allocate API; unregisterResponse would also need to carry an AMCommand otherwise. Basically, every AMS API other than register would require adding a new field otherwise. Throwing an exception is a much cleaner way. bq. For example, MR of prior versions will no longer work properly with a YARN cluster after this patch during RM restarting. No matter how the application reacts to the shutdown command, the NM will shoot down the AM container during RM restart. So prior applications (including MR) should still work. Even today the earlier MR AM container is possibly killed by the NM before it actually performs any shutdown logic. bq. Deprecate the enum type instead of each enum value? Maybe we shouldn't deprecate AMCommand itself, as we may add other commands later on as needed. bq. Why not throwing ApplicationAttemptNotFoundException instead? It sounds more reasonable here, doesn’t it? Do you mean creating a new ApplicationAttemptNotFoundException? I think it's fine to just reuse ApplicationNotFoundException, as they are quite similar. The internal exception message shows the attemptId. bq. Is this change necessary? It is, because the finally block (i.e. the "if (allocateResponse == null)" check) would be executed otherwise. bq. shall we split the patch into two pieces: one for YARN and the other for MR, Will split once the review is done. I think it'll be easier to review with both sides' changes for more context. bq. No need to break it into two lines, right? Will fix it. > Replace AM resync/shutdown command with corresponding exceptions > > > Key: YARN-2209 > URL: https://issues.apache.org/jira/browse/YARN-2209 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-2209.1.patch, YARN-2209.2.patch, YARN-2209.3.patch, > YARN-2209.4.patch, YARN-2209.5.patch > > > YARN-1365 introduced an ApplicationMasterNotRegisteredException to indicate > application to re-register on RM restart. we should do the same for > AMS#allocate call also. -- This message was sent by Atlassian JIRA (v6.2#6252)
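To make the point about the finally block concrete, here is a simplified sketch (not the actual AMRMClientImpl code; the allocate call stands in for the method quoted in the review) of why the response must be assigned to the local variable before returning.
{code}
// Simplified sketch, not the actual AMRMClientImpl code.
public AllocateResponse allocateOnce(float progressIndicator)
    throws YarnException, IOException {
  AllocateResponse allocateResponse = null;
  try {
    // If this were written as "return allocate(progressIndicator);", the local
    // variable would still be null when the finally block runs, so the cleanup
    // below would fire even after a successful allocate call.
    allocateResponse = allocate(progressIndicator);
    return allocateResponse;
  } finally {
    if (allocateResponse == null) {
      // cleanup path for a failed allocate, e.g. re-adding pending requests
    }
  }
}
{code}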
[jira] [Commented] (YARN-2363) Submitted applications occasionally lack a tracking URL
[ https://issues.apache.org/jira/browse/YARN-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14076984#comment-14076984 ] Mit Desai commented on YARN-2363: - patch looks good to me. +1 (non-binding) > Submitted applications occasionally lack a tracking URL > --- > > Key: YARN-2363 > URL: https://issues.apache.org/jira/browse/YARN-2363 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.4.0 >Reporter: Jason Lowe >Assignee: Jason Lowe > Attachments: YARN-2363.patch > > > Sometimes when an application is submitted the client receives no tracking > URL. More details in the first comment. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2363) Submitted applications occasionally lack a tracking URL
[ https://issues.apache.org/jira/browse/YARN-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-2363: - Attachment: YARN-2363.patch Quick patch that generates a default proxy URL if the user has access to the app but there isn't a current attempt. > Submitted applications occasionally lack a tracking URL > --- > > Key: YARN-2363 > URL: https://issues.apache.org/jira/browse/YARN-2363 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.4.0 >Reporter: Jason Lowe > Attachments: YARN-2363.patch > > > Sometimes when an application is submitted the client receives no tracking > URL. More details in the first comment. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2366) Speed up history server startup time
[ https://issues.apache.org/jira/browse/YARN-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14076933#comment-14076933 ] Hadoop QA commented on YARN-2366: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12658247/YARN-2366.v1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4461//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4461//console This message is automatically generated. > Speed up history server startup time > > > Key: YARN-2366 > URL: https://issues.apache.org/jira/browse/YARN-2366 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.4.0 >Reporter: Siqi Li >Assignee: Siqi Li > Attachments: YARN-2366.v1.patch > > > When history server starts up, It scans every history directories and put all > history files into a cache, whereas this cache only stores 20K recent history > files. Therefore, it is wasting a large portion of time loading old history > files into the cache, and the startup time will keep increasing if we don't > trim the number of history files. For example, when history server starts up > with 2.5M history files in HDFS, it took ~5 minutes. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2366) Speed up history server startup time
[ https://issues.apache.org/jira/browse/YARN-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated YARN-2366: -- Attachment: YARN-2366.v1.patch > Speed up history server startup time > > > Key: YARN-2366 > URL: https://issues.apache.org/jira/browse/YARN-2366 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.4.0 >Reporter: Siqi Li >Assignee: Siqi Li > Attachments: YARN-2366.v1.patch > > > When history server starts up, It scans every history directories and put all > history files into a cache, whereas this cache only stores 20K recent history > files. Therefore, it is wasting a large portion of time loading old history > files into the cache, and the startup time will keep increasing if we don't > trim the number of history files. For example, when history server starts up > with 2.5M history files in HDFS, it took ~5 minutes. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (YARN-2366) Speed up history server startup time
[ https://issues.apache.org/jira/browse/YARN-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li reassigned YARN-2366: - Assignee: Siqi Li > Speed up history server startup time > > > Key: YARN-2366 > URL: https://issues.apache.org/jira/browse/YARN-2366 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.4.0 >Reporter: Siqi Li >Assignee: Siqi Li > Attachments: YARN-2366.v1.patch > > > When history server starts up, It scans every history directories and put all > history files into a cache, whereas this cache only stores 20K recent history > files. Therefore, it is wasting a large portion of time loading old history > files into the cache, and the startup time will keep increasing if we don't > trim the number of history files. For example, when history server starts up > with 2.5M history files in HDFS, it took ~5 minutes. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2366) Speed up history server startup time
[ https://issues.apache.org/jira/browse/YARN-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated YARN-2366: -- Description: When history server starts up, It scans every history directories and put all history files into a cache, whereas this cache only stores 20K recent history files. Therefore, it is wasting a large portion of time loading old history files into the cache, and the startup time will keep increasing if we don't trim the number of history files. For example, when history server starts up with 2.5M history files in HDFS, it took ~5 minutes. > Speed up history server startup time > > > Key: YARN-2366 > URL: https://issues.apache.org/jira/browse/YARN-2366 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Siqi Li > > When history server starts up, It scans every history directories and put all > history files into a cache, whereas this cache only stores 20K recent history > files. Therefore, it is wasting a large portion of time loading old history > files into the cache, and the startup time will keep increasing if we don't > trim the number of history files. For example, when history server starts up > with 2.5M history files in HDFS, it took ~5 minutes. -- This message was sent by Atlassian JIRA (v6.2#6252)
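A rough sketch of one way to avoid loading every old history file at startup, given the 20K cache limit mentioned in the description: sort directory entries by modification time and stop once the cache limit is reached. This is illustrative only, not the attached patch; the class name and the addToCache call are hypothetical.
{code}
// Illustrative sketch only: load at most maxCacheSize of the newest history files.
import java.io.IOException;
import java.util.Arrays;
import java.util.Comparator;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

class HistoryScanSketch {
  void loadRecent(FileSystem fs, Path historyDir, int maxCacheSize) throws IOException {
    FileStatus[] entries = fs.listStatus(historyDir);
    // Newest first, so we can stop as soon as the cache is full.
    Arrays.sort(entries, Comparator.comparingLong(FileStatus::getModificationTime).reversed());
    int loaded = 0;
    for (FileStatus entry : entries) {
      if (loaded >= maxCacheSize) {
        break;              // skip older files instead of scanning them all
      }
      // addToCache(entry); // hypothetical cache-population call
      loaded++;
    }
  }
}
{code}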
[jira] [Created] (YARN-2366) Speed up history server startup time
Siqi Li created YARN-2366: - Summary: Speed up history server startup time Key: YARN-2366 URL: https://issues.apache.org/jira/browse/YARN-2366 Project: Hadoop YARN Issue Type: Bug Reporter: Siqi Li -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2365) TestAMRestart.testShouldNotCountFailureToMaxAttemptRetry fails on branch-2
Mit Desai created YARN-2365: --- Summary: TestAMRestart.testShouldNotCountFailureToMaxAttemptRetry fails on branch-2 Key: YARN-2365 URL: https://issues.apache.org/jira/browse/YARN-2365 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.5.0 Reporter: Mit Desai TestAMRestart#testShouldNotCountFailureToMaxAttemptRetry fails on branch with the following errror {noformat} Running org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 46.471 sec <<< FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart testShouldNotCountFailureToMaxAttemptRetry(org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart) Time elapsed: 46.354 sec <<< FAILURE! java.lang.AssertionError: AppAttempt state is not correct (timedout) expected: but was: at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.apache.hadoop.yarn.server.resourcemanager.MockAM.waitForState(MockAM.java:82) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.sendAMLaunched(MockRM.java:414) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.launchAM(MockRM.java:569) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.launchAndRegisterAM(MockRM.java:576) at org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart.testShouldNotCountFailureToMaxAttemptRetry(TestAMRestart.java:389) {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1769) CapacityScheduler: Improve reservations
[ https://issues.apache.org/jira/browse/YARN-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14076769#comment-14076769 ] Hadoop QA commented on YARN-1769: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12658215/YARN-1769.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4460//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4460//console This message is automatically generated. > CapacityScheduler: Improve reservations > > > Key: YARN-1769 > URL: https://issues.apache.org/jira/browse/YARN-1769 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacityscheduler >Affects Versions: 2.3.0 >Reporter: Thomas Graves >Assignee: Thomas Graves > Attachments: YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, > YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, > YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, > YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch > > > Currently the CapacityScheduler uses reservations in order to handle requests > for large containers and the fact there might not currently be enough space > available on a single host. > The current algorithm for reservations is to reserve as many containers as > currently required and then it will start to reserve more above that after a > certain number of re-reservations (currently biased against larger > containers). Anytime it hits the limit of number reserved it stops looking > at any other nodes. This results in potentially missing nodes that have > enough space to fullfill the request. > The other place for improvement is currently reservations count against your > queue capacity. If you have reservations you could hit the various limits > which would then stop you from looking further at that node. > The above 2 cases can cause an application requesting a larger container to > take a long time to gets it resources. > We could improve upon both of those by simply continuing to look at incoming > nodes to see if we could potentially swap out a reservation for an actual > allocation. -- This message was sent by Atlassian JIRA (v6.2#6252)
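A very rough, purely illustrative sketch of the "swap a reservation for a real allocation" idea in the description above; the real CapacityScheduler/LeafQueue logic is considerably more involved, and all type and method names here are made up.
{code}
// Illustrative only; names do not match the real scheduler classes.
interface SchedulerNodeSketch {
  long getAvailableMemoryMB();
  void allocate(SchedulerAppSketch app, ResourceRequestSketch request);
}

interface SchedulerAppSketch {
  boolean hasReservationFor(ResourceRequestSketch request);
  void unreserve(ResourceRequestSketch request);
}

class ResourceRequestSketch {
  long memoryMB;
}

class ReservationSwapSketch {
  // Keep looking at incoming node heartbeats; if a node can satisfy the request
  // outright, drop the existing reservation elsewhere and allocate here.
  boolean tryAllocateOnIncomingNode(SchedulerNodeSketch node, SchedulerAppSketch app,
      ResourceRequestSketch request) {
    if (node.getAvailableMemoryMB() >= request.memoryMB) {
      if (app.hasReservationFor(request)) {
        app.unreserve(request);   // free the reserved space on the other node
      }
      node.allocate(app, request);
      return true;
    }
    return false;
  }
}
{code}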
[jira] [Created] (YARN-2364) TestRMRestart#testRMRestartWaitForPreviousAMToFinish is racy
Mit Desai created YARN-2364: --- Summary: TestRMRestart#testRMRestartWaitForPreviousAMToFinish is racy Key: YARN-2364 URL: https://issues.apache.org/jira/browse/YARN-2364 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.5.0 Reporter: Mit Desai TestRMRestart#testRMRestartWaitForPreviousAMToFinish is racy. It fails intermittently on branch-2 with the following errors. Fails with any of these {noformat} Running org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 26.836 sec <<< FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart testRMRestartWaitForPreviousAMToFinish(org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart) Time elapsed: 26.687 sec <<< FAILURE! java.lang.AssertionError: expected:<4> but was:<3> at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.junit.Assert.assertEquals(Assert.java:542) at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testRMRestartWaitForPreviousAMToFinish(TestRMRestart.java:557) {noformat} or {noformat} Running org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 51.326 sec <<< FAILURE! - in org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart testRMRestartWaitForPreviousAMToFinish(org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart) Time elapsed: 51.055 sec <<< FAILURE! java.lang.AssertionError: AppAttempt state is not correct (timedout) expected: but was: at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.apache.hadoop.yarn.server.resourcemanager.MockAM.waitForState(MockAM.java:82) at org.apache.hadoop.yarn.server.resourcemanager.MockRM.sendAMLaunched(MockRM.java:414) at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.launchAM(TestRMRestart.java:949) at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testRMRestartWaitForPreviousAMToFinish(TestRMRestart.java:519) {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-415) Capture memory utilization at the app-level for chargeback
[ https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14076679#comment-14076679 ] Hadoop QA commented on YARN-415: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12658211/YARN-415.201407281816.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 9 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.client.TestResourceTrackerOnHA org.apache.hadoop.yarn.client.TestApplicationMasterServiceOnHA org.apache.hadoop.yarn.client.TestRMFailover org.apache.hadoop.yarn.client.api.impl.TestAMRMClient org.apache.hadoop.yarn.client.api.impl.TestNMClient org.apache.hadoop.yarn.client.TestGetGroups org.apache.hadoop.yarn.client.TestResourceManagerAdministrationProtocolPBClientImpl org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA org.apache.hadoop.yarn.client.cli.TestYarnCLI org.apache.hadoop.yarn.client.api.impl.TestYarnClient org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebappAuthentication org.apache.hadoop.yarn.server.resourcemanager.TestMoveApplication org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerQueueACLs org.apache.hadoop.yarn.server.resourcemanager.TestClientRMTokens org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerQueueACLs org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService org.apache.hadoop.yarn.server.resourcemanager.TestRMHA org.apache.hadoop.yarn.server.resourcemanager.TestApplicationACLs {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4459//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4459//console This message is automatically generated. 
> Capture memory utilization at the app-level for chargeback > -- > > Key: YARN-415 > URL: https://issues.apache.org/jira/browse/YARN-415 > Project: Hadoop YARN > Issue Type: New Feature > Components: resourcemanager >Affects Versions: 0.23.6 >Reporter: Kendall Thrapp >Assignee: Andrey Klochkov > Attachments: YARN-415--n10.patch, YARN-415--n2.patch, > YARN-415--n3.patch, YARN-415--n4.patch, YARN-415--n5.patch, > YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, > YARN-415--n9.patch, YARN-415.201405311749.txt, YARN-415.201406031616.txt, > YARN-415.201406262136.txt, YARN-415.201407042037.txt, > YARN-415.201407071542.txt, YARN-415.201407171553.txt, > YARN-415.201407172144.txt, YARN-415.201407232237.txt, > YARN-415.201407242148.txt, YARN-415.201407281816.txt, YARN-415.patch > > > For the purpose of chargeback, I'd like to be able to compute the cost of an > application in terms of cluster resource usage. To start out, I'd like to > get the memory utilization of an application. The unit should be MB-seconds > or something similar and, from a chargeback perspective, the memory amount > should be the memory reserved for the
[jira] [Commented] (YARN-1769) CapacityScheduler: Improve reservations
[ https://issues.apache.org/jira/browse/YARN-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14076607#comment-14076607 ] Hadoop QA commented on YARN-1769: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12658198/YARN-1769.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4458//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4458//console This message is automatically generated. > CapacityScheduler: Improve reservations > > > Key: YARN-1769 > URL: https://issues.apache.org/jira/browse/YARN-1769 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacityscheduler >Affects Versions: 2.3.0 >Reporter: Thomas Graves >Assignee: Thomas Graves > Attachments: YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, > YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, > YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, > YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch > > > Currently the CapacityScheduler uses reservations in order to handle requests > for large containers and the fact there might not currently be enough space > available on a single host. > The current algorithm for reservations is to reserve as many containers as > currently required and then it will start to reserve more above that after a > certain number of re-reservations (currently biased against larger > containers). Anytime it hits the limit of number reserved it stops looking > at any other nodes. This results in potentially missing nodes that have > enough space to fullfill the request. > The other place for improvement is currently reservations count against your > queue capacity. If you have reservations you could hit the various limits > which would then stop you from looking further at that node. > The above 2 cases can cause an application requesting a larger container to > take a long time to gets it resources. > We could improve upon both of those by simply continuing to look at incoming > nodes to see if we could potentially swap out a reservation for an actual > allocation. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1769) CapacityScheduler: Improve reservations
[ https://issues.apache.org/jira/browse/YARN-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He updated YARN-1769: -- Attachment: YARN-1769.patch > CapacityScheduler: Improve reservations > > > Key: YARN-1769 > URL: https://issues.apache.org/jira/browse/YARN-1769 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacityscheduler >Affects Versions: 2.3.0 >Reporter: Thomas Graves >Assignee: Thomas Graves > Attachments: YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, > YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, > YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, > YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch > > > Currently the CapacityScheduler uses reservations in order to handle requests > for large containers and the fact there might not currently be enough space > available on a single host. > The current algorithm for reservations is to reserve as many containers as > currently required and then it will start to reserve more above that after a > certain number of re-reservations (currently biased against larger > containers). Anytime it hits the limit of number reserved it stops looking > at any other nodes. This results in potentially missing nodes that have > enough space to fullfill the request. > The other place for improvement is currently reservations count against your > queue capacity. If you have reservations you could hit the various limits > which would then stop you from looking further at that node. > The above 2 cases can cause an application requesting a larger container to > take a long time to gets it resources. > We could improve upon both of those by simply continuing to look at incoming > nodes to see if we could potentially swap out a reservation for an actual > allocation. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-415) Capture memory utilization at the app-level for chargeback
[ https://issues.apache.org/jira/browse/YARN-415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne updated YARN-415: Attachment: YARN-415.201407281816.txt [~leftnoteasy] Thanks for all of your help. How were you thinking an end-to-end test would work in the UT environment? In order to set a baseline and test that the containers ran for some predetermined and expected amount of time, wouldn't I need to somehow control the clock? Do you have any ideas on how to implement that? In the meantime, I have made the additional changes you suggested. Please see below: {quote} bq. I was able to remove the rmApps variable, but I had to leave the check for app != null because if I try to take that out, several unit tests would fail with NullPointerException. Even with removing the rmApps variable, I needed to change TestRMContainerImpl.java to mock rmContext.getRMApps(). I would like to suggest to fix such UTs instead of inserting some kernel code to make UT pass. I'm not sure about the effort of doing this, if the effort is still reasonable, we should do it. {quote} After some spy and mock magic, I was able to fix the unit tests so that the checks for "if != null" were not necessary. {quote} {code} ApplicationCLI.java + appReportStr.print("\tResources used : "); {code} We need change it to Resource Utilization as well? {quote} Yes. I changed it to that. > Capture memory utilization at the app-level for chargeback > -- > > Key: YARN-415 > URL: https://issues.apache.org/jira/browse/YARN-415 > Project: Hadoop YARN > Issue Type: New Feature > Components: resourcemanager >Affects Versions: 0.23.6 >Reporter: Kendall Thrapp >Assignee: Andrey Klochkov > Attachments: YARN-415--n10.patch, YARN-415--n2.patch, > YARN-415--n3.patch, YARN-415--n4.patch, YARN-415--n5.patch, > YARN-415--n6.patch, YARN-415--n7.patch, YARN-415--n8.patch, > YARN-415--n9.patch, YARN-415.201405311749.txt, YARN-415.201406031616.txt, > YARN-415.201406262136.txt, YARN-415.201407042037.txt, > YARN-415.201407071542.txt, YARN-415.201407171553.txt, > YARN-415.201407172144.txt, YARN-415.201407232237.txt, > YARN-415.201407242148.txt, YARN-415.201407281816.txt, YARN-415.patch > > > For the purpose of chargeback, I'd like to be able to compute the cost of an > application in terms of cluster resource usage. To start out, I'd like to > get the memory utilization of an application. The unit should be MB-seconds > or something similar and, from a chargeback perspective, the memory amount > should be the memory reserved for the application, as even if the app didn't > use all that memory, no one else was able to use it. > (reserved ram for container 1 * lifetime of container 1) + (reserved ram for > container 2 * lifetime of container 2) + ... + (reserved ram for container n > * lifetime of container n) > It'd be nice to have this at the app level instead of the job level because: > 1. We'd still be able to get memory usage for jobs that crashed (and wouldn't > appear on the job history server). > 2. We'd be able to get memory usage for future non-MR jobs (e.g. Storm). > This new metric should be available both through the RM UI and RM Web > Services REST API. -- This message was sent by Atlassian JIRA (v6.2#6252)
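For reference, a small standalone sketch of the MB-seconds aggregation described in this JIRA (sum over containers of reserved memory times container lifetime). It is an illustration of the metric only, not code from the patch.
{code}
// Standalone illustration of the chargeback metric described above.
class MemorySecondsSketch {
  static long memoryMbSeconds(long[] reservedMb, long[] startMillis, long[] finishMillis) {
    long total = 0;
    for (int i = 0; i < reservedMb.length; i++) {
      long lifetimeSeconds = (finishMillis[i] - startMillis[i]) / 1000;
      // reserved MB * seconds the container held that reservation
      total += reservedMb[i] * lifetimeSeconds;
    }
    return total;
  }
}
{code}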
[jira] [Updated] (YARN-1769) CapacityScheduler: Improve reservations
[ https://issues.apache.org/jira/browse/YARN-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He updated YARN-1769: -- Attachment: (was: YARN-1769.patch) > CapacityScheduler: Improve reservations > > > Key: YARN-1769 > URL: https://issues.apache.org/jira/browse/YARN-1769 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacityscheduler >Affects Versions: 2.3.0 >Reporter: Thomas Graves >Assignee: Thomas Graves > Attachments: YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, > YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, > YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, > YARN-1769.patch, YARN-1769.patch, YARN-1769.patch > > > Currently the CapacityScheduler uses reservations in order to handle requests > for large containers and the fact there might not currently be enough space > available on a single host. > The current algorithm for reservations is to reserve as many containers as > currently required and then it will start to reserve more above that after a > certain number of re-reservations (currently biased against larger > containers). Anytime it hits the limit of number reserved it stops looking > at any other nodes. This results in potentially missing nodes that have > enough space to fullfill the request. > The other place for improvement is currently reservations count against your > queue capacity. If you have reservations you could hit the various limits > which would then stop you from looking further at that node. > The above 2 cases can cause an application requesting a larger container to > take a long time to gets it resources. > We could improve upon both of those by simply continuing to look at incoming > nodes to see if we could potentially swap out a reservation for an actual > allocation. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2209) Replace AM resync/shutdown command with corresponding exceptions
[ https://issues.apache.org/jira/browse/YARN-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14076498#comment-14076498 ] Zhijie Shen commented on YARN-2209: --- [~jianhe], thanks for the patch. Below are some meta comments on this issue. Why is it necessary to use the exception instead of the flag to indicate the RM restarting? In general, I'm afraid the changes here break backward compatibility between YARN and MR in both directions. On the one side, any YARN applications that used to have the logic to deal with RM restarting need to be updated after this patch. For example, MR of prior versions will no longer work properly with a YARN cluster after this patch during RM restarting. The MR job won’t recognize the not found exception and take the necessary restarting treatment, but will just record the error and move on. On the other side, if we assume it is possible the new version MR job after this patch is going to be run on an old YARN cluster, the MR job will then not recognize the old flag-style restarting signal, and thus will not execute the MR-side logic to deal with RM restarting. IMHO, at least, the switch block to check the AMCommand cannot be removed but should be deprecated for compatibility considerations. In case we want to proceed with this change, here are some comments on the patch: 1. The MR-side change is not trivial. Per our convention, shall we split the patch into two pieces: one for YARN and the other for MR, so that we can easily track the changes for the different projects? 2. Why not throwing ApplicationAttemptNotFoundException instead? It sounds more reasonable here, doesn’t it? 3. Deprecate the enum type instead of each enum value? {code} @Public @Unstable public enum AMCommand { {code} 4. The description doesn't sound accurate enough. It doesn’t just request containers. “App Master heartbeat”? {code} +public static final String AM_ALLOCATE = "App Master request containers"; {code} 5. No need to break it into two lines, right? {code} AllocateResponse allocateResponse; … +allocateResponse = scheduler.allocate(allocateRequest); {code} 6. Is this change necessary? {code} -return allocate(progressIndicator); +allocateResponse = allocate(progressIndicator); +return allocateResponse; {code} > Replace AM resync/shutdown command with corresponding exceptions > > > Key: YARN-2209 > URL: https://issues.apache.org/jira/browse/YARN-2209 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-2209.1.patch, YARN-2209.2.patch, YARN-2209.3.patch, > YARN-2209.4.patch, YARN-2209.5.patch > > > YARN-1365 introduced an ApplicationMasterNotRegisteredException to indicate > application to re-register on RM restart. we should do the same for > AMS#allocate call also. -- This message was sent by Atlassian JIRA (v6.2#6252)
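On review comment 3 above, a minimal generic illustration of the two deprecation options. The enum names here are made up and do not reproduce the real AMCommand declaration.
{code}
// Option one: deprecate the whole enum type...
@Deprecated
enum AmCommandExampleA {
  AM_RESYNC,
  AM_SHUTDOWN
}

// ...versus option two: deprecate each value individually while keeping the type itself.
enum AmCommandExampleB {
  @Deprecated AM_RESYNC,
  @Deprecated AM_SHUTDOWN
}
{code}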
[jira] [Updated] (YARN-1769) CapacityScheduler: Improve reservations
[ https://issues.apache.org/jira/browse/YARN-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He updated YARN-1769: -- Attachment: YARN-1769.patch reduce log output when LeafQueue need to unreserve resource frequently. if (needToUnreserve) { + if(LOG.isDebugEnabled()){ LOG.info("we needed to unreserve to be able to allocate"); + } return false; } > CapacityScheduler: Improve reservations > > > Key: YARN-1769 > URL: https://issues.apache.org/jira/browse/YARN-1769 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacityscheduler >Affects Versions: 2.3.0 >Reporter: Thomas Graves >Assignee: Thomas Graves > Attachments: YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, > YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, > YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, > YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch > > > Currently the CapacityScheduler uses reservations in order to handle requests > for large containers and the fact there might not currently be enough space > available on a single host. > The current algorithm for reservations is to reserve as many containers as > currently required and then it will start to reserve more above that after a > certain number of re-reservations (currently biased against larger > containers). Anytime it hits the limit of number reserved it stops looking > at any other nodes. This results in potentially missing nodes that have > enough space to fullfill the request. > The other place for improvement is currently reservations count against your > queue capacity. If you have reservations you could hit the various limits > which would then stop you from looking further at that node. > The above 2 cases can cause an application requesting a larger container to > take a long time to gets it resources. > We could improve upon both of those by simply continuing to look at incoming > nodes to see if we could potentially swap out a reservation for an actual > allocation. -- This message was sent by Atlassian JIRA (v6.2#6252)
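The snippet above is about cutting log volume on a hot path; for reference, the usual guarded-logging idiom with commons-logging looks like the sketch below. This is illustrative only and is not the patch itself.
{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

class GuardedLoggingSketch {
  private static final Log LOG = LogFactory.getLog(GuardedLoggingSketch.class);

  void onNeedToUnreserve() {
    // Guard the call so the message (and any string building) is skipped
    // entirely unless debug logging is enabled.
    if (LOG.isDebugEnabled()) {
      LOG.debug("we needed to unreserve to be able to allocate");
    }
  }
}
{code}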
[jira] [Commented] (YARN-2354) DistributedShell may allocate more containers than client specified after it restarts
[ https://issues.apache.org/jira/browse/YARN-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14076439#comment-14076439 ] Li Lu commented on YARN-2354: - Same error message as YARN-2295, and I could not reproduce it locally. It seems this is connected with the network settings of the server, causing the following lines to fail: {code} if (appReport.getHost().startsWith(hostName) && appReport.getRpcPort() == -1) { verified = true; } {code} If this check fails, verified will never be set to true, and hence the test will fail. This failure appears to be unrelated to the problem fixed by this patch. > DistributedShell may allocate more containers than client specified after it > restarts > - > > Key: YARN-2354 > URL: https://issues.apache.org/jira/browse/YARN-2354 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jian He >Assignee: Li Lu > Attachments: YARN-2354-072514.patch > > > To reproduce, run distributed shell with -num_containers option, > In ApplicationMaster.java, the following code has some issue. > {code} > int numTotalContainersToRequest = > numTotalContainers - previousAMRunningContainers.size(); > for (int i = 0; i < numTotalContainersToRequest; ++i) { > ContainerRequest containerAsk = setupContainerAskForRM(); > amRMClient.addContainerRequest(containerAsk); > } > numRequestedContainers.set(numTotalContainersToRequest); > {code} > numRequestedContainers doesn't account for previous AM's requested > containers. so numRequestedContainers should be set to numTotalContainers -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1994) Expose YARN/MR endpoints on multiple interfaces
[ https://issues.apache.org/jira/browse/YARN-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14076405#comment-14076405 ] Hadoop QA commented on YARN-1994: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12658153/YARN-1994.11.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4457//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4457//console This message is automatically generated. > Expose YARN/MR endpoints on multiple interfaces > --- > > Key: YARN-1994 > URL: https://issues.apache.org/jira/browse/YARN-1994 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager, resourcemanager, webapp >Affects Versions: 2.4.0 >Reporter: Arpit Agarwal >Assignee: Craig Welch > Attachments: YARN-1994.0.patch, YARN-1994.1.patch, > YARN-1994.11.patch, YARN-1994.11.patch, YARN-1994.2.patch, YARN-1994.3.patch, > YARN-1994.4.patch, YARN-1994.5.patch, YARN-1994.6.patch, YARN-1994.7.patch > > > YARN and MapReduce daemons currently do not support specifying a wildcard > address for the server endpoints. This prevents the endpoints from being > accessible from all interfaces on a multihomed machine. > Note that if we do specify INADDR_ANY for any of the options, it will break > clients as they will attempt to connect to 0.0.0.0. We need a solution that > allows specifying a hostname or IP-address for clients while requesting > wildcard bind for the servers. > (List of endpoints is in a comment below) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2357) Port Windows Secure Container Executor YARN-1063, YARN-1972, YARN-2198 changes to branch-2
[ https://issues.apache.org/jira/browse/YARN-2357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Foley updated YARN-2357: - Target Version/s: 2.6.0 > Port Windows Secure Container Executor YARN-1063, YARN-1972, YARN-2198 > changes to branch-2 > -- > > Key: YARN-2357 > URL: https://issues.apache.org/jira/browse/YARN-2357 > Project: Hadoop YARN > Issue Type: Task > Components: nodemanager >Affects Versions: 2.4.0 >Reporter: Remus Rusanu >Assignee: Remus Rusanu >Priority: Critical > Labels: security, windows > Attachments: YARN-2357.1.patch > > > As title says. Once YARN-1063, YARN-1972 and YARN-2198 are committed to > trunk, they need to be backported to branch-2 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2209) Replace AM resync/shutdown command with corresponding exceptions
[ https://issues.apache.org/jira/browse/YARN-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14076369#comment-14076369 ] Jian He commented on YARN-2209: --- Hi [~djp], thanks for the comment. I think users are expected to handle two types of exceptions: YarnException and IOException. In that sense, this is equivalent to throwing a new type of exception, which should be fine, right? > Replace AM resync/shutdown command with corresponding exceptions > > > Key: YARN-2209 > URL: https://issues.apache.org/jira/browse/YARN-2209 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-2209.1.patch, YARN-2209.2.patch, YARN-2209.3.patch, > YARN-2209.4.patch, YARN-2209.5.patch > > > YARN-1365 introduced an ApplicationMasterNotRegisteredException to indicate > application to re-register on RM restart. we should do the same for > AMS#allocate call also. -- This message was sent by Atlassian JIRA (v6.2#6252)
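A small hedged sketch of what handling the restart signal as an exception could look like on the application side, assuming the ApplicationMasterNotRegisteredException introduced by YARN-1365. This is illustrative only; the real AM and AMRMClient logic is more involved, and the allocate/reregisterWithRM helpers here are hypothetical placeholders.
{code}
// Illustrative AM-side handling; not taken from the patch.
import java.io.IOException;
import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.exceptions.ApplicationMasterNotRegisteredException;
import org.apache.hadoop.yarn.exceptions.YarnException;

class AllocateLoopSketch {
  AllocateResponse heartbeat(float progress) throws YarnException, IOException {
    try {
      return allocate(progress);
    } catch (ApplicationMasterNotRegisteredException e) {
      // RM restarted: re-register and retry the heartbeat.
      reregisterWithRM();
      return allocate(progress);
    }
  }

  // Hypothetical helpers standing in for the real AMRMClient calls.
  AllocateResponse allocate(float progress) throws YarnException, IOException { return null; }
  void reregisterWithRM() throws YarnException, IOException { }
}
{code}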
[jira] [Commented] (YARN-1994) Expose YARN/MR endpoints on multiple interfaces
[ https://issues.apache.org/jira/browse/YARN-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14076319#comment-14076319 ] Craig Welch commented on YARN-1994: --- TestAMRestart passes on my box, reattached patch to try again on jenkins > Expose YARN/MR endpoints on multiple interfaces > --- > > Key: YARN-1994 > URL: https://issues.apache.org/jira/browse/YARN-1994 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager, resourcemanager, webapp >Affects Versions: 2.4.0 >Reporter: Arpit Agarwal >Assignee: Craig Welch > Attachments: YARN-1994.0.patch, YARN-1994.1.patch, > YARN-1994.11.patch, YARN-1994.11.patch, YARN-1994.2.patch, YARN-1994.3.patch, > YARN-1994.4.patch, YARN-1994.5.patch, YARN-1994.6.patch, YARN-1994.7.patch > > > YARN and MapReduce daemons currently do not support specifying a wildcard > address for the server endpoints. This prevents the endpoints from being > accessible from all interfaces on a multihomed machine. > Note that if we do specify INADDR_ANY for any of the options, it will break > clients as they will attempt to connect to 0.0.0.0. We need a solution that > allows specifying a hostname or IP-address for clients while requesting > wildcard bind for the servers. > (List of endpoints is in a comment below) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1994) Expose YARN/MR endpoints on multiple interfaces
[ https://issues.apache.org/jira/browse/YARN-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-1994: -- Attachment: YARN-1994.11.patch > Expose YARN/MR endpoints on multiple interfaces > --- > > Key: YARN-1994 > URL: https://issues.apache.org/jira/browse/YARN-1994 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager, resourcemanager, webapp >Affects Versions: 2.4.0 >Reporter: Arpit Agarwal >Assignee: Craig Welch > Attachments: YARN-1994.0.patch, YARN-1994.1.patch, > YARN-1994.11.patch, YARN-1994.11.patch, YARN-1994.2.patch, YARN-1994.3.patch, > YARN-1994.4.patch, YARN-1994.5.patch, YARN-1994.6.patch, YARN-1994.7.patch > > > YARN and MapReduce daemons currently do not support specifying a wildcard > address for the server endpoints. This prevents the endpoints from being > accessible from all interfaces on a multihomed machine. > Note that if we do specify INADDR_ANY for any of the options, it will break > clients as they will attempt to connect to 0.0.0.0. We need a solution that > allows specifying a hostname or IP-address for clients while requesting > wildcard bind for the servers. > (List of endpoints is in a comment below) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1354) Recover applications upon nodemanager restart
[ https://issues.apache.org/jira/browse/YARN-1354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14076296#comment-14076296 ] Junping Du commented on YARN-1354: -- Thanks [~jlowe] for updating the patch! A few quick comments so far: {code} +try { + this.context.getNMStateStore().finishApplication(appID); +} catch (IOException e) { + LOG.error("Unable to update application state in store", e); +} {code} Looks like we only log when persistent effort get failed as we did for other components before. In this case, what would happen if storeApplication(), finishApplication(), removeApplication() failed with application related information get inconsistent after restart? In ContainerManagerImpl.java {code} + private void recoverApplication(ContainerManagerApplicationProto p) + throws IOException { +ApplicationId appId = new ApplicationIdPBImpl(p.getId()); +Credentials creds = new Credentials(); +creds.readTokenStorageStream( +new DataInputStream(p.getCredentials().newInput())); ... {code} Do we need special warning if get failed on deserializing credential here? i.e. adding something like version mismatch, etc. It could happen when any changes happen in future on credentials object which is a writable object. More comments will come later. > Recover applications upon nodemanager restart > - > > Key: YARN-1354 > URL: https://issues.apache.org/jira/browse/YARN-1354 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.3.0 >Reporter: Jason Lowe >Assignee: Jason Lowe > Attachments: YARN-1354-v1.patch, > YARN-1354-v2-and-YARN-1987-and-YARN-1362.patch, YARN-1354-v3.patch, > YARN-1354-v4.patch, YARN-1354-v5.patch > > > The set of active applications in the nodemanager context need to be > recovered for work-preserving nodemanager restart -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2363) Submitted applications occasionally lack a tracking URL
[ https://issues.apache.org/jira/browse/YARN-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14076292#comment-14076292 ] Jason Lowe commented on YARN-2363: -- Most application submits result in a proxy tracking URL, but occasionally the client sees a transient "N/A" URL. Here's a snippet of Pig client output where a MapReduce job was submitted with no tracking URL received: {noformat} 2014-07-23 19:19:16,658 [JobControl] INFO org.apache.hadoop.mapred.ResourceMgrDelegate - Submitted application application_1403199204249_357708 to ResourceManager at xx/xx:xx 2014-07-23 19:19:16,660 [JobControl] INFO org.apache.hadoop.mapreduce.Job - The url to track the job: N/A {noformat} I believe this can occur if the client tries to get an application report just as the app is submitted. YarnClientImpl.submitApplication won't return until the app is past the NEW_SAVING state, but if the client slips in while the app is in the SUBMITTED state then I think we could end up with no tracking URL due to the lack of a current attempt. From RMAppImpl.createAndGetApplicationReport: {code} String trackingUrl = UNAVAILABLE; String host = UNAVAILABLE; String origTrackingUrl = UNAVAILABLE; [...] if (allowAccess) { if (this.currentAttempt != null) { currentApplicationAttemptId = this.currentAttempt.getAppAttemptId(); trackingUrl = this.currentAttempt.getTrackingUrl(); origTrackingUrl = this.currentAttempt.getOriginalTrackingUrl(); {code} So if we don't have a current attempt we'll return "N/A" as the tracking URL. Arguably we should return the proxied URL which will redirect to the RM app page if there is no tracking URL set yet so at least the client/user has a URL that can be used to track the application. > Submitted applications occasionally lack a tracking URL > --- > > Key: YARN-2363 > URL: https://issues.apache.org/jira/browse/YARN-2363 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.4.0 >Reporter: Jason Lowe > > Sometimes when an application is submitted the client receives no tracking > URL. More details in the first comment. -- This message was sent by Atlassian JIRA (v6.2#6252)
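A hedged sketch of the fallback argued for above: when no current attempt exists yet, hand back a proxy URL built from the proxy address and the application ID rather than "N/A", so the client always has a URL that redirects to the RM app page. The names here are illustrative and do not reproduce the actual RMAppImpl/ProxyUriUtils code.
{code}
// Illustrative fallback, not the actual patch.
class TrackingUrlSketch {
  static String buildTrackingUrl(String proxyHostAndPort, String appId,
      String currentAttemptUrl) {
    if (currentAttemptUrl != null && !currentAttemptUrl.equals("N/A")) {
      return currentAttemptUrl;
    }
    // No current attempt yet: return the proxy URL, which redirects to the RM
    // app page until the attempt registers a real tracking URL.
    return "http://" + proxyHostAndPort + "/proxy/" + appId + "/";
  }
}
{code}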
[jira] [Created] (YARN-2363) Submitted applications occasionally lack a tracking URL
Jason Lowe created YARN-2363: Summary: Submitted applications occasionally lack a tracking URL Key: YARN-2363 URL: https://issues.apache.org/jira/browse/YARN-2363 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Jason Lowe Sometimes when an application is submitted the client receives no tracking URL. More details in the first comment. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Issue Comment Deleted] (YARN-321) Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jake Farrell updated YARN-321: -- Comment: was deleted (was: Compared to wrists well is less available status amphetamines but higher investigations of withdrawal. adderall 20 mg http://www.surveyanalytics.com//userimages/sub-2/2007589/3153260/29851520/7787428-29851520-stopadd3.html Areas also document any reasons they have surprisingly been using in the information.) > Generic application history service > --- > > Key: YARN-321 > URL: https://issues.apache.org/jira/browse/YARN-321 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Luke Lu > Attachments: AHS Diagram.pdf, ApplicationHistoryServiceHighLevel.pdf, > Generic Application History - Design-20131219.pdf, HistoryStorageDemo.java > > > The mapreduce job history server currently needs to be deployed as a trusted > server in sync with the mapreduce runtime. Every new application would need a > similar application history server. Having to deploy O(T*V) (where T is > number of type of application, V is number of version of application) trusted > servers is clearly not scalable. > Job history storage handling itself is pretty generic: move the logs and > history data into a particular directory for later serving. Job history data > is already stored as json (or binary avro). I propose that we create only one > trusted application history server, which can have a generic UI (display json > as a tree of strings) as well. Specific application/version can deploy > untrusted webapps (a la AMs) to query the application history server and > interpret the json for its specific UI and/or analytics. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2247) Allow RM web services users to authenticate using delegation tokens
[ https://issues.apache.org/jira/browse/YARN-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14076226#comment-14076226 ] Hudson commented on YARN-2247: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1818 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1818/]) YARN-2247. Made RM web services authenticate users via kerberos and delegation token. Contributed by Varun Vasudev. (zjshen: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1613821) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/security/http * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/security/http/RMAuthenticationFilter.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/security/http/RMAuthenticationFilterInitializer.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/RMAuthenticationHandler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesDelegationTokenAuthentication.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebappAuthentication.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/ResourceManagerRest.apt.vm > Allow RM web services users to authenticate using delegation tokens > --- > > Key: YARN-2247 > URL: https://issues.apache.org/jira/browse/YARN-2247 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Varun Vasudev >Assignee: Varun Vasudev >Priority: Blocker > Fix For: 2.5.0 > > Attachments: YARN-2247.6.patch, apache-yarn-2247.0.patch, > apache-yarn-2247.1.patch, apache-yarn-2247.2.patch, apache-yarn-2247.3.patch, > apache-yarn-2247.4.patch, apache-yarn-2247.5.patch > > > The RM webapp should allow users to authenticate using delegation tokens to > maintain parity with RPC. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1994) Expose YARN/MR endpoints on multiple interfaces
[ https://issues.apache.org/jira/browse/YARN-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14076215#comment-14076215 ] Hadoop QA commented on YARN-1994: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12658123/YARN-1994.11.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4456//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4456//console This message is automatically generated. > Expose YARN/MR endpoints on multiple interfaces > --- > > Key: YARN-1994 > URL: https://issues.apache.org/jira/browse/YARN-1994 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager, resourcemanager, webapp >Affects Versions: 2.4.0 >Reporter: Arpit Agarwal >Assignee: Craig Welch > Attachments: YARN-1994.0.patch, YARN-1994.1.patch, > YARN-1994.11.patch, YARN-1994.2.patch, YARN-1994.3.patch, YARN-1994.4.patch, > YARN-1994.5.patch, YARN-1994.6.patch, YARN-1994.7.patch > > > YARN and MapReduce daemons currently do not support specifying a wildcard > address for the server endpoints. This prevents the endpoints from being > accessible from all interfaces on a multihomed machine. > Note that if we do specify INADDR_ANY for any of the options, it will break > clients as they will attempt to connect to 0.0.0.0. We need a solution that > allows specifying a hostname or IP-address for clients while requesting > wildcard bind for the servers. > (List of endpoints is in a comment below) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2247) Allow RM web services users to authenticate using delegation tokens
[ https://issues.apache.org/jira/browse/YARN-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14076206#comment-14076206 ] Hudson commented on YARN-2247: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1845 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1845/]) YARN-2247. Made RM web services authenticate users via kerberos and delegation token. Contributed by Varun Vasudev. (zjshen: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1613821) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/security/http * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/security/http/RMAuthenticationFilter.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/security/http/RMAuthenticationFilterInitializer.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/RMAuthenticationHandler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesDelegationTokenAuthentication.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebappAuthentication.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/ResourceManagerRest.apt.vm > Allow RM web services users to authenticate using delegation tokens > --- > > Key: YARN-2247 > URL: https://issues.apache.org/jira/browse/YARN-2247 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Varun Vasudev >Assignee: Varun Vasudev >Priority: Blocker > Fix For: 2.5.0 > > Attachments: YARN-2247.6.patch, apache-yarn-2247.0.patch, > apache-yarn-2247.1.patch, apache-yarn-2247.2.patch, apache-yarn-2247.3.patch, > apache-yarn-2247.4.patch, apache-yarn-2247.5.patch > > > The RM webapp should allow users to authenticate using delegation tokens to > maintain parity with RPC. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1994) Expose YARN/MR endpoints on multiple interfaces
[ https://issues.apache.org/jira/browse/YARN-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-1994: -- Attachment: YARN-1994.11.patch Fixed a bug: YarnConfiguration.getSocketAddr checks, in HA cases, which RM it is running on, and this check was no longer active in earlier versions of the patch. Simplified the logic, removed many unnecessary changes from earlier patch versions, and added some tests. With this patch the behavior is unchanged in the absence of any bind-host; when a bind-host is present, and only for the listening process, the port is retrieved from the configured address and combined with the bind-host for binding. All other address/configuration paths should be unchanged by the patch. > Expose YARN/MR endpoints on multiple interfaces > --- > > Key: YARN-1994 > URL: https://issues.apache.org/jira/browse/YARN-1994 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager, resourcemanager, webapp >Affects Versions: 2.4.0 >Reporter: Arpit Agarwal >Assignee: Craig Welch > Attachments: YARN-1994.0.patch, YARN-1994.1.patch, > YARN-1994.11.patch, YARN-1994.2.patch, YARN-1994.3.patch, YARN-1994.4.patch, > YARN-1994.5.patch, YARN-1994.6.patch, YARN-1994.7.patch > > > YARN and MapReduce daemons currently do not support specifying a wildcard > address for the server endpoints. This prevents the endpoints from being > accessible from all interfaces on a multihomed machine. > Note that if we do specify INADDR_ANY for any of the options, it will break > clients as they will attempt to connect to 0.0.0.0. We need a solution that > allows specifying a hostname or IP-address for clients while requesting > wildcard bind for the servers. > (List of endpoints is in a comment below) -- This message was sent by Atlassian JIRA (v6.2#6252)
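A minimal sketch of the bind-host behavior described in the comment above, with assumed config-key parameters; the actual patch works through YarnConfiguration.getSocketAddr and HA handling, so treat this only as an illustration of the intent.
{code}
import java.net.InetSocketAddress;
import org.apache.hadoop.conf.Configuration;

// Illustration only (not the patch): clients keep resolving the configured
// address, while the listening process binds to bind-host plus the port
// taken from that configured address.
public final class BindHostSketch {
  static InetSocketAddress getBindAddress(Configuration conf, String addressKey,
      String bindHostKey, String defaultAddress, int defaultPort) {
    InetSocketAddress addr = conf.getSocketAddr(addressKey, defaultAddress, defaultPort);
    String bindHost = conf.getTrimmed(bindHostKey);
    if (bindHost == null || bindHost.isEmpty()) {
      return addr;                                   // no bind-host: behave as before
    }
    return new InetSocketAddress(bindHost, addr.getPort());  // listen on bind-host
  }
}
{code}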
[jira] [Commented] (YARN-2247) Allow RM web services users to authenticate using delegation tokens
[ https://issues.apache.org/jira/browse/YARN-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14076142#comment-14076142 ] Hudson commented on YARN-2247: -- FAILURE: Integrated in Hadoop-Yarn-trunk #626 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/626/]) YARN-2247. Made RM web services authenticate users via kerberos and delegation token. Contributed by Varun Vasudev. (zjshen: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1613821) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/security/http * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/security/http/RMAuthenticationFilter.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/security/http/RMAuthenticationFilterInitializer.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/RMAuthenticationHandler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesDelegationTokenAuthentication.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebappAuthentication.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/ResourceManagerRest.apt.vm > Allow RM web services users to authenticate using delegation tokens > --- > > Key: YARN-2247 > URL: https://issues.apache.org/jira/browse/YARN-2247 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Varun Vasudev >Assignee: Varun Vasudev >Priority: Blocker > Fix For: 2.5.0 > > Attachments: YARN-2247.6.patch, apache-yarn-2247.0.patch, > apache-yarn-2247.1.patch, apache-yarn-2247.2.patch, apache-yarn-2247.3.patch, > apache-yarn-2247.4.patch, apache-yarn-2247.5.patch > > > The RM webapp should allow users to authenticate using delegation tokens to > maintain parity with RPC. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2008) CapacityScheduler may report incorrect queueMaxCap if there is hierarchy queue structure
[ https://issues.apache.org/jira/browse/YARN-2008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14076062#comment-14076062 ] Wangda Tan commented on YARN-2008: -- Hi Craig, As we discussed in YARN-1198, I think we should consider the resources used by a queue's siblings when computing headroom. I took a look at your patch again; some comments: We first need to think about how to calculate headroom in general. I think headroom is (concluded from the sub-JIRAs of YARN-1198): {code} queue_available = min(clusterResource - used_by_sibling_of_parents - used_by_this_queue, queue_max_resource) headroom = min(queue_available - available_resource_in_blacklisted_nodes, user_limit) {code} So I think this JIRA is focused on computing {{used_by_sibling_of_parents}}, is it? I think the general approach looks good to me, except in CSQueueUtils.java (will include a review of the tests in the next iteration): 1) {code} //sibling used is parent used - my used... float siblingUsedCapacity = Resources.ratio( resourceCalculator, Resources.subtract(parent.getUsedResources(), queue.getUsedResources()), parentResource); {code} It seems to me this computation is not robust enough when the parent resource is empty, whether it is a zero-capacity queue or its sibling has used 100% of the cluster. It's better to add an edge test case to guard against such zero-division as well. 2) It's better to explicitly cap {{return absoluteMaxAvail}} to the range \[0~1\] to prevent float computation errors. Thanks, Wangda > CapacityScheduler may report incorrect queueMaxCap if there is hierarchy > queue structure > - > > Key: YARN-2008 > URL: https://issues.apache.org/jira/browse/YARN-2008 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 2.3.0 >Reporter: Chen He >Assignee: Craig Welch > Attachments: YARN-2008.1.patch, YARN-2008.2.patch > > > Suppose there are two queues, both allowed to use 100% of the actual resources in > the cluster. Q1 and Q2 each currently use 50% of the actual cluster's resources, so > there is no actual space available. If we use the current method to get > headroom, CapacityScheduler thinks there are still resources available for > users in Q1, but they have already been used by Q2. > If the CapacityScheduler has a hierarchical queue structure, it may report > an incorrect queueMaxCap. Here is an example: rootQueue has two children, L1ParentQueue1 (allowed to use up to 80% of its parent) and L1ParentQueue2 (allowed to use 20% of its parent in minimum); L1ParentQueue1 in turn has two children, L2LeafQueue1 (50% of its parent) and L2LeafQueue2 (50% of its parent in minimum). > When we calculate the headroom of a user in L2LeafQueue2, the current method will > think L2LeafQueue2 can use 40% (80%*50%) of the actual rootQueue resources. > However, without checking L1ParentQueue1, we are not sure. It is possible > that L1ParentQueue2 has used 40% of the rootQueue resources right now. Actually, > L2LeafQueue2 can only use 30% (60%*50%). -- This message was sent by Atlassian JIRA (v6.2#6252)
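A hedged sketch of the two guards suggested in 1) and 2) above, with assumed method and parameter names (not the patch itself): guard the sibling-used computation against an empty parent and clamp the resulting ratio into the \[0~1\] range.
{code}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.ResourceCalculator;
import org.apache.hadoop.yarn.util.resource.Resources;

// Illustration only: avoid the zero-division when the parent has no resources
// and absorb float error by capping the final ratio to [0, 1].
class SiblingUsedSketch {
  static float siblingUsedCapacity(ResourceCalculator rc, Resource clusterResource,
      Resource parentUsed, Resource queueUsed, Resource parentResource) {
    if (Resources.lessThanOrEqual(rc, clusterResource, parentResource, Resources.none())) {
      return 0f;  // empty parent queue: siblings cannot have used anything
    }
    float ratio = Resources.ratio(rc,
        Resources.subtract(parentUsed, queueUsed), parentResource);
    return Math.min(1f, Math.max(0f, ratio));
  }
}
{code}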
[jira] [Commented] (YARN-1291) RM INFO logs limit scheduling speed
[ https://issues.apache.org/jira/browse/YARN-1291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14076059#comment-14076059 ] Varun Saxena commented on YARN-1291: Hi [~sandyr], I had raised YARN-2287, which is also about too many RM audit logs being printed in the critical flow. For this, in the patch, I added support for printing audit logs at different log levels and changed the container logs in the RM and NM to DEBUG. I didn't remove the audit logs, as I wasn't sure whether these audit logs are really required or not. > RM INFO logs limit scheduling speed > --- > > Key: YARN-1291 > URL: https://issues.apache.org/jira/browse/YARN-1291 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.2.0 >Reporter: Sandy Ryza >Assignee: Sandy Ryza > > I've been running some microbenchmarks to see how fast the Fair Scheduler can > fill up a cluster and found its performance is significantly hampered by > logging. > I tested with 500 (mock) nodes, and found that: > * Taking out fair scheduler INFO logs on the critical path brought down the > latency from 14000 ms to 6000 ms > * Taking out the INFO that RMContainerImpl logs when a container transitions > brought it down from 6000 ms to 4000 ms > * Taking out RMAuditLogger logs brought it down from 4000 ms to 1700 ms -- This message was sent by Atlassian JIRA (v6.2#6252)
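A hedged sketch of the log-level idea mentioned above (the variables user, operation, appId and containerId are assumed; the actual YARN-2287 patch may structure this differently): guard the chatty per-container audit line so it can be suppressed through log4j configuration.
{code}
// Illustration only, not the attached patch: emit the container-level audit
// entry at DEBUG so deployments can silence it without losing app-level audits.
if (LOG.isDebugEnabled()) {
  LOG.debug("USER=" + user + "\tOPERATION=" + operation
      + "\tTARGET=SchedulerApp\tRESULT=SUCCESS"
      + "\tAPPID=" + appId + "\tCONTAINERID=" + containerId);
}
{code}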
[jira] [Updated] (YARN-2287) Add audit log levels for NM and RM
[ https://issues.apache.org/jira/browse/YARN-2287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-2287: --- Attachment: YARN-2287-patch-1.patch > Add audit log levels for NM and RM > -- > > Key: YARN-2287 > URL: https://issues.apache.org/jira/browse/YARN-2287 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager, resourcemanager >Affects Versions: 2.4.1 >Reporter: Varun Saxena > Attachments: YARN-2287-patch-1.patch, YARN-2287.patch > > > NM and RM audit logging can be done based on log level as some of the audit > logs, especially the container audit logs appear too many times. By > introducing log level, certain audit logs can be suppressed, if not required > in deployment. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2209) Replace AM resync/shutdown command with corresponding exceptions
[ https://issues.apache.org/jira/browse/YARN-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14076048#comment-14076048 ] Junping Du commented on YARN-2209: -- Thanks [~jianhe] for the patch and [~rohithsharma] for the review! I think this is a reasonable change and the patch itself looks good to me. However, I am concerned that it could break existing YARN applications built against the old version of ApplicationMasterProtocol, which expects a RESYNC command rather than an exception in the response. Broader discussion with the community is needed, I think. > Replace AM resync/shutdown command with corresponding exceptions > > > Key: YARN-2209 > URL: https://issues.apache.org/jira/browse/YARN-2209 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-2209.1.patch, YARN-2209.2.patch, YARN-2209.3.patch, > YARN-2209.4.patch, YARN-2209.5.patch > > > YARN-1365 introduced an ApplicationMasterNotRegisteredException to indicate > application to re-register on RM restart. we should do the same for > AMS#allocate call also. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1707) Making the CapacityScheduler more dynamic
[ https://issues.apache.org/jira/browse/YARN-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14076042#comment-14076042 ] Wangda Tan commented on YARN-1707: -- Thanks for uploading the patch [~curino], [~subru]. They're great additions to the current CapacityScheduler. I took a look at your patch. *First I have a couple of questions about its background, especially {{PlanQueue}}/{{ReservationQueue}} in this patch. I think understanding the background is important for me to get the whole picture of this patch. What I can understand is:* # {{PlanQueue}} can have a normal {{ParentQueue}} as its parent, but all children of {{PlanQueue}} can only be {{ReservationQueue}}. Is it possible that multiple {{PlanQueue}}s exist in the cluster? # {{PlanQueue}} is initially set up in configuration; like {{ParentQueue}} it has absolute capacity, etc., but unlike {{ParentQueue}} it also has user-limit/user-limit-factor, etc. # {{ReservationQueue}} is dynamically initialized by PlanFollower: when a new reservationId is acquired, it will create a new {{ReservationQueue}} accordingly # {{PlanFollower}} can dynamically adjust the queue size of {{ReservationQueue}}s so that resource reservations can be satisfied. # Is it possible that the sum of reserved resources exceeds the limit of {{PlanQueue}}/{{ReservationQueue}} and preemption is triggered? # How do we deal with RM restart? It is possible that the RM restarts during resource reservation; we may need to consider how to persist such queues Hope you could share your ideas about them. *For the requirements of this ticket (copied from JIRA),* # create queues dynamically # destroy queues dynamically # dynamically change queue parameters (e.g., capacity) # modify refreshqueue validation to enforce sum(child.getCapacity())<= 100% instead of ==100% # move app across queues I found #1-#3 are used only by {{PlanQueue}}, {{Reservation}}. IMHO, it would be better to add them to CapacityScheduler without coupling them to the ReservationSystem, but I cannot think of other solid scenarios that can leverage them. I hope to get feedback from the community before we couple them with the ReservationSystem. And as mentioned by [~acmurthy], can we merge the dynamic add-queue with the existing add-new-queue mechanism? #4 should only be valid in {{PlanQueue}}, because if we change this behavior in {{ParentQueue}}, it is possible that a careless admin will mis-set the capacities of queues under a parent queue; if the sum of their capacities doesn't equal 100%, some resources may not be usable by applications. *Some other comments (mainly about moving apps, because we may need to settle the scope of creating/destroying queues first):* 1) I think we need to consider how moving apps across queues works with YARN-1368. We can change the queue of containers from queueA to queueB, but with YARN-1368, during RM restart a container will report that it is in queueA (we don't sync this to the NM during the moveApp operation). I hope [~jianhe] could share some thoughts about this as well. 2) Moving an application in CapacityScheduler needs to call finishApplication in the source queue and submitApplication in the target queue to keep QueueMetrics correct. And submitApplication will check the ACL of the target queue as well. 3) Should we respect MaxApplicationsPerUser in the target queue when trying to move an app? IMHO, we can stop the move if MaxApplicationsPerUser is reached in the target queue.
Thanks, Wangda > Making the CapacityScheduler more dynamic > - > > Key: YARN-1707 > URL: https://issues.apache.org/jira/browse/YARN-1707 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Carlo Curino >Assignee: Carlo Curino > Labels: capacity-scheduler > Attachments: YARN-1707.patch > > > The CapacityScheduler is rather static at the moment, and refreshqueue > provides a rather heavy-handed way to reconfigure it. Moving towards > long-running services (tracked in YARN-896) and to enable more advanced > admission control and resource parcelling we need to make the > CapacityScheduler more dynamic. This is instrumental to the umbrella jira > YARN-1051. > Concretely this requires the following changes: > * create queues dynamically > * destroy queues dynamically > * dynamically change queue parameters (e.g., capacity) > * modify refreshqueue validation to enforce sum(child.getCapacity())<= 100% > instead of ==100% > We limit this to LeafQueues. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1631) Container allocation issue in Leafqueue assignContainers()
[ https://issues.apache.org/jira/browse/YARN-1631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14076006#comment-14076006 ] Hadoop QA commented on YARN-1631: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12625843/Yarn-1631.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4455//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4455//console This message is automatically generated. > Container allocation issue in Leafqueue assignContainers() > -- > > Key: YARN-1631 > URL: https://issues.apache.org/jira/browse/YARN-1631 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Affects Versions: 2.2.0 > Environment: SuSe 11 Linux >Reporter: Sunil G >Assignee: Sunil G > Attachments: Yarn-1631.1.patch, Yarn-1631.2.patch > > > Application1 has a demand of 8GB[Map Task Size as 8GB] which is more than > Node_1 can handle. > Node_1 has a size of 8GB and 2GB is used by Application1's AM. > Hence reservation happened for remaining 6GB in Node_1 by Application1. > A new job is submitted with 2GB AM size and 2GB task size with only 2 Maps to > run. > Node_2 also has 8GB capability. > But Application2's AM cannot be launched in Node_2. And Application2 waits > longer as only 2 Nodes are available in cluster. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2209) Replace AM resync/shutdown command with corresponding exceptions
[ https://issues.apache.org/jira/browse/YARN-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14075984#comment-14075984 ] Hadoop QA commented on YARN-2209: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12658088/YARN-2209.5.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 7 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4454//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4454//console This message is automatically generated. > Replace AM resync/shutdown command with corresponding exceptions > > > Key: YARN-2209 > URL: https://issues.apache.org/jira/browse/YARN-2209 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-2209.1.patch, YARN-2209.2.patch, YARN-2209.3.patch, > YARN-2209.4.patch, YARN-2209.5.patch > > > YARN-1365 introduced an ApplicationMasterNotRegisteredException to indicate > application to re-register on RM restart. we should do the same for > AMS#allocate call also. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2209) Replace AM resync/shutdown command with corresponding exceptions
[ https://issues.apache.org/jira/browse/YARN-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14075973#comment-14075973 ] Rohith commented on YARN-2209: -- +1 patch looks good to me > Replace AM resync/shutdown command with corresponding exceptions > > > Key: YARN-2209 > URL: https://issues.apache.org/jira/browse/YARN-2209 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jian He >Assignee: Jian He > Attachments: YARN-2209.1.patch, YARN-2209.2.patch, YARN-2209.3.patch, > YARN-2209.4.patch, YARN-2209.5.patch > > > YARN-1365 introduced an ApplicationMasterNotRegisteredException to indicate > application to re-register on RM restart. we should do the same for > AMS#allocate call also. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2362) Capacity Scheduler: apps with requests that exceed current capacity can starve pending apps
[ https://issues.apache.org/jira/browse/YARN-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14075970#comment-14075970 ] Wangda Tan commented on YARN-2362: -- I think we should fix this: {code} if (!assignToQueue(clusterResource, required)) { -return NULL_ASSIGNMENT; +break; } {code} The {{return NULL_ASSIGNMENT}} statement means that if an app submitted earlier cannot allocate resources in the queue, then the rest of the apps in the queue cannot allocate resources either. The {{break}} looks better to me. And I agree this should be a duplicate of YARN-1631. > Capacity Scheduler: apps with requests that exceed current capacity can > starve pending apps > --- > > Key: YARN-2362 > URL: https://issues.apache.org/jira/browse/YARN-2362 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 2.4.1 >Reporter: Ram Venkatesh > > Cluster configuration: > Total memory: 8GB > yarn.scheduler.minimum-allocation-mb 256 > yarn.scheduler.capacity.maximum-am-resource-percent 1 (100%, test only config) > App 1 makes a request for 4.6 GB, succeeds, and the app transitions to the RUNNING state. > It subsequently makes a request for 4.6 GB, which cannot be granted, and it > waits. > App 2 makes a request for 1 GB - never receives it, so the app stays in the > ACCEPTED state forever. > I think this can happen in leaf queues that are near capacity. > The fix is likely in LeafQueue.java assignContainers near line 861, where it > returns if the assignment would exceed queue capacity, instead of checking if > requests for other active applications can be met. > {code:title=LeafQueue.java|borderStyle=solid} >// Check queue max-capacity limit >if (!assignToQueue(clusterResource, required)) { > -return NULL_ASSIGNMENT; > +break; >} > {code} > With this change, the scenario above allows App 2 to start and finish while > App 1 continues to wait. > I have a patch available, but I am wondering if the current behavior is by design. -- This message was sent by Atlassian JIRA (v6.2#6252)