[jira] [Commented] (YARN-4845) Upgrade fields of o.a.h.y.api.records.ResourceUtilization from int32 to int64

2016-03-20 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15203793#comment-15203793
 ] 

Wangda Tan commented on YARN-4845:
--

From the Java doc:
{code}
ResourceUtilization models the utilization of a set of computer 
resources in the cluster.
{code}
It could be used to track a group of nodes; if so, we may need to mark this as 
a blocker for the 2.8.0 release.

> Upgrade fields of o.a.h.y.api.records.ResourceUtilization from int32 to int64
> -
>
> Key: YARN-4845
> URL: https://issues.apache.org/jira/browse/YARN-4845
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api
>Reporter: Wangda Tan
>
> Similar to YARN-4844, if ResourceUtilization could track all nodes' 
> resources instead of a single node's, we need to make sure the fields are 
> long instead of int.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4845) Upgrade fields of o.a.h.y.api.records.ResourceUtilization from int32 to int64

2016-03-20 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-4845:


 Summary: Upgrade fields of o.a.h.y.api.records.ResourceUtilization 
from int32 to int64
 Key: YARN-4845
 URL: https://issues.apache.org/jira/browse/YARN-4845
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Wangda Tan


Similar to YARN-4844, if ResourceUtilization could track all nodes' resources 
instead of a single node's, we need to make sure the fields are long instead 
of int.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4844) Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64

2016-03-20 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-4844:


 Summary: Upgrade fields of o.a.h.y.api.records.Resource from int32 
to int64
 Key: YARN-4844
 URL: https://issues.apache.org/jira/browse/YARN-4844
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api
Reporter: Wangda Tan
Priority: Critical


We use int32 for memory now; if a cluster has 10k nodes and each node has 
210 GB of memory, we will get a negative total cluster memory.

Another case that overflows int32 even more easily: we add all pending 
resources of running apps into the cluster's total pending resources. If a 
problematic app requests too many resources (say 1M+ containers, each of them 
needing 3 GB), int32 will not be enough.

Even if we can cap each app's pending request, we cannot handle the case where 
there are many running apps, each with a capped but still significant amount 
of pending resources.

So we may need to upgrade the int32 memory field (and possibly v-cores as 
well) to int64 to avoid integer overflow.
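
For reference, a minimal arithmetic sketch in plain Java (not YARN code) of why 
the 10k-node example goes negative when memory is tracked in MB in a 32-bit 
field:
{code}
public class Int32OverflowDemo {
  public static void main(String[] args) {
    int nodes = 10_000;
    int memPerNodeMB = 210 * 1024;                  // 210 GB per node, in MB (YARN tracks memory in MB)

    int totalAsInt = nodes * memPerNodeMB;          // 2,150,400,000 exceeds Integer.MAX_VALUE (2,147,483,647)
    long totalAsLong = (long) nodes * memPerNodeMB; // widened before multiplying, so no overflow

    System.out.println(totalAsInt);                 // prints a negative value
    System.out.println(totalAsLong);                // prints 2150400000
  }
}
{code}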



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4843) [Umbrella] Revisit YARN ProtocolBuffer int32 usages that need to upgrade to int64

2016-03-20 Thread Wangda Tan (JIRA)
Wangda Tan created YARN-4843:


 Summary: [Umbrella] Revisit YARN ProtocolBuffer int32 usages that 
need to upgrade to int64
 Key: YARN-4843
 URL: https://issues.apache.org/jira/browse/YARN-4843
 Project: Hadoop YARN
  Issue Type: Bug
  Components: api
Reporter: Wangda Tan


This JIRA tracks all int32 usages in YARN's ProtocolBuffer APIs that we may 
need to update to int64.

One example is the resource API: we use int32 for memory now, so if a cluster 
has 10k nodes and each node has 210 GB of memory, we will get a negative total 
cluster memory.

Other fields may also need to be upgraded from int32 to int64.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests

2016-03-20 Thread Yi Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15203759#comment-15203759
 ] 

Yi Zhou commented on YARN-796:
--

Hi [~Naganarasimha],
If you have filed the JIRA for the 2.6 doc, please kindly post its ID number 
for me to track and reference. Thanks a lot!

> Allow for (admin) labels on nodes and resource-requests
> ---
>
> Key: YARN-796
> URL: https://issues.apache.org/jira/browse/YARN-796
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.4.1
>Reporter: Arun C Murthy
>Assignee: Wangda Tan
> Attachments: LabelBasedScheduling.pdf, 
> Node-labels-Requirements-Design-doc-V1.pdf, 
> Node-labels-Requirements-Design-doc-V2.pdf, 
> Non-exclusive-Node-Partition-Design.pdf, YARN-796-Diagram.pdf, 
> YARN-796.node-label.consolidate.1.patch, 
> YARN-796.node-label.consolidate.10.patch, 
> YARN-796.node-label.consolidate.11.patch, 
> YARN-796.node-label.consolidate.12.patch, 
> YARN-796.node-label.consolidate.13.patch, 
> YARN-796.node-label.consolidate.14.patch, 
> YARN-796.node-label.consolidate.2.patch, 
> YARN-796.node-label.consolidate.3.patch, 
> YARN-796.node-label.consolidate.4.patch, 
> YARN-796.node-label.consolidate.5.patch, 
> YARN-796.node-label.consolidate.6.patch, 
> YARN-796.node-label.consolidate.7.patch, 
> YARN-796.node-label.consolidate.8.patch, YARN-796.node-label.demo.patch.1, 
> YARN-796.patch, YARN-796.patch4
>
>
> It will be useful for admins to specify labels for nodes. Examples of labels 
> are OS, processor architecture etc.
> We should expose these labels and allow applications to specify labels on 
> resource-requests.
> Obviously we need to support admin operations on adding/removing node labels.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests

2016-03-20 Thread Yi Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15203757#comment-15203757
 ] 

Yi Zhou commented on YARN-796:
--

Appreciate [~Naganarasimha] and [~wangda] for your great help!

> Allow for (admin) labels on nodes and resource-requests
> ---
>
> Key: YARN-796
> URL: https://issues.apache.org/jira/browse/YARN-796
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.4.1
>Reporter: Arun C Murthy
>Assignee: Wangda Tan
> Attachments: LabelBasedScheduling.pdf, 
> Node-labels-Requirements-Design-doc-V1.pdf, 
> Node-labels-Requirements-Design-doc-V2.pdf, 
> Non-exclusive-Node-Partition-Design.pdf, YARN-796-Diagram.pdf, 
> YARN-796.node-label.consolidate.1.patch, 
> YARN-796.node-label.consolidate.10.patch, 
> YARN-796.node-label.consolidate.11.patch, 
> YARN-796.node-label.consolidate.12.patch, 
> YARN-796.node-label.consolidate.13.patch, 
> YARN-796.node-label.consolidate.14.patch, 
> YARN-796.node-label.consolidate.2.patch, 
> YARN-796.node-label.consolidate.3.patch, 
> YARN-796.node-label.consolidate.4.patch, 
> YARN-796.node-label.consolidate.5.patch, 
> YARN-796.node-label.consolidate.6.patch, 
> YARN-796.node-label.consolidate.7.patch, 
> YARN-796.node-label.consolidate.8.patch, YARN-796.node-label.demo.patch.1, 
> YARN-796.patch, YARN-796.patch4
>
>
> It will be useful for admins to specify labels for nodes. Examples of labels 
> are OS, processor architecture etc.
> We should expose these labels and allow applications to specify labels on 
> resource-requests.
> Obviously we need to support admin operations on adding/removing node labels.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4842) yarn logs command should not require the appOwner argument

2016-03-20 Thread Ram Venkatesh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ram Venkatesh updated YARN-4842:

Attachment: YARN-4842.1.patch

> yarn logs command should not require the appOwner argument
> --
>
> Key: YARN-4842
> URL: https://issues.apache.org/jira/browse/YARN-4842
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Ram Venkatesh
>Assignee: Ram Venkatesh
> Attachments: YARN-4842.1.patch
>
>
> The yarn logs command is among the most common ways to troubleshoot yarn app 
> failures, especially by an admin.
> Currently if you run the command as a user different from the job owner, the 
> command will fail with a subtle message that it could not find the app under 
> the running user's name. This can be confusing especially to new admins.
> We can figure out the job owner from the app report returned by the RM or the 
> AHS, or, by looking for the app directory using a glob pattern, so in most 
> cases this error can be avoided.
> Question - are there scenarios where users will still need to specify the 
> -appOwner option?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-4842) yarn logs command should not require the appOwner argument

2016-03-20 Thread Ram Venkatesh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ram Venkatesh reassigned YARN-4842:
---

Assignee: Ram Venkatesh

> yarn logs command should not require the appOwner argument
> --
>
> Key: YARN-4842
> URL: https://issues.apache.org/jira/browse/YARN-4842
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Ram Venkatesh
>Assignee: Ram Venkatesh
>
> The yarn logs command is among the most common ways to troubleshoot yarn app 
> failures, especially by an admin.
> Currently if you run the command as a user different from the job owner, the 
> command will fail with a subtle message that it could not find the app under 
> the running user's name. This can be confusing especially to new admins.
> We can figure out the job owner from the app report returned by the RM or the 
> AHS, or, by looking for the app directory using a glob pattern, so in most 
> cases this error can be avoided.
> Question - are there scenarios where users will still need to specify the 
> -appOwner option?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4842) yarn logs command should not require the appOwner argument

2016-03-20 Thread Ram Venkatesh (JIRA)
Ram Venkatesh created YARN-4842:
---

 Summary: yarn logs command should not require the appOwner argument
 Key: YARN-4842
 URL: https://issues.apache.org/jira/browse/YARN-4842
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Ram Venkatesh


The yarn logs command is among the most common ways to troubleshoot yarn app 
failures, especially by an admin.

Currently if you run the command as a user different from the job owner, the 
command will fail with a subtle message that it could not find the app under 
the running user's name. This can be confusing especially to new admins.

We can figure out the job owner from the app report returned by the RM or the 
AHS, or, by looking for the app directory using a glob pattern, so in most 
cases this error can be avoided.

Question - are there scenarios where users will still need to specify the 
-appOwner option?
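
Not prescribing the actual implementation; just a hedged sketch of one way the 
owner could be looked up from the RM's application report via the public 
YarnClient API (the helper class name is made up, error handling omitted):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.client.api.YarnClient;

public class AppOwnerLookup {
  /** Best-effort lookup of the app owner so -appOwner can default to it. */
  public static String resolveOwner(Configuration conf, ApplicationId appId)
      throws Exception {
    YarnClient client = YarnClient.createYarnClient();
    client.init(conf);
    client.start();
    try {
      ApplicationReport report = client.getApplicationReport(appId);
      return report.getUser();   // the user who submitted the application
    } finally {
      client.stop();
    }
  }
}
{code}
An explicitly passed -appOwner could then simply override this lookup for the 
cases where the report or app directory is not reachable.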



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3933) Race condition when calling AbstractYarnScheduler.completedContainer.

2016-03-20 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15203746#comment-15203746
 ] 

Sunil G commented on YARN-3933:
---

As per the existing patch, the new liveContainers check is done before the 
code below in {{FS#completedContainerInternal}}. Please correct me if I am 
wrong w.r.t. FS: {{containerCompleted}} needs to be processed for containers 
which are RESERVED too, so with the current patch this scenario may not be hit.

{code}
if (rmContainer.getState() == RMContainerState.RESERVED) {
  application.unreserve(rmContainer.getReservedPriority(), node);
} else {
{code}


> Race condition when calling AbstractYarnScheduler.completedContainer.
> -
>
> Key: YARN-3933
> URL: https://issues.apache.org/jira/browse/YARN-3933
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.6.0, 2.7.0, 2.5.2, 2.7.1
>Reporter: Lavkesh Lahngir
>Assignee: Shiwei Guo
> Attachments: YARN-3933.001.patch, YARN-3933.002.patch, 
> YARN-3933.003.patch
>
>
> In our cluster we are seeing available memory and cores go negative. 
> Initial inspection:
> Scenario no. 1: 
> In the capacity scheduler, the method allocateContainersToNode() checks 
> whether there are excess container reservations for an application; if they 
> are no longer needed, it calls queue.completedContainer(), which drives the 
> resources negative even though they were never assigned in the first place. 
> I am still looking through the code. Can somebody suggest how to simulate 
> excess container assignments?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests

2016-03-20 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15203740#comment-15203740
 ] 

Naganarasimha G R commented on YARN-796:


OK, actually I meant the same ... create a document for 2.6.x so that we can 
ask people to refer to it (many times even I forget while testing the RC 
cuts). I will raise a JIRA for the same.

> Allow for (admin) labels on nodes and resource-requests
> ---
>
> Key: YARN-796
> URL: https://issues.apache.org/jira/browse/YARN-796
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.4.1
>Reporter: Arun C Murthy
>Assignee: Wangda Tan
> Attachments: LabelBasedScheduling.pdf, 
> Node-labels-Requirements-Design-doc-V1.pdf, 
> Node-labels-Requirements-Design-doc-V2.pdf, 
> Non-exclusive-Node-Partition-Design.pdf, YARN-796-Diagram.pdf, 
> YARN-796.node-label.consolidate.1.patch, 
> YARN-796.node-label.consolidate.10.patch, 
> YARN-796.node-label.consolidate.11.patch, 
> YARN-796.node-label.consolidate.12.patch, 
> YARN-796.node-label.consolidate.13.patch, 
> YARN-796.node-label.consolidate.14.patch, 
> YARN-796.node-label.consolidate.2.patch, 
> YARN-796.node-label.consolidate.3.patch, 
> YARN-796.node-label.consolidate.4.patch, 
> YARN-796.node-label.consolidate.5.patch, 
> YARN-796.node-label.consolidate.6.patch, 
> YARN-796.node-label.consolidate.7.patch, 
> YARN-796.node-label.consolidate.8.patch, YARN-796.node-label.demo.patch.1, 
> YARN-796.patch, YARN-796.patch4
>
>
> It will be useful for admins to specify labels for nodes. Examples of labels 
> are OS, processor architecture etc.
> We should expose these labels and allow applications to specify labels on 
> resource-requests.
> Obviously we need to support admin operations on adding/removing node labels.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests

2016-03-20 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15203738#comment-15203738
 ] 

Wangda Tan commented on YARN-796:
-

[~Naganarasimha],
Since we could possibly update node label features in the future, instead of 
indicating what is available in each release, I think we should add a node 
label doc for the 2.6.x release (we only have docs for 2.7+ releases) which 
includes only the supported features.

Thoughts?

> Allow for (admin) labels on nodes and resource-requests
> ---
>
> Key: YARN-796
> URL: https://issues.apache.org/jira/browse/YARN-796
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.4.1
>Reporter: Arun C Murthy
>Assignee: Wangda Tan
> Attachments: LabelBasedScheduling.pdf, 
> Node-labels-Requirements-Design-doc-V1.pdf, 
> Node-labels-Requirements-Design-doc-V2.pdf, 
> Non-exclusive-Node-Partition-Design.pdf, YARN-796-Diagram.pdf, 
> YARN-796.node-label.consolidate.1.patch, 
> YARN-796.node-label.consolidate.10.patch, 
> YARN-796.node-label.consolidate.11.patch, 
> YARN-796.node-label.consolidate.12.patch, 
> YARN-796.node-label.consolidate.13.patch, 
> YARN-796.node-label.consolidate.14.patch, 
> YARN-796.node-label.consolidate.2.patch, 
> YARN-796.node-label.consolidate.3.patch, 
> YARN-796.node-label.consolidate.4.patch, 
> YARN-796.node-label.consolidate.5.patch, 
> YARN-796.node-label.consolidate.6.patch, 
> YARN-796.node-label.consolidate.7.patch, 
> YARN-796.node-label.consolidate.8.patch, YARN-796.node-label.demo.patch.1, 
> YARN-796.patch, YARN-796.patch4
>
>
> It will be useful for admins to specify labels for nodes. Examples of labels 
> are OS, processor architecture etc.
> We should expose these labels and allow applications to specify labels on 
> resource-requests.
> Obviously we need to support admin operations on adding/removing node labels.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests

2016-03-20 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15203721#comment-15203721
 ] 

Naganarasimha G R commented on YARN-796:


Yes [~jameszhouyi], in 2.6.0 this command is not yet supported; the available 
documentation is for 2.7.2, and a lot more fixes and features are expected in 
2.8.0. If you are planning to experiment with this feature, 2.7.2 is fine, but 
to use it in production I would suggest waiting for 2.8.0.
[~wangda], is it required to document what is available as part of 2.6.x?

> Allow for (admin) labels on nodes and resource-requests
> ---
>
> Key: YARN-796
> URL: https://issues.apache.org/jira/browse/YARN-796
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.4.1
>Reporter: Arun C Murthy
>Assignee: Wangda Tan
> Attachments: LabelBasedScheduling.pdf, 
> Node-labels-Requirements-Design-doc-V1.pdf, 
> Node-labels-Requirements-Design-doc-V2.pdf, 
> Non-exclusive-Node-Partition-Design.pdf, YARN-796-Diagram.pdf, 
> YARN-796.node-label.consolidate.1.patch, 
> YARN-796.node-label.consolidate.10.patch, 
> YARN-796.node-label.consolidate.11.patch, 
> YARN-796.node-label.consolidate.12.patch, 
> YARN-796.node-label.consolidate.13.patch, 
> YARN-796.node-label.consolidate.14.patch, 
> YARN-796.node-label.consolidate.2.patch, 
> YARN-796.node-label.consolidate.3.patch, 
> YARN-796.node-label.consolidate.4.patch, 
> YARN-796.node-label.consolidate.5.patch, 
> YARN-796.node-label.consolidate.6.patch, 
> YARN-796.node-label.consolidate.7.patch, 
> YARN-796.node-label.consolidate.8.patch, YARN-796.node-label.demo.patch.1, 
> YARN-796.patch, YARN-796.patch4
>
>
> It will be useful for admins to specify labels for nodes. Examples of labels 
> are OS, processor architecture etc.
> We should expose these labels and allow applications to specify labels on 
> resource-requests.
> Obviously we need to support admin operations on adding/removing node labels.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3933) Race condition when calling AbstractYarnScheduler.completedContainer.

2016-03-20 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15203723#comment-15203723
 ] 

Wangda Tan commented on YARN-3933:
--

Looked at this issue; it seems only FairScheduler has it.

CS already checks this inside FiCaSchedulerApp. Instead of adding the check 
separately in CS/FS, I think we can add a common {{completedContainer}} method 
to SchedulerApplicationAttempt and check the liveContainers map inside the 
common method.
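
A minimal, self-contained sketch of one way to realize that check (class and 
field names are hypothetical stand-ins, not the actual YARN classes): removal 
from the live-containers map acts as the idempotency guard, so a duplicate 
completion can never release the same resources twice.
{code}
import java.util.HashMap;
import java.util.Map;

public class CompletedContainerGuardDemo {
  // hypothetical stand-in for the attempt's liveContainers map
  private final Map<String, Integer> liveContainers = new HashMap<>();
  private long availableMB = 0;

  synchronized void containerAllocated(String containerId, int memMB) {
    liveContainers.put(containerId, memMB);
    availableMB -= memMB;
  }

  // a duplicate or racing completion finds nothing to remove and returns early,
  // so the container's resources are released at most once
  synchronized void completedContainer(String containerId) {
    Integer memMB = liveContainers.remove(containerId);
    if (memMB == null) {
      return;
    }
    availableMB += memMB;
  }

  synchronized long getAvailableMB() {
    return availableMB;
  }
}
{code}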

Thoughts? [~kasha]/[~guoshiwei]

> Race condition when calling AbstractYarnScheduler.completedContainer.
> -
>
> Key: YARN-3933
> URL: https://issues.apache.org/jira/browse/YARN-3933
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.6.0, 2.7.0, 2.5.2, 2.7.1
>Reporter: Lavkesh Lahngir
>Assignee: Shiwei Guo
> Attachments: YARN-3933.001.patch, YARN-3933.002.patch, 
> YARN-3933.003.patch
>
>
> In our cluster we are seeing available memory and cores go negative. 
> Initial inspection:
> Scenario no. 1: 
> In the capacity scheduler, the method allocateContainersToNode() checks 
> whether there are excess container reservations for an application; if they 
> are no longer needed, it calls queue.completedContainer(), which drives the 
> resources negative even though they were never assigned in the first place. 
> I am still looking through the code. Can somebody suggest how to simulate 
> excess container assignments?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests

2016-03-20 Thread Yi Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15203684#comment-15203684
 ] 

Yi Zhou commented on YARN-796:
--

Hi [~Naganarasimha],
It seems the below command is still not supported in 2.6.0?
sudo -u yarn yarn cluster --list-node-labels
Error: Could not find or load main class cluster

> Allow for (admin) labels on nodes and resource-requests
> ---
>
> Key: YARN-796
> URL: https://issues.apache.org/jira/browse/YARN-796
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.4.1
>Reporter: Arun C Murthy
>Assignee: Wangda Tan
> Attachments: LabelBasedScheduling.pdf, 
> Node-labels-Requirements-Design-doc-V1.pdf, 
> Node-labels-Requirements-Design-doc-V2.pdf, 
> Non-exclusive-Node-Partition-Design.pdf, YARN-796-Diagram.pdf, 
> YARN-796.node-label.consolidate.1.patch, 
> YARN-796.node-label.consolidate.10.patch, 
> YARN-796.node-label.consolidate.11.patch, 
> YARN-796.node-label.consolidate.12.patch, 
> YARN-796.node-label.consolidate.13.patch, 
> YARN-796.node-label.consolidate.14.patch, 
> YARN-796.node-label.consolidate.2.patch, 
> YARN-796.node-label.consolidate.3.patch, 
> YARN-796.node-label.consolidate.4.patch, 
> YARN-796.node-label.consolidate.5.patch, 
> YARN-796.node-label.consolidate.6.patch, 
> YARN-796.node-label.consolidate.7.patch, 
> YARN-796.node-label.consolidate.8.patch, YARN-796.node-label.demo.patch.1, 
> YARN-796.patch, YARN-796.patch4
>
>
> It will be useful for admins to specify labels for nodes. Examples of labels 
> are OS, processor architecture etc.
> We should expose these labels and allow applications to specify labels on 
> resource-requests.
> Obviously we need to support admin operations on adding/removing node labels.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4841) UT/FT tests for RM web UI pages

2016-03-20 Thread Rohith Sharma K S (JIRA)
Rohith Sharma K S created YARN-4841:
---

 Summary: UT/FT tests for RM web UI pages
 Key: YARN-4841
 URL: https://issues.apache.org/jira/browse/YARN-4841
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: webapp
Reporter: Rohith Sharma K S


The RM webapp UI does not have UT/FT test cases, not even basic validation; 
everything depends on actual cluster deployment results.
There should be UT/FT tests for validating the RM webapp pages.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4002) make ResourceTrackerService.nodeHeartbeat more concurrent

2016-03-20 Thread Hong Zhiguo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Zhiguo updated YARN-4002:
--
Attachment: YARN-4002-rwlock-v2.patch

Uploaded YARN-4002-rwlock-v2.patch for an improvement: make the read side 
critical section smaller.
{code}
this.hostsReadLock.lock();
try {
  hostsList = hostsReader.getHosts();
  excludeList = hostsReader.getExcludedHosts();
} finally {
  this.hostsReadLock.unlock();
}
{code}

As explained by [~rohithsharma], this prevents mixing up an old value of 
hostsReader.getHosts() with a new value of hostsReader.getExcludedHosts(), and 
this is the only reason someone may prefer the rwlock solution over the 
lockless one.

If the mixing is not considered a problem (for example, by myself), the 
lockless solution is good enough.
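
For illustration only (plain java.util.concurrent, not the actual 
NodesListManager code), a self-contained sketch of keeping just the two 
reference reads inside the read lock so heartbeat handlers always see a 
consistent pair:
{code}
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class HostsSnapshotDemo {
  private final ReadWriteLock lock = new ReentrantReadWriteLock();
  private Set<String> hosts = new HashSet<>();
  private Set<String> excludedHosts = new HashSet<>();

  // writer: "refresh nodes" replaces both sets under the write lock
  public void refresh(Set<String> newHosts, Set<String> newExcluded) {
    lock.writeLock().lock();
    try {
      hosts = newHosts;
      excludedHosts = newExcluded;
    } finally {
      lock.writeLock().unlock();
    }
  }

  // reader: only the two reference reads are inside the lock, so a reader
  // never observes old hosts paired with new excluded hosts
  public boolean isValidNode(String host) {
    Set<String> hostsSnapshot;
    Set<String> excludedSnapshot;
    lock.readLock().lock();
    try {
      hostsSnapshot = hosts;
      excludedSnapshot = excludedHosts;
    } finally {
      lock.readLock().unlock();
    }
    return (hostsSnapshot.isEmpty() || hostsSnapshot.contains(host))
        && !excludedSnapshot.contains(host);
  }
}
{code}
The membership checks happen outside the lock, which is what keeps the 
read-side critical section small.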

> make ResourceTrackerService.nodeHeartbeat more concurrent
> -
>
> Key: YARN-4002
> URL: https://issues.apache.org/jira/browse/YARN-4002
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Hong Zhiguo
>Assignee: Hong Zhiguo
>Priority: Critical
> Attachments: 0001-YARN-4002.patch, YARN-4002-lockless-read.patch, 
> YARN-4002-rwlock-v2.patch, YARN-4002-rwlock.patch, YARN-4002-v0.patch
>
>
> We have multiple RPC threads to handle NodeHeartbeatRequest from NMs. By 
> design the method ResourceTrackerService.nodeHeartbeat should be concurrent 
> enough to scale for large clusters.
> But we have a "BIG" lock in NodesListManager.isValidNode which I think it's 
> unnecessary.
> First, the fields "includes" and "excludes" of HostsFileReader are only 
> updated on "refresh nodes".  All RPC threads handling node heartbeats are 
> only readers.  So RWLock could be used to  alow concurrent access by RPC 
> threads.
> Second, since he fields "includes" and "excludes" of HostsFileReader are 
> always updated by "reference assignment", which is atomic in Java, the reader 
> side lock could just be skipped.
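
A hedged sketch of the lockless alternative mentioned above (illustrative 
only): bundling both lists into one immutable snapshot behind a volatile 
reference, where the volatile (or an equivalent publication guarantee) is what 
lets readers see a newly assigned reference promptly, and the single snapshot 
also sidesteps the old/new mixing concern discussed in the comments.
{code}
import java.util.Collections;
import java.util.Set;

public class LocklessHostsDemo {
  // one immutable snapshot holding both lists, swapped by reference
  static final class HostsSnapshot {
    final Set<String> hosts;
    final Set<String> excludedHosts;
    HostsSnapshot(Set<String> hosts, Set<String> excludedHosts) {
      this.hosts = hosts;
      this.excludedHosts = excludedHosts;
    }
  }

  private volatile HostsSnapshot snapshot =
      new HostsSnapshot(Collections.<String>emptySet(),
                        Collections.<String>emptySet());

  // writer: build a new snapshot, then publish it with one reference assignment
  public void refresh(Set<String> newHosts, Set<String> newExcluded) {
    snapshot = new HostsSnapshot(newHosts, newExcluded);
  }

  // reader: a single volatile read yields a consistent pair, no lock needed
  public boolean isValidNode(String host) {
    HostsSnapshot s = snapshot;
    return (s.hosts.isEmpty() || s.hosts.contains(host))
        && !s.excludedHosts.contains(host);
  }
}
{code}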



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4607) AppAttempt page TotalOutstandingResource Requests table support pagination

2016-03-20 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15203635#comment-15203635
 ] 

Rohith Sharma K S commented on YARN-4607:
-

Overall the patch looks good. Can you add screenshots of the container request 
table before and after the change?

> AppAttempt page TotalOutstandingResource Requests table support pagination
> --
>
> Key: YARN-4607
> URL: https://issues.apache.org/jira/browse/YARN-4607
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Minor
> Attachments: 0001-YARN-4607.patch
>
>
> Simulate a cluster with 10 racks and 100 nodes using SLS; if we check the 
> Total Outstanding Resource Requests table, it will consume the complete page.
> It would be good to support pagination for the table.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4808) SchedulerNode can use a few more cosmetic changes

2016-03-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15203616#comment-15203616
 ] 

Hadoop QA commented on YARN-4808:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 15s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
57s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s 
{color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
21s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 33s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 5s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s 
{color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
29s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 19s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 patch generated 1 new + 267 unchanged - 1 fixed = 268 total (was 268) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 32s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
14s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 71m 39s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_74. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 74m 17s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
21s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 162m 45s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_74 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
| JDK v1.7.0_95 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
|   | 
hadoop.yarn.server.resourcemanag

[jira] [Commented] (YARN-4002) make ResourceTrackerService.nodeHeartbeat more concurrent

2016-03-20 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15203611#comment-15203611
 ] 

Rohith Sharma K S commented on YARN-4002:
-

Thanks [~leftnoteasy] for looking at the patch.
I thought about adding the read lock in these 2 places, but after looking into 
the callers of these 2 methods I felt it is not really required.
# Method {{setDecomissionedNMsMetrics}} is called during service init, so it 
only runs during service initialization.
# Method {{printConfiguredHosts}} is called during service init and 
refreshNodes.
## Again, for service init, I do not think we really need to acquire the read 
lock.
## For refreshNodes, {{printConfiguredHosts}} is within the write lock, so it 
is safe enough to go without the read lock.

As of now, not acquiring the read lock would not cause any problem. In future, 
any new method calling these methods will need to consider acquiring the read 
lock.



> make ResourceTrackerService.nodeHeartbeat more concurrent
> -
>
> Key: YARN-4002
> URL: https://issues.apache.org/jira/browse/YARN-4002
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Hong Zhiguo
>Assignee: Hong Zhiguo
>Priority: Critical
> Attachments: 0001-YARN-4002.patch, YARN-4002-lockless-read.patch, 
> YARN-4002-rwlock.patch, YARN-4002-v0.patch
>
>
> We have multiple RPC threads to handle NodeHeartbeatRequest from NMs. By 
> design the method ResourceTrackerService.nodeHeartbeat should be concurrent 
> enough to scale for large clusters.
> But we have a "BIG" lock in NodesListManager.isValidNode which I think it's 
> unnecessary.
> First, the fields "includes" and "excludes" of HostsFileReader are only 
> updated on "refresh nodes".  All RPC threads handling node heartbeats are 
> only readers.  So RWLock could be used to  alow concurrent access by RPC 
> threads.
> Second, since he fields "includes" and "excludes" of HostsFileReader are 
> always updated by "reference assignment", which is atomic in Java, the reader 
> side lock could just be skipped.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4699) Scheduler UI and REST o/p is not in sync when -replaceLabelsOnNode is used to change label of a node

2016-03-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15203557#comment-15203557
 ] 

Hadoop QA commented on YARN-4699:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 20s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
41s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s 
{color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
17s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 33s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 6s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s 
{color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
30s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
15s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 32s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
15s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 86m 49s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_74. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 73m 58s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
17s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 177m 21s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_74 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
| JDK v1.8.0_74 Timed out junit tests | 
org.apache.hadoop.yarn.server.resourcemanager.TestRMHA |
| JDK v1.7.0_95 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:fbe3e86 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12794426/0002-YARN-4699.patch |
| JIRA Issue | YARN-469

[jira] [Updated] (YARN-4808) SchedulerNode can use a few more cosmetic changes

2016-03-20 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-4808:
---
Attachment: yarn-4808-2.patch

Rebased patch and made a few more cosmetic changes. 

> SchedulerNode can use a few more cosmetic changes
> -
>
> Key: YARN-4808
> URL: https://issues.apache.org/jira/browse/YARN-4808
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Affects Versions: 2.8.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-4808-1.patch, yarn-4808-2.patch
>
>
> We have made some cosmetic changes to SchedulerNode recently. While working 
> on YARN-4511, we realized we could improve it a little more:
> # Remove volatile variables - we don't see the need for them to be volatile.
> # Consolidate methods that end up doing very similar things.
> # Rename totalResource to capacity. YARN-4511 plans to add inflatedCapacity 
> to include the un-utilized resources, and having two totals can be a little 
> confusing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4837) User facing aspects of 'AM blacklisting' feature need fixing

2016-03-20 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15203522#comment-15203522
 ] 

Vinod Kumar Vavilapalli commented on YARN-4837:
---

bq. "DISKS_FAILED" shouldn't be skipped for the reason I mentioned in 
YARN-4576. Also, we cannot simply judge system innocent when hitting memory 
issues.
As [~vvasudev] pointed out [here on 
YARN-4576|https://issues.apache.org/jira/browse/YARN-4576?focusedCommentId=15202664&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15202664],
 the right solution is to have the RM detect bouncing nodes and then not to 
allocate new containers to bouncing nodes until they stabilize.

bq. Also, hide all AM scheduling info/preference from application doesn't make 
sense in long time: AM can ask for resources for its running containers in the 
beginning, but application cannot ask how to place its AM even today which is 
sad to me.
My earlier comment came out a little inaccurate when I talked about "hiding 
AM-container-scheduling inside the RM". What I really meant is that any 
automatic scheduling decision coming out of system failures/events should be 
hidden from end-users - just like preemption-handling! We already have 
ResourceRequest as part of AM-launch-context. No reason why we cannot have more 
such things. However, this is different from RM automatically ruling out nodes 
as was done at YARN-2005 and related JIRAs.

bq. YARN-4685 is something fixable and much better than the age without 
blacklist (we do see AM keep launching on bad nodes repeatedly and get stuck in 
many cases). We just need to go ahead to fix YARN-4685.
YARN-4685 happened because of an inappropriate solution to a real problem - we 
should pause going down this route till we figure out the right solution.

> User facing aspects of 'AM blacklisting' feature need fixing
> 
>
> Key: YARN-4837
> URL: https://issues.apache.org/jira/browse/YARN-4837
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>
> Was reviewing the user-facing aspects that we are releasing as part of 2.8.0.
> Looking at the 'AM blacklisting feature', I see several things to be fixed 
> before we release it in 2.8.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4576) Enhancement for tracking Blacklist in AM Launching

2016-03-20 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15203519#comment-15203519
 ] 

Vinod Kumar Vavilapalli commented on YARN-4576:
---

bq. To the more general point of nodes switching back and forth from good to 
bad and back - the better solution would be to have the RM detect bouncing 
nodes and then not to allocate new containers to bouncing nodes until they 
stabilize.
[~djp], as [~vvasudev] says, this is a much better solution (dare I say the 
right solution) to deal with flip-flopping nodes and scheduling of all 
containers instead of just for AMs as YARN-2005 did.

bq. 3) App should have their own choices to setup preferred nodes, hosts etc.
[~leftnoteasy] / [~djp], I am not arguing against this at all. In fact, we 
already have ResourceRequest as part of AM-launch-context. No reason why we 
cannot have more such things. However, this is different from RM automatically 
ruling out nodes as was done at YARN-2005 and related JIRAs.

bq. To address this case, container exit code is a better candidate, but I 
agree that it itself is not covering 100% cases or not pointing to exact 
failure.
[~sunilg], I agree with this too - that was Sangjin's point as well. Despite 
explicit handling of known system problems, there will likely be cases that we 
will only handle slowly over time. My only concern was exposing this as part 
of the API before we learn how it can be used in practice - that was my 
proposal too at 
https://issues.apache.org/jira/browse/YARN-4837?focusedCommentId=15201895&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15201895.

> Enhancement for tracking Blacklist in AM Launching
> --
>
> Key: YARN-4576
> URL: https://issues.apache.org/jira/browse/YARN-4576
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Critical
> Attachments: EnhancementAMLaunchingBlacklist.pdf
>
>
> Before YARN-2005, YARN's blacklist mechanism tracked bad nodes in the AM: if 
> the AM's attempts to launch containers on a specific node failed several 
> times, the AM would blacklist that node in future resource requests. This 
> mechanism works fine for normal containers. However, from our observation of 
> several clusters: if a problematic node fails to launch an AM, the RM could 
> pick the same problematic node for the next AM attempts again and again, 
> causing application failure when other functional nodes are busy. In the 
> normal case, the customized health checker script cannot be sensitive enough 
> to mark a node as unhealthy when only one or two container launches fail. 
> After YARN-2005, we can have a BlacklistManager in each RMApp, so nodes on 
> which AM attempts for a specific application failed before will get 
> blacklisted. To avoid the risk of all nodes being blacklisted by the 
> BlacklistManager, a disable-failure-threshold stops adding more nodes to the 
> blacklist once a certain ratio is hit. 
> There are already some enhancements to this AM blacklist mechanism: 
> YARN-4284 addresses the wider case of AM container launch failures, and 
> YARN-4389 makes the configuration settings changeable by the app to meet 
> app-specific requirements. However, there are still several gaps to address 
> more scenarios:
> 1. We may need a global blacklist instead of each app maintaining a separate 
> one. The reason: an AM has a higher chance to fail if other AMs failed there 
> before. A quick example: in a busy cluster, all nodes are busy except two 
> problematic nodes, a and b; app1 has already submitted and failed two AM 
> attempts on a and b. app2 and other apps should wait for the other busy 
> nodes rather than waste attempts on these two problematic nodes.
> 2. If an AM container failure is recognized as a global event instead of an 
> app's own issue, the blacklist entry should not be permanent but should have 
> a specific time window. 
> 3. We could have user-defined blacklist policies to address more possible 
> cases and scenarios, so it is reasonable to make the blacklist policy 
> pluggable.
> 4. For some test scenarios, we could have a whitelist mechanism for AM 
> launching.
> 5. Some minor issues: it sounds like NM reconnect won't refresh the 
> blacklist so far.
> Will try to address all these issues here.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1040) De-link container life cycle from an Allocation

2016-03-20 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15203496#comment-15203496
 ] 

Vinod Kumar Vavilapalli commented on YARN-1040:
---

Thanks for the document, [~asuresh]!

[~bikassaha]
 - Our comments crossed on Feb 25, so didn't see yours.
 - Looking at the doc, I can see why it gives the impression of a redesign, but 
it is less of a redesign, and more of adding new functionality that needs new 
semantics.
 - Clearly the new naming makes it look like a lot of new changes for the apps, 
but that is the reality (for apps that want to use this new feature)!
 - We do make most of our decisions on JIRA. We can continue the discussion 
here. If need be, sure, we can send out a note on the dev lists.

So, with that out of the way, let's step back and look at the semantics first 
and foremost and keep out the discussions about renames and the expected level 
of changes for later.

h4. APIs
There are big differences between the two proposals w.r.t. the APIs. Even 
though your earlier proposal seems to assume that this can be a localized 
change in the NM-side APIs, there are newer semantics that mandate new (and/or 
modified) APIs on both the AM-NM and RM-AM interactions. A couple of them that 
come to mind:
 - *Allocation/container release*: We need two separate mechanisms from AM to 
RM for (a) releasing allocations whole-sale (and thereby kill all running 
containers inside) and (b) kill one or more containers running inside an 
allocation *directly* at the RM - this is an existing feature - because the app 
either doesn't want to open N connections to N nodes in the cluster, or simply 
because the NM is not accessible anymore/in-the-interim.
 - *Allocation/container exit notifications*: The AMs will further be 
interested in two separate back-notifications from the RM (a) is the allocation 
itself released completely by the platform - say due to preemption? (b) or has 
one of the containers running inside the allocation exited and so I have to act 
on it? Remember that this is simply a disambiguation of our existing 
container-exit notification mechanism.

h4. Internals
Internally inside the RM too, the state-machine of the allocation itself is 
different from the containers' life-cycle. For e.g., the containers' life-cycle 
determines the completion notifications that we send across to the AMs and only 
the allocation life-cycle impacts scheduling.

h4. Compatibility for existing apps
What is proposed in the doc as well as the way I originally described it, it is 
definitely backwards compatible. Existing applications do not need a single 
line of change. Only newer versions of applications that desire to use the new 
feature have to use newer APIs - something that is not different from any other 
core YARN feature at all.

h4. Changes for apps that want to use the new feature
Even in your proposal, an app/framework that desires to use the new feature 
has to make non-trivial changes in the AM to use this feature correctly:
 - generating containerIDs
 - managing the list of containers running inside an allocation
 - managing the outstanding unused portion of an allocation, and incrementally 
launching more and more containers till the allocation is full
 - Containers running under non-reusable allocations do not need an explicit 
signal to the RM for clean up - apps can simply stop the container on the NM 
and everything else gets automatically taken care of. Apps that start using 
the new feature, on the other hand, will now *have* to also explicitly release 
allocations outside of the life-cycle of the containers.
 - We can optionally add auxiliary flags to inform NMs to auto-reap the 
allocation when the last-container dies - only for apps that are okay with this 
-, but either ways the apps need changes to do this as they intend it.
 - Apps will also have to react differently on container-exit notifications and 
allocation-released/preempted notifications.

Given the points above, I don't think we can get away with just an NM side API 
change.

Depending on how much we have to change the APIs, I am willing to go either way 
on the degree of renames in the API surface area. Inside the code base though, 
I think we are better off calling things what they are.

> De-link container life cycle from an Allocation
> ---
>
> Key: YARN-1040
> URL: https://issues.apache.org/jira/browse/YARN-1040
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
> Attachments: YARN-1040-rough-design.pdf
>
>
> The AM should be able to exec >1 process in a container, rather than have the 
> NM automatically release the container when the single process exits.
> This would let an AM restart a pr

[jira] [Commented] (YARN-4609) RM Nodes list page takes too much time to load

2016-03-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15203477#comment-15203477
 ] 

Hadoop QA commented on YARN-4609:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 13s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
46s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s 
{color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
19s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 35s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 6s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s 
{color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
29s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 23s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 15s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 patch generated 1 new + 1 unchanged - 0 fixed = 2 total (was 1) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 33s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
17s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 72m 5s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_74. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 73m 48s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
17s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 162m 24s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_74 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
| JDK v1.7.0_95 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerPreemption
 |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:fbe3e86 |

[jira] [Updated] (YARN-4699) Scheduler UI and REST o/p is not in sync when -replaceLabelsOnNode is used to change label of a node

2016-03-20 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-4699:
--
Attachment: 0002-YARN-4699.patch

Attaching a new patch with a test case. I had to add a sleep because event 
processing was delayed; I will also see whether I can use a better wait 
mechanism.
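
A minimal sketch of one possible "better wait mechanism", assuming the test can 
poll the state it is waiting for: replace the fixed sleep with 
GenericTestUtils.waitFor from hadoop-common's test utilities. The 
isNodeLabelUpdated check below is a hypothetical placeholder for whatever 
condition the test actually asserts.

{code}
import java.util.concurrent.TimeoutException;

import org.apache.hadoop.test.GenericTestUtils;

import com.google.common.base.Supplier;

public class WaitForLabelUpdateSketch {

  /** Hypothetical view of the state the test observes. */
  interface TestState {
    boolean isNodeLabelUpdated();
  }

  static void waitForLabelUpdate(final TestState state)
      throws TimeoutException, InterruptedException {
    GenericTestUtils.waitFor(new Supplier<Boolean>() {
      @Override
      public Boolean get() {
        // Poll the condition instead of sleeping a fixed amount.
        return state.isNodeLabelUpdated();
      }
    }, 100, 10000); // check every 100 ms, give up after 10 s
  }
}
{code}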

> Scheduler UI and REST o/p is not in sync when -replaceLabelsOnNode is used to 
> change label of a node
> 
>
> Key: YARN-4699
> URL: https://issues.apache.org/jira/browse/YARN-4699
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 2.7.2
>Reporter: Sunil G
>Assignee: Sunil G
>Priority: Critical
> Attachments: 0001-YARN-4699.patch, 0002-YARN-4699.patch, 
> AfterAppFInish-LabelY-Metrics.png, ForLabelX-AfterSwitch.png, 
> ForLabelY-AfterSwitch.png
>
>
> Scenario is as follows:
> a. 2 nodes are available in the cluster (node1 with label "x", node2 with 
> label "y")
> b. Submit an application to node1 for label "x". 
> c. Change node1 label to "y" by using *replaceLabelsOnNode* command.
> d. Verify Scheduler UI for metrics such as "Used Capacity", "Absolute 
> Capacity" etc. "x" still shows some capacity.
> e. Change node1 label back to "x" and verify UI and REST o/p
> Output:
> 1. "Used Capacity", "Absolute Capacity" etc are not decremented once labels 
> is changed for a node.
> 2. UI tab for respective label shows wrong GREEN color in these cases.
> 3. REST o/p is wrong for each label after executing above scenario.
> Attaching screen shots also. This ticket will try to cover UI and REST o/p 
> fix when label is changed runtime.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4809) De-duplicate container completion across schedulers

2016-03-20 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15203416#comment-15203416
 ] 

Sunil G commented on YARN-4809:
---

Yes. That's perfectly fine. Thank You.

> De-duplicate container completion across schedulers
> ---
>
> Key: YARN-4809
> URL: https://issues.apache.org/jira/browse/YARN-4809
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Reporter: Karthik Kambatla
>Assignee: Sunil G
> Attachments: 0001-YARN-4809.patch
>
>
> CapacityScheduler and FairScheduler implement containerCompleted the exact 
> same way. Duplication across the schedulers can be avoided. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4809) De-duplicate container completion across schedulers

2016-03-20 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15203407#comment-15203407
 ] 

Karthik Kambatla commented on YARN-4809:


[~sunilg] - we might want to wait until YARN-3933 gets in, so we can figure out 
the commonalities better. 

> De-duplicate container completion across schedulers
> ---
>
> Key: YARN-4809
> URL: https://issues.apache.org/jira/browse/YARN-4809
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Reporter: Karthik Kambatla
>Assignee: Sunil G
> Attachments: 0001-YARN-4809.patch
>
>
> CapacityScheduler and FairScheduler implement containerCompleted the exact 
> same way. Duplication across the schedulers can be avoided. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3933) Race condition when calling AbstractYarnScheduler.completedContainer.

2016-03-20 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15203404#comment-15203404
 ] 

Karthik Kambatla commented on YARN-3933:


The fix here seems benign and should be okay to get in. 

Should we consider adding this check to SchedulerApplicationAttempt or 
FSAppAttempt so any other callers don't do any damage in the future? 
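
A minimal, self-contained sketch of the kind of check suggested above, assuming 
the attempt itself rejects duplicate completion calls; the class and method 
names are placeholders, not the actual SchedulerApplicationAttempt/FSAppAttempt 
code.

{code}
import java.util.HashSet;
import java.util.Set;

public class DuplicateCompletionGuard {
  private final Set<String> completedContainers = new HashSet<String>();

  /**
   * Returns true only for the first completion of a container id; later
   * (racing) callers get false, so resources are released at most once.
   */
  public synchronized boolean markCompleted(String containerId) {
    return completedContainers.add(containerId);
  }
}
{code}

Callers would then bail out early: {{if (!guard.markCompleted(id)) return;}}.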

> Race condition when calling AbstractYarnScheduler.completedContainer.
> -
>
> Key: YARN-3933
> URL: https://issues.apache.org/jira/browse/YARN-3933
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.6.0, 2.7.0, 2.5.2, 2.7.1
>Reporter: Lavkesh Lahngir
>Assignee: Shiwei Guo
> Attachments: YARN-3933.001.patch, YARN-3933.002.patch, 
> YARN-3933.003.patch
>
>
> In our cluster we are seeing available memory and cores being negative. 
> Initial inspection:
> Scenario no. 1: 
> In the capacity scheduler, the method allocateContainersToNode() checks 
> whether there are excess container reservations for an application; if they 
> are no longer needed it calls queue.completedContainer(), which causes 
> resources to go negative even though the containers were never assigned in 
> the first place. I am still looking through the code. Can somebody suggest 
> how to simulate excess container assignments?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4067) available resource could be set negative

2016-03-20 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15203401#comment-15203401
 ] 

Karthik Kambatla commented on YARN-4067:


IMO, available resource being negative is misleading. Even if we overcommit 
resources, it needs to be transparent to users. This is actually one of the 
goals of YARN-1011. 

With regards to YARN-291, I was under the impression the primary motive of that 
work was to allow modifying the capacity of nodes dynamically. When the 
capacity is reduced on a fully allocated node, we should handle it more 
gracefully. Per YARN-1011 parlance, we should demote these containers to being 
called opportunistic. Sometimes, this might not be possible/allowed and the 
capacity update should fail. We can discuss this more on YARN-291. 

> available resource could be set negative
> 
>
> Key: YARN-4067
> URL: https://issues.apache.org/jira/browse/YARN-4067
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Chang Li
>Assignee: Chang Li
> Attachments: YARN-4067.patch
>
>
> as mentioned in YARN-4045 by [~leftnoteasy], available memory could be 
> negative due to reservation, propose to use componentwiseMax to 
> updateQueueStatistics in order to cap negative value to zero
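
A minimal sketch of the componentwiseMax idea proposed above, assuming the 
available resource is computed as limit minus used; illustrative only, not the 
actual queue-statistics change.

{code}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

public class AvailableResourceCap {
  static Resource cappedAvailable(Resource limit, Resource used) {
    Resource available = Resources.subtract(limit, used);
    // Reservations can push 'available' below zero; cap each component at 0.
    return Resources.componentwiseMax(available, Resources.none());
  }
}
{code}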



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4732) *ProcessTree classes have too many whitespace issues

2016-03-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15203396#comment-15203396
 ] 

Hudson commented on YARN-4732:
--

FAILURE: Integrated in Hadoop-trunk-Commit #9479 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9479/])
YARN-4732. *ProcessTree classes have too many whitespace issues (kasha: rev 
7fae4c68e6d599d0c01bb2cb2c8d5e52925b3e1e)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ProcfsBasedProcessTree.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestWindowsBasedProcessTree.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestResourceCalculatorProcessTree.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/WindowsBasedProcessTree.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/util/ProcessTree.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestProcfsBasedProcessTree.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ResourceCalculatorProcessTree.java


> *ProcessTree classes have too many whitespace issues
> 
>
> Key: YARN-4732
> URL: https://issues.apache.org/jira/browse/YARN-4732
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Karthik Kambatla
>Assignee: Gabor Liptak
>Priority: Trivial
>  Labels: newbie
> Fix For: 2.9.0
>
> Attachments: YARN-4732.1.patch
>
>
> *ProcessTree classes have too many whitespace issues - extra newlines between 
> methods, spaces in empty lines etc. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4732) *ProcessTree classes have too many whitespace issues

2016-03-20 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-4732:
---
Release Note: 



  was:YARN-4732 Cleanup whitespace issues in *ProcessTree classes


> *ProcessTree classes have too many whitespace issues
> 
>
> Key: YARN-4732
> URL: https://issues.apache.org/jira/browse/YARN-4732
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Karthik Kambatla
>Assignee: Gabor Liptak
>Priority: Trivial
>  Labels: newbie
> Attachments: YARN-4732.1.patch
>
>
> *ProcessTree classes have too many whitespace issues - extra newlines between 
> methods, spaces in empty lines etc. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4576) Enhancement for tracking Blacklist in AM Launching

2016-03-20 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15203371#comment-15203371
 ] 

Sunil G commented on YARN-4576:
---

bq.When YARN detects possible failures, it should blacklist nodes within the 
app (from Sangjin Lee). If AM container of an app fails on a node because of 
node-specific reasons, other containers of the app could fail with the same 
reason. But we shouldn't spread it to other apps because different app has 
different settings. We can do this unless we're confident enough that the two 
apps are very similar in configs.

>> There may be one more scenario here: when the application's 2nd or further 
>> app attempt's master container is launched. If this new container launch 
>> lands on the same faulty node (where the first attempt failed), the 
>> application can fail. I have seen a few situations like this. The main 
>> problem statement for that scenario looks like this: "the AM container 
>> launch failed because of some environment issue on this node, so it is 
>> better to run this AM on another node". To address this case, the container 
>> exit code is a better candidate, though I agree it does not by itself cover 
>> 100% of cases or always point to the exact failure. If we could at least 
>> handle the scenario above, it would be good. [~wangda tan], thoughts?

> Enhancement for tracking Blacklist in AM Launching
> --
>
> Key: YARN-4576
> URL: https://issues.apache.org/jira/browse/YARN-4576
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Critical
> Attachments: EnhancementAMLaunchingBlacklist.pdf
>
>
> Before YARN-2005, YARN's blacklist mechanism tracked bad nodes from the AM 
> side: if containers the AM tried to launch on a specific node failed several 
> times, the AM would blacklist that node in future resource requests. This 
> mechanism works fine for normal containers. However, from our observation of 
> several clusters: if a problematic node fails to launch an AM, the RM could 
> pick the same problematic node for the next AM attempts again and again, 
> causing application failure when other functional nodes are busy. In the 
> normal case, the customized health-checker script is not sensitive enough to 
> mark a node unhealthy when only one or two containers fail to launch. 
> After YARN-2005, we can have a BlacklistManager in each RMApp, so nodes on 
> which AM attempts for a specific application failed before will get 
> blacklisted. To avoid the risk of all nodes being blacklisted by the 
> BlacklistManager, a disable-failure-threshold stops adding more nodes to the 
> blacklist once a certain ratio is hit. 
> There are already some enhancements to this AM blacklist mechanism: 
> YARN-4284 addresses the wider case of AM container launch failures, and 
> YARN-4389 makes the configuration settings changeable per app to meet 
> app-specific requirements. However, there are still several gaps to address 
> more scenarios:
> 1. We may need a global blacklist instead of each app maintaining a separate 
> one. The reason is: an AM has a higher chance of failing if other AMs have 
> already failed on the same node. A quick example: in a busy cluster, all 
> nodes are busy except two problematic nodes, node a and node b; app1 has 
> already submitted and failed two AM attempts on a and b. app2 and other apps 
> should wait for the other busy nodes rather than waste attempts on these two 
> problematic nodes.
> 2. If an AM container failure is recognized as a global event instead of the 
> app's own issue, we should consider making the blacklist not permanent but 
> bounded by a specific time window. 
> 3. We could have user-defined blacklist policies to address more possible 
> cases and scenarios, so it is reasonable to make the blacklist policy 
> pluggable.
> 4. For some test scenarios, we could have a whitelist mechanism for AM 
> launching.
> 5. Some minor issues: it sounds like NM reconnect does not refresh the 
> blacklist so far.
> Will try to address all issues here.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4839) ResourceManager deadlock between RMAppAttemptImpl and SchedulerApplicationAttempt

2016-03-20 Thread Jason Lowe (JIRA)
Jason Lowe created YARN-4839:


 Summary: ResourceManager deadlock between RMAppAttemptImpl and 
SchedulerApplicationAttempt
 Key: YARN-4839
 URL: https://issues.apache.org/jira/browse/YARN-4839
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.8.0
Reporter: Jason Lowe
Priority: Blocker


Hit a deadlock in the ResourceManager as one thread was holding the 
SchedulerApplicationAttempt lock and trying to call 
RMAppAttemptImpl.getMasterContainer while another thread had the 
RMAppAttemptImpl lock and was trying to call 
SchedulerApplicationAttempt.getResourceUsageReport.
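
To make the failure mode concrete, here is a generic illustration of the 
lock-ordering inversion described above; the field and method names are 
placeholders, not the actual RM classes.

{code}
public class LockOrderingDeadlock {
  private final Object attemptLock = new Object();   // stands in for RMAppAttemptImpl
  private final Object schedulerLock = new Object(); // stands in for SchedulerApplicationAttempt

  void buildUsageReport() {          // thread 1
    synchronized (schedulerLock) {
      synchronized (attemptLock) {   // needs the attempt (getMasterContainer)
        // ...
      }
    }
  }

  void handleAttemptEvent() {        // thread 2
    synchronized (attemptLock) {
      synchronized (schedulerLock) { // needs the attempt's scheduler state
        // ...
      }
    }
  }
  // If both methods run concurrently, each thread can take its first lock and
  // then block forever waiting for the other's: a classic deadlock.
}
{code}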



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4636) Make blacklist tracking policy pluggable for more extensions.

2016-03-20 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15203367#comment-15203367
 ] 

Sunil G commented on YARN-4636:
---

As YARN improves its blacklist/whitelist node functionality, one of the major 
use cases from our end is to keep the second/further AM container launch 
attempts away from the same failed node (when the failure was caused by 
external environment/memory issues on that node). This can really help us. 
With YARN-2005, we have a mechanism in hand, though there were concerns about 
its strict behavior. The proposal made in YARN-4837 helps straighten things 
out for the immediate 2.8 release.

I think YARN-4576 was trying to improve on the current YARN-2005 and to 
generalize it. Going forward, if we are planning global blacklisting based on 
various types of container exit codes, then a pluggable policy can be helpful, 
assuming we may have different types of apps. For this scenario we do not have 
use cases from our end; I also checked with [~rohithsharma] and 
[~Naganarasimha Garla] on this. It will be good if we can discuss/retrospect 
more on *global blacklisting* and its advantages/limitations based on the 
information currently available from container exit codes.
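
A hypothetical sketch of what a pluggable blacklist policy could look like, 
purely to make the discussion concrete; none of these names exist in YARN 
today, and the real shape would come out of YARN-4576/YARN-4636.

{code}
import java.util.Set;

public interface BlacklistPolicy {
  /** Called when an AM container exits; exitStatus is the container exit code. */
  void onAMContainerExit(String nodeId, int exitStatus);

  /** Nodes the scheduler should currently avoid for AM placement. */
  Set<String> getNodesToAvoid();

  /** Periodic hook so time-windowed policies can expire old entries. */
  void refresh(long nowMillis);
}
{code}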

> Make blacklist tracking policy pluggable for more extensions.
> -
>
> Key: YARN-4636
> URL: https://issues.apache.org/jira/browse/YARN-4636
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Junping Du
>Assignee: Sunil G
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4809) De-duplicate container completion across schedulers

2016-03-20 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-4809:
--
Attachment: 0001-YARN-4809.patch

Sharing an initial version of patch. Kindly help to check the same.

> De-duplicate container completion across schedulers
> ---
>
> Key: YARN-4809
> URL: https://issues.apache.org/jira/browse/YARN-4809
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Reporter: Karthik Kambatla
>Assignee: Sunil G
> Attachments: 0001-YARN-4809.patch
>
>
> CapacityScheduler and FairScheduler implement containerCompleted the exact 
> same way. Duplication across the schedulers can be avoided. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4609) RM Nodes list page takes too much time to load

2016-03-20 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-4609:
---
Attachment: 0002-YARN-4609.patch

> RM Nodes list page takes too much time to load
> --
>
> Key: YARN-4609
> URL: https://issues.apache.org/jira/browse/YARN-4609
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Attachments: 0001-YARN-4609.patch, 0002-YARN-4609.patch, 7k 
> Nodes.png, sls-jobs.json, sls-nodes.json
>
>
> Configure SLS with 1 NM Nodes
> Check the time taken to load Nodes page
> For loading 10 k Nodes it takes *30 sec*
>  /cluster/nodes
> Chrome :Version 47.0.2526.106 m



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4812) TestFairScheduler#testContinuousScheduling fails intermittently

2016-03-20 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15198530#comment-15198530
 ] 

Robert Kanter commented on YARN-4812:
-

+1

> TestFairScheduler#testContinuousScheduling fails intermittently
> ---
>
> Key: YARN-4812
> URL: https://issues.apache.org/jira/browse/YARN-4812
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-4812-1.patch
>
>
> This test has failed in the past, and there seem to be more issues. 
> {noformat}
> java.lang.AssertionError: expected:<2> but was:<1>
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at org.junit.Assert.assertEquals(Assert.java:555)
> at org.junit.Assert.assertEquals(Assert.java:542)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler.testContinuousScheduling(TestFairScheduler.java:3816)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4108) CapacityScheduler: Improve preemption to preempt only those containers that would satisfy the incoming request

2016-03-20 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15198436#comment-15198436
 ] 

Eric Payne commented on YARN-4108:
--

Thanks, [~leftnoteasy]. The patch looks good to me.

> CapacityScheduler: Improve preemption to preempt only those containers that 
> would satisfy the incoming request
> --
>
> Key: YARN-4108
> URL: https://issues.apache.org/jira/browse/YARN-4108
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-4108-design-doc-V3.pdf, 
> YARN-4108-design-doc-v1.pdf, YARN-4108-design-doc-v2.pdf, YARN-4108.1.patch, 
> YARN-4108.10.patch, YARN-4108.11.patch, YARN-4108.2.patch, YARN-4108.3.patch, 
> YARN-4108.4.patch, YARN-4108.5.patch, YARN-4108.6.patch, YARN-4108.7.patch, 
> YARN-4108.8.patch, YARN-4108.9.patch, YARN-4108.poc.1.patch, 
> YARN-4108.poc.2-WIP.patch, YARN-4108.poc.3-WIP.patch, 
> YARN-4108.poc.4-WIP.patch
>
>
> This is sibling JIRA for YARN-2154. We should make sure container preemption 
> is more effective.
> *Requirements:*:
> 1) Can handle case of user-limit preemption
> 2) Can handle case of resource placement requirements, such as: hard-locality 
> (I only want to use rack-1) / node-constraints (YARN-3409) / black-list (I 
> don't want to use rack1 and host\[1-3\])
> 3) Can handle preemption within a queue: cross user preemption (YARN-2113), 
> cross applicaiton preemption (such as priority-based (YARN-1963) / 
> fairness-based (YARN-3319)).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4062) Add the flush and compaction functionality via coprocessors and scanners for flow run table

2016-03-20 Thread Vrushali C (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vrushali C updated YARN-4062:
-
Attachment: YARN-4062-YARN-2928.09.patch


Attaching v9. Thanks [~sjlee0] for the debugging help that fixed the issue. 

> Add the flush and compaction functionality via coprocessors and scanners for 
> flow run table
> ---
>
> Key: YARN-4062
> URL: https://issues.apache.org/jira/browse/YARN-4062
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4062-YARN-2928.04.patch, 
> YARN-4062-YARN-2928.05.patch, YARN-4062-YARN-2928.06.patch, 
> YARN-4062-YARN-2928.07.patch, YARN-4062-YARN-2928.08.patch, 
> YARN-4062-YARN-2928.09.patch, YARN-4062-YARN-2928.1.patch, 
> YARN-4062-feature-YARN-2928.01.patch, YARN-4062-feature-YARN-2928.02.patch, 
> YARN-4062-feature-YARN-2928.03.patch
>
>
> As part of YARN-3901, coprocessor and scanner is being added for storing into 
> the flow_run table. It also needs a flush & compaction processing in the 
> coprocessor and perhaps a new scanner to deal with the data during flushing 
> and compaction stages. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4831) Recovered containers will be killed after NM stateful restart

2016-03-20 Thread Siqi Li (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15197895#comment-15197895
 ] 

Siqi Li commented on YARN-4831:
---

When the NM does a stateful restart, the ContainerManagerImpl will try to 
recover applications and containers, and then send out an 
ApplicationFinishEvent to apps that are in 
appsState.getFinishedApplications().

The ApplicationFinishEvent could cause newly recovered containers to 
transition from NEW to DONE via KillOnNewTransition.
We could add an additional check in KillOnNewTransition to avoid killing 
containers that have already completed.
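
A minimal sketch of the check proposed above, not the actual NM transition 
code: consult the recovered state before treating a NEW container as something 
to kill, and leave alone containers that already completed before the restart.

{code}
public class KillOnNewCheckSketch {

  /** Hypothetical view of what the NM recovered for this container. */
  interface RecoveredContainerState {
    boolean isCompletedBeforeRestart();
  }

  /**
   * Returns true if the kill-on-NEW path should proceed; recovered containers
   * that already finished are skipped instead of being re-killed.
   */
  static boolean shouldKillOnNew(RecoveredContainerState recovered) {
    if (recovered != null && recovered.isCompletedBeforeRestart()) {
      return false; // already done; do not finish it again as "Killed"
    }
    return true;
  }
}
{code}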

> Recovered containers will be killed after NM stateful restart 
> --
>
> Key: YARN-4831
> URL: https://issues.apache.org/jira/browse/YARN-4831
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Siqi Li
>
> {code}
> 2016-03-04 19:43:48,130 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
>  Container container_1456335621285_0040_01_66 transitioned from NEW to 
> DONE
> 2016-03-04 19:43:48,130 INFO 
> org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=henkins-service 
>OPERATION=Container Finished - Killed   TARGET=ContainerImpl
> RESULT=SUCCESS  APPID=application_1456335621285_0040
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4766) NM should not aggregate logs older than the retention policy

2016-03-20 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-4766:
-
Attachment: yarn4766.003.patch

> NM should not aggregate logs older than the retention policy
> 
>
> Key: YARN-4766
> URL: https://issues.apache.org/jira/browse/YARN-4766
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: log-aggregation, nodemanager
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: yarn4766.001.patch, yarn4766.002.patch, 
> yarn4766.003.patch
>
>
> When log aggregation fails on the NM, the information for the attempt is 
> kept in the recovery DB. Log aggregation can fail for multiple reasons, 
> often related to HDFS space or permissions.
> On restart the recovery DB is read, and if an application attempt needs its 
> logs aggregated, the files are scheduled for aggregation without any checks. 
> The log files could be older than the retention limit, in which case we 
> should not aggregate them but immediately mark them for deletion from the 
> local file system.
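
A minimal sketch of the age check described above, assuming the retention 
window is read from yarn.log-aggregation.retain-seconds; illustrative only, 
not the actual log-aggregation change.

{code}
import java.io.File;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class RetentionCheckSketch {
  static boolean isOlderThanRetention(File logFile, Configuration conf) {
    long retainSeconds = conf.getLong(
        YarnConfiguration.LOG_AGGREGATION_RETAIN_SECONDS,
        YarnConfiguration.DEFAULT_LOG_AGGREGATION_RETAIN_SECONDS);
    if (retainSeconds < 0) {
      return false; // retention disabled; aggregate as usual
    }
    long ageMillis = System.currentTimeMillis() - logFile.lastModified();
    return ageMillis > retainSeconds * 1000L; // too old: delete, don't upload
  }
}
{code}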



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4766) NM should not aggregate logs older than the retention policy

2016-03-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15200863#comment-15200863
 ] 

Hadoop QA commented on YARN-4766:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 20s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
36s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 50s 
{color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 5s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
35s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 57s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
24s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
58s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 45s 
{color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 55s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
49s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 49s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 1m 49s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 49s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 2s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 2m 2s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 2s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 32s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn: patch generated 1 new + 
123 unchanged - 2 fixed = 124 total (was 125) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 53s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
23s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
19s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 44s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 51s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 54s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.8.0_74. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 3s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.8.0_74. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 9s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 35s 
{color} | {color:green} hadoop-yarn-server-nodemanag

[jira] [Updated] (YARN-4831) Recovered containers will be killed after NM stateful restart

2016-03-20 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated YARN-4831:
--
Attachment: YARN-4831.v1.patch

> Recovered containers will be killed after NM stateful restart 
> --
>
> Key: YARN-4831
> URL: https://issues.apache.org/jira/browse/YARN-4831
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Siqi Li
> Attachments: YARN-4831.v1.patch
>
>
> {code}
> 2016-03-04 19:43:48,130 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
>  Container container_1456335621285_0040_01_66 transitioned from NEW to 
> DONE
> 2016-03-04 19:43:48,130 INFO 
> org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=henkins-service 
>OPERATION=Container Finished - Killed   TARGET=ContainerImpl
> RESULT=SUCCESS  APPID=application_1456335621285_0040
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4609) RM Nodes list page takes too much time to load

2016-03-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15203247#comment-15203247
 ] 

Hadoop QA commented on YARN-4609:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 20s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
56s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s 
{color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
18s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 35s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 6s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s 
{color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
31s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 16s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 patch generated 1 new + 1 unchanged - 0 fixed = 2 total (was 1) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 32s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
1s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
14s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 72m 23s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_74. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 74m 7s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
19s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 163m 30s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_74 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
|   | hadoop.yarn.server.resourcemanager.webapp.TestNodesPage |
| JDK v1.7.0_95 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.Test

[jira] [Assigned] (YARN-3773) hadoop-yarn-server-nodemanager's use of Linux /sbin/tc is non-portable

2016-03-20 Thread Alan Burlison (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Burlison reassigned YARN-3773:
---

Assignee: Alan Burlison

> hadoop-yarn-server-nodemanager's use of Linux /sbin/tc is non-portable
> --
>
> Key: YARN-3773
> URL: https://issues.apache.org/jira/browse/YARN-3773
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
> Environment: BSD OSX Solaris Windows Linux
>Reporter: Alan Burlison
>Assignee: Alan Burlison
>
> hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c
>  makes use of the Linux-only executable /sbin/tc 
> (http://lartc.org/manpages/tc.txt)  but there is no corresponding 
> functionality for non-Linux platforms. The code in question also seems to try 
> to execute tc even on platforms where it will never exist.
> Other platforms provide similar functionality, e.g. Solaris has an extensive 
> range of network management features 
> (http://www.oracle.com/technetwork/articles/servers-storage-admin/o11-095-s11-app-traffic-525038.html).
>  Work is needed to abstract the network management features of Yarn so that 
> the same facilities for network management can be provided on all platforms 
> that provide the requisite functionality,
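
A hypothetical sketch of the kind of abstraction the description asks for; 
none of these types exist in the NodeManager today, and the backends named in 
the comment are placeholders.

{code}
/**
 * Hypothetical interface: put traffic shaping behind an abstraction so a
 * platform-specific backend (Linux /sbin/tc, Solaris dladm/flowadm, or a
 * no-op) can be selected at runtime instead of exec'ing tc unconditionally.
 */
public interface TrafficControlProvider {
  /** Whether this platform can shape container network traffic at all. */
  boolean isSupported();

  /** Limit egress bandwidth on a device; the backend decides how. */
  void setEgressRate(String device, long bitsPerSecond) throws Exception;

  /** Remove any shaping rules previously installed on the device. */
  void reset(String device) throws Exception;
}
{code}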



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4837) User facing aspects of 'AM blacklisting' feature need fixing

2016-03-20 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15201895#comment-15201895
 ] 

Vinod Kumar Vavilapalli commented on YARN-4837:
---

[~sunilg] and [~sjlee0],

Appreciate your feedback.
 - Yes, AMs going to 'bad' nodes again and again and failing is a real problem. 
There are multiple reasons as to why this happens.
-- It is true we cannot enumerate *all* the reasons.
-- It is also true that we have some reasons that we *can* already deal 
with explicitly.
 - The primary reason for this JIRA is that I actually don't believe that users 
need explicit control *today* over how AM scheduling behaves on faults (i.e., 
[~sunilg]'s agreement above - "agreeing to your point and its early for user to 
take blacklisting decisions w/o having much needed/useful information")
 - Like I also mentioned, it is misnamed too. So, let me just call it 
_AM-container-scheduling_ for the time being.

h4. Modified proposal
So how about we
 - Completely keep _AM-container-scheduling_ inside the ResourceManager and 
don't expose any user-APIs to skip-nodes
 - Explicitly treat known exit-codes:
|DISKS_FAILED| node is already unhealthy, no need for any skipping nodes|
|PREEMPTED, KILLED_BY_RESOURCEMANAGER, KILLED_AFTER_APP_COMPLETION| Not the app 
or the system's fault, it's by design, no need for skipping nodes|
|KILLED_EXCEEDED_VMEM, KILLED_EXCEEDED_PMEM| No point in skipping the node as 
it's not the system's fault|
|KILLED_BY_APPMASTER|Cannot happen for AM container|
|All other non-zero codes|Need some action|
 - And book-keep all other failure cases and do soft-skipping *only* on the 
server-side. By this I refer to something similar to node->rack locality 
progression - avoid this node for a few scheduling opportunities and then come 
back to it after waiting out enough time. This way no node gets locked out, nor 
does any app get stuck.

If we just do this, we will take care of our most important problem - apps 
getting affected due to AMs going repeatedly to the same places. And we also 
(a) won't force our users to make these decisions already, without really 
understanding how, and (b) won't introduce the bad problems of 'blacklisting' 
that exist today - e.g. YARN-4685.
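
A rough sketch of the exit-code handling in the modified proposal above, purely 
to make the table concrete; the action enum and method are hypothetical, and 
the real logic would live inside the RM's AM-container-scheduling path.

{code}
import org.apache.hadoop.yarn.api.records.ContainerExitStatus;

public class AMExitCodeClassifier {

  enum Action { NO_SKIP, SOFT_SKIP_NODE }

  static Action classify(int exitStatus) {
    switch (exitStatus) {
      case ContainerExitStatus.DISKS_FAILED:              // node already unhealthy
      case ContainerExitStatus.PREEMPTED:                 // by design
      case ContainerExitStatus.KILLED_BY_RESOURCEMANAGER:
      case ContainerExitStatus.KILLED_AFTER_APP_COMPLETION:
      case ContainerExitStatus.KILLED_EXCEEDED_VMEM:      // app's own fault
      case ContainerExitStatus.KILLED_EXCEEDED_PMEM:
      case ContainerExitStatus.KILLED_BY_APPMASTER:       // cannot happen for an AM
        return Action.NO_SKIP;
      default:
        // All other non-zero codes: book-keep and softly avoid the node for a
        // few scheduling opportunities, then come back to it.
        return exitStatus == ContainerExitStatus.SUCCESS
            ? Action.NO_SKIP : Action.SOFT_SKIP_NODE;
    }
  }
}
{code}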

h4. 2.8.0
Even if we don't yet reach the consensus on the above or a similar proposal, I 
feel strongly that we should remove these user-facing configs / APIs from 2.8.0.

Thoughts?

/cc [~vvasudev], [~jianhe], [~wangda] who may not be looking at this.

> User facing aspects of 'AM blacklisting' feature need fixing
> 
>
> Key: YARN-4837
> URL: https://issues.apache.org/jira/browse/YARN-4837
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>
> Was reviewing the user-facing aspects that we are releasing as part of 2.8.0.
> Looking at the 'AM blacklisting feature', I see several things to be fixed 
> before we release it in 2.8.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-4838) TestLogAggregationService. testLocalFileDeletionOnDiskFull failed

2016-03-20 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen reassigned YARN-4838:


Assignee: Haibo Chen

> TestLogAggregationService. testLocalFileDeletionOnDiskFull failed
> -
>
> Key: YARN-4838
> URL: https://issues.apache.org/jira/browse/YARN-4838
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: log-aggregation
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService
> testLocalFileDeletionOnDiskFull failed
> java.lang.AssertionError: null
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertFalse(Assert.java:64)
>   at org.junit.Assert.assertFalse(Assert.java:74)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.verifyLocalFileDeletion(TestLogAggregationService.java:232)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService.testLocalFileDeletionOnDiskFull(TestLogAggregationService.java:288)
> The failure is caused by a timing issue in DeletionService. DeletionService 
> runs its own thread pool to delete files. When the verifyLocalFileDeletion() 
> method checks file existence, it is possible that the FileDeletionTask has 
> not yet been executed by the thread pool in DeletionService.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4833) For Queue AccessControlException client retries multiple times on both RM

2016-03-20 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15203211#comment-15203211
 ] 

Bibin A Chundatt commented on YARN-4833:


[~sunilg]
Thank you for looking into the issue. 

Actually, by point 2 I was thinking of throwing RPC.getRemoteException, which 
is a YarnException.
Will try the same and upload a patch soon.
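
A minimal sketch of the idea above, assuming "RPC.getRemoteException" refers to 
org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException: surface the ACL failure 
as a YarnException so the client treats it as a terminal error instead of 
failing over and retrying. Illustrative only, not the actual RMAppManager 
change.

{code}
import org.apache.hadoop.security.AccessControlException;
import org.apache.hadoop.yarn.exceptions.YarnException;
import org.apache.hadoop.yarn.ipc.RPCUtil;

public class QueueAclCheckSketch {
  static void checkSubmitAccess(boolean hasAccess, String user, String queue)
      throws YarnException {
    if (!hasAccess) {
      // Wrap the ACL failure in a YarnException instead of letting the raw
      // AccessControlException (an IOException) trigger client failover.
      throw RPCUtil.getRemoteException(new AccessControlException(
          "User " + user + " cannot submit applications to queue " + queue));
    }
  }
}
{code}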

> For Queue AccessControlException client retries multiple times on both RM
> -
>
> Key: YARN-4833
> URL: https://issues.apache.org/jira/browse/YARN-4833
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>
> Submit an application to a queue where ACLs are enabled and the submitting 
> user does not have access. The client retries up to failMaxattempt (10 times).
> {noformat}
> 16/03/18 10:01:06 INFO retry.RetryInvocationHandler: Exception while invoking 
> submitApplication of class ApplicationClientProtocolPBClientImpl over rm1. 
> Trying to fail over immediately.
> org.apache.hadoop.security.AccessControlException: User hdfs does not have 
> permission to submit application_1458273884145_0001 to queue default
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:380)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:291)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:618)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:252)
> at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:483)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:637)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2360)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2356)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1742)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2356)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
> at 
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
> at 
> org.apache.hadoop.yarn.ipc.RPCUtil.instantiateIOException(RPCUtil.java:80)
> at 
> org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:119)
> at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.submitApplication(ApplicationClientProtocolPBClientImpl.java:272)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:257)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
> at com.sun.proxy.$Proxy23.submitApplication(Unknown Source)
> at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(YarnClientImpl.java:261)
> at 
> org.apache.hadoop.mapred.ResourceMgrDelegate.submitApplication(ResourceMgrDelegate.java:295)
> at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:301)
> at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:244)
> at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1341)
> at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1338)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1742)
> at org.apache.hadoop.mapreduce.Job.submit(Job.java:1338)
> at org.apache.hadoop.mapreduce.Job.waitForC

[jira] [Assigned] (YARN-4746) yarn web services should convert parse failures of appId to 400

2016-03-20 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt reassigned YARN-4746:
--

Assignee: Bibin A Chundatt

> yarn web services should convert parse failures of appId to 400
> ---
>
> Key: YARN-4746
> URL: https://issues.apache.org/jira/browse/YARN-4746
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Assignee: Bibin A Chundatt
>Priority: Minor
> Attachments: 0001-YARN-4746.patch, 0002-YARN-4746.patch, 
> 0003-YARN-4746.patch, 0003-YARN-4746.patch, 0004-YARN-4746.patch
>
>
> I'm seeing somewhere in the WS API tests of mine an error with exception 
> conversion of a bad app ID sent in as an argument to a GET. I know it's in 
> ATS, but a scan of the core RM web services implies the same problem.
> {{WebServices.parseApplicationId()}} uses {{ConverterUtils.toApplicationId}} 
> to convert an argument; this throws IllegalArgumentException, which is then 
> handled somewhere by jetty as a 500 error.
> In fact, it's a bad argument, which should be handled by returning a 400. 
> This can be done by catching the raised exception and explicitly converting 
> it.
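
A minimal sketch of the fix described above: catch the parse failure and 
rethrow it as YARN's webapp BadRequestException so the response is a 400 
rather than a 500. The surrounding method is illustrative, not the actual 
WebServices code.

{code}
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.util.ConverterUtils;
import org.apache.hadoop.yarn.webapp.BadRequestException;

public class AppIdParsingSketch {
  static ApplicationId parseApplicationId(String appId) {
    if (appId == null || appId.isEmpty()) {
      throw new BadRequestException("appId, " + appId + ", is empty or null");
    }
    try {
      return ConverterUtils.toApplicationId(appId);
    } catch (IllegalArgumentException e) {
      // Bad input from the caller: report 400, not an internal server error.
      throw new BadRequestException("Invalid application id: " + appId);
    }
  }
}
{code}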



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)



[jira] [Updated] (YARN-4609) RM Nodes list page takes too much time to load

2016-03-20 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-4609:
---
Attachment: 7k Nodes.png

[~rohithsharma]/[~devaraj.k]
Please check the current implementation: with 7K nodes the time taken is ~1-2 secs.

> RM Nodes list page takes too much time to load
> --
>
> Key: YARN-4609
> URL: https://issues.apache.org/jira/browse/YARN-4609
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Attachments: 0001-YARN-4609.patch, 7k Nodes.png, sls-jobs.json, 
> sls-nodes.json
>
>
> Configure SLS with 1 NM Nodes
> Check the time taken to load Nodes page
> For loading 10 k Nodes it takes *30 sec*
>  /cluster/nodes
> Chrome :Version 47.0.2526.106 m



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4609) RM Nodes list page takes too much time to load

2016-03-20 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-4609:
---
Attachment: sls-nodes.json
sls-jobs.json

> RM Nodes list page takes too much time to load
> --
>
> Key: YARN-4609
> URL: https://issues.apache.org/jira/browse/YARN-4609
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Attachments: 0001-YARN-4609.patch, sls-jobs.json, sls-nodes.json
>
>
> Configure SLS with 1 NM Nodes
> Check the time taken to load Nodes page
> For loading 10 k Nodes it takes *30 sec*
>  /cluster/nodes
> Chrome :Version 47.0.2526.106 m



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4609) RM Nodes list page takes too much time to load

2016-03-20 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-4609:
---
Attachment: 0001-YARN-4609.patch

Attaching patch. Please review.

> RM Nodes list page takes too much time to load
> --
>
> Key: YARN-4609
> URL: https://issues.apache.org/jira/browse/YARN-4609
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Attachments: 0001-YARN-4609.patch
>
>
> Configure SLS with 1 NM Nodes
> Check the time taken to load Nodes page
> For loading 10 k Nodes it takes *30 sec*
>  /cluster/nodes
> Chrome :Version 47.0.2526.106 m



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4736) Issues with HBaseTimelineWriterImpl

2016-03-20 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15198175#comment-15198175
 ] 

Sangjin Lee commented on YARN-4736:
---

Yes, sorry I meant HBASE-15436.

I think we're more OK with the situation where the entire HBase cluster is down 
or the master is down. That's a critical situation, and all bets are off at 
that point.

My concern is more if one region server went down or is in a state where it 
times out writes and your {{BufferedMutatorImpl}} needs to flush to it. If that 
flush operation times out after 30+ minutes, that would be a significant 
problem. [~anoop.hbase], would things take 30+ minutes to time out if a region 
server (rather than the cluster itself) is down or misbehaving? Thoughts?
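
One hedged option (not something decided in this thread) for bounding the 
worst-case flush latency is to tighten the HBase client retry/timeout settings 
on the connection the writer creates; the values below are placeholders, and 
the property names are standard HBase client keys.

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class BoundedHBaseClientConf {
  static Configuration create(Configuration base) {
    Configuration conf = HBaseConfiguration.create(base);
    conf.setInt("hbase.client.retries.number", 7);          // fewer retries
    conf.setLong("hbase.client.pause", 1000L);              // 1 s base backoff
    conf.setInt("hbase.rpc.timeout", 60000);                // 60 s per RPC
    conf.setInt("hbase.client.operation.timeout", 300000);  // 5 min per operation
    return conf;
  }
}
{code}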

> Issues with HBaseTimelineWriterImpl
> ---
>
> Key: YARN-4736
> URL: https://issues.apache.org/jira/browse/YARN-4736
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Naganarasimha G R
>Assignee: Vrushali C
>Priority: Critical
>  Labels: yarn-2928-1st-milestone
> Attachments: NM_Hang_hbase1.0.3.tar.gz, hbaseException.log, 
> threaddump.log
>
>
> Faced some issues while running ATSv2 in a single-node Hadoop cluster, with 
> HBase (embedded ZooKeeper) launched on the same node.
> # Due to some NPE issues I could see the NM trying to shut down, but the NM 
> daemon process did not complete the shutdown due to the locks.
> # Got some HBase-related exceptions after the application finished execution 
> successfully. 
> Will attach logs and the trace for the same.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3933) Race condition when calling AbstractYarnScheduler.completedContainer.

2016-03-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15203136#comment-15203136
 ] 

Hadoop QA commented on YARN-3933:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 12s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
50s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s 
{color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
20s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 36s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
12s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s 
{color} | {color:green} trunk passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
32s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 28s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 33s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
24s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 68m 17s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_74. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 69m 2s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
21s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 155m 39s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_74 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
| JDK v1.7.0_95 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:fbe3e86 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12794386/YARN-3933.003.patch |
| JIRA Issue | YARN-3933 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs