[jira] [Updated] (YARN-6770) [Docs] A small mistake in the example of TimelineClient

2017-07-24 Thread Akira Ajisaka (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated YARN-6770:

Fix Version/s: (was: 2.9)
   (was: 2.8.3)
   2.8.2
   2.9.0

> [Docs] A small mistake in the example of TimelineClient
> ---
>
> Key: YARN-6770
> URL: https://issues.apache.org/jira/browse/YARN-6770
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: docs
>Reporter: Jinjiang Ling
>Assignee: Jinjiang Ling
>Priority: Trivial
>  Labels: newbie
> Fix For: 2.9.0, 3.0.0-beta1, 2.8.2
>
> Attachments: YARN-6770.patch
>
>
> I'm trying to use the timeline client, so I copied the 
> [example|http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/TimelineServer.html#Publishing_of_application_specific_data]
>  into my application.
> But there is a small mistake in it:
> {quote}
> myDomain.*_setID_*("MyDomain");
> .
> myEntity.*_setEntityID_*("MyApp1")
> {quote}
> The correct one should be 
> {quote}
> myDomain.*_setId_*("MyDomain");
> .
> myEntity._*setEntityId*_("MyApp1");
> {quote}
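
For anyone copying from the docs, here is a minimal, self-contained sketch of the corrected calls using the ATS v1 client API ({{TimelineClient}}, {{TimelineDomain}}, {{TimelineEntity}}); configuration and error handling are kept to a minimum.
{code:java}
import org.apache.hadoop.yarn.api.records.timeline.TimelineDomain;
import org.apache.hadoop.yarn.api.records.timeline.TimelineEntity;
import org.apache.hadoop.yarn.api.records.timeline.TimelinePutResponse;
import org.apache.hadoop.yarn.client.api.TimelineClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class TimelineClientExample {
  public static void main(String[] args) throws Exception {
    TimelineClient client = TimelineClient.createTimelineClient();
    client.init(new YarnConfiguration());
    client.start();
    try {
      // Note the lower-case "d": setId(), not setID().
      TimelineDomain myDomain = new TimelineDomain();
      myDomain.setId("MyDomain");
      client.putDomain(myDomain);

      // Likewise setEntityId(), not setEntityID().
      TimelineEntity myEntity = new TimelineEntity();
      myEntity.setDomainId(myDomain.getId());
      myEntity.setEntityType("APPLICATION");
      myEntity.setEntityId("MyApp1");
      TimelinePutResponse response = client.putEntities(myEntity);
      System.out.println("put errors: " + response.getErrors().size());
    } finally {
      client.stop();
    }
  }
}
{code}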






[jira] [Updated] (YARN-6809) Fix typo in ResourceManagerHA.md

2017-07-24 Thread Akira Ajisaka (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated YARN-6809:

Fix Version/s: (was: 2.8.3)
   2.8.2

> Fix typo in ResourceManagerHA.md
> 
>
> Key: YARN-6809
> URL: https://issues.apache.org/jira/browse/YARN-6809
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation
>Reporter: Akira Ajisaka
>Assignee: Yeliang Cang
>Priority: Trivial
>  Labels: newbie
> Fix For: 2.9.0, 3.0.0-beta1, 2.8.2
>
> Attachments: YARN-6809.001.patch
>
>
> {noformat:title=ResourceManagerHA.md}
> ### Recovering prevous active-RM's state
> {noformat}
> prevous should be previous.






[jira] [Updated] (YARN-2113) Add cross-user preemption within CapacityScheduler's leaf-queue

2017-07-24 Thread Akira Ajisaka (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated YARN-2113:

Fix Version/s: (was: 2.8.3)
   2.8.2

> Add cross-user preemption within CapacityScheduler's leaf-queue
> ---
>
> Key: YARN-2113
> URL: https://issues.apache.org/jira/browse/YARN-2113
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Sunil G
> Fix For: 2.9.0, 3.0.0-alpha4, 2.8.2
>
> Attachments: IntraQueue Preemption-Impact Analysis.pdf, 
> TestNoIntraQueuePreemptionIfBelowUserLimitAndDifferentPrioritiesWithExtraUsers.txt,
>  YARN-2113.0001.patch, YARN-2113.0002.patch, YARN-2113.0003.patch, 
> YARN-2113.0004.patch, YARN-2113.0005.patch, YARN-2113.0006.patch, 
> YARN-2113.0007.patch, YARN-2113.0008.patch, YARN-2113.0009.patch, 
> YARN-2113.0010.patch, YARN-2113.0011.patch, YARN-2113.0012.patch, 
> YARN-2113.0013.patch, YARN-2113.0014.patch, YARN-2113.0015.patch, 
> YARN-2113.0016.patch, YARN-2113.0017.patch, YARN-2113.0018.patch, 
> YARN-2113.0019.patch, YARN-2113.apply.onto.0012.ericp.patch, 
> YARN-2113.branch-2.0019.patch, YARN-2113.branch-2.0020.patch, 
> YARN-2113.branch-2.0021.patch, YARN-2113.branch-2.8.0019.patch, 
> YARN-2113.branch-2.8.0020.patch, YARN-2113 Intra-QueuePreemption 
> Behavior.pdf, YARN-2113.v0.patch
>
>
> Preemption today only works across queues and moves around resources across 
> queues per demand and usage. We should also have user-level preemption within 
> a queue, to balance capacity across users in a predictable manner.






[jira] [Updated] (YARN-6428) Queue AM limit is not honored in CS always

2017-07-24 Thread Akira Ajisaka (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated YARN-6428:

Fix Version/s: (was: 2.9)
   (was: 2.8.3)
   2.8.2
   2.9.0

> Queue AM limit is not honored  in CS always
> ---
>
> Key: YARN-6428
> URL: https://issues.apache.org/jira/browse/YARN-6428
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
> Fix For: 2.9.0, 3.0.0-beta1, 2.8.2
>
> Attachments: YARN-6428.0001.patch, YARN-6428.0002.patch, 
> YARN-6428.0003.patch, YARN-6428-branch-2.8.0003.patch
>
>
> Steps to reproduce
> 
> Set up a cluster with 40 GB and 40 vcores: 4 NodeManagers with 10 GB each.
> Configure the default queue with 100% capacity and a max AM limit of 10%.
> Set the minimum scheduler allocation to 512 MB and 1 vcore.
> *Expected* 
> AM limit of 4096 MB and 4 vcores
> *Actual*
> AM limit of 4096+512 MB and 4+1 vcores
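
For readers skimming the report, the expected numbers are just 10% of the cluster resources; the sketch below only restates that arithmetic (it is not CapacityScheduler code, and the class and variable names are made up).
{code:java}
public class AmLimitArithmetic {
  public static void main(String[] args) {
    int clusterMemMb = 4 * 10 * 1024;   // 4 NodeManagers x 10 GB = 40960 MB
    int clusterVcores = 40;
    double maxAmPercent = 0.10;         // queue max AM limit of 10%

    int expectedAmMemMb  = (int) (clusterMemMb * maxAmPercent);  // 4096 MB
    int expectedAmVcores = (int) (clusterVcores * maxAmPercent); // 4 vcores

    // Reported behaviour: the limit comes out one minimum allocation
    // (512 MB, 1 vcore) higher, i.e. 4608 MB and 5 vcores.
    System.out.println("expected AM limit: "
        + expectedAmMemMb + " MB, " + expectedAmVcores + " vcores");
  }
}
{code}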






[jira] [Commented] (YARN-6844) AMRMClientImpl.checkNodeLabelExpression() has wrong error message

2017-07-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099466#comment-16099466
 ] 

Hudson commented on YARN-6844:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #12051 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/12051/])
YARN-6844. AMRMClientImpl.checkNodeLabelExpression() has wrong error (templedf: 
rev 4c40cd451cbdbce5d2b94ad0e7e3cc991c3439c5)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/AMRMClientImpl.java


> AMRMClientImpl.checkNodeLabelExpression() has wrong error message
> -
>
> Key: YARN-6844
> URL: https://issues.apache.org/jira/browse/YARN-6844
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.1, 3.0.0-alpha4
>Reporter: Daniel Templeton
>Assignee: Manikandan R
>Priority: Minor
>  Labels: newbie
> Fix For: 2.9.0, 3.0.0-beta1
>
> Attachments: YARN-6844.001.patch
>
>
> It says, "Cannot specify more than two node labels in a single node label 
> expression," but it should say that you can't have more than *one*.
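
For context, a simplified sketch of the kind of check and the corrected wording is below; apart from the method name {{checkNodeLabelExpression()}}, the signature, exception type and logic are illustrative rather than the actual AMRMClientImpl code.
{code:java}
// Illustrative only -- not the exact AMRMClientImpl implementation.
final class NodeLabelExpressionCheck {
  static void checkNodeLabelExpression(String exp) {
    if (exp == null || exp.isEmpty()) {
      return;
    }
    // A request may carry only a single node label, so boolean operators
    // in the expression are rejected.
    if (exp.contains("&&") || exp.contains("||")) {
      // Corrected wording: "one", not "two".
      throw new IllegalArgumentException(
          "Cannot specify more than one node label"
              + " in a single node label expression");
    }
  }

  private NodeLabelExpressionCheck() {
  }
}
{code}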






[jira] [Commented] (YARN-6779) DominantResourceFairnessPolicy.DominantResourceFairnessComparator.calculateShares() should be @VisibleForTesting

2017-07-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099467#comment-16099467
 ] 

Hudson commented on YARN-6779:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #12051 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/12051/])
YARN-6779. (templedf: rev bb30bd3771442df253cbe55c448379580bd5ad07)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/DominantResourceFairnessPolicy.java
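
For readers unfamiliar with the annotation, a minimal illustration of what the change amounts to is below; the class, method signature and body are placeholders, not the actual DominantResourceFairnessPolicy code.
{code:java}
import com.google.common.annotations.VisibleForTesting;

class FairShareExample {
  // Package-private so that tests in the same package can call it directly;
  // @VisibleForTesting documents that the visibility exists only for tests.
  @VisibleForTesting
  void calculateShares(long[] used, long[] capacity, double[] sharesOut) {
    for (int i = 0; i < sharesOut.length; i++) {
      sharesOut[i] = capacity[i] == 0 ? 0.0 : (double) used[i] / capacity[i];
    }
  }
}
{code}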


> DominantResourceFairnessPolicy.DominantResourceFairnessComparator.calculateShares()
>  should be @VisibleForTesting
> 
>
> Key: YARN-6779
> URL: https://issues.apache.org/jira/browse/YARN-6779
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 2.8.1, 3.0.0-alpha4
>Reporter: Daniel Templeton
>Assignee: Yeliang Cang
>Priority: Trivial
>  Labels: newbie
> Fix For: 2.9.0, 3.0.0-beta1
>
> Attachments: YARN-6779-001.patch
>
>







[jira] [Commented] (YARN-5146) [YARN-3368] Supports Fair Scheduler in new YARN UI

2017-07-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099457#comment-16099457
 ] 

Hadoop QA commented on YARN-5146:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
21s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
16s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}  0m 56s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:14b5c93 |
| JIRA Issue | YARN-5146 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12878674/YARN-5146.004.patch |
| Optional Tests |  asflicense  |
| uname | Linux 9203c8cfdaa1 3.13.0-119-generic #166-Ubuntu SMP Wed May 3 
12:18:55 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / c98201b |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/16534/console |
| Powered by | Apache Yetus 0.6.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> [YARN-3368] Supports Fair Scheduler in new YARN UI
> --
>
> Key: YARN-5146
> URL: https://issues.apache.org/jira/browse/YARN-5146
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Abdullah Yousufi
> Attachments: YARN-5146.001.patch, YARN-5146.002.patch, 
> YARN-5146.003.patch, YARN-5146.004.patch
>
>
> The current implementation in branch YARN-3368 only supports the capacity 
> scheduler; we want to make it support the fair scheduler. 






[jira] [Comment Edited] (YARN-6855) CLI Proto Modifications to support Node Attributes

2017-07-24 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099409#comment-16099409
 ] 

Sunil G edited comment on YARN-6855 at 7/25/17 2:50 AM:


Thanks [~naganarasimha...@apache.org] for the effort.

A few comments.
In +{{NodeAttribute}}+, +{{NodeAttributeType}}+, +{{NodeIdToAttributes}}+ and 
+{{NodesToAttributesMappingRequest}}+
# I think we should mark these classes as Unstable rather than Evolving, since this is a new API; in due course we can move them to Evolving.
# Please add more javadoc.
# I think it's too early to mark NodeAttributeType as Stable. Since it's public and it's an enum, is it OK to mark its interface stability as Unstable/Evolving to start with?

In +{{NodesToAttributesMappingRequest}}+ and 
+{{yarn_server_resourcemanager_service_protos.proto}}+
# I think {{operation}} could be an enum here; a plain String may be too generic and makes type checks harder.

In general
# {{NodeAttributePBImpl#equals}} has some duplicate code.
# In {{NodesToAttributesMappingRequestPBImpl}}, {{operation}} needs a corresponding change if the above comment is accepted.

I will take a second look and share further comments if any.
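
To make the enum suggestion above concrete, a hypothetical sketch is below; the type name and constants are illustrative, not the actual YARN-6855 API.
{code:java}
// Hypothetical: an enum in place of a free-form String "operation" field,
// so an invalid operation fails at parse time instead of in string checks.
public enum AttributeMappingOperation {
  ADD,      // add the listed attributes, keeping existing ones
  REMOVE,   // remove the listed attributes from the node
  REPLACE   // replace all attributes on the node with the listed ones
}
{code}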




was (Author: sunilg):
Thanks [~naganarasimha...@apache.org] for the effort.

Few comments
In +{{NodeAttribute}}+, +{{NodeAttributeType}}+, +{{NodeIdToAttributes}}+ and 
+{{NodesToAttributesMappingRequest}}+
# I think lets make this class as Unstable from Evolving as its a new api 
itself. In the course, we can make to Evolving.
# please add more java doc.
# I think its too early to place Stable for NodeAttributeType. Since its public 
and its an enum, is its ok if we mark interface stability with 
Unstable/Evolving to start with.

In +{{NodesToAttributesMappingRequest}}+ and 
+{{yarn_server_resourcemanager_service_protos.proto}}+
# I think {{operation}} could be an enum here. String may be too generic and 
complex to do type checks.

In general
# {{NodeAttributePBImpl#equals}} has some duplicate code.
# 



> CLI Proto Modifications to support Node Attributes
> --
>
> Key: YARN-6855
> URL: https://issues.apache.org/jira/browse/YARN-6855
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, capacityscheduler, client
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: YARN-6855-YARN-3409.001.patch, 
> YARN-6855-YARN-3409.002.patch, YARN-6855-YARN-3409.003.patch
>
>
> This jira focuses only on the proto modifications required for the CLI






[jira] [Commented] (YARN-6855) CLI Proto Modifications to support Node Attributes

2017-07-24 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099409#comment-16099409
 ] 

Sunil G commented on YARN-6855:
---

Thanks [~naganarasimha...@apache.org] for the effort.

A few comments.
In +{{NodeAttribute}}+, +{{NodeAttributeType}}+, +{{NodeIdToAttributes}}+ and 
+{{NodesToAttributesMappingRequest}}+
# I think we should mark these classes as Unstable rather than Evolving, since this is a new API; in due course we can move them to Evolving.
# Please add more javadoc.
# I think it's too early to mark NodeAttributeType as Stable. Since it's public and it's an enum, is it OK to mark its interface stability as Unstable/Evolving to start with?

In +{{NodesToAttributesMappingRequest}}+ and 
+{{yarn_server_resourcemanager_service_protos.proto}}+
# I think {{operation}} could be an enum here; a plain String may be too generic and makes type checks harder.

In general
# {{NodeAttributePBImpl#equals}} has some duplicate code.
# 



> CLI Proto Modifications to support Node Attributes
> --
>
> Key: YARN-6855
> URL: https://issues.apache.org/jira/browse/YARN-6855
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, capacityscheduler, client
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: YARN-6855-YARN-3409.001.patch, 
> YARN-6855-YARN-3409.002.patch, YARN-6855-YARN-3409.003.patch
>
>
> This jira focuses only on the proto modifications required for the CLI






[jira] [Commented] (YARN-3409) Support Node Attribute functionality

2017-07-24 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099402#comment-16099402
 ] 

Naganarasimha G R commented on YARN-3409:
-

Thanks [~wangda] & [~kkaranasos] for sharing your comments,
bq. NodeIdToAttributesProto actually be NodeToAttributesProto
Agree; I wanted to discuss this further. Inside the proto I was sending a NodeId; 
I think we can change it to a string. Also, even if the user specifies the 
hostname/IP:port format, do we need to consider picking only the "hostname/IP" 
part? Thoughts?

bq. It was not clear to me how the newly added node attributes are going to 
play with existing node labels. Is the plan to share some code or will it be 
completely separate?
Agree with Wangda's reply on this; there are still a lot of differences, even in 
the way we are going to use CommonNodeLabelManager for an Attribute versus a 
Partition. I will try to introduce a hierarchy so that some common pieces can 
be reused.

bq. The main part of "2. API proto changes" should be read as proposal#1, and 
"Alternate Proposal 1"/"Alternate Proposal 2" should be read as proposal #2 and 
#3.
Yep, I was trying to capture them as alternatives to the main proposal. 

bq. is the plan to use the new constraints API we are introducing in YARN-6593?
Yes, it's the same as per the earlier discussion, but I did not get a chance to 
review it completely. I will have a look at it and point out any issues 
there.

bq. In the CLI API the replace and update seem a bit confusing to me ...
Agree with [~wangda]'s point, and my preference would be {{add}}

bq. Sounds a little ambiguous as it does not directly look like the existing 
attributes on the node will be removed, but we can make this clear in the 
description of the command.
Agree; I think I have captured this in the CLI patch and will update it further if required.

bq. node1:att1=val1 looks better than node1=att1:val1.
IMHO I would prefer the latter for the following reasons:
1. It is a common scenario to specify multiple attribute-value 
pairs for a given node, and it means less input for users. 
2. With the earlier notation, the port was specified after the {{":"}} and before the {{"="}} 
for labels, so the new format would be less intuitive for users.
3. With an attribute type it might not read very well, e.g. 
{{node1:att1=val1:type}} or {{node1:att1:type=val1}} 
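
For readers skimming the thread, the two candidate mapping formats side by side (values are illustrative; neither is a committed CLI syntax):
{noformat}
node1=att1:val1,att2:val2   <- node first, then attribute:value pairs (preferred in this comment)
node1:att1=val1             <- node:attribute=value (the alternative quoted above)
{noformat}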




> Support Node Attribute functionality
> 
>
> Key: YARN-3409
> URL: https://issues.apache.org/jira/browse/YARN-3409
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api, capacityscheduler, client
>Reporter: Wangda Tan
>Assignee: Naganarasimha G R
> Attachments: 3409-apiChanges_v2.pdf (4).pdf, 
> Constraint-Node-Labels-Requirements-Design-doc_v1.pdf, YARN-3409.WIP.001.patch
>
>
> Specifying only one label for each node (IAW, partitioning a cluster) is a way to 
> determine how the resources of a special set of nodes can be shared by a 
> group of entities (like teams, departments, etc.). Partitions of a cluster 
> have the following characteristics:
> - The cluster is divided into several disjoint sub-clusters.
> - ACLs/priority can apply to a partition (only the market team has 
> priority to use the partition).
> - Percentages of capacity can apply to a partition (the market team has 40% 
> minimum capacity and the dev team has 60% minimum capacity of the partition).
> Attributes are orthogonal to partitions; they describe features of a node's 
> hardware/software purely for affinity. Some examples of attributes:
> - glibc version
> - JDK version
> - Type of CPU (x86_64/i686)
> - Type of OS (Windows, Linux, etc.)
> With this, an application can ask for resources that have (glibc.version >= 
> 2.20 && JDK.version >= 8u20 && x86_64).






[jira] [Commented] (YARN-3409) Support Node Attribute functionality

2017-07-24 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099399#comment-16099399
 ] 

Sunil G commented on YARN-3409:
---

Thanks [~kkaranasos] for comments.

Some quick thoughts here.
bq.is the plan to use the new constraints API we are introducing in YARN-6593?
Yes, you are correct. We will be using the new API set from YARN-6593. I have 
some minor thoughts related to that; I'll add them in YARN-6593.

bq.In the CLI API the replace and update seem a bit confusing to me ...
Adding some more thoughts. In line with [~leftnoteasy]'s comments, {{replace}} is 
a single operation, since it is a superset of {{remove}} and {{remove+add}}. We 
support this for existing node labels with a similar command, but the syntax is 
sometimes confusing. As also mentioned above, {{add}} / {{remove}} will likely be 
the more frequent operations in the system, while {{replace}} might happen when 
we take the system into maintenance mode for upgrades. I think more descriptive 
CLI help and better documentation would be a very nice addition; we'll handle both.

> Support Node Attribute functionality
> 
>
> Key: YARN-3409
> URL: https://issues.apache.org/jira/browse/YARN-3409
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api, capacityscheduler, client
>Reporter: Wangda Tan
>Assignee: Naganarasimha G R
> Attachments: 3409-apiChanges_v2.pdf (4).pdf, 
> Constraint-Node-Labels-Requirements-Design-doc_v1.pdf, YARN-3409.WIP.001.patch
>
>
> Specifying only one label for each node (IAW, partitioning a cluster) is a way to 
> determine how the resources of a special set of nodes can be shared by a 
> group of entities (like teams, departments, etc.). Partitions of a cluster 
> have the following characteristics:
> - The cluster is divided into several disjoint sub-clusters.
> - ACLs/priority can apply to a partition (only the market team has 
> priority to use the partition).
> - Percentages of capacity can apply to a partition (the market team has 40% 
> minimum capacity and the dev team has 60% minimum capacity of the partition).
> Attributes are orthogonal to partitions; they describe features of a node's 
> hardware/software purely for affinity. Some examples of attributes:
> - glibc version
> - JDK version
> - Type of CPU (x86_64/i686)
> - Type of OS (Windows, Linux, etc.)
> With this, an application can ask for resources that have (glibc.version >= 
> 2.20 && JDK.version >= 8u20 && x86_64).






[jira] [Commented] (YARN-6733) Add table for storing sub-application entities

2017-07-24 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099398#comment-16099398
 ] 

Rohith Sharma K S commented on YARN-6733:
-

Thanks [~vrushalic] for the detailed explanation. I am fine with keeping it.

+1 LGTM. I will commit it later today if there are no more objections. 

> Add table for storing sub-application entities
> --
>
> Key: YARN-6733
> URL: https://issues.apache.org/jira/browse/YARN-6733
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: IMG_7040.JPG, YARN-6733-YARN-5355.001.patch, 
> YARN-6733-YARN-5355.002.patch, YARN-6733-YARN-5355.003.patch, 
> YARN-6733-YARN-5355.004.patch, YARN-6733-YARN-5355.005.patch, 
> YARN-6733-YARN-5355.006.patch, YARN-6733-YARN-5355.007.patch, 
> YARN-6733-YARN-5355.008.patch
>
>
> After a discussion with Tez folks, we have been thinking over introducing a 
> table to store sub-application information.
> For example, a Tez session may run for a certain period as user X and run a 
> few AMs. These AMs accept DAGs from other users, and Tez will execute these 
> DAGs as a doAs user. ATSv2 should store this information in a new table, 
> perhaps called the "sub_application" table. 
> This jira tracks the code changes needed for table schema creation.
> I will file other jiras for writing to that table, updating the user name 
> fields to include the sub-application user, etc.






[jira] [Commented] (YARN-6779) DominantResourceFairnessPolicy.DominantResourceFairnessComparator.calculateShares() should be @VisibleForTesting

2017-07-24 Thread Yeliang Cang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099391#comment-16099391
 ] 

Yeliang Cang commented on YARN-6779:


Thank you for your review, [~templedf]!

> DominantResourceFairnessPolicy.DominantResourceFairnessComparator.calculateShares()
>  should be @VisibleForTesting
> 
>
> Key: YARN-6779
> URL: https://issues.apache.org/jira/browse/YARN-6779
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 2.8.1, 3.0.0-alpha4
>Reporter: Daniel Templeton
>Assignee: Yeliang Cang
>Priority: Trivial
>  Labels: newbie
> Fix For: 2.9.0, 3.0.0-beta1
>
> Attachments: YARN-6779-001.patch
>
>







[jira] [Updated] (YARN-6593) [API] Introduce Placement Constraint object

2017-07-24 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-6593:

Target Version/s: 3.0.0-beta1
   Fix Version/s: (was: 3.0.0-alpha3)
 Description: Just removed Fixed version and moved it to target version 
as we set fix version only after patch is committed.  (was: This JIRA 
introduces an object for defining placement constraints.)

> [API] Introduce Placement Constraint object
> ---
>
> Key: YARN-6593
> URL: https://issues.apache.org/jira/browse/YARN-6593
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Konstantinos Karanasos
>Assignee: Konstantinos Karanasos
> Attachments: YARN-6593.001.patch, YARN-6593.002.patch, 
> YARN-6593.003.patch, YARN-6593.004.patch
>
>
> Just removed Fixed version and moved it to target version as we set fix 
> version only after patch is committed.






[jira] [Comment Edited] (YARN-6413) Decouple Yarn Registry API from ZK

2017-07-24 Thread Ellen Hui (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099352#comment-16099352
 ] 

Ellen Hui edited comment on YARN-6413 at 7/25/17 1:10 AM:
--

bq. looks like RegistryUtils#extractServiceRecords is removed. Do you mind 
change RegistryDNSServer to use the new methods ? (i.e. make this patch compile 
with yarn-native-services branch)

Ah, I see. RegistryUtils#extractServiceRecords is one of the methods that 
depends on having the hierarchical namespace, which is why it was removed. I 
can add a select-multiple method with a filter of some sort, would that work? 
The point of this from our end was to abstract away the path.

Right now yarn-native-services doesn't compile for me with a pom.xml error 
(pulled just now), is the branch healthy?

bq. The DNS today depends on the ZK path layout to reconstruct the service 
record. So changing the zk path will break DNS.

Ok, I will put the ZK path layout back to the way it was before.





was (Author: ellenfkh):
bq:looks like RegistryUtils#extractServiceRecords is removed. Do you mind 
change RegistryDNSServer to use the new methods ? (i.e. make this patch compile 
with yarn-native-services branch)

Ah, I see. RegistryUtils#extractServiceRecords is one of the methods that 
depends on having the hierarchical namespace, which is why it was removed. I 
can add a select-multiple method with a filter of some sort, would that work? 
The point of this from our end was to abstract away the path.

Right now yarn-native-services doesn't compile for me with a pom.xml error 
(pulled just now), is the branch healthy?

bq:The DNS today depends on the ZK path layout to reconstruct the service 
record. So changing the zk path will break DNS.

Ok, I will put the ZK path layout back to the way it was before.




> Decouple Yarn Registry API from ZK
> --
>
> Key: YARN-6413
> URL: https://issues.apache.org/jira/browse/YARN-6413
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: amrmproxy, api, resourcemanager
>Reporter: Ellen Hui
>Assignee: Ellen Hui
> Attachments: 0001-Registry-API-v2.patch, 0002-Registry-API-v2.patch
>
>
> Right now the Yarn Registry API (defined in the RegistryOperations interface) 
> is a very thin layer over Zookeeper. This jira proposes changing the 
> interface to abstract away the implementation details so that we can write a 
> FS-based implementation of the registry service, which will be used to 
> support AMRMProxy HA.
> The new interface will use register/delete/resolve APIs instead of 
> Zookeeper-specific operations like mknode. 






[jira] [Commented] (YARN-6413) Decouple Yarn Registry API from ZK

2017-07-24 Thread Ellen Hui (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099352#comment-16099352
 ] 

Ellen Hui commented on YARN-6413:
-

bq: looks like RegistryUtils#extractServiceRecords is removed. Do you mind 
change RegistryDNSServer to use the new methods ? (i.e. make this patch compile 
with yarn-native-services branch)

Ah, I see. RegistryUtils#extractServiceRecords is one of the methods that 
depends on having the hierarchical namespace, which is why it was removed. I 
can add a select-multiple method with a filter of some sort, would that work? 
The point of this from our end was to abstract away the path.

Right now yarn-native-services doesn't compile for me with a pom.xml error 
(pulled just now), is the branch healthy?

bq: The DNS today depends on the ZK path layout to reconstruct the service 
record. So changing the zk path will break DNS.

Ok, I will put the ZK path layout back to the way it was before.




> Decouple Yarn Registry API from ZK
> --
>
> Key: YARN-6413
> URL: https://issues.apache.org/jira/browse/YARN-6413
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: amrmproxy, api, resourcemanager
>Reporter: Ellen Hui
>Assignee: Ellen Hui
> Attachments: 0001-Registry-API-v2.patch, 0002-Registry-API-v2.patch
>
>
> Right now the Yarn Registry API (defined in the RegistryOperations interface) 
> is a very thin layer over Zookeeper. This jira proposes changing the 
> interface to abstract away the implementation details so that we can write a 
> FS-based implementation of the registry service, which will be used to 
> support AMRMProxy HA.
> The new interface will use register/delete/resolve APIs instead of 
> Zookeeper-specific operations like mknode. 
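
As a rough illustration of the direction described above, a hypothetical backend-neutral interface is sketched below; the names and signatures are invented for illustration and are not the API in the attached patches.
{code:java}
import java.io.IOException;
import java.util.List;
import org.apache.hadoop.registry.client.types.ServiceRecord;

// Hypothetical sketch: a registry API expressed as register/delete/resolve so
// that a ZK-backed or FS-backed store can sit behind the same interface.
public interface BackendNeutralRegistry {
  void register(String serviceKey, ServiceRecord record) throws IOException;

  void delete(String serviceKey) throws IOException;

  ServiceRecord resolve(String serviceKey) throws IOException;

  // One possible shape for the "select-multiple with a filter" idea above.
  List<ServiceRecord> select(RecordFilter filter) throws IOException;

  interface RecordFilter {
    boolean accept(ServiceRecord record);
  }
}
{code}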






[jira] [Comment Edited] (YARN-6413) Decouple Yarn Registry API from ZK

2017-07-24 Thread Ellen Hui (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099352#comment-16099352
 ] 

Ellen Hui edited comment on YARN-6413 at 7/25/17 1:09 AM:
--

bq:looks like RegistryUtils#extractServiceRecords is removed. Do you mind 
change RegistryDNSServer to use the new methods ? (i.e. make this patch compile 
with yarn-native-services branch)

Ah, I see. RegistryUtils#extractServiceRecords is one of the methods that 
depends on having the hierarchical namespace, which is why it was removed. I 
can add a select-multiple method with a filter of some sort, would that work? 
The point of this from our end was to abstract away the path.

Right now yarn-native-services doesn't compile for me with a pom.xml error 
(pulled just now), is the branch healthy?

bq:The DNS today depends on the ZK path layout to reconstruct the service 
record. So changing the zk path will break DNS.

Ok, I will put the ZK path layout back to the way it was before.





was (Author: ellenfkh):
bq: looks like RegistryUtils#extractServiceRecords is removed. Do you mind 
change RegistryDNSServer to use the new methods ? (i.e. make this patch compile 
with yarn-native-services branch)

Ah, I see. RegistryUtils#extractServiceRecords is one of the methods that 
depends on having the hierarchical namespace, which is why it was removed. I 
can add a select-multiple method with a filter of some sort, would that work? 
The point of this from our end was to abstract away the path.

Right now yarn-native-services doesn't compile for me with a pom.xml error 
(pulled just now), is the branch healthy?

bq: The DNS today depends on the ZK path layout to reconstruct the service 
record. So changing the zk path will break DNS.

Ok, I will put the ZK path layout back to the way it was before.




> Decouple Yarn Registry API from ZK
> --
>
> Key: YARN-6413
> URL: https://issues.apache.org/jira/browse/YARN-6413
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: amrmproxy, api, resourcemanager
>Reporter: Ellen Hui
>Assignee: Ellen Hui
> Attachments: 0001-Registry-API-v2.patch, 0002-Registry-API-v2.patch
>
>
> Right now the Yarn Registry API (defined in the RegistryOperations interface) 
> is a very thin layer over Zookeeper. This jira proposes changing the 
> interface to abstract away the implementation details so that we can write a 
> FS-based implementation of the registry service, which will be used to 
> support AMRMProxy HA.
> The new interface will use register/delete/resolve APIs instead of 
> Zookeeper-specific operations like mknode. 






[jira] [Commented] (YARN-6413) Decouple Yarn Registry API from ZK

2017-07-24 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099345#comment-16099345
 ] 

Jian He commented on YARN-6413:
---

bq. I did not remove any methods, I just didn't add the ones that exist in 
yarn-native-services but not trunk. I thought doing so would probably cause 
more conflicts than not.
looks like RegistryUtils#extractServiceRecords is removed.  Do you mind change 
RegistryDNSServer to use the new methods ? (i.e. make this patch compile with 
yarn-native-services branch)

bq. My understanding was that DNS went through ZK directly without going 
through the interface, so it wouldn't be affected by the service records 
setting up the path differently. I can change the path construction back for 
the ZK impl if it needs that.
The DNS  today depends on the ZK path layout to reconstruct the service record. 
So changing the zk path will break DNS.

> Decouple Yarn Registry API from ZK
> --
>
> Key: YARN-6413
> URL: https://issues.apache.org/jira/browse/YARN-6413
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: amrmproxy, api, resourcemanager
>Reporter: Ellen Hui
>Assignee: Ellen Hui
> Attachments: 0001-Registry-API-v2.patch, 0002-Registry-API-v2.patch
>
>
> Right now the Yarn Registry API (defined in the RegistryOperations interface) 
> is a very thin layer over Zookeeper. This jira proposes changing the 
> interface to abstract away the implementation details so that we can write a 
> FS-based implementation of the registry service, which will be used to 
> support AMRMProxy HA.
> The new interface will use register/delete/resolve APIs instead of 
> Zookeeper-specific operations like mknode. 






[jira] [Commented] (YARN-3409) Support Node Attribute functionality

2017-07-24 Thread Konstantinos Karanasos (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099341#comment-16099341
 ] 

Konstantinos Karanasos commented on YARN-3409:
--

Yeah, I see there are important differences between node labels and attributes.
It would be nice to unify them at some point, but I do see that this will 
require significantly more effort.
That said, I think we should indeed not do proposal #2 or #3, as it will be 
confusing to share protobufs without sharing further functionality...

{{add}} or {{set}} are fine.
OK with keeping {{replace}}. Sounds a little ambiguous as it does not directly 
look like the existing attributes on the node will be removed, but we can make 
this clear in the description of the command.

> Support Node Attribute functionality
> 
>
> Key: YARN-3409
> URL: https://issues.apache.org/jira/browse/YARN-3409
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api, capacityscheduler, client
>Reporter: Wangda Tan
>Assignee: Naganarasimha G R
> Attachments: 3409-apiChanges_v2.pdf (4).pdf, 
> Constraint-Node-Labels-Requirements-Design-doc_v1.pdf, YARN-3409.WIP.001.patch
>
>
> Specifying only one label for each node (IAW, partitioning a cluster) is a way to 
> determine how the resources of a special set of nodes can be shared by a 
> group of entities (like teams, departments, etc.). Partitions of a cluster 
> have the following characteristics:
> - The cluster is divided into several disjoint sub-clusters.
> - ACLs/priority can apply to a partition (only the market team has 
> priority to use the partition).
> - Percentages of capacity can apply to a partition (the market team has 40% 
> minimum capacity and the dev team has 60% minimum capacity of the partition).
> Attributes are orthogonal to partitions; they describe features of a node's 
> hardware/software purely for affinity. Some examples of attributes:
> - glibc version
> - JDK version
> - Type of CPU (x86_64/i686)
> - Type of OS (Windows, Linux, etc.)
> With this, an application can ask for resources that have (glibc.version >= 
> 2.20 && JDK.version >= 8u20 && x86_64).






[jira] [Commented] (YARN-6802) Support view leaf queue am resource usage in RM web ui

2017-07-24 Thread YunFan Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099328#comment-16099328
 ] 

YunFan Zhou commented on YARN-6802:
---

[~yufeigu] Thanks, Yufei. I will add Max AM Resource in this JIRA later.

> Support view leaf queue am resource usage in RM web ui
> --
>
> Key: YARN-6802
> URL: https://issues.apache.org/jira/browse/YARN-6802
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 2.7.2
>Reporter: YunFan Zhou
>Assignee: YunFan Zhou
> Fix For: 2.8.0
>
> Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png, 
> YARN-6802.001.patch
>
>
> The RM web UI should support viewing leaf queue AM resource usage. 
> !screenshot-2.png!
> I will upload my patch later.






[jira] [Issue Comment Deleted] (YARN-6862) Nodemanager resource usage metrics sometimes are negative

2017-07-24 Thread YunFan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

YunFan Zhou updated YARN-6862:
--
Comment: was deleted

(was: [~jlowe]  It is very likely that the process is exists, but the resource 
usage especially the used CPU is very problematic.
I think we should fix it.)

> Nodemanager resource usage metrics sometimes are negative
> -
>
> Key: YARN-6862
> URL: https://issues.apache.org/jira/browse/YARN-6862
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.8.2
>Reporter: YunFan Zhou
>
> When we collect real-time metrics of resource usage in the NM, we find that those 
> values are sometimes invalid.
> For example, the following are values collected at one point:
> "milliVcoresUsed":-5808,
> "currentPmemUsage":-1,
> "currentVmemUsage":-1,
> "cpuUsagePercentPerCore":-968.1026
> "cpuUsageTotalCoresPercentage":-24.202564,
> "pmemLimit":2147483648,
> "vmemLimit":4509715456
> There are many negative values; there may be a bug in the NM. 
> We should fix it, because the real-time metrics of the NM are pretty important 
> for us sometimes.






[jira] [Commented] (YARN-6862) Nodemanager resource usage metrics sometimes are negative

2017-07-24 Thread YunFan Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099321#comment-16099321
 ] 

YunFan Zhou commented on YARN-6862:
---

[~jlowe] It is very likely that the process exists, but the resource usage, 
especially the used CPU, is very problematic.
I think we should fix it.

> Nodemanager resource usage metrics sometimes are negative
> -
>
> Key: YARN-6862
> URL: https://issues.apache.org/jira/browse/YARN-6862
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.8.2
>Reporter: YunFan Zhou
>
> When we collect real-time metrics of resource usage in the NM, we find that those 
> values are sometimes invalid.
> For example, the following are values collected at one point:
> "milliVcoresUsed":-5808,
> "currentPmemUsage":-1,
> "currentVmemUsage":-1,
> "cpuUsagePercentPerCore":-968.1026
> "cpuUsageTotalCoresPercentage":-24.202564,
> "pmemLimit":2147483648,
> "vmemLimit":4509715456
> There are many negative values; there may be a bug in the NM. 
> We should fix it, because the real-time metrics of the NM are pretty important 
> for us sometimes.






[jira] [Updated] (YARN-6804) Allow custom hostname for docker containers in native services

2017-07-24 Thread Billie Rinaldi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Billie Rinaldi updated YARN-6804:
-
Attachment: YARN-6804-trunk.005.patch

I believe the attached patch fixes the enforcer error. It excludes the new 
hadoop-yarn-registry transitive dependency in the hadoop-client-minicluster pom.

> Allow custom hostname for docker containers in native services
> --
>
> Key: YARN-6804
> URL: https://issues.apache.org/jira/browse/YARN-6804
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-native-services
>Reporter: Billie Rinaldi
>Assignee: Billie Rinaldi
> Fix For: yarn-native-services
>
> Attachments: YARN-6804-trunk.004.patch, YARN-6804-trunk.005.patch, 
> YARN-6804-yarn-native-services.001.patch, 
> YARN-6804-yarn-native-services.002.patch, 
> YARN-6804-yarn-native-services.003.patch, 
> YARN-6804-yarn-native-services.004.patch, 
> YARN-6804-yarn-native-services.005.patch
>
>
> Instead of the default random docker container hostname, we could set a more 
> user-friendly hostname for the container. The default could be a hostname 
> based on the container ID, with an option for the AM to provide a different 
> hostname. In the case of the native services AM, we could provide the 
> hostname that would be created by the registry DNS server. Regardless of 
> whether or not registry DNS is enabled, this would be a more useful hostname 
> for the docker container.






[jira] [Commented] (YARN-3409) Support Node Attribute functionality

2017-07-24 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099292#comment-16099292
 ] 

Wangda Tan commented on YARN-3409:
--

[~kkaranasos],

Add my thoughts to your questions, [~naganarasimha...@apache.org] please add 
yours if you think differently. 

bq. It was not clear to me how the newly added node attributes are going to 
play with existing node labels. Is the plan to share some code or will it be 
completely separate?
There are still many differences between partition and attribute; for example, 
we don't need a queue ACL for node attributes, and after revisiting the existing 
implementation, we may not need to support multiple NMs launched on the same 
host with different ports either.
It may share some basic implementations (like the node-label manager); however, at 
the API level it might be better to have a separate node-attribute protocol, 
since adding attributes to NodeLabelProto looks too complex.
The main part of "2. API proto changes" should be read as proposal#1, and 
"Alternate Proposal 1"/"Alternate Proposal 2" should be read as proposal #2 and 
#3.

bq. Re: how applications will be specifying node attribute constraints.
I think so, right? +Naga.

bq. In the CLI API the replace and update seem a bit confusing to me ...
Regarding the semantics of the node attribute CLI, I think we all agree on 
{{update}} (adding new constraints or replacing the value of existing ones). 
Instead of calling it {{update}} or {{set}}, how about calling it {{add}} (which 
overwrites the value if the key is present)?
I suggest keeping {{replace}} as a single operation, since it is a superset of 
{{remove}} and {{remove+add}}, and we can provide it as an atomic op as well. Also, I 
think it might be used by end users more frequently than a plain 
{{remove}} (what is the scenario where we need to clean up all attributes on a node?).
{{node1:att1=val1}} looks better than {{node1=att1:val1}}.

> Support Node Attribute functionality
> 
>
> Key: YARN-3409
> URL: https://issues.apache.org/jira/browse/YARN-3409
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api, capacityscheduler, client
>Reporter: Wangda Tan
>Assignee: Naganarasimha G R
> Attachments: 3409-apiChanges_v2.pdf (4).pdf, 
> Constraint-Node-Labels-Requirements-Design-doc_v1.pdf, YARN-3409.WIP.001.patch
>
>
> Specifying only one label for each node (IAW, partitioning a cluster) is a way to 
> determine how the resources of a special set of nodes can be shared by a 
> group of entities (like teams, departments, etc.). Partitions of a cluster 
> have the following characteristics:
> - The cluster is divided into several disjoint sub-clusters.
> - ACLs/priority can apply to a partition (only the market team has 
> priority to use the partition).
> - Percentages of capacity can apply to a partition (the market team has 40% 
> minimum capacity and the dev team has 60% minimum capacity of the partition).
> Attributes are orthogonal to partitions; they describe features of a node's 
> hardware/software purely for affinity. Some examples of attributes:
> - glibc version
> - JDK version
> - Type of CPU (x86_64/i686)
> - Type of OS (Windows, Linux, etc.)
> With this, an application can ask for resources that have (glibc.version >= 
> 2.20 && JDK.version >= 8u20 && x86_64).






[jira] [Commented] (YARN-3409) Support Node Attribute functionality

2017-07-24 Thread Konstantinos Karanasos (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099277#comment-16099277
 ] 

Konstantinos Karanasos commented on YARN-3409:
--

Hi guys,

Nice to see you are resuming work on this.

I just checked the latest design document. Here are a couple of questions:
* It was not clear to me how the newly added node attributes are going to play 
with existing node labels. Is the plan to share some code or will it be 
completely separate? I feel that there should be some unification. Not sure I 
understand the two alternatives you mention ("alternate proposal 1 & 2") 
compared to the solution you are proposing instead.
* Re: how applications will be specifying node attribute constraints, is the 
plan to use the new constraints API we are introducing in YARN-6593?
* In the CLI API the replace and update seem a bit confusing to me. Update is 
essentially adding new constraints or replacing the value of existing ones -- 
we could also call it set (and even have an extra parameter that determines if 
we override). Replace is about removing all existing ones and then adding new 
-- we could do it in two steps maybe? Also, I think it's more intuitive to do 
"node1:att1=val1" instead of "node1=att1:val1".

> Support Node Attribute functionality
> 
>
> Key: YARN-3409
> URL: https://issues.apache.org/jira/browse/YARN-3409
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api, capacityscheduler, client
>Reporter: Wangda Tan
>Assignee: Naganarasimha G R
> Attachments: 3409-apiChanges_v2.pdf (4).pdf, 
> Constraint-Node-Labels-Requirements-Design-doc_v1.pdf, YARN-3409.WIP.001.patch
>
>
> Specifying only one label for each node (IAW, partitioning a cluster) is a way to 
> determine how the resources of a special set of nodes can be shared by a 
> group of entities (like teams, departments, etc.). Partitions of a cluster 
> have the following characteristics:
> - The cluster is divided into several disjoint sub-clusters.
> - ACLs/priority can apply to a partition (only the market team has 
> priority to use the partition).
> - Percentages of capacity can apply to a partition (the market team has 40% 
> minimum capacity and the dev team has 60% minimum capacity of the partition).
> Attributes are orthogonal to partitions; they describe features of a node's 
> hardware/software purely for affinity. Some examples of attributes:
> - glibc version
> - JDK version
> - Type of CPU (x86_64/i686)
> - Type of OS (Windows, Linux, etc.)
> With this, an application can ask for resources that have (glibc.version >= 
> 2.20 && JDK.version >= 8u20 && x86_64).






[jira] [Comment Edited] (YARN-6733) Add table for storing sub-application entities

2017-07-24 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099207#comment-16099207
 ] 

Vrushali C edited comment on YARN-6733 at 7/24/17 11:28 PM:


So we thought that it will be good to keep the column name so that sub apps can 
store this information. For regular applications, the flow version can be used 
to determine whether optimizations are to be done. The flow version indicates 
if the flow has changed, that is, say if the pig script changes, its flow 
version will change. So then, for example, reducer estimation calculations can 
be done differently. This applies to the application entities. We discussed 
that it will be good to keep the same information for sub-apps in case they 
want to use this information in a similar fashion. As such, this column 
currently only exists in code, it's not taking up any disk space/hbase space 
etc if no one writes to it. But having it gives the framework developers a 
chance to use it if they want. 


was (Author: vrushalic):
So we thought that it will be good to keep the column name so that sub apps can 
store this information. For regular applications, the flow version can be used 
to determine whether optimizations are to be done. The flow version indicates 
if the flow has changed, that is, say if the pig script changes, it's flow 
version will change. So then, for example, reducer estimation calculations can 
be done differently. This applies to the application entities. We discussed 
that it will be good to keep the same information for sub-apps in case they 
want to use this information in a similar fashion. As such, this column 
currently only exists in code, it's not taking up any disk space/hbase space 
etc if no one writes to it. But having it given the framework developers a 
chance to use it if they want. 

> Add table for storing sub-application entities
> --
>
> Key: YARN-6733
> URL: https://issues.apache.org/jira/browse/YARN-6733
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: IMG_7040.JPG, YARN-6733-YARN-5355.001.patch, 
> YARN-6733-YARN-5355.002.patch, YARN-6733-YARN-5355.003.patch, 
> YARN-6733-YARN-5355.004.patch, YARN-6733-YARN-5355.005.patch, 
> YARN-6733-YARN-5355.006.patch, YARN-6733-YARN-5355.007.patch, 
> YARN-6733-YARN-5355.008.patch
>
>
> After a discussion with Tez folks, we have been thinking over introducing a 
> table to store sub-application information.
> For example, a Tez session may run for a certain period as user X and run a 
> few AMs. These AMs accept DAGs from other users, and Tez will execute these 
> DAGs as a doAs user. ATSv2 should store this information in a new table, 
> perhaps called the "sub_application" table. 
> This jira tracks the code changes needed for table schema creation.
> I will file other jiras for writing to that table, updating the user name 
> fields to include the sub-application user, etc.






[jira] [Commented] (YARN-6610) DominantResourceCalculator.getResourceAsValue() dominant param is no longer appropriate

2017-07-24 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099257#comment-16099257
 ] 

Wangda Tan commented on YARN-6610:
--

Thanks [~templedf] for working on the patch and comments from 
[~sunilg]/[~yufeigu]. 

I tried to review the patch, but it is already outdated; I will do a detailed review 
once the patch is updated. One thing that definitely needs to be updated: the 
existing impl creates and uses a TreeSet for every compare operation, which could 
be very slow according to the testing in YARN-6788 (we expect another patch to 
completely remove map operations from the Resource code path). I suggest 
relooking at the patch once we fill the major performance gaps of the YARN-3926 branch.

> DominantResourceCalculator.getResourceAsValue() dominant param is no longer 
> appropriate
> ---
>
> Key: YARN-6610
> URL: https://issues.apache.org/jira/browse/YARN-6610
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: YARN-3926
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Critical
> Attachments: YARN-6610.001.patch
>
>
> The {{dominant}} param assumes there are only two resources, i.e. true means 
> to compare the dominant, and false means to compare the subordinate.  Now 
> that there are _n_ resources, this parameter no longer makes sense.






[jira] [Commented] (YARN-6726) Fix issues with docker commands executed by container-executor

2017-07-24 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099233#comment-16099233
 ] 

Wangda Tan commented on YARN-6726:
--

Thanks [~shaneku...@gmail.com] for the patch. 

Discussed with Shane offline; in general the approach looks good, though I haven't 
done a detailed review of the code yet. A few comments/questions:

1) Could we do stricter container_id checking? Checking that the string starts with 
container_ might not be enough. You could look at the method (validate_container_id) 
I added in YARN-6852, with which we can avoid malicious container kills, etc. (see 
the sketch after this list).

2) {{LOGFILE flush}}: I'm not quite sure about this item, could you elaborate?

3) Regarding the comment from [~chris.douglas], 

bq. We also need to prevent the yarn user from becoming root  ... 
If we can limit docker commands to apply only to containers launched by YARN (which 
we can identify via strict container_id pattern matching), that is already much 
better than what we have today. We can implement other options such as 
enabling/disabling the component, dynamically loading libraries, etc. along with 
YARN-5673. 
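
For illustration, here is a minimal, self-contained Java sketch of the kind of 
strict container ID pattern check being suggested (the actual check lives in the 
native container-executor, e.g. validate_container_id in YARN-6852; the exact 
pattern below is an assumption, not that implementation):

{code}
import java.util.regex.Pattern;

public class ContainerIdCheckSketch {
  // Assumed shape: container[_e<epoch>]_<clusterTimestamp>_<appId>_<attemptId>_<containerId>,
  // e.g. container_e17_1410901177871_0001_01_000005
  private static final Pattern CONTAINER_ID =
      Pattern.compile("container(_e\\d+)?_\\d+_\\d+_\\d+_\\d+");

  static boolean isValidContainerId(String id) {
    // matches() anchors the whole string, so trailing shell metacharacters are rejected.
    return id != null && CONTAINER_ID.matcher(id).matches();
  }

  public static void main(String[] args) {
    System.out.println(isValidContainerId("container_e17_1410901177871_0001_01_000005")); // true
    System.out.println(isValidContainerId("container_foo; rm -rf /"));                    // false
  }
}
{code}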

> Fix issues with docker commands executed by container-executor
> --
>
> Key: YARN-6726
> URL: https://issues.apache.org/jira/browse/YARN-6726
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
> Attachments: YARN-6726.001.patch
>
>
> docker inspect, rm, stop, etc are issued through container-executor. Commands 
> other than docker run are not functioning properly.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-6866) Minor clean-up and fixes in anticipation of merge with trunk

2017-07-24 Thread Subru Krishnan (JIRA)
Subru Krishnan created YARN-6866:


 Summary: Minor clean-up and fixes in anticipation of merge with 
trunk
 Key: YARN-6866
 URL: https://issues.apache.org/jira/browse/YARN-6866
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: federation
Reporter: Subru Krishnan
Assignee: Botong Huang


We have successfully done e2e testing of YARN Federation, and we have minor 
clean-ups like a pom version upgrade, a redundant "." in a configuration string, 
documentation updates, etc. which we want to address before the merge to trunk. 
This jira tracks the fixes we did as described above to ensure a proper e2e run.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6733) Add table for storing sub-application entities

2017-07-24 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099207#comment-16099207
 ] 

Vrushali C commented on YARN-6733:
--

So we thought that it would be good to keep the column name so that sub apps can 
store this information. For regular applications, the flow version can be used 
to determine whether optimizations are to be done. The flow version indicates 
whether the flow has changed; that is, say the pig script changes, then its flow 
version will change. So then, for example, reducer estimation calculations can 
be done differently. This applies to the application entities. We discussed 
that it would be good to keep the same information for sub-apps in case they 
want to use this information in a similar fashion. As such, this column 
currently only exists in code; it's not taking up any disk space/HBase space 
etc. if no one writes to it. But having it gives the framework developers a 
chance to use it if they want. 

> Add table for storing sub-application entities
> --
>
> Key: YARN-6733
> URL: https://issues.apache.org/jira/browse/YARN-6733
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
> Attachments: IMG_7040.JPG, YARN-6733-YARN-5355.001.patch, 
> YARN-6733-YARN-5355.002.patch, YARN-6733-YARN-5355.003.patch, 
> YARN-6733-YARN-5355.004.patch, YARN-6733-YARN-5355.005.patch, 
> YARN-6733-YARN-5355.006.patch, YARN-6733-YARN-5355.007.patch, 
> YARN-6733-YARN-5355.008.patch
>
>
> After a discussion with Tez folks, we have been thinking over introducing a 
> table to store sub-application information.
> For example, a Tez session may run for a certain period as user X and run a 
> few AMs. These AMs accept DAGs from other users, and Tez will execute these 
> DAGs with a doAs user. ATSv2 should store this information in a new table, 
> perhaps called the "sub_application" table. 
> This jira tracks the code changes needed for table schema creation.
> I will file other jiras for writing to that table, updating the user name 
> fields to include the sub-application user, etc.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6307) Refactor FairShareComparator#compare

2017-07-24 Thread Yufei Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099195#comment-16099195
 ] 

Yufei Gu commented on YARN-6307:


The overhead of this refactoring would be: 
1. More checks on the variable {{res}}: at most 4 extra checks, which probably 
need 4-8 more CPU instructions, which is fine. 
2. The overhead of method invocation. The JVM does inline methods: 
https://stackoverflow.com/questions/2096361/are-there-inline-functions-in-java

Moreover, this refactoring will reduce the computation in the fair share comparison: 
before my patch the fair share ratio was computed unconditionally, while after my 
patch it is computed only when necessary. In that sense, I don't think we need 
to worry about performance. 
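
To make the intended structure concrete, here is a small, self-contained sketch (not 
the actual YARN-6307 patch; the App class and its fields are hypothetical stand-ins 
for Schedulable) showing how the later, more expensive comparisons only run when the 
earlier ones tie:

{code}
import java.util.Comparator;

public class ShortCircuitCompareSketch {
  static class App {
    double minShareUsage;     // hypothetical: usage relative to min share
    double useToWeightRatio;  // hypothetical: usage divided by weight
    long startTime;
    String name;
    App(double minShareUsage, double useToWeightRatio, long startTime, String name) {
      this.minShareUsage = minShareUsage;
      this.useToWeightRatio = useToWeightRatio;
      this.startTime = startTime;
      this.name = name;
    }
  }

  static final Comparator<App> FAIR_ORDER = (s1, s2) -> {
    int res = Double.compare(s1.minShareUsage, s2.minShareUsage);
    if (res == 0) {
      // The fair-share ratio is only compared when the min-share usage ties.
      res = Double.compare(s1.useToWeightRatio, s2.useToWeightRatio);
    }
    if (res == 0) {
      // Tie-break by submit time, then by name, for a deterministic ordering.
      res = Long.signum(s1.startTime - s2.startTime);
    }
    if (res == 0) {
      res = s1.name.compareTo(s2.name);
    }
    return res;
  };

  public static void main(String[] args) {
    App a = new App(0.5, 0.9, 100L, "a");
    App b = new App(0.5, 0.7, 90L, "b");
    System.out.println(FAIR_ORDER.compare(a, b) > 0); // true: b orders before a
  }
}
{code}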

> Refactor FairShareComparator#compare
> 
>
> Key: YARN-6307
> URL: https://issues.apache.org/jira/browse/YARN-6307
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: YARN-6307.001.patch, YARN-6307.002.patch, 
> YARN-6307.003.patch
>
>
> The method does three things: compare the min share usage, compare fair share 
> usage by checking the usage-to-weight ratio, and break ties by submit time and 
> name. These steps are mixed together, which makes the code hard to read and 
> maintain. Additionally, there are potential performance issues; for example, 
> there is no need to check the weight ratio if the minShare usage comparison 
> already indicates the order. It is worth improving given how frequently the 
> scheduler invokes this comparator.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6852) [YARN-6223] Native code changes to support isolate GPU devices by using CGroups

2017-07-24 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099193#comment-16099193
 ] 

Wangda Tan commented on YARN-6852:
--

[~chris.douglas], could you help to review the approach and patch if you have 
bandwidth?

> [YARN-6223] Native code changes to support isolate GPU devices by using 
> CGroups
> ---
>
> Key: YARN-6852
> URL: https://issues.apache.org/jira/browse/YARN-6852
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-6852.001.patch
>
>
> This JIRA plans to add support for:
> 1) Isolation in CGroups (native side).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-6031) Application recovery has failed when node label feature is turned off during RM recovery

2017-07-24 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099180#comment-16099180
 ] 

Jian He edited comment on YARN-6031 at 7/24/17 10:15 PM:
-

Ran into this patch when debugging the same issue; I have a few questions:
cc [~sunilg], [~Ying Zhang] 
1. The code below catches InvalidLabelResourceRequestException and assumes that the 
error is because node labels became disabled, but the same 
InvalidLabelResourceRequestException can be thrown for other reasons too, right? 
In that case, the following logic becomes invalid. 

{code}
  amReqs = validateAndCreateResourceRequest(submissionContext, isRecovery);
} catch (InvalidLabelResourceRequestException e) {
  // This can happen if the application had been submitted and run
  // with Node Label enabled but recover with Node Label disabled.
  // Thus there might be node label expression in the application's
  // resource requests. If this is the case, create RmAppImpl with
  // null amReq and reject the application later with clear error
  // message. So that the application can still be tracked by RM
  // after recovery and user can see what's going on and react accordingly.
  if (isRecovery &&
  !YarnConfiguration.areNodeLabelsEnabled(this.conf)) {
if (LOG.isDebugEnabled()) {
  LOG.debug("AMResourceRequest is not created for " + applicationId
  + ". NodeLabel is not enabled in cluster, but AM resource "
  + "request contains a label expression.");
}
  } else {
throw e;
  }
{code}

2. The code below directly transitions the app to failed by using a Rejected event. 
The attempt state is not moved to failed; won't it be stuck there? I think we 
need to send a KILL event instead of a REJECT event.
{code}
  if (labelExp != null &&
  !labelExp.equals(RMNodeLabelsManager.NO_LABEL)) {
String message = "Failed to recover application " + appId
+ ". NodeLabel is not enabled in cluster, but AM resource request "
+ "contains a label expression.";
LOG.warn(message);
application.handle(
new RMAppEvent(appId, RMAppEventType.APP_REJECTED, message));
return;
  }
{code}

3. Is it OK to let the app continue in this scenario? It would be less disruptive to 
the apps. What's the disadvantage if we let the app continue?


was (Author: jianhe):
Ran into this patch when debugging, got few questions:
cc [~sunilg], [~Ying Zhang] 
1. Below code catches InvalidLabelResourceRequestException and assumes that the 
error is because node-label becomes disabled, but the same 
InvalidLabelResourceRequestException can be thrown for other reasons too, right 
? in that case, the following logic becomes invalid. 

{code}
  amReqs = validateAndCreateResourceRequest(submissionContext, isRecovery);
} catch (InvalidLabelResourceRequestException e) {
  // This can happen if the application had been submitted and run
  // with Node Label enabled but recover with Node Label disabled.
  // Thus there might be node label expression in the application's
  // resource requests. If this is the case, create RmAppImpl with
  // null amReq and reject the application later with clear error
  // message. So that the application can still be tracked by RM
  // after recovery and user can see what's going on and react accordingly.
  if (isRecovery &&
  !YarnConfiguration.areNodeLabelsEnabled(this.conf)) {
if (LOG.isDebugEnabled()) {
  LOG.debug("AMResourceRequest is not created for " + applicationId
  + ". NodeLabel is not enabled in cluster, but AM resource "
  + "request contains a label expression.");
}
  } else {
throw e;
  }
{code}

2. Below code directly transitions app to failed by using a Rejected event.  
The attempt state is not moved to failed, it'll be stuck there ? I think we 
need to send KILL event instead of REJECT event
{code}
  if (labelExp != null &&
  !labelExp.equals(RMNodeLabelsManager.NO_LABEL)) {
String message = "Failed to recover application " + appId
+ ". NodeLabel is not enabled in cluster, but AM resource request "
+ "contains a label expression.";
LOG.warn(message);
application.handle(
new RMAppEvent(appId, RMAppEventType.APP_REJECTED, message));
return;
  }
{code}

3. Is it ok to let the app continue in this scenario, it's less disruptive to 
the apps. What's the disadvantage if we let app continue ?

> Application recovery has failed when node label feature is turned off during 
> RM recovery
> 
>
> Key: YARN-6031
> URL: https://issues.apache.org/jira/browse/YARN-6031
> Project: 

[jira] [Comment Edited] (YARN-6031) Application recovery has failed when node label feature is turned off during RM recovery

2017-07-24 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099180#comment-16099180
 ] 

Jian He edited comment on YARN-6031 at 7/24/17 10:08 PM:
-

Ran into this patch when debugging; I have a few questions:
cc [~sunilg], [~Ying Zhang] 
1. The code below catches InvalidLabelResourceRequestException and assumes that the 
error is because node labels became disabled, but the same 
InvalidLabelResourceRequestException can be thrown for other reasons too, right? 
In that case, the following logic becomes invalid. 

{code}
  amReqs = validateAndCreateResourceRequest(submissionContext, isRecovery);
} catch (InvalidLabelResourceRequestException e) {
  // This can happen if the application had been submitted and run
  // with Node Label enabled but recover with Node Label disabled.
  // Thus there might be node label expression in the application's
  // resource requests. If this is the case, create RmAppImpl with
  // null amReq and reject the application later with clear error
  // message. So that the application can still be tracked by RM
  // after recovery and user can see what's going on and react accordingly.
  if (isRecovery &&
  !YarnConfiguration.areNodeLabelsEnabled(this.conf)) {
if (LOG.isDebugEnabled()) {
  LOG.debug("AMResourceRequest is not created for " + applicationId
  + ". NodeLabel is not enabled in cluster, but AM resource "
  + "request contains a label expression.");
}
  } else {
throw e;
  }
{code}

2. The code below directly transitions the app to failed by using a Rejected event. 
The attempt state is not moved to failed; won't it be stuck there? I think we 
need to send a KILL event instead of a REJECT event.
{code}
  if (labelExp != null &&
  !labelExp.equals(RMNodeLabelsManager.NO_LABEL)) {
String message = "Failed to recover application " + appId
+ ". NodeLabel is not enabled in cluster, but AM resource request "
+ "contains a label expression.";
LOG.warn(message);
application.handle(
new RMAppEvent(appId, RMAppEventType.APP_REJECTED, message));
return;
  }
{code}

3. Is it OK to let the app continue in this scenario? It would be less disruptive to 
the apps. What's the disadvantage if we let the app continue?


was (Author: jianhe):
Ran into this patch when debugging, got few questions:
cc [~sunilg], [~Ying Zhang] 
1. Below code catches InvalidLabelResourceRequestException and assumes that the 
error is because node-label becomes disabled, but the same 
InvalidLabelResourceRequestException can be thrown for other reasons too, right 
? in that case, the following logic becomes invalid. 

{code}
  amReqs = validateAndCreateResourceRequest(submissionContext, isRecovery);
} catch (InvalidLabelResourceRequestException e) {
  // This can happen if the application had been submitted and run
  // with Node Label enabled but recover with Node Label disabled.
  // Thus there might be node label expression in the application's
  // resource requests. If this is the case, create RmAppImpl with
  // null amReq and reject the application later with clear error
  // message. So that the application can still be tracked by RM
  // after recovery and user can see what's going on and react accordingly.
  if (isRecovery &&
  !YarnConfiguration.areNodeLabelsEnabled(this.conf)) {
if (LOG.isDebugEnabled()) {
  LOG.debug("AMResourceRequest is not created for " + applicationId
  + ". NodeLabel is not enabled in cluster, but AM resource "
  + "request contains a label expression.");
}
  } else {
throw e;
  }
{code}

2. Below code directly transitions app to failed by using a Rejected event.  
The attempt state is not moved to failed, it'll be stuck there ?
{code}
  if (labelExp != null &&
  !labelExp.equals(RMNodeLabelsManager.NO_LABEL)) {
String message = "Failed to recover application " + appId
+ ". NodeLabel is not enabled in cluster, but AM resource request "
+ "contains a label expression.";
LOG.warn(message);
application.handle(
new RMAppEvent(appId, RMAppEventType.APP_REJECTED, message));
return;
  }
{code}

3. Is it ok to let the app continue in this scenario, it's less disruptive to 
the apps. What's the disadvantage if we let app continue ?

> Application recovery has failed when node label feature is turned off during 
> RM recovery
> 
>
> Key: YARN-6031
> URL: https://issues.apache.org/jira/browse/YARN-6031
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler

[jira] [Commented] (YARN-6031) Application recovery has failed when node label feature is turned off during RM recovery

2017-07-24 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099180#comment-16099180
 ] 

Jian He commented on YARN-6031:
---

Ran into this patch when debugging; I have a few questions:
cc [~sunilg], [~Ying Zhang] 
1. The code below catches InvalidLabelResourceRequestException and assumes that the 
error is because node labels became disabled, but the same 
InvalidLabelResourceRequestException can be thrown for other reasons too, right? 
In that case, the following logic becomes invalid. 

{code}
  amReqs = validateAndCreateResourceRequest(submissionContext, isRecovery);
} catch (InvalidLabelResourceRequestException e) {
  // This can happen if the application had been submitted and run
  // with Node Label enabled but recover with Node Label disabled.
  // Thus there might be node label expression in the application's
  // resource requests. If this is the case, create RmAppImpl with
  // null amReq and reject the application later with clear error
  // message. So that the application can still be tracked by RM
  // after recovery and user can see what's going on and react accordingly.
  if (isRecovery &&
  !YarnConfiguration.areNodeLabelsEnabled(this.conf)) {
if (LOG.isDebugEnabled()) {
  LOG.debug("AMResourceRequest is not created for " + applicationId
  + ". NodeLabel is not enabled in cluster, but AM resource "
  + "request contains a label expression.");
}
  } else {
throw e;
  }
{code}

2. The code below directly transitions the app to failed by using a Rejected event. 
The attempt state is not moved to failed; won't it be stuck there?
{code}
  if (labelExp != null &&
  !labelExp.equals(RMNodeLabelsManager.NO_LABEL)) {
String message = "Failed to recover application " + appId
+ ". NodeLabel is not enabled in cluster, but AM resource request "
+ "contains a label expression.";
LOG.warn(message);
application.handle(
new RMAppEvent(appId, RMAppEventType.APP_REJECTED, message));
return;
  }
{code}

3. Is it OK to let the app continue in this scenario? It would be less disruptive to 
the apps. What's the disadvantage if we let the app continue?

> Application recovery has failed when node label feature is turned off during 
> RM recovery
> 
>
> Key: YARN-6031
> URL: https://issues.apache.org/jira/browse/YARN-6031
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.8.0
>Reporter: Ying Zhang
>Assignee: Ying Zhang
>Priority: Minor
> Fix For: 2.9.0, 3.0.0-alpha4, 2.8.2
>
> Attachments: YARN-6031.001.patch, YARN-6031.002.patch, 
> YARN-6031.003.patch, YARN-6031.004.patch, YARN-6031.005.patch, 
> YARN-6031.006.patch, YARN-6031.007.patch, YARN-6031-branch-2.8.001.patch
>
>
> Here are the repro steps:
> Enable node labels, restart the RM, configure CS properly, and run some jobs;
> Disable node labels, restart the RM, and the following exception is thrown:
> {noformat}
> Caused by: 
> org.apache.hadoop.yarn.exceptions.InvalidLabelResourceRequestException: 
> Invalid resource request, node label not enabled but request contains label 
> expression
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:225)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:248)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:394)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:339)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:319)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:436)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1165)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> ... 10 more
> {noformat}
> During RM restart, application recovery failed because the application had a 
> node label expression specified while node labels had been disabled.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, 

[jira] [Commented] (YARN-6307) Refactor FairShareComparator#compare

2017-07-24 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099175#comment-16099175
 ] 

Daniel Templeton commented on YARN-6307:


LGTM.  Given the frequency with which this method is called, any performance 
concerns about unrolling the nested _if_ statements?  My hunch is that the 
compiler and/or JIT will make it ultimately irrelevant, but I didn't find 
anything conclusive online.

> Refactor FairShareComparator#compare
> 
>
> Key: YARN-6307
> URL: https://issues.apache.org/jira/browse/YARN-6307
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: YARN-6307.001.patch, YARN-6307.002.patch, 
> YARN-6307.003.patch
>
>
> The method does three things: compare the min share usage, compare fair share 
> usage by checking the usage-to-weight ratio, and break ties by submit time and 
> name. These steps are mixed together, which makes the code hard to read and 
> maintain. Additionally, there are potential performance issues; for example, 
> there is no need to check the weight ratio if the minShare usage comparison 
> already indicates the order. It is worth improving given how frequently the 
> scheduler invokes this comparator.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6307) Refactor FairShareComparator#compare

2017-07-24 Thread Yufei Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099164#comment-16099164
 ] 

Yufei Gu commented on YARN-6307:


Thanks [~templedf] for the review. Uploaded patch v3 for your comments.

> Refactor FairShareComparator#compare
> 
>
> Key: YARN-6307
> URL: https://issues.apache.org/jira/browse/YARN-6307
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: YARN-6307.001.patch, YARN-6307.002.patch, 
> YARN-6307.003.patch
>
>
> The method does three things: compare the min share usage, compare fair share 
> usage by checking the usage-to-weight ratio, and break ties by submit time and 
> name. These steps are mixed together, which makes the code hard to read and 
> maintain. Additionally, there are potential performance issues; for example, 
> there is no need to check the weight ratio if the minShare usage comparison 
> already indicates the order. It is worth improving given how frequently the 
> scheduler invokes this comparator.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6307) Refactor FairShareComparator#compare

2017-07-24 Thread Yufei Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu updated YARN-6307:
---
Attachment: YARN-6307.003.patch

> Refactor FairShareComparator#compare
> 
>
> Key: YARN-6307
> URL: https://issues.apache.org/jira/browse/YARN-6307
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: YARN-6307.001.patch, YARN-6307.002.patch, 
> YARN-6307.003.patch
>
>
> The method does three things: compare the min share usage, compare fair share 
> usage by checking the usage-to-weight ratio, and break ties by submit time and 
> name. These steps are mixed together, which makes the code hard to read and 
> maintain. Additionally, there are potential performance issues; for example, 
> there is no need to check the weight ratio if the minShare usage comparison 
> already indicates the order. It is worth improving given how frequently the 
> scheduler invokes this comparator.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6610) DominantResourceCalculator.getResourceAsValue() dominant param is no longer appropriate

2017-07-24 Thread Yufei Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099146#comment-16099146
 ] 

Yufei Gu commented on YARN-6610:


Thanks [~templedf] for working on this. Some thoughts:
# I like what you did in the method getResourceAsDominantValue().
# The parameter "boolean singleType" is not honored in the method 
{{compare(Resource clusterResource, Resource lhs, Resource rhs, boolean 
singleType)}}.
# With multiple resources it may be easy to end up in this branch, say, when only 
a few nodes have one type of resource and they go down, which makes the resource 
comparison less meaningful since {{compare(Resource lhs, Resource rhs)}} could 
easily return 0. I assume we could use a similar algorithm in 
{{compare(Resource clusterResource, Resource lhs, Resource rhs, boolean 
singleType)}}.
{code}
if (isInvalidDivisor(clusterResource)) {
  return this.compare(lhs, rhs);
}
{code}
# Do you mind adding unit tests for {{compare(Resource clusterResource, Resource 
lhs, Resource rhs, boolean singleType)}}? (A rough sketch follows this list.)
# There is an extra space in the line "resource.  The share ..."
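
A rough sketch of such a test, as an assumption of its shape rather than part of 
the patch (it exercises {{DominantResourceCalculator#compare}} with only memory 
and vcores, using the three-argument overload):

{code}
import static org.junit.Assert.assertTrue;

import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.DominantResourceCalculator;
import org.junit.Test;

public class TestDominantCompareSketch {
  private final DominantResourceCalculator drc = new DominantResourceCalculator();

  @Test
  public void differingDominantSharesDecideTheOrder() {
    Resource cluster = Resource.newInstance(8192, 8);
    // lhs dominant share is memory (4096/8192 = 0.5);
    // rhs dominant share is vcores (2/8 = 0.25), so lhs should compare greater.
    Resource lhs = Resource.newInstance(4096, 1);
    Resource rhs = Resource.newInstance(1024, 2);
    assertTrue(drc.compare(cluster, lhs, rhs) > 0);
  }
}
{code}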

> DominantResourceCalculator.getResourceAsValue() dominant param is no longer 
> appropriate
> ---
>
> Key: YARN-6610
> URL: https://issues.apache.org/jira/browse/YARN-6610
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: YARN-3926
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Critical
> Attachments: YARN-6610.001.patch
>
>
> The {{dominant}} param assumes there are only two resources, i.e. true means 
> to compare the dominant, and false means to compare the subordinate.  Now 
> that there are _n_ resources, this parameter no longer makes sense.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-3409) Support Node Attribute functionality

2017-07-24 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099137#comment-16099137
 ] 

Wangda Tan commented on YARN-3409:
--

Thanks [~Naganarasimha], and thanks for the inputs from [~sunilg]. 

The latest API design looks good. Regarding
bq. 2. API proto changes.
I personally prefer adding new {{NodeAttributeProto}} and 
{{NodeToAttributeProto}} messages instead of changing the existing 
{{NodeLabelProto}}. Also, should {{NodeIdToAttributesProto}} actually be 
{{NodeToAttributesProto}}, since we don't want to support different ports on the 
same NM host?

> Support Node Attribute functionality
> 
>
> Key: YARN-3409
> URL: https://issues.apache.org/jira/browse/YARN-3409
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api, capacityscheduler, client
>Reporter: Wangda Tan
>Assignee: Naganarasimha G R
> Attachments: 3409-apiChanges_v2.pdf (4).pdf, 
> Constraint-Node-Labels-Requirements-Design-doc_v1.pdf, YARN-3409.WIP.001.patch
>
>
> Specifying only one label for each node (in other words, partitioning a cluster) 
> is a way to determine how the resources of a particular set of nodes can be 
> shared by a group of entities (like teams, departments, etc.). Partitions of a 
> cluster have the following characteristics:
> - The cluster is divided into several disjoint sub-clusters.
> - ACLs/priority can apply to a partition (only the market team has priority to 
> use the partition).
> - Percentages of capacity can apply to a partition (the market team has 40% 
> minimum capacity and the dev team has 60% minimum capacity of the partition).
> Attributes are orthogonal to partitions; they describe features of a node's 
> hardware/software just for affinity. Some examples of attributes:
> - glibc version
> - JDK version
> - Type of CPU (x86_64/i686)
> - Type of OS (windows, linux, etc.)
> With this, an application can ask for resources that have (glibc.version >= 
> 2.20 && JDK.version >= 8u20 && x86_64).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6307) Refactor FairShareComparator#compare

2017-07-24 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099134#comment-16099134
 ] 

Daniel Templeton commented on YARN-6307:


Nice patch, [~yufeigu].  Here are my comments:

# Since we're unnesting all the _if_ blocks, let's do that here, too: 
{code}  if (res == 0) {
// Apps are tied in fairness ratio. Break the tie by submit time and job
// name to get a deterministic ordering, which is useful for unit tests.
res = (int) Math.signum(s1.getStartTime() - s2.getStartTime());
if (res == 0) {
  res = s1.getName().compareTo(s2.getName());
}
  }{code}
# Let's not stack the declarations: {code}  double useToWeightRatio1, 
useToWeightRatio2;{code}

Otherwise, looks good.

> Refactor FairShareComparator#compare
> 
>
> Key: YARN-6307
> URL: https://issues.apache.org/jira/browse/YARN-6307
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: YARN-6307.001.patch, YARN-6307.002.patch
>
>
> The method does three things: compare the min share usage, compare fair share 
> usage by checking the usage-to-weight ratio, and break ties by submit time and 
> name. These steps are mixed together, which makes the code hard to read and 
> maintain. Additionally, there are potential performance issues; for example, 
> there is no need to check the weight ratio if the minShare usage comparison 
> already indicates the order. It is worth improving given how frequently the 
> scheduler invokes this comparator.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6802) Support view leaf queue am resource usage in RM web ui

2017-07-24 Thread Yufei Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099121#comment-16099121
 ] 

Yufei Gu commented on YARN-6802:


Thanks [~daemon] for working on this. It is definitely useful. I filed YARN-6468 
(Add Max AM Resource and AM Resource Usage to FairScheduler WebUI) several 
months ago, which is similar to this JIRA. Do you mind adding Max AM Resource in 
this JIRA and closing YARN-6468 as a duplicate?

> Support view leaf queue am resource usage in RM web ui
> --
>
> Key: YARN-6802
> URL: https://issues.apache.org/jira/browse/YARN-6802
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 2.7.2
>Reporter: YunFan Zhou
>Assignee: YunFan Zhou
> Fix For: 2.8.0
>
> Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png, 
> YARN-6802.001.patch
>
>
> The RM web UI should support viewing leaf queue AM resource usage. 
> !screenshot-2.png!
> I will upload my patch later.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-6865) FSLeafQueue.context should be final

2017-07-24 Thread Daniel Templeton (JIRA)
Daniel Templeton created YARN-6865:
--

 Summary: FSLeafQueue.context should be final
 Key: YARN-6865
 URL: https://issues.apache.org/jira/browse/YARN-6865
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Affects Versions: 3.0.0-alpha4
Reporter: Daniel Templeton
Assignee: Laura Torres
Priority: Trivial






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6864) FSPreemptionThread cleanup for readability

2017-07-24 Thread Daniel Templeton (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton updated YARN-6864:
---
Issue Type: Improvement  (was: Bug)

> FSPreemptionThread cleanup for readability
> --
>
> Key: YARN-6864
> URL: https://issues.apache.org/jira/browse/YARN-6864
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 3.0.0-alpha4
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Minor
> Attachments: YARN-6864.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6845) Variable scheduler of FSLeafQueue duplicates the one of its parent FSQueue.

2017-07-24 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099092#comment-16099092
 ] 

Daniel Templeton commented on YARN-6845:


LGTM +1

> Variable scheduler of FSLeafQueue duplicates the one of its parent FSQueue.
> ---
>
> Key: YARN-6845
> URL: https://issues.apache.org/jira/browse/YARN-6845
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Reporter: Yufei Gu
>Assignee: Yufei Gu
>Priority: Trivial
> Attachments: YARN-6845.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6864) FSPreemptionThread cleanup for readability

2017-07-24 Thread Daniel Templeton (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton updated YARN-6864:
---
Attachment: YARN-6864.001.patch

> FSPreemptionThread cleanup for readability
> --
>
> Key: YARN-6864
> URL: https://issues.apache.org/jira/browse/YARN-6864
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 3.0.0-alpha4
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Minor
> Attachments: YARN-6864.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-6864) FSPreemptionThread cleanup for readability

2017-07-24 Thread Daniel Templeton (JIRA)
Daniel Templeton created YARN-6864:
--

 Summary: FSPreemptionThread cleanup for readability
 Key: YARN-6864
 URL: https://issues.apache.org/jira/browse/YARN-6864
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 3.0.0-alpha4
Reporter: Daniel Templeton
Assignee: Daniel Templeton
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6863) Fair Scheduler preemption thread should check that a container has the needed resources before adding it to the preemption list

2017-07-24 Thread Daniel Templeton (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton updated YARN-6863:
---
Description: In {{FSPreemptionThread.identifyContainersToPreemptOnNode()}}, 
we add every container we encounter to the preemption list until we meet the 
desired target.  As we head into resource types, that behavior will become more 
of a problem, but it's technically an issue already because the fair scheduler 
supports requests with 0 vcores or 0 memory.  (was: In 
{{FSPreemptionThread.identifyContainersToPreempt()}}, we add every container we 
encounter to the preemption list until we meet the desired target.  As we head 
into resource types, that behavior will become more of a problem, but it's 
technically an issue already because the fair scheduler supports requests with 
0 vcores or 0 memory.)

> Fair Scheduler preemption thread should check that a container has the needed 
> resources before adding it to the preemption list
> ---
>
> Key: YARN-6863
> URL: https://issues.apache.org/jira/browse/YARN-6863
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 3.0.0-alpha4
>Reporter: Daniel Templeton
>Priority: Minor
>
> In {{FSPreemptionThread.identifyContainersToPreemptOnNode()}}, we add every 
> container we encounter to the preemption list until we meet the desired 
> target.  As we head into resource types, that behavior will become more of a 
> problem, but it's technically an issue already because the fair scheduler 
> supports requests with 0 vcores or 0 memory.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-6863) Fair Scheduler preemption thread should check that a container has the needed resources before adding it to the preemption list

2017-07-24 Thread Daniel Templeton (JIRA)
Daniel Templeton created YARN-6863:
--

 Summary: Fair Scheduler preemption thread should check that a 
container has the needed resources before adding it to the preemption list
 Key: YARN-6863
 URL: https://issues.apache.org/jira/browse/YARN-6863
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 3.0.0-alpha4
Reporter: Daniel Templeton
Priority: Minor


In {{FSPreemptionThread.identifyContainersToPreempt()}}, we add every container 
we encounter to the preemption list until we meet the desired target.  As we 
head into resource types, that behavior will become more of a problem, but it's 
technically an issue already because the fair scheduler supports requests with 
0 vcores or 0 memory.
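
As a rough illustration of the guard this implies, here is a minimal, 
self-contained sketch under the assumption that memory and vcores are the only 
resources in play; the helper is hypothetical and not part of FSPreemptionThread:

{code}
import org.apache.hadoop.yarn.api.records.Resource;

public class PreemptionCandidateCheckSketch {
  /** True only if the container supplies a positive amount of a resource we still need. */
  static boolean contributesToTarget(Resource stillNeeded, Resource containerResource) {
    boolean helpsMemory = stillNeeded.getMemorySize() > 0
        && containerResource.getMemorySize() > 0;
    boolean helpsVcores = stillNeeded.getVirtualCores() > 0
        && containerResource.getVirtualCores() > 0;
    return helpsMemory || helpsVcores;
  }

  public static void main(String[] args) {
    Resource needed = Resource.newInstance(2048, 0);         // only memory is still needed
    Resource zeroMemContainer = Resource.newInstance(0, 4);  // supplies only vcores
    System.out.println(contributesToTarget(needed, zeroMemContainer)); // false: skip it
  }
}
{code}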



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6788) Improve performance of resource profile branch

2017-07-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099045#comment-16099045
 ] 

Hadoop QA commented on YARN-6788:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
18s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} YARN-3926 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  2m 
44s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 
41s{color} | {color:green} YARN-3926 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
30s{color} | {color:green} YARN-3926 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
44s{color} | {color:green} YARN-3926 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m 
51s{color} | {color:green} YARN-3926 passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
59s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api in 
YARN-3926 has 1 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
31s{color} | {color:green} YARN-3926 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
11s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  3m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  4m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  4m 
56s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 42s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 13 new + 125 unchanged - 17 fixed = 138 total (was 142) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m  
5s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
generated 3 new + 0 unchanged - 1 fixed = 3 total (was 1) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
15s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  5m 25s{color} 
| {color:red} hadoop-yarn in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
30s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  2m 13s{color} 
| {color:red} hadoop-yarn-common in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 10m  0s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 
28s{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 79m 31s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api |
|  |  org.apache.hadoop.yarn.api.records.impl.BaseResource.getResources() may 
expose internal 

[jira] [Updated] (YARN-5146) [YARN-3368] Supports Fair Scheduler in new YARN UI

2017-07-24 Thread Abdullah Yousufi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abdullah Yousufi updated YARN-5146:
---
Attachment: YARN-5146.004.patch

> [YARN-3368] Supports Fair Scheduler in new YARN UI
> --
>
> Key: YARN-5146
> URL: https://issues.apache.org/jira/browse/YARN-5146
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Abdullah Yousufi
> Attachments: YARN-5146.001.patch, YARN-5146.002.patch, 
> YARN-5146.003.patch, YARN-5146.004.patch
>
>
> Current implementation in branch YARN-3368 only support capacity scheduler,  
> we want to make it support fair scheduler. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-6779) DominantResourceFairnessPolicy.DominantResourceFairnessComparator.calculateShares() should be @VisibleForTesting

2017-07-24 Thread Daniel Templeton (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton reassigned YARN-6779:
--

Assignee: Yeliang Cang  (was: Laura Torres)

> DominantResourceFairnessPolicy.DominantResourceFairnessComparator.calculateShares()
>  should be @VisibleForTesting
> 
>
> Key: YARN-6779
> URL: https://issues.apache.org/jira/browse/YARN-6779
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 2.8.1, 3.0.0-alpha4
>Reporter: Daniel Templeton
>Assignee: Yeliang Cang
>Priority: Trivial
>  Labels: newbie
> Attachments: YARN-6779-001.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6768) Improve performance of yarn api record toString and fromString

2017-07-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098979#comment-16098979
 ] 

Hudson commented on YARN-6768:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #12049 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/12049/])
YARN-6768. Improve performance of yarn api record toString and (jlowe: rev 
24853bf32a045b8f029fb136edca2af03836c8d5)
* (add) 
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/util/TestFastNumberFormat.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ReservationId.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationAttemptId.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationId.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ContainerId.java
* (add) 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/FastNumberFormat.java


> Improve performance of yarn api record toString and fromString
> --
>
> Key: YARN-6768
> URL: https://issues.apache.org/jira/browse/YARN-6768
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Fix For: 2.9.0, 3.0.0-beta1, 2.8.2
>
> Attachments: YARN-6768.1.patch, YARN-6768.2.patch, YARN-6768.3.patch, 
> YARN-6768.4.patch, YARN-6768.5.patch, YARN-6768.6.patch, YARN-6768.7.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5219) When an export var command fails in launch_container.sh, the full container launch should fail

2017-07-24 Thread Suma Shivaprasad (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098959#comment-16098959
 ] 

Suma Shivaprasad commented on YARN-5219:


[~sunilg] Yes, I meant "set -o pipefail -e". Sorry about the typo earlier.

> When an export var command fails in launch_container.sh, the full container 
> launch should fail
> --
>
> Key: YARN-5219
> URL: https://issues.apache.org/jira/browse/YARN-5219
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.0.0-alpha1
>Reporter: Hitesh Shah
>Assignee: Sunil G
> Attachments: YARN-5219.001.patch, YARN-5219.003.patch, 
> YARN-5219.004.patch, YARN-5219.005.patch, YARN-5219.006.patch, 
> YARN-5219-branch-2.001.patch
>
>
> Today, a container fails if certain files fail to localize. However, if 
> certain env vars fail to get set up properly either due to bugs in the yarn 
> application or misconfiguration, the actual process launch still gets 
> triggered. This results in either confusing error messages if the process 
> fails to launch or worse yet the process launches but then starts behaving 
> wrongly if the env var is used to control some behavioral aspects. 
> In this scenario, the issue was reproduced by trying to do export 
> abc="$\{foo.bar}" which is invalid as var names cannot contain "." in bash. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6150) TestContainerManagerSecurity tests for Yarn Server are flakey

2017-07-24 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098922#comment-16098922
 ] 

Ray Chiang commented on YARN-6150:
--

+1

Thanks [~ajisakaa] for digging into this.

> TestContainerManagerSecurity tests for Yarn Server are flakey
> -
>
> Key: YARN-6150
> URL: https://issues.apache.org/jira/browse/YARN-6150
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Daniel Sturman
>Assignee: Daniel Sturman
> Attachments: YARN-6150.001.patch, YARN-6150.002.patch, 
> YARN-6150.003.patch, YARN-6150.004.patch, YARN-6150.005.patch, 
> YARN-6150.006.patch, YARN-6150.007.patch
>
>
> Repeated runs of 
> {{org.apache.hadoop.yarn.server.TestContainerManagerSecurity}} can either 
> pass or fail on the same codebase.  Also, the two runs (one 
> in secure mode, one without security) aren't well labeled in JUnit.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6788) Improve performance of resource profile branch

2017-07-24 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-6788:
--
Attachment: YARN-6788-YARN-3926.012.patch

As discussed earlier, I will be suppressing 3 findbugs warnings which are 
related to exposing internal representation. In fact, I return a read-only 
array; doing a copy on every getter call had a significant impact on performance.
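
For context, here is a tiny sketch of the trade-off being described (not the 
YARN-6788 patch itself; the class and method names are illustrative):

{code}
import java.util.Arrays;

public class ReadOnlyViewSketch {
  private final long[] values = {1024L, 4L};

  /** Defensive copy: findbugs-clean, but allocates a new array on every getter call. */
  public long[] getValuesCopy() {
    return Arrays.copyOf(values, values.length);
  }

  /** Shared, read-only-by-convention view: no allocation, but findbugs flags EI_EXPOSE_REP. */
  public long[] getValuesView() {
    return values;
  }

  public static void main(String[] args) {
    ReadOnlyViewSketch r = new ReadOnlyViewSketch();
    System.out.println(r.getValuesCopy() == r.getValuesView()); // false: the copy is a new array
  }
}
{code}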

> Improve performance of resource profile branch
> --
>
> Key: YARN-6788
> URL: https://issues.apache.org/jira/browse/YARN-6788
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Sunil G
>Assignee: Sunil G
>Priority: Blocker
> Attachments: YARN-6788-YARN-3926.001.patch, 
> YARN-6788-YARN-3926.002.patch, YARN-6788-YARN-3926.003.patch, 
> YARN-6788-YARN-3926.004.patch, YARN-6788-YARN-3926.005.patch, 
> YARN-6788-YARN-3926.006.patch, YARN-6788-YARN-3926.007.patch, 
> YARN-6788-YARN-3926.008.patch, YARN-6788-YARN-3926.009.patch, 
> YARN-6788-YARN-3926.010.patch, YARN-6788-YARN-3926.011.patch, 
> YARN-6788-YARN-3926.012.patch
>
>
> Currently we could see a 15% performance delta with this branch. 
> This patch covers a few performance improvements to address that.
> Also this patch will handle 
> [comments|https://issues.apache.org/jira/browse/YARN-6761?focusedCommentId=16075418=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16075418]
>  from [~leftnoteasy].



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6413) Decouple Yarn Registry API from ZK

2017-07-24 Thread Ellen Hui (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098902#comment-16098902
 ] 

Ellen Hui commented on YARN-6413:
-

bq. In RegistryDNSServer, all the methods removed are essential to registryDNS 
- the implementation of RegistryDNS is depending on the zookeeper’s listener 
functionality. This is one big difference from the state store implementation. 
we can not remove those methods.

I did not remove any methods; I just didn't add the ones that exist in 
yarn-native-services but not in trunk. I thought doing so would probably cause 
more conflicts than not.

bq. The path is used by RegistryDNSServer for reconstructing the DNS 
record(e.g. BaseServiceRecordProcessor#getContainerName). If we change the 
path, everything there will break. Also, the registry documentation needs to 
change.

My understanding was that DNS went through ZK directly without going through 
the interface, so it wouldn't be affected by the service records setting up the 
path differently. I can change the path construction back for the ZK impl if it 
needs that.

> Decouple Yarn Registry API from ZK
> --
>
> Key: YARN-6413
> URL: https://issues.apache.org/jira/browse/YARN-6413
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: amrmproxy, api, resourcemanager
>Reporter: Ellen Hui
>Assignee: Ellen Hui
> Attachments: 0001-Registry-API-v2.patch, 0002-Registry-API-v2.patch
>
>
> Right now the Yarn Registry API (defined in the RegistryOperations interface) 
> is a very thin layer over Zookeeper. This jira proposes changing the 
> interface to abstract away the implementation details so that we can write a 
> FS-based implementation of the registry service, which will be used to 
> support AMRMProxy HA.
> The new interface will use register/delete/resolve APIs instead of 
> Zookeeper-specific operations like mknode. 
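
As a rough sketch of the shape such an abstracted interface could take (the names 
and signatures below are assumptions for illustration, not the actual YARN-6413 
API):

{code}
import java.io.IOException;

/** Implementation-agnostic registry operations: no ZK paths or znodes exposed. */
public interface RegistryClient {
  /** Publish a service record under a logical key. */
  void register(String key, byte[] serviceRecord) throws IOException;

  /** Remove the record for the given key, if present. */
  void delete(String key) throws IOException;

  /** Look up the record for the given key, or return null if absent. */
  byte[] resolve(String key) throws IOException;
}
{code}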



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5548) Use MockRMMemoryStateStore to reduce test failures

2017-07-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098784#comment-16098784
 ] 

Hadoop QA commented on YARN-5548:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
15s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 18 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
19s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 13m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
 4s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
53s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
15s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 
14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 10m 
14s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
2m 11s{color} | {color:orange} root: The patch generated 7 new + 963 unchanged 
- 8 fixed = 970 total (was 971) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 46m 56s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  8m 
45s{color} | {color:green} hadoop-mapreduce-client-app in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
41s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}131m  6s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestApplicationCleanup |
|   | hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart |
|   | hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:14b5c93 |
| JIRA Issue | YARN-5548 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12878633/YARN-5548.0016.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux ecbc46fe0d0c 3.13.0-119-generic #166-Ubuntu SMP Wed May 3 
12:18:55 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 770cc46 |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/16532/artifact/patchprocess/diff-checkstyle-root.txt
 |
| unit | 

[jira] [Commented] (YARN-6862) Nodemanager resource usage metrics sometimes are negative

2017-07-24 Thread YunFan Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098634#comment-16098634
 ] 

YunFan Zhou commented on YARN-6862:
---

[~jlowe] It is very likely that the process still exists, but the resource usage, 
especially the reported CPU usage, is very problematic.
I think we should fix it.

> Nodemanager resource usage metrics sometimes are negative
> -
>
> Key: YARN-6862
> URL: https://issues.apache.org/jira/browse/YARN-6862
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.8.2
>Reporter: YunFan Zhou
>
> When we collect real-time metrics of resource usage in NM, we found those 
> values sometimes are invalid.
> For example, the following are values when collected at some point:
> "milliVcoresUsed":-5808,
> "currentPmemUsage":-1,
> "currentVmemUsage":-1,
> "cpuUsagePercentPerCore":-968.1026
> "cpuUsageTotalCoresPercentage":-24.202564,
> "pmemLimit":2147483648,
> "vmemLimit":4509715456
> There are many negative values,  there may a bug in NM. 
> We should fix it, because the real-time metrics of NM is pretty important for 
> us sometimes.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6862) Nodemanager resource usage metrics sometimes are negative

2017-07-24 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098609#comment-16098609
 ] 

Jason Lowe commented on YARN-6862:
--

I believe the case of it returning -1B is when the process exited just as the 
resource monitor was going to examine it.  It's an invalid result because there 
is no process there.  We should not be aggregating those results if that's 
indeed the case.
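
A minimal sketch of the kind of guard I mean, assuming the monitor holds a 
{{ResourceCalculatorProcessTree}} per container (the helper name is made up):

{code}
import org.apache.hadoop.yarn.util.ResourceCalculatorProcessTree;

/**
 * Returns true only if the process-tree sample looks usable for aggregation.
 * The tree reports UNAVAILABLE (-1) when the process exited between the
 * container finishing and the monitor sampling it; such samples should be
 * dropped rather than summed into node-level metrics.
 */
static boolean isUsableSample(ResourceCalculatorProcessTree pTree) {
  long pmem = pTree.getRssMemorySize();
  long vmem = pTree.getVirtualMemorySize();
  float cpu = pTree.getCpuUsagePercent();
  return pmem != ResourceCalculatorProcessTree.UNAVAILABLE
      && vmem != ResourceCalculatorProcessTree.UNAVAILABLE
      && cpu >= 0f;
}
{code}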

> Nodemanager resource usage metrics sometimes are negative
> -
>
> Key: YARN-6862
> URL: https://issues.apache.org/jira/browse/YARN-6862
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.8.2
>Reporter: YunFan Zhou
>
> When we collect real-time metrics of resource usage in NM, we found those 
> values sometimes are invalid.
> For example, the following are values when collected at some point:
> "milliVcoresUsed":-5808,
> "currentPmemUsage":-1,
> "currentVmemUsage":-1,
> "cpuUsagePercentPerCore":-968.1026
> "cpuUsageTotalCoresPercentage":-24.202564,
> "pmemLimit":2147483648,
> "vmemLimit":4509715456
> There are many negative values,  there may a bug in NM. 
> We should fix it, because the real-time metrics of NM is pretty important for 
> us sometimes.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6862) Nodemanager resource usage metrics sometimes are negative

2017-07-24 Thread YunFan Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098603#comment-16098603
 ] 

YunFan Zhou commented on YARN-6862:
---

[~sunilg] Thanks.

We can only see the used memory from the NM logs, where we see entries like the 
following:

2017-07-24 22:19:08,551 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
 Memory usage of ProcessTree 23933 for container-id 
container_e6717_1500903083707_0014_01_000259: -1B of 1 GB physical memory used; 
-1B of 2.1 GB virtual memory used

Because we collect the resource usage metrics directly from the 
MonitoringThread#run method, the collected values faithfully reflect what the NM computed.


> Nodemanager resource usage metrics sometimes are negative
> -
>
> Key: YARN-6862
> URL: https://issues.apache.org/jira/browse/YARN-6862
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.8.2
>Reporter: YunFan Zhou
>
> When we collect real-time metrics of resource usage in NM, we found those 
> values sometimes are invalid.
> For example, the following are values when collected at some point:
> "milliVcoresUsed":-5808,
> "currentPmemUsage":-1,
> "currentVmemUsage":-1,
> "cpuUsagePercentPerCore":-968.1026
> "cpuUsageTotalCoresPercentage":-24.202564,
> "pmemLimit":2147483648,
> "vmemLimit":4509715456
> There are many negative values,  there may a bug in NM. 
> We should fix it, because the real-time metrics of NM is pretty important for 
> us sometimes.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6842) Implement a new access type for queue

2017-07-24 Thread YunFan Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098594#comment-16098594
 ] 

YunFan Zhou commented on YARN-6842:
---

[~bibinchundatt] 

Thanks Bibin, but your solution cannot meet our requirements completely.

The shortcomings are as follows:
1. For some users, we may always want them to be able to view our applications.
If we do that by setting ApplicationAccessType#VIEW_APP ACL rights in the 
ContainerLaunchContext, we have to set it on every submission, which is tedious 
and redundant. On the other hand, an administrator can authorize the VIEW_APP 
permission to other users; that privilege is independent of the submitter's 
authority. It can be understood as an authorization level that is different from 
an administrator's but more than a regular user's.

2. It cannot authorize users to view applications that have already been submitted.

All in all, we should implement a new access type for the queue.

> Implement a new access type for queue
> -
>
> Key: YARN-6842
> URL: https://issues.apache.org/jira/browse/YARN-6842
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Affects Versions: 2.8.2
>Reporter: YunFan Zhou
>Assignee: YunFan Zhou
> Fix For: 2.8.2
>
> Attachments: YARN-6842.001.patch, YARN-6842.002.patch, 
> YARN-6842.003.patch
>
>
> At present, when we want to access the applications of a queue, the only thing 
> we can do is become an administrator of the queue.
> But sometimes we only want to authorize someone to view the applications of a 
> queue, not to perform modify operations.
> There is no way to do this in the current mechanism, so I will implement a new 
> access type for queues to solve this problem.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6768) Improve performance of yarn api record toString and fromString

2017-07-24 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098567#comment-16098567
 ] 

Jason Lowe commented on YARN-6768:
--

+1 for the latest patch.  The unit test failures are unrelated.  Committing 
this.


> Improve performance of yarn api record toString and fromString
> --
>
> Key: YARN-6768
> URL: https://issues.apache.org/jira/browse/YARN-6768
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Attachments: YARN-6768.1.patch, YARN-6768.2.patch, YARN-6768.3.patch, 
> YARN-6768.4.patch, YARN-6768.5.patch, YARN-6768.6.patch, YARN-6768.7.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6858) Attribute Manager to store and provide the attributes in RM

2017-07-24 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098541#comment-16098541
 ] 

Arun Suresh commented on YARN-6858:
---

Thanks for raising these, [~Naganarasimha].
I was wondering whether, instead of a new component, it would be sufficient to add 
the attributes to the {{SchedulerNode}} itself and have an interface in the 
{{ClusterNodeTracker}} to query/list nodes by attribute.
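
To illustrate, a rough sketch under the assumption that attributes are plain 
string key/value pairs (class and method names here are placeholders, not a 
proposed API):

{code}
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Placeholder for a scheduler-side node object that also carries attributes. */
class AttributedNode {
  private final Map<String, String> attributes;

  AttributedNode(Map<String, String> attributes) {
    this.attributes = attributes;
  }

  Map<String, String> getAttributes() {
    return attributes;
  }
}

/** Placeholder tracker exposing an attribute-based query, as suggested above. */
class NodeTrackerSketch {
  private final List<AttributedNode> nodes;

  NodeTrackerSketch(List<AttributedNode> nodes) {
    this.nodes = nodes;
  }

  /** List nodes whose attributes contain every requested key/value pair. */
  List<AttributedNode> getNodesByAttributes(Map<String, String> wanted) {
    return nodes.stream()
        .filter(n -> n.getAttributes().entrySet().containsAll(wanted.entrySet()))
        .collect(Collectors.toList());
  }
}
{code}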

> Attribute Manager to store and provide the attributes in RM
> ---
>
> Key: YARN-6858
> URL: https://issues.apache.org/jira/browse/YARN-6858
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, capacityscheduler, client
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>
> Similar to CommonNodeLabelsManager we need to have a centralized manager for 
> Node Attributes too.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-3409) Support Node Attribute functionality

2017-07-24 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-3409:

Attachment: 3409-apiChanges_v2.pdf (4).pdf

Attached the document for the Proto, CLI & REST documentation.

> Support Node Attribute functionality
> 
>
> Key: YARN-3409
> URL: https://issues.apache.org/jira/browse/YARN-3409
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api, capacityscheduler, client
>Reporter: Wangda Tan
>Assignee: Naganarasimha G R
> Attachments: 3409-apiChanges_v2.pdf (4).pdf, 
> Constraint-Node-Labels-Requirements-Design-doc_v1.pdf, YARN-3409.WIP.001.patch
>
>
> Specifying only one label for each node (in other words, partitioning a cluster) is a way to 
> determine how the resources of a particular set of nodes can be shared by a 
> group of entities (like teams, departments, etc.). Partitions of a cluster 
> have the following characteristics:
> - The cluster is divided into several disjoint sub-clusters.
> - ACLs/priority can apply to a partition (only the Market team has 
> priority to use the partition).
> - Percentages of capacity can apply to a partition (the Market team has a 40% 
> minimum capacity and the Dev team has a 60% minimum capacity of the partition).
> Attributes are orthogonal to partitions; they describe features of a node's 
> hardware/software purely for affinity. Some examples of attributes:
> - glibc version
> - JDK version
> - Type of CPU (x86_64/i686)
> - Type of OS (windows, linux, etc.)
> With this, an application can ask for resources that satisfy (glibc.version >= 
> 2.20 && JDK.version >= 8u20 && x86_64).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5548) Use MockRMMemoryStateStore to reduce test failures

2017-07-24 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-5548:
---
Attachment: YARN-5548.0016.patch

Attaching the rebased patch again.

> Use MockRMMemoryStateStore to reduce test failures
> --
>
> Key: YARN-5548
> URL: https://issues.apache.org/jira/browse/YARN-5548
> Project: Hadoop YARN
>  Issue Type: Test
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>  Labels: oct16-easy, test
> Attachments: YARN-5548.0001.patch, YARN-5548.0002.patch, 
> YARN-5548.0003.patch, YARN-5548.0004.patch, YARN-5548.0005.patch, 
> YARN-5548.0006.patch, YARN-5548.0007.patch, YARN-5548.0008.patch, 
> YARN-5548.0009.patch, YARN-5548.0010.patch, YARN-5548.0011.patch, 
> YARN-5548.0012.patch, YARN-5548.0013.patch, YARN-5548.0014.patch, 
> YARN-5548.0015.patch, YARN-5548.0016.patch
>
>
> https://builds.apache.org/job/PreCommit-YARN-Build/12850/testReport/org.apache.hadoop.yarn.server.resourcemanager/TestRMRestart/testFinishedAppRemovalAfterRMRestart/
> {noformat}
> Error Message
> Stacktrace
> java.lang.AssertionError: expected null, but was: application_submission_context { application_id { id: 1 cluster_timestamp: 
> 1471885197388 } application_name: "" queue: "default" priority { priority: 0 
> } am_container_spec { } cancel_tokens_when_complete: true maxAppAttempts: 2 
> resource { memory: 1024 virtual_cores: 1 } applicationType: "YARN" 
> keep_containers_across_application_attempts: false 
> attempt_failures_validity_interval: 0 am_container_resource_request { 
> priority { priority: 0 } resource_name: "*" capability { memory: 1024 
> virtual_cores: 1 } num_containers: 0 relax_locality: true 
> node_label_expression: "" execution_type_request { execution_type: GUARANTEED 
> enforce_execution_type: false } } } user: "jenkins" start_time: 1471885197417 
> application_state: RMAPP_FINISHED finish_time: 1471885197478>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotNull(Assert.java:664)
>   at org.junit.Assert.assertNull(Assert.java:646)
>   at org.junit.Assert.assertNull(Assert.java:656)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testFinishedAppRemovalAfterRMRestart(TestRMRestart.java:1656)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6862) Nodemanager resource usage metrics sometimes are negative

2017-07-24 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-6862:
-
 Summary: Nodemanager resource usage metrics sometimes are negative 
 (was: There is a bug in computing resource usage in NM.)
Target Version/s: 2.8.2
   Fix Version/s: (was: 2.8.2)

I updated the summary to be something more specific.  Also please do not set 
the Fix version field, as that should only be set once a patch is committed to 
one or more branches.  The Target Version is intended to track the intended 
version(s) for the fix.

> Nodemanager resource usage metrics sometimes are negative
> -
>
> Key: YARN-6862
> URL: https://issues.apache.org/jira/browse/YARN-6862
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.8.2
>Reporter: YunFan Zhou
>
> When we collect real-time metrics of resource usage in NM, we found those 
> values sometimes are invalid.
> For example, the following are values when collected at some point:
> "milliVcoresUsed":-5808,
> "currentPmemUsage":-1,
> "currentVmemUsage":-1,
> "cpuUsagePercentPerCore":-968.1026
> "cpuUsageTotalCoresPercentage":-24.202564,
> "pmemLimit":2147483648,
> "vmemLimit":4509715456
> There are many negative values,  there may a bug in NM. 
> We should fix it, because the real-time metrics of NM is pretty important for 
> us sometimes.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6130) [ATSv2 Security] Generate a delegation token for AM when app collector is created and pass it to AM via NM and RM

2017-07-24 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098474#comment-16098474
 ] 

Varun Saxena commented on YARN-6130:


Alternatively, we can also compare the token on every allocate response. It 
would only be 50-60 bytes and would anyway be handled in a separate RM allocation 
thread, no matter which AM we talk about.

> [ATSv2 Security] Generate a delegation token for AM when app collector is 
> created and pass it to AM via NM and RM
> -
>
> Key: YARN-6130
> URL: https://issues.apache.org/jira/browse/YARN-6130
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-5355-merge-blocker
> Attachments: YARN-6130-YARN-5355.01.patch, 
> YARN-6130-YARN-5355.02.patch, YARN-6130-YARN-5355.03.patch, 
> YARN-6130-YARN-5355.04.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6842) Implement a new access type for queue

2017-07-24 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098469#comment-16098469
 ] 

Bibin A Chundatt commented on YARN-6842:


[~daemon]
During application submission we could set {{ApplicationAccessType#VIEW_APP}} 
ACL rights in the ContainerLaunchContext.
Does that solve your use case?
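
For reference, a minimal sketch of that per-submission approach (the user/group 
names are made up for the example):

{code}
import java.util.Collections;
import java.util.Map;
import org.apache.hadoop.yarn.api.records.ApplicationAccessType;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.util.Records;

/** Grant view access to extra users/groups for a single submission. */
static ContainerLaunchContext newAmContextWithViewAcl() {
  ContainerLaunchContext amContainer =
      Records.newRecord(ContainerLaunchContext.class);
  // ACL value format is "user1,user2 group1,group2"; "viewers" and
  // "analytics" are made-up principals for the example.
  Map<ApplicationAccessType, String> acls =
      Collections.singletonMap(ApplicationAccessType.VIEW_APP, "viewers analytics");
  amContainer.setApplicationACLs(acls);
  return amContainer;
}
{code}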

> Implement a new access type for queue
> -
>
> Key: YARN-6842
> URL: https://issues.apache.org/jira/browse/YARN-6842
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Affects Versions: 2.8.2
>Reporter: YunFan Zhou
>Assignee: YunFan Zhou
> Fix For: 2.8.2
>
> Attachments: YARN-6842.001.patch, YARN-6842.002.patch, 
> YARN-6842.003.patch
>
>
> At present, when we want to access the applications of a queue, the only thing 
> we can do is become an administrator of the queue.
> But sometimes we only want to authorize someone to view the applications of a 
> queue, not to perform modify operations.
> There is no way to do this in the current mechanism, so I will implement a new 
> access type for queues to solve this problem.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6130) [ATSv2 Security] Generate a delegation token for AM when app collector is created and pass it to AM via NM and RM

2017-07-24 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098275#comment-16098275
 ] 

Varun Saxena commented on YARN-6130:


Thanks [~jianhe] and [~rohithsharma] for the reviews. Sorry, I could not reply earlier.

bq. The AllocateResponse#newInstance method may be not needed. I think if we 
have the Builder pattern, we don’t need to keep on adding newInstance methods 
anymore
Ok.

bq. Even without rmIdentifies, if token is updated with same rm_identifiers 
then AM has to update it right? Am I missing any particular scenario?
I was thinking of caching, in the MapReduce AM, the RM id and version that came 
with the last token update, so that we can match against them.
This was to avoid unnecessarily re-adding tokens to the UGI if the said token has 
already been applied. If the token service already exists in the token map, which 
would be true just about every time, Credentials#addToken iterates over all the 
available tokens while adding one.
This was a small optimization for that. Since an AM may not have too many tokens, 
iterating over the token map may not be that costly, though.
Thoughts?
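
Roughly the kind of check I have in mind, sketched here by comparing the raw 
token identifier rather than the RM id/version pair (class and method names are 
illustrative):

{code}
import java.util.Arrays;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.security.token.Token;
import org.apache.hadoop.security.token.TokenIdentifier;

/** Remembers the last applied collector token to avoid redundant UGI updates. */
class CollectorTokenCache {
  private byte[] lastIdentifier;

  synchronized void maybeUpdate(Token<? extends TokenIdentifier> newToken,
      UserGroupInformation ugi) {
    byte[] id = newToken.getIdentifier();
    if (lastIdentifier != null && Arrays.equals(lastIdentifier, id)) {
      return; // unchanged token, skip the Credentials#addToken iteration
    }
    // addToken replaces any existing token with the same service name.
    ugi.addToken(newToken);
    lastIdentifier = id.clone();
  }
}
{code}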

> [ATSv2 Security] Generate a delegation token for AM when app collector is 
> created and pass it to AM via NM and RM
> -
>
> Key: YARN-6130
> URL: https://issues.apache.org/jira/browse/YARN-6130
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-5355-merge-blocker
> Attachments: YARN-6130-YARN-5355.01.patch, 
> YARN-6130-YARN-5355.02.patch, YARN-6130-YARN-5355.03.patch, 
> YARN-6130-YARN-5355.04.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6862) There is a bug in computing resource usage in NM.

2017-07-24 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098249#comment-16098249
 ] 

Sunil G commented on YARN-6862:
---

[~daemon] Thanks for raising the jira.
Could you please share some more information regarding the cluster, and any 
logs (NM logs) if available?

> There is a bug in computing resource usage in NM.
> -
>
> Key: YARN-6862
> URL: https://issues.apache.org/jira/browse/YARN-6862
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.8.2
>Reporter: YunFan Zhou
> Fix For: 2.8.2
>
>
> When we collect real-time metrics of resource usage in NM, we found those 
> values sometimes are invalid.
> For example, the following are values when collected at some point:
> "milliVcoresUsed":-5808,
> "currentPmemUsage":-1,
> "currentVmemUsage":-1,
> "cpuUsagePercentPerCore":-968.1026
> "cpuUsageTotalCoresPercentage":-24.202564,
> "pmemLimit":2147483648,
> "vmemLimit":4509715456
> There are many negative values,  there may a bug in NM. 
> We should fix it, because the real-time metrics of NM is pretty important for 
> us sometimes.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6102) RMActiveService context to be updated with new RMContext on failover

2017-07-24 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098247#comment-16098247
 ] 

Rohith Sharma K S commented on YARN-6102:
-

Test failures are unrelated to the patch; there are other open JIRAs tracking them.

> RMActiveService context to be updated with new RMContext on failover
> 
>
> Key: YARN-6102
> URL: https://issues.apache.org/jira/browse/YARN-6102
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0, 2.7.3
>Reporter: Ajith S
>Assignee: Rohith Sharma K S
>Priority: Critical
> Attachments: eventOrder.JPG, YARN-6102.01.patch, YARN-6102.02.patch, 
> YARN-6102.03.patch, YARN-6102.04.patch, YARN-6102.05.patch, 
> YARN-6102.06.patch, YARN-6102.07.patch, YARN-6102-branch-2.001.patch, 
> YARN-6102-branch-2.002.patch
>
>
> {code}2017-01-17 16:42:17,911 FATAL [AsyncDispatcher event handler] 
> event.AsyncDispatcher (AsyncDispatcher.java:dispatch(200)) - Error in 
> dispatcher thread
> java.lang.Exception: No handler for registered for class 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeEventType
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:196)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:120)
> at java.lang.Thread.run(Thread.java:745)
> 2017-01-17 16:42:17,914 INFO  [AsyncDispatcher ShutDown handler] 
> event.AsyncDispatcher (AsyncDispatcher.java:run(303)) - Exiting, bbye..{code}
> I also noticed the same stack trace when {{TestResourceTrackerOnHA}} exits 
> abnormally; after some analysis, I was able to reproduce it.
> Once the nodeHeartBeat is sent to RM, inside 
> {{org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService.nodeHeartbeat(NodeHeartbeatRequest)}},
>  before sending it to dispatcher through
> {{this.rmContext.getDispatcher().getEventHandler().handle(nodeStatusEvent);}} 
> if RM failover is called, the dispatcher is reset
> The new dispatcher is however first started and then the events are 
> registered at 
> {{org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.reinitialize(boolean)}}
> So event order will look like
> 1. Send Node heartbeat to {{ResourceTrackerService}}
> 2. In {{ResourceTrackerService.nodeHeartbeat}}, before passing to dispatcher 
> call RM failover
> 3. In RM Failover, current active will reset dispatcher @reinitialize i.e ( 
> {{resetDispatcher();}} + {{createAndInitActiveServices();}} )
> Now, between {{resetDispatcher();}} and {{createAndInitActiveServices();}}, 
> {{ResourceTrackerService.nodeHeartbeat}} invokes the dispatcher.
> This causes the above error because, at the point in time when the {{STATUS_UPDATE}} 
> event is given to the dispatcher in {{ResourceTrackerService}}, the new 
> dispatcher (from the failover) may have been started but not yet registered for events.
> Using the same steps (pausing the JVM in a debugger), I was able to reproduce this in 
> a production cluster as well: for the {{STATUS_UPDATE}} active service event, the 
> service has yet to forward the event to the RM dispatcher when a failover is called 
> and the dispatcher reset happens between {{resetDispatcher();}} & 
> {{createAndInitActiveServices();}}.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-3254) HealthReport should include disk full information

2017-07-24 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098234#comment-16098234
 ] 

Sunil G commented on YARN-3254:
---

Thanks [~suma.shivaprasad]
Generally the patch looks fine. A few minor comments.

# I think {{DirectoryCollection#getErroredDirs}} is a public API. Could you please 
mark it as Evolving and add some more information to the API javadoc? The existing 
javadoc there is not that great, so it would be better to improve it (see the 
sketch after this list).
# {{fullLocalDirsList}} could be renamed to {{diskFullLocalDirsList}} or some 
better name :) Similarly for the log dirs.
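
Something along these lines is what I mean for the first comment (the method 
body is a placeholder, not the real {{DirectoryCollection}} code):

{code}
import java.util.Collections;
import java.util.List;
import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;

class DirectoryCollectionSketch {
  /**
   * @return the local directories that failed the disk health check outright
   *         (as opposed to merely being full). The returned list is a
   *         snapshot and may change on the next health-check cycle.
   */
  @InterfaceAudience.Private
  @InterfaceStability.Evolving
  List<String> getErroredDirs() {
    return Collections.emptyList(); // placeholder body
  }
}
{code}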


> HealthReport should include disk full information
> -
>
> Key: YARN-3254
> URL: https://issues.apache.org/jira/browse/YARN-3254
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.6.0
>Reporter: Akira Ajisaka
>Assignee: Suma Shivaprasad
> Fix For: 3.0.0-beta1
>
> Attachments: Screen Shot 2015-02-24 at 17.57.39.png, Screen Shot 
> 2015-02-25 at 14.38.10.png, YARN-3254-001.patch, YARN-3254-002.patch, 
> YARN-3254-003.patch
>
>
> When a NodeManager's local disk gets almost full, the NodeManager sends a 
> health report to ResourceManager that "local/log dir is bad" and the message 
> is displayed on ResourceManager Web UI. It's difficult for users to detect 
> why the dir is bad.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6788) Improve performance of resource profile branch

2017-07-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098211#comment-16098211
 ] 

Hadoop QA commented on YARN-6788:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
19s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} YARN-3926 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
43s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 
44s{color} | {color:green} YARN-3926 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 11m 
16s{color} | {color:green} YARN-3926 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 0s{color} | {color:green} YARN-3926 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
14s{color} | {color:green} YARN-3926 passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m 
25s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api in 
YARN-3926 has 1 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
34s{color} | {color:green} YARN-3926 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
12s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  6m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  6m 
33s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m  0s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 23 new + 126 unchanged - 16 fixed = 149 total (was 142) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
15s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m 
37s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
generated 4 new + 0 unchanged - 1 fixed = 4 total (was 1) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
50s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
40s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
59s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 46m 45s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
32s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}113m 27s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api |
|  |  Possible null pointer dereference of a on branch that might be infeasible 
in org.apache.hadoop.yarn.api.records.Resource.equals(Object)  Dereferenced at 
Resource.java:a on branch that might be infeasible in 
org.apache.hadoop.yarn.api.records.Resource.equals(Object)  Dereferenced at 
Resource.java:[line 358] |
|  |  org.apache.hadoop.yarn.api.records.impl.BaseResource.getResources() may 
expose internal representation by returning BaseResource.resources  At 
BaseResource.java:by returning BaseResource.resources  At 
BaseResource.java:[line 131] |
|  |  Public static 
org.apache.hadoop.yarn.util.resource.ResourceUtils.getResourceNamesArray() may 

[jira] [Commented] (YARN-6102) RMActiveService context to be updated with new RMContext on failover

2017-07-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098205#comment-16098205
 ] 

Hadoop QA commented on YARN-6102:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
21s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} branch-2 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 
48s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
31s{color} | {color:green} branch-2 passed with JDK v1.8.0_131 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
33s{color} | {color:green} branch-2 passed with JDK v1.7.0_131 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
23s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
38s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
15s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
21s{color} | {color:green} branch-2 passed with JDK v1.8.0_131 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
25s{color} | {color:green} branch-2 passed with JDK v1.7.0_131 {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed with JDK v1.8.0_131 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed with JDK v1.7.0_131 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
22s{color} | {color:green} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 0 new + 137 unchanged - 9 fixed = 137 total (was 146) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed with JDK v1.8.0_131 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed with JDK v1.7.0_131 {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 44m 59s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_131. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
17s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}107m  3s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_131 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestRMRestart |
|   | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFSAppStarvation |
| JDK v1.7.0_131 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestRMRestart |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:5e40efe |
| JIRA Issue | YARN-6102 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12878594/YARN-6102-branch-2.002.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname 

[jira] [Commented] (YARN-5219) When an export var command fails in launch_container.sh, the full container launch should fail

2017-07-24 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098188#comment-16098188
 ] 

Sunil G commented on YARN-5219:
---

Thanks [~suma.shivaprasad]

I think you meant "set -o pipefail -e". It makes sense to me. Could you please 
confirm?

> When an export var command fails in launch_container.sh, the full container 
> launch should fail
> --
>
> Key: YARN-5219
> URL: https://issues.apache.org/jira/browse/YARN-5219
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.0.0-alpha1
>Reporter: Hitesh Shah
>Assignee: Sunil G
> Attachments: YARN-5219.001.patch, YARN-5219.003.patch, 
> YARN-5219.004.patch, YARN-5219.005.patch, YARN-5219.006.patch, 
> YARN-5219-branch-2.001.patch
>
>
> Today, a container fails if certain files fail to localize. However, if 
> certain env vars fail to get set up properly, either due to bugs in the YARN 
> application or misconfiguration, the actual process launch still gets 
> triggered. This results in confusing error messages if the process 
> fails to launch, or worse yet, the process launches but then behaves 
> incorrectly if the env var is used to control some behavioral aspect. 
> In this scenario, the issue was reproduced by trying to do export 
> abc="$\{foo.bar}", which is invalid because variable names cannot contain "." in bash. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6862) There is a bug in computing resource usage in NM.

2017-07-24 Thread YunFan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

YunFan Zhou updated YARN-6862:
--
Description: 
When we collect real-time resource usage metrics in the NM, we find that the 
values are sometimes invalid.
For example, the following values were collected at one point:

"milliVcoresUsed":-5808,
"currentPmemUsage":-1,
"currentVmemUsage":-1,
"cpuUsagePercentPerCore":-968.1026
"cpuUsageTotalCoresPercentage":-24.202564,
"pmemLimit":2147483648,
"vmemLimit":4509715456

There are many negative values; there may be a bug in the NM. 
We should fix it, because the NM's real-time metrics are quite important to us.

> There is a bug in computing resource usage in NM.
> -
>
> Key: YARN-6862
> URL: https://issues.apache.org/jira/browse/YARN-6862
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.8.2
>Reporter: YunFan Zhou
> Fix For: 2.8.2
>
>
> When we collect real-time metrics of resource usage in NM, we found those 
> values sometimes are invalid.
> For example, the following are values when collected at some point:
> "milliVcoresUsed":-5808,
> "currentPmemUsage":-1,
> "currentVmemUsage":-1,
> "cpuUsagePercentPerCore":-968.1026
> "cpuUsageTotalCoresPercentage":-24.202564,
> "pmemLimit":2147483648,
> "vmemLimit":4509715456
> There are many negative values,  there may a bug in NM. 
> We should fix it, because the real-time metrics of NM is pretty important for 
> us sometimes.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-6862) There is a bug in computing resource usage in NM.

2017-07-24 Thread YunFan Zhou (JIRA)
YunFan Zhou created YARN-6862:
-

 Summary: There is a bug in computing resource usage in NM.
 Key: YARN-6862
 URL: https://issues.apache.org/jira/browse/YARN-6862
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.8.2
Reporter: YunFan Zhou
 Fix For: 2.8.2






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5892) Support user-specific minimum user limit percentage in Capacity Scheduler

2017-07-24 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098177#comment-16098177
 ] 

Sunil G commented on YARN-5892:
---

Hi [~eepayne]
Thank you very much for the effort. Generally the patch looks fine to me, except 
for the doubt below.
In {{ActiveUsersManager}}, could we avoid *activeUsersChanged* if possible? Maybe 
we could keep an active-user set in ActiveUsersManager itself and update/clear 
that set when activate/deactivateApplication is invoked.
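
A rough sketch of what I had in mind (names are illustrative, not the actual 
ActiveUsersManager internals):

{code}
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Keeps its own view of active users, updated on activate/deactivate. */
class ActiveUserSetSketch {
  private final Set<String> activeUsers = new HashSet<>();

  synchronized void activateApplication(String user) {
    activeUsers.add(user);
  }

  synchronized void deactivateApplication(String user, boolean lastActiveAppOfUser) {
    if (lastActiveAppOfUser) {
      activeUsers.remove(user);
    }
  }

  /** Snapshot for callers that need the current active-user set. */
  synchronized Set<String> getActiveUsers() {
    return Collections.unmodifiableSet(new HashSet<>(activeUsers));
  }
}
{code}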


> Support user-specific minimum user limit percentage in Capacity Scheduler
> -
>
> Key: YARN-5892
> URL: https://issues.apache.org/jira/browse/YARN-5892
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Reporter: Eric Payne
>Assignee: Eric Payne
> Fix For: 3.0.0-alpha3
>
> Attachments: Active users highlighted.jpg, YARN-5892.001.patch, 
> YARN-5892.002.patch, YARN-5892.003.patch, YARN-5892.004.patch, 
> YARN-5892.005.patch, YARN-5892.006.patch, YARN-5892.007.patch, 
> YARN-5892.008.patch, YARN-5892.009.patch, YARN-5892.010.patch, 
> YARN-5892.012.patch, YARN-5892.013.patch, YARN-5892.014.patch, 
> YARN-5892.015.patch, YARN-5892.branch-2.015.patch, 
> YARN-5892.branch-2.016.patch, YARN-5892.branch-2.8.016.patch, 
> YARN-5892.branch-2.8.017.patch, YARN-5892.branch-2.8.018.patch
>
>
> Currently, in the capacity scheduler, the {{minimum-user-limit-percent}} 
> property is per queue. A cluster admin should be able to set the minimum user 
> limit percent on a per-user basis within the queue.
> This functionality is needed so that when intra-queue preemption is enabled 
> (YARN-4945 / YARN-2113), some users can be deemed as more important than 
> other users, and resources from VIP users won't be as likely to be preempted.
> For example, if the {{getstuffdone}} queue has a MULP of 25 percent, but user 
> {{jane}} is a power user of queue {{getstuffdone}} and needs to be guaranteed 
> 75 percent, the properties for {{getstuffdone}} and {{jane}} would look like 
> this:
> {code}
> <property>
>   <name>yarn.scheduler.capacity.root.getstuffdone.minimum-user-limit-percent</name>
>   <value>25</value>
> </property>
> <property>
>   <name>yarn.scheduler.capacity.root.getstuffdone.jane.minimum-user-limit-percent</name>
>   <value>75</value>
> </property>
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6734) Ensure sub-application user is extracted & sent to timeline service

2017-07-24 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098174#comment-16098174
 ] 

Rohith Sharma K S commented on YARN-6734:
-

[~varun_saxena] do you have any further comments on the patch? 

> Ensure sub-application user is extracted & sent to timeline service
> ---
>
> Key: YARN-6734
> URL: https://issues.apache.org/jira/browse/YARN-6734
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Rohith Sharma K S
> Attachments: YARN-6734-YARN-5355.001.patch
>
>
> After a discussion with Tez folks, we have been thinking over introducing a 
> table to store  sub-application information. YARN-6733
> For example, a Tez session may run for a certain period as user X and run a 
> few AMs. These AMs accept DAGs from other users, and Tez will execute these DAGs 
> with a doAs user. ATSv2 should store this information in a new table, perhaps 
> called the "sub_application" table. 
> YARN-6733 tracks the code changes needed for the table schema creation.
> This jira tracks writing to that table, updating the user name fields to 
> include the sub-application user, etc. This would mean adding a field to Flow 
> Context which can store an additional user.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-6861) Reader API for sub application entities

2017-07-24 Thread Rohith Sharma K S (JIRA)
Rohith Sharma K S created YARN-6861:
---

 Summary: Reader API for sub application entities
 Key: YARN-6861
 URL: https://issues.apache.org/jira/browse/YARN-6861
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelinereader
Reporter: Rohith Sharma K S
Assignee: Rohith Sharma K S


YARN-6733 and YARN-6734 writes data into sub application table. There should be 
a way to read those entities.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Resolved] (YARN-6860) TestRMRestart.testFinishedAppRemovalAfterRMRestart fails intermittently

2017-07-24 Thread Akira Ajisaka (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka resolved YARN-6860.
-
Resolution: Duplicate

> TestRMRestart.testFinishedAppRemovalAfterRMRestart fails intermittently
> ---
>
> Key: YARN-6860
> URL: https://issues.apache.org/jira/browse/YARN-6860
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Reporter: Akira Ajisaka
>Assignee: Akira Ajisaka
> Attachments: YARN-6860.01.patch
>
>
> https://builds.apache.org/job/PreCommit-YARN-Build/16528/testReport/org.apache.hadoop.yarn.server.resourcemanager/TestRMRestart/testFinishedAppRemovalAfterRMRestart/
> {noformat}
> java.lang.AssertionError: expected null, but was: application_submission_context { application_id { id: 1 cluster_timestamp: 
> 1500886835515 } application_name: "" queue: "default" priority { priority: 0 
> } am_container_spec { } cancel_tokens_when_complete: true maxAppAttempts: 2 
> resource { memory: 1024 virtual_cores: 1 } applicationType: "YARN" 
> keep_containers_across_application_attempts: false 
> attempt_failures_validity_interval: 0 am_container_resource_request { 
> priority { priority: 0 } resource_name: "*" capability { memory: 1024 
> virtual_cores: 1 } num_containers: 1 relax_locality: true 
> node_label_expression: "" execution_type_request { execution_type: GUARANTEED 
> enforce_execution_type: false } } } user: "jenkins" start_time: 1500886835535 
> application_state: RMAPP_FINISHED finish_time: 1500886835559>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotNull(Assert.java:664)
>   at org.junit.Assert.assertNull(Assert.java:646)
>   at org.junit.Assert.assertNull(Assert.java:656)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testFinishedAppRemovalAfterRMRestart(TestRMRestart.java:1673)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6860) TestRMRestart.testFinishedAppRemovalAfterRMRestart fails intermittently

2017-07-24 Thread Akira Ajisaka (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098161#comment-16098161
 ] 

Akira Ajisaka commented on YARN-6860:
-

I looked at YARN-5548 and it will probably fix this failure. Closing this as a duplicate. 
Thanks [~rohithsharma] and [~varun_saxena].

> TestRMRestart.testFinishedAppRemovalAfterRMRestart fails intermittently
> ---
>
> Key: YARN-6860
> URL: https://issues.apache.org/jira/browse/YARN-6860
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Reporter: Akira Ajisaka
>Assignee: Akira Ajisaka
> Attachments: YARN-6860.01.patch
>
>
> https://builds.apache.org/job/PreCommit-YARN-Build/16528/testReport/org.apache.hadoop.yarn.server.resourcemanager/TestRMRestart/testFinishedAppRemovalAfterRMRestart/
> {noformat}
> java.lang.AssertionError: expected null, but was: application_submission_context { application_id { id: 1 cluster_timestamp: 
> 1500886835515 } application_name: "" queue: "default" priority { priority: 0 
> } am_container_spec { } cancel_tokens_when_complete: true maxAppAttempts: 2 
> resource { memory: 1024 virtual_cores: 1 } applicationType: "YARN" 
> keep_containers_across_application_attempts: false 
> attempt_failures_validity_interval: 0 am_container_resource_request { 
> priority { priority: 0 } resource_name: "*" capability { memory: 1024 
> virtual_cores: 1 } num_containers: 1 relax_locality: true 
> node_label_expression: "" execution_type_request { execution_type: GUARANTEED 
> enforce_execution_type: false } } } user: "jenkins" start_time: 1500886835535 
> application_state: RMAPP_FINISHED finish_time: 1500886835559>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotNull(Assert.java:664)
>   at org.junit.Assert.assertNull(Assert.java:646)
>   at org.junit.Assert.assertNull(Assert.java:656)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testFinishedAppRemovalAfterRMRestart(TestRMRestart.java:1673)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6860) TestRMRestart.testFinishedAppRemovalAfterRMRestart fails intermittently

2017-07-24 Thread Akira Ajisaka (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated YARN-6860:

Attachment: YARN-6860.01.patch

Attaching a patch to use GenericTestUtils.waitFor.
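
The gist of the change, as a rough sketch (the store/app-id accessors stand in 
for the test's fixtures; this is not the exact patch):

{code}
import java.util.concurrent.TimeoutException;
import org.apache.hadoop.test.GenericTestUtils;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.server.resourcemanager.recovery.MemoryRMStateStore;

/** Poll until the finished app is removed from the store instead of asserting immediately. */
static void waitForAppRemoval(MemoryRMStateStore store, ApplicationId appId)
    throws TimeoutException, InterruptedException {
  GenericTestUtils.waitFor(
      () -> store.getState().getApplicationState().get(appId) == null,
      100,     // re-check every 100 ms
      20000);  // give up after 20 seconds
}
{code}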

> TestRMRestart.testFinishedAppRemovalAfterRMRestart fails intermittently
> ---
>
> Key: YARN-6860
> URL: https://issues.apache.org/jira/browse/YARN-6860
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Reporter: Akira Ajisaka
>Assignee: Akira Ajisaka
> Attachments: YARN-6860.01.patch
>
>
> https://builds.apache.org/job/PreCommit-YARN-Build/16528/testReport/org.apache.hadoop.yarn.server.resourcemanager/TestRMRestart/testFinishedAppRemovalAfterRMRestart/
> {noformat}
> java.lang.AssertionError: expected null, but was: application_submission_context { application_id { id: 1 cluster_timestamp: 
> 1500886835515 } application_name: "" queue: "default" priority { priority: 0 
> } am_container_spec { } cancel_tokens_when_complete: true maxAppAttempts: 2 
> resource { memory: 1024 virtual_cores: 1 } applicationType: "YARN" 
> keep_containers_across_application_attempts: false 
> attempt_failures_validity_interval: 0 am_container_resource_request { 
> priority { priority: 0 } resource_name: "*" capability { memory: 1024 
> virtual_cores: 1 } num_containers: 1 relax_locality: true 
> node_label_expression: "" execution_type_request { execution_type: GUARANTEED 
> enforce_execution_type: false } } } user: "jenkins" start_time: 1500886835535 
> application_state: RMAPP_FINISHED finish_time: 1500886835559>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotNull(Assert.java:664)
>   at org.junit.Assert.assertNull(Assert.java:646)
>   at org.junit.Assert.assertNull(Assert.java:656)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testFinishedAppRemovalAfterRMRestart(TestRMRestart.java:1673)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5548) Use MockRMMemoryStateStore to reduce test failures

2017-07-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098159#comment-16098159
 ] 

Hadoop QA commented on YARN-5548:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  5s{color} 
| {color:red} YARN-5548 does not apply to trunk. Rebase required? Wrong Branch? 
See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-5548 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12862014/YARN-5548.0015.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/16531/console |
| Powered by | Apache Yetus 0.6.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Use MockRMMemoryStateStore to reduce test failures
> --
>
> Key: YARN-5548
> URL: https://issues.apache.org/jira/browse/YARN-5548
> Project: Hadoop YARN
>  Issue Type: Test
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>  Labels: oct16-easy, test
> Attachments: YARN-5548.0001.patch, YARN-5548.0002.patch, 
> YARN-5548.0003.patch, YARN-5548.0004.patch, YARN-5548.0005.patch, 
> YARN-5548.0006.patch, YARN-5548.0007.patch, YARN-5548.0008.patch, 
> YARN-5548.0009.patch, YARN-5548.0010.patch, YARN-5548.0011.patch, 
> YARN-5548.0012.patch, YARN-5548.0013.patch, YARN-5548.0014.patch, 
> YARN-5548.0015.patch
>
>
> https://builds.apache.org/job/PreCommit-YARN-Build/12850/testReport/org.apache.hadoop.yarn.server.resourcemanager/TestRMRestart/testFinishedAppRemovalAfterRMRestart/
> {noformat}
> Error Message
> Stacktrace
> java.lang.AssertionError: expected null, but was: application_submission_context { application_id { id: 1 cluster_timestamp: 
> 1471885197388 } application_name: "" queue: "default" priority { priority: 0 
> } am_container_spec { } cancel_tokens_when_complete: true maxAppAttempts: 2 
> resource { memory: 1024 virtual_cores: 1 } applicationType: "YARN" 
> keep_containers_across_application_attempts: false 
> attempt_failures_validity_interval: 0 am_container_resource_request { 
> priority { priority: 0 } resource_name: "*" capability { memory: 1024 
> virtual_cores: 1 } num_containers: 0 relax_locality: true 
> node_label_expression: "" execution_type_request { execution_type: GUARANTEED 
> enforce_execution_type: false } } } user: "jenkins" start_time: 1471885197417 
> application_state: RMAPP_FINISHED finish_time: 1471885197478>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotNull(Assert.java:664)
>   at org.junit.Assert.assertNull(Assert.java:646)
>   at org.junit.Assert.assertNull(Assert.java:656)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testFinishedAppRemovalAfterRMRestart(TestRMRestart.java:1656)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5548) Use MockRMMemoryStateStore to reduce test failures

2017-07-24 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098147#comment-16098147
 ] 

Varun Saxena commented on YARN-5548:


Bibin, can you rebase the patch?
Sorry for missing this.

> Use MockRMMemoryStateStore to reduce test failures
> --
>
> Key: YARN-5548
> URL: https://issues.apache.org/jira/browse/YARN-5548
> Project: Hadoop YARN
>  Issue Type: Test
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>  Labels: oct16-easy, test
> Attachments: YARN-5548.0001.patch, YARN-5548.0002.patch, 
> YARN-5548.0003.patch, YARN-5548.0004.patch, YARN-5548.0005.patch, 
> YARN-5548.0006.patch, YARN-5548.0007.patch, YARN-5548.0008.patch, 
> YARN-5548.0009.patch, YARN-5548.0010.patch, YARN-5548.0011.patch, 
> YARN-5548.0012.patch, YARN-5548.0013.patch, YARN-5548.0014.patch, 
> YARN-5548.0015.patch
>
>
> https://builds.apache.org/job/PreCommit-YARN-Build/12850/testReport/org.apache.hadoop.yarn.server.resourcemanager/TestRMRestart/testFinishedAppRemovalAfterRMRestart/
> {noformat}
> Error Message
> Stacktrace
> java.lang.AssertionError: expected null, but was: application_submission_context { application_id { id: 1 cluster_timestamp: 
> 1471885197388 } application_name: "" queue: "default" priority { priority: 0 
> } am_container_spec { } cancel_tokens_when_complete: true maxAppAttempts: 2 
> resource { memory: 1024 virtual_cores: 1 } applicationType: "YARN" 
> keep_containers_across_application_attempts: false 
> attempt_failures_validity_interval: 0 am_container_resource_request { 
> priority { priority: 0 } resource_name: "*" capability { memory: 1024 
> virtual_cores: 1 } num_containers: 0 relax_locality: true 
> node_label_expression: "" execution_type_request { execution_type: GUARANTEED 
> enforce_execution_type: false } } } user: "jenkins" start_time: 1471885197417 
> application_state: RMAPP_FINISHED finish_time: 1471885197478>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotNull(Assert.java:664)
>   at org.junit.Assert.assertNull(Assert.java:646)
>   at org.junit.Assert.assertNull(Assert.java:656)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testFinishedAppRemovalAfterRMRestart(TestRMRestart.java:1656)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6860) TestRMRestart.testFinishedAppRemovalAfterRMRestart fails intermittently

2017-07-24 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098145#comment-16098145
 ] 

Varun Saxena commented on YARN-6860:


Sorry, I had to get YARN-5548 in but missed it. I will get it in by today.

> TestRMRestart.testFinishedAppRemovalAfterRMRestart fails intermittently
> ---
>
> Key: YARN-6860
> URL: https://issues.apache.org/jira/browse/YARN-6860
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Reporter: Akira Ajisaka
>Assignee: Akira Ajisaka
>
> https://builds.apache.org/job/PreCommit-YARN-Build/16528/testReport/org.apache.hadoop.yarn.server.resourcemanager/TestRMRestart/testFinishedAppRemovalAfterRMRestart/
> {noformat}
> java.lang.AssertionError: expected null, but was: application_submission_context { application_id { id: 1 cluster_timestamp: 
> 1500886835515 } application_name: "" queue: "default" priority { priority: 0 
> } am_container_spec { } cancel_tokens_when_complete: true maxAppAttempts: 2 
> resource { memory: 1024 virtual_cores: 1 } applicationType: "YARN" 
> keep_containers_across_application_attempts: false 
> attempt_failures_validity_interval: 0 am_container_resource_request { 
> priority { priority: 0 } resource_name: "*" capability { memory: 1024 
> virtual_cores: 1 } num_containers: 1 relax_locality: true 
> node_label_expression: "" execution_type_request { execution_type: GUARANTEED 
> enforce_execution_type: false } } } user: "jenkins" start_time: 1500886835535 
> application_state: RMAPP_FINISHED finish_time: 1500886835559>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotNull(Assert.java:664)
>   at org.junit.Assert.assertNull(Assert.java:646)
>   at org.junit.Assert.assertNull(Assert.java:656)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testFinishedAppRemovalAfterRMRestart(TestRMRestart.java:1673)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6860) TestRMRestart.testFinishedAppRemovalAfterRMRestart fails intermittently

2017-07-24 Thread Akira Ajisaka (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098142#comment-16098142
 ] 

Akira Ajisaka commented on YARN-6860:
-

Okay, I'll check YARN-5548.

> TestRMRestart.testFinishedAppRemovalAfterRMRestart fails intermittently
> ---
>
> Key: YARN-6860
> URL: https://issues.apache.org/jira/browse/YARN-6860
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Reporter: Akira Ajisaka
>Assignee: Akira Ajisaka
>
> https://builds.apache.org/job/PreCommit-YARN-Build/16528/testReport/org.apache.hadoop.yarn.server.resourcemanager/TestRMRestart/testFinishedAppRemovalAfterRMRestart/
> {noformat}
> java.lang.AssertionError: expected null, but was: application_submission_context { application_id { id: 1 cluster_timestamp: 
> 1500886835515 } application_name: "" queue: "default" priority { priority: 0 
> } am_container_spec { } cancel_tokens_when_complete: true maxAppAttempts: 2 
> resource { memory: 1024 virtual_cores: 1 } applicationType: "YARN" 
> keep_containers_across_application_attempts: false 
> attempt_failures_validity_interval: 0 am_container_resource_request { 
> priority { priority: 0 } resource_name: "*" capability { memory: 1024 
> virtual_cores: 1 } num_containers: 1 relax_locality: true 
> node_label_expression: "" execution_type_request { execution_type: GUARANTEED 
> enforce_execution_type: false } } } user: "jenkins" start_time: 1500886835535 
> application_state: RMAPP_FINISHED finish_time: 1500886835559>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotNull(Assert.java:664)
>   at org.junit.Assert.assertNull(Assert.java:646)
>   at org.junit.Assert.assertNull(Assert.java:656)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testFinishedAppRemovalAfterRMRestart(TestRMRestart.java:1673)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6860) TestRMRestart.testFinishedAppRemovalAfterRMRestart fails intermittently

2017-07-24 Thread Akira Ajisaka (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098141#comment-16098141
 ] 

Akira Ajisaka commented on YARN-6860:
-

The test fails in the following code:
{code}
// the first app0 get kicked out from both rmContext and state store
Assert.assertNull(rm2.getRMContext().getRMApps()
  .get(app0.getApplicationId()));
Assert.assertNull(rmAppState.get(app0.getApplicationId()));
{code}
RMAppManager removes app0 from rmContext via a blocking API, but removes it from 
the state store via a non-blocking API (see {{RMStateStore#removeApplication}} 
for details). Because of that, the latter assertion may fail. I think the issue 
can be fixed by adding a wait via {{GenericTestUtils#waitFor}}. I'll attach a 
patch shortly.
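For illustration, a minimal sketch of that approach (assuming the Guava 
{{Supplier}}-based {{GenericTestUtils#waitFor}} overload and the {{app0}}/{{rmAppState}} 
references already used by the test; the attached patch may differ):
{code}
// org.apache.hadoop.test.GenericTestUtils + com.google.common.base.Supplier
// Poll until the non-blocking removal from the state store completes, instead of
// asserting null immediately after the blocking removal from rmContext.
GenericTestUtils.waitFor(new Supplier<Boolean>() {
  @Override
  public Boolean get() {
    return rmAppState.get(app0.getApplicationId()) == null;
  }
}, 100, 10000); // check every 100 ms, time out after 10 seconds
{code}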

> TestRMRestart.testFinishedAppRemovalAfterRMRestart fails intermittently
> ---
>
> Key: YARN-6860
> URL: https://issues.apache.org/jira/browse/YARN-6860
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Reporter: Akira Ajisaka
>Assignee: Akira Ajisaka
>
> https://builds.apache.org/job/PreCommit-YARN-Build/16528/testReport/org.apache.hadoop.yarn.server.resourcemanager/TestRMRestart/testFinishedAppRemovalAfterRMRestart/
> {noformat}
> java.lang.AssertionError: expected null, but was: application_submission_context { application_id { id: 1 cluster_timestamp: 
> 1500886835515 } application_name: "" queue: "default" priority { priority: 0 
> } am_container_spec { } cancel_tokens_when_complete: true maxAppAttempts: 2 
> resource { memory: 1024 virtual_cores: 1 } applicationType: "YARN" 
> keep_containers_across_application_attempts: false 
> attempt_failures_validity_interval: 0 am_container_resource_request { 
> priority { priority: 0 } resource_name: "*" capability { memory: 1024 
> virtual_cores: 1 } num_containers: 1 relax_locality: true 
> node_label_expression: "" execution_type_request { execution_type: GUARANTEED 
> enforce_execution_type: false } } } user: "jenkins" start_time: 1500886835535 
> application_state: RMAPP_FINISHED finish_time: 1500886835559>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotNull(Assert.java:664)
>   at org.junit.Assert.assertNull(Assert.java:646)
>   at org.junit.Assert.assertNull(Assert.java:656)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testFinishedAppRemovalAfterRMRestart(TestRMRestart.java:1673)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6860) TestRMRestart.testFinishedAppRemovalAfterRMRestart fails intermittently

2017-07-24 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098140#comment-16098140
 ] 

Rohith Sharma K S commented on YARN-6860:
-

There is an existing JIRA for this test case failure, i.e. YARN-5548.

> TestRMRestart.testFinishedAppRemovalAfterRMRestart fails intermittently
> ---
>
> Key: YARN-6860
> URL: https://issues.apache.org/jira/browse/YARN-6860
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Reporter: Akira Ajisaka
>Assignee: Akira Ajisaka
>
> https://builds.apache.org/job/PreCommit-YARN-Build/16528/testReport/org.apache.hadoop.yarn.server.resourcemanager/TestRMRestart/testFinishedAppRemovalAfterRMRestart/
> {noformat}
> java.lang.AssertionError: expected null, but was: application_submission_context { application_id { id: 1 cluster_timestamp: 
> 1500886835515 } application_name: "" queue: "default" priority { priority: 0 
> } am_container_spec { } cancel_tokens_when_complete: true maxAppAttempts: 2 
> resource { memory: 1024 virtual_cores: 1 } applicationType: "YARN" 
> keep_containers_across_application_attempts: false 
> attempt_failures_validity_interval: 0 am_container_resource_request { 
> priority { priority: 0 } resource_name: "*" capability { memory: 1024 
> virtual_cores: 1 } num_containers: 1 relax_locality: true 
> node_label_expression: "" execution_type_request { execution_type: GUARANTEED 
> enforce_execution_type: false } } } user: "jenkins" start_time: 1500886835535 
> application_state: RMAPP_FINISHED finish_time: 1500886835559>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotNull(Assert.java:664)
>   at org.junit.Assert.assertNull(Assert.java:646)
>   at org.junit.Assert.assertNull(Assert.java:656)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testFinishedAppRemovalAfterRMRestart(TestRMRestart.java:1673)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6130) [ATSv2 Security] Generate a delegation token for AM when app collector is created and pass it to AM via NM and RM

2017-07-24 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098138#comment-16098138
 ] 

Rohith Sharma K S commented on YARN-6130:
-

bq. The intention here is to avoid updating the token in AM UGI on every 
allocate response. We can potentially cache the RMID and version to ensure that 
the version of token coming from RM is same as the one already updated by AM in 
its UGI. Thoughts?
Even without rmIdentifiers, if the token is updated with the same rm_identifier, then 
the AM has to update it, right? Am I missing any particular scenario?
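For illustration, a rough sketch of "update only when the token itself changed" on the 
AM side (the field and method names below are just assumptions for discussion, not the 
actual patch):
{code}
// Hypothetical AM-side helper: re-register the collector token with the AM UGI
// only when the token has actually changed, irrespective of the rmIdentifier.
private Token<TimelineDelegationTokenIdentifier> currentCollectorToken;

void maybeUpdateCollectorToken(
    Token<TimelineDelegationTokenIdentifier> latest, UserGroupInformation amUgi) {
  if (latest != null && !latest.equals(currentCollectorToken)) {
    currentCollectorToken = latest;
    amUgi.addToken(currentCollectorToken);
  }
}
{code}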

[~jianhe]
bq. what is the existing AppCollectorData#rmIdentifier used for ?
This is used to handle a race condition between two NMs sending the collector address to 
the RM. Let's say that, because of a split brain, one NM is out of sync and the 
application is relaunched on a different NodeManager. After that NM reconnects from the 
split brain, both NMs keep sending collector data to the RM, which updates the wrong 
collector address in the RM, and in turn the AM updates to the wrong collector address.

> [ATSv2 Security] Generate a delegation token for AM when app collector is 
> created and pass it to AM via NM and RM
> -
>
> Key: YARN-6130
> URL: https://issues.apache.org/jira/browse/YARN-6130
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-5355-merge-blocker
> Attachments: YARN-6130-YARN-5355.01.patch, 
> YARN-6130-YARN-5355.02.patch, YARN-6130-YARN-5355.03.patch, 
> YARN-6130-YARN-5355.04.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-6860) TestRMRestart.testFinishedAppRemovalAfterRMRestart fails intermittently

2017-07-24 Thread Akira Ajisaka (JIRA)
Akira Ajisaka created YARN-6860:
---

 Summary: TestRMRestart.testFinishedAppRemovalAfterRMRestart fails 
intermittently
 Key: YARN-6860
 URL: https://issues.apache.org/jira/browse/YARN-6860
 Project: Hadoop YARN
  Issue Type: Bug
  Components: test
Reporter: Akira Ajisaka
Assignee: Akira Ajisaka


https://builds.apache.org/job/PreCommit-YARN-Build/16528/testReport/org.apache.hadoop.yarn.server.resourcemanager/TestRMRestart/testFinishedAppRemovalAfterRMRestart/
{noformat}
java.lang.AssertionError: expected null, but was:
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotNull(Assert.java:664)
at org.junit.Assert.assertNull(Assert.java:646)
at org.junit.Assert.assertNull(Assert.java:656)
at 
org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testFinishedAppRemovalAfterRMRestart(TestRMRestart.java:1673)
{noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4161) Capacity Scheduler : Assign single or multiple containers per heart beat driven by configuration

2017-07-24 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098133#comment-16098133
 ] 

Sunil G commented on YARN-4161:
---

One doubt:
I think we are not checking the below condition in {{canAllocateMore}}. I might 
have lost context; please correct me if I am wrong.

{code}
if (assignment.getAssignmentInformation().getNumReservations() == 0) {
  return true;
}
{code}

> Capacity Scheduler : Assign single or multiple containers per heart beat 
> driven by configuration
> 
>
> Key: YARN-4161
> URL: https://issues.apache.org/jira/browse/YARN-4161
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: capacity scheduler
>Reporter: Mayank Bansal
>Assignee: Mayank Bansal
>  Labels: oct16-medium
> Attachments: YARN-4161.002.patch, YARN-4161.003.patch, 
> YARN-4161.004.patch, YARN-4161.005.patch, YARN-4161.patch, YARN-4161.patch.1
>
>
> Capacity Scheduler right now schedules multiple containers per heartbeat if 
> more resources are available on the node.
> This approach works fine; however, in some cases it does not distribute the load 
> across the cluster, so the throughput of the cluster suffers. I am adding a 
> feature to drive this via configuration so that we can control the number 
> of containers assigned per heartbeat.
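For illustration, one possible shape of such a configurable cap (the property name and 
the two helper methods are hypothetical placeholders, not the committed change):
{code}
// Hypothetical per-heartbeat cap: stop after N successful assignments on this
// node instead of filling the node greedily within a single heartbeat.
int maxAssignments = conf.getInt(
    "yarn.scheduler.capacity.per-node-heartbeat.maximum-container-assignments", -1);
int assigned = 0;
while ((maxAssignments <= 0 || assigned < maxAssignments)
    && nodeHasUnallocatedResources(node)   // hypothetical helper
    && tryAllocateOneContainer(node)) {    // hypothetical helper
  assigned++;
}
{code}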



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6240) TestCapacityScheduler.testRefreshQueuesWithQueueDelete fails randomly

2017-07-24 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098119#comment-16098119
 ] 

Rohith Sharma K S commented on YARN-6240:
-

Recently it failed in a branch-2 
[build|https://builds.apache.org/job/PreCommit-YARN-Build/16527/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.7.0_131.txt].
[~Naganarasimha], do you have any updates on fixing this JIRA?

> TestCapacityScheduler.testRefreshQueuesWithQueueDelete fails randomly
> -
>
> Key: YARN-6240
> URL: https://issues.apache.org/jira/browse/YARN-6240
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: test
>Reporter: Sunil G
>Assignee: Naganarasimha G R
> Attachments: YARN-6240.001.patch
>
>
> *Error Message*
> Expected to NOT throw exception when refresh queue tries to delete a queue 
> WITHOUT running apps
> Link 
> [here|https://builds.apache.org/job/PreCommit-YARN-Build/15092/testReport/org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity/TestCapacityScheduler/testRefreshQueuesWithQueueDelete/]
> *Stacktrace*
> {code}
> java.lang.AssertionError: Expected to NOT throw exception when refresh queue 
> tries to delete a queue WITHOUT running apps
>   at org.junit.Assert.fail(Assert.java:88)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler.testRefreshQueuesWithQueueDelete(TestCapacityScheduler.java:3875)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6678) Committer thread crashes with IllegalStateException in async-scheduling mode of CapacityScheduler

2017-07-24 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098112#comment-16098112
 ] 

Sunil G commented on YARN-6678:
---

Thanks [~Tao Yang], 
I will commit the patch in a day if there are no objections.

> Committer thread crashes with IllegalStateException in async-scheduling mode 
> of CapacityScheduler
> -
>
> Key: YARN-6678
> URL: https://issues.apache.org/jira/browse/YARN-6678
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.9.0, 3.0.0-alpha3
>Reporter: Tao Yang
>Assignee: Tao Yang
> Attachments: YARN-6678.001.patch, YARN-6678.002.patch, 
> YARN-6678.003.patch, YARN-6678.004.patch, YARN-6678.005.patch
>
>
> Error log:
> {noformat}
> java.lang.IllegalStateException: Trying to reserve container 
> container_e10_1495599791406_7129_01_001453 for application 
> appattempt_1495599791406_7129_01 when currently reserved container 
> container_e10_1495599791406_7123_01_001513 on node host: node0123:45454 
> #containers=40 available=... used=...
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode.reserveResource(FiCaSchedulerNode.java:81)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.reserve(FiCaSchedulerApp.java:1079)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.apply(FiCaSchedulerApp.java:795)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.tryCommit(CapacityScheduler.java:2770)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$ResourceCommitterService.run(CapacityScheduler.java:546)
> {noformat}
> Reproduce this problem:
> 1. nm1 re-reserved app-1/container-X1 and generated reserve proposal-1
> 2. nm2 had enough resource for app-1, un-reserved app-1/container-X1 and 
> allocated app-1/container-X2
> 3. nm1 reserved app-2/container-Y
> 4. proposal-1 was accepted but threw IllegalStateException when applying it
> Currently the check code for a reserve proposal in FiCaSchedulerApp#accept is as 
> follows:
> {code}
> // Container reserved first time will be NEW, after the container
> // accepted & confirmed, it will become RESERVED state
> if (schedulerContainer.getRmContainer().getState()
>     == RMContainerState.RESERVED) {
>   // Set reReservation == true
>   reReservation = true;
> } else {
>   // When reserve a resource (state == NEW is for new container,
>   // state == RUNNING is for increase container).
>   // Just check if the node is not already reserved by someone
>   if (schedulerContainer.getSchedulerNode().getReservedContainer()
>       != null) {
>     if (LOG.isDebugEnabled()) {
>       LOG.debug("Try to reserve a container, but the node is "
>           + "already reserved by another container="
>           + schedulerContainer.getSchedulerNode()
>               .getReservedContainer().getContainerId());
>     }
>     return false;
>   }
> }
> {code}
> The reserved container on the node of the reserve proposal is checked only for a 
> first-reserve container.
> We should confirm that the reserved container on this node is the same as the 
> re-reserved container.
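A minimal sketch of that stricter check (using the names from the snippet above; this is 
an assumption for discussion, not the attached patch):
{code}
// In the re-reservation branch, also verify that the container currently
// reserved on the node is the very container this proposal re-reserves.
RMContainer reservedOnNode =
    schedulerContainer.getSchedulerNode().getReservedContainer();
if (reservedOnNode == null
    || !reservedOnNode.getContainerId().equals(
        schedulerContainer.getRmContainer().getContainerId())) {
  // The node's reservation changed after the proposal was generated,
  // so reject the proposal instead of letting apply() hit IllegalStateException.
  return false;
}
{code}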



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6102) RMActiveService context to be updated with new RMContext on failover

2017-07-24 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-6102:

Attachment: YARN-6102-branch-2.002.patch

Updated the branch-2 patch fixing the findbugs warning. Test failures are unrelated to 
the patch, and many open JIRAs already exist for them.

> RMActiveService context to be updated with new RMContext on failover
> 
>
> Key: YARN-6102
> URL: https://issues.apache.org/jira/browse/YARN-6102
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0, 2.7.3
>Reporter: Ajith S
>Assignee: Rohith Sharma K S
>Priority: Critical
> Attachments: eventOrder.JPG, YARN-6102.01.patch, YARN-6102.02.patch, 
> YARN-6102.03.patch, YARN-6102.04.patch, YARN-6102.05.patch, 
> YARN-6102.06.patch, YARN-6102.07.patch, YARN-6102-branch-2.001.patch, 
> YARN-6102-branch-2.002.patch
>
>
> {code}2017-01-17 16:42:17,911 FATAL [AsyncDispatcher event handler] 
> event.AsyncDispatcher (AsyncDispatcher.java:dispatch(200)) - Error in 
> dispatcher thread
> java.lang.Exception: No handler for registered for class 
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeEventType
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:196)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:120)
> at java.lang.Thread.run(Thread.java:745)
> 2017-01-17 16:42:17,914 INFO  [AsyncDispatcher ShutDown handler] 
> event.AsyncDispatcher (AsyncDispatcher.java:run(303)) - Exiting, bbye..{code}
> The same stack trace was also noticed when {{TestResourceTrackerOnHA}} exits 
> abnormally; after some analysis, I was able to reproduce it.
> Once the node heartbeat is sent to the RM, inside 
> {{org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService.nodeHeartbeat(NodeHeartbeatRequest)}},
>  before it is sent to the dispatcher through
> {{this.rmContext.getDispatcher().getEventHandler().handle(nodeStatusEvent);}}, 
> if RM failover is called, the dispatcher is reset.
> The new dispatcher, however, is first started and only then are the events 
> registered, at 
> {{org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.reinitialize(boolean)}}.
> So the event order will look like:
> 1. Send a node heartbeat to {{ResourceTrackerService}}
> 2. In {{ResourceTrackerService.nodeHeartbeat}}, before passing it to the dispatcher, 
> call RM failover
> 3. In RM failover, the current active RM resets the dispatcher in reinitialize, i.e. 
> ({{resetDispatcher();}} + {{createAndInitActiveServices();}})
> Now, between {{resetDispatcher();}} and {{createAndInitActiveServices();}}, 
> {{ResourceTrackerService.nodeHeartbeat}} invokes the dispatcher.
> This causes the above error because, at the point in time when the {{STATUS_UPDATE}} 
> event is given to the dispatcher in {{ResourceTrackerService}}, the new 
> dispatcher (from the failover) may be started but not yet registered for events.
> Using the same steps (pausing the JVM in a debugger), I was able to reproduce this in 
> a production cluster as well: for the {{STATUS_UPDATE}} active service event, the 
> service has yet to forward the event to the RM dispatcher when a failover is called 
> and the dispatcher reset is between {{resetDispatcher();}} and 
> {{createAndInitActiveServices();}}.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6788) Improve performance of resource profile branch

2017-07-24 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-6788:
--
Attachment: YARN-6788-YARN-3926.011.patch

Thanks [~leftnoteasy].
Uploading a patch addressing latest comments.

bq.Additional items (performance related)
Yes, I think these figures make sense. I ran a jmeter analysis test on all 
APIs from the Resources class. There is some performance dip; however, some cost will 
be associated with the extra resource objects. Barring that, I was planning to add 
these test cases in a new patch (with a compile-time flag to trigger the 
performance tests).

bq.Additional items (non-performance related)
Sure. I'll be handling these in another patch.

> Improve performance of resource profile branch
> --
>
> Key: YARN-6788
> URL: https://issues.apache.org/jira/browse/YARN-6788
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Sunil G
>Assignee: Sunil G
>Priority: Blocker
> Attachments: YARN-6788-YARN-3926.001.patch, 
> YARN-6788-YARN-3926.002.patch, YARN-6788-YARN-3926.003.patch, 
> YARN-6788-YARN-3926.004.patch, YARN-6788-YARN-3926.005.patch, 
> YARN-6788-YARN-3926.006.patch, YARN-6788-YARN-3926.007.patch, 
> YARN-6788-YARN-3926.008.patch, YARN-6788-YARN-3926.009.patch, 
> YARN-6788-YARN-3926.010.patch, YARN-6788-YARN-3926.011.patch
>
>
> Currently we can see about a 15% performance delta with this branch. 
> A few performance improvements are needed to address this.
> Also this patch will handle 
> [comments|https://issues.apache.org/jira/browse/YARN-6761?focusedCommentId=16075418=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16075418]
>  from [~leftnoteasy].



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6859) Add missing test scope to the zookeeper dependency in hadoop-yarn-server-resourcemanager test-jar

2017-07-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098103#comment-16098103
 ] 

Hadoop QA commented on YARN-6859:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
20s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 
 3s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
23s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
21s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 43m 45s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
17s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 64m  1s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.resourcemanager.TestRMRestart |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:14b5c93 |
| JIRA Issue | YARN-6859 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12878583/add_test_scope.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  xml  |
| uname | Linux fa77db9524f5 3.13.0-116-generic #163-Ubuntu SMP Fri Mar 31 
14:13:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 770cc46 |
| Default Java | 1.8.0_131 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/16528/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/16528/testReport/ |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/16528/console |
| Powered by | Apache Yetus 0.6.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Add missing test scope to the zookeeper dependency in 
> hadoop-yarn-server-resourcemanager test-jar
> -
>
> Key: YARN-6859
> URL: https://issues.apache.org/jira/browse/YARN-6859
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: build
>Reporter: Akira Ajisaka
>Assignee: Akira Ajisaka
>Priority: Minor
> Attachments: add_test_scope.patch
>
>
> Reported by Sean Mackrory 

[jira] [Commented] (YARN-6102) RMActiveService context to be updated with new RMContext on failover

2017-07-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098087#comment-16098087
 ] 

Hadoop QA commented on YARN-6102:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
23s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} branch-2 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 
52s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
31s{color} | {color:green} branch-2 passed with JDK v1.8.0_131 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
34s{color} | {color:green} branch-2 passed with JDK v1.7.0_131 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
25s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
40s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
18s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
23s{color} | {color:green} branch-2 passed with JDK v1.8.0_131 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
26s{color} | {color:green} branch-2 passed with JDK v1.7.0_131 {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed with JDK v1.8.0_131 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed with JDK v1.7.0_131 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
22s{color} | {color:green} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 0 new + 137 unchanged - 9 fixed = 137 total (was 146) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m 
33s{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
19s{color} | {color:green} the patch passed with JDK v1.8.0_131 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed with JDK v1.7.0_131 {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 44m 18s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_131. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
18s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}106m 36s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | 
module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
|  |  Unused field:ResourceManager.java |
| JDK v1.8.0_131 Failed junit tests | 
hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer |
| JDK v1.7.0_131 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestRMRestart |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:5e40efe |
| JIRA Issue | YARN-6102 |
| JIRA 

[jira] [Assigned] (YARN-6859) Add missing test scope to the zookeeper dependency in hadoop-yarn-server-resourcemanager test-jar

2017-07-24 Thread Akira Ajisaka (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka reassigned YARN-6859:
---

Assignee: Akira Ajisaka

> Add missing test scope to the zookeeper dependency in 
> hadoop-yarn-server-resourcemanager test-jar
> -
>
> Key: YARN-6859
> URL: https://issues.apache.org/jira/browse/YARN-6859
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: build
>Reporter: Akira Ajisaka
>Assignee: Akira Ajisaka
>Priority: Minor
> Attachments: add_test_scope.patch
>
>
> Reported by Sean Mackrory in [common-dev 
> ML|http://markmail.org/message/3wvcdwcoyas2255f]. When compiling Apache 
> Hadoop with {{-Dzookeeper.version=3.5.3-beta}}, the build fails with the 
> following error:
> {noformat}
> [WARNING] 
> Dependency convergence error for org.apache.zookeeper:zookeeper:3.5.3-beta 
> paths to dependency are:
> +-org.apache.hadoop:hadoop-yarn-server-tests:3.0.0-beta1-SNAPSHOT
>   +-org.apache.hadoop:hadoop-common:3.0.0-beta1-SNAPSHOT
> +-org.apache.zookeeper:zookeeper:3.5.3-beta
> ...
> and
> +-org.apache.hadoop:hadoop-yarn-server-tests:3.0.0-beta1-SNAPSHOT
>   +-org.apache.hadoop:hadoop-yarn-server-resourcemanager:3.0.0-beta1-SNAPSHOT
> +-org.apache.zookeeper:zookeeper:3.5.3-beta
> and
> +-org.apache.hadoop:hadoop-yarn-server-tests:3.0.0-beta1-SNAPSHOT
>   +-org.apache.hadoop:hadoop-yarn-server-resourcemanager:3.0.0-beta1-SNAPSHOT
> +-org.apache.zookeeper:zookeeper:3.4.9
> ...
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org


