[jira] [Commented] (YARN-8522) Application fails with InvalidResourceRequestException

2018-07-31 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564779#comment-16564779
 ] 

Hudson commented on YARN-8522:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14684 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14684/])
YARN-8522. Application fails with InvalidResourceRequestException. (Zian 
Chen via wangda) (wangda: rev 5cc8e99147276a059979813f7fd323dd7d77b248)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ApplicationSubmissionContextPBImpl.java
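
For context, the rejection quoted in the stack trace below comes from the RM-side 
validation of AM resource requests. A minimal, self-contained sketch of that kind 
of check, with hypothetical names (the real logic lives in 
RMAppManager#validateAndCreateResourceRequest and differs in detail):

{code}
import java.util.Arrays;
import java.util.List;

public class AmRequestValidation {
  static final String ANY = "*"; // mirrors ResourceRequest.ANY

  static void validate(List<String> resourceNames) {
    long anyCount = resourceNames.stream().filter(ANY::equals).count();
    if (anyCount > 1) {
      // This is the rejection seen in the stack trace quoted below.
      throw new IllegalArgumentException(
          "Invalid resource request, only one resource request with "
              + ANY + " is allowed");
    }
  }

  public static void main(String[] args) {
    validate(Arrays.asList(ANY, ANY)); // duplicated ANY request -> throws
  }
}
{code}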


> Application fails with InvalidResourceRequestException
> --
>
> Key: YARN-8522
> URL: https://issues.apache.org/jira/browse/YARN-8522
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yesha Vora
>Assignee: Zian Chen
>Priority: Critical
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8522.001.patch, YARN-8522.002.patch
>
>
> Launch multiple streaming apps simultaneously. Sometimes one of the 
> applications fails with the stack trace below.
> {code}
> 18/07/02 07:14:32 INFO retry.RetryInvocationHandler: 
> java.net.ConnectException: Call From xx.xx.xx.xx/xx.xx.xx.xx to 
> xx.xx.xx.xx:8032 failed on connection exception: java.net.ConnectException: 
> Connection refused; For more details see:  
> http://wiki.apache.org/hadoop/ConnectionRefused, while invoking 
> ApplicationClientProtocolPBClientImpl.submitApplication over null. Retrying 
> after sleeping for 3ms.
> 18/07/02 07:14:32 WARN client.RequestHedgingRMFailoverProxyProvider: 
> Invocation returned exception: 
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, only one resource request with * is allowed
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:502)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:320)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:645)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:277)
> at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:563)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678)
>  on [rm2], so propagating back to caller.
> 18/07/02 07:14:32 INFO mapreduce.JobSubmitter: Cleaning up the staging area 
> /user/hrt_qa/.staging/job_1530515284077_0007
> 18/07/02 07:14:32 ERROR streaming.StreamJob: Error Launching job : 
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, only one resource request with * is allowed
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:502)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:320)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:645)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:277)
> at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:563)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818)
> at java.security.AccessController.doPrivileged(Native Method)
> at 

[jira] [Updated] (YARN-8155) Improve ATSv2 client logging in RM and NM publisher

2018-07-31 Thread Abhishek Modi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Modi updated YARN-8155:

Attachment: YARN-8155-branch-2.v3.patch

> Improve ATSv2 client logging in RM and NM publisher
> ---
>
> Key: YARN-8155
> URL: https://issues.apache.org/jira/browse/YARN-8155
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Rohith Sharma K S
>Assignee: Abhishek Modi
>Priority: Major
> Fix For: 3.2.0, 3.1.1, 3.0.4
>
> Attachments: YARN-8155-branch-2.002.patch, 
> YARN-8155-branch-2.v1.patch, YARN-8155-branch-2.v3.patch, 
> YARN-8155.001.patch, YARN-8155.002.patch, YARN-8155.003.patch, 
> YARN-8155.004.patch, YARN-8155.005.patch, YARN-8155.006.patch
>
>
> We see that NM logs are filled with large stack traces of NotFoundException 
> if the collector is removed from one of the NMs while other NMs are still 
> publishing entities.
>  
> This Jira is to improve the logging in the NM so that we log an informative 
> message.
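
A hedged sketch of what such logging could look like: one informative line for 
the expected failure, with the full trace demoted to DEBUG (illustrative; the 
committed patch may differ in message text and guard conditions):

{code}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class PublisherLoggingDemo {
  private static final Logger LOG =
      LoggerFactory.getLogger(PublisherLoggingDemo.class);

  void publish(String entityId, String collectorAddr, Exception e) {
    // One informative line instead of a full stack trace for an expected
    // condition (the collector was removed from this NM).
    LOG.warn("Failed to publish entity {} to collector {}: {}",
        entityId, collectorAddr, e.toString());
    if (LOG.isDebugEnabled()) {
      LOG.debug("Full stack trace:", e);
    }
  }
}
{code}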






[jira] [Commented] (YARN-8606) Opportunistic scheduling doesn't work after failover

2018-07-31 Thread Sunil Govindan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564778#comment-16564778
 ] 

Sunil Govindan commented on YARN-8606:
--

Looks good to me as well. Committing shortly.

> Opportunistic scheduling doesn't work after failover
> ---
>
> Key: YARN-8606
> URL: https://issues.apache.org/jira/browse/YARN-8606
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Blocker
> Attachments: YARN-8606.001.patch
>
>
> EventDispatcher for opportunistic scheduling is added to the RM composite 
> service and not the RMActiveServices composite service, causing the dispatcher 
> to be started only once and not restarted on failover.
> Issue credits: Rakesh
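
A minimal, self-contained model of the lifecycle difference described above; the 
class names mirror ResourceManager, RMActiveServices, and EventDispatcher, but 
this is an illustration, not the actual YARN source:

{code}
import java.util.ArrayList;
import java.util.List;

class Service {
  final String name;
  Service(String name) { this.name = name; }
  void start() { System.out.println(name + " started"); }
}

class CompositeService extends Service {
  private final List<Service> children = new ArrayList<>();
  CompositeService(String name) { super(name); }
  void addService(Service s) { children.add(s); }
  @Override
  void start() {
    super.start();
    for (Service s : children) { s.start(); }
  }
}

public class FailoverDemo {
  public static void main(String[] args) {
    CompositeService rm = new CompositeService("ResourceManager");
    CompositeService activeServices = new CompositeService("RMActiveServices");
    rm.addService(activeServices);
    // The reported bug: the dispatcher is registered on the RM parent,
    // not on the active services.
    rm.addService(new Service("OpportunisticEventDispatcher"));
    rm.start();             // initial startup: everything starts once
    // On failover only the active services are restarted; the dispatcher,
    // hanging off the RM parent, is never started again.
    activeServices.start();
  }
}
{code}

Running it shows the dispatcher starting exactly once: the second start(), 
standing in for a failover transition, restarts only the active services.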






[jira] [Commented] (YARN-8606) Opportunistic scheduling doesn't work after failover

2018-07-31 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564775#comment-16564775
 ] 

Wangda Tan commented on YARN-8606:
--

[~bibinchundatt], 

Gotcha, the fix makes sense to me. +1 for the patch. I think the failed UT is 
not related. 

> Opportunistic scheduling doesn't work after failover
> ---
>
> Key: YARN-8606
> URL: https://issues.apache.org/jira/browse/YARN-8606
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Blocker
> Attachments: YARN-8606.001.patch
>
>
> EventDispatcher for opportunistic scheduling is added to the RM composite 
> service and not the RMActiveServices composite service, causing the dispatcher 
> to be started only once and not restarted on failover.
> Issue credits: Rakesh






[jira] [Comment Edited] (YARN-8606) Opportunistic scheduling doesn't work after failover

2018-07-31 Thread Bibin A Chundatt (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564737#comment-16564737
 ] 

Bibin A Chundatt edited comment on YARN-8606 at 8/1/18 5:22 AM:


[~suma.shivaprasad]

{quote}
 EventDispatcher for OpportunisticContainerAllocatorAMService when 
opportunistic scheduling is enabled in RMActiveServices composite service
{quote}
Could you recheck? It's getting added to the ResourceManager composite service: 
{{createApplicationMasterService()}} is invoked from the ResourceManager class, 
not the inner RMActiveServices, so the service gets added to the ResourceManager 
CompositeService.

[~leftnoteasy]

Not a regression, but a blocker in terms of functionality


was (Author: bibinchundatt):
[~suma.shivaprasad]

{quote}
 EventDispatcher for OpportunisticContainerAllocatorAMService when 
opportunistic scheduling is enabled in RMActiveServices composite service
{quote}
Could you recheck? It's getting added to the ResourceManager composite service

[~leftnoteasy]

Not a regression, but a blocker in terms of functionality

> Opportunistic scheduling doesn't work after failover
> ---
>
> Key: YARN-8606
> URL: https://issues.apache.org/jira/browse/YARN-8606
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Blocker
> Attachments: YARN-8606.001.patch
>
>
> EventDispatcher for opportunistic scheduling is added to the RM composite 
> service and not the RMActiveServices composite service, causing the dispatcher 
> to be started only once and not restarted on failover.
> Issue credits: Rakesh






[jira] [Updated] (YARN-8607) Incorrect annotation in ApplicationAttemptStateData#getResourceSecondsMap

2018-07-31 Thread Yeliang Cang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yeliang Cang updated YARN-8607:
---
Attachment: YARN-8607.002.patch

> Incorrect annotation in ApplicationAttemptStateData#getResourceSecondsMap
> -
>
> Key: YARN-8607
> URL: https://issues.apache.org/jira/browse/YARN-8607
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Yeliang Cang
>Assignee: Yeliang Cang
>Priority: Trivial
> Attachments: YARN-8607.001.patch, YARN-8607.002.patch
>
>
> In ApplicationAttemptStateData.java, the Javadoc of getResourceSecondsMap is 
> not correct:
> {code}
> /**
>  * Get the aggregated number of resources preempted that the application has
>  * allocated times the number of seconds the application has been running.
>  *
>  * @return map containing the resource name and aggregated preempted
>  * resource-seconds
>  */
> @Public
> @Unstable
> public abstract Map<String, Long> getResourceSecondsMap();
> {code}
> Should be
> {code}
> /**
>  * Get the aggregated number of resources that the application has
>  * allocated times the number of seconds the application has been running.
>  *
>  * @return map containing the resource name and aggregated preempted
>  * resource-seconds
>  */
> @Public
> @Unstable
> public abstract Map<String, Long> getResourceSecondsMap();
> {code}






[jira] [Commented] (YARN-8606) Opportunistic scheduling doesn't work after failover

2018-07-31 Thread Bibin A Chundatt (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564737#comment-16564737
 ] 

Bibin A Chundatt commented on YARN-8606:


[~suma.shivaprasad]

{quote}
 EventDispatcher for OpportunisticContainerAllocatorAMService when 
opportunistic scheduling is enabled in RMActiveServices composite service
{quote}
Could you recheck? It's getting added to the ResourceManager composite service

[~leftnoteasy]

Not a regression, but a blocker in terms of functionality

> Opportunistic scheduling doesn't work after failover
> ---
>
> Key: YARN-8606
> URL: https://issues.apache.org/jira/browse/YARN-8606
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Blocker
> Attachments: YARN-8606.001.patch
>
>
> EventDispatcher for opportunistic scheduling is added to the RM composite 
> service and not the RMActiveServices composite service, causing the dispatcher 
> to be started only once and not restarted on failover.
> Issue credits: Rakesh






[jira] [Commented] (YARN-8607) Incorrect annotation in ApplicationAttemptStateData#getResourceSecondsMap

2018-07-31 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564730#comment-16564730
 ] 

genericqa commented on YARN-8607:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
24s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 11s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
9s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
28s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 23s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 69m 
29s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}128m 59s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8607 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12933843/YARN-8607.001.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 3147a8b78f5f 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 40f9b0c |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/21457/testReport/ |
| Max. process+thread count | 922 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/21457/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT 

[jira] [Commented] (YARN-8559) Expose mutable-conf scheduler's configuration in RM /scheduler-conf endpoint

2018-07-31 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564705#comment-16564705
 ] 

Weiwei Yang commented on YARN-8559:
---

Attached v4 patch: 1) added documentation; 2) added an ACL check. I have built 
the doc and verified it locally, and also tried submitting a request with a 
non-admin user; both work fine. [~leftnoteasy], [~suma.shivaprasad], please take 
a look. Thanks!

> Expose mutable-conf scheduler's configuration in RM /scheduler-conf endpoint
> 
>
> Key: YARN-8559
> URL: https://issues.apache.org/jira/browse/YARN-8559
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Anna Savarin
>Assignee: Weiwei Yang
>Priority: Major
> Attachments: YARN-8559.001.patch, YARN-8559.002.patch, 
> YARN-8559.003.patch, YARN-8559.004.patch
>
>
> All Hadoop services provide a set of common endpoints (/stacks, /logLevel, 
> /metrics, /jmx, /conf).  In the case of the Resource Manager, part of the 
> configuration comes from the scheduler being used.  Currently, these 
> configuration key/values are not exposed through the /conf endpoint, thereby 
> revealing an incomplete configuration picture. 
> Make an improvement and expose the scheduling configuration info through the 
> RM's /conf endpoint.






[jira] [Updated] (YARN-8559) Expose mutable-conf scheduler's configuration in RM /scheduler-conf endpoint

2018-07-31 Thread Weiwei Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated YARN-8559:
--
Attachment: YARN-8559.004.patch

> Expose mutable-conf scheduler's configuration in RM /scheduler-conf endpoint
> 
>
> Key: YARN-8559
> URL: https://issues.apache.org/jira/browse/YARN-8559
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Anna Savarin
>Assignee: Weiwei Yang
>Priority: Major
> Attachments: YARN-8559.001.patch, YARN-8559.002.patch, 
> YARN-8559.003.patch, YARN-8559.004.patch
>
>
> All Hadoop services provide a set of common endpoints (/stacks, /logLevel, 
> /metrics, /jmx, /conf).  In the case of the Resource Manager, part of the 
> configuration comes from the scheduler being used.  Currently, these 
> configuration key/values are not exposed through the /conf endpoint, thereby 
> revealing an incomplete configuration picture. 
> Make an improvement and expose the scheduling configuration info through the 
> RM's /conf endpoint.






[jira] [Commented] (YARN-8593) Add new RM web service endpoint to get cluster user info

2018-07-31 Thread Sunil Govindan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564675#comment-16564675
 ] 

Sunil Govindan commented on YARN-8593:
--

Thanks [~rohithsharma] for the details. You are correct: with the proxy models 
that exist today in client mode, only the RM can correctly resolve the request's 
user info, hence this endpoint is quite useful.

If there are no objections, I'll commit it today.

> Add new RM web service endpoint to get cluster user info
> 
>
> Key: YARN-8593
> URL: https://issues.apache.org/jira/browse/YARN-8593
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Akhil PB
>Assignee: Akhil PB
>Priority: Major
> Attachments: YARN-8593.001.patch, YARN-8593.002.patch
>
>







[jira] [Commented] (YARN-8397) Potential thread leak in ActivitiesManager

2018-07-31 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564667#comment-16564667
 ] 

Hudson commented on YARN-8397:
--

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #14682 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14682/])
YARN-8397. Potential thread leak in ActivitiesManager. Contributed by Rohith 
Sharma K S. (sunilg: rev 6310c0d17d6422a595f856a55b4f1fb82be43739)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/activities/ActivitiesManager.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java


> Potential thread leak in ActivitiesManager
> --
>
> Key: YARN-8397
> URL: https://issues.apache.org/jira/browse/YARN-8397
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>Priority: Major
> Attachments: YARN-8397.01.patch
>
>
> While using MiniYARNCluster, it is observed that MiniYARNCluster#stop doesn't 
> stop the JVM. 
> A thread dump shows that the ActivitiesManager thread is in TIMED_WAITING 
> state: 
> {code}
> "Thread-43" #66 prio=5 os_prio=31 tid=0x7ffea09fd000 nid=0xa103 waiting 
> on condition [0x76f1]
>java.lang.Thread.State: TIMED_WAITING (sleeping)
>   at java.lang.Thread.sleep(Native Method)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.activities.ActivitiesManager$1.run(ActivitiesManager.java:142)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
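
One common way to avoid this kind of leak is an interruptible daemon cleanup 
thread that is stopped in serviceStop(). A hedged, self-contained sketch of 
that pattern (illustrative; the committed patch to ActivitiesManager may 
differ):

{code}
public class CleanupThreadDemo {
  private volatile boolean stopped = false;
  private Thread cleanerThread;

  void serviceStart() {
    cleanerThread = new Thread(() -> {
      while (!stopped) {
        try {
          Thread.sleep(5000); // periodic cleanup interval
          // ... purge finished app/node activity records here ...
        } catch (InterruptedException e) {
          return; // exit promptly when interrupted during sleep
        }
      }
    }, "ActivitiesManager-cleanup");
    cleanerThread.setDaemon(true); // never keeps the JVM alive on its own
    cleanerThread.start();
  }

  void serviceStop() throws InterruptedException {
    stopped = true;
    cleanerThread.interrupt(); // wake the sleep so the loop can exit
    cleanerThread.join();
  }

  public static void main(String[] args) throws InterruptedException {
    CleanupThreadDemo d = new CleanupThreadDemo();
    d.serviceStart();
    d.serviceStop(); // without interrupt + daemon, stop would hang the JVM
  }
}
{code}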






[jira] [Updated] (YARN-8397) Potential thread leak in ActivitiesManager

2018-07-31 Thread Sunil Govindan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil Govindan updated YARN-8397:
-
Summary: Potential thread leak in ActivitiesManager  (was: Thread leak in 
ActivitiesManager)

> Potential thread leak in ActivitiesManager
> --
>
> Key: YARN-8397
> URL: https://issues.apache.org/jira/browse/YARN-8397
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>Priority: Major
> Attachments: YARN-8397.01.patch
>
>
> While using MiniYARNCluster, it is observed that MiniYARNCluster#stop doesn't 
> stop the JVM. 
> A thread dump shows that the ActivitiesManager thread is in TIMED_WAITING 
> state: 
> {code}
> "Thread-43" #66 prio=5 os_prio=31 tid=0x7ffea09fd000 nid=0xa103 waiting 
> on condition [0x76f1]
>java.lang.Thread.State: TIMED_WAITING (sleeping)
>   at java.lang.Thread.sleep(Native Method)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.activities.ActivitiesManager$1.run(ActivitiesManager.java:142)
>   at java.lang.Thread.run(Thread.java:748)
> {code}






[jira] [Commented] (YARN-8559) Expose mutable-conf scheduler's configuration in RM /scheduler-conf endpoint

2018-07-31 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564662#comment-16564662
 ] 

Weiwei Yang commented on YARN-8559:
---

Hi [~suma.shivaprasad]/[~leftnoteasy]

Thanks for the comments.

bq. Can you pls add documentation in ResourceManagerRest.md

Sure, will update in next patch.

bq. can we use the existing logic similar to ConfServlet?

I think we should stick to the JAX-RS pattern in RMWebServices; that is more 
consistent. It is also easier to support both the standard JSON and XML content 
types without parsing the request. After all, the DAO is just a wrapper, so this 
is a non-functional change.

bq. it might be better to protect access of the REST endpoint by admin

Makes sense to me, will update in the next patch.

Thanks
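
For reference, a hedged sketch of the JAX-RS + DAO pattern described above; all 
names here are illustrative (the real endpoint lives in RMWebServices and its 
DAO classes differ in detail):

{code}
import java.util.ArrayList;
import java.util.List;
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;
import javax.xml.bind.annotation.XmlElement;
import javax.xml.bind.annotation.XmlRootElement;

@XmlRootElement(name = "scheduler-conf")
class SchedConfInfo {                 // DAO: a plain serializable wrapper
  public static class Item {
    @XmlElement public String name;
    @XmlElement public String value;
  }
  @XmlElement(name = "property")
  public List<Item> properties = new ArrayList<>();
}

@Path("/ws/v1/cluster")
public class SchedulerConfResource {
  @GET
  @Path("/scheduler-conf")
  // JAX-RS negotiates JSON or XML from the Accept header; no manual
  // request parsing is needed, which is the consistency argument above.
  @Produces({ MediaType.APPLICATION_JSON, MediaType.APPLICATION_XML })
  public SchedConfInfo getSchedulerConfiguration() {
    SchedConfInfo info = new SchedConfInfo();
    SchedConfInfo.Item item = new SchedConfInfo.Item();
    item.name = "yarn.scheduler.capacity.root.queues";
    item.value = "default";
    info.properties.add(item);
    return info;
  }
}
{code}

The @Produces annotation is what makes JSON and XML both work without parsing 
the request: the framework serializes the same DAO either way.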

> Expose mutable-conf scheduler's configuration in RM /scheduler-conf endpoint
> 
>
> Key: YARN-8559
> URL: https://issues.apache.org/jira/browse/YARN-8559
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Anna Savarin
>Assignee: Weiwei Yang
>Priority: Major
> Attachments: YARN-8559.001.patch, YARN-8559.002.patch, 
> YARN-8559.003.patch
>
>
> All Hadoop services provide a set of common endpoints (/stacks, /logLevel, 
> /metrics, /jmx, /conf).  In the case of the Resource Manager, part of the 
> configuration comes from the scheduler being used.  Currently, these 
> configuration key/values are not exposed through the /conf endpoint, thereby 
> revealing an incomplete configuration picture. 
> Make an improvement and expose the scheduling configuration info through the 
> RM's /conf endpoint.






[jira] [Updated] (YARN-8607) Incorrect annotation in ApplicationAttemptStateData#getResourceSecondsMap

2018-07-31 Thread Yeliang Cang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yeliang Cang updated YARN-8607:
---
Attachment: YARN-8607.001.patch

> Incorrect annotation in ApplicationAttemptStateData#getResourceSecondsMap
> -
>
> Key: YARN-8607
> URL: https://issues.apache.org/jira/browse/YARN-8607
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Yeliang Cang
>Assignee: Yeliang Cang
>Priority: Trivial
> Attachments: YARN-8607.001.patch
>
>
> In ApplicationAttemptStateData.java, the Javadoc of getResourceSecondsMap is 
> not correct:
> {code}
> /**
>  * Get the aggregated number of resources preempted that the application has
>  * allocated times the number of seconds the application has been running.
>  *
>  * @return map containing the resource name and aggregated preempted
>  * resource-seconds
>  */
> @Public
> @Unstable
> public abstract Map<String, Long> getResourceSecondsMap();
> {code}
> Should be
> {code}
> /**
>  * Get the aggregated number of resources that the application has
>  * allocated times the number of seconds the application has been running.
>  *
>  * @return map containing the resource name and aggregated preempted
>  * resource-seconds
>  */
> @Public
> @Unstable
> public abstract Map<String, Long> getResourceSecondsMap();
> {code}






[jira] [Updated] (YARN-8607) Incorrect annotation in ApplicationAttemptStateData#getResourceSecondsMap

2018-07-31 Thread Yeliang Cang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yeliang Cang updated YARN-8607:
---
Labels: documentation  (was: )

> Incorrect annotation in ApplicationAttemptStateData#getResourceSecondsMap
> -
>
> Key: YARN-8607
> URL: https://issues.apache.org/jira/browse/YARN-8607
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Yeliang Cang
>Assignee: Yeliang Cang
>Priority: Trivial
>  Labels: documentation
>
> In ApplicationAttemptStateData.java, the Javadoc of getResourceSecondsMap is 
> not correct:
> {code}
> /**
>  * Get the aggregated number of resources preempted that the application has
>  * allocated times the number of seconds the application has been running.
>  *
>  * @return map containing the resource name and aggregated preempted
>  * resource-seconds
>  */
> @Public
> @Unstable
> public abstract Map<String, Long> getResourceSecondsMap();
> {code}
> Should be
> {code}
> /**
>  * Get the aggregated number of resources that the application has
>  * allocated times the number of seconds the application has been running.
>  *
>  * @return map containing the resource name and aggregated preempted
>  * resource-seconds
>  */
> @Public
> @Unstable
> public abstract Map<String, Long> getResourceSecondsMap();
> {code}






[jira] [Created] (YARN-8607) Incorrect annotation in ApplicationAttemptStateData#getResourceSecondsMap

2018-07-31 Thread Yeliang Cang (JIRA)
Yeliang Cang created YARN-8607:
--

 Summary: Incorrect annotation in 
ApplicationAttemptStateData#getResourceSecondsMap
 Key: YARN-8607
 URL: https://issues.apache.org/jira/browse/YARN-8607
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Yeliang Cang
Assignee: Yeliang Cang


In ApplicationAttemptStateData.java,

the Javadoc of getResourceSecondsMap is not correct:

{code}

/**
 * Get the aggregated number of resources preempted that the application has
 * allocated times the number of seconds the application has been running.
 *
 * @return map containing the resource name and aggregated preempted
 * resource-seconds
 */
@Public
@Unstable
public abstract Map<String, Long> getResourceSecondsMap();

{code}

Should be

{code}

/**
 * Get the aggregated number of resources that the application has
 * allocated times the number of seconds the application has been running.
 *
 * @return map containing the resource name and aggregated preempted
 * resource-seconds
 */
@Public
@Unstable
public abstract Map<String, Long> getResourceSecondsMap();

{code}






[jira] [Commented] (YARN-8546) Resource leak caused by a reserved container being released more than once under async scheduling

2018-07-31 Thread Tao Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564638#comment-16564638
 ] 

Tao Yang commented on YARN-8546:


Thanks [~leftnoteasy]!

> Resource leak caused by a reserved container being released more than once 
> under async scheduling
> -
>
> Key: YARN-8546
> URL: https://issues.apache.org/jira/browse/YARN-8546
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.1.0
>Reporter: Weiwei Yang
>Assignee: Tao Yang
>Priority: Major
>  Labels: global-scheduling
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8546.001.patch
>
>
> I was able to reproduce this issue by starting a job that keeps requesting 
> containers until it uses up the cluster's available resources. My cluster has 
> 70200 vcores, and each task applies for 100 vcores, so I expected a total of 
> 702 containers to be allocated, but eventually there were only 701. The last 
> container could not be allocated because the queue's used resource had been 
> updated to more than 100%.
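
A hedged, self-contained sketch of the double-release hazard this describes and 
the usual guard against it: under async scheduling, two threads can race to 
release the same reserved container and decrement queue usage twice 
(illustrative, not the actual patch):

{code}
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class ReleaseOnceDemo {
  private final Set<String> released = ConcurrentHashMap.newKeySet();
  private int usedVcores = 100;

  void releaseReserved(String containerId, int vcores) {
    // add() is atomic and returns false if another thread released it
    // first, so queue usage is decremented exactly once per container.
    if (released.add(containerId)) {
      usedVcores -= vcores;
    }
  }

  public static void main(String[] args) {
    ReleaseOnceDemo q = new ReleaseOnceDemo();
    q.releaseReserved("container_1", 100);
    q.releaseReserved("container_1", 100); // duplicate release: ignored
    System.out.println(q.usedVcores);      // 0, not -100
  }
}
{code}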






[jira] [Commented] (YARN-7494) Add multi node lookup support for better placement

2018-07-31 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564589#comment-16564589
 ] 

Wangda Tan commented on YARN-7494:
--

[~sunilg], 

Thanks for updating the patch, some comments.

1) Not used:

  public static final String RESOURCE_USAGE_BASED_NODE_SORTING_POLICY = 
"resource-usage";

2) Renames for CapacitySchedulerConfiguration: 

- DEFAULT_NODE_SORTING_POLICY_NAME => DEFAULT_NODE_SORTING_POLICY_CLASSNAME
- For the usage, MULTI_NODE_SORTING_POLICIES/MULTI_NODE_SORTING_POLICY_NAME 
could be confusing to end users. Is it better to have the same prefix, such as 
(see the sketch after this list): 
multi-node-sorting.policy.classes = "a,b,c"
multi-node-sorting.policy.a.class = "org.apache..."
"classes" will be a disallowed policy name. 

3) CS#getCandidateNodeSet copies the full list every time; is there any plan to 
improve this in the future? 

4) DefaultMultiNodeLookupPolicy: should we call it 
ResourceUsageMultiNodeLookupPolicy (or LeastResourceUsedPreferred...?

5) It's better to add getMultiNodeSortingPolicyName to CSQueue; inside 
FiCaSchedulerApp you can check with instanceof. With this we can limit the 
changes to CS. 

6) RegularContainerAllocator#allocate: when is this change needed?

7) Instead of adding MultiNodeSortingManager to RMContext, can we limit the 
changes to inside the Scheduler?
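
A hedged illustration of the prefix scheme proposed in comment 2); the key 
names are the reviewer's suggestion, not a released YARN configuration:

{code}
import org.apache.hadoop.conf.Configuration;

public class MultiNodePolicyConfDemo {
  private static final String PREFIX =
      "yarn.scheduler.capacity.multi-node-sorting.policy.";

  public static void main(String[] args) {
    Configuration conf = new Configuration(false);
    conf.set(PREFIX + "classes", "usage");   // comma-separated policy names
    conf.set(PREFIX + "usage.class",         // per-policy implementation
        "org.example.ResourceUsageMultiNodeLookupPolicy");
    for (String name : conf.getStrings(PREFIX + "classes")) {
      // "classes" itself is reserved, so it cannot be used as a policy name.
      System.out.println(name + " -> " + conf.get(PREFIX + name + ".class"));
    }
  }
}
{code}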


> Add multi node lookup support for better placement
> -
>
> Key: YARN-7494
> URL: https://issues.apache.org/jira/browse/YARN-7494
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Sunil Govindan
>Assignee: Sunil Govindan
>Priority: Major
> Attachments: YARN-7494.001.patch, YARN-7494.002.patch, 
> YARN-7494.003.patch, YARN-7494.004.patch, YARN-7494.005.patch, 
> YARN-7494.006.patch, YARN-7494.007.patch, YARN-7494.008.patch, 
> YARN-7494.009.patch, YARN-7494.010.patch, YARN-7494.11.patch, 
> YARN-7494.v0.patch, YARN-7494.v1.patch, multi-node-designProposal.png
>
>
> Instead of a single node, for effectiveness we can consider a multi-node 
> lookup based on partition to start with.






[jira] [Commented] (YARN-7833) [PERF/TEST] Extend SLS to support simulation of a Federated Environment

2018-07-31 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564555#comment-16564555
 ] 

genericqa commented on YARN-7833:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
21s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 8 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
27s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 27m 
57s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 31m 
32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  4m 
11s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 12s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m 
51s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m 
13s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
23s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 30m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 30m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  4m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} shellcheck {color} | {color:green}  0m 
 1s{color} | {color:green} There were no new shellcheck issues. {color} |
| {color:green}+1{color} | {color:green} shelldocs {color} | {color:green}  0m 
36s{color} | {color:green} There were no new shelldocs issues. {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
4s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 50s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  8m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m 
22s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  1m  0s{color} 
| {color:red} hadoop-yarn-api in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
42s{color} | {color:green} hadoop-yarn-server-common in the patch passed. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  0m 48s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
49s{color} | {color:green} hadoop-yarn-server-router in the patch passed. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  0m 43s{color} 
| {color:red} hadoop-sls in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  1m 
 6s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}160m 13s{color} | 

[jira] [Comment Edited] (YARN-8160) Yarn Service Upgrade: Support upgrade of service that use docker containers

2018-07-31 Thread Chandni Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16441414#comment-16441414
 ] 

Chandni Singh edited comment on YARN-8160 at 8/1/18 12:29 AM:
--

We do need to build a new CLC because upgrade lets users modify the 
artifacts/env/configs of their service. A likely scenario is that the version 
of the (docker-based) component changed. I haven't tested with a docker-based 
app, but I think this re-init should work seamlessly in the docker case; if it 
doesn't, that would be a bug.

I don't clearly understand [~eyang]'s comments on the improvement that is being 
proposed. 



was (Author: csingh):
I looked at the {{ContainerImpl}} code and I think re-init already uses the 
re-launch logic.
As [~shaneku...@gmail.com] pointed out, re-init first deactivates the existing 
container and then does a relaunch with a new {{ContainerLaunchContext}}. 

We do need to build a new CLC because upgrade lets users modify the 
artifacts/env/configs of their service. A likely scenario is that the version 
of the (docker-based) component changed. I haven't tested with a docker-based 
app, but I think this re-init should work seamlessly in the docker case; if it 
doesn't, that would be a bug.

I don't clearly understand [~eyang]'s comments on the improvement that is being 
proposed. 


> Yarn Service Upgrade: Support upgrade of service that use docker containers 
> 
>
> Key: YARN-8160
> URL: https://issues.apache.org/jira/browse/YARN-8160
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
>  Labels: Docker
>
> Ability to upgrade dockerized  yarn native services.
> Ref: YARN-5637






[jira] [Commented] (YARN-8468) Limit container sizes per queue in FairScheduler

2018-07-31 Thread Haibo Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564538#comment-16564538
 ] 

Haibo Chen commented on YARN-8468:
--

Chatted with [~wilfreds] about how the scheduler can pick up the queue-specific 
values. We can basically do this when the resource requests are normalized (see 
AbstractYarnScheduler#normalizeResourceRequests()).
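
A hedged sketch of that hook: clamp each request to the queue's maximum while 
normalizing. The method shape and the per-queue limit are illustrative, not the 
committed FairScheduler patch:

{code}
public class QueueMaxAllocationDemo {
  static int normalize(int requestedMb, int stepMb, int queueMaxMb) {
    // round up to the allocation increment, then cap at the queue max
    int rounded = ((requestedMb + stepMb - 1) / stepMb) * stepMb;
    return Math.min(rounded, queueMaxMb);
  }

  public static void main(String[] args) {
    // ad hoc queue capped at 2 GB, enterprise queue at the cluster-wide max
    System.out.println(normalize(3000, 512, 2048));  // -> 2048
    System.out.println(normalize(3000, 512, 8192));  // -> 3072
  }
}
{code}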

> Limit container sizes per queue in FairScheduler
> 
>
> Key: YARN-8468
> URL: https://issues.apache.org/jira/browse/YARN-8468
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 3.1.0
>Reporter: Antal Bálint Steinbach
>Assignee: Antal Bálint Steinbach
>Priority: Critical
> Attachments: YARN-8468.000.patch, YARN-8468.001.patch, 
> YARN-8468.002.patch, YARN-8468.003.patch
>
>
> When using any scheduler, you can use "yarn.scheduler.maximum-allocation-mb" 
> to limit the overall size of a container. This applies globally to all 
> containers, cannot be limited per queue, and is not scheduler dependent.
>  
> The goal of this ticket is to allow this value to be set on a per-queue basis.
>  
> The use case: a user has two pools, one for ad hoc jobs and one for enterprise 
> apps, and wants to limit ad hoc jobs to small containers but allow enterprise 
> apps to request as many resources as needed. 
> yarn.scheduler.maximum-allocation-mb sets a default maximum container size for 
> all queues; the per-queue maximum would be set with the 
> “maxContainerResources” queue config value.
>  
> Suggested solution:
>  
> All the infrastructure is already in the code. We need to do the following:
>  * add the setting to the queue properties for all queue types (parent and 
> leaf), this will cover dynamically created queues.
>  * if we set it on the root we override the scheduler setting and we should 
> not allow that.
>  * make sure that queue resource cap can not be larger than scheduler max 
> resource cap in the config.
>  * implement getMaximumResourceCapability(String queueName) in the 
> FairScheduler
>  * implement getMaximumResourceCapability() in both FSParentQueue and 
> FSLeafQueue as follows
>  * expose the setting in the queue information in the RM web UI.
>  * expose the setting in the metrics etc for the queue.
>  * write JUnit tests.
>  * update the scheduler documentation.






[jira] [Commented] (YARN-6966) NodeManager metrics may return wrong negative values when NM restart

2018-07-31 Thread Haibo Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564512#comment-16564512
 ] 

Haibo Chen commented on YARN-6966:
--

[~snemeth] The unit test failures are related. It seems the new unit test does 
not clean up properly, causing subsequent tests to fail.

> NodeManager metrics may return wrong negative values when NM restart
> 
>
> Key: YARN-6966
> URL: https://issues.apache.org/jira/browse/YARN-6966
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yang Wang
>Assignee: Szilard Nemeth
>Priority: Major
> Fix For: 3.2.0, 3.0.4, 3.1.2
>
> Attachments: YARN-6966-branch-2.001.patch, 
> YARN-6966-branch-2.002.patch, YARN-6966-branch-2.002.patch, 
> YARN-6966-branch-2.002.patch, YARN-6966-branch-3.0.0.001.patch, 
> YARN-6966-branch-3.0.001.patch, YARN-6966.001.patch, YARN-6966.002.patch, 
> YARN-6966.003.patch, YARN-6966.004.patch, YARN-6966.005.patch, 
> YARN-6966.005.patch, YARN-6966.006.patch
>
>
> Just as in YARN-6212. However, I think it is not a duplicate of YARN-3933.
> The primary cause of the negative values is that metrics do not recover 
> properly when the NM restarts.
> AllocatedContainers, ContainersLaunched, AllocatedGB, AvailableGB, 
> AllocatedVCores and AvailableVCores in the metrics also need to be recovered 
> when the NM restarts.
> This should be done in ContainerManagerImpl#recoverContainer.
> The scenario can be reproduced with the following steps:
> # Make sure 
> YarnConfiguration.NM_RECOVERY_ENABLED=true,YarnConfiguration.NM_RECOVERY_SUPERVISED=true
>  in NM
> # Submit an application and keep running
> # Restart NM
> # Stop the application
> # Now you get the negative values
> {code}
> /jmx?qry=Hadoop:service=NodeManager,name=NodeManagerMetrics
> {code}
> {code}
> {
> name: "Hadoop:service=NodeManager,name=NodeManagerMetrics",
> modelerType: "NodeManagerMetrics",
> tag.Context: "yarn",
> tag.Hostname: "hadoop.com",
> ContainersLaunched: 0,
> ContainersCompleted: 0,
> ContainersFailed: 2,
> ContainersKilled: 0,
> ContainersIniting: 0,
> ContainersRunning: 0,
> AllocatedGB: 0,
> AllocatedContainers: -2,
> AvailableGB: 160,
> AllocatedVCores: -11,
> AvailableVCores: 3611,
> ContainerLaunchDurationNumOps: 2,
> ContainerLaunchDurationAvgTime: 6,
> BadLocalDirs: 0,
> BadLogDirs: 0,
> GoodLocalDirsDiskUtilizationPerc: 2,
> GoodLogDirsDiskUtilizationPerc: 2
> }
> {code}
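
A hedged, self-contained sketch of the recovery direction the description 
proposes: re-apply each recovered container's resources to the metrics so that 
the later release does not drive the counters negative (illustrative, not the 
committed patch):

{code}
public class MetricsRecoveryDemo {
  int allocatedContainers, allocatedMB, allocatedVCores;

  void allocateContainer(int mb, int vcores) {
    allocatedContainers++; allocatedMB += mb; allocatedVCores += vcores;
  }

  void releaseContainer(int mb, int vcores) {
    allocatedContainers--; allocatedMB -= mb; allocatedVCores -= vcores;
  }

  // Called from something like ContainerManagerImpl#recoverContainer:
  // without this, a container allocated before the restart is released
  // after the restart and drives the counters below zero.
  void recoverContainer(int mb, int vcores) {
    allocateContainer(mb, vcores);
  }

  public static void main(String[] args) {
    MetricsRecoveryDemo m = new MetricsRecoveryDemo();
    m.recoverContainer(1024, 1);   // NM restarts with one live container
    m.releaseContainer(1024, 1);   // app stops: counters return to zero
    System.out.println(m.allocatedContainers); // 0, not -1
  }
}
{code}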






[jira] [Commented] (YARN-6966) NodeManager metrics may return wrong negative values when NM restart

2018-07-31 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564474#comment-16564474
 ] 

genericqa commented on YARN-6966:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 21m 
42s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 5 new or modified test 
files. {color} |
|| || || || {color:brown} branch-2 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 10m 
34s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
50s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
24s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
36s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
26s{color} | {color:green} branch-2 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
46s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 20s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:
 The patch generated 4 new + 189 unchanged - 3 fixed = 193 total (was 192) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 15m 24s{color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 53m 59s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.nodemanager.containermanager.TestContainerManagerRecovery |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:a716388 |
| JIRA Issue | YARN-6966 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12933626/YARN-6966-branch-2.002.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 425b686bdc93 4.4.0-130-generic #156-Ubuntu SMP Thu Jun 14 
08:53:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | branch-2 / a716388 |
| maven | version: Apache Maven 3.3.9 
(bb52d8502b132ec0a5a3f4c09453c07478323dc5; 2015-11-10T16:41:47+00:00) |
| Default Java | 1.7.0_181 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/21455/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/21455/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/21455/testReport/ |
| Max. process+thread count | 160 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 |
| Console output | 

[jira] [Commented] (YARN-8579) New AM attempt could not retrieve previous attempt component data

2018-07-31 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564456#comment-16564456
 ] 

Hudson commented on YARN-8579:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14679 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14679/])
YARN-8579.  Recover NMToken of previous attempted component data.
(eyang: rev c7ebcd76bf3dd14127336951f2be3de772e7826a)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRestart.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/ServiceScheduler.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java


> New AM attempt could not retrieve previous attempt component data
> -
>
> Key: YARN-8579
> URL: https://issues.apache.org/jira/browse/YARN-8579
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Assignee: Gour Saha
>Priority: Critical
> Attachments: YARN-8579.001.patch, YARN-8579.002.patch, 
> YARN-8579.003.patch, YARN-8579.004.patch
>
>
> Steps:
> 1) Launch httpd-docker
> 2) Wait for app to be in STABLE state
> 3) Run validation for app (It takes around 3 mins)
> 4) Stop all ZKs 
> 5) Wait 60 sec
> 6) Kill AM
> 7) wait for 30 sec
> 8) Start all ZKs
> 9) Wait for application to finish
> 10) Validate expected containers of the app
> Expected behavior:
> A new AM attempt should start, and the docker containers launched by the 1st 
> attempt should be recovered by the new attempt.
> Actual behavior:
> A new AM attempt starts. It cannot recover the 1st attempt's docker containers 
> because it cannot read the component details from ZK. 
> Thus, it starts new containers for all component instances.
> {code}
> 2018-07-19 22:42:47,595 [main] INFO  service.ServiceScheduler - Registering 
> appattempt_1531977563978_0015_02, fault-test-zkrm-httpd-docker into 
> registry
> 2018-07-19 22:42:47,611 [main] INFO  service.ServiceScheduler - Received 1 
> containers from previous attempt.
> 2018-07-19 22:42:47,642 [main] INFO  service.ServiceScheduler - Could not 
> read component paths: 
> `/users/hrt-qa/services/yarn-service/fault-test-zkrm-httpd-docker/components':
>  No such file or directory: KeeperErrorCode = NoNode for 
> /registry/users/hrt-qa/services/yarn-service/fault-test-zkrm-httpd-docker/components
> 2018-07-19 22:42:47,643 [main] INFO  service.ServiceScheduler - Handling 
> container_e08_1531977563978_0015_01_03 from previous attempt
> 2018-07-19 22:42:47,643 [main] INFO  service.ServiceScheduler - Record not 
> found in registry for container container_e08_1531977563978_0015_01_03 
> from previous attempt, releasing
> 2018-07-19 22:42:47,649 [AMRM Callback Handler Thread] INFO  
> impl.TimelineV2ClientImpl - Updated timeline service address to xxx:33019
> 2018-07-19 22:42:47,651 [main] INFO  service.ServiceScheduler - Triggering 
> initial evaluation of component httpd
> 2018-07-19 22:42:47,652 [main] INFO  component.Component - [INIT COMPONENT 
> httpd]: 2 instances.
> 2018-07-19 22:42:47,652 [main] INFO  component.Component - [COMPONENT httpd] 
> Requesting for 2 container(s){code}
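For context, a minimal sketch of the per-container decision visible in the log 
above (the class and the lookupRecord helper are hypothetical, not the actual 
ServiceScheduler code):

{code:java}
import java.util.List;
import org.apache.hadoop.registry.client.types.ServiceRecord;
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.client.api.async.AMRMClientAsync;

public class PreviousAttemptRecoverySketch {
  private final AMRMClientAsync<?> amRMClient;

  PreviousAttemptRecoverySketch(AMRMClientAsync<?> amRMClient) {
    this.amRMClient = amRMClient;
  }

  void recoverPreviousAttempt(List<Container> fromPreviousAttempt) {
    for (Container container : fromPreviousAttempt) {
      // Registry read backed by ZK; returns null when the component paths are
      // gone, as in the "Could not read component paths" log line above.
      ServiceRecord record = lookupRecord(container);
      if (record == null) {
        // "Record not found in registry ... releasing": the instance cannot be
        // re-attached, so the old container is released and the component will
        // request a fresh one.
        amRMClient.releaseAssignedContainer(container.getId());
      }
      // else: re-attach the component instance to the surviving container.
    }
  }

  private ServiceRecord lookupRecord(Container container) {
    return null; // placeholder for the ZK-backed registry lookup
  }
}
{code}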



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7974) Allow updating application tracking url after registration

2018-07-31 Thread Jonathan Hung (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564451#comment-16564451
 ] 

Jonathan Hung commented on YARN-7974:
-

Attached branch-2.001 patch. The main incompatibility was in RMAppAttemptImpl, 
when specifying resource values while constructing app attempt state data.

> Allow updating application tracking url after registration
> --
>
> Key: YARN-7974
> URL: https://issues.apache.org/jira/browse/YARN-7974
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
> Fix For: 3.2.0, 3.0.4, 3.1.2
>
> Attachments: YARN-7974-branch-2.001.patch, YARN-7974.001.patch, 
> YARN-7974.002.patch, YARN-7974.003.patch, YARN-7974.004.patch, 
> YARN-7974.005.patch, YARN-7974.006.patch
>
>
> Normally an application's tracking url is set on AM registration. We have a 
> use case for updating the tracking url after registration (e.g. the UI is 
> hosted on one of the containers).
> The approach is for the AM to update the tracking url on its heartbeat to the 
> RM, and to add a related API in AMRMClient (see the sketch below).
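For illustration, a minimal sketch of the intended client-side usage (the 
updateTrackingUrl method name is taken from the attached patches; the host and 
port values are made up, so treat this as a sketch rather than the final API):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

// The AM registers with an initial tracking url, then repoints the RM at a
// container-hosted UI; the change rides on the next allocate() heartbeat.
public class TrackingUrlUpdateSketch {
  public static void main(String[] args) throws Exception {
    AMRMClient<ContainerRequest> amRMClient = AMRMClient.createAMRMClient();
    amRMClient.init(new Configuration());
    amRMClient.start();
    // Initial URL is set on registration, as before.
    amRMClient.registerApplicationMaster("am-host", 0, "http://am-host:8080");
    // Later, once the UI container is up (hypothetical host/port):
    amRMClient.updateTrackingUrl("http://ui-container-host:8080");
    amRMClient.allocate(0.1f); // heartbeat carries the updated url to the RM
  }
}
{code}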



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7974) Allow updating application tracking url after registration

2018-07-31 Thread Jonathan Hung (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated YARN-7974:

Attachment: YARN-7974-branch-2.001.patch

> Allow updating application tracking url after registration
> --
>
> Key: YARN-7974
> URL: https://issues.apache.org/jira/browse/YARN-7974
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
> Fix For: 3.2.0, 3.0.4, 3.1.2
>
> Attachments: YARN-7974-branch-2.001.patch, YARN-7974.001.patch, 
> YARN-7974.002.patch, YARN-7974.003.patch, YARN-7974.004.patch, 
> YARN-7974.005.patch, YARN-7974.006.patch
>
>
> Normally an application's tracking url is set on AM registration. We have a 
> use case for updating the tracking url after registration (e.g. the UI is 
> hosted on one of the containers).
> The approach is for the AM to update the tracking url on its heartbeat to the 
> RM, and to add a related API in AMRMClient.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8522) Application fails with InvalidResourceRequestException

2018-07-31 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564437#comment-16564437
 ] 

Wangda Tan commented on YARN-8522:
--

LGTM +1, thanks [~Zian Chen]. Will commit shortly; we may not need to add tests 
for this.

> Application fails with InvalidResourceRequestException
> --
>
> Key: YARN-8522
> URL: https://issues.apache.org/jira/browse/YARN-8522
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yesha Vora
>Assignee: Zian Chen
>Priority: Critical
> Attachments: YARN-8522.001.patch, YARN-8522.002.patch
>
>
> Launch multiple streaming apps simultaneously. Sometimes one of the 
> applications fails with the stack trace below.
> {code}
> 18/07/02 07:14:32 INFO retry.RetryInvocationHandler: 
> java.net.ConnectException: Call From xx.xx.xx.xx/xx.xx.xx.xx to 
> xx.xx.xx.xx:8032 failed on connection exception: java.net.ConnectException: 
> Connection refused; For more details see:  
> http://wiki.apache.org/hadoop/ConnectionRefused, while invoking 
> ApplicationClientProtocolPBClientImpl.submitApplication over null. Retrying 
> after sleeping for 3ms.
> 18/07/02 07:14:32 WARN client.RequestHedgingRMFailoverProxyProvider: 
> Invocation returned exception: 
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, only one resource request with * is allowed
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:502)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:320)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:645)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:277)
> at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:563)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678)
>  on [rm2], so propagating back to caller.
> 18/07/02 07:14:32 INFO mapreduce.JobSubmitter: Cleaning up the staging area 
> /user/hrt_qa/.staging/job_1530515284077_0007
> 18/07/02 07:14:32 ERROR streaming.StreamJob: Error Launching job : 
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, only one resource request with * is allowed
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:502)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:320)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:645)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:277)
> at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:563)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678)
> Streaming Command Failed!{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (YARN-8522) Application fails with InvalidResourceRequestException

2018-07-31 Thread Wangda Tan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8522:
-
Target Version/s: 3.2.0, 3.1.1
Priority: Critical  (was: Major)

> Application fails with InvalidResourceRequestException
> --
>
> Key: YARN-8522
> URL: https://issues.apache.org/jira/browse/YARN-8522
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yesha Vora
>Assignee: Zian Chen
>Priority: Critical
> Attachments: YARN-8522.001.patch, YARN-8522.002.patch
>
>
> Launch multiple streaming apps simultaneously. Sometimes one of the 
> applications fails with the stack trace below.
> {code}
> 18/07/02 07:14:32 INFO retry.RetryInvocationHandler: 
> java.net.ConnectException: Call From xx.xx.xx.xx/xx.xx.xx.xx to 
> xx.xx.xx.xx:8032 failed on connection exception: java.net.ConnectException: 
> Connection refused; For more details see:  
> http://wiki.apache.org/hadoop/ConnectionRefused, while invoking 
> ApplicationClientProtocolPBClientImpl.submitApplication over null. Retrying 
> after sleeping for 3ms.
> 18/07/02 07:14:32 WARN client.RequestHedgingRMFailoverProxyProvider: 
> Invocation returned exception: 
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, only one resource request with * is allowed
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:502)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:320)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:645)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:277)
> at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:563)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678)
>  on [rm2], so propagating back to caller.
> 18/07/02 07:14:32 INFO mapreduce.JobSubmitter: Cleaning up the staging area 
> /user/hrt_qa/.staging/job_1530515284077_0007
> 18/07/02 07:14:32 ERROR streaming.StreamJob: Error Launching job : 
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, only one resource request with * is allowed
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:502)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:320)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:645)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:277)
> at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:563)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678)
> Streaming Command Failed!{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

[jira] [Commented] (YARN-8553) Reduce complexity of AHSWebService getApps method

2018-07-31 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564424#comment-16564424
 ] 

genericqa commented on YARN-8553:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
22s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
50s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 
59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 23s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
44s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  
9s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  2m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
 8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
2s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 3 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 10s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
6s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m 
17s{color} | {color:red} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-common 
generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
15s{color} | {color:green} hadoop-yarn-server-common in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 71m 
44s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
21s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}134m 28s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8553 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12933813/YARN-8553.001.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux dce722cfbaec 4.4.0-89-generic #112-Ubuntu SMP Mon Jul 31 
19:38:41 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 4b540bb |
| maven | version: Apache Maven 3.3.9 |
| Default 

[jira] [Commented] (YARN-6966) NodeManager metrics may return wrong negative values when NM restart

2018-07-31 Thread Haibo Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564421#comment-16564421
 ] 

Haibo Chen commented on YARN-6966:
--

Retriggered a Jenkins job now that HADOOP-15644 is fixed.

> NodeManager metrics may return wrong negative values when NM restart
> 
>
> Key: YARN-6966
> URL: https://issues.apache.org/jira/browse/YARN-6966
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yang Wang
>Assignee: Szilard Nemeth
>Priority: Major
> Fix For: 3.2.0, 3.0.4, 3.1.2
>
> Attachments: YARN-6966-branch-2.001.patch, 
> YARN-6966-branch-2.002.patch, YARN-6966-branch-2.002.patch, 
> YARN-6966-branch-2.002.patch, YARN-6966-branch-3.0.0.001.patch, 
> YARN-6966-branch-3.0.001.patch, YARN-6966.001.patch, YARN-6966.002.patch, 
> YARN-6966.003.patch, YARN-6966.004.patch, YARN-6966.005.patch, 
> YARN-6966.005.patch, YARN-6966.006.patch
>
>
> Just as in YARN-6212. However, I think it is not a duplicate of YARN-3933.
> The primary cause of the negative values is that metrics do not recover 
> properly when the NM restarts.
> AllocatedContainers, ContainersLaunched, AllocatedGB, AvailableGB, 
> AllocatedVCores and AvailableVCores in the metrics also need to be recovered 
> when the NM restarts.
> This should be done in ContainerManagerImpl#recoverContainer (a sketch follows 
> the JMX output below).
> The scenario can be reproduced by the following steps:
> # Make sure 
> YarnConfiguration.NM_RECOVERY_ENABLED=true,YarnConfiguration.NM_RECOVERY_SUPERVISED=true
>  in the NM
> # Submit an application and keep it running
> # Restart the NM
> # Stop the application
> # Now you get the negative values
> {code}
> /jmx?qry=Hadoop:service=NodeManager,name=NodeManagerMetrics
> {code}
> {code}
> {
> name: "Hadoop:service=NodeManager,name=NodeManagerMetrics",
> modelerType: "NodeManagerMetrics",
> tag.Context: "yarn",
> tag.Hostname: "hadoop.com",
> ContainersLaunched: 0,
> ContainersCompleted: 0,
> ContainersFailed: 2,
> ContainersKilled: 0,
> ContainersIniting: 0,
> ContainersRunning: 0,
> AllocatedGB: 0,
> AllocatedContainers: -2,
> AvailableGB: 160,
> AllocatedVCores: -11,
> AvailableVCores: 3611,
> ContainerLaunchDurationNumOps: 2,
> ContainerLaunchDurationAvgTime: 6,
> BadLocalDirs: 0,
> BadLogDirs: 0,
> GoodLocalDirsDiskUtilizationPerc: 2,
> GoodLogDirsDiskUtilizationPerc: 2
> }
> {code}
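Illustrating the description's point, a sketch of re-accounting recovered 
containers into the NM metrics (recoverContainerMetrics is a hypothetical hook 
standing in for the work described for ContainerManagerImpl#recoverContainer):

{code:java}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.server.nodemanager.metrics.NodeManagerMetrics;

public class MetricsRecoverySketch {
  // Replay one recovered container into the metrics, so that its eventual
  // release/completion does not drive the counters negative.
  static void recoverContainerMetrics(NodeManagerMetrics metrics,
      Resource resource) {
    // AllocatedContainers/AllocatedGB/AllocatedVCores go up and the
    // Available* gauges go down, matching the container's pre-restart state.
    metrics.allocateContainer(resource);
    // Count the container as launched again after the restart.
    metrics.launchedContainer();
  }
}
{code}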



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8468) Limit container sizes per queue in FairScheduler

2018-07-31 Thread Haibo Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564416#comment-16564416
 ] 

Haibo Chen commented on YARN-8468:
--

Thanks [~bsteinbach] for the patch and the detailed response to Wilfred's 
comments. I have some comments on the latest patch:

1) I'd add to Wilfred's previous comment about this change being a 
queue-specific override of the scheduler-level configuration. The current patch 
throws an exception if the queue-specific configuration value is larger than 
the scheduler-level value. Instead of doing that, for any queue, the final 
value can be either the scheduler-level value, if there is no queue-specific 
override, or the componentwise minimum of the scheduler-level value and the 
queue-level override (see the sketch below).

2) The scheduler-level configuration is used to normalize resource requests, 
and from then on the scheduler takes the normalized requests and handles things 
correctly. I don't see how the scheduler picks up the queue-specific values. We 
can verify this with a unit test in TestFairScheduler that submits a resource 
request to a queue whose queue-level max container allocation is smaller than 
the resource request.

3) This is a queue configuration, not so much a queue metric, so I don't think 
we need to expose it in FSQueueMetrics.

4) Let's rename 'maxContainerResources' to 'maxContainerAllocation' to be 
consistent with the naming of the existing scheduler-level configuration 
property.

Please also make sure lines are not over 80 characters and use 4 spaces as 
indentation, just to be consistent with the code base.
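A minimal sketch of the componentwise-minimum semantics suggested in 1) above 
(the class and method names are illustrative, not part of the patch):

{code:java}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

public final class EffectiveMaxAllocationSketch {
  // Effective per-queue cap: the scheduler-level max when the queue sets
  // nothing, otherwise the componentwise minimum of the two values,
  // instead of throwing an exception on conflict.
  static Resource effectiveMaxAllocation(Resource schedulerLevelMax,
      Resource queueLevelOverride) {
    if (queueLevelOverride == null) {
      return schedulerLevelMax;
    }
    // e.g. scheduler <8192 MB, 8 vcores> and queue <4096 MB, 16 vcores>
    // yield <4096 MB, 8 vcores>.
    return Resources.componentwiseMin(schedulerLevelMax, queueLevelOverride);
  }
}
{code}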

> Limit container sizes per queue in FairScheduler
> 
>
> Key: YARN-8468
> URL: https://issues.apache.org/jira/browse/YARN-8468
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 3.1.0
>Reporter: Antal Bálint Steinbach
>Assignee: Antal Bálint Steinbach
>Priority: Critical
> Attachments: YARN-8468.000.patch, YARN-8468.001.patch, 
> YARN-8468.002.patch, YARN-8468.003.patch
>
>
> When using any scheduler, you can use "yarn.scheduler.maximum-allocation-mb" 
> to limit the overall size of a container. This applies globally to all 
> containers, cannot be limited per queue, and is not scheduler dependent.
>  
> The goal of this ticket is to allow this value to be set on a per-queue basis.
>  
> The use case: a user has two pools, one for ad hoc jobs and one for enterprise 
> apps, and wants to limit ad hoc jobs to small containers but allow 
> enterprise apps to request as many resources as needed. 
> yarn.scheduler.maximum-allocation-mb sets the default maximum 
> container size for all queues, while the per-queue maximum is set with the 
> “maxContainerResources” queue config value.
>  
> Suggested solution:
>  
> All the infrastructure is already in the code. We need to do the following:
>  * add the setting to the queue properties for all queue types (parent and 
> leaf); this will cover dynamically created queues.
>  * if we set it on the root, we override the scheduler setting; we should 
> not allow that.
>  * make sure that the queue resource cap cannot be larger than the scheduler 
> max resource cap in the config.
>  * implement getMaximumResourceCapability(String queueName) in the 
> FairScheduler
>  * implement getMaximumResourceCapability() in both FSParentQueue and 
> FSLeafQueue
>  * expose the setting in the queue information in the RM web UI.
>  * expose the setting in the metrics, etc., for the queue.
>  * write JUnit tests.
>  * update the scheduler documentation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7833) [PERF/TEST] Extend SLS to support simulation of a Federated Environment

2018-07-31 Thread Tanuj Nayak (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tanuj Nayak updated YARN-7833:
--
Attachment: YARN-7833.v6.patch

> [PERF/TEST] Extend SLS to support simulation of a Federated Environment
> ---
>
> Key: YARN-7833
> URL: https://issues.apache.org/jira/browse/YARN-7833
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Carlo Curino
>Assignee: Tanuj Nayak
>Priority: Major
> Attachments: YARN-7833.v1.patch, YARN-7833.v2.patch, 
> YARN-7833.v3.patch, YARN-7833.v4.patch, YARN-7833.v5.patch, YARN-7833.v6.patch
>
>
> To develop algorithms for federation, it would be of great help to have a 
> version of SLS that supports multi RMs and GPG.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7833) [PERF/TEST] Extend SLS to support simulation of a Federated Environment

2018-07-31 Thread Tanuj Nayak (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tanuj Nayak updated YARN-7833:
--
Attachment: (was: YARN-7833.v6.patch)

> [PERF/TEST] Extend SLS to support simulation of a Federated Environment
> ---
>
> Key: YARN-7833
> URL: https://issues.apache.org/jira/browse/YARN-7833
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Carlo Curino
>Assignee: Tanuj Nayak
>Priority: Major
> Attachments: YARN-7833.v1.patch, YARN-7833.v2.patch, 
> YARN-7833.v3.patch, YARN-7833.v4.patch, YARN-7833.v5.patch
>
>
> To develop algorithms for federation, it would be of great help to have a 
> version of SLS that supports multi RMs and GPG.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7833) [PERF/TEST] Extend SLS to support simulation of a Federated Environment

2018-07-31 Thread Tanuj Nayak (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tanuj Nayak updated YARN-7833:
--
Attachment: YARN-7833.v6.patch

> [PERF/TEST] Extend SLS to support simulation of a Federated Environment
> ---
>
> Key: YARN-7833
> URL: https://issues.apache.org/jira/browse/YARN-7833
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Carlo Curino
>Assignee: Tanuj Nayak
>Priority: Major
> Attachments: YARN-7833.v1.patch, YARN-7833.v2.patch, 
> YARN-7833.v3.patch, YARN-7833.v4.patch, YARN-7833.v5.patch, YARN-7833.v6.patch
>
>
> To develop algorithms for federation, it would be of great help to have a 
> version of SLS that supports multi RMs and GPG.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8600) RegistryDNS hang when remote lookup does not reply

2018-07-31 Thread Wangda Tan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8600:
-
Priority: Critical  (was: Major)

> RegistryDNS hang when remote lookup does not reply
> --
>
> Key: YARN-8600
> URL: https://issues.apache.org/jira/browse/YARN-8600
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.1.0
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Critical
> Attachments: YARN-8600.001.patch, YARN-8600.002.patch
>
>
> If the lookup type does not match the record being queried, the remote DNS 
> server might not reply.  For example, looking up a CNAME record with a PTR 
> address: 1.76.27.172.in-addr.arpa.  This can hang RegistryDNS (a generic 
> sketch of a bounded lookup follows).
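Not the actual patch, but a generic JDK sketch of the usual remedy: bound the 
remote lookup with a timeout so a silent upstream server cannot hang the caller 
indefinitely.

{code:java}
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedLookupSketch {
  private final ExecutorService pool = Executors.newCachedThreadPool();

  // Runs the remote lookup on a worker thread and gives up after the timeout,
  // returning null so the caller can answer with SERVFAIL instead of hanging.
  byte[] lookupWithTimeout(Callable<byte[]> remoteLookup, long timeoutMillis)
      throws Exception {
    Future<byte[]> future = pool.submit(remoteLookup);
    try {
      return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
      future.cancel(true); // interrupt the stuck lookup thread
      return null;
    }
  }
}
{code}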



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8579) New AM attempt could not retrieve previous attempt component data

2018-07-31 Thread Wangda Tan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8579:
-
Target Version/s: 3.2.0, 3.1.2
   Fix Version/s: (was: 3.1.2)
  (was: 3.2.0)

> New AM attempt could not retrieve previous attempt component data
> -
>
> Key: YARN-8579
> URL: https://issues.apache.org/jira/browse/YARN-8579
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Assignee: Gour Saha
>Priority: Critical
> Attachments: YARN-8579.001.patch, YARN-8579.002.patch, 
> YARN-8579.003.patch, YARN-8579.004.patch
>
>
> Steps:
> 1) Launch httpd-docker
> 2) Wait for app to be in STABLE state
> 3) Run validation for app (it takes around 3 mins)
> 4) Stop all ZKs
> 5) Wait 60 sec
> 6) Kill AM
> 7) Wait 30 sec
> 8) Start all ZKs
> 9) Wait for application to finish
> 10) Validate expected containers of the app
> Expected behavior:
> A new AM attempt should start, and docker containers launched by the 1st attempt 
> should be recovered by the new attempt.
> Actual behavior:
> A new AM attempt starts. It cannot recover the 1st attempt's docker containers 
> because it cannot read component details from ZK. 
> Thus, it requests fresh containers for all components.
> {code}
> 2018-07-19 22:42:47,595 [main] INFO  service.ServiceScheduler - Registering 
> appattempt_1531977563978_0015_02, fault-test-zkrm-httpd-docker into 
> registry
> 2018-07-19 22:42:47,611 [main] INFO  service.ServiceScheduler - Received 1 
> containers from previous attempt.
> 2018-07-19 22:42:47,642 [main] INFO  service.ServiceScheduler - Could not 
> read component paths: 
> `/users/hrt-qa/services/yarn-service/fault-test-zkrm-httpd-docker/components':
>  No such file or directory: KeeperErrorCode = NoNode for 
> /registry/users/hrt-qa/services/yarn-service/fault-test-zkrm-httpd-docker/components
> 2018-07-19 22:42:47,643 [main] INFO  service.ServiceScheduler - Handling 
> container_e08_1531977563978_0015_01_03 from previous attempt
> 2018-07-19 22:42:47,643 [main] INFO  service.ServiceScheduler - Record not 
> found in registry for container container_e08_1531977563978_0015_01_03 
> from previous attempt, releasing
> 2018-07-19 22:42:47,649 [AMRM Callback Handler Thread] INFO  
> impl.TimelineV2ClientImpl - Updated timeline service address to xxx:33019
> 2018-07-19 22:42:47,651 [main] INFO  service.ServiceScheduler - Triggering 
> initial evaluation of component httpd
> 2018-07-19 22:42:47,652 [main] INFO  component.Component - [INIT COMPONENT 
> httpd]: 2 instances.
> 2018-07-19 22:42:47,652 [main] INFO  component.Component - [COMPONENT httpd] 
> Requesting for 2 container(s){code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8518) test-container-executor test_is_empty() is broken

2018-07-31 Thread Robert Kanter (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kanter updated YARN-8518:

Fix Version/s: 3.0.3
   2.9.2
   3.1.1

> test-container-executor test_is_empty() is broken
> -
>
> Key: YARN-8518
> URL: https://issues.apache.org/jira/browse/YARN-8518
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Major
> Fix For: 3.2.0, 3.1.1, 2.9.2, 3.0.3
>
> Attachments: YARN-8518.001.patch
>
>
> A new test was recently added to test-container-executor.c that has some 
> problems.
> It is attempting to mkdir() a hard-coded path: 
> /tmp/2938rf2983hcqnw8ud/emptydir
> This fails because the base directory is not there.  These directories are 
> not being cleaned up either.
> It should be using TEST_ROOT.
> I don't know which Jira this change was made under; the git commit from July 
> 9, 2018 does not reference a Jira.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8509) Fix UserLimit calculation for preemption to balance scenario after queue satisfied

2018-07-31 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564326#comment-16564326
 ] 

genericqa commented on YARN-8509:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
22s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 51s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
11s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
26s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
 8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 49s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 73m  2s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}127m  8s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.scheduler.fair.policies.TestDominantResourceFairnessPolicy
 |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8509 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12933667/YARN-8509.003.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 647fcdab5e3a 4.4.0-130-generic #156-Ubuntu SMP Thu Jun 14 
08:53:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 9fea5c9 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/21450/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/21450/testReport/ |
| Max. process+thread count | 951 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 

[jira] [Commented] (YARN-8392) Allow multiple tags for anti-affinity placement policy in service specification

2018-07-31 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564309#comment-16564309
 ] 

genericqa commented on YARN-8392:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
20s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
10s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 50s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
12s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
 8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 35s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
13s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 13m 
24s{color} | {color:green} hadoop-yarn-services-core in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
21s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 63m 39s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8392 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12933807/YARN-8392.4.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 8df0f6e4fb2e 4.4.0-130-generic #156-Ubuntu SMP Thu Jun 14 
08:53:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 8aa93a5 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/21451/testReport/ |
| Max. process+thread count | 743 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/21451/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Allow multiple tags for 

[jira] [Updated] (YARN-7512) Support service upgrade via YARN Service API and CLI

2018-07-31 Thread Wangda Tan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-7512:
-
Target Version/s: 3.1.2  (was: 3.1.1)

> Support service upgrade via YARN Service API and CLI
> 
>
> Key: YARN-7512
> URL: https://issues.apache.org/jira/browse/YARN-7512
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Gour Saha
>Assignee: Chandni Singh
>Priority: Major
> Fix For: yarn-native-services
>
> Attachments: _In-Place Upgrade of Long-Running Applications in 
> YARN_v1.pdf, _In-Place Upgrade of Long-Running Applications in YARN_v2.pdf, 
> _In-Place Upgrade of Long-Running Applications in YARN_v3.pdf
>
>
> The YARN Service API and CLI need to support service (and container) upgrades 
> in line with what Slider supported in SLIDER-787 
> (http://slider.incubator.apache.org/docs/slider_specs/application_pkg_upgrade.html)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8399) NodeManager is giving 403 GSS exception post upgrade to 3.1 in secure mode

2018-07-31 Thread Wangda Tan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8399:
-
Target Version/s: 2.10.0, 3.2.0, 3.0.3, 3.1.2  (was: 2.10.0, 3.2.0, 3.1.1, 
3.0.3)

> NodeManager is giving 403 GSS exception post upgrade to 3.1 in secure mode
> --
>
> Key: YARN-8399
> URL: https://issues.apache.org/jira/browse/YARN-8399
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineservice
>Reporter: Sunil Govindan
>Assignee: Sunil Govindan
>Priority: Major
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8399.001.patch, YARN-8399.002.patch, 
> YARN-8399.003.patch
>
>
> Getting 403 GSS exception while accessing NM http port via curl. 
> {code:java}
> curl -k -i --negotiate -u: https://:/node
> HTTP/1.1 401 Authentication required
> Date: Tue, 05 Jun 2018 17:59:00 GMT
> Date: Tue, 05 Jun 2018 17:59:00 GMT
> Pragma: no-cache
> WWW-Authenticate: Negotiate
> Set-Cookie: hadoop.auth=; Path=/; Secure; HttpOnly
> Cache-Control: must-revalidate,no-cache,no-store
> Content-Type: text/html;charset=iso-8859-1
> Content-Length: 264
> HTTP/1.1 403 GSSException: Failure unspecified at GSS-API level (Mechanism 
> level: Request is a replay (34)){code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8520) Document best practice for user management

2018-07-31 Thread Wangda Tan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8520:
-
Target Version/s: 3.2.0, 3.1.2  (was: 3.2.0, 3.1.1)

> Document best practice for user management
> --
>
> Key: YARN-8520
> URL: https://issues.apache.org/jira/browse/YARN-8520
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: documentation, yarn
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8520.001.patch
>
>
> Docker containers must have usernames and groups consistent with the host 
> operating system when external mount points are exposed to the container.  This 
> prevents malicious or unauthorized impersonation.  This task is to 
> document the best practice for ensuring user and group membership is consistent 
> across docker containers.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8052) Move overwriting of service definition during flex to service master

2018-07-31 Thread Wangda Tan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8052:
-
Target Version/s: 3.1.2  (was: 3.1.1)

> Move overwriting of service definition during flex to service master
> 
>
> Key: YARN-8052
> URL: https://issues.apache.org/jira/browse/YARN-8052
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
>
> The overwriting of the service definition during flex is done from the 
> ServiceClient. 
> During auto-finalization of an upgrade, the current service definition gets 
> overwritten by the service master as well. This creates a potential conflict. 
> We need to move the overwriting of the service definition during flex to the 
> service master. 
> Discussed on YARN-8018.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8136) Add version attribute to site doc examples and quickstart

2018-07-31 Thread Wangda Tan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8136:
-
Target Version/s: 3.1.2  (was: 3.1.1)

> Add version attribute to site doc examples and quickstart
> -
>
> Key: YARN-8136
> URL: https://issues.apache.org/jira/browse/YARN-8136
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: site
>Reporter: Gour Saha
>Priority: Major
>
> The version attribute is missing in the following 2 site doc files:
> src/site/markdown/yarn-service/Examples.md
> src/site/markdown/yarn-service/QuickStart.md



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8453) Additional Unit tests to verify queue limit and max-limit with multiple resource types

2018-07-31 Thread Wangda Tan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8453:
-
Target Version/s: 3.0.4, 3.1.2  (was: 3.1.1, 3.0.4)

> Additional Unit  tests to verify queue limit and max-limit with multiple 
> resource types
> ---
>
> Key: YARN-8453
> URL: https://issues.apache.org/jira/browse/YARN-8453
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 3.0.2
>Reporter: Sunil Govindan
>Assignee: Sunil Govindan
>Priority: Major
> Attachments: YARN-8453.001.patch
>
>
> With support for additional resource types other than CPU and memory, it is 
> possible that one such new resource has exhausted its quota on a given 
> queue while other resources such as memory/CPU are still available beyond the 
> guaranteed limit (under max-limit). Adding more unit tests to ensure we are 
> not starving such allocation requests.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8399) NodeManager is giving 403 GSS exception post upgrade to 3.1 in secure mode

2018-07-31 Thread Wangda Tan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8399:
-
Target Version/s: 2.10.0, 3.2.0, 3.0.3  (was: 2.10.0, 3.2.0, 3.0.3, 3.1.2)

> NodeManager is giving 403 GSS exception post upgrade to 3.1 in secure mode
> --
>
> Key: YARN-8399
> URL: https://issues.apache.org/jira/browse/YARN-8399
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineservice
>Reporter: Sunil Govindan
>Assignee: Sunil Govindan
>Priority: Major
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8399.001.patch, YARN-8399.002.patch, 
> YARN-8399.003.patch
>
>
> Getting 403 GSS exception while accessing NM http port via curl. 
> {code:java}
> curl -k -i --negotiate -u: https://:/node
> HTTP/1.1 401 Authentication required
> Date: Tue, 05 Jun 2018 17:59:00 GMT
> Date: Tue, 05 Jun 2018 17:59:00 GMT
> Pragma: no-cache
> WWW-Authenticate: Negotiate
> Set-Cookie: hadoop.auth=; Path=/; Secure; HttpOnly
> Cache-Control: must-revalidate,no-cache,no-store
> Content-Type: text/html;charset=iso-8859-1
> Content-Length: 264
> HTTP/1.1 403 GSSException: Failure unspecified at GSS-API level (Mechanism 
> level: Request is a replay (34)){code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8161) ServiceState FLEX should be removed

2018-07-31 Thread Wangda Tan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8161:
-
Target Version/s: 3.2.0, 3.1.2  (was: 3.2.0, 3.1.1)

> ServiceState FLEX should be removed
> ---
>
> Key: YARN-8161
> URL: https://issues.apache.org/jira/browse/YARN-8161
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-native-services
>Affects Versions: 3.1.0
>Reporter: Gour Saha
>Priority: Major
>
> ServiceState FLEX is not required to trigger flex up/down of containers and 
> should be removed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8366) Expose debug log information when user intend to enable GPU without setting nvidia-smi path

2018-07-31 Thread Wangda Tan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8366:
-
Target Version/s: 3.2.0, 3.1.2  (was: 3.2.0, 3.1.1)

> Expose debug log information when user intend to enable GPU without setting 
> nvidia-smi path
> ---
>
> Key: YARN-8366
> URL: https://issues.apache.org/jira/browse/YARN-8366
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.0.0
>Reporter: Zian Chen
>Assignee: Zian Chen
>Priority: Major
>
> Expose debug information to help users find the root cause of failure when 
> they have not made these two settings manually before enabling GPU on YARN 
> (an illustrative sketch follows):
> 1. yarn.nodemanager.resource-plugins.gpu.path-to-discovery-executables in 
> yarn-site.xml
> 2. the environment variable LD_LIBRARY_PATH
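Illustrative only: the kind of debug logging the description asks for, checking 
the discovery-executable setting before enabling the GPU plugin (the class and 
method are hypothetical; the property name is taken from the description):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class GpuDiscoveryDebugSketch {
  private static final Logger LOG =
      LoggerFactory.getLogger(GpuDiscoveryDebugSketch.class);
  static final String GPU_EXECUTABLE_KEY =
      "yarn.nodemanager.resource-plugins.gpu.path-to-discovery-executables";

  // Log why GPU discovery may fail when neither setting was made manually.
  static void logGpuDiscoveryHints(Configuration conf) {
    if (conf.get(GPU_EXECUTABLE_KEY) == null) {
      LOG.debug("{} is not set; nvidia-smi will be searched via "
              + "LD_LIBRARY_PATH={}", GPU_EXECUTABLE_KEY,
          System.getenv("LD_LIBRARY_PATH"));
    }
  }
}
{code}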



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8552) [DS] Container report fails for distributed containers

2018-07-31 Thread Wangda Tan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8552:
-
Target Version/s: 3.1.2  (was: 3.1.1)

> [DS]  Container report fails for distributed containers
> ---
>
> Key: YARN-8552
> URL: https://issues.apache.org/jira/browse/YARN-8552
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bilwa S T
>Priority: Major
>
> 2018-07-19 19:15:02,281 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
> container_1531994217928_0003_01_1099511627753 Container Transitioned from 
> ACQUIRED to RUNNING
> 2018-07-19 19:15:02,384 ERROR 
> org.apache.hadoop.yarn.server.webapp.ContainerBlock: Failed to read the 
> container container_1531994217928_0003_01_1099511627773.
> Container report is failing for Distributed Scheduler containers. Currently 
> all containers are fetched from the central RM, so we need to find an 
> alternative for these.
> {code}
> at 
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
> at 
> org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.yarn.exceptions.ContainerNotFoundException: 
> Container with id 'container_1531994217928_0003_01_1099511627773' doesn't 
> exist in RM.
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getContainerReport(ClientRMService.java:499)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.RMContainerBlock.getContainerReport(RMContainerBlock.java:44)
> at 
> org.apache.hadoop.yarn.server.webapp.ContainerBlock$1.run(ContainerBlock.java:82)
> at 
> org.apache.hadoop.yarn.server.webapp.ContainerBlock$1.run(ContainerBlock.java:79)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688)
> ... 70 more
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7833) [PERF/TEST] Extend SLS to support simulation of a Federated Environment

2018-07-31 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564304#comment-16564304
 ] 

genericqa commented on YARN-7833:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  8s{color} 
| {color:red} YARN-7833 does not apply to trunk. Rebase required? Wrong Branch? 
See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-7833 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12933812/YARN-7833.v5.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/21452/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> [PERF/TEST] Extend SLS to support simulation of a Federated Environment
> ---
>
> Key: YARN-7833
> URL: https://issues.apache.org/jira/browse/YARN-7833
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Carlo Curino
>Assignee: Tanuj Nayak
>Priority: Major
> Attachments: YARN-7833.v1.patch, YARN-7833.v2.patch, 
> YARN-7833.v3.patch, YARN-7833.v4.patch, YARN-7833.v5.patch
>
>
> To develop algorithms for federation, it would be of great help to have a 
> version of SLS that supports multiple RMs and the GPG.






[jira] [Updated] (YARN-8553) Reduce complexity of AHSWebService getApps method

2018-07-31 Thread Szilard Nemeth (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated YARN-8553:
-
Attachment: YARN-8553.001.patch

> Reduce complexity of AHSWebService getApps method
> -
>
> Key: YARN-8553
> URL: https://issues.apache.org/jira/browse/YARN-8553
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Rohith Sharma K S
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-8553.001.patch, YARN-8553.001.patch
>
>
> YARN-8501 refactored RMWebService#getApp. Similar refactoring is required in 
> AHSWebService. 






[jira] [Updated] (YARN-8468) Limit container sizes per queue in FairScheduler

2018-07-31 Thread Haibo Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-8468:
-
Component/s: (was: resourcemanager)
 fairscheduler

> Limit container sizes per queue in FairScheduler
> 
>
> Key: YARN-8468
> URL: https://issues.apache.org/jira/browse/YARN-8468
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 3.1.0
>Reporter: Antal Bálint Steinbach
>Assignee: Antal Bálint Steinbach
>Priority: Critical
> Attachments: YARN-8468.000.patch, YARN-8468.001.patch, 
> YARN-8468.002.patch, YARN-8468.003.patch
>
>
> When using any scheduler, you can use "yarn.scheduler.maximum-allocation-mb" 
> to limit the overall size of a container. This applies globally to all 
> containers, cannot be limited per queue, and is not scheduler dependent.
>  
> The goal of this ticket is to allow this value to be set on a per queue basis.
>  
> The use case: User has two pools, one for ad hoc jobs and one for enterprise 
> apps. User wants to limit ad hoc jobs to small containers but allow 
> enterprise apps to request as many resources as needed. Setting 
> yarn.scheduler.maximum-allocation-mb would provide the default maximum 
> container size for all queues, with the per-queue maximum set via a 
> “maxContainerResources” queue config value.
>  
> Suggested solution:
>  
> All the infrastructure is already in the code. We need to do the following:
>  * add the setting to the queue properties for all queue types (parent and 
> leaf), this will cover dynamically created queues.
>  * if we set it on the root we would override the scheduler setting, which 
> should not be allowed.
>  * make sure that the queue resource cap cannot be larger than the scheduler 
> max resource cap in the config.
>  * implement getMaximumResourceCapability(String queueName) in the 
> FairScheduler
>  * implement getMaximumResourceCapability() in both FSParentQueue and 
> FSLeafQueue (a sketch follows this list)
>  * expose the setting in the queue information in the RM web UI.
>  * expose the setting in the metrics etc for the queue.
>  * write JUnit tests.
>  * update the scheduler documentation.
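> A minimal sketch of the FairScheduler lookup mentioned above, assuming a 
> hypothetical per-queue accessor (names are illustrative, not the final API):
> {code}
> // Per-queue maximum with fallback to the scheduler-wide cap.
> public Resource getMaximumResourceCapability(String queueName) {
>   FSQueue queue = queueMgr.getQueue(queueName);
>   if (queue == null || queue.getMaxContainerAllocation() == null) {
>     // fall back to yarn.scheduler.maximum-allocation-mb et al.
>     return getMaximumResourceCapability();
>   }
>   return queue.getMaxContainerAllocation(); // hypothetical accessor
> }
> {code}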






[jira] [Updated] (YARN-8468) Limit container sizes per queue in FairScheduler

2018-07-31 Thread Haibo Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-8468:
-
Labels:   (was: patch)

> Limit container sizes per queue in FairScheduler
> 
>
> Key: YARN-8468
> URL: https://issues.apache.org/jira/browse/YARN-8468
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 3.1.0
>Reporter: Antal Bálint Steinbach
>Assignee: Antal Bálint Steinbach
>Priority: Critical
> Attachments: YARN-8468.000.patch, YARN-8468.001.patch, 
> YARN-8468.002.patch, YARN-8468.003.patch
>
>
> When using any scheduler, you can use "yarn.scheduler.maximum-allocation-mb" 
> to limit the overall size of a container. This applies globally to all 
> containers, cannot be limited per queue, and is not scheduler dependent.
>  
> The goal of this ticket is to allow this value to be set on a per queue basis.
>  
> The use case: User has two pools, one for ad hoc jobs and one for enterprise 
> apps. User wants to limit ad hoc jobs to small containers but allow 
> enterprise apps to request as many resources as needed. Setting 
> yarn.scheduler.maximum-allocation-mb would provide the default maximum 
> container size for all queues, with the per-queue maximum set via a 
> “maxContainerResources” queue config value.
>  
> Suggested solution:
>  
> All the infrastructure is already in the code. We need to do the following:
>  * add the setting to the queue properties for all queue types (parent and 
> leaf), this will cover dynamically created queues.
>  * if we set it on the root we would override the scheduler setting, which 
> should not be allowed.
>  * make sure that the queue resource cap cannot be larger than the scheduler 
> max resource cap in the config.
>  * implement getMaximumResourceCapability(String queueName) in the 
> FairScheduler
>  * implement getMaximumResourceCapability() in both FSParentQueue and 
> FSLeafQueue
>  * expose the setting in the queue information in the RM web UI.
>  * expose the setting in the metrics etc for the queue.
>  * write JUnit tests.
>  * update the scheduler documentation.






[jira] [Commented] (YARN-8553) Reduce complexity of AHSWebService getApps method

2018-07-31 Thread Szilard Nemeth (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564301#comment-16564301
 ] 

Szilard Nemeth commented on YARN-8553:
--

Re-uploading the patch; somehow Yetus did not pick up the initial patch.

> Reduce complexity of AHSWebService getApps method
> -
>
> Key: YARN-8553
> URL: https://issues.apache.org/jira/browse/YARN-8553
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Rohith Sharma K S
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-8553.001.patch, YARN-8553.001.patch
>
>
> YARN-8501 refactored RMWebService#getApp. Similar refactoring is required in 
> AHSWebService. 






[jira] [Updated] (YARN-7833) [PERF/TEST] Extend SLS to support simulation of a Federated Environment

2018-07-31 Thread Tanuj Nayak (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tanuj Nayak updated YARN-7833:
--
Attachment: YARN-7833.v5.patch

> [PERF/TEST] Extend SLS to support simulation of a Federated Environment
> ---
>
> Key: YARN-7833
> URL: https://issues.apache.org/jira/browse/YARN-7833
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Carlo Curino
>Assignee: Tanuj Nayak
>Priority: Major
> Attachments: YARN-7833.v1.patch, YARN-7833.v2.patch, 
> YARN-7833.v3.patch, YARN-7833.v4.patch, YARN-7833.v5.patch
>
>
> To develop algorithms for federation, it would be of great help to have a 
> version of SLS that supports multiple RMs and the GPG.






[jira] [Commented] (YARN-8392) Allow multiple tags for anti-affinity placement policy in service specification

2018-07-31 Thread Gour Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564275#comment-16564275
 ] 

Gour Saha commented on YARN-8392:
-

Thanks [~billie.rinaldi]. Patch 4 looks good. The documentation in the swagger 
definition (YARN-Simplified-V1-API-Layer-For-Services.yaml), the examples 
(YARN-Services-Examples.md) and the site documentation are quite generic, since 
they talk about the broader placement policy support. However, could you review 
them once and see if we should add some specific examples for this symmetric 
use case?

> Allow multiple tags for anti-affinity placement policy in service 
> specification
> ---
>
> Key: YARN-8392
> URL: https://issues.apache.org/jira/browse/YARN-8392
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Billie Rinaldi
>Assignee: Billie Rinaldi
>Priority: Critical
> Attachments: YARN-8392.1.patch, YARN-8392.2.patch, YARN-8392.3.patch, 
> YARN-8392.4.patch
>
>
> Currently the service client code is restricting a component's target tags to 
> include only a single tag, the component name. I have a use case for two 
> components having anti-affinity with themselves and with each other. The YARN 
> placement policies support this, but the service framework isn't allowing it.






[jira] [Commented] (YARN-8418) App local logs could leaked if log aggregation fails to initialize for the app

2018-07-31 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564273#comment-16564273
 ] 

Hudson commented on YARN-8418:
--

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #14678 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14678/])
YARN-8418. App local logs could leaked if log aggregation fails to (wangda: rev 
4b540bbfcf02d828052999215c6135603d98f5db)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/loghandler/NonAggregatingLogHandler.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/loghandler/LogHandler.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/DummyContainerManager.java
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/loghandler/event/LogHandlerTokenUpdatedEvent.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregator.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/loghandler/event/LogHandlerEventType.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManager.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/logaggregation/filecontroller/LogAggregationFileController.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/LogAggregationService.java


> App local logs could leaked if log aggregation fails to initialize for the app
> --
>
> Key: YARN-8418
> URL: https://issues.apache.org/jira/browse/YARN-8418
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0, 3.0.0-alpha1
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8418.001.patch, YARN-8418.002.patch, 
> YARN-8418.003.patch, YARN-8418.004.patch, YARN-8418.005.patch, 
> YARN-8418.006.patch, YARN-8418.007.patch, YARN-8418.008.patch, 
> YARN-8418.009.patch
>
>
> If log aggregation fails to init the createApp directory, container logs can 
> get leaked in the NM directory.
> For a long-running application, this case is possible on NM restart after 
> token renewal, or on application submission with an invalid delegation token.






[jira] [Commented] (YARN-8263) DockerClient still touches hadoop.tmp.dir

2018-07-31 Thread Craig Condit (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564271#comment-16564271
 ] 

Craig Condit commented on YARN-8263:


I can dig into that further. Will submit a new patch.

> DockerClient still touches hadoop.tmp.dir
> -
>
> Key: YARN-8263
> URL: https://issues.apache.org/jira/browse/YARN-8263
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.1.1
>Reporter: Jason Lowe
>Assignee: Craig Condit
>Priority: Minor
>  Labels: Docker
> Attachments: YARN-8263.001.patch
>
>
> The DockerClient constructor fails if hadoop.tmp.dir is not set and proceeds 
> to create a directory there.  After YARN-8064 there's no longer a need to 
> touch the temporary directory.






[jira] [Commented] (YARN-8301) Yarn Service Upgrade: Add documentation

2018-07-31 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564255#comment-16564255
 ] 

Wangda Tan commented on YARN-8301:
--

Committed to branch-3.1.1, thanks [~csingh]/[~eyang].

> Yarn Service Upgrade: Add documentation
> ---
>
> Key: YARN-8301
> URL: https://issues.apache.org/jira/browse/YARN-8301
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Critical
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8301.001.patch, YARN-8301.002.patch, 
> YARN-8301.003.patch, YARN-8301.004.patch, YARN-8301.005.patch, 
> YARN-8301.006.patch, YARN-8301.007.patch
>
>
> Add documentation for yarn service upgrade.






[jira] [Updated] (YARN-8508) On NodeManager container gets cleaned up before its pid file is created

2018-07-31 Thread Wangda Tan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8508:
-
Fix Version/s: (was: 3.1.2)
   3.1.1

> On NodeManager container gets cleaned up before its pid file is created
> ---
>
> Key: YARN-8508
> URL: https://issues.apache.org/jira/browse/YARN-8508
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Sumana Sathish
>Assignee: Chandni Singh
>Priority: Critical
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8505.001.patch, YARN-8505.002.patch
>
>
> The GPU fails to be released even though the container using it is killed
> {Code}
> 2018-07-06 05:22:26,201 INFO  container.ContainerImpl 
> (ContainerImpl.java:handle(2093)) - Container 
> container_e20_1530854311763_0006_01_01 transitioned from RUNNING to 
> KILLING
> 2018-07-06 05:22:26,250 INFO  container.ContainerImpl 
> (ContainerImpl.java:handle(2093)) - Container 
> container_e20_1530854311763_0006_01_02 transitioned from RUNNING to 
> KILLING
> 2018-07-06 05:22:26,251 INFO  application.ApplicationImpl 
> (ApplicationImpl.java:handle(632)) - Application 
> application_1530854311763_0006 transitioned from RUNNING to 
> FINISHING_CONTAINERS_WAIT
> 2018-07-06 05:22:26,251 INFO  launcher.ContainerLaunch 
> (ContainerLaunch.java:cleanupContainer(734)) - Cleaning up container 
> container_e20_1530854311763_0006_01_02
> 2018-07-06 05:22:31,358 INFO  launcher.ContainerLaunch 
> (ContainerLaunch.java:getContainerPid(1102)) - Could not get pid for 
> container_e20_1530854311763_0006_01_02. Waited for 5000 ms.
> 2018-07-06 05:22:31,358 WARN  launcher.ContainerLaunch 
> (ContainerLaunch.java:cleanupContainer(784)) - Container clean up before pid 
> file created container_e20_1530854311763_0006_01_02
> 2018-07-06 05:22:31,359 INFO  launcher.ContainerLaunch 
> (ContainerLaunch.java:reapDockerContainerNoPid(940)) - Unable to obtain pid, 
> but docker container request detected. Attempting to reap container 
> container_e20_1530854311763_0006_01_02
> 2018-07-06 05:22:31,494 INFO  nodemanager.LinuxContainerExecutor 
> (LinuxContainerExecutor.java:deleteAsUser(828)) - Deleting absolute path : 
> /grid/0/hadoop/yarn/local/usercache/hrt_qa/appcache/application_1530854311763_0006/container_e20_1530854311763_0006_01_02/launch_container.sh
> 2018-07-06 05:22:31,500 INFO  nodemanager.LinuxContainerExecutor 
> (LinuxContainerExecutor.java:deleteAsUser(828)) - Deleting absolute path : 
> /grid/0/hadoop/yarn/local/usercache/hrt_qa/appcache/application_1530854311763_0006/container_e20_1530854311763_0006_01_02/container_tokens
> 2018-07-06 05:22:31,510 INFO  container.ContainerImpl 
> (ContainerImpl.java:handle(2093)) - Container 
> container_e20_1530854311763_0006_01_01 transitioned from KILLING to 
> CONTAINER_CLEANEDUP_AFTER_KILL
> 2018-07-06 05:22:31,510 INFO  container.ContainerImpl 
> (ContainerImpl.java:handle(2093)) - Container 
> container_e20_1530854311763_0006_01_02 transitioned from KILLING to 
> CONTAINER_CLEANEDUP_AFTER_KILL
> 2018-07-06 05:22:31,512 INFO  container.ContainerImpl 
> (ContainerImpl.java:handle(2093)) - Container 
> container_e20_1530854311763_0006_01_01 transitioned from 
> CONTAINER_CLEANEDUP_AFTER_KILL to DONE
> 2018-07-06 05:22:31,513 INFO  container.ContainerImpl 
> (ContainerImpl.java:handle(2093)) - Container 
> container_e20_1530854311763_0006_01_02 transitioned from 
> CONTAINER_CLEANEDUP_AFTER_KILL to DONE
> 2018-07-06 05:22:38,955 INFO  container.ContainerImpl 
> (ContainerImpl.java:handle(2093)) - Container 
> container_e20_1530854311763_0007_01_02 transitioned from NEW to SCHEDULED
> {Code}
> A new container requesting GPUs then fails to launch
> {code}
> 2018-07-06 05:22:39,048 ERROR nodemanager.LinuxContainerExecutor 
> (LinuxContainerExecutor.java:handleLaunchForLaunchType(550)) - 
> ResourceHandlerChain.preStart() failed!
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerException:
>  Failed to find enough GPUs, 
> requestor=container_e20_1530854311763_0007_01_02, #RequestedGPUs=2, 
> #availableGpus=1
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.gpu.GpuResourceAllocator.internalAssignGpus(GpuResourceAllocator.java:225)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.gpu.GpuResourceAllocator.assignGpus(GpuResourceAllocator.java:173)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.gpu.GpuResourceHandlerImpl.preStart(GpuResourceHandlerImpl.java:98)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerChain.preStart(ResourceHandlerChain.java:75)
>   

[jira] [Commented] (YARN-8508) On NodeManager container gets cleaned up before its pid file is created

2018-07-31 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564256#comment-16564256
 ] 

Wangda Tan commented on YARN-8508:
--

Committed to branch-3.1.1, thanks [~csingh]!

> On NodeManager container gets cleaned up before its pid file is created
> ---
>
> Key: YARN-8508
> URL: https://issues.apache.org/jira/browse/YARN-8508
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Sumana Sathish
>Assignee: Chandni Singh
>Priority: Critical
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8505.001.patch, YARN-8505.002.patch
>
>
> The GPU fails to be released even though the container using it is killed
> {Code}
> 2018-07-06 05:22:26,201 INFO  container.ContainerImpl 
> (ContainerImpl.java:handle(2093)) - Container 
> container_e20_1530854311763_0006_01_01 transitioned from RUNNING to 
> KILLING
> 2018-07-06 05:22:26,250 INFO  container.ContainerImpl 
> (ContainerImpl.java:handle(2093)) - Container 
> container_e20_1530854311763_0006_01_02 transitioned from RUNNING to 
> KILLING
> 2018-07-06 05:22:26,251 INFO  application.ApplicationImpl 
> (ApplicationImpl.java:handle(632)) - Application 
> application_1530854311763_0006 transitioned from RUNNING to 
> FINISHING_CONTAINERS_WAIT
> 2018-07-06 05:22:26,251 INFO  launcher.ContainerLaunch 
> (ContainerLaunch.java:cleanupContainer(734)) - Cleaning up container 
> container_e20_1530854311763_0006_01_02
> 2018-07-06 05:22:31,358 INFO  launcher.ContainerLaunch 
> (ContainerLaunch.java:getContainerPid(1102)) - Could not get pid for 
> container_e20_1530854311763_0006_01_02. Waited for 5000 ms.
> 2018-07-06 05:22:31,358 WARN  launcher.ContainerLaunch 
> (ContainerLaunch.java:cleanupContainer(784)) - Container clean up before pid 
> file created container_e20_1530854311763_0006_01_02
> 2018-07-06 05:22:31,359 INFO  launcher.ContainerLaunch 
> (ContainerLaunch.java:reapDockerContainerNoPid(940)) - Unable to obtain pid, 
> but docker container request detected. Attempting to reap container 
> container_e20_1530854311763_0006_01_02
> 2018-07-06 05:22:31,494 INFO  nodemanager.LinuxContainerExecutor 
> (LinuxContainerExecutor.java:deleteAsUser(828)) - Deleting absolute path : 
> /grid/0/hadoop/yarn/local/usercache/hrt_qa/appcache/application_1530854311763_0006/container_e20_1530854311763_0006_01_02/launch_container.sh
> 2018-07-06 05:22:31,500 INFO  nodemanager.LinuxContainerExecutor 
> (LinuxContainerExecutor.java:deleteAsUser(828)) - Deleting absolute path : 
> /grid/0/hadoop/yarn/local/usercache/hrt_qa/appcache/application_1530854311763_0006/container_e20_1530854311763_0006_01_02/container_tokens
> 2018-07-06 05:22:31,510 INFO  container.ContainerImpl 
> (ContainerImpl.java:handle(2093)) - Container 
> container_e20_1530854311763_0006_01_01 transitioned from KILLING to 
> CONTAINER_CLEANEDUP_AFTER_KILL
> 2018-07-06 05:22:31,510 INFO  container.ContainerImpl 
> (ContainerImpl.java:handle(2093)) - Container 
> container_e20_1530854311763_0006_01_02 transitioned from KILLING to 
> CONTAINER_CLEANEDUP_AFTER_KILL
> 2018-07-06 05:22:31,512 INFO  container.ContainerImpl 
> (ContainerImpl.java:handle(2093)) - Container 
> container_e20_1530854311763_0006_01_01 transitioned from 
> CONTAINER_CLEANEDUP_AFTER_KILL to DONE
> 2018-07-06 05:22:31,513 INFO  container.ContainerImpl 
> (ContainerImpl.java:handle(2093)) - Container 
> container_e20_1530854311763_0006_01_02 transitioned from 
> CONTAINER_CLEANEDUP_AFTER_KILL to DONE
> 2018-07-06 05:22:38,955 INFO  container.ContainerImpl 
> (ContainerImpl.java:handle(2093)) - Container 
> container_e20_1530854311763_0007_01_02 transitioned from NEW to SCHEDULED
> {Code}
> A new container requesting GPUs then fails to launch
> {code}
> 2018-07-06 05:22:39,048 ERROR nodemanager.LinuxContainerExecutor 
> (LinuxContainerExecutor.java:handleLaunchForLaunchType(550)) - 
> ResourceHandlerChain.preStart() failed!
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerException:
>  Failed to find enough GPUs, 
> requestor=container_e20_1530854311763_0007_01_02, #RequestedGPUs=2, 
> #availableGpus=1
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.gpu.GpuResourceAllocator.internalAssignGpus(GpuResourceAllocator.java:225)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.gpu.GpuResourceAllocator.assignGpus(GpuResourceAllocator.java:173)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.gpu.GpuResourceHandlerImpl.preStart(GpuResourceHandlerImpl.java:98)
>   at 
> 

[jira] [Commented] (YARN-8546) Resource leak caused by a reserved container being released more than once under async scheduling

2018-07-31 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564254#comment-16564254
 ] 

Wangda Tan commented on YARN-8546:
--

Committed to branch-3.1.1, thanks [~Tao Yang]/[~cheersyang]

> Resource leak caused by a reserved container being released more than once 
> under async scheduling
> -
>
> Key: YARN-8546
> URL: https://issues.apache.org/jira/browse/YARN-8546
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.1.0
>Reporter: Weiwei Yang
>Assignee: Tao Yang
>Priority: Major
>  Labels: global-scheduling
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8546.001.patch
>
>
> I was able to reproduce this issue by starting a job that keeps requesting 
> containers until it uses up the cluster's available resources. My cluster 
> has 70200 vcores and each task requests 100 vcores, so I expected a total of 
> 702 containers to be allocated, but eventually there were only 701. The 
> last container could not be allocated because the queue's used resource had 
> been updated to more than 100%.






[jira] [Updated] (YARN-8301) Yarn Service Upgrade: Add documentation

2018-07-31 Thread Wangda Tan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8301:
-
Fix Version/s: (was: 3.1.2)
   3.1.1

> Yarn Service Upgrade: Add documentation
> ---
>
> Key: YARN-8301
> URL: https://issues.apache.org/jira/browse/YARN-8301
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Critical
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8301.001.patch, YARN-8301.002.patch, 
> YARN-8301.003.patch, YARN-8301.004.patch, YARN-8301.005.patch, 
> YARN-8301.006.patch, YARN-8301.007.patch
>
>
> Add documentation for yarn service upgrade.






[jira] [Commented] (YARN-8528) Final states in ContainerAllocation might be modified externally causing unexpected allocation results

2018-07-31 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564253#comment-16564253
 ] 

Wangda Tan commented on YARN-8528:
--

Committed to branch-3.1.1, thanks [~cheersyang]

> Final states in ContainerAllocation might be modified externally causing 
> unexpected allocation results
> --
>
> Key: YARN-8528
> URL: https://issues.apache.org/jira/browse/YARN-8528
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Xintong Song
>Assignee: Xintong Song
>Priority: Major
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8528.001.patch
>
>
> ContainerAllocation.LOCALITY_SKIPPED is final static, and its .state should 
> always be AllocationState.LOCALITY_SKIPPED. However, this variable is public 
> and is accidentally changed to AllocationState.APP_SKIPPED in 
> RegularContainerAllocator under certain conditions. Once that happens, all 
> following LOCALITY_SKIPPED situations will be treated as APP_SKIPPED.
> Similar risks exist for 
> ContainerAllocation.PRIORITY_SKIPPED/APP_SKIPPED/QUEUE_SKIPPED. 
> ContainerAllocation.state should be private and should not be changed. If 
> changes are needed, a new ContainerAllocation should be created.
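> To illustrate the hazard and the safer pattern (the constructor arguments 
> below are placeholders, not the exact signature):
> {code}
> // Bug pattern: mutating the shared constant rewrites it for everyone.
> ContainerAllocation result = ContainerAllocation.LOCALITY_SKIPPED;
> result.state = AllocationState.APP_SKIPPED; // corrupts LOCALITY_SKIPPED
> 
> // Safer pattern: never touch the constant; build a fresh instance.
> ContainerAllocation fresh =
>     new ContainerAllocation(null, null, AllocationState.APP_SKIPPED);
> {code}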






[jira] [Updated] (YARN-8546) Resource leak caused by a reserved container being released more than once under async scheduling

2018-07-31 Thread Wangda Tan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8546:
-
Fix Version/s: (was: 3.1.2)
   3.1.1

> Resource leak caused by a reserved container being released more than once 
> under async scheduling
> -
>
> Key: YARN-8546
> URL: https://issues.apache.org/jira/browse/YARN-8546
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Affects Versions: 3.1.0
>Reporter: Weiwei Yang
>Assignee: Tao Yang
>Priority: Major
>  Labels: global-scheduling
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8546.001.patch
>
>
> I was able to reproduce this issue by starting a job that keeps requesting 
> containers until it uses up the cluster's available resources. My cluster 
> has 70200 vcores and each task requests 100 vcores, so I expected a total of 
> 702 containers to be allocated, but eventually there were only 701. The 
> last container could not be allocated because the queue's used resource had 
> been updated to more than 100%.






[jira] [Updated] (YARN-8528) Final states in ContainerAllocation might be modified externally causing unexpected allocation results

2018-07-31 Thread Wangda Tan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8528:
-
Fix Version/s: (was: 3.1.2)
   3.1.1

> Final states in ContainerAllocation might be modified externally causing 
> unexpected allocation results
> --
>
> Key: YARN-8528
> URL: https://issues.apache.org/jira/browse/YARN-8528
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Xintong Song
>Assignee: Xintong Song
>Priority: Major
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8528.001.patch
>
>
> ContainerAllocation.LOCALITY_SKIPPED is final static, and its .state should 
> always be AllocationState.LOCALITY_SKIPPED. However, this variable is public 
> and is accidentally changed to AllocationState.APP_SKIPPED in 
> RegularContainerAllocator under certain conditions. Once that happens, all 
> following LOCALITY_SKIPPED situations will be treated as APP_SKIPPED.
> Similar risks exist for 
> ContainerAllocation.PRIORITY_SKIPPED/APP_SKIPPED/QUEUE_SKIPPED. 
> ContainerAllocation.state should be private and should not be changed. If 
> changes are needed, a new ContainerAllocation should be created.






[jira] [Commented] (YARN-8579) New AM attempt could not retrieve previous attempt component data

2018-07-31 Thread Gour Saha (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564250#comment-16564250
 ] 

Gour Saha commented on YARN-8579:
-

Thanks [~csingh]. [~eyang] please review and commit when you get a chance.

> New AM attempt could not retrieve previous attempt component data
> -
>
> Key: YARN-8579
> URL: https://issues.apache.org/jira/browse/YARN-8579
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Assignee: Gour Saha
>Priority: Critical
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-8579.001.patch, YARN-8579.002.patch, 
> YARN-8579.003.patch, YARN-8579.004.patch
>
>
> Steps:
> 1) Launch httpd-docker
> 2) Wait for app to be in STABLE state
> 3) Run validation for app (It takes around 3 mins)
> 4) Stop all Zks 
> 5) Wait 60 sec
> 6) Kill AM
> 7) wait for 30 sec
> 8) Start all ZKs
> 9) Wait for application to finish
> 10) Validate expected containers of the app
> Expected behavior:
> A new AM attempt should start, and docker containers launched by the 1st 
> attempt should be recovered by the new attempt.
> Actual behavior:
> The new AM attempt starts. It cannot recover the 1st attempt's docker 
> containers and cannot read component details from ZK. 
> Thus, it starts new containers for all components.
> {code}
> 2018-07-19 22:42:47,595 [main] INFO  service.ServiceScheduler - Registering 
> appattempt_1531977563978_0015_02, fault-test-zkrm-httpd-docker into 
> registry
> 2018-07-19 22:42:47,611 [main] INFO  service.ServiceScheduler - Received 1 
> containers from previous attempt.
> 2018-07-19 22:42:47,642 [main] INFO  service.ServiceScheduler - Could not 
> read component paths: 
> `/users/hrt-qa/services/yarn-service/fault-test-zkrm-httpd-docker/components':
>  No such file or directory: KeeperErrorCode = NoNode for 
> /registry/users/hrt-qa/services/yarn-service/fault-test-zkrm-httpd-docker/components
> 2018-07-19 22:42:47,643 [main] INFO  service.ServiceScheduler - Handling 
> container_e08_1531977563978_0015_01_03 from previous attempt
> 2018-07-19 22:42:47,643 [main] INFO  service.ServiceScheduler - Record not 
> found in registry for container container_e08_1531977563978_0015_01_03 
> from previous attempt, releasing
> 2018-07-19 22:42:47,649 [AMRM Callback Handler Thread] INFO  
> impl.TimelineV2ClientImpl - Updated timeline service address to xxx:33019
> 2018-07-19 22:42:47,651 [main] INFO  service.ServiceScheduler - Triggering 
> initial evaluation of component httpd
> 2018-07-19 22:42:47,652 [main] INFO  component.Component - [INIT COMPONENT 
> httpd]: 2 instances.
> 2018-07-19 22:42:47,652 [main] INFO  component.Component - [COMPONENT httpd] 
> Requesting for 2 container(s){code}






[jira] [Commented] (YARN-8559) Expose mutable-conf scheduler's configuration in RM /scheduler-conf endpoint

2018-07-31 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564213#comment-16564213
 ] 

Wangda Tan commented on YARN-8559:
--

Thanks [~cheersyang], 

Some suggestions:

1) Instead of creating a new DAO, can we use the existing logic similar to 
org.apache.hadoop.conf.ConfServlet? 

2) Also, it might be better to restrict access to this REST endpoint to admins 
only, since it includes some sensitive information such as ACLs. For scheduler 
information, users can access /scheduler. 
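
For suggestion 1), a rough sketch under the assumption that the mutable 
scheduler configuration is available as a plain Configuration object (the 
method below is illustrative; Configuration.dumpConfiguration is the same 
call ConfServlet uses):
{code}
import java.io.IOException;
import java.io.StringWriter;
import org.apache.hadoop.conf.Configuration;

// Dump the scheduler conf as JSON, ConfServlet-style, instead of a new DAO.
static String dumpSchedulerConf(Configuration schedConf) throws IOException {
  StringWriter out = new StringWriter();
  Configuration.dumpConfiguration(schedConf, out);
  return out.toString();
}
{code}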

> Expose mutable-conf scheduler's configuration in RM /scheduler-conf endpoint
> 
>
> Key: YARN-8559
> URL: https://issues.apache.org/jira/browse/YARN-8559
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Anna Savarin
>Assignee: Weiwei Yang
>Priority: Major
> Attachments: YARN-8559.001.patch, YARN-8559.002.patch, 
> YARN-8559.003.patch
>
>
> All Hadoop services provide a set of common endpoints (/stacks, /logLevel, 
> /metrics, /jmx, /conf).  In the case of the Resource Manager, part of the 
> configuration comes from the scheduler being used.  Currently, these 
> configuration key/values are not exposed through the /conf endpoint, thereby 
> revealing an incomplete configuration picture. 
> Make an improvement and expose the scheduling configuration info through the 
> RM's /conf endpoint.






[jira] [Commented] (YARN-8605) TestDominantResourceFairnessPolicy.testModWhileSorting is flaky

2018-07-31 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564205#comment-16564205
 ] 

Hudson commented on YARN-8605:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14677 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14677/])
YARN-8605. TestDominantResourceFairnessPolicy.testModWhileSorting is 
(haibochen: rev 8aa93a575e896c609b97ddab58853b1eb95f0dee)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/TestDominantResourceFairnessPolicy.java


> TestDominantResourceFairnessPolicy.testModWhileSorting is flaky
> ---
>
> Key: YARN-8605
> URL: https://issues.apache.org/jira/browse/YARN-8605
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 3.2.0
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Minor
> Fix For: 3.2.0
>
> Attachments: YARN-8605.01.patch
>
>
> TestDominantResourceFairnessPolicy.testModWhileSorting: the test for the old 
> comparison method is flaky.
> The test relies on the sorting having started by the time the modification 
> starts, and that seems too tricky to time.
> Introduced with YARN-8436
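> One common way to make such a race deterministic, sketched here as an 
> assumption about a possible fix rather than the actual patch, is to signal 
> from inside the comparator:
> {code}
> // The modifying thread waits until sorting has demonstrably begun.
> final CountDownLatch sortStarted = new CountDownLatch(1);
> Comparator<Integer> cmp = (a, b) -> {
>   sortStarted.countDown();      // first comparison => sort is running
>   return Integer.compare(a, b); // stand-in for the comparator under test
> };
> // modifying thread: sortStarted.await(); then mutate the list being sorted
> {code}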






[jira] [Commented] (YARN-8392) Allow multiple tags for anti-affinity placement policy in service specification

2018-07-31 Thread Billie Rinaldi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564200#comment-16564200
 ] 

Billie Rinaldi commented on YARN-8392:
--

Thanks for the review, [~gsaha]! Patch 4 improves the error message.

> Allow multiple tags for anti-affinity placement policy in service 
> specification
> ---
>
> Key: YARN-8392
> URL: https://issues.apache.org/jira/browse/YARN-8392
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Billie Rinaldi
>Assignee: Billie Rinaldi
>Priority: Critical
> Attachments: YARN-8392.1.patch, YARN-8392.2.patch, YARN-8392.3.patch, 
> YARN-8392.4.patch
>
>
> Currently the service client code is restricting a component's target tags to 
> include only a single tag, the component name. I have a use case for two 
> components having anti-affinity with themselves and with each other. The YARN 
> placement policies support this, but the service framework isn't allowing it.






[jira] [Updated] (YARN-8392) Allow multiple tags for anti-affinity placement policy in service specification

2018-07-31 Thread Billie Rinaldi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Billie Rinaldi updated YARN-8392:
-
Attachment: YARN-8392.4.patch

> Allow multiple tags for anti-affinity placement policy in service 
> specification
> ---
>
> Key: YARN-8392
> URL: https://issues.apache.org/jira/browse/YARN-8392
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Billie Rinaldi
>Assignee: Billie Rinaldi
>Priority: Critical
> Attachments: YARN-8392.1.patch, YARN-8392.2.patch, YARN-8392.3.patch, 
> YARN-8392.4.patch
>
>
> Currently the service client code is restricting a component's target tags to 
> include only a single tag, the component name. I have a use case for two 
> components having anti-affinity with themselves and with each other. The YARN 
> placement policies support this, but the service framework isn't allowing it.






[jira] [Commented] (YARN-8605) TestDominantResourceFairnessPolicy.testModWhileSorting: flaky

2018-07-31 Thread Haibo Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564146#comment-16564146
 ] 

Haibo Chen commented on YARN-8605:
--

+1 on the patch. Checking in shortly.

> TestDominantResourceFairnessPolicy.testModWhileSorting: flaky
> -
>
> Key: YARN-8605
> URL: https://issues.apache.org/jira/browse/YARN-8605
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 3.2.0
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Minor
> Attachments: YARN-8605.01.patch
>
>
> TestDominantResourceFairnessPolicy.testModWhileSorting: the test for the old 
> comparison method is flaky.
> The test relies on the sorting having started by the time the modification 
> starts, and that seems too tricky to time.
> Introduced with YARN-8436






[jira] [Updated] (YARN-8605) TestDominantResourceFairnessPolicy.testModWhileSorting is flaky

2018-07-31 Thread Haibo Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-8605:
-
Summary: TestDominantResourceFairnessPolicy.testModWhileSorting is flaky  
(was: TestDominantResourceFairnessPolicy.testModWhileSorting: flaky)

> TestDominantResourceFairnessPolicy.testModWhileSorting is flaky
> ---
>
> Key: YARN-8605
> URL: https://issues.apache.org/jira/browse/YARN-8605
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 3.2.0
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Minor
> Attachments: YARN-8605.01.patch
>
>
> TestDominantResourceFairnessPolicy.testModWhileSorting: the test for the old 
> comparison method is flaky.
> The test relies on the sorting having started by the time the modification 
> starts, and that seems too tricky to time.
> Introduced with YARN-8436






[jira] [Commented] (YARN-8263) DockerClient still touches hadoop.tmp.dir

2018-07-31 Thread Jason Lowe (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564136#comment-16564136
 ] 

Jason Lowe commented on YARN-8263:
--

Thanks for the patch!

Patch looks good, but I think there's even more we can clean up here, which can 
be done in a separate JIRA if desired.  
DockerCommand#preparePrivilegedOperation has a conf argument for the sole 
purpose of constructing the DockerClient, which is no longer necessary as part 
of this patch.  Even if it eventually needs one, it is available in the 
NMContext parameter.  That now-unused conf argument should be removed.  
Transitively there are callers of preparePrivilegedOperation that are 
forwarding conf arguments that would no longer be needed, etc.
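
In other words, something along these lines (parameter lists abbreviated; the 
actual signatures in DockerCommand may differ):
{code}
// before: conf existed only to construct the DockerClient
PrivilegedOperation preparePrivilegedOperation(Configuration conf,
    DockerCommand dockerCommand, String containerIdStr, Context nmContext);

// after: DockerClient no longer needs it, so the parameter can go
PrivilegedOperation preparePrivilegedOperation(
    DockerCommand dockerCommand, String containerIdStr, Context nmContext);
{code}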


> DockerClient still touches hadoop.tmp.dir
> -
>
> Key: YARN-8263
> URL: https://issues.apache.org/jira/browse/YARN-8263
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.1.1
>Reporter: Jason Lowe
>Assignee: Craig Condit
>Priority: Minor
>  Labels: Docker
> Attachments: YARN-8263.001.patch
>
>
> The DockerClient constructor fails if hadoop.tmp.dir is not set and proceeds 
> to create a directory there.  After YARN-8064 there's no longer a need to 
> touch the temporary directory.






[jira] [Commented] (YARN-6966) NodeManager metrics may return wrong negative values when NM restart

2018-07-31 Thread Haibo Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564137#comment-16564137
 ] 

Haibo Chen commented on YARN-6966:
--

I created HADOOP-15644 to fix the branch-2 docker build issue.

> NodeManager metrics may return wrong negative values when NM restart
> 
>
> Key: YARN-6966
> URL: https://issues.apache.org/jira/browse/YARN-6966
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yang Wang
>Assignee: Szilard Nemeth
>Priority: Major
> Fix For: 3.2.0, 3.0.4, 3.1.2
>
> Attachments: YARN-6966-branch-2.001.patch, 
> YARN-6966-branch-2.002.patch, YARN-6966-branch-2.002.patch, 
> YARN-6966-branch-2.002.patch, YARN-6966-branch-3.0.0.001.patch, 
> YARN-6966-branch-3.0.001.patch, YARN-6966.001.patch, YARN-6966.002.patch, 
> YARN-6966.003.patch, YARN-6966.004.patch, YARN-6966.005.patch, 
> YARN-6966.005.patch, YARN-6966.006.patch
>
>
> Just as YARN-6212. However, I think it is not a duplicate of YARN-3933.
> The primary cause of the negative values is that metrics do not recover 
> properly when the NM restarts.
> AllocatedContainers,ContainersLaunched,AllocatedGB,AvailableGB,AllocatedVCores,AvailableVCores
>  in metrics also need to be recovered when the NM restarts.
> This should be done in ContainerManagerImpl#recoverContainer (a sketch 
> follows the reproduction steps below).
> The scenario can be reproduced with the following steps:
> # Make sure 
> YarnConfiguration.NM_RECOVERY_ENABLED=true,YarnConfiguration.NM_RECOVERY_SUPERVISED=true
>  in NM
> # Submit an application and keep running
> # Restart NM
> # Stop the application
> # Now you get the negative values
> {code}
> /jmx?qry=Hadoop:service=NodeManager,name=NodeManagerMetrics
> {code}
> {code}
> {
> name: "Hadoop:service=NodeManager,name=NodeManagerMetrics",
> modelerType: "NodeManagerMetrics",
> tag.Context: "yarn",
> tag.Hostname: "hadoop.com",
> ContainersLaunched: 0,
> ContainersCompleted: 0,
> ContainersFailed: 2,
> ContainersKilled: 0,
> ContainersIniting: 0,
> ContainersRunning: 0,
> AllocatedGB: 0,
> AllocatedContainers: -2,
> AvailableGB: 160,
> AllocatedVCores: -11,
> AvailableVCores: 3611,
> ContainerLaunchDurationNumOps: 2,
> ContainerLaunchDurationAvgTime: 6,
> BadLocalDirs: 0,
> BadLogDirs: 0,
> GoodLocalDirsDiskUtilizationPerc: 2,
> GoodLogDirsDiskUtilizationPerc: 2
> }
> {code}
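> A sketch of the recovery step referenced above, assuming the existing 
> NodeManagerMetrics counters (exact call sites may differ from the patch):
> {code}
> // In ContainerManagerImpl#recoverContainer: re-apply each recovered
> // container to the metrics so counters survive the restart.
> Resource res = container.getResource();
> metrics.allocateContainer(res); // restores AllocatedContainers/GB/VCores
> metrics.launchedContainer();    // restores ContainersLaunched
> {code}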






[jira] [Commented] (YARN-8606) Opportunistic scheduling doesn't work after failover

2018-07-31 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564089#comment-16564089
 ] 

Wangda Tan commented on YARN-8606:
--

[~bibinchundatt], is this a recent regression? If yes, which JIRA broke the 
use case?

> Opportunistic scheduling doesn't work after failover
> ---
>
> Key: YARN-8606
> URL: https://issues.apache.org/jira/browse/YARN-8606
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Blocker
> Attachments: YARN-8606.001.patch
>
>
> The EventDispatcher for opportunistic scheduling is added to the RM composite 
> service and not the RMActiveServices composite service, causing the dispatcher 
> to be started only once and not again on RM failover.
> Issue credits: Rakesh
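> A sketch of the fix direction (the dispatcher field name is illustrative):
> {code}
> // Register the dispatcher in RMActiveServices#serviceInit instead of
> // ResourceManager#serviceInit, so it is created and started again on
> // every transition to active:
> addService(oppContainerAllocatorDispatcher);
> {code}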






[jira] [Updated] (YARN-4946) RM should not consider an application as COMPLETED when log aggregation is not in a terminal state

2018-07-31 Thread Szilard Nemeth (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-4946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated YARN-4946:
-
Summary: RM should not consider an application as COMPLETED when log 
aggregation is not in a terminal state  (was: RM should not consider an 
application as COMPLETED when log aggregation is not in terminal state)

> RM should not consider an application as COMPLETED when log aggregation is 
> not in a terminal state
> --
>
> Key: YARN-4946
> URL: https://issues.apache.org/jira/browse/YARN-4946
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: log-aggregation
>Affects Versions: 2.8.0
>Reporter: Robert Kanter
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-4946.001.patch
>
>
> MAPREDUCE-6415 added a tool that combines the aggregated log files for each 
> Yarn App into a HAR file.  When run, it seeds the list by looking at the 
> aggregated logs directory, and then filters out ineligible apps.  One of the 
> criteria involves checking with the RM that an Application's log aggregation 
> status is not still running and has not failed.  When the RM "forgets" about 
> an older completed Application (e.g. RM failover, enough time has passed, 
> etc), the tool won't find the Application in the RM and will just assume that 
> its log aggregation succeeded, even if it actually failed or is still running.
> We can solve this problem by doing the following:
> The RM should not consider an app to be fully completed (and thus removed 
> from its history) until the aggregation status has reached a terminal state 
> (e.g. SUCCEEDED, FAILED, TIME_OUT).
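> A minimal sketch of the proposed gate (the accessor and collection names are 
> assumptions for illustration, not the actual patch):
> {code}
> // Only let the RM expire a finished app once aggregation is terminal.
> LogAggregationStatus s = app.getLogAggregationStatusForAppReport();
> boolean terminal = s == LogAggregationStatus.SUCCEEDED
>     || s == LogAggregationStatus.FAILED
>     || s == LogAggregationStatus.TIME_OUT;
> if (terminal) {
>   completedApps.add(app.getApplicationId()); // now safe to forget the app
> }
> {code}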






[jira] [Updated] (YARN-4946) RM should not consider an application as COMPLETED when log aggregation is not in terminal state

2018-07-31 Thread Szilard Nemeth (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-4946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated YARN-4946:
-
Attachment: YARN-4946.001.patch

> RM should not consider an application as COMPLETED when log aggregation is 
> not in terminal state
> 
>
> Key: YARN-4946
> URL: https://issues.apache.org/jira/browse/YARN-4946
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: log-aggregation
>Affects Versions: 2.8.0
>Reporter: Robert Kanter
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-4946.001.patch
>
>
> MAPREDUCE-6415 added a tool that combines the aggregated log files for each 
> Yarn App into a HAR file.  When run, it seeds the list by looking at the 
> aggregated logs directory, and then filters out ineligible apps.  One of the 
> criteria involves checking with the RM that an Application's log aggregation 
> status is not still running and has not failed.  When the RM "forgets" about 
> an older completed Application (e.g. RM failover, enough time has passed, 
> etc), the tool won't find the Application in the RM and will just assume that 
> its log aggregation succeeded, even if it actually failed or is still running.
> We can solve this problem by doing the following:
> The RM should not consider an app to be fully completed (and thus removed 
> from its history) until the aggregation status has reached a terminal state 
> (e.g. SUCCEEDED, FAILED, TIME_OUT).






[jira] [Updated] (YARN-4946) RM should not consider an application as COMPLETED when log aggregation is not in terminal state

2018-07-31 Thread Szilard Nemeth (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-4946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated YARN-4946:
-
Summary: RM should not consider an application as COMPLETED when log 
aggregation is not in terminal state  (was: RM should write out Aggregated Log 
Completion file flag next to logs)

> RM should not consider an application as COMPLETED when log aggregation is 
> not in terminal state
> 
>
> Key: YARN-4946
> URL: https://issues.apache.org/jira/browse/YARN-4946
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: log-aggregation
>Affects Versions: 2.8.0
>Reporter: Robert Kanter
>Assignee: Szilard Nemeth
>Priority: Major
>
> MAPREDUCE-6415 added a tool that combines the aggregated log files for each 
> Yarn App into a HAR file.  When run, it seeds the list by looking at the 
> aggregated logs directory, and then filters out ineligible apps.  One of the 
> criteria involves checking with the RM that an Application's log aggregation 
> status is not still running and has not failed.  When the RM "forgets" about 
> an older completed Application (e.g. RM failover, enough time has passed, 
> etc), the tool won't find the Application in the RM and will just assume that 
> its log aggregation succeeded, even if it actually failed or is still running.
> We can solve this problem by doing the following:
> The RM should not consider an app to be fully completed (and thus removed 
> from its history) until the aggregation status has reached a terminal state 
> (e.g. SUCCEEDED, FAILED, TIME_OUT).






[jira] [Commented] (YARN-7159) Normalize unit of resource objects in RM and avoid to do unit conversion in critical path

2018-07-31 Thread Manikandan R (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564065#comment-16564065
 ] 

Manikandan R commented on YARN-7159:


[~sunilg] The Jenkins results are fine. Is this good enough to take forward?

> Normalize unit of resource objects in RM and avoid to do unit conversion in 
> critical path
> -
>
> Key: YARN-7159
> URL: https://issues.apache.org/jira/browse/YARN-7159
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Wangda Tan
>Assignee: Manikandan R
>Priority: Critical
> Attachments: YARN-7159.001.patch, YARN-7159.002.patch, 
> YARN-7159.003.patch, YARN-7159.004.patch, YARN-7159.005.patch, 
> YARN-7159.006.patch, YARN-7159.007.patch, YARN-7159.008.patch, 
> YARN-7159.009.patch, YARN-7159.010.patch, YARN-7159.011.patch, 
> YARN-7159.012.patch, YARN-7159.013.patch, YARN-7159.015.patch, 
> YARN-7159.016.patch, YARN-7159.017.patch, YARN-7159.018.patch, 
> YARN-7159.019.patch, YARN-7159.020.patch, YARN-7159.021.patch, 
> YARN-7159.022.patch, YARN-7159.023.patch
>
>
> Currently, resource conversion can happen in the critical code path when the 
> client specifies a different unit. This can significantly hurt RM performance 
> and throughput. We should normalize units when a resource is passed to the RM 
> and avoid the expensive conversion on every access.
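
As a rough sketch of the normalization being proposed (normalizeUnits and the
target-unit map are hypothetical; UnitsConversionUtil and ResourceInformation
are existing YARN utility classes):

{code:java}
import java.util.Map;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.ResourceInformation;
import org.apache.hadoop.yarn.util.UnitsConversionUtil;

public final class ResourceNormalizer {
  private ResourceNormalizer() {
  }

  /**
   * Convert every resource in a request to the unit the RM uses internally,
   * once at submission time, so the scheduler's critical path never has to
   * convert again.
   */
  public static void normalizeUnits(Resource request,
      Map<String, String> rmUnits) {
    for (ResourceInformation info : request.getResources()) {
      String target = rmUnits.get(info.getName());
      if (target != null && !target.equals(info.getUnits())) {
        info.setValue(UnitsConversionUtil.convert(
            info.getUnits(), target, info.getValue()));
        info.setUnits(target);
      }
    }
  }
}
{code}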



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8579) New AM attempt could not retrieve previous attempt component data

2018-07-31 Thread Chandni Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564062#comment-16564062
 ] 

Chandni Singh commented on YARN-8579:
-

+1 LGTM

> New AM attempt could not retrieve previous attempt component data
> -
>
> Key: YARN-8579
> URL: https://issues.apache.org/jira/browse/YARN-8579
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Assignee: Gour Saha
>Priority: Critical
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-8579.001.patch, YARN-8579.002.patch, 
> YARN-8579.003.patch, YARN-8579.004.patch
>
>
> Steps:
> 1) Launch httpd-docker
> 2) Wait for the app to reach the STABLE state
> 3) Run validation for the app (it takes around 3 mins)
> 4) Stop all ZKs
> 5) Wait 60 sec
> 6) Kill the AM
> 7) Wait 30 sec
> 8) Start all ZKs
> 9) Wait for the application to finish
> 10) Validate the expected containers of the app
> Expected behavior:
> A new AM attempt should start, and the Docker containers launched by the 1st 
> attempt should be recovered by the new attempt.
> Actual behavior:
> A new AM attempt starts, but it cannot recover the 1st attempt's Docker 
> containers because it cannot read the component details from ZK.
> Thus, it launches fresh containers for all components.
> {code}
> 2018-07-19 22:42:47,595 [main] INFO  service.ServiceScheduler - Registering 
> appattempt_1531977563978_0015_02, fault-test-zkrm-httpd-docker into 
> registry
> 2018-07-19 22:42:47,611 [main] INFO  service.ServiceScheduler - Received 1 
> containers from previous attempt.
> 2018-07-19 22:42:47,642 [main] INFO  service.ServiceScheduler - Could not 
> read component paths: 
> `/users/hrt-qa/services/yarn-service/fault-test-zkrm-httpd-docker/components':
>  No such file or directory: KeeperErrorCode = NoNode for 
> /registry/users/hrt-qa/services/yarn-service/fault-test-zkrm-httpd-docker/components
> 2018-07-19 22:42:47,643 [main] INFO  service.ServiceScheduler - Handling 
> container_e08_1531977563978_0015_01_03 from previous attempt
> 2018-07-19 22:42:47,643 [main] INFO  service.ServiceScheduler - Record not 
> found in registry for container container_e08_1531977563978_0015_01_03 
> from previous attempt, releasing
> 2018-07-19 22:42:47,649 [AMRM Callback Handler Thread] INFO  
> impl.TimelineV2ClientImpl - Updated timeline service address to xxx:33019
> 2018-07-19 22:42:47,651 [main] INFO  service.ServiceScheduler - Triggering 
> initial evaluation of component httpd
> 2018-07-19 22:42:47,652 [main] INFO  component.Component - [INIT COMPONENT 
> httpd]: 2 instances.
> 2018-07-19 22:42:47,652 [main] INFO  component.Component - [COMPONENT httpd] 
> Requesting for 2 container(s){code}
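
The log above boils down to the following recovery decision. This is only an
illustrative sketch: registryLookup and reattach are hypothetical stand-ins for
the service AM's actual helpers, and only the branch structure is the point.

{code:java}
import org.apache.hadoop.registry.client.types.ServiceRecord;
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.client.api.AMRMClient;

// Illustrative sketch of the recovery decision visible in the log above:
// with ZK data missing, every registry lookup returns null, so every
// container from the previous attempt is released and replaced.
abstract class PreviousAttemptRecovery {
  private final AMRMClient<?> amRMClient;

  PreviousAttemptRecovery(AMRMClient<?> amRMClient) {
    this.amRMClient = amRMClient;
  }

  /** Returns the ZK registry record for the container, or null on NoNode. */
  protected abstract ServiceRecord registryLookup(Container container);

  /** Re-adopts a still-running container into the new attempt. */
  protected abstract void reattach(Container container, ServiceRecord record);

  void recoverOrRelease(Iterable<Container> fromPreviousAttempt) {
    for (Container container : fromPreviousAttempt) {
      ServiceRecord record = registryLookup(container);
      if (record == null) {
        // matches "Record not found in registry ... releasing" in the log
        amRMClient.releaseAssignedContainer(container.getId());
      } else {
        reattach(container, record);
      }
    }
  }
}
{code}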



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-4175) Example of use YARN-1197

2018-07-31 Thread Manikandan R (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-4175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564057#comment-16564057
 ] 

Manikandan R edited comment on YARN-4175 at 7/31/18 5:45 PM:
-

Thanks [~asuresh]. Updated the earlier patch to do unit conversions for resource 
types based on the units configured on the server side whenever the client-side 
and server-side units differ (and neither is empty), rather than simply 
converting to "Mi". For the mandatory "memory" resource, "Mi" is assumed as the 
unit if the client doesn't specify one, and conversion happens only when the 
two units differ (and neither is empty). For the mandatory "vcores" resource, 
the unit is assumed to always be empty, so that code is left as is. Also added 
a test case to cover this. Please review the .004 patch.


was (Author: maniraj...@gmail.com):
Thanks [~asuresh]. Updated the earlier patch to do unit conversions for resource 
types whenever the client-side and server-side units differ (and neither is 
empty), rather than simply converting to "Mi". For the mandatory "memory" 
resource, "Mi" is assumed as the unit if the client doesn't specify one, and 
conversion happens only when the two units differ (and neither is empty). For 
the mandatory "vcores" resource, the unit is assumed to always be empty, so 
that code is left as is. Also added a test case to cover this. Please review 
the .004 patch.
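
A short sketch of the defaulting rule described above (resolveUnit is a
hypothetical helper; "memory-mb" and "vcores" are YARN's internal names for the
mandatory resources):

{code:java}
// Hypothetical helper illustrating the defaulting rule from the comment
// above: an empty client-side unit for the mandatory memory resource is
// read as "Mi" before comparing with the server-side unit, while vcores
// stays unit-less, so an empty unit is left untouched.
final class UnitDefaults {
  private UnitDefaults() {
  }

  static String resolveUnit(String resourceName, String clientUnit) {
    if ("memory-mb".equals(resourceName) && clientUnit.isEmpty()) {
      return "Mi";
    }
    return clientUnit;  // "vcores" and custom types keep the client's unit
  }
}
{code}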

> Example of use YARN-1197
> 
>
> Key: YARN-4175
> URL: https://issues.apache.org/jira/browse/YARN-4175
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, nodemanager, resourcemanager
>Reporter: Wangda Tan
>Assignee: MENG DING
>Priority: Major
> Attachments: YARN-4175.003.patch, YARN-4175.004.patch, 
> YARN-4175.1.patch, YARN-4175.2.patch
>
>
> Like YARN-2609, we need an example program to demonstrate how to use YARN-1197 
> end-to-end.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4175) Example of use YARN-1197

2018-07-31 Thread Manikandan R (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-4175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R updated YARN-4175:
---
Attachment: YARN-4175.004.patch

> Example of use YARN-1197
> 
>
> Key: YARN-4175
> URL: https://issues.apache.org/jira/browse/YARN-4175
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, nodemanager, resourcemanager
>Reporter: Wangda Tan
>Assignee: MENG DING
>Priority: Major
> Attachments: YARN-4175.003.patch, YARN-4175.004.patch, 
> YARN-4175.1.patch, YARN-4175.2.patch
>
>
> Like YARN-2609, we need an example program to demonstrate how to use YARN-1197 
> end-to-end.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4175) Example of use YARN-1197

2018-07-31 Thread Manikandan R (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-4175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564057#comment-16564057
 ] 

Manikandan R commented on YARN-4175:


Thanks [~asuresh]. Updated the earlier patch to do unit conversions for resource 
types whenever the client-side and server-side units differ (and neither is 
empty), rather than simply converting to "Mi". For the mandatory "memory" 
resource, "Mi" is assumed as the unit if the client doesn't specify one, and 
conversion happens only when the two units differ (and neither is empty). For 
the mandatory "vcores" resource, the unit is assumed to always be empty, so 
that code is left as is. Also added a test case to cover this. Please review 
the .004 patch.

> Example of use YARN-1197
> 
>
> Key: YARN-4175
> URL: https://issues.apache.org/jira/browse/YARN-4175
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, nodemanager, resourcemanager
>Reporter: Wangda Tan
>Assignee: MENG DING
>Priority: Major
> Attachments: YARN-4175.003.patch, YARN-4175.1.patch, YARN-4175.2.patch
>
>
> Like YARN-2609, we need an example program to demonstrate how to use YARN-1197 
> end-to-end.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8606) Opportunistic scheduling doesnt work after failover

2018-07-31 Thread Suma Shivaprasad (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564054#comment-16564054
 ] 

Suma Shivaprasad commented on YARN-8606:


Thanks for the patch [~bibinchundatt]. createApplicationMasterService already 
registers an EventDispatcher for OpportunisticContainerAllocatorAMService in 
the RMActiveServices composite service when opportunistic scheduling is 
enabled. Can you please explain what the issue is and what this patch intends 
to fix?
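
For context, a minimal sketch of why the registration point matters
(CompositeService is Hadoop's real service base class, but the classes below
are illustrative, not the actual RM code): a child added to RMActiveServices is
stopped and re-created on every failover, while a child added to the outer
ResourceManager composite is started once per process.

{code:java}
import org.apache.hadoop.service.CompositeService;
import org.apache.hadoop.service.Service;

// Illustrative only. A child added here is restarted whenever the RM
// transitions to active, so its event-handling threads come back after
// failover. Adding the same child to the outer ResourceManager composite
// instead would start it exactly once for the process lifetime.
class ActiveServices extends CompositeService {
  ActiveServices() {
    super("RMActiveServices");
  }

  void registerDispatcher(Service dispatcher) {
    addService(dispatcher);  // lifecycle now follows active/standby cycles
  }
}
{code}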



> Opportunistic scheduling doesnt work after failover
> ---
>
> Key: YARN-8606
> URL: https://issues.apache.org/jira/browse/YARN-8606
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Blocker
> Attachments: YARN-8606.001.patch
>
>
> EventDispatcher for opportunistic scheduling is added to the RM composite 
> service and not the RMActiveServices composite service, causing the dispatcher 
> to be started only once and not restarted on RM failover.
> Issue credits: Rakesh



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7948) Enable refreshing maximum allocation for multiple resource types

2018-07-31 Thread Haibo Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564049#comment-16564049
 ] 

Haibo Chen commented on YARN-7948:
--

Thanks [~snemeth] for updating the patch. I have a few follow-up comments.

For the custom ResourceTypeConfigurationProvider, I wonder if it would be more 
readable to either set the properties directly with Configuration.set() (the 
Capacity Scheduler unit test does this correctly) or, even better, to call 
Configuration.addResource(). We could get rid of the special parameter matching 
too.

We don't do anything with the resource manager in 
{{TestFairSchedulerWithMultiResourceTypes}}, so we can remove everything 
related to the resource manager. Given that cpu and memory are default resource 
types, we should cover them in the unit tests too.

There are a few unused imports in TestFairScheduler that need to be cleaned up.
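
A minimal sketch of the Configuration-driven setup suggested above (the
property keys follow the yarn.resource-types convention; the resource name and
values are made up for illustration):

{code:java}
import org.apache.hadoop.conf.Configuration;

// Sketch: declare a custom resource type straight on the Configuration in
// the test, instead of going through a custom configuration provider.
// The resource name and values are illustrative.
public class ResourceTypeTestSetup {
  public static Configuration newConf() {
    Configuration conf = new Configuration();
    conf.set("yarn.resource-types", "custom-resource");
    conf.set("yarn.resource-types.custom-resource.units", "G");
    conf.set("yarn.resource-types.custom-resource.maximum-allocation", "10");
    return conf;
  }
}
{code}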

 

> Enable refreshing maximum allocation for multiple resource types
> 
>
> Key: YARN-7948
> URL: https://issues.apache.org/jira/browse/YARN-7948
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 3.0.0
>Reporter: Yufei Gu
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-7948.001.patch, YARN-7948.002.patch
>
>
> YARN-7738 did the same thing for CS. We need a fix for FS. We could fix it by 
> moving the refresh code from class CS to class AbstractYARNScheduler. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8559) Expose mutable-conf scheduler's configuration in RM /scheduler-conf endpoint

2018-07-31 Thread Suma Shivaprasad (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564015#comment-16564015
 ] 

Suma Shivaprasad commented on YARN-8559:


Thanks [~cheersyang]. I missed this in the earlier review - can you please add 
documentation for this in ResourceManagerRest.md?

> Expose mutable-conf scheduler's configuration in RM /scheduler-conf endpoint
> 
>
> Key: YARN-8559
> URL: https://issues.apache.org/jira/browse/YARN-8559
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Anna Savarin
>Assignee: Weiwei Yang
>Priority: Major
> Attachments: YARN-8559.001.patch, YARN-8559.002.patch, 
> YARN-8559.003.patch
>
>
> All Hadoop services provide a set of common endpoints (/stacks, /logLevel, 
> /metrics, /jmx, /conf).  In the case of the Resource Manager, part of the 
> configuration comes from the scheduler being used.  Currently, these 
> configuration key/values are not exposed through the /conf endpoint, thereby 
> revealing an incomplete configuration picture. 
> Make an improvement and expose the scheduling configuration info through the 
> RM's /conf endpoint.
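
Once exposed, the scheduler configuration could be read along these lines (a
sketch only; rm-host:8088 is a placeholder, and the path assumes the endpoint
named in the summary, the RM's /ws/v1/cluster/scheduler-conf resource, gains a
read side):

{code:java}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

// Illustrative client for the endpoint under discussion; rm-host:8088 is a
// placeholder for a real ResourceManager address.
public class SchedulerConfFetch {
  public static void main(String[] args) throws Exception {
    URL url = new URL("http://rm-host:8088/ws/v1/cluster/scheduler-conf");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestMethod("GET");
    try (BufferedReader in = new BufferedReader(
        new InputStreamReader(conn.getInputStream()))) {
      String line;
      while ((line = in.readLine()) != null) {
        System.out.println(line);  // serialized view of scheduler properties
      }
    }
  }
}
{code}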



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8175) Add support for Node Labels in SLS

2018-07-31 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564000#comment-16564000
 ] 

Hudson commented on YARN-8175:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14676 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14676/])
YARN-8175. Add support for Node Labels in SLS. Contributed by Abhishek 
(inigoiri: rev 9fea5c9ee76bd36f273ae93afef5f3ef3c477a53)
* (edit) 
hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/utils/SLSUtils.java
* (edit) hadoop-tools/hadoop-sls/src/test/resources/nodes-with-resources.json
* (edit) 
hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/SLSRunner.java
* (edit) 
hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/conf/SLSConfiguration.java
* (edit) 
hadoop-tools/hadoop-sls/src/test/java/org/apache/hadoop/yarn/sls/utils/TestSLSUtils.java
* (edit) 
hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/MRAMSimulator.java
* (edit) 
hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/AMSimulator.java
* (edit) 
hadoop-tools/hadoop-sls/src/test/java/org/apache/hadoop/yarn/sls/appmaster/TestAMSimulator.java
* (edit) 
hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/appmaster/StreamAMSimulator.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/util/YarnClientUtils.java
* (edit) 
hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/nodemanager/NMSimulator.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/RMAdminCLI.java


> Add support for Node Labels in SLS
> --
>
> Key: YARN-8175
> URL: https://issues.apache.org/jira/browse/YARN-8175
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.1.0
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: YARN-8175.001.patch, YARN-8175.002.patch, 
> YARN-8175.003.patch, YARN-8175.004.patch, YARN-8175.005.patch, 
> YARN-8175.006.patch, YARN-8175.007.patch, YARN-8175.008.patch, 
> YARN-8175.009.patch
>
>
> Currently, SLS doesn't support node labels. With this JIRA, we plan to add 
> support for node labels in SLS.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-8242) YARN NM: OOM error while reading back the state store on recovery

2018-07-31 Thread Jason Lowe (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe reassigned YARN-8242:


Assignee: Pradeep Ambati

Thanks for picking this up, [~pradeepambati]!

RecoveryIterator should extend Closeable so it can be used in 
try-with-resources blocks, simplifying the syntax to use it correctly.

For local resource recovery, using an iterator only for the 
LocalResourceTrackerState won't save very much.  The real savings will occur 
when we can iterate the list of resources within that tracker state so we don't 
have to hold the entire list of protobufs in memory while building the list of 
resources for that tracker.  In other words, LocalResourceTrackerState should 
contain a RecoveryIterator to recover the list of resources and inProgress 
resources.

RCSIterator#next should throw NoSuchElementException rather than an empty 
IOException when there is no next element.  It's a programming error to blindly 
call next() without knowing there's a next to have, and IOException isn't 
really appropriate here since this is not an issue with I/O.

The RCSIterator constructor should not have the leveldb iterator as an argument 
-- it should construct the iterator directly.  There's only one correct way to 
build that iterator, so it shouldn't be an argument to invite someone to pass 
it incorrectly. ;-)  This also applies to the RASIterator and RDSIterator 
constructors.

A lot of boilerplate between these recovery iterators could be reduced by 
refactoring the common code into a templated, abstract base class.  Really the 
only difference in each one is the "getNextItem" method.  All the other code is 
the same (with the use of generics).

Nit: RCSIterator would be more readable as ContainerStateIterator, e.g.: 
getContainerStateIterator instead of getRCSIterator.  Similar comments for the 
other acronym iterator classes.

Some of the comments on the previous version of the patch still apply:
- Debug log statements were dropped when recovering applications and 
containers.  Intentional?
- Silently removing a container without a start request is probably not the 
correct way to handle that scenario.  If it's fixing a real bug, it should be a 
separate JIRA.
- loadContainerState should initialize the containerId field rather than 
requiring the caller to do so. It's already passed as a parameter. That also 
precludes the need to have a setContainerId method.

Was there a reason the iterator approach was not used for recovering container 
or NM tokens?
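
A sketch of the templated base class suggested above (names and shape are
hypothetical; only the division of labor matters): the shared look-ahead,
NoSuchElementException, and Closeable plumbing live in one place, and each
concrete iterator supplies only its getNextItem decoding.

{code:java}
import java.io.Closeable;
import java.io.IOException;
import java.util.NoSuchElementException;

// Hypothetical sketch of the refactoring described above. Extending
// Closeable lets callers recover inside try-with-resources blocks.
abstract class BaseRecoveryIterator<T> implements Closeable {
  private T nextItem;       // one-item look-ahead buffer
  private boolean finished;

  /** Decode the next record, or return null when the store is exhausted. */
  protected abstract T getNextItem() throws IOException;

  public boolean hasNext() throws IOException {
    if (nextItem == null && !finished) {
      nextItem = getNextItem();
      finished = (nextItem == null);
    }
    return nextItem != null;
  }

  public T next() throws IOException {
    if (!hasNext()) {
      throw new NoSuchElementException("State store iterator exhausted");
    }
    T item = nextItem;
    nextItem = null;
    return item;
  }

  @Override
  public abstract void close() throws IOException;  // release leveldb iterator
}
{code}

A ContainerStateIterator would then be this base class plus one getNextItem 
implementation, and callers would get cleanup for free via 
try (ContainerStateIterator it = store.getContainerStateIterator()) { ... }.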


> YARN NM: OOM error while reading back the state store on recovery
> -
>
> Key: YARN-8242
> URL: https://issues.apache.org/jira/browse/YARN-8242
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 2.6.0, 2.9.0, 2.6.5, 2.8.3, 3.1.0, 2.7.6, 3.0.2
>Reporter: Kanwaljeet Sachdev
>Assignee: Pradeep Ambati
>Priority: Critical
> Attachments: YARN-8242.001.patch, YARN-8242.002.patch, 
> YARN-8242.003.patch, YARN-8242.004.patch
>
>
> On startup the NM reads its state store and builds a list of applications in 
> the state store to process. If the number of applications in the state store 
> is large and they have a lot of "state" attached, the NM can run OOM and 
> never get to the point where it can start processing the recovery.
> Since it never starts the recovery, there is no way for the NM to ever get 
> past this point. It requires a change in heap size to get the NM started.
>  
> Following is the stack trace
> {code:java}
> at java.lang.OutOfMemoryError.<init>(OutOfMemoryError.java:48)
> at com.google.protobuf.ByteString.copyFrom(ByteString.java:192)
> at com.google.protobuf.CodedInputStream.readBytes(CodedInputStream.java:324)
> at org.apache.hadoop.yarn.proto.YarnProtos$StringStringMapProto.<init>(YarnProtos.java:47069)
> at org.apache.hadoop.yarn.proto.YarnProtos$StringStringMapProto.<init>(YarnProtos.java:47014)
> at org.apache.hadoop.yarn.proto.YarnProtos$StringStringMapProto$1.parsePartialFrom(YarnProtos.java:47102)
> at org.apache.hadoop.yarn.proto.YarnProtos$StringStringMapProto$1.parsePartialFrom(YarnProtos.java:47097)
> at com.google.protobuf.CodedInputStream.readMessage(CodedInputStream.java:309)
> at org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto.<init>(YarnProtos.java:41016)
> at org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto.<init>(YarnProtos.java:40942)
> at org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto$1.parsePartialFrom(YarnProtos.java:41080)
> at org.apache.hadoop.yarn.proto.YarnProtos$ContainerLaunchContextProto$1.parsePartialFrom(YarnProtos.java:41075)
> at com.google.protobuf.CodedInputStream.readMessage(CodedInputStream.java:309)
> at 

[jira] [Updated] (YARN-4946) RM should write out Aggregated Log Completion file flag next to logs

2018-07-31 Thread Szilard Nemeth (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-4946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated YARN-4946:
-
Description: 
MAPREDUCE-6415 added a tool that combines the aggregated log files for each 
Yarn App into a HAR file.  When run, it seeds the list by looking at the 
aggregated logs directory, and then filters out ineligible apps.  One of the 
criteria involves checking with the RM that an Application's log aggregation 
status is not still running and has not failed.  When the RM "forgets" about an 
older completed Application (e.g. RM failover, enough time has passed, etc), 
the tool won't find the Application in the RM and will just assume that its log 
aggregation succeeded, even if it actually failed or is still running.

We can solve this problem by doing the following:
The RM should not consider an app to be fully completed (and thus removed from 
its history) until the aggregation status has reached a terminal state (e.g. 
SUCCEEDED, FAILED, TIME_OUT).


  was:
MAPREDUCE-6415 added a tool that combines the aggregated log files for each 
Yarn App into a HAR file.  When run, it seeds the list by looking at the 
aggregated logs directory, and then filters out ineligible apps.  One of the 
criteria involves checking with the RM that an Application's log aggregation 
status is not still running and has not failed.  When the RM "forgets" about an 
older completed Application (e.g. RM failover, enough time has passed, etc), 
the tool won't find the Application in the RM and will just assume that its log 
aggregation succeeded, even if it actually failed or is still running.

We can solve this problem by doing the following:
# When the RM sees that an Application has successfully finished aggregating 
its logs, it will write a flag file next to that Application's log files
# The tool no longer talks to the RM at all.  When looking at the FileSystem, 
it now uses that flag file to determine if it should process those log files.  
If the file is there, it archives, otherwise it does not.
# As part of the archiving process, it will delete the flag file
# (If you don't run the tool, the flag file will eventually be cleaned up by 
the JHS when it cleans up the aggregated logs because it's in the same 
directory)

This improvement has several advantages:
# The edge case about "forgotten" Applications is fixed
# The tool no longer has to talk to the RM; it only has to consult HDFS.  This 
is simpler



> RM should write out Aggregated Log Completion file flag next to logs
> 
>
> Key: YARN-4946
> URL: https://issues.apache.org/jira/browse/YARN-4946
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: log-aggregation
>Affects Versions: 2.8.0
>Reporter: Robert Kanter
>Assignee: Szilard Nemeth
>Priority: Major
>
> MAPREDUCE-6415 added a tool that combines the aggregated log files for each 
> Yarn App into a HAR file.  When run, it seeds the list by looking at the 
> aggregated logs directory, and then filters out ineligible apps.  One of the 
> criteria involves checking with the RM that an Application's log aggregation 
> status is not still running and has not failed.  When the RM "forgets" about 
> an older completed Application (e.g. RM failover, enough time has passed, 
> etc), the tool won't find the Application in the RM and will just assume that 
> its log aggregation succeeded, even if it actually failed or is still running.
> We can solve this problem by doing the following:
> The RM should not consider an app to be fully completed (and thus removed 
> from its history) until the aggregation status has reached a terminal state 
> (e.g. SUCCEEDED, FAILED, TIME_OUT).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8263) DockerClient still touches hadoop.tmp.dir

2018-07-31 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16563960#comment-16563960
 ] 

genericqa commented on YARN-8263:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
19s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 25m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
11s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m  1s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
53s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
22s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m  1s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 18m 
39s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 73m 36s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8263 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12933784/YARN-8263.001.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 7b5fc2f89bbc 4.4.0-130-generic #156-Ubuntu SMP Thu Jun 14 
08:53:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / b28bdc7 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/21449/testReport/ |
| Max. process+thread count | 408 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/21449/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> DockerClient still touches hadoop.tmp.dir
> 

[jira] [Commented] (YARN-8175) Add support for Node Labels in SLS

2018-07-31 Thread Abhishek Modi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16563958#comment-16563958
 ] 

Abhishek Modi commented on YARN-8175:
-

Thanks [~elgoiri] for reviewing and committing this to trunk.

> Add support for Node Labels in SLS
> --
>
> Key: YARN-8175
> URL: https://issues.apache.org/jira/browse/YARN-8175
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.1.0
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: YARN-8175.001.patch, YARN-8175.002.patch, 
> YARN-8175.003.patch, YARN-8175.004.patch, YARN-8175.005.patch, 
> YARN-8175.006.patch, YARN-8175.007.patch, YARN-8175.008.patch, 
> YARN-8175.009.patch
>
>
> Currently, SLS doesn't support node labels. With this JIRA, we plan to add 
> support for node labels in SLS.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8180) Remove yarn.federation.blacklist-subclusters from YARN federation doc

2018-07-31 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/YARN-8180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri updated YARN-8180:
--
Description: 
Property "yarn.federation.blacklist-subclusters" was added in yarn-federation 
doc by mistake and is not applicable. This JIRA is to remove this property from 
the doc.

 

 

  was:
Property "yarn.federation.blacklist-subclusters" was added in yarn-federation 
doc by mistake and is not applicable. This Jira is to remove this property from 
the doc.

 

 


> Remove yarn.federation.blacklist-subclusters from YARN federation doc
> -
>
> Key: YARN-8180
> URL: https://issues.apache.org/jira/browse/YARN-8180
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: federation
>Reporter: Shen Yinjie
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-8180.001.patch
>
>
> Property "yarn.federation.blacklist-subclusters" was added in yarn-federation 
> doc by mistake and is not applicable. This JIRA is to remove this property 
> from the doc.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8180) Remove yarn.federation.blacklist-subclusters from YARN federation doc

2018-07-31 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/YARN-8180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri updated YARN-8180:
--
Summary: Remove yarn.federation.blacklist-subclusters from YARN federation 
doc  (was: Remove yarn.federation.blacklist-subclusters from yarn federation 
doc)

> Remove yarn.federation.blacklist-subclusters from YARN federation doc
> -
>
> Key: YARN-8180
> URL: https://issues.apache.org/jira/browse/YARN-8180
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: federation
>Reporter: Shen Yinjie
>Assignee: Abhishek Modi
>Priority: Major
> Attachments: YARN-8180.001.patch
>
>
> Property "yarn.federation.blacklist-subclusters" was added in yarn-federation 
> doc by mistake and is not applicable. This Jira is to remove this property 
> from the doc.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8175) Add support for Node Labels in SLS

2018-07-31 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/YARN-8175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri updated YARN-8175:
--
Affects Version/s: 3.1.0

> Add support for Node Labels in SLS
> --
>
> Key: YARN-8175
> URL: https://issues.apache.org/jira/browse/YARN-8175
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.1.0
>Reporter: Abhishek Modi
>Assignee: Abhishek Modi
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: YARN-8175.001.patch, YARN-8175.002.patch, 
> YARN-8175.003.patch, YARN-8175.004.patch, YARN-8175.005.patch, 
> YARN-8175.006.patch, YARN-8175.007.patch, YARN-8175.008.patch, 
> YARN-8175.009.patch
>
>
> Currently, SLS doesn't support node labels. With this JIRA, we plan to add 
> support for node labels in SLS.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8606) Opportunistic scheduling doesnt work after failover

2018-07-31 Thread Bibin A Chundatt (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-8606:
---
Description: 
EventDispatcher for opportunistic scheduling is added to the RM composite 
service and not the RMActiveServices composite service, causing the dispatcher 
to be started only once and not restarted on RM failover.

Issue credits: Rakesh

  was:
EventDispatcher for opportunistic scheduling is not added to RMActiveServices 
after failover

Issue credits: Rakesh


> Opportunistic scheduling doesnt work after failover
> ---
>
> Key: YARN-8606
> URL: https://issues.apache.org/jira/browse/YARN-8606
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Blocker
> Attachments: YARN-8606.001.patch
>
>
> EventDispatcher for opportunistic scheduling is added to the RM composite 
> service and not the RMActiveServices composite service, causing the dispatcher 
> to be started only once and not restarted on RM failover.
> Issue credits: Rakesh



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8606) Opportunistic scheduling doesnt work after failover

2018-07-31 Thread Bibin A Chundatt (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16563919#comment-16563919
 ] 

Bibin A Chundatt commented on YARN-8606:


cc: [~leftnoteasy]. Could you please review?

> Opportunistic scheduling doesnt work after failover
> ---
>
> Key: YARN-8606
> URL: https://issues.apache.org/jira/browse/YARN-8606
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Blocker
> Attachments: YARN-8606.001.patch
>
>
> EventDispatcher for opportunistic scheduling is not added to RMActiveServices 
> after failover
> Issue credits: Rakesh



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8606) Opportunistic scheduling doesnt work after failover

2018-07-31 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16563892#comment-16563892
 ] 

genericqa commented on YARN-8606:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
21s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
11s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 10s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
11s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
28s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
 9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 17s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 69m 35s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}128m 26s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.scheduler.fair.policies.TestDominantResourceFairnessPolicy
 |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-8606 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12933769/YARN-8606.001.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux dc6f86b2fb80 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 7631e0a |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/21448/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/21448/testReport/ |
| Max. process+thread count | 928 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 

[jira] [Updated] (YARN-8263) DockerClient still touches hadoop.tmp.dir

2018-07-31 Thread Craig Condit (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Condit updated YARN-8263:
---
Attachment: YARN-8263.001.patch

> DockerClient still touches hadoop.tmp.dir
> -
>
> Key: YARN-8263
> URL: https://issues.apache.org/jira/browse/YARN-8263
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.1.1
>Reporter: Jason Lowe
>Assignee: Craig Condit
>Priority: Minor
>  Labels: Docker
> Attachments: YARN-8263.001.patch
>
>
> The DockerClient constructor fails if hadoop.tmp.dir is not set and proceeds 
> to create a directory there.  After YARN-8064 there's no longer a need to 
> touch the temporary directory.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org


