[jira] [Commented] (YARN-9583) Failed job submitted to an unknown queue is shown to all users

2019-08-05 Thread KWON BYUNGCHANG (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16900622#comment-16900622
 ] 

KWON BYUNGCHANG commented on YARN-9583:
---

[~sunilg]

Thank you for the feedback.

I checked the scenario you mentioned: the RM recovered the app as SUCCESS.

This patch does not change the app state; it only changes the permission check.

Do you need test code?
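
For reference, a minimal sketch of the idea, with the caveat that this is not the actual patch and the names (checkAccessForUnknownQueue, adminAclsManager) are illustrative placeholders:

{code:java}
// Sketch of the changed permission check in QueueACLsManager for the case
// where the app's queue no longer exists in the scheduler (e.g. it was
// deleted before an RM restart).
boolean checkAccessForUnknownQueue(UserGroupInformation callerUGI, RMApp app) {
  // Before the fix this path effectively allowed every user to view the app.
  // After the fix, only the app owner and the YARN admin may view it.
  boolean isOwner = callerUGI.getShortUserName().equals(app.getUser());
  boolean isAdmin = adminAclsManager.isAdmin(callerUGI);
  return isOwner || isAdmin;
}
{code}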

 

> Failed job submitted to an unknown queue is shown to all users
> ---
>
> Key: YARN-9583
> URL: https://issues.apache.org/jira/browse/YARN-9583
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: security
>Affects Versions: 3.1.2
>Reporter: KWON BYUNGCHANG
>Assignee: KWON BYUNGCHANG
>Priority: Major
> Attachments: YARN-9583-screenshot.png, YARN-9583.001.patch, 
> YARN-9583.002.patch, YARN-9583.003.patch, YARN-9583.004.patch, 
> YARN-9583.005.patch
>
>
> In secure mode, a failed job that was submitted to an unknown queue is shown 
> to all users.
> I attached an RM UI screenshot.
> Reproduction scenario:
>    1. User foo submits a job to an unknown queue without a view-acl, and the 
> job fails immediately.
>    2. User bar can access the previously failed job of user foo.
> According to the comments in QueueACLsManager.java that cause the problem, 
> this situation can happen when the RM is restarted after a queue is deleted.
> I think showing an app of a non-existing queue to all users after RM restart 
> is the problem: it becomes a security hole.
> I fixed it with a small change. After the fix, both the owner of the job and 
> the YARN admin can access a job that was submitted to an unknown queue.






[jira] [Commented] (YARN-9722) PlacementRule logs object ID in place of queue name.

2019-08-05 Thread Sunil Govindan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16900621#comment-16900621
 ] 

Sunil Govindan commented on YARN-9722:
--

Looks fine, let's get this in. +1

> PlacementRule logs object ID in place of queue name.
> 
>
> Key: YARN-9722
> URL: https://issues.apache.org/jira/browse/YARN-9722
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Minor
>  Labels: supportability
> Attachments: YARN-9722-001.patch
>
>
> UserGroupMappingPlacementRule logs object ID in place of queue name.
> {code}
> 2019-08-05 09:28:52,664 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.placement.UserGroupMappingPlacementRule:
>  Application application_1564996871731_0003 user ambari-qa mapping [default] 
> to 
> [org.apache.hadoop.yarn.server.resourcemanager.placement.ApplicationPlacementContext@5aafe9b2]
>  override false
> {code}
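
The patch itself is not inlined here; a minimal sketch of the likely shape of the fix, assuming ApplicationPlacementContext exposes the mapped queue name via getQueue() (the surrounding variable names are illustrative):

{code:java}
// Sketch: log the queue name carried by the placement context instead of the
// context object itself, whose default toString() prints the object ID.
ApplicationPlacementContext ctx = getPlacementForApp(asc, user);
LOG.info("Application " + applicationId + " user " + user
    + " mapping [" + asc.getQueue() + "] to [" + ctx.getQueue()
    + "] override " + overrideWithQueueMappings);
{code}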






[jira] [Commented] (YARN-9583) Failed job submitted to an unknown queue is shown to all users

2019-08-05 Thread Sunil Govindan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16900619#comment-16900619
 ] 

Sunil Govindan commented on YARN-9583:
--

Thanks [~magnum]

[~Prabhu Joseph] and [~magnum], I am worried about one scenario here. Assume an 
application has been submitted to a queue named "queueA" and successfully 
completed. Now delete queueA and restart the RM. Will the RM be able to recover 
this app and keep its state as SUCCESS?

Could you please cross-check this case as well?

> Failed job submitted to an unknown queue is shown to all users
> ---
>
> Key: YARN-9583
> URL: https://issues.apache.org/jira/browse/YARN-9583
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: security
>Affects Versions: 3.1.2
>Reporter: KWON BYUNGCHANG
>Assignee: KWON BYUNGCHANG
>Priority: Major
> Attachments: YARN-9583-screenshot.png, YARN-9583.001.patch, 
> YARN-9583.002.patch, YARN-9583.003.patch, YARN-9583.004.patch, 
> YARN-9583.005.patch
>
>
> In secure mode, a failed job that was submitted to an unknown queue is shown 
> to all users.
> I attached an RM UI screenshot.
> Reproduction scenario:
>    1. User foo submits a job to an unknown queue without a view-acl, and the 
> job fails immediately.
>    2. User bar can access the previously failed job of user foo.
> According to the comments in QueueACLsManager.java that cause the problem, 
> this situation can happen when the RM is restarted after a queue is deleted.
> I think showing an app of a non-existing queue to all users after RM restart 
> is the problem: it becomes a security hole.
> I fixed it with a small change. After the fix, both the owner of the job and 
> the YARN admin can access a job that was submitted to an unknown queue.






[jira] [Updated] (YARN-6539) Create SecureLogin inside Router

2019-08-05 Thread Xie YiFan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-6539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xie YiFan updated YARN-6539:

Attachment: YARN-6539_3.patch

> Create SecureLogin inside Router
> 
>
> Key: YARN-6539
> URL: https://issues.apache.org/jira/browse/YARN-6539
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Giovanni Matteo Fumarola
>Assignee: Xie YiFan
>Priority: Minor
> Attachments: YARN-6359_1.patch, YARN-6359_2.patch, YARN-6539_3.patch
>
>







[jira] [Commented] (YARN-9719) Failed to restart yarn-service if it doesn’t exist in RM

2019-08-05 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16900583#comment-16900583
 ] 

Hadoop QA commented on YARN-9719:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
21s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m  4s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
25s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m  6s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
18s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 19m  9s{color} 
| {color:red} hadoop-yarn-services-core in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
32s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 67m 45s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.service.TestYarnNativeServices |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:bdbca0e |
| JIRA Issue | YARN-9719 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12976768/YARN-9719.003.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 86fca886 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 
10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / d6697da |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_212 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/24478/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-applications_hadoop-yarn-services_hadoop-yarn-services-core.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/24478/testReport/ |
| Max. process+thread count | 722 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core
 U: 

[jira] [Commented] (YARN-6539) Create SecureLogin inside Router

2019-08-05 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16900552#comment-16900552
 ] 

Hadoop QA commented on YARN-6539:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m 10s{color} 
| {color:red} YARN-6539 does not apply to trunk. Rebase required? Wrong Branch? 
See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-6539 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12976770/YARN-6359_2.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/24479/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Create SecureLogin inside Router
> 
>
> Key: YARN-6539
> URL: https://issues.apache.org/jira/browse/YARN-6539
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Giovanni Matteo Fumarola
>Assignee: Xie YiFan
>Priority: Minor
> Attachments: YARN-6359_1.patch, YARN-6359_2.patch
>
>







[jira] [Updated] (YARN-6539) Create SecureLogin inside Router

2019-08-05 Thread Xie YiFan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-6539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xie YiFan updated YARN-6539:

Attachment: YARN-6359_2.patch

> Create SecureLogin inside Router
> 
>
> Key: YARN-6539
> URL: https://issues.apache.org/jira/browse/YARN-6539
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Giovanni Matteo Fumarola
>Assignee: Xie YiFan
>Priority: Minor
> Attachments: YARN-6359_1.patch, YARN-6359_2.patch
>
>







[jira] [Commented] (YARN-9719) Failed to restart yarn-service if it doesn’t exist in RM

2019-08-05 Thread kyungwan nam (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16900546#comment-16900546
 ] 

kyungwan nam commented on YARN-9719:


[~Prabhu Joseph], [~eyang] Thank you for your comments.
I've attached a new patch including test code.

> Failed to restart yarn-service if it doesn’t exist in RM
> 
>
> Key: YARN-9719
> URL: https://issues.apache.org/jira/browse/YARN-9719
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: kyungwan nam
>Assignee: kyungwan nam
>Priority: Major
> Attachments: YARN-9719.001.patch, YARN-9719.002.patch, 
> YARN-9719.003.patch
>
>
> Sometimes, restarting a yarn-service fails as follows.
> {code}
> {"diagnostics":"Application with id 'application_1562735362534_10461' doesn't 
> exist in RM. Please check that the job submission was successful.\n\tat 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:382)\n\tat
>  
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:234)\n\tat
>  
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:561)\n\tat
>  
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)\n\tat
>  org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)\n\tat 
> org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)\n\tat 
> org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818)\n\tat 
> java.security.AccessController.doPrivileged(Native Method)\n\tat 
> javax.security.auth.Subject.doAs(Subject.java:422)\n\tat 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)\n\tat
>  org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678)\n"}
> {code}
> It seems that this occurs when restarting a yarn-service that was stopped 
> long ago.
> By default, the RM keeps up to 1000 completed applications 
> (yarn.resourcemanager.max-completed-applications).
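
As a mitigation sketch (my assumption, not part of the attached patch), the retention limit can be raised so completed service apps stay known to the RM longer; 10000 is an illustrative value only:

{code:java}
// Sketch: equivalent to setting the property in yarn-site.xml.
Configuration conf = new YarnConfiguration();
conf.setInt(YarnConfiguration.RM_MAX_COMPLETED_APPLICATIONS, 10000);
{code}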






[jira] [Updated] (YARN-9719) Failed to restart yarn-service if it doesn’t exist in RM

2019-08-05 Thread kyungwan nam (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kyungwan nam updated YARN-9719:
---
Attachment: YARN-9719.003.patch

> Failed to restart yarn-service if it doesn’t exist in RM
> 
>
> Key: YARN-9719
> URL: https://issues.apache.org/jira/browse/YARN-9719
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: kyungwan nam
>Assignee: kyungwan nam
>Priority: Major
> Attachments: YARN-9719.001.patch, YARN-9719.002.patch, 
> YARN-9719.003.patch
>
>
> Sometimes, restarting a yarn-service fails as follows.
> {code}
> {"diagnostics":"Application with id 'application_1562735362534_10461' doesn't 
> exist in RM. Please check that the job submission was successful.\n\tat 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:382)\n\tat
>  
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:234)\n\tat
>  
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:561)\n\tat
>  
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)\n\tat
>  org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)\n\tat 
> org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)\n\tat 
> org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818)\n\tat 
> java.security.AccessController.doPrivileged(Native Method)\n\tat 
> javax.security.auth.Subject.doAs(Subject.java:422)\n\tat 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)\n\tat
>  org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678)\n"}
> {code}
> It seems that this occurs when restarting a yarn-service that was stopped 
> long ago.
> By default, the RM keeps up to 1000 completed applications 
> (yarn.resourcemanager.max-completed-applications).






[jira] [Commented] (YARN-9616) Shared Cache Manager Failed To Upload Unpacked Resources

2019-08-05 Thread zhenzhao wang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16900507#comment-16900507
 ] 

zhenzhao wang commented on YARN-9616:
-

[~smarthan] Sorry, I missed the message. I have a patch that works well in our 
cluster internally; however, I haven't had a chance to sort it out and 
contribute it to the public repo. I uploaded [^YARN-9616.001-2.9.patch] for 
reference. Feel free to share your patch. Thanks.

> Shared Cache Manager Failed To Upload Unpacked Resources
> 
>
> Key: YARN-9616
> URL: https://issues.apache.org/jira/browse/YARN-9616
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.3, 2.9.2, 2.8.5
>Reporter: zhenzhao wang
>Assignee: zhenzhao wang
>Priority: Major
> Attachments: YARN-9616.001-2.9.patch
>
>
> YARN unpacks archive files and some other files based on the file type and 
> configuration. E.g., if I start an MR job with -archive one.zip, then one.zip 
> is unpacked during download. Say there are file1 and file2 inside one.zip; 
> the files kept on local disk will then be 
> /disk3/yarn/local/filecache/352/one.zip/file1 and 
> /disk3/yarn/local/filecache/352/one.zip/file2. So the shared cache uploader 
> cannot upload one.zip to the shared cache, because the original archive was 
> removed during localization. The following error is thrown.
> {code:java}
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.sharedcache.SharedCacheUploader:
>  Exception while uploading the file dict.zip
> java.io.FileNotFoundException: File 
> /disk3/yarn/local/filecache/352/one.zip/one.zip does not exist
> at 
> org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:631)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:857)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:621)
> at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:442)
> at 
> org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:146)
> at 
> org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:347)
> at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:926)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.sharedcache.SharedCacheUploader.computeChecksum(SharedCacheUploader.java:257)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.sharedcache.SharedCacheUploader.call(SharedCacheUploader.java:128)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.sharedcache.SharedCacheUploader.call(SharedCacheUploader.java:55)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> {code}
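
A minimal sketch of one possible guard, assuming the fix direction rather than reproducing the attached patch (checksumStrategy and the surrounding fragment are placeholders; the real logic in SharedCacheUploader.call() differs):

{code:java}
// Sketch: for an auto-unpacked archive the localized path is now a directory
// and the original one.zip is gone, so fs.open() throws FileNotFoundException.
// Detect the directory case instead of failing the upload.
FileStatus status = fs.getFileStatus(localPath);
if (status.isDirectory()) {
  return false; // nothing to upload as a single file
}
try (FSDataInputStream in = fs.open(localPath)) {
  String checksum = checksumStrategy.computeChecksum(in); // placeholder
  // ... proceed with the shared-cache upload ...
}
{code}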






[jira] [Updated] (YARN-9616) Shared Cache Manager Failed To Upload Unpacked Resources

2019-08-05 Thread zhenzhao wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhenzhao wang updated YARN-9616:

Attachment: YARN-9616.001-2.9.patch

> Shared Cache Manager Failed To Upload Unpacked Resources
> 
>
> Key: YARN-9616
> URL: https://issues.apache.org/jira/browse/YARN-9616
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.3, 2.9.2, 2.8.5
>Reporter: zhenzhao wang
>Assignee: zhenzhao wang
>Priority: Major
> Attachments: YARN-9616.001-2.9.patch
>
>
> YARN unpacks archive files and some other files based on the file type and 
> configuration. E.g., if I start an MR job with -archive one.zip, then one.zip 
> is unpacked during download. Say there are file1 and file2 inside one.zip; 
> the files kept on local disk will then be 
> /disk3/yarn/local/filecache/352/one.zip/file1 and 
> /disk3/yarn/local/filecache/352/one.zip/file2. So the shared cache uploader 
> cannot upload one.zip to the shared cache, because the original archive was 
> removed during localization. The following error is thrown.
> {code:java}
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.sharedcache.SharedCacheUploader:
>  Exception while uploading the file dict.zip
> java.io.FileNotFoundException: File 
> /disk3/yarn/local/filecache/352/one.zip/one.zip does not exist
> at 
> org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:631)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:857)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:621)
> at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:442)
> at 
> org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:146)
> at 
> org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:347)
> at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:926)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.sharedcache.SharedCacheUploader.computeChecksum(SharedCacheUploader.java:257)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.sharedcache.SharedCacheUploader.call(SharedCacheUploader.java:128)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.sharedcache.SharedCacheUploader.call(SharedCacheUploader.java:55)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> {code}






[jira] [Commented] (YARN-9559) Create AbstractContainersLauncher for pluggable ContainersLauncher logic

2019-08-05 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16900460#comment-16900460
 ] 

Hadoop QA commented on YARN-9559:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
40s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  6m  
3s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  9m 
10s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 31s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
10s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
55s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
14s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m 
46s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m 13s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 9 new + 319 unchanged - 2 fixed = 328 total (was 321) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 21s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
56s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
53s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
48s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 21m 
14s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
40s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}120m  6s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:bdbca0e53b4 |
| JIRA Issue | YARN-9559 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12976749/YARN-9559.004.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  

[jira] [Commented] (YARN-9438) launchTime not written to state store for running applications

2019-08-05 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16900389#comment-16900389
 ] 

Hadoop QA commented on YARN-9438:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
41s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
52s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m  8s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
27s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 46s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 79m 
28s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
29s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}132m 40s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:bdbca0e |
| JIRA Issue | YARN-9438 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12976738/YARN-9438.003.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 6124a3fe0ca4 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 
10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / d6697da |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_212 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/24474/testReport/ |
| Max. process+thread count | 900 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/24474/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> launchTime not written to state store for running 

[jira] [Commented] (YARN-9438) launchTime not written to state store for running applications

2019-08-05 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16900388#comment-16900388
 ] 

Hadoop QA commented on YARN-9438:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  9m 
51s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} branch-2 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 
14s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
40s{color} | {color:green} branch-2 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
38s{color} | {color:green} branch-2 passed with JDK v1.8.0_222 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
28s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
43s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
19s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
31s{color} | {color:green} branch-2 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
24s{color} | {color:green} branch-2 passed with JDK v1.8.0_222 {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed with JDK v1.8.0_222 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed with JDK v1.8.0_222 {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 58m 
27s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
20s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 94m 21s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:da67579 |
| JIRA Issue | YARN-9438 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12976741/YARN-9438-branch-2.002.patch
 |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux c0428e16cac4 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 
17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | branch-2 / 19af228 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| Multi-JDK versions |  /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_95 
/usr/lib/jvm/java-8-openjdk-amd64:1.8.0_222 |
| findbugs | v3.0.0 |
|  Test Results | 

[jira] [Commented] (YARN-9438) launchTime not written to state store for running applications

2019-08-05 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16900375#comment-16900375
 ] 

Hadoop QA commented on YARN-9438:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
19s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} branch-2 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
36s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
56s{color} | {color:green} branch-2 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
50s{color} | {color:green} branch-2 passed with JDK v1.8.0_222 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
38s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
52s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
27s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
34s{color} | {color:green} branch-2 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
27s{color} | {color:green} branch-2 passed with JDK v1.8.0_222 {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
47s{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed with JDK v1.8.0_222 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
33s{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
28s{color} | {color:green} the patch passed with JDK v1.8.0_222 {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 66m 
10s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 99m 22s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:da675796017 |
| JIRA Issue | YARN-9438 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12976741/YARN-9438-branch-2.002.patch
 |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux afaae80393cb 4.15.0-54-generic #58-Ubuntu SMP Mon Jun 24 
10:55:24 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | branch-2 / 19af228 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_222 |
| Multi-JDK versions |  /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_95 
/usr/lib/jvm/java-8-openjdk-amd64:1.8.0_222 |
| findbugs | v3.0.0 |
|  Test Results | 

[jira] [Commented] (YARN-9722) PlacementRule logs object ID in place of queue name.

2019-08-05 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16900370#comment-16900370
 ] 

Hadoop QA commented on YARN-9722:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
24s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 
 0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
52s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 42s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
36s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m  1s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
28s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 77m 51s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}129m 56s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.resourcemanager.TestRMEmbeddedElector 
|
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:bdbca0e |
| JIRA Issue | YARN-9722 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12976730/YARN-9722-001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 397e9c65571a 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 
17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / c589983 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_212 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/24472/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/24472/testReport/ |
| Max. process+thread count | 913 (vs. ulimit of 1) |
| modules | C: 

[jira] [Commented] (YARN-9722) PlacementRule logs object ID in place of queue name.

2019-08-05 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16900369#comment-16900369
 ] 

Hadoop QA commented on YARN-9722:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
52s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m  6s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
38s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 38s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 93m 
28s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
35s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}159m 45s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:bdbca0e53b4 |
| JIRA Issue | YARN-9722 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12976727/YARN-9722-001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 39c874e0710b 4.15.0-48-generic #51-Ubuntu SMP Wed Apr 3 
08:28:49 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 71aad60 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_212 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/24471/testReport/ |
| Max. process+thread count | 918 (vs. ulimit of 5500) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/24471/console |
| Powered by | Apache Yetus 0.8.0   

[jira] [Commented] (YARN-9559) Create AbstractContainersLauncher for pluggable ContainersLauncher logic

2019-08-05 Thread Jonathan Hung (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16900340#comment-16900340
 ] 

Jonathan Hung commented on YARN-9559:
-

[~adam.antal], makes sense, thanks for the comments. Attached 004 to address 
this.

> Create AbstractContainersLauncher for pluggable ContainersLauncher logic
> 
>
> Key: YARN-9559
> URL: https://issues.apache.org/jira/browse/YARN-9559
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
> Attachments: YARN-9559.001.patch, YARN-9559.002.patch, 
> YARN-9559.003.patch, YARN-9559.004.patch
>
>







[jira] [Updated] (YARN-9559) Create AbstractContainersLauncher for pluggable ContainersLauncher logic

2019-08-05 Thread Jonathan Hung (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated YARN-9559:

Attachment: YARN-9559.004.patch

> Create AbstractContainersLauncher for pluggable ContainersLauncher logic
> 
>
> Key: YARN-9559
> URL: https://issues.apache.org/jira/browse/YARN-9559
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
> Attachments: YARN-9559.001.patch, YARN-9559.002.patch, 
> YARN-9559.003.patch, YARN-9559.004.patch
>
>







[jira] [Updated] (YARN-9438) launchTime not written to state store for running applications

2019-08-05 Thread Jonathan Hung (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated YARN-9438:

Attachment: YARN-9438-branch-2.002.patch

> launchTime not written to state store for running applications
> --
>
> Key: YARN-9438
> URL: https://issues.apache.org/jira/browse/YARN-9438
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.10.0
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
> Attachments: YARN-9438-branch-2.001.patch, 
> YARN-9438-branch-2.002.patch, YARN-9438.001.patch, YARN-9438.002.patch, 
> YARN-9438.003.patch
>
>
> launchTime is only saved to the state store after an application finishes, so if 
> a restart happens, any running application will have its launchTime set to -1 
> (since that is the default timestamp of the recovery event).






[jira] [Commented] (YARN-9667) Container-executor.c duplicates messages to stdout

2019-08-05 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16900285#comment-16900285
 ] 

Hudson commented on YARN-9667:
--

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #17042 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/17042/])
YARN-9667.  Use setbuf with line buffer to reduce fflush complexity in (eyang: 
rev d6697da5e854355ac3718a85006b73315d0702aa)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/test/test-container-executor.c
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/main.c
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/util.h
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/util.c
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/utils/docker-util.c
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/modules/devices/devices-module.c


> Container-executor.c duplicates messages to stdout
> --
>
> Key: YARN-9667
> URL: https://issues.apache.org/jira/browse/YARN-9667
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, yarn
>Affects Versions: 3.2.0
>Reporter: Adam Antal
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-9667-001.patch
>
>
> When a container is killed by its AM, we get an error message similar to this:
> {noformat}
> 2019-06-30 12:09:04,412 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor:
>  Shell execution returned exit code: 143. Privileged Execution Operation 
> Stderr:
> Stdout: main : command provided 1
> main : run as user is systest
> main : requested yarn user is systest
> Getting exit code file...
> Creating script paths...
> Writing pid file...
> Writing to tmp file 
> /yarn/nm/nmPrivate/application_1561921629886_0001/container_e84_1561921629886_0001_01_19/container_e84_1561921629886_0001_01_19.pid.tmp
> Writing to cgroup task files...
> Creating local dirs...
> Launching container...
> Getting exit code file...
> Creating script paths...
> {noformat}
> In container-executor.c the fork point is right after the "Creating script 
> paths..." part, yet in the stdout log we can clearly see it has been written 
> twice. After consulting with [~pbacsko], it seems there is a missing flush in 
> container-executor.c before the fork, and that causes the duplication.
> I suggest adding a flush there so that the output won't be duplicated: it's a 
> bit misleading that the child process writes out "Getting exit code file" and 
> "Creating script paths" even though it is clearly not doing that.
> A more appealing solution could be to revisit the fprintf-fflush pairs in the 
> code and change them to a single call, so that the fflush calls would not be 
> forgotten accidentally. (That can cause problems in every place where it's 
> used.)
> Note: this issue probably affects every occurrence of fork(), not just the one 
> from {{launch_container_as_user}} in {{main.c}}.






[jira] [Updated] (YARN-9438) launchTime not written to state store for running applications

2019-08-05 Thread Jonathan Hung (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated YARN-9438:

Attachment: (was: YARN-9438-branch-2.002.patch)

> launchTime not written to state store for running applications
> --
>
> Key: YARN-9438
> URL: https://issues.apache.org/jira/browse/YARN-9438
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.10.0
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
> Attachments: YARN-9438-branch-2.001.patch, YARN-9438.001.patch, 
> YARN-9438.002.patch, YARN-9438.003.patch
>
>
> launchTime is only saved to the state store after an application finishes, so if 
> a restart happens, any running application will have its launchTime set to -1 
> (since that is the default timestamp of the recovery event).






[jira] [Updated] (YARN-9438) launchTime not written to state store for running applications

2019-08-05 Thread Jonathan Hung (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated YARN-9438:

Attachment: YARN-9438-branch-2.002.patch

> launchTime not written to state store for running applications
> --
>
> Key: YARN-9438
> URL: https://issues.apache.org/jira/browse/YARN-9438
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.10.0
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
> Attachments: YARN-9438-branch-2.001.patch, 
> YARN-9438-branch-2.002.patch, YARN-9438.001.patch, YARN-9438.002.patch, 
> YARN-9438.003.patch
>
>
> launchTime is only saved to the state store after an application finishes, so if 
> a restart happens, any running application will have its launchTime set to -1 
> (since that is the default timestamp of the recovery event).






[jira] [Commented] (YARN-9438) launchTime not written to state store for running applications

2019-08-05 Thread Jonathan Hung (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16900283#comment-16900283
 ] 

Jonathan Hung commented on YARN-9438:
-

It seems TestApplicationLifetimeMonitor is related (TestCapacityScheduler passes 
locally).

Uploaded YARN-9438.003 / YARN-9438-branch-2.002 to fix this test (these also save 
in-memory app timeouts, in addition to launch time, to the state store).
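
To make the idea concrete, here is a minimal C sketch (the real change is in the 
RM's Java code, and the state-store helper below is hypothetical): persist the 
launch time as soon as the application launches, so a recovery event never has 
to fall back to the -1 default.

{code}
#include <stdio.h>

#define UNSET_LAUNCH_TIME -1L

struct app_state {
    long launch_time;   /* stays -1 until the app actually launches */
};

/* Hypothetical state-store write; stands in for the RM's update call. */
static void store_update(const struct app_state *s) {
    printf("persisted launch_time=%ld\n", s->launch_time);
}

/* Persist at launch, not only at completion: a restart that happens while
   the app is still running can then recover the real timestamp. */
static void on_app_launched(struct app_state *s, long now) {
    s->launch_time = now;
    store_update(s);
}

int main(void) {
    struct app_state app = { UNSET_LAUNCH_TIME };
    on_app_launched(&app, 1564996871731L);
    return 0;
}
{code}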

> launchTime not written to state store for running applications
> --
>
> Key: YARN-9438
> URL: https://issues.apache.org/jira/browse/YARN-9438
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.10.0
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
> Attachments: YARN-9438-branch-2.001.patch, 
> YARN-9438-branch-2.002.patch, YARN-9438.001.patch, YARN-9438.002.patch, 
> YARN-9438.003.patch
>
>
> launchTime is only saved to the state store after an application finishes, so if 
> a restart happens, any running application will have its launchTime set to -1 
> (since that is the default timestamp of the recovery event).






[jira] [Commented] (YARN-6539) Create SecureLogin inside Router

2019-08-05 Thread Wei-Chiu Chuang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16900280#comment-16900280
 ] 

Wei-Chiu Chuang commented on YARN-6539:
---

Thanks for the initiative, [~yifan.stan]. Added you to the contributor list and 
assigned it to you.

> Create SecureLogin inside Router
> 
>
> Key: YARN-6539
> URL: https://issues.apache.org/jira/browse/YARN-6539
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Giovanni Matteo Fumarola
>Assignee: Xie YiFan
>Priority: Minor
> Attachments: YARN-6359_1.patch
>
>







[jira] [Assigned] (YARN-6539) Create SecureLogin inside Router

2019-08-05 Thread Wei-Chiu Chuang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-6539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang reassigned YARN-6539:
-

Assignee: Xie YiFan  (was: Shen Yinjie)

> Create SecureLogin inside Router
> 
>
> Key: YARN-6539
> URL: https://issues.apache.org/jira/browse/YARN-6539
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Giovanni Matteo Fumarola
>Assignee: Xie YiFan
>Priority: Minor
> Attachments: YARN-6359_1.patch
>
>







[jira] [Updated] (YARN-9438) launchTime not written to state store for running applications

2019-08-05 Thread Jonathan Hung (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated YARN-9438:

Attachment: YARN-9438.003.patch

> launchTime not written to state store for running applications
> --
>
> Key: YARN-9438
> URL: https://issues.apache.org/jira/browse/YARN-9438
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.10.0
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
> Attachments: YARN-9438-branch-2.001.patch, YARN-9438.001.patch, 
> YARN-9438.002.patch, YARN-9438.003.patch
>
>
> launchTime is only saved to the state store after an application finishes, so if 
> a restart happens, any running application will have its launchTime set to -1 
> (since that is the default timestamp of the recovery event).






[jira] [Commented] (YARN-9667) Container-executor.c duplicates messages to stdout

2019-08-05 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16900278#comment-16900278
 ] 

Eric Yang commented on YARN-9667:
-

[~pbacsko] Thanks for the explanation. _IOLBF takes care of the flush in 
production code, hence no enhancement is required.  +1 for patch 001.

> Container-executor.c duplicates messages to stdout
> --
>
> Key: YARN-9667
> URL: https://issues.apache.org/jira/browse/YARN-9667
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, yarn
>Affects Versions: 3.2.0
>Reporter: Adam Antal
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-9667-001.patch
>
>
> When a container is killed by its AM, we get an error message similar to this:
> {noformat}
> 2019-06-30 12:09:04,412 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor:
>  Shell execution returned exit code: 143. Privileged Execution Operation 
> Stderr:
> Stdout: main : command provided 1
> main : run as user is systest
> main : requested yarn user is systest
> Getting exit code file...
> Creating script paths...
> Writing pid file...
> Writing to tmp file 
> /yarn/nm/nmPrivate/application_1561921629886_0001/container_e84_1561921629886_0001_01_19/container_e84_1561921629886_0001_01_19.pid.tmp
> Writing to cgroup task files...
> Creating local dirs...
> Launching container...
> Getting exit code file...
> Creating script paths...
> {noformat}
> In container-executor.c the fork point is right after the "Creating script 
> paths..." part, yet in the stdout log we can clearly see it has been written 
> twice. After consulting with [~pbacsko], it seems there is a missing flush in 
> container-executor.c before the fork, and that causes the duplication.
> I suggest adding a flush there so that the output won't be duplicated: it's a 
> bit misleading that the child process writes out "Getting exit code file" and 
> "Creating script paths" even though it is clearly not doing that.
> A more appealing solution could be to revisit the fprintf-fflush pairs in the 
> code and change them to a single call, so that the fflush calls would not be 
> forgotten accidentally. (That can cause problems in every place where it's 
> used.)
> Note: this issue probably affects every occurrence of fork(), not just the one 
> from {{launch_container_as_user}} in {{main.c}}.






[jira] [Updated] (YARN-9722) PlacementRule logs object ID in place of queue name.

2019-08-05 Thread Prabhu Joseph (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-9722:

Attachment: YARN-9722-001.patch

> PlacementRule logs object ID in place of queue name.
> 
>
> Key: YARN-9722
> URL: https://issues.apache.org/jira/browse/YARN-9722
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Minor
>  Labels: supportability
> Attachments: YARN-9722-001.patch
>
>
> UserGroupMappingPlacementRule logs object ID in place of queue name.
> {code}
> 2019-08-05 09:28:52,664 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.placement.UserGroupMappingPlacementRule:
>  Application application_1564996871731_0003 user ambari-qa mapping [default] 
> to 
> [org.apache.hadoop.yarn.server.resourcemanager.placement.ApplicationPlacementContext@5aafe9b2]
>  override false
> {code}






[jira] [Updated] (YARN-9722) PlacementRule logs object ID in place of queue name.

2019-08-05 Thread Prabhu Joseph (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-9722:

Summary: PlacementRule logs object ID in place of queue name.  (was: 
UserGroupMappingPlacementRule logs object ID in place of queue name.)

> PlacementRule logs object ID in place of queue name.
> 
>
> Key: YARN-9722
> URL: https://issues.apache.org/jira/browse/YARN-9722
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Minor
>  Labels: supportability
>
> UserGroupMappingPlacementRule logs object ID in place of queue name.
> {code}
> 2019-08-05 09:28:52,664 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.placement.UserGroupMappingPlacementRule:
>  Application application_1564996871731_0003 user ambari-qa mapping [default] 
> to 
> [org.apache.hadoop.yarn.server.resourcemanager.placement.ApplicationPlacementContext@5aafe9b2]
>  override false
> {code}






[jira] [Updated] (YARN-9722) UserGroupMappingPlacementRule logs object ID in place of queue name.

2019-08-05 Thread Prabhu Joseph (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-9722:

Attachment: (was: YARN-9722-001.patch)

> UserGroupMappingPlacementRule logs object ID in place of queue name.
> 
>
> Key: YARN-9722
> URL: https://issues.apache.org/jira/browse/YARN-9722
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Minor
>  Labels: supportability
>
> UserGroupMappingPlacementRule logs object ID in place of queue name.
> {code}
> 2019-08-05 09:28:52,664 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.placement.UserGroupMappingPlacementRule:
>  Application application_1564996871731_0003 user ambari-qa mapping [default] 
> to 
> [org.apache.hadoop.yarn.server.resourcemanager.placement.ApplicationPlacementContext@5aafe9b2]
>  override false
> {code}






[jira] [Commented] (YARN-9719) Failed to restart yarn-service if it doesn’t exist in RM

2019-08-05 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16900251#comment-16900251
 ] 

Eric Yang commented on YARN-9719:
-

[~kyungwan nam] Thank you for the patch; this looks like an important change. 
Can we add a test case to safeguard this from future regression?
[~Prabhu Joseph] Thank you for the review.

> Failed to restart yarn-service if it doesn’t exist in RM
> 
>
> Key: YARN-9719
> URL: https://issues.apache.org/jira/browse/YARN-9719
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: kyungwan nam
>Assignee: kyungwan nam
>Priority: Major
> Attachments: YARN-9719.001.patch, YARN-9719.002.patch
>
>
> Sometimes, restarting a yarn-service fails as follows.
> {code}
> {"diagnostics":"Application with id 'application_1562735362534_10461' doesn't 
> exist in RM. Please check that the job submission was successful.\n\tat 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:382)\n\tat
>  
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:234)\n\tat
>  
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:561)\n\tat
>  
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)\n\tat
>  org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)\n\tat 
> org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)\n\tat 
> org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818)\n\tat 
> java.security.AccessController.doPrivileged(Native Method)\n\tat 
> javax.security.auth.Subject.doAs(Subject.java:422)\n\tat 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)\n\tat
>  org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678)\n"}
> {code}
> It seems to occur when restarting a yarn-service that was stopped long ago.
> By default, the RM keeps up to 1000 completed applications 
> (yarn.resourcemanager.max-completed-applications).
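
A hedged sketch of the failure mode and the fix direction, with hypothetical 
names (this is not the actual patch): once the previous run has aged out of the 
RM's completed-application cache, the report lookup should be treated as "no 
previous instance" rather than a fatal error.

{code}
#include <stdio.h>
#include <stddef.h>

struct app_report { long app_id; };

/* Hypothetical RM lookup: returns NULL once the application has been
   evicted from the completed-application cache (default size 1000). */
static struct app_report *rm_get_application_report(long app_id) {
    (void)app_id;
    return NULL;  /* simulate "doesn't exist in RM" */
}

int main(void) {
    struct app_report *report = rm_get_application_report(10461L);
    if (report == NULL) {
        /* Previous run aged out of RM memory: restart the service as a
           fresh submission instead of propagating the error. */
        printf("no previous attempt found; submitting a new application\n");
    } else {
        printf("resuming from application %ld\n", report->app_id);
    }
    return 0;
}
{code}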






[jira] [Updated] (YARN-8944) TestContainerAllocation.testUserLimitAllocationMultipleContainers failure after YARN-8896

2019-08-05 Thread Jonathan Hung (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated YARN-8944:

Fix Version/s: 3.0.4
   2.10.0

> TestContainerAllocation.testUserLimitAllocationMultipleContainers failure 
> after YARN-8896
> -
>
> Key: YARN-8944
> URL: https://issues.apache.org/jira/browse/YARN-8944
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: capacity scheduler
>Affects Versions: 3.1.2, 3.2.1
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Minor
> Fix For: 2.10.0, 3.0.4, 3.1.2, 3.3.0, 3.2.1
>
> Attachments: YARN-8944.001.patch
>
>
> YARN-8896 changes the behaviour of the CapacityScheduler by limiting the 
> number of containers that can be allocated in one heartbeat. It is an 
> undocumented change in behaviour.
> The change breaks the junit test: 
> {{TestContainerAllocation.testUserLimitAllocationMultipleContainers}}
> The maximum number of containers that get assigned per heartbeat is 100, while 
> the test expects 199 to be assigned.






[jira] [Updated] (YARN-6851) Capacity Scheduler: document configs for controlling # containers allowed to be allocated per node heartbeat

2019-08-05 Thread Jonathan Hung (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-6851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated YARN-6851:

Fix Version/s: 3.0.4

> Capacity Scheduler: document configs for controlling # containers allowed to 
> be allocated per node heartbeat
> 
>
> Key: YARN-6851
> URL: https://issues.apache.org/jira/browse/YARN-6851
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Wei Yan
>Assignee: Wei Yan
>Priority: Minor
> Fix For: 3.1.0, 2.10.0, 2.9.1, 3.0.4
>
> Attachments: YARN-6851-branch-2.001.patch, YARN-6851.001.patch
>
>
> YARN-4161 introduces new configs for controlling how many containers are 
> allowed to be allocated in each node heartbeat. We also had the offswitchCount 
> config before. It would be better to document these configurations in the CS 
> section.






[jira] [Commented] (YARN-8944) TestContainerAllocation.testUserLimitAllocationMultipleContainers failure after YARN-8896

2019-08-05 Thread Jonathan Hung (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16900244#comment-16900244
 ] 

Jonathan Hung commented on YARN-8944:
-

Committed to branch-3.0 and branch-2.

> TestContainerAllocation.testUserLimitAllocationMultipleContainers failure 
> after YARN-8896
> -
>
> Key: YARN-8944
> URL: https://issues.apache.org/jira/browse/YARN-8944
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: capacity scheduler
>Affects Versions: 3.1.2, 3.2.1
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Minor
> Fix For: 2.10.0, 3.0.4, 3.1.2, 3.3.0, 3.2.1
>
> Attachments: YARN-8944.001.patch
>
>
> YARN-8896 changes the behaviour of the CapacityScheduler by limiting the 
> number of containers that can be allocated in one heartbeat. It is an 
> undocumented change in behaviour.
> The change breaks the junit test: 
> {{TestContainerAllocation.testUserLimitAllocationMultipleContainers}}
> The maximum number of containers that get assigned per heartbeat is 100, while 
> the test expects 199 to be assigned.






[jira] [Commented] (YARN-6851) Capacity Scheduler: document configs for controlling # containers allowed to be allocated per node heartbeat

2019-08-05 Thread Jonathan Hung (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16900243#comment-16900243
 ] 

Jonathan Hung commented on YARN-6851:
-

Committed to branch-3.0.

> Capacity Scheduler: document configs for controlling # containers allowed to 
> be allocated per node heartbeat
> 
>
> Key: YARN-6851
> URL: https://issues.apache.org/jira/browse/YARN-6851
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Wei Yan
>Assignee: Wei Yan
>Priority: Minor
> Fix For: 3.1.0, 2.10.0, 2.9.1, 3.0.4
>
> Attachments: YARN-6851-branch-2.001.patch, YARN-6851.001.patch
>
>
> YARN-4161 introduces new configs for controlling how many containers are 
> allowed to be allocated in each node heartbeat. We also had the offswitchCount 
> config before. It would be better to document these configurations in the CS 
> section.






[jira] [Commented] (YARN-8915) Update the doc about the default value of "maximum-container-assignments" for capacity scheduler

2019-08-05 Thread Jonathan Hung (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16900242#comment-16900242
 ] 

Jonathan Hung commented on YARN-8915:
-

Committed to branch-3.0 and branch-2.

> Update the doc about the default value of "maximum-container-assignments" for 
> capacity scheduler
> 
>
> Key: YARN-8915
> URL: https://issues.apache.org/jira/browse/YARN-8915
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Zhankun Tang
>Assignee: Zhankun Tang
>Priority: Minor
>  Labels: docuentation
> Fix For: 2.10.0, 3.0.4, 3.1.2, 3.3.0, 3.2.1
>
> Attachments: YARN-8915-trunk.001.patch, YARN-8915-trunk.002.patch
>
>
> Update default value of "maximum-container-assignments" from "-1" to 100.






[jira] [Commented] (YARN-8896) Limit the maximum number of container assignments per heartbeat

2019-08-05 Thread Jonathan Hung (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16900241#comment-16900241
 ] 

Jonathan Hung commented on YARN-8896:
-

Committed to branch-3.0 and branch-2.

> Limit the maximum number of container assignments per heartbeat
> ---
>
> Key: YARN-8896
> URL: https://issues.apache.org/jira/browse/YARN-8896
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.9.0, 3.0.0
>Reporter: Weiwei Yang
>Assignee: Zhankun Tang
>Priority: Major
> Fix For: 2.10.0, 3.0.4, 3.1.2, 3.2.1
>
> Attachments: YARN-8896-trunk.001.patch
>
>
> YARN-4161 adds a configuration, 
> {{yarn.scheduler.capacity.per-node-heartbeat.maximum-container-assignments}}, 
> to control the max number of container assignments per heartbeat; however, the 
> default value is -1. This could potentially cause the CS to get stuck in the 
> while loop, causing issues like YARN-8513. We should change this to a finite 
> number, e.g. 100.
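
A small C sketch of the loop-bounding behaviour, with the scheduler internals 
reduced to a hypothetical helper: with the old default of -1 the loop never 
yields, while a finite cap bounds the work done per node heartbeat.

{code}
#include <stdio.h>
#include <stdbool.h>

/* Hypothetical stand-in for "this node heartbeat can place one more container". */
static bool try_assign_container(void) {
    return true;  /* simulate a node that always has room */
}

int main(void) {
    const int max_assignments = 100;  /* finite cap instead of -1 (unbounded) */
    int assigned = 0;

    while (try_assign_container()) {
        assigned++;
        /* With -1 this check never fires and the scheduler can stay stuck
           in the loop (the YARN-8513 symptom); a finite cap forces a yield. */
        if (max_assignments != -1 && assigned >= max_assignments) {
            break;  /* remaining requests wait for the next heartbeat */
        }
    }
    printf("assigned %d containers in this heartbeat\n", assigned);
    return 0;
}
{code}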






[jira] [Updated] (YARN-8915) Update the doc about the default value of "maximum-container-assignments" for capacity scheduler

2019-08-05 Thread Jonathan Hung (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated YARN-8915:

Fix Version/s: 3.0.4
   2.10.0

> Update the doc about the default value of "maximum-container-assignments" for 
> capacity scheduler
> 
>
> Key: YARN-8915
> URL: https://issues.apache.org/jira/browse/YARN-8915
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Zhankun Tang
>Assignee: Zhankun Tang
>Priority: Minor
>  Labels: docuentation
> Fix For: 2.10.0, 3.0.4, 3.1.2, 3.3.0, 3.2.1
>
> Attachments: YARN-8915-trunk.001.patch, YARN-8915-trunk.002.patch
>
>
> Update default value of "maximum-container-assignments" from "-1" to 100.






[jira] [Updated] (YARN-8896) Limit the maximum number of container assignments per heartbeat

2019-08-05 Thread Jonathan Hung (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung updated YARN-8896:

Fix Version/s: 3.0.4
   2.10.0

> Limit the maximum number of container assignments per heartbeat
> ---
>
> Key: YARN-8896
> URL: https://issues.apache.org/jira/browse/YARN-8896
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.9.0, 3.0.0
>Reporter: Weiwei Yang
>Assignee: Zhankun Tang
>Priority: Major
> Fix For: 2.10.0, 3.0.4, 3.1.2, 3.2.1
>
> Attachments: YARN-8896-trunk.001.patch
>
>
> YARN-4161 adds a configuration, 
> {{yarn.scheduler.capacity.per-node-heartbeat.maximum-container-assignments}}, 
> to control the max number of container assignments per heartbeat; however, the 
> default value is -1. This could potentially cause the CS to get stuck in the 
> while loop, causing issues like YARN-8513. We should change this to a finite 
> number, e.g. 100.






[jira] [Commented] (YARN-9667) Container-executor.c duplicates messages to stdout

2019-08-05 Thread Peter Bacsko (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16900236#comment-16900236
 ] 

Peter Bacsko commented on YARN-9667:


Ah, sorry, that wasn't clear. I kept those only because they occur before a call 
to {{fork()}}. So basically, whatever is inside the buffers will be forced out. 
Ideally nothing should be there, but it's a good safety net.

> Container-executor.c duplicates messages to stdout
> --
>
> Key: YARN-9667
> URL: https://issues.apache.org/jira/browse/YARN-9667
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, yarn
>Affects Versions: 3.2.0
>Reporter: Adam Antal
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-9667-001.patch
>
>
> When a container is killed by its AM, we get an error message similar to this:
> {noformat}
> 2019-06-30 12:09:04,412 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor:
>  Shell execution returned exit code: 143. Privileged Execution Operation 
> Stderr:
> Stdout: main : command provided 1
> main : run as user is systest
> main : requested yarn user is systest
> Getting exit code file...
> Creating script paths...
> Writing pid file...
> Writing to tmp file 
> /yarn/nm/nmPrivate/application_1561921629886_0001/container_e84_1561921629886_0001_01_19/container_e84_1561921629886_0001_01_19.pid.tmp
> Writing to cgroup task files...
> Creating local dirs...
> Launching container...
> Getting exit code file...
> Creating script paths...
> {noformat}
> In container-executor.c the fork point is right after the "Creating script 
> paths..." part, yet in the stdout log we can clearly see it has been written 
> twice. After consulting with [~pbacsko], it seems there is a missing flush in 
> container-executor.c before the fork, and that causes the duplication.
> I suggest adding a flush there so that the output won't be duplicated: it's a 
> bit misleading that the child process writes out "Getting exit code file" and 
> "Creating script paths" even though it is clearly not doing that.
> A more appealing solution could be to revisit the fprintf-fflush pairs in the 
> code and change them to a single call, so that the fflush calls would not be 
> forgotten accidentally. (That can cause problems in every place where it's 
> used.)
> Note: this issue probably affects every occurrence of fork(), not just the one 
> from {{launch_container_as_user}} in {{main.c}}.






[jira] [Commented] (YARN-9667) Container-executor.c duplicates messages to stdout

2019-08-05 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16900235#comment-16900235
 ] 

Eric Yang commented on YARN-9667:
-

[~pbacsko] {quote}I kept the existing fprintf/fflush pair in the tests because 
it happens before fork() and it feels like a good practice. In fact, we can add 
this to production code as well.{quote}

I don't understand the statement here. It sounds like you prefer to keep 
fprintf/fflush in production code? The patch is moving in the opposite 
direction, removing fflush. Can you elaborate on which direction is preferred?
Thank you for the patch.  I tested this patch on my cluster this morning and it 
worked fine.

 

> Container-executor.c duplicates messages to stdout
> --
>
> Key: YARN-9667
> URL: https://issues.apache.org/jira/browse/YARN-9667
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, yarn
>Affects Versions: 3.2.0
>Reporter: Adam Antal
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-9667-001.patch
>
>
> When a container is killed by its AM, we get an error message similar to this:
> {noformat}
> 2019-06-30 12:09:04,412 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor:
>  Shell execution returned exit code: 143. Privileged Execution Operation 
> Stderr:
> Stdout: main : command provided 1
> main : run as user is systest
> main : requested yarn user is systest
> Getting exit code file...
> Creating script paths...
> Writing pid file...
> Writing to tmp file 
> /yarn/nm/nmPrivate/application_1561921629886_0001/container_e84_1561921629886_0001_01_19/container_e84_1561921629886_0001_01_19.pid.tmp
> Writing to cgroup task files...
> Creating local dirs...
> Launching container...
> Getting exit code file...
> Creating script paths...
> {noformat}
> In container-executor.c the fork point is right after the "Creating script 
> paths..." part, yet in the stdout log we can clearly see it has been written 
> twice. After consulting with [~pbacsko], it seems there is a missing flush in 
> container-executor.c before the fork, and that causes the duplication.
> I suggest adding a flush there so that the output won't be duplicated: it's a 
> bit misleading that the child process writes out "Getting exit code file" and 
> "Creating script paths" even though it is clearly not doing that.
> A more appealing solution could be to revisit the fprintf-fflush pairs in the 
> code and change them to a single call, so that the fflush calls would not be 
> forgotten accidentally. (That can cause problems in every place where it's 
> used.)
> Note: this issue probably affects every occurrence of fork(), not just the one 
> from {{launch_container_as_user}} in {{main.c}}.






[jira] [Updated] (YARN-9722) UserGroupMappingPlacementRule logs object ID in place of queue name.

2019-08-05 Thread Prabhu Joseph (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-9722:

Labels: supportability  (was: )

> UserGroupMappingPlacementRule logs object ID in place of queue name.
> 
>
> Key: YARN-9722
> URL: https://issues.apache.org/jira/browse/YARN-9722
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Minor
>  Labels: supportability
> Attachments: YARN-9722-001.patch
>
>
> UserGroupMappingPlacementRule logs object ID in place of queue name.
> {code}
> 2019-08-05 09:28:52,664 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.placement.UserGroupMappingPlacementRule:
>  Application application_1564996871731_0003 user ambari-qa mapping [default] 
> to 
> [org.apache.hadoop.yarn.server.resourcemanager.placement.ApplicationPlacementContext@5aafe9b2]
>  override false
> {code}






[jira] [Updated] (YARN-9722) UserGroupMappingPlacementRule logs object ID in place of queue name.

2019-08-05 Thread Prabhu Joseph (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-9722:

Attachment: YARN-9722-001.patch

> UserGroupMappingPlacementRule logs object ID in place of queue name.
> 
>
> Key: YARN-9722
> URL: https://issues.apache.org/jira/browse/YARN-9722
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Minor
> Attachments: YARN-9722-001.patch
>
>
> UserGroupMappingPlacementRule logs object ID in place of queue name.
> {code}
> 2019-08-05 09:28:52,664 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.placement.UserGroupMappingPlacementRule:
>  Application application_1564996871731_0003 user ambari-qa mapping [default] 
> to 
> [org.apache.hadoop.yarn.server.resourcemanager.placement.ApplicationPlacementContext@5aafe9b2]
>  override false
> {code}






[jira] [Created] (YARN-9722) UserGroupMappingPlacementRule logs object ID in place of queue name.

2019-08-05 Thread Prabhu Joseph (JIRA)
Prabhu Joseph created YARN-9722:
---

 Summary: UserGroupMappingPlacementRule logs object ID in place of 
queue name.
 Key: YARN-9722
 URL: https://issues.apache.org/jira/browse/YARN-9722
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 3.3.0
Reporter: Prabhu Joseph
Assignee: Prabhu Joseph


UserGroupMappingPlacementRule logs object ID in place of queue name.

{code}
2019-08-05 09:28:52,664 INFO 
org.apache.hadoop.yarn.server.resourcemanager.placement.UserGroupMappingPlacementRule:
 Application application_1564996871731_0003 user ambari-qa mapping [default] to 
[org.apache.hadoop.yarn.server.resourcemanager.placement.ApplicationPlacementContext@5aafe9b2]
 override false
{code}
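
The bug itself is in Java, but a C analogy shows the difference: printing the 
object's identity (the counterpart of Java's default Object.toString(), which 
produces the "...Context@5aafe9b2" output above) versus printing the field that 
operators actually need.

{code}
#include <stdio.h>

/* Stand-in for ApplicationPlacementContext; only the queue name matters here. */
struct placement_context {
    const char *queue;
};

int main(void) {
    struct placement_context ctx = { "default" };

    /* Logging the object itself only shows an opaque identity... */
    printf("mapping [default] to [%p] override false\n", (void *)&ctx);

    /* ...logging the queue name is what the log line should say. */
    printf("mapping [default] to [%s] override false\n", ctx.queue);
    return 0;
}
{code}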






[jira] [Commented] (YARN-9719) Failed to restart yarn-service if it doesn’t exist in RM

2019-08-05 Thread Prabhu Joseph (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16900229#comment-16900229
 ] 

Prabhu Joseph commented on YARN-9719:
-

[~kyungwan nam] [^YARN-9719.002.patch] looks good. +1 (non-binding).

[~eyang] Can you review this Jira when you get time? This fixes the yarn native 
service failing to start when the previous run's ApplicationReport is not in RM 
memory or AHS.

> Failed to restart yarn-service if it doesn’t exist in RM
> 
>
> Key: YARN-9719
> URL: https://issues.apache.org/jira/browse/YARN-9719
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: kyungwan nam
>Assignee: kyungwan nam
>Priority: Major
> Attachments: YARN-9719.001.patch, YARN-9719.002.patch
>
>
> Sometimes, restarting a yarn-service fails as follows.
> {code}
> {"diagnostics":"Application with id 'application_1562735362534_10461' doesn't 
> exist in RM. Please check that the job submission was successful.\n\tat 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:382)\n\tat
>  
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:234)\n\tat
>  
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:561)\n\tat
>  
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)\n\tat
>  org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)\n\tat 
> org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)\n\tat 
> org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818)\n\tat 
> java.security.AccessController.doPrivileged(Native Method)\n\tat 
> javax.security.auth.Subject.doAs(Subject.java:422)\n\tat 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)\n\tat
>  org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678)\n"}
> {code}
> It seems to occur when restarting a yarn-service that was stopped long ago.
> By default, the RM keeps up to 1000 completed applications 
> (yarn.resourcemanager.max-completed-applications).






[jira] [Commented] (YARN-9667) Container-executor.c duplicates messages to stdout

2019-08-05 Thread Peter Bacsko (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16900227#comment-16900227
 ] 

Peter Bacsko commented on YARN-9667:


[~eyang] [~snemeth] pls check this out when you have some time.

> Container-executor.c duplicates messages to stdout
> --
>
> Key: YARN-9667
> URL: https://issues.apache.org/jira/browse/YARN-9667
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, yarn
>Affects Versions: 3.2.0
>Reporter: Adam Antal
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-9667-001.patch
>
>
> When a container is killed by its AM, we get an error message similar to this:
> {noformat}
> 2019-06-30 12:09:04,412 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor:
>  Shell execution returned exit code: 143. Privileged Execution Operation 
> Stderr:
> Stdout: main : command provided 1
> main : run as user is systest
> main : requested yarn user is systest
> Getting exit code file...
> Creating script paths...
> Writing pid file...
> Writing to tmp file 
> /yarn/nm/nmPrivate/application_1561921629886_0001/container_e84_1561921629886_0001_01_19/container_e84_1561921629886_0001_01_19.pid.tmp
> Writing to cgroup task files...
> Creating local dirs...
> Launching container...
> Getting exit code file...
> Creating script paths...
> {noformat}
> In container-executor.c the fork point is right after the "Creating script 
> paths..." part, yet in the stdout log we can clearly see it has been written 
> twice. After consulting with [~pbacsko], it seems there is a missing flush in 
> container-executor.c before the fork, and that causes the duplication.
> I suggest adding a flush there so that the output won't be duplicated: it's a 
> bit misleading that the child process writes out "Getting exit code file" and 
> "Creating script paths" even though it is clearly not doing that.
> A more appealing solution could be to revisit the fprintf-fflush pairs in the 
> code and change them to a single call, so that the fflush calls would not be 
> forgotten accidentally. (That can cause problems in every place where it's 
> used.)
> Note: this issue probably affects every occurrence of fork(), not just the one 
> from {{launch_container_as_user}} in {{main.c}}.






[jira] [Commented] (YARN-9667) Container-executor.c duplicates messages to stdout

2019-08-05 Thread Peter Bacsko (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16900226#comment-16900226
 ] 

Peter Bacsko commented on YARN-9667:


Changes in the patch:
 # Removed unnecessary {{fflush()}} calls (almost all of them)
 # Added the missing "\n" to the {{fprintf()}} calls, which forces a flush due 
to {{_IOLBF}}
 # Added {{setvbuf()}} to {{main.c}} and to the unit tests

I didn't modify the child process handling logic to call {{_exit()}}; let's 
discuss that in a separate JIRA.

I kept the existing {{fprintf}}/{{fflush}} pair in the tests because it happens 
before {{fork()}} and it feels like a good practice. In fact, we can add this 
to production code as well.
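
For reference, a minimal self-contained C program (not the container-executor 
code itself) showing the pattern the patch applies: line-buffer stdout with 
{{setvbuf()}}, and flush defensively before {{fork()}} so the child never 
inherits pending output.

{code}
#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void) {
    /* Line-buffer stdout: every message ending in "\n" is flushed
       immediately and can never sit in the buffer when fork() copies
       the process image (the cause of the duplicated lines). */
    setvbuf(stdout, NULL, _IOLBF, 0);

    printf("Creating script paths...\n");  /* flushed here due to _IOLBF */

    /* Defensive flush before fork(): forces out anything still buffered,
       e.g. a message written without a trailing "\n". */
    fflush(stdout);

    pid_t pid = fork();
    if (pid == -1) {
        perror("fork");
        return 1;
    }
    if (pid == 0) {
        printf("Launching container...\n");  /* printed once, by the child only */
        _exit(0);  /* avoids re-flushing inherited stdio state on exit */
    }
    waitpid(pid, NULL, 0);
    return 0;
}
{code}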

> Container-executor.c duplicates messages to stdout
> --
>
> Key: YARN-9667
> URL: https://issues.apache.org/jira/browse/YARN-9667
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, yarn
>Affects Versions: 3.2.0
>Reporter: Adam Antal
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-9667-001.patch
>
>
> When a container is killed by its AM, we get an error message similar to this:
> {noformat}
> 2019-06-30 12:09:04,412 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor:
>  Shell execution returned exit code: 143. Privileged Execution Operation 
> Stderr:
> Stdout: main : command provided 1
> main : run as user is systest
> main : requested yarn user is systest
> Getting exit code file...
> Creating script paths...
> Writing pid file...
> Writing to tmp file 
> /yarn/nm/nmPrivate/application_1561921629886_0001/container_e84_1561921629886_0001_01_19/container_e84_1561921629886_0001_01_19.pid.tmp
> Writing to cgroup task files...
> Creating local dirs...
> Launching container...
> Getting exit code file...
> Creating script paths...
> {noformat}
> In container-executor.c the fork point is right after the "Creating script 
> paths..." part, yet in the stdout log we can clearly see it has been written 
> twice. After consulting with [~pbacsko], it seems there is a missing flush in 
> container-executor.c before the fork, and that causes the duplication.
> I suggest adding a flush there so that the output won't be duplicated: it's a 
> bit misleading that the child process writes out "Getting exit code file" and 
> "Creating script paths" even though it is clearly not doing that.
> A more appealing solution could be to revisit the fprintf-fflush pairs in the 
> code and change them to a single call, so that the fflush calls would not be 
> forgotten accidentally. (That can cause problems in every place where it's 
> used.)
> Note: this issue probably affects every occurrence of fork(), not just the one 
> from {{launch_container_as_user}} in {{main.c}}.






[jira] [Commented] (YARN-9667) Container-executor.c duplicates messages to stdout

2019-08-05 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16900225#comment-16900225
 ] 

Hadoop QA commented on YARN-9667:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
21s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
3s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
31m 43s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  1m  
6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 52s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 21m 
12s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
41s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 68m 47s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:bdbca0e |
| JIRA Issue | YARN-9667 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12976707/YARN-9667-001.patch |
| Optional Tests |  dupname  asflicense  compile  cc  mvnsite  javac  unit  |
| uname | Linux 1e7d18c37fb5 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 
17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 71aad60 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_212 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/24470/testReport/ |
| Max. process+thread count | 446 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/24470/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Container-executor.c duplicates messages to stdout
> --
>
> Key: YARN-9667
> URL: https://issues.apache.org/jira/browse/YARN-9667
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, yarn
>Affects Versions: 3.2.0
>Reporter: Adam Antal
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-9667-001.patch
>
>
> When a container is killed by its AM, we get an error message similar to this:
> {noformat}
> 2019-06-30 12:09:04,412 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor:
>  Shell execution returned exit code: 143. Privileged Execution Operation 
> 

[jira] [Commented] (YARN-9564) Create docker-to-squash tool for image conversion

2019-08-05 Thread Eric Badger (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16900209#comment-16900209
 ] 

Eric Badger commented on YARN-9564:
---

Hi [~eyang], appreciate the comments. First and foremost, were you eventually 
able to get the script to work and correctly set up the squashFS layers, config, 
manifest, and image-tag-to-hash file? I know that the script is far from 
perfect and certainly has bugs and room for improvement. But my main reason for 
posting it was to help set up the runC environment so that you can get YARN-9562 
and YARN-9561 running.

bq. 5. This script requires password less sudo. This is usually a no-no from 
Enterprise user. What can we do about these sudo commands? Maybe assume this 
script require admin privileges?

Yes, you'll need to run this script with sudo due to a few of the commands in 
the script. It's probably easiest to run the whole script as root, but I like 
to run as little as possible as root. I could require that the script run as 
root and then drop privileges when they aren't needed.

bq. 7. How about having this writing in Java?
I believe that [~ccondit] has a Java version of something similar to this tool. 
I don't think I'm going to have time to rewrite this in Java, but we might be 
able to leverage his tool if you think that approach is better.

I will fix the rest of the issues that you raised. 
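
To make "run as root, then drop privileges" concrete, a hedged C sketch (the 
tool itself is a script, and uid/gid 1000 is just an example): do the 
privileged setup first, then permanently switch to an unprivileged identity.

{code}
#include <stdio.h>
#include <unistd.h>

int main(void) {
    if (getuid() == 0) {
        /* ... privileged setup: mount layers, mksquashfs, chown ... */

        /* Drop the group first, then the user; once setuid() succeeds
           the process cannot regain root. (1000 is an example id.) */
        if (setgid(1000) != 0 || setuid(1000) != 0) {
            perror("failed to drop privileges");
            return 1;
        }
    }
    printf("continuing as uid %d\n", (int)getuid());
    return 0;
}
{code}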

> Create docker-to-squash tool for image conversion
> -
>
> Key: YARN-9564
> URL: https://issues.apache.org/jira/browse/YARN-9564
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
> Attachments: YARN-9564.001.patch
>
>
> The new runc runtime uses docker images that are converted into multiple 
> squashfs images. Each layer of the docker image will get its own squashfs 
> image. We need a tool to help automate the creation of these squashfs images 
> when all we have is a docker image.






[jira] [Updated] (YARN-9667) Container-executor.c duplicates messages to stdout

2019-08-05 Thread Peter Bacsko (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-9667:
---
Attachment: YARN-9667-001.patch

> Container-executor.c duplicates messages to stdout
> --
>
> Key: YARN-9667
> URL: https://issues.apache.org/jira/browse/YARN-9667
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, yarn
>Affects Versions: 3.2.0
>Reporter: Adam Antal
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-9667-001.patch
>
>
> When a container is killed by its AM, we get an error message similar to this:
> {noformat}
> 2019-06-30 12:09:04,412 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor:
>  Shell execution returned exit code: 143. Privileged Execution Operation 
> Stderr:
> Stdout: main : command provided 1
> main : run as user is systest
> main : requested yarn user is systest
> Getting exit code file...
> Creating script paths...
> Writing pid file...
> Writing to tmp file 
> /yarn/nm/nmPrivate/application_1561921629886_0001/container_e84_1561921629886_0001_01_19/container_e84_1561921629886_0001_01_19.pid.tmp
> Writing to cgroup task files...
> Creating local dirs...
> Launching container...
> Getting exit code file...
> Creating script paths...
> {noformat}
> In container-executor.c the fork point is right after the "Creating script 
> paths..." part, yet in the Stdout log we can clearly see those messages 
> written twice. After consulting with [~pbacsko], it seems there is a missing 
> flush in container-executor.c before the fork: the child inherits the 
> parent's unflushed stdio buffer, so everything still sitting in that buffer 
> is written out by both processes.
> I suggest adding a flush there so that the messages won't be duplicated; it 
> is misleading that the child process appears to write out "Getting exit code 
> file" and "Creating script paths" even though it is clearly not doing that.
> A more appealing solution could be to revisit the fprintf-fflush pairs in 
> the code and combine each pair into a single helper call, so that the fflush 
> calls cannot be forgotten accidentally. (A forgotten flush can cause 
> problems in every place where it happens.)
> Note: this issue probably affects every occurrence of fork(), not just the 
> one in {{launch_container_as_user}} in {{main.c}}.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9721) An easy method to exclude a nodemanager from the yarn cluster cleanly

2019-08-05 Thread Zac Zhou (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zac Zhou updated YARN-9721:
---
Description: 
If we want to take a nodemanager server offline, nodes.exclude-path
 and the "rmadmin -refreshNodes" command are used to decommission the server.
 But this method cannot clean up the node cleanly: nodemanager servers are 
still listed under Decommissioned Nodes, as the attachment shows.

  !decommission nodes.png!

YARN-4311 enabled a removalTimer to clean up untracked nodes.
 But the logic of the isUntrackedNode method is too restrictive: if 
include-path is not used, no server can meet the criteria, and using an 
include file would introduce a potential maintenance risk.

If the yarn cluster is installed in the cloud, nodemanager servers are created 
and deleted frequently, so we need a way to exclude a nodemanager from the 
yarn cluster cleanly. Otherwise, the map returned by 
rmContext.getInactiveRMNodes() would keep growing, which would cause a memory 
issue in the RM.

  was:
If we want to take a nodemanager server offline, nodes.exclude-path
 and the "rmadmin -refreshNodes" command are used to decommission the server.
 But this method cannot clean up the node cleanly: nodemanager servers are 
still listed under Decommissioned Nodes, as the attachment shows.

 

YARN-4311 enabled a removalTimer to clean up untracked nodes.
 But the logic of the isUntrackedNode method is too restrictive: if 
include-path is not used, no server can meet the criteria, and using an 
include file would introduce a potential maintenance risk.

If the yarn cluster is installed in the cloud, nodemanager servers are created 
and deleted frequently, so we need a way to exclude a nodemanager from the 
yarn cluster cleanly. Otherwise, the map returned by 
rmContext.getInactiveRMNodes() would keep growing, which would cause a memory 
issue in the RM.


> An easy method to exclude a nodemanager from the yarn cluster cleanly
> -
>
> Key: YARN-9721
> URL: https://issues.apache.org/jira/browse/YARN-9721
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zac Zhou
>Priority: Major
> Attachments: decommission nodes.png
>
>
> If we want to take a nodemanager server offline, nodes.exclude-path
>  and the "rmadmin -refreshNodes" command are used to decommission the server.
>  But this method cannot clean up the node cleanly: nodemanager servers are 
> still listed under Decommissioned Nodes, as the attachment shows.
>   !decommission nodes.png!
> YARN-4311 enabled a removalTimer to clean up untracked nodes.
>  But the logic of the isUntrackedNode method is too restrictive: if 
> include-path is not used, no server can meet the criteria, and using an 
> include file would introduce a potential maintenance risk.
> If the yarn cluster is installed in the cloud, nodemanager servers are 
> created and deleted frequently, so we need a way to exclude a nodemanager 
> from the yarn cluster cleanly. Otherwise, the map returned by 
> rmContext.getInactiveRMNodes() would keep growing, which would cause a 
> memory issue in the RM.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9721) An easy method to exclude a nodemanager from the yarn cluster cleanly

2019-08-05 Thread Zac Zhou (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zac Zhou updated YARN-9721:
---
Description: 
If we want to take a nodemanager server offline, nodes.exclude-path
 and the "rmadmin -refreshNodes" command are used to decommission the server.
 But this method cannot clean up the node cleanly: nodemanager servers are 
still listed under Decommissioned Nodes, as the attachment shows.

 

YARN-4311 enabled a removalTimer to clean up untracked nodes.
 But the logic of the isUntrackedNode method is too restrictive: if 
include-path is not used, no server can meet the criteria, and using an 
include file would introduce a potential maintenance risk.

If the yarn cluster is installed in the cloud, nodemanager servers are created 
and deleted frequently, so we need a way to exclude a nodemanager from the 
yarn cluster cleanly. Otherwise, the map returned by 
rmContext.getInactiveRMNodes() would keep growing, which would cause a memory 
issue in the RM.

  was:
If we want to take a nodemanager server offline, nodes.exclude-path
and the "rmadmin -refreshNodes" command are used to decommission the server.
But this method cannot clean up the node cleanly: nodemanager servers are 
still listed under Decommissioned Nodes, as the attachment shows.

[YARN-4311|https://issues.apache.org/jira/browse/YARN-4311] enabled a 
removalTimer to clean up untracked nodes.
But the logic of the isUntrackedNode method is too restrictive: if 
include-path is not used, no server can meet the criteria, and using an 
include file would introduce a potential maintenance risk.

If the yarn cluster is installed in the cloud, nodemanager servers are created 
and deleted frequently, so we need a way to exclude a nodemanager from the 
yarn cluster cleanly. Otherwise, the map returned by 
rmContext.getInactiveRMNodes() would keep growing, which would cause a memory 
issue in the RM.


> An easy method to exclude a nodemanager from the yarn cluster cleanly
> -
>
> Key: YARN-9721
> URL: https://issues.apache.org/jira/browse/YARN-9721
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zac Zhou
>Priority: Major
> Attachments: decommission nodes.png
>
>
> If we want to take a nodemanager server offline, nodes.exclude-path
>  and the "rmadmin -refreshNodes" command are used to decommission the server.
>  But this method cannot clean up the node cleanly: nodemanager servers are 
> still listed under Decommissioned Nodes, as the attachment shows.
>  
> YARN-4311 enabled a removalTimer to clean up untracked nodes.
>  But the logic of the isUntrackedNode method is too restrictive: if 
> include-path is not used, no server can meet the criteria, and using an 
> include file would introduce a potential maintenance risk.
> If the yarn cluster is installed in the cloud, nodemanager servers are 
> created and deleted frequently, so we need a way to exclude a nodemanager 
> from the yarn cluster cleanly. Otherwise, the map returned by 
> rmContext.getInactiveRMNodes() would keep growing, which would cause a 
> memory issue in the RM.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9094) Remove unused interface method: NodeResourceUpdaterPlugin#handleUpdatedResourceFromRM

2019-08-05 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16899986#comment-16899986
 ] 

Hadoop QA commented on YARN-9094:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
20s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 23s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
7s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
29s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 59s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 21m 
30s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
33s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 73m 28s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:bdbca0e |
| JIRA Issue | YARN-9094 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12976677/YARN-9094.001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux e20525f25f53 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 
17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / f8ea6e1 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_212 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/24469/testReport/ |
| Max. process+thread count | 448 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/24469/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.

[jira] [Created] (YARN-9721) An easy method to exclude a nodemanager from the yarn cluster cleanly

2019-08-05 Thread Zac Zhou (JIRA)
Zac Zhou created YARN-9721:
--

 Summary: An easy method to exclude a nodemanager from the yarn 
cluster cleanly
 Key: YARN-9721
 URL: https://issues.apache.org/jira/browse/YARN-9721
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zac Zhou
 Attachments: decommission nodes.png

If we want to take a nodemanager server offline, nodes.exclude-path
and the "rmadmin -refreshNodes" command are used to decommission the server.
But this method cannot clean up the node cleanly: nodemanager servers are 
still listed under Decommissioned Nodes, as the attachment shows.

[YARN-4311|https://issues.apache.org/jira/browse/YARN-4311] enabled a 
removalTimer to clean up untracked nodes.
But the logic of the isUntrackedNode method is too restrictive: if 
include-path is not used, no server can meet the criteria, and using an 
include file would introduce a potential maintenance risk.

If the yarn cluster is installed in the cloud, nodemanager servers are created 
and deleted frequently, so we need a way to exclude a nodemanager from the 
yarn cluster cleanly. Otherwise, the map returned by 
rmContext.getInactiveRMNodes() would keep growing, which would cause a memory 
issue in the RM.
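
For illustration only, a minimal sketch of the kind of relaxed check being 
asked for, where a node can become untracked even when no include file is 
configured. The names are hypothetical and the actual NodesListManager logic 
may differ:

{code:java}
import java.util.Set;

/**
 * Illustrative only: a relaxed variant of the YARN-4311 untracked-node
 * check described above. Hypothetical names, not the actual RM source.
 */
public class UntrackedNodeCheck {

  public static boolean isUntracked(String host,
      Set<String> include, Set<String> exclude) {
    if (include.isEmpty()) {
      // No include file configured: fall back to "not explicitly
      // excluded", so decommissioned cloud nodes can still age out of
      // rmContext.getInactiveRMNodes() via the removal timer.
      return !exclude.contains(host);
    }
    // Original criteria: untracked means present in neither list.
    return !include.contains(host) && !exclude.contains(host);
  }
}
{code}

With such a fallback, an elastic cluster that never configures an include file 
could still have its decommissioned nodes removed by the removal timer.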



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9096) Some GpuResourcePlugin and ResourcePluginManager methods are synchronized unnecessarily

2019-08-05 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16899953#comment-16899953
 ] 

Hadoop QA commented on YARN-9096:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
42s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
 7s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
8s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 29s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
27s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
22s{color} | {color:green} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:
 The patch generated 0 new + 1 unchanged - 1 fixed = 1 total (was 2) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 59s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 21m 
58s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
32s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 75m 45s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=19.03.1 Server=19.03.1 Image:yetus/hadoop:bdbca0e53b4 |
| JIRA Issue | YARN-9096 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12976670/YARN-9096.002.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 4dacfc63b836 4.15.0-48-generic #51-Ubuntu SMP Wed Apr 3 
08:28:49 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / f8ea6e1 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_212 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/24468/testReport/ |
| Max. process+thread count | 341 (vs. ulimit of 5500) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/24468/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.

[jira] [Updated] (YARN-9094) Remove unused interface method: NodeResourceUpdaterPlugin#handleUpdatedResourceFromRM

2019-08-05 Thread Szilard Nemeth (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated YARN-9094:
-
Attachment: YARN-9094.001.patch

> Remove unused interface method: 
> NodeResourceUpdaterPlugin#handleUpdatedResourceFromRM
> -
>
> Key: YARN-9094
> URL: https://issues.apache.org/jira/browse/YARN-9094
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Gergely Pollak
>Priority: Trivial
>  Labels: newbie, newbie++
> Attachments: YARN-9094.001.patch, YARN-9094.001.patch
>
>
> Additionally, there's a typo can be fixed in the javadoc of 
> NodeResourceUpdaterPlugin#updateConfiguredResource: look for "mododule"



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-9134) No test coverage for redefining FPGA / GPU resource types in TestResourceUtils

2019-08-05 Thread Szilard Nemeth (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16883603#comment-16883603
 ] 

Szilard Nemeth edited comment on YARN-9134 at 8/5/19 8:48 AM:
--

Hi [~pbacsko]!
Could you please check the javac issue?
Thanks!


was (Author: snemeth):
Hi [~pbacsko]!
Could yoy please check the javac issue?
Thanks!

> No test coverage for redefining FPGA / GPU resource types in TestResourceUtils
> --
>
> Key: YARN-9134
> URL: https://issues.apache.org/jira/browse/YARN-9134
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-9134.001.patch, YARN-9134.002.patch
>
>
> The patch also includes some trivial code cleanup.
> Also, setupResourceTypes has been deprecated as it is dangerous to use, see 
> the javadoc for details.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9134) No test coverage for redefining FPGA / GPU resource types in TestResourceUtils

2019-08-05 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16899891#comment-16899891
 ] 

Hadoop QA commented on YARN-9134:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  5s{color} 
| {color:red} YARN-9134 does not apply to trunk. Rebase required? Wrong Branch? 
See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-9134 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12964991/YARN-9134.002.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/24467/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> No test coverage for redefining FPGA / GPU resource types in TestResourceUtils
> --
>
> Key: YARN-9134
> URL: https://issues.apache.org/jira/browse/YARN-9134
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-9134.001.patch, YARN-9134.002.patch
>
>
> The patch also includes some trivial code cleanup.
> Also, setupResourceTypes has been deprecated as it is dangerous to use, see 
> the javadoc for details.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8366) Expose debug log information when user intend to enable GPU without setting nvidia-smi path

2019-08-05 Thread Adam Antal (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16899876#comment-16899876
 ] 

Adam Antal commented on YARN-8366:
--

Is there any update on this [~Zian Chen]?

> Expose debug log information when user intend to enable GPU without setting 
> nvidia-smi path
> ---
>
> Key: YARN-8366
> URL: https://issues.apache.org/jira/browse/YARN-8366
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.0.0
>Reporter: Zian Chen
>Assignee: Zian Chen
>Priority: Major
>
> Exposing debug information helps users find the root cause of failure when 
> they don't make these two settings manually before enabling GPU on YARN:
> 1. yarn.nodemanager.resource-plugins.gpu.path-to-discovery-executables in 
> yarn-site.xml
> 2. the environment variable LD_LIBRARY_PATH



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9134) No test coverage for redefining FPGA / GPU resource types in TestResourceUtils

2019-08-05 Thread Adam Antal (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16899874#comment-16899874
 ] 

Adam Antal commented on YARN-9134:
--

[~pbacsko], Is there any update on this?

> No test coverage for redefining FPGA / GPU resource types in TestResourceUtils
> --
>
> Key: YARN-9134
> URL: https://issues.apache.org/jira/browse/YARN-9134
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: YARN-9134.001.patch, YARN-9134.002.patch
>
>
> The patch also includes some trivial code cleanup.
> Also, setupResourceTypes has been deprecated as it is dangerous to use, see 
> the javadoc for details.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9096) Some GpuResourcePlugin and ResourcePluginManager methods are synchronized unnecessarily

2019-08-05 Thread Adam Antal (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Antal updated YARN-9096:
-
Attachment: YARN-9096.002.patch

> Some GpuResourcePlugin and ResourcePluginManager methods are synchronized 
> unnecessarily
> ---
>
> Key: YARN-9096
> URL: https://issues.apache.org/jira/browse/YARN-9096
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Gergely Pollak
>Priority: Major
> Attachments: YARN-9096.001.patch, YARN-9096.002.patch, 
> YARN-9096.002.patch
>
>
> These methods are not used concurrently, they are part of the initialization 
> code of NM that happens from one thread.
> This is the list of the call hierarchies: 
> 1. GpuResourcePlugin.initialize + ResourcePluginManager.initialize
>  
> {code:java}
> GpuResourcePlugin.initialize(Context) 
> (org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.gpu)
>  ResourcePluginManager.initialize(Context) 
> (org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin) 
> NodeManager.serviceInit(Configuration) 
> (org.apache.hadoop.yarn.server.nodemanager){code}
>  
>  
> 2. GpuResourcePlugin.createResourceHandler: 
>  
> {code:java}
> GpuResourcePlugin.createResourceHandler(Context, CGroupsHandler, 
> PrivilegedOperationExecutor) 
> (org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.gpu)
>  ResourceHandlerModule.addHandlersFromConfiguredResourcePlugins(List, 
> Configuration, Context) 
> (org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources) 
> ResourceHandlerModule.initializeConfiguredResourceHandlerChain(Configuration, 
> Context) 
> (org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources) 
> ResourceHandlerModule.getConfiguredResourceHandlerChain(Configuration, 
> Context) 
> (org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources) 
> ContainerScheduler.serviceInit(Configuration) 
> (org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler) 
> LinuxContainerExecutor.init(Context) 
> (org.apache.hadoop.yarn.server.nodemanager)
> {code}
>  
> 3. GpuResourcePlugin.getNodeResourceHandlerInstance: 
>  
> {code:java}
> GpuResourcePlugin.getNodeResourceHandlerInstance() 
> (org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.gpu)
> NodeStatusUpdaterImpl.updateConfiguredResourcesViaPlugins(Resource)(2 usages) 
> (org.apache.hadoop.yarn.server.nodemanager)
> NodeStatusUpdaterImpl.serviceInit(Configuration) 
> (org.apache.hadoop.yarn.server.nodemanager)
> {code}
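
As a schematic illustration (hypothetical class, not the actual 
GpuResourcePlugin source): because every chain above starts from a 
single-threaded service-initialization path, dropping the synchronized 
keyword changes no behavior:

{code:java}
/**
 * Schematic only: a method invoked solely from the NM's single-threaded
 * serviceInit path gains nothing from locking.
 */
public class InitOnlyPlugin {
  private boolean initialized;

  // Before the patch this would have been declared
  // "public synchronized void initialize()"; since only one thread ever
  // runs service initialization, the monitor acquisition is pure
  // overhead and the keyword can be removed safely.
  public void initialize() {
    initialized = true;
  }

  public boolean isInitialized() {
    return initialized;
  }
}
{code}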



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9096) Some GpuResourcePlugin and ResourcePluginManager methods are synchronized unnecessarily

2019-08-05 Thread Adam Antal (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16899867#comment-16899867
 ] 

Adam Antal commented on YARN-9096:
--

+1 (non-binding)

> Some GpuResourcePlugin and ResourcePluginManager methods are synchronized 
> unnecessarily
> ---
>
> Key: YARN-9096
> URL: https://issues.apache.org/jira/browse/YARN-9096
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Gergely Pollak
>Priority: Major
> Attachments: YARN-9096.001.patch, YARN-9096.002.patch
>
>
> These methods are not used concurrently, they are part of the initialization 
> code of NM that happens from one thread.
> This is the list of the call hierarchies: 
> 1. GpuResourcePlugin.initialize + ResourcePluginManager.initialize
>  
> {code:java}
> GpuResourcePlugin.initialize(Context) 
> (org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.gpu)
>  ResourcePluginManager.initialize(Context) 
> (org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin) 
> NodeManager.serviceInit(Configuration) 
> (org.apache.hadoop.yarn.server.nodemanager){code}
>  
>  
> 2. GpuResourcePlugin.createResourceHandler: 
>  
> {code:java}
> GpuResourcePlugin.createResourceHandler(Context, CGroupsHandler, 
> PrivilegedOperationExecutor) 
> (org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.gpu)
>  ResourceHandlerModule.addHandlersFromConfiguredResourcePlugins(List, 
> Configuration, Context) 
> (org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources) 
> ResourceHandlerModule.initializeConfiguredResourceHandlerChain(Configuration, 
> Context) 
> (org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources) 
> ResourceHandlerModule.getConfiguredResourceHandlerChain(Configuration, 
> Context) 
> (org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources) 
> ContainerScheduler.serviceInit(Configuration) 
> (org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler) 
> LinuxContainerExecutor.init(Context) 
> (org.apache.hadoop.yarn.server.nodemanager)
> {code}
>  
> 3. GpuResourcePlugin.getNodeResourceHandlerInstance: 
>  
> {code:java}
> GpuResourcePlugin.getNodeResourceHandlerInstance() 
> (org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.gpu)
> NodeStatusUpdaterImpl.updateConfiguredResourcesViaPlugins(Resource)(2 usages) 
> (org.apache.hadoop.yarn.server.nodemanager)
> NodeStatusUpdaterImpl.serviceInit(Configuration) 
> (org.apache.hadoop.yarn.server.nodemanager)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-9616) Shared Cache Manager Failed To Upload Unpacked Resources

2019-08-05 Thread wangchengwei (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16899837#comment-16899837
 ] 

wangchengwei edited comment on YARN-9616 at 8/5/19 7:38 AM:


Hi [~wzzdreamer], I have figured out a solution to this issue.

What causes this issue is that packed archive files are unpacked by the NM 
automatically after localization, so the _SharedCacheUploader_ couldn't find 
the original archive files and threw _FileNotFoundException_. As a result, 
the packed archive files would never be uploaded to the shared cache, and 
would be uploaded and localized again and again. 

All original resource files are uploaded to an HDFS path (the staging dir, or 
one specified by the user) before the job is submitted, so all resource files 
can be found on HDFS. Since the original resource files of packed archives 
cannot be found on the NM, we can fetch them from their HDFS path rather than 
the NM local path. So the solution to this issue is:
 # *check whether the resource is a packed archive before upload*
 # *if not, upload it from the NM local path*
 # *if yes, copy the original file in HDFS to the shared cache path*

The solution resolved this issue in my tests. If needed, I could submit the 
patch here.

 


 


was (Author: smarthan):
Hi [~wzzdreamer], I have figured out a solution to this issue.

What causes this issue is that packed archive files are unpacked by the NM 
automatically after localization, so the _SharedCacheUploader_ couldn't find 
the original archive files and threw _FileNotFoundException_. As a result, 
the packed archive files would never be uploaded to the shared cache, and 
would be uploaded and localized again and again. 

All original resource files are uploaded to an HDFS path (the staging dir, or 
one specified by the user) before the job is submitted, so all resource files 
can be found on HDFS. Since the original resource files of packed archives 
cannot be found on the NM, we can fetch them from their HDFS path rather than 
the NM local path. So the solution to this issue is:
 # *check whether the resource is a packed archive before upload*
 # *if not, upload it from the NM local path*
 # *if yes, copy the original file in HDFS to the shared cache path*

The solution resolved this issue in my tests. I have submitted the patch 
here; please review it if possible. 

 


 

> Shared Cache Manager Failed To Upload Unpacked Resources
> 
>
> Key: YARN-9616
> URL: https://issues.apache.org/jira/browse/YARN-9616
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.3, 2.9.2, 2.8.5
>Reporter: zhenzhao wang
>Assignee: zhenzhao wang
>Priority: Major
>
> Yarn will unpack archive files and some other files based on the file type 
> and configuration. E.g., if I start an MR job with -archive one.zip, then 
> one.zip will be unpacked during download. Let's say there are file1 and 
> file2 inside one.zip. Then the files kept on local disk will be 
> /disk3/yarn/local/filecache/352/one.zip/file1 and 
> /disk3/yarn/local/filecache/352/one.zip/file2. So the shared cache uploader 
> couldn't upload one.zip to the shared cache, as it was removed during 
> localization. The following errors will be thrown.
> {code:java}
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.sharedcache.SharedCacheUploader:
>  Exception while uploading the file dict.zip
> java.io.FileNotFoundException: File 
> /disk3/yarn/local/filecache/352/one.zip/one.zip does not exist
> at 
> org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:631)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:857)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:621)
> at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:442)
> at 
> org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.(ChecksumFileSystem.java:146)
> at 
> org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:347)
> at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:926)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.sharedcache.SharedCacheUploader.computeChecksum(SharedCacheUploader.java:257)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.sharedcache.SharedCacheUploader.call(SharedCacheUploader.java:128)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.sharedcache.SharedCacheUploader.call(SharedCacheUploader.java:55)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> 

[jira] [Commented] (YARN-9616) Shared Cache Manager Failed To Upload Unpacked Resources

2019-08-05 Thread wangchengwei (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16899837#comment-16899837
 ] 

wangchengwei commented on YARN-9616:


Hi [~wzzdreamer], I have figured out a solution to this issue.

What causes this issue is that packed archive files are unpacked by the NM 
automatically after localization, so the _SharedCacheUploader_ couldn't find 
the original archive files and threw _FileNotFoundException_. As a result, 
the packed archive files would never be uploaded to the shared cache, and 
would be uploaded and localized again and again. 

All original resource files are uploaded to an HDFS path (the staging dir, or 
one specified by the user) before the job is submitted, so all resource files 
can be found on HDFS. Since the original resource files of packed archives 
cannot be found on the NM, we can fetch them from their HDFS path rather than 
the NM local path. So the solution to this issue is:
 # *check whether the resource is a packed archive before upload*
 # *if not, upload it from the NM local path*
 # *if yes, copy the original file in HDFS to the shared cache path*

The solution resolved this issue in my tests. I have submitted the patch 
here; please review it if possible.
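
For illustration, a minimal sketch of that strategy with hypothetical class 
and parameter names (not the actual patch), assuming the uploader knows the 
resource's original HDFS path:

{code:java}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

/**
 * Illustrative sketch only (hypothetical names): if the localized
 * resource was an archive that the NM unpacked, copy the original file
 * from its HDFS source instead of uploading the now-missing local file.
 */
public class ArchiveAwareUpload {

  public static void upload(Configuration conf, Path localPath,
      Path hdfsOrigin, Path sharedCacheDest, boolean wasUnpackedArchive)
      throws IOException {
    FileSystem fs = FileSystem.get(conf);
    if (wasUnpackedArchive) {
      // Step 3 above: the local archive no longer exists, so copy the
      // original from its HDFS path into the shared cache directly.
      FileUtil.copy(fs, hdfsOrigin, fs, sharedCacheDest, false, conf);
    } else {
      // Steps 1-2: a regular file still exists locally; upload as usual.
      fs.copyFromLocalFile(localPath, sharedCacheDest);
    }
  }
}
{code}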

 

> Shared Cache Manager Failed To Upload Unpacked Resources
> 
>
> Key: YARN-9616
> URL: https://issues.apache.org/jira/browse/YARN-9616
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.3, 2.9.2, 2.8.5
>Reporter: zhenzhao wang
>Assignee: zhenzhao wang
>Priority: Major
>
> Yarn will unpack archive files and some other files based on the file type 
> and configuration. E.g., if I start an MR job with -archive one.zip, then 
> one.zip will be unpacked during download. Let's say there are file1 and 
> file2 inside one.zip. Then the files kept on local disk will be 
> /disk3/yarn/local/filecache/352/one.zip/file1 and 
> /disk3/yarn/local/filecache/352/one.zip/file2. So the shared cache uploader 
> couldn't upload one.zip to the shared cache, as it was removed during 
> localization. The following errors will be thrown.
> {code:java}
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.sharedcache.SharedCacheUploader:
>  Exception while uploading the file dict.zip
> java.io.FileNotFoundException: File 
> /disk3/yarn/local/filecache/352/one.zip/one.zip does not exist
> at 
> org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:631)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:857)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:621)
> at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:442)
> at 
> org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.(ChecksumFileSystem.java:146)
> at 
> org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:347)
> at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:926)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.sharedcache.SharedCacheUploader.computeChecksum(SharedCacheUploader.java:257)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.sharedcache.SharedCacheUploader.call(SharedCacheUploader.java:128)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.sharedcache.SharedCacheUploader.call(SharedCacheUploader.java:55)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org