[jira] [Commented] (YARN-10341) Yarn Service Container Completed event doesn't get processed
[ https://issues.apache.org/jira/browse/YARN-10341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17154242#comment-17154242 ] Bilwa S T commented on YARN-10341: -- Fixed Checkstyle issues > Yarn Service Container Completed event doesn't get processed > - > > Key: YARN-10341 > URL: https://issues.apache.org/jira/browse/YARN-10341 > Project: Hadoop YARN > Issue Type: Bug > Reporter: Bilwa S T > Assignee: Bilwa S T > Priority: Critical > Attachments: YARN-10341.001.patch, YARN-10341.002.patch, YARN-10341.003.patch, YARN-10341.004.patch > > > If there are 10 workers running and containers get killed, after a while we see that there are just 9 workers running. This is because the CONTAINER_COMPLETED event is not processed on the AM side. > The issue is in the code below: > {code:java} > public void onContainersCompleted(List<ContainerStatus> statuses) { > for (ContainerStatus status : statuses) { > ContainerId containerId = status.getContainerId(); > ComponentInstance instance = liveInstances.get(status.getContainerId()); > if (instance == null) { > LOG.warn( > "Container {} Completed. No component instance exists. exitStatus={}. diagnostics={} ", > containerId, status.getExitStatus(), status.getDiagnostics()); > return; > } > ComponentEvent event = > new ComponentEvent(instance.getCompName(), CONTAINER_COMPLETED) > .setStatus(status).setInstance(instance) > .setContainerId(containerId); > dispatcher.getEventHandler().handle(event); > } > {code} > If a component instance doesn't exist for a container, the loop never reaches the remaining containers because the method returns instead of continuing. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
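The fix implied by the description is to skip the unknown container rather than return from the method, so the remaining statuses still produce CONTAINER_COMPLETED events. Below is a minimal sketch of the corrected loop, reusing the names from the quoted snippet (liveInstances, dispatcher, LOG, and the surrounding class's imports); the actual patch may differ in detail:
{code:java}
public void onContainersCompleted(List<ContainerStatus> statuses) {
  for (ContainerStatus status : statuses) {
    ContainerId containerId = status.getContainerId();
    ComponentInstance instance = liveInstances.get(containerId);
    if (instance == null) {
      LOG.warn("Container {} Completed. No component instance exists. exitStatus={}. diagnostics={} ",
          containerId, status.getExitStatus(), status.getDiagnostics());
      // Skip only this container: the remaining completed containers
      // must still generate CONTAINER_COMPLETED events.
      continue;
    }
    ComponentEvent event =
        new ComponentEvent(instance.getCompName(), CONTAINER_COMPLETED)
            .setStatus(status).setInstance(instance)
            .setContainerId(containerId);
    dispatcher.getEventHandler().handle(event);
  }
}
{code}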
[jira] [Commented] (YARN-10341) Yarn Service Container Completed event doesn't get processed
[ https://issues.apache.org/jira/browse/YARN-10341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17154235#comment-17154235 ] Hadoop QA commented on YARN-10341: -- | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 1m 19s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 0s{color} | {color:green} No case conflicting files found. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 3s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 21s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 30s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 16m 23s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s{color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 0m 55s{color} | {color:blue} Used deprecated FindBugs config; considering switching to SpotBugs. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 53s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 14s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 26s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 44s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 15s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 57s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 19m 27s{color} | {color:green} hadoop-yarn-services-core in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 28s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 81m 8s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | ClientAPI=1.40 ServerAPI=1.40 base: https://builds.apache.org/job/PreCommit-YARN-Build/26262/artifact/out/Dockerfile | | JIRA Issue | YARN-10341 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/13007346/YARN-10341.004.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 760c568266eb 4.15.0-101-generic #102-Ubuntu SMP Mon May 11 10:07:26 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | personality/hadoop.sh | | git revision | trunk / 10d218934c9 | | Default Java | Private Build-1.8.0_252-8u252-b09-1~18.04-b09 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/26262/testReport/ | | Max. process+thread count | 777 (vs. ulimit of 5500) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core
[jira] [Commented] (YARN-10324) Fetch data from NodeManager may cause read timeout when disk is busy
[ https://issues.apache.org/jira/browse/YARN-10324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17154217#comment-17154217 ] Yao Guangdong commented on YARN-10324: -- Fixed a bug where the cached memory size was calculated incorrectly. Added new patch YARN-10324.002.patch > Fetch data from NodeManager may cause read timeout when disk is busy > --- > > Key: YARN-10324 > URL: https://issues.apache.org/jira/browse/YARN-10324 > Project: Hadoop YARN > Issue Type: Improvement > Components: auxservices > Affects Versions: 2.7.0, 3.2.1 > Reporter: Yao Guangdong > Priority: Minor > Labels: patch > Attachments: YARN-10324.001.patch, YARN-10324.002.patch > > > As the cluster size grows, the time a Reduce spends fetching Map results from the NodeManager gets longer and longer. We often see WARN logs like the following in the reducer's logs. > {quote}2020-06-19 15:43:15,522 WARN [fetcher#8] org.apache.hadoop.mapreduce.task.reduce.Fetcher: Failed to connect to TX-196-168-211.com:13562 with 5 map outputs > java.net.SocketTimeoutException: Read timed out > at java.net.SocketInputStream.socketRead0(Native Method) > at java.net.SocketInputStream.socketRead(SocketInputStream.java:116) > at java.net.SocketInputStream.read(SocketInputStream.java:171) > at java.net.SocketInputStream.read(SocketInputStream.java:141) > at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) > at java.io.BufferedInputStream.read1(BufferedInputStream.java:286) > at java.io.BufferedInputStream.read(BufferedInputStream.java:345) > at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:735) > at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:678) > at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1587) > at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1492) > at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480) > at org.apache.hadoop.mapreduce.task.reduce.Fetcher.verifyConnection(Fetcher.java:434) > at org.apache.hadoop.mapreduce.task.reduce.Fetcher.setupConnectionsWithRetry(Fetcher.java:400) > at org.apache.hadoop.mapreduce.task.reduce.Fetcher.openShuffleUrl(Fetcher.java:271) > at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:330) > at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:198) > {quote} > Checking the NodeManager server, we found that disk IO utilization and the number of connections became very high when the read timeouts happened. Our analysis: with 20,000 maps and 1,000 reduces, the NodeManager performs 20 million IO stream operations during the shuffle phase. When each reduce fetches only a small amount of data from the map output files, disk IO utilization becomes very high on a big cluster, read timeouts happen frequently, and applications take longer to finish. > ShuffleHandler already has an IndexCache for the file.out.index files. We want to turn many small IOs into one big IO to reduce the number of small disk IO operations, so we try to cache all the data of a small map output file (file.out) in memory when the first fetch request comes in; subsequent fetch requests then only read from memory, avoiding disk IO. After we cached the data in memory, the read timeouts disappeared. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
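The caching idea described above can be sketched in isolation. The class below is illustrative only and is not the YARN-10324 patch: SmallFileCache and its two thresholds are invented names, and ShuffleHandler's real integration point would differ. It shows the core trade: read a small file.out once as one big IO, then serve later fetches from memory.
{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Illustrative sketch: cache whole small map output files in memory so that
 *  repeated shuffle fetches hit RAM instead of issuing many small disk reads. */
public class SmallFileCache {
  private final ConcurrentHashMap<String, byte[]> cache = new ConcurrentHashMap<>();
  private final AtomicLong cachedBytes = new AtomicLong();
  private final long maxFileBytes;   // only files up to this size are cached
  private final long maxTotalBytes;  // stop caching once this budget is used

  public SmallFileCache(long maxFileBytes, long maxTotalBytes) {
    this.maxFileBytes = maxFileBytes;
    this.maxTotalBytes = maxTotalBytes;
  }

  /** Returns the file contents, served from memory after the first read. */
  public byte[] read(String pathStr) throws IOException {
    byte[] cached = cache.get(pathStr);
    if (cached != null) {
      return cached; // later fetches avoid the disk entirely
    }
    Path path = Paths.get(pathStr);
    long size = Files.size(path);
    byte[] data = Files.readAllBytes(path); // one big IO instead of many small ones
    if (size <= maxFileBytes && cachedBytes.get() + size <= maxTotalBytes) {
      if (cache.putIfAbsent(pathStr, data) == null) {
        cachedBytes.addAndGet(size);
      }
    }
    return data;
  }
}
{code}
A production version would also need eviction and accounting tied to the shuffle lifecycle (e.g. dropping a file once all reducers have fetched it), which is presumably what the "cached memory calculated incorrectly" fix in patch 002 is about.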
[jira] [Updated] (YARN-10324) Fetch data from NodeManager may cause read timeout when disk is busy
[ https://issues.apache.org/jira/browse/YARN-10324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yao Guangdong updated YARN-10324: - Attachment: YARN-10324.002.patch > Fetch data from NodeManager may cause read timeout when disk is busy > --- > > Key: YARN-10324 > URL: https://issues.apache.org/jira/browse/YARN-10324 > Project: Hadoop YARN > Issue Type: Improvement > Components: auxservices > Affects Versions: 2.7.0, 3.2.1 > Reporter: Yao Guangdong > Priority: Minor > Labels: patch > Attachments: YARN-10324.001.patch, YARN-10324.002.patch > > > As the cluster size grows, the time a Reduce spends fetching Map results from the NodeManager gets longer and longer. We often see WARN logs like the following in the reducer's logs. > {quote}2020-06-19 15:43:15,522 WARN [fetcher#8] org.apache.hadoop.mapreduce.task.reduce.Fetcher: Failed to connect to TX-196-168-211.com:13562 with 5 map outputs > java.net.SocketTimeoutException: Read timed out > at java.net.SocketInputStream.socketRead0(Native Method) > at java.net.SocketInputStream.socketRead(SocketInputStream.java:116) > at java.net.SocketInputStream.read(SocketInputStream.java:171) > at java.net.SocketInputStream.read(SocketInputStream.java:141) > at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) > at java.io.BufferedInputStream.read1(BufferedInputStream.java:286) > at java.io.BufferedInputStream.read(BufferedInputStream.java:345) > at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:735) > at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:678) > at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1587) > at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1492) > at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480) > at org.apache.hadoop.mapreduce.task.reduce.Fetcher.verifyConnection(Fetcher.java:434) > at org.apache.hadoop.mapreduce.task.reduce.Fetcher.setupConnectionsWithRetry(Fetcher.java:400) > at org.apache.hadoop.mapreduce.task.reduce.Fetcher.openShuffleUrl(Fetcher.java:271) > at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:330) > at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:198) > {quote} > Checking the NodeManager server, we found that disk IO utilization and the number of connections became very high when the read timeouts happened. Our analysis: with 20,000 maps and 1,000 reduces, the NodeManager performs 20 million IO stream operations during the shuffle phase. When each reduce fetches only a small amount of data from the map output files, disk IO utilization becomes very high on a big cluster, read timeouts happen frequently, and applications take longer to finish. > ShuffleHandler already has an IndexCache for the file.out.index files. We want to turn many small IOs into one big IO to reduce the number of small disk IO operations, so we try to cache all the data of a small map output file (file.out) in memory when the first fetch request comes in; subsequent fetch requests then only read from memory, avoiding disk IO. After we cached the data in memory, the read timeouts disappeared. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10341) Yarn Service Container Completed event doesn't get processed
[ https://issues.apache.org/jira/browse/YARN-10341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bilwa S T updated YARN-10341: - Attachment: YARN-10341.004.patch > Yarn Service Container Completed event doesn't get processed > - > > Key: YARN-10341 > URL: https://issues.apache.org/jira/browse/YARN-10341 > Project: Hadoop YARN > Issue Type: Bug > Reporter: Bilwa S T > Assignee: Bilwa S T > Priority: Critical > Attachments: YARN-10341.001.patch, YARN-10341.002.patch, YARN-10341.003.patch, YARN-10341.004.patch > > > If there are 10 workers running and containers get killed, after a while we see that there are just 9 workers running. This is because the CONTAINER_COMPLETED event is not processed on the AM side. > The issue is in the code below: > {code:java} > public void onContainersCompleted(List<ContainerStatus> statuses) { > for (ContainerStatus status : statuses) { > ContainerId containerId = status.getContainerId(); > ComponentInstance instance = liveInstances.get(status.getContainerId()); > if (instance == null) { > LOG.warn( > "Container {} Completed. No component instance exists. exitStatus={}. diagnostics={} ", > containerId, status.getExitStatus(), status.getDiagnostics()); > return; > } > ComponentEvent event = > new ComponentEvent(instance.getCompName(), CONTAINER_COMPLETED) > .setStatus(status).setInstance(instance) > .setContainerId(containerId); > dispatcher.getEventHandler().handle(event); > } > {code} > If a component instance doesn't exist for a container, the loop never reaches the remaining containers because the method returns instead of continuing. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10344) Sync netty versions in hadoop-yarn-csi
[ https://issues.apache.org/jira/browse/YARN-10344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17154190#comment-17154190 ] Hudson commented on YARN-10344: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #18420 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/18420/]) YARN-10344. Sync netty versions in hadoop-yarn-csi. (#2126) (github: rev 10d218934c9bc143bf8578c92cdbd6df6a4d3b98) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-csi/pom.xml > Sync netty versions in hadoop-yarn-csi > -- > > Key: YARN-10344 > URL: https://issues.apache.org/jira/browse/YARN-10344 > Project: Hadoop YARN > Issue Type: Bug > Components: build >Affects Versions: 3.3.0 >Reporter: Akira Ajisaka >Assignee: Akira Ajisaka >Priority: Major > Fix For: 3.4.0, 3.3.1 > > > netty-all is now 4.1.50.Final but the other netty libraries are 4.1.42.Final: > {noformat} > [INFO] --- maven-dependency-plugin:3.0.2:tree (default-cli) @ hadoop-yarn-csi > --- > [INFO] org.apache.hadoop:hadoop-yarn-csi:jar:3.3.0 > [INFO] +- com.google.guava:guava:jar:20.0:compile > [INFO] +- com.google.protobuf:protobuf-java:jar:3.6.1:compile > [INFO] +- io.netty:netty-all:jar:4.1.50.Final:compile > [INFO] +- io.grpc:grpc-core:jar:1.26.0:compile > [INFO] | +- io.grpc:grpc-api:jar:1.26.0:compile (version selected from > constraint [1.26.0,1.26.0]) > [INFO] | | +- io.grpc:grpc-context:jar:1.26.0:compile > [INFO] | | +- > com.google.errorprone:error_prone_annotations:jar:2.3.3:compile > [INFO] | | \- org.codehaus.mojo:animal-sniffer-annotations:jar:1.17:compile > [INFO] | +- com.google.code.gson:gson:jar:2.2.4:compile > [INFO] | +- com.google.android:annotations:jar:4.1.1.4:compile > [INFO] | +- io.perfmark:perfmark-api:jar:0.19.0:compile > [INFO] | +- io.opencensus:opencensus-api:jar:0.24.0:compile > [INFO] | \- io.opencensus:opencensus-contrib-grpc-metrics:jar:0.24.0:compile > [INFO] +- io.grpc:grpc-protobuf:jar:1.26.0:compile > [INFO] | +- com.google.api.grpc:proto-google-common-protos:jar:1.12.0:compile > [INFO] | \- io.grpc:grpc-protobuf-lite:jar:1.26.0:compile > [INFO] +- io.grpc:grpc-stub:jar:1.26.0:compile > [INFO] +- io.grpc:grpc-netty:jar:1.26.0:compile > [INFO] | +- io.netty:netty-codec-http2:jar:4.1.42.Final:compile (version > selected from constraint [4.1.42.Final,4.1.42.Final]) > [INFO] | | +- io.netty:netty-common:jar:4.1.42.Final:compile > [INFO] | | +- io.netty:netty-buffer:jar:4.1.42.Final:compile > [INFO] | | +- io.netty:netty-transport:jar:4.1.42.Final:compile > [INFO] | | | \- io.netty:netty-resolver:jar:4.1.42.Final:compile > [INFO] | | +- io.netty:netty-codec:jar:4.1.42.Final:compile > [INFO] | | +- io.netty:netty-handler:jar:4.1.42.Final:compile > [INFO] | | \- io.netty:netty-codec-http:jar:4.1.42.Final:compile > [INFO] | \- io.netty:netty-handler-proxy:jar:4.1.42.Final:compile > [INFO] | \- io.netty:netty-codec-socks:jar:4.1.42.Final:compile > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10347) Fix double locking in CapacityScheduler#reinitialize in branch-3.1
[ https://issues.apache.org/jira/browse/YARN-10347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17154189#comment-17154189 ] Masatake Iwasaki commented on YARN-10347: - Since this was introduced by the backported patch of YARN-10022, only branch-3.1 and branch-3.2 are affected. I'm going to cherry-pick this to branch-3.2. The backported patch added a duplicated call to [ReentrantReadWriteLock.WriteLock#lock|https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/locks/ReentrantReadWriteLock.WriteLock.html#lock--]. Each {{lock()}} call increases the hold count by one, so if it is called twice, the lock stays held until {{unlock()}} is called twice. > Fix double locking in CapacityScheduler#reinitialize in branch-3.1 > -- > > Key: YARN-10347 > URL: https://issues.apache.org/jira/browse/YARN-10347 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler > Affects Versions: 3.1.4 > Reporter: Masatake Iwasaki > Assignee: Masatake Iwasaki > Priority: Critical > Attachments: YARN-10347-branch-3.1.001.patch > > > Double locking blocks other threads in the ResourceManager that are waiting for the lock. > I found the issue while testing hadoop-3.1.4-RC2 on an RM-HA enabled deployment. The ResourceManager blocks on {{submitApplication}} waiting for the lock when I run example MR applications. > {noformat} > "IPC Server handler 45 on default port 8032" #211 daemon prio=5 os_prio=0 tid=0x7f0e45a40200 nid=0x418 waiting on condition [0x7f0e14abe000] > java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x85d56510> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836) > at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967) > at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283) > at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727) > at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.checkAndGetApplicationPriority(CapacityScheduler.java:2521) > at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:417) > at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:342) > at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:678) > at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:277) > at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:563) > at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:527) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1036) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1015) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:943) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2943) > {noformat} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
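The hold-count behaviour described above is easy to demonstrate outside YARN: after two lock() calls, a single unlock() leaves the write lock held, and readers stay parked exactly as the IPC handler does in the stack trace. A minimal standalone demo of the JDK semantics (not code from the patch):
{code:java}
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class DoubleLockDemo {
  public static void main(String[] args) throws InterruptedException {
    ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    lock.writeLock().lock();
    lock.writeLock().lock(); // reentrant: hold count is now 2
    System.out.println("write hold count = " + lock.getWriteHoldCount()); // 2

    lock.writeLock().unlock(); // hold count back to 1, write lock still held
    System.out.println("write hold count = " + lock.getWriteHoldCount()); // 1

    Thread reader = new Thread(() -> {
      lock.readLock().lock(); // blocks until the write hold count reaches 0
      try {
        System.out.println("reader acquired the read lock");
      } finally {
        lock.readLock().unlock();
      }
    });
    reader.start();
    Thread.sleep(200); // the reader is parked here, like the IPC handler above

    lock.writeLock().unlock(); // the second unlock releases the lock for real
    reader.join();
  }
}
{code}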
[jira] [Commented] (YARN-10347) Fix double locking in CapacityScheduler#reinitialize in branch-3.1
[ https://issues.apache.org/jira/browse/YARN-10347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17154183#comment-17154183 ] Brahma Reddy Battula commented on YARN-10347: - [~iwasakims], thanks for reporting this. Will the same be applicable to trunk and other versions? And could you give more details on this? > Fix double locking in CapacityScheduler#reinitialize in branch-3.1 > -- > > Key: YARN-10347 > URL: https://issues.apache.org/jira/browse/YARN-10347 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler > Affects Versions: 3.1.4 > Reporter: Masatake Iwasaki > Assignee: Masatake Iwasaki > Priority: Critical > Attachments: YARN-10347-branch-3.1.001.patch > > > Double locking blocks other threads in the ResourceManager that are waiting for the lock. > I found the issue while testing hadoop-3.1.4-RC2 on an RM-HA enabled deployment. The ResourceManager blocks on {{submitApplication}} waiting for the lock when I run example MR applications. > {noformat} > "IPC Server handler 45 on default port 8032" #211 daemon prio=5 os_prio=0 tid=0x7f0e45a40200 nid=0x418 waiting on condition [0x7f0e14abe000] > java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x85d56510> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836) > at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967) > at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283) > at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727) > at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.checkAndGetApplicationPriority(CapacityScheduler.java:2521) > at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:417) > at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:342) > at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:678) > at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:277) > at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:563) > at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:527) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1036) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1015) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:943) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2943) > {noformat} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10324) Fetch data from NodeManager may cause read timeout when disk is busy
[ https://issues.apache.org/jira/browse/YARN-10324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17154155#comment-17154155 ] Brahma Reddy Battula commented on YARN-10324: - Updated the target version to 3.3.1 as 3.3.0 is about to be released. > Fetch data from NodeManager may cause read timeout when disk is busy > --- > > Key: YARN-10324 > URL: https://issues.apache.org/jira/browse/YARN-10324 > Project: Hadoop YARN > Issue Type: Improvement > Components: auxservices > Affects Versions: 2.7.0, 3.2.1 > Reporter: Yao Guangdong > Priority: Minor > Labels: patch > Attachments: YARN-10324.001.patch > > > As the cluster size grows, the time a Reduce spends fetching Map results from the NodeManager gets longer and longer. We often see WARN logs like the following in the reducer's logs. > {quote}2020-06-19 15:43:15,522 WARN [fetcher#8] org.apache.hadoop.mapreduce.task.reduce.Fetcher: Failed to connect to TX-196-168-211.com:13562 with 5 map outputs > java.net.SocketTimeoutException: Read timed out > at java.net.SocketInputStream.socketRead0(Native Method) > at java.net.SocketInputStream.socketRead(SocketInputStream.java:116) > at java.net.SocketInputStream.read(SocketInputStream.java:171) > at java.net.SocketInputStream.read(SocketInputStream.java:141) > at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) > at java.io.BufferedInputStream.read1(BufferedInputStream.java:286) > at java.io.BufferedInputStream.read(BufferedInputStream.java:345) > at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:735) > at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:678) > at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1587) > at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1492) > at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480) > at org.apache.hadoop.mapreduce.task.reduce.Fetcher.verifyConnection(Fetcher.java:434) > at org.apache.hadoop.mapreduce.task.reduce.Fetcher.setupConnectionsWithRetry(Fetcher.java:400) > at org.apache.hadoop.mapreduce.task.reduce.Fetcher.openShuffleUrl(Fetcher.java:271) > at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:330) > at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:198) > {quote} > Checking the NodeManager server, we found that disk IO utilization and the number of connections became very high when the read timeouts happened. Our analysis: with 20,000 maps and 1,000 reduces, the NodeManager performs 20 million IO stream operations during the shuffle phase. When each reduce fetches only a small amount of data from the map output files, disk IO utilization becomes very high on a big cluster, read timeouts happen frequently, and applications take longer to finish. > ShuffleHandler already has an IndexCache for the file.out.index files. We want to turn many small IOs into one big IO to reduce the number of small disk IO operations, so we try to cache all the data of a small map output file (file.out) in memory when the first fetch request comes in; subsequent fetch requests then only read from memory, avoiding disk IO. After we cached the data in memory, the read timeouts disappeared. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10324) Fetch data from NodeManager may cause read timeout when disk is busy
[ https://issues.apache.org/jira/browse/YARN-10324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated YARN-10324: Target Version/s: 2.7.8, 3.3.1 (was: 2.7.8, 3.3.0) > Fetch data from NodeManager may cause read timeout when disk is busy > --- > > Key: YARN-10324 > URL: https://issues.apache.org/jira/browse/YARN-10324 > Project: Hadoop YARN > Issue Type: Improvement > Components: auxservices > Affects Versions: 2.7.0, 3.2.1 > Reporter: Yao Guangdong > Priority: Minor > Labels: patch > Attachments: YARN-10324.001.patch > > > As the cluster size grows, the time a Reduce spends fetching Map results from the NodeManager gets longer and longer. We often see WARN logs like the following in the reducer's logs. > {quote}2020-06-19 15:43:15,522 WARN [fetcher#8] org.apache.hadoop.mapreduce.task.reduce.Fetcher: Failed to connect to TX-196-168-211.com:13562 with 5 map outputs > java.net.SocketTimeoutException: Read timed out > at java.net.SocketInputStream.socketRead0(Native Method) > at java.net.SocketInputStream.socketRead(SocketInputStream.java:116) > at java.net.SocketInputStream.read(SocketInputStream.java:171) > at java.net.SocketInputStream.read(SocketInputStream.java:141) > at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) > at java.io.BufferedInputStream.read1(BufferedInputStream.java:286) > at java.io.BufferedInputStream.read(BufferedInputStream.java:345) > at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:735) > at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:678) > at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1587) > at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1492) > at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480) > at org.apache.hadoop.mapreduce.task.reduce.Fetcher.verifyConnection(Fetcher.java:434) > at org.apache.hadoop.mapreduce.task.reduce.Fetcher.setupConnectionsWithRetry(Fetcher.java:400) > at org.apache.hadoop.mapreduce.task.reduce.Fetcher.openShuffleUrl(Fetcher.java:271) > at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:330) > at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:198) > {quote} > Checking the NodeManager server, we found that disk IO utilization and the number of connections became very high when the read timeouts happened. Our analysis: with 20,000 maps and 1,000 reduces, the NodeManager performs 20 million IO stream operations during the shuffle phase. When each reduce fetches only a small amount of data from the map output files, disk IO utilization becomes very high on a big cluster, read timeouts happen frequently, and applications take longer to finish. > ShuffleHandler already has an IndexCache for the file.out.index files. We want to turn many small IOs into one big IO to reduce the number of small disk IO operations, so we try to cache all the data of a small map output file (file.out) in memory when the first fetch request comes in; subsequent fetch requests then only read from memory, avoiding disk IO. After we cached the data in memory, the read timeouts disappeared. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10348) Allow RM to always cancel tokens after app completes
[ https://issues.apache.org/jira/browse/YARN-10348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17154105#comment-17154105 ] Hadoop QA commented on YARN-10348: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 2m 36s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 0s{color} | {color:green} No case conflicting files found. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 34s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 25m 0s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m 27s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 47s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 9s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 21m 46s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 6s{color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 1m 49s{color} | {color:blue} Used deprecated FindBugs config; considering switching to SpotBugs. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 32s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 23s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 2s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 40s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 40s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 28s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 27s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s{color} | {color:green} The patch has no ill-formed XML file. 
{color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 49s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 2s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 58s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 58s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 4m 1s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 92m 58s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 43s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}208m 55s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerPreemption | \\ \\ || Subsystem || Report/Notes || | Docker | ClientAPI=1.40 ServerAPI=1.40 base: https://builds.apache.o
[jira] [Commented] (YARN-10347) Fix double locking in CapacityScheduler#reinitialize in branch-3.1
[ https://issues.apache.org/jira/browse/YARN-10347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17154081#comment-17154081 ] Masatake Iwasaki commented on YARN-10347: - Committed to branch-3.1. > Fix double locking in CapacityScheduler#reinitialize in branch-3.1 > -- > > Key: YARN-10347 > URL: https://issues.apache.org/jira/browse/YARN-10347 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler > Affects Versions: 3.1.4 > Reporter: Masatake Iwasaki > Assignee: Masatake Iwasaki > Priority: Critical > Attachments: YARN-10347-branch-3.1.001.patch > > > Double locking blocks other threads in the ResourceManager that are waiting for the lock. > I found the issue while testing hadoop-3.1.4-RC2 on an RM-HA enabled deployment. The ResourceManager blocks on {{submitApplication}} waiting for the lock when I run example MR applications. > {noformat} > "IPC Server handler 45 on default port 8032" #211 daemon prio=5 os_prio=0 tid=0x7f0e45a40200 nid=0x418 waiting on condition [0x7f0e14abe000] > java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x85d56510> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836) > at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967) > at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283) > at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727) > at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.checkAndGetApplicationPriority(CapacityScheduler.java:2521) > at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:417) > at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:342) > at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:678) > at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:277) > at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:563) > at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:527) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1036) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1015) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:943) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2943) > {noformat} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10347) Fix double locking in CapacityScheduler#reinitialize in branch-3.1
[ https://issues.apache.org/jira/browse/YARN-10347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17154075#comment-17154075 ] Masatake Iwasaki commented on YARN-10347: - I got no relevant failure. {noformat} [INFO] Results: [INFO] [ERROR] Failures: [ERROR] TestApplicationMasterService.testUpdateTrackingUrl:984 expected:<[hadoop.apache.org]> but was:<[N/A]> [INFO] [ERROR] Tests run: 2453, Failures: 1, Errors: 0, Skipped: 8 {noformat} TestApplicationMasterService.testUpdateTrackingUrl looks like a flaky one; I cannot reproduce the failure by rerunning the test. {noformat} $ mvn test -Dtest=TestApplicationMasterService ... [INFO] Running org.apache.hadoop.yarn.server.resourcemanager.TestApplicationMasterService [INFO] Tests run: 15, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 13.082 s - in org.apache.hadoop.yarn.server.resourcemanager.TestApplicationMasterService [INFO] [INFO] Results: [INFO] [INFO] Tests run: 15, Failures: 0, Errors: 0, Skipped: 0 {noformat} > Fix double locking in CapacityScheduler#reinitialize in branch-3.1 > -- > > Key: YARN-10347 > URL: https://issues.apache.org/jira/browse/YARN-10347 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler > Affects Versions: 3.1.4 > Reporter: Masatake Iwasaki > Assignee: Masatake Iwasaki > Priority: Critical > Attachments: YARN-10347-branch-3.1.001.patch > > > Double locking blocks other threads in the ResourceManager that are waiting for the lock. > I found the issue while testing hadoop-3.1.4-RC2 on an RM-HA enabled deployment. The ResourceManager blocks on {{submitApplication}} waiting for the lock when I run example MR applications. > {noformat} > "IPC Server handler 45 on default port 8032" #211 daemon prio=5 os_prio=0 tid=0x7f0e45a40200 nid=0x418 waiting on condition [0x7f0e14abe000] > java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x85d56510> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836) > at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967) > at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283) > at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727) > at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.checkAndGetApplicationPriority(CapacityScheduler.java:2521) > at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:417) > at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:342) > at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:678) > at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:277) > at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:563) > at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:527) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1036) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1015) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:943) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2943) > {noformat} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10347) Fix double locking in CapacityScheduler#reinitialize in branch-3.1
[ https://issues.apache.org/jira/browse/YARN-10347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17154071#comment-17154071 ] Masatake Iwasaki commented on YARN-10347: - Thanks, [~ayushtkn]. I'm running {{mvn test}} in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager on my local machine. I will commit the patch after checking the result. Since I can reproduce the docker failure by running ./start-build-env.sh, I filed HADOOP-17120. > Fix double locking in CapacityScheduler#reinitialize in branch-3.1 > -- > > Key: YARN-10347 > URL: https://issues.apache.org/jira/browse/YARN-10347 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler > Affects Versions: 3.1.4 > Reporter: Masatake Iwasaki > Assignee: Masatake Iwasaki > Priority: Critical > Attachments: YARN-10347-branch-3.1.001.patch > > > Double locking blocks other threads in the ResourceManager that are waiting for the lock. > I found the issue while testing hadoop-3.1.4-RC2 on an RM-HA enabled deployment. The ResourceManager blocks on {{submitApplication}} waiting for the lock when I run example MR applications. > {noformat} > "IPC Server handler 45 on default port 8032" #211 daemon prio=5 os_prio=0 tid=0x7f0e45a40200 nid=0x418 waiting on condition [0x7f0e14abe000] > java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x85d56510> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836) > at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967) > at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283) > at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727) > at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.checkAndGetApplicationPriority(CapacityScheduler.java:2521) > at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:417) > at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:342) > at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:678) > at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:277) > at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:563) > at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:527) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1036) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1015) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:943) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2943) > {noformat} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail:
yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10347) Fix double locking in CapacityScheduler#reinitialize in branch-3.1
[ https://issues.apache.org/jira/browse/YARN-10347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17154062#comment-17154062 ] Ayush Saxena commented on YARN-10347: - hmm, need to see what triggered this at Jenkins. Anyway, the fix seems pretty straightforward. The original trunk patch in YARN-10022 didn't have it; it got added during the backport. +1, thanks [~iwasakims] for the find. This should ideally unblock the 3.1.4 release as well? cc. [~gabor.bota] > Fix double locking in CapacityScheduler#reinitialize in branch-3.1 > -- > > Key: YARN-10347 > URL: https://issues.apache.org/jira/browse/YARN-10347 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler > Affects Versions: 3.1.4 > Reporter: Masatake Iwasaki > Assignee: Masatake Iwasaki > Priority: Critical > Attachments: YARN-10347-branch-3.1.001.patch > > > Double locking blocks other threads in the ResourceManager that are waiting for the lock. > I found the issue while testing hadoop-3.1.4-RC2 on an RM-HA enabled deployment. The ResourceManager blocks on {{submitApplication}} waiting for the lock when I run example MR applications. > {noformat} > "IPC Server handler 45 on default port 8032" #211 daemon prio=5 os_prio=0 tid=0x7f0e45a40200 nid=0x418 waiting on condition [0x7f0e14abe000] > java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x85d56510> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836) > at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967) > at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283) > at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727) > at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.checkAndGetApplicationPriority(CapacityScheduler.java:2521) > at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:417) > at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:342) > at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:678) > at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:277) > at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:563) > at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:527) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1036) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1015) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:943) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2943) > {noformat} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail:
yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10348) Allow RM to always cancel tokens after app completes
[ https://issues.apache.org/jira/browse/YARN-10348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17154048#comment-17154048 ] Jim Brennan commented on YARN-10348: I have verified that TestDelegationTokenRenewer.testTokenThreadTimeout fails intermittently if I run it in a loop, with or without my change. > Allow RM to always cancel tokens after app completes > > > Key: YARN-10348 > URL: https://issues.apache.org/jira/browse/YARN-10348 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn > Affects Versions: 2.10.0, 3.1.3 > Reporter: Jim Brennan > Assignee: Jim Brennan > Priority: Major > Attachments: YARN-10348.001.patch, YARN-10348.002.patch > > > (Note: this change was originally done on our internal branch by [~daryn].) > The RM currently has an option for a client to specify disabling token cancellation when a job completes. This feature was an initial attempt to address the use case of a job launching sub-jobs (i.e. the oozie launcher) where the original job finishes prior to the sub-jobs' completion - e.g. the original job's completion triggered premature cancellation of tokens still needed by the sub-jobs. > Many years ago, [~daryn] added a more robust implementation that ref counts tokens ([YARN-3055]). This prevented premature cancellation of a token until all apps using it complete, and removed the need for a client to specify cancel=false. Unfortunately the config option was not removed. > We have seen cases where oozie "java actions" and some users were explicitly disabling token cancellation. This can lead to a buildup of defunct tokens that may overwhelm the ZK buffer used by the KDC's backing store, at which point the KMS fails to connect to ZK and is unable to issue/validate new tokens - rendering the KDC able to authenticate only pre-existing tokens. Production incidents have occurred due to this buffer size issue. > To avoid these issues, the RM should have the option to ignore/override the client's request to not cancel tokens. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10347) Fix double locking in CapacityScheduler#reinitialize in branch-3.1
[ https://issues.apache.org/jira/browse/YARN-10347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Masatake Iwasaki updated YARN-10347: Description: Double locking blocks another threads in ResourceManager waiting for the lock. I found the issue on testing hadoop-3.1.4-RC2 with RM-HA enabled deployment. ResourceManager blocks on {{submitApplication}} waiting for the lock when I run example MR applications. {noformat} "IPC Server handler 45 on default port 8032" #211 daemon prio=5 os_prio=0 tid=0x7f0e45a40200 nid=0x418 waiting on condition [0x7f0e14abe000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x85d56510> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836) at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283) at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.checkAndGetApplicationPriority(CapacityScheduler.java:2521) at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:417) at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:342) at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:678) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:277) at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:563) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:527) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1036) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1015) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:943) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2943) {noformat} was:Double locking blocks another threads in ResourceManager waiting for the lock. > Fix double locking in CapacityScheduler#reinitialize in branch-3.1 > -- > > Key: YARN-10347 > URL: https://issues.apache.org/jira/browse/YARN-10347 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Affects Versions: 3.1.4 >Reporter: Masatake Iwasaki >Assignee: Masatake Iwasaki >Priority: Critical > Attachments: YARN-10347-branch-3.1.001.patch > > > Double locking blocks another threads in ResourceManager waiting for the lock. > I found the issue on testing hadoop-3.1.4-RC2 with RM-HA enabled deployment. > ResourceManager blocks on {{submitApplication}} waiting for the lock when I > run example MR applications. 
[jira] [Commented] (YARN-10347) Fix double locking in CapacityScheduler#reinitialize in branch-3.1
[ https://issues.apache.org/jira/browse/YARN-10347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17154034#comment-17154034 ] Masatake Iwasaki commented on YARN-10347: - Yes. The QA build failed during Docker image creation while installing Python packages. It does not seem to be related to the patch. {noformat} Step 26/32 : RUN pip2 install configparser==4.0.2 pylint==1.9.2 ... The command '/bin/sh -c pip2 install configparser==4.0.2 pylint==1.9.2' returned a non-zero code: 1 ERROR: Docker failed to build yetus/hadoop:d84386ccf7a. {noformat} > Fix double locking in CapacityScheduler#reinitialize in branch-3.1 > -- > > Key: YARN-10347 > URL: https://issues.apache.org/jira/browse/YARN-10347 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Affects Versions: 3.1.4 >Reporter: Masatake Iwasaki >Assignee: Masatake Iwasaki >Priority: Critical > Attachments: YARN-10347-branch-3.1.001.patch > > > Double locking blocks other threads in the ResourceManager waiting for the lock. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10348) Allow RM to always cancel tokens after app completes
[ https://issues.apache.org/jira/browse/YARN-10348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17154011#comment-17154011 ] Jim Brennan commented on YARN-10348: patch 002 addresses the checkstyle issues and adds an entry to yarn-default.xml to fix TestYarnConfigurationFields. I have not been able to repro the TestDelegationTokenRenewer.testTokenThreadTimeout() failure. It looks like the same timeout reported in [YARN-10155]. It appears to still fail intermittently. I don't think it is related to this patch. TestCapacityOverTimePolicy and TestFairSchedulerPreemption failures are unrelated to this change. > Allow RM to always cancel tokens after app completes > > > Key: YARN-10348 > URL: https://issues.apache.org/jira/browse/YARN-10348 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.10.0, 3.1.3 >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Major > Attachments: YARN-10348.001.patch, YARN-10348.002.patch > > > (Note: this change was originally done on our internal branch by [~daryn]). > The RM currently has an option for a client to specify disabling token > cancellation when a job completes. This feature was an initial attempt to > address the use case of a job launching sub-jobs (ie. oozie launcher) and the > original job finishing prior to the sub-job(s) completion - ex. original job > completion triggered premature cancellation of tokens needed by the sub-jobs. > Many years ago, [~daryn] added a more robust implementation to ref count > tokens ([YARN-3055]). This prevented premature cancellation of the token > until all apps using the token complete, and invalidated the need for a > client to specify cancel=false. Unfortunately the config option was not > removed. > We have seen cases where oozie "java actions" and some users were explicitly > disabling token cancellation. This can lead to a buildup of defunct tokens > that may overwhelm the ZK buffer used by the KDC's backing store. At which > point the KMS fails to connect to ZK and is unable to issue/validate new > tokens - rendering the KDC only able to authenticate pre-existing tokens. > Production incidents have occurred due to the buffer size issue. > To avoid these issues, the RM should have the option to ignore/override the > client's request to not cancel tokens. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-10347) Fix double locking in CapacityScheduler#reinitialize in branch-3.1
[ https://issues.apache.org/jira/browse/YARN-10347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17154009#comment-17154009 ] Ayush Saxena edited comment on YARN-10347 at 7/8/20, 8:59 PM: -- Thanx [~iwasakims] for the fix. The changes seem fair enough, but Jenkins doesn't seem to be behaving as expected. Any idea? was (Author: ayushtkn): Thanx [~iwasakims] for the fix. Changes seems fair enough, but the jenkins doesn't seems behaving expected. Any idea? > Fix double locking in CapacityScheduler#reinitialize in branch-3.1 > -- > > Key: YARN-10347 > URL: https://issues.apache.org/jira/browse/YARN-10347 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Affects Versions: 3.1.4 >Reporter: Masatake Iwasaki >Assignee: Masatake Iwasaki >Priority: Critical > Attachments: YARN-10347-branch-3.1.001.patch > > > Double locking blocks other threads in the ResourceManager waiting for the lock. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10347) Fix double locking in CapacityScheduler#reinitialize in branch-3.1
[ https://issues.apache.org/jira/browse/YARN-10347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17154009#comment-17154009 ] Ayush Saxena commented on YARN-10347: - Thanx [~iwasakims] for the fix. The changes seem fair enough, but Jenkins doesn't seem to be behaving as expected. Any idea? > Fix double locking in CapacityScheduler#reinitialize in branch-3.1 > -- > > Key: YARN-10347 > URL: https://issues.apache.org/jira/browse/YARN-10347 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Affects Versions: 3.1.4 >Reporter: Masatake Iwasaki >Assignee: Masatake Iwasaki >Priority: Critical > Attachments: YARN-10347-branch-3.1.001.patch > > > Double locking blocks other threads in the ResourceManager waiting for the lock. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10348) Allow RM to always cancel tokens after app completes
[ https://issues.apache.org/jira/browse/YARN-10348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan updated YARN-10348: --- Attachment: YARN-10348.002.patch > Allow RM to always cancel tokens after app completes > > > Key: YARN-10348 > URL: https://issues.apache.org/jira/browse/YARN-10348 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.10.0, 3.1.3 >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Major > Attachments: YARN-10348.001.patch, YARN-10348.002.patch > > > (Note: this change was originally done on our internal branch by [~daryn]). > The RM currently has an option for a client to specify disabling token > cancellation when a job completes. This feature was an initial attempt to > address the use case of a job launching sub-jobs (ie. oozie launcher) and the > original job finishing prior to the sub-job(s) completion - ex. original job > completion triggered premature cancellation of tokens needed by the sub-jobs. > Many years ago, [~daryn] added a more robust implementation to ref count > tokens ([YARN-3055]). This prevented premature cancellation of the token > until all apps using the token complete, and invalidated the need for a > client to specify cancel=false. Unfortunately the config option was not > removed. > We have seen cases where oozie "java actions" and some users were explicitly > disabling token cancellation. This can lead to a buildup of defunct tokens > that may overwhelm the ZK buffer used by the KDC's backing store. At which > point the KMS fails to connect to ZK and is unable to issue/validate new > tokens - rendering the KDC only able to authenticate pre-existing tokens. > Production incidents have occurred due to the buffer size issue. > To avoid these issues, the RM should have the option to ignore/override the > client's request to not cancel tokens. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10347) Fix double locking in CapacityScheduler#reinitialize in branch-3.1
[ https://issues.apache.org/jira/browse/YARN-10347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17154001#comment-17154001 ] Hadoop QA commented on YARN-10347: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} docker {color} | {color:red} 0m 32s{color} | {color:red} Docker failed to build yetus/hadoop:d84386ccf7a. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | YARN-10347 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/13007309/YARN-10347-branch-3.1.001.patch | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/26260/console | | versions | git=2.17.1 | | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org | This message was automatically generated. > Fix double locking in CapacityScheduler#reinitialize in branch-3.1 > -- > > Key: YARN-10347 > URL: https://issues.apache.org/jira/browse/YARN-10347 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Affects Versions: 3.1.4 >Reporter: Masatake Iwasaki >Assignee: Masatake Iwasaki >Priority: Critical > Attachments: YARN-10347-branch-3.1.001.patch > > > Double locking blocks other threads in the ResourceManager waiting for the lock. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10348) Allow RM to always cancel tokens after app completes
[ https://issues.apache.org/jira/browse/YARN-10348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17153974#comment-17153974 ] Hadoop QA commented on YARN-10348: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 1m 32s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 0s{color} | {color:green} No case conflicting files found. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 28s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 25m 26s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m 48s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 48s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 15s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 22m 7s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 32s{color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 2m 7s{color} | {color:blue} Used deprecated FindBugs config; considering switching to SpotBugs. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 12s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 26s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 58s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 8m 58s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 35s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 2 new + 301 unchanged - 0 fixed = 303 total (was 301) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 50s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 48s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 11s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 56s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 58s{color} | {color:red} hadoop-yarn-api in the patch failed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 94m 55s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 42s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}202m 14s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.conf.TestYarnConfigurationFields | | | hadoop.yarn.server.resourcemanager.reservation.TestCapacityOverTimePolicy | | | hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer | | | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerPreemption | \\ \\ || Subsystem || Report/Notes || | Docker | ClientAPI=1.40 ServerAPI=1.40 base: https://builds.apache.
[jira] [Commented] (YARN-10348) Allow RM to always cancel tokens after app completes
[ https://issues.apache.org/jira/browse/YARN-10348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17153754#comment-17153754 ] Jim Brennan commented on YARN-10348: patch 001 adds a new YARN configuration property: {noformat} public static final String RM_DELEGATION_TOKEN_ALWAYS_CANCEL = RM_PREFIX + "delegation-token.always-cancel"; public static final boolean DEFAULT_RM_DELEGATION_TOKEN_ALWAYS_CANCEL = false; {noformat} Internally we default this to true, but to maintain compatibility I've set it to false in this patch. If this property is true, we effectively ignore the {{shouldCancelAtEnd}} parameter that came from the client. We have been running with this change in production internally for about two years. > Allow RM to always cancel tokens after app completes > > > Key: YARN-10348 > URL: https://issues.apache.org/jira/browse/YARN-10348 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.10.0, 3.1.3 >Reporter: Jim Brennan >Assignee: Jim Brennan >Priority: Major > Attachments: YARN-10348.001.patch > > > (Note: this change was originally done on our internal branch by [~daryn]). > The RM currently has an option for a client to specify disabling token > cancellation when a job completes. This feature was an initial attempt to > address the use case of a job launching sub-jobs (ie. oozie launcher) and the > original job finishing prior to the sub-job(s) completion - ex. original job > completion triggered premature cancellation of tokens needed by the sub-jobs. > Many years ago, [~daryn] added a more robust implementation to ref count > tokens ([YARN-3055]). This prevented premature cancellation of the token > until all apps using the token complete, and invalidated the need for a > client to specify cancel=false. Unfortunately the config option was not > removed. > We have seen cases where oozie "java actions" and some users were explicitly > disabling token cancellation. This can lead to a buildup of defunct tokens > that may overwhelm the ZK buffer used by the KDC's backing store. At which > point the KMS fails to connect to ZK and is unable to issue/validate new > tokens - rendering the KDC only able to authenticate pre-existing tokens. > Production incidents have occurred due to the buffer size issue. > To avoid these issues, the RM should have the option to ignore/override the > client's request to not cancel tokens. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
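To make the semantics concrete, here is a minimal sketch of the override logic the property implies. The class and method names below are illustrative rather than the actual patch contents; only the property key (RM_PREFIX resolves to "yarn.resourcemanager.") and its false default come from the comment above.
{code:java}
import org.apache.hadoop.conf.Configuration;

// Illustrative sketch, not the patch: how an "always cancel" switch can
// override the client's cancel=false request in the token renewer path.
public final class TokenCancelPolicy {
  private final boolean alwaysCancel;

  public TokenCancelPolicy(Configuration conf) {
    // Property described in the comment above; defaults to false for
    // compatibility with existing clients.
    this.alwaysCancel = conf.getBoolean(
        "yarn.resourcemanager.delegation-token.always-cancel", false);
  }

  /** True if tokens should be cancelled when the application completes. */
  public boolean shouldCancelAtEnd(boolean clientRequestedCancel) {
    // With always-cancel enabled, the client's shouldCancelAtEnd=false
    // is effectively ignored.
    return alwaysCancel || clientRequestedCancel;
  }
}
{code}
Either way, the token ref-counting from YARN-3055 still prevents a token shared by live applications from being cancelled early.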
[jira] [Created] (YARN-10348) Allow RM to always cancel tokens after app completes
Jim Brennan created YARN-10348: -- Summary: Allow RM to always cancel tokens after app completes Key: YARN-10348 URL: https://issues.apache.org/jira/browse/YARN-10348 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 3.1.3, 2.10.0 Reporter: Jim Brennan Assignee: Jim Brennan (Note: this change was originally done on our internal branch by [~daryn]). The RM currently has an option for a client to specify disabling token cancellation when a job completes. This feature was an initial attempt to address the use case of a job launching sub-jobs (ie. oozie launcher) and the original job finishing prior to the sub-job(s) completion - ex. original job completion triggered premature cancellation of tokens needed by the sub-jobs. Many years ago, [~daryn] added a more robust implementation to ref count tokens ([YARN-3055]). This prevented premature cancellation of the token until all apps using the token complete, and invalidated the need for a client to specify cancel=false. Unfortunately the config option was not removed. We have seen cases where oozie "java actions" and some users were explicitly disabling token cancellation. This can lead to a buildup of defunct tokens that may overwhelm the ZK buffer used by the KDC's backing store. At which point the KMS fails to connect to ZK and is unable to issue/validate new tokens - rendering the KDC only able to authenticate pre-existing tokens. Production incidents have occurred due to the buffer size issue. To avoid these issues, the RM should have the option to ignore/override the client's request to not cancel tokens. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10347) Fix double locking in CapacityScheduler#reinitialize in branch-3.1
[ https://issues.apache.org/jira/browse/YARN-10347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17153714#comment-17153714 ] Hadoop QA commented on YARN-10347: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} docker {color} | {color:red} 14m 5s{color} | {color:red} Docker failed to build yetus/hadoop:d84386ccf7a. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | YARN-10347 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/13007309/YARN-10347-branch-3.1.001.patch | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/26258/console | | versions | git=2.17.1 | | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org | This message was automatically generated. > Fix double locking in CapacityScheduler#reinitialize in branch-3.1 > -- > > Key: YARN-10347 > URL: https://issues.apache.org/jira/browse/YARN-10347 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Affects Versions: 3.1.4 >Reporter: Masatake Iwasaki >Assignee: Masatake Iwasaki >Priority: Critical > Attachments: YARN-10347-branch-3.1.001.patch > > > Double locking blocks other threads in the ResourceManager waiting for the lock. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10341) Yarn Service Container Completed event doesn't get processed
[ https://issues.apache.org/jira/browse/YARN-10341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17153700#comment-17153700 ] Hadoop QA commented on YARN-10341: -- | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 1m 19s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 0s{color} | {color:green} No case conflicting files found. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 38s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 22s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 36s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 16m 42s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s{color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 0m 57s{color} | {color:blue} Used deprecated FindBugs config; considering switching to SpotBugs. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 54s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 28s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 23s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 15s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core: The patch generated 2 new + 15 unchanged - 0 fixed = 17 total (was 15) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 28s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 58s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 19m 37s{color} | {color:green} hadoop-yarn-services-core in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 31s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 84m 7s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | ClientAPI=1.40 ServerAPI=1.40 base: https://builds.apache.org/job/PreCommit-YARN-Build/26257/artifact/out/Dockerfile | | JIRA Issue | YARN-10341 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/13007305/YARN-10341.003.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux eb15ff0a25e6 4.15.0-101-generic #102-Ubuntu SMP Mon May 11 10:07:26 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | personality/hadoop.sh | | git revision | trunk / 3a4d05b8504 | | Default Java | Private Build-1.8.0_252-8u252-b09-1~18.04-b09 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/26257/artifact/out/diff-checkstyle
[jira] [Updated] (YARN-10347) Fix double locking in CapacityScheduler#reinitialize in branch-3.1
[ https://issues.apache.org/jira/browse/YARN-10347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Masatake Iwasaki updated YARN-10347: Attachment: YARN-10347-branch-3.1.001.patch > Fix double locking in CapacityScheduler#reinitialize in branch-3.1 > -- > > Key: YARN-10347 > URL: https://issues.apache.org/jira/browse/YARN-10347 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Affects Versions: 3.1.4 >Reporter: Masatake Iwasaki >Assignee: Masatake Iwasaki >Priority: Critical > Attachments: YARN-10347-branch-3.1.001.patch > > > Double locking blocks other threads in the ResourceManager waiting for the lock. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-10347) Fix double locking in CapacityScheduler#reinitialize in branch-3.1
Masatake Iwasaki created YARN-10347: --- Summary: Fix double locking in CapacityScheduler#reinitialize in branch-3.1 Key: YARN-10347 URL: https://issues.apache.org/jira/browse/YARN-10347 Project: Hadoop YARN Issue Type: Bug Components: capacity scheduler Affects Versions: 3.1.4 Reporter: Masatake Iwasaki Assignee: Masatake Iwasaki Double locking blocks other threads in the ResourceManager waiting for the lock. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
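The failure mode is easy to reproduce in isolation. Below is a generic illustration, not the branch-3.1 CapacityScheduler code: {{ReentrantReadWriteLock}} is reentrant, so acquiring the write lock twice succeeds, but each acquisition needs a matching unlock. If one release is missing, the hold count stays at 1 and every later read-lock attempt, such as the {{checkAndGetApplicationPriority}} call in the jstack above, parks forever.
{code:java}
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Generic illustration of the double-locking bug; not the actual
// CapacityScheduler code.
public class DoubleLockDemo {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

  public void reinitialize() {
    lock.writeLock().lock();     // first acquisition
    try {
      lock.writeLock().lock();   // accidental second acquisition; it
                                 // succeeds because the lock is reentrant
      // ... reload configuration, rebuild queues ...
    } finally {
      lock.writeLock().unlock(); // releases only one of the two holds
    }
    // Hold count is still 1 here: the write lock is never fully released,
    // so every subsequent readLock().lock() parks indefinitely.
  }
}
{code}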
[jira] [Updated] (YARN-10341) Yarn Service Container Completed event doesn't get processed
[ https://issues.apache.org/jira/browse/YARN-10341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bilwa S T updated YARN-10341: - Attachment: YARN-10341.003.patch > Yarn Service Container Completed event doesn't get processed > - > > Key: YARN-10341 > URL: https://issues.apache.org/jira/browse/YARN-10341 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bilwa S T >Assignee: Bilwa S T >Priority: Critical > Attachments: YARN-10341.001.patch, YARN-10341.002.patch, > YARN-10341.003.patch > > > If there are 10 workers running and containers get killed, after a while we > see that there are just 9 workers running. This is because the CONTAINER_COMPLETED > event is not processed on the AM side. > The issue is in the below code: > {code:java} > public void onContainersCompleted(List<ContainerStatus> statuses) { > for (ContainerStatus status : statuses) { > ContainerId containerId = status.getContainerId(); > ComponentInstance instance = > liveInstances.get(status.getContainerId()); > if (instance == null) { > LOG.warn( > "Container {} Completed. No component instance exists. > exitStatus={}. diagnostics={} ", > containerId, status.getExitStatus(), status.getDiagnostics()); > return; > } > ComponentEvent event = > new ComponentEvent(instance.getCompName(), CONTAINER_COMPLETED) > .setStatus(status).setInstance(instance) > .setContainerId(containerId); > dispatcher.getEventHandler().handle(event); > } > {code} > If a component instance doesn't exist for a container, it doesn't iterate over > the remaining containers, as it returns from the method -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10341) Yarn Service Container Completed event doesn't get processed
[ https://issues.apache.org/jira/browse/YARN-10341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17153590#comment-17153590 ] Hadoop QA commented on YARN-10341: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 1m 21s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 0s{color} | {color:green} No case conflicting files found. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 14s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 21s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 31s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 16m 18s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s{color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 0m 55s{color} | {color:blue} Used deprecated FindBugs config; considering switching to SpotBugs. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 52s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 28s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 23s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 15s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core: The patch generated 4 new + 15 unchanged - 0 fixed = 19 total (was 15) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 25s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 58s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 19m 25s{color} | {color:red} hadoop-yarn-services-core in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 29s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 81m 1s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.service.TestServiceAM | \\ \\ || Subsystem || Report/Notes || | Docker | ClientAPI=1.40 ServerAPI=1.40 base: https://builds.apache.org/job/PreCommit-YARN-Build/26256/artifact/out/Dockerfile | | JIRA Issue | YARN-10341 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/13007298/YARN-10341.002.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 13923a5cf4ff 4.15.0-101-generic #102-Ubuntu SMP Mon May 11 10:07:26 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | personality/hadoop.sh | | git revision | trunk / 3a4d05b8504 | | Default Java | Private Build-1.8.0_252-8u252-b09-1~18.04-b09 | | checkstyle | https://b
[jira] [Updated] (YARN-10341) Yarn Service Container Completed event doesn't get processed
[ https://issues.apache.org/jira/browse/YARN-10341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bilwa S T updated YARN-10341: - Attachment: YARN-10341.002.patch > Yarn Service Container Completed event doesn't get processed > - > > Key: YARN-10341 > URL: https://issues.apache.org/jira/browse/YARN-10341 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bilwa S T >Assignee: Bilwa S T >Priority: Critical > Attachments: YARN-10341.001.patch, YARN-10341.002.patch > > > If there are 10 workers running and containers get killed, after a while we > see that there are just 9 workers running. This is because the CONTAINER_COMPLETED > event is not processed on the AM side. > The issue is in the below code: > {code:java} > public void onContainersCompleted(List<ContainerStatus> statuses) { > for (ContainerStatus status : statuses) { > ContainerId containerId = status.getContainerId(); > ComponentInstance instance = > liveInstances.get(status.getContainerId()); > if (instance == null) { > LOG.warn( > "Container {} Completed. No component instance exists. > exitStatus={}. diagnostics={} ", > containerId, status.getExitStatus(), status.getDiagnostics()); > return; > } > ComponentEvent event = > new ComponentEvent(instance.getCompName(), CONTAINER_COMPLETED) > .setStatus(status).setInstance(instance) > .setContainerId(containerId); > dispatcher.getEventHandler().handle(event); > } > {code} > If a component instance doesn't exist for a container, it doesn't iterate over > the remaining containers, as it returns from the method -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10341) Yarn Service Container Completed event doesn't get processed
[ https://issues.apache.org/jira/browse/YARN-10341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17153525#comment-17153525 ] Bilwa S T commented on YARN-10341: -- Thanks [~brahmareddy] [~billie] for reviewing. I have added a testcase in patch 002 > Yarn Service Container Completed event doesn't get processed > - > > Key: YARN-10341 > URL: https://issues.apache.org/jira/browse/YARN-10341 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bilwa S T >Assignee: Bilwa S T >Priority: Critical > Attachments: YARN-10341.001.patch, YARN-10341.002.patch > > > If there are 10 workers running and containers get killed, after a while we > see that there are just 9 workers running. This is because the CONTAINER_COMPLETED > event is not processed on the AM side. > The issue is in the below code: > {code:java} > public void onContainersCompleted(List<ContainerStatus> statuses) { > for (ContainerStatus status : statuses) { > ContainerId containerId = status.getContainerId(); > ComponentInstance instance = > liveInstances.get(status.getContainerId()); > if (instance == null) { > LOG.warn( > "Container {} Completed. No component instance exists. > exitStatus={}. diagnostics={} ", > containerId, status.getExitStatus(), status.getDiagnostics()); > return; > } > ComponentEvent event = > new ComponentEvent(instance.getCompName(), CONTAINER_COMPLETED) > .setStatus(status).setInstance(instance) > .setContainerId(containerId); > dispatcher.getEventHandler().handle(event); > } > {code} > If a component instance doesn't exist for a container, it doesn't iterate over > the remaining containers, as it returns from the method -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
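For readers following the thread, the fix implied by the description is a one-word change in the AM callback: replace {{return}} with {{continue}} so that one unmatched container no longer aborts the whole batch. A minimal sketch, assuming the surrounding {{liveInstances}}, {{dispatcher}}, and {{LOG}} fields of the service AM (the authoritative change is in the attached patches):
{code:java}
@Override
public void onContainersCompleted(List<ContainerStatus> statuses) {
  for (ContainerStatus status : statuses) {
    ContainerId containerId = status.getContainerId();
    ComponentInstance instance = liveInstances.get(containerId);
    if (instance == null) {
      LOG.warn(
          "Container {} Completed. No component instance exists. "
              + "exitStatus={}. diagnostics={} ",
          containerId, status.getExitStatus(), status.getDiagnostics());
      // continue, not return: skip only this status so the remaining
      // completed containers in the batch are still processed.
      continue;
    }
    ComponentEvent event =
        new ComponentEvent(instance.getCompName(), CONTAINER_COMPLETED)
            .setStatus(status).setInstance(instance)
            .setContainerId(containerId);
    dispatcher.getEventHandler().handle(event);
  }
}
{code}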
[jira] [Commented] (YARN-8047) RMWebApp make external class pluggable
[ https://issues.apache.org/jira/browse/YARN-8047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17153466#comment-17153466 ] Bilwa S T commented on YARN-8047: - Thanks [~prabhujoseph]. Raised YARN-10346 for adding a testcase. > RMWebApp make external class pluggable > -- > > Key: YARN-8047 > URL: https://issues.apache.org/jira/browse/YARN-8047 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Bibin Chundatt >Assignee: Bilwa S T >Priority: Minor > Fix For: 3.4.0 > > Attachments: YARN-8047-001.patch, YARN-8047-002.patch, > YARN-8047-003.patch, YARN-8047.004.patch, YARN-8047.005.patch, > YARN-8047.006.patch > > > This Jira should make sure we are able to plug in web services and web pages > of the scheduler in the ResourceManager > * RMWebApp: allow binding external classes > * RMController: allow plugging in scheduler classes -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-10346) Add testcase for RMWebApp make external class pluggable
Bilwa S T created YARN-10346: Summary: Add testcase for RMWebApp make external class pluggable Key: YARN-10346 URL: https://issues.apache.org/jira/browse/YARN-10346 Project: Hadoop YARN Issue Type: Bug Reporter: Bilwa S T Assignee: Bilwa S T Add a testcase for Jira YARN-8047 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10333) YarnClient obtain Delegation Token for Log Aggregation Path
[ https://issues.apache.org/jira/browse/YARN-10333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17153405#comment-17153405 ] Zhankun Tang commented on YARN-10333: - LGTM. +1. Thanks for your contribution! [~prabhujoseph], [~sunilg] > YarnClient obtain Delegation Token for Log Aggregation Path > --- > > Key: YARN-10333 > URL: https://issues.apache.org/jira/browse/YARN-10333 > Project: Hadoop YARN > Issue Type: Improvement > Components: log-aggregation >Affects Versions: 3.3.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Attachments: YARN-10333-001.patch, YARN-10333-002.patch, > YARN-10333-003.patch > > > There are use cases where the Yarn log aggregation path is configured to a > FileSystem like S3 or ABFS that is different from what is configured in > fs.defaultFS (HDFS). Log aggregation fails because the client has a token only > for fs.defaultFS and not for the log aggregation path. > This Jira is to improve YarnClient by obtaining a delegation token for the log > aggregation path and adding it to the Credentials of the Container Launch Context, > similar to how it does for the Timeline delegation token. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10333) YarnClient obtain Delegation Token for Log Aggregation Path
[ https://issues.apache.org/jira/browse/YARN-10333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17153326#comment-17153326 ] Sunil G commented on YARN-10333: This change looks fine to me. cc [~ztang] [~bibinchundatt] [~rohithsharmaks] thoughts? > YarnClient obtain Delegation Token for Log Aggregation Path > --- > > Key: YARN-10333 > URL: https://issues.apache.org/jira/browse/YARN-10333 > Project: Hadoop YARN > Issue Type: Improvement > Components: log-aggregation >Affects Versions: 3.3.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Attachments: YARN-10333-001.patch, YARN-10333-002.patch, > YARN-10333-003.patch > > > There are use cases where the Yarn log aggregation path is configured to a > FileSystem like S3 or ABFS that is different from what is configured in > fs.defaultFS (HDFS). Log aggregation fails because the client has a token only > for fs.defaultFS and not for the log aggregation path. > This Jira is to improve YarnClient by obtaining a delegation token for the log > aggregation path and adding it to the Credentials of the Container Launch Context, > similar to how it does for the Timeline delegation token. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
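A minimal sketch of what obtaining the extra token can look like on the client side. The helper below is illustrative, assuming the remote log directory is read from {{yarn.nodemanager.remote-app-log-dir}} and that the caller serializes {{credentials}} into the ContainerLaunchContext:
{code:java}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Illustrative helper: fetch a delegation token for the log aggregation
// filesystem (possibly S3/ABFS, distinct from fs.defaultFS) so the app's
// logs can be aggregated on a secure cluster.
public final class LogAggregationTokenHelper {
  public static void addLogAggregationToken(Configuration conf,
      String renewer, Credentials credentials) throws IOException {
    Path remoteLogDir = new Path(conf.get(
        YarnConfiguration.NM_REMOTE_APP_LOG_DIR,
        YarnConfiguration.DEFAULT_NM_REMOTE_APP_LOG_DIR));
    FileSystem logFs = remoteLogDir.getFileSystem(conf);
    // No-op if the filesystem issues no tokens or the credentials already
    // hold a valid token for it.
    logFs.addDelegationTokens(renewer, credentials);
  }
}
{code}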
[jira] [Updated] (YARN-10340) HsWebServices getContainerReport uses loginUser instead of remoteUser to access ApplicationClientProtocol
[ https://issues.apache.org/jira/browse/YARN-10340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated YARN-10340: - Parent: YARN-10025 Issue Type: Sub-task (was: Bug) > HsWebServices getContainerReport uses loginUser instead of remoteUser to > access ApplicationClientProtocol > - > > Key: YARN-10340 > URL: https://issues.apache.org/jira/browse/YARN-10340 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Prabhu Joseph >Assignee: Tarun Parimi >Priority: Major > > HsWebServices getContainerReport uses loginUser instead of remoteUser to > access ApplicationClientProtocol > > [http://:19888/ws/v1/history/containers/container_e03_1594030808801_0002_01_03/logs|http://pjoseph-secure-1.pjoseph-secure.root.hwx.site:19888/ws/v1/history/containers/container_e03_1594030808801_0002_01_03/logs] > While accessing the above link as the systest user, the request fails saying the > mapred user does not have access to the job > > {code:java} > 2020-07-06 14:02:59,178 WARN org.apache.hadoop.yarn.server.webapp.LogServlet: > Could not obtain node HTTP address from provider. > javax.ws.rs.WebApplicationException: > org.apache.hadoop.yarn.exceptions.YarnException: User mapred does not have > privilege to see this application application_1593997842459_0214 > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getContainerReport(ClientRMService.java:516) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getContainerReport(ApplicationClientProtocolPBServiceImpl.java:466) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:639) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:985) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:913) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2882) > at > org.apache.hadoop.yarn.server.webapp.WebServices.rewrapAndThrowThrowable(WebServices.java:544) > at > org.apache.hadoop.yarn.server.webapp.WebServices.rewrapAndThrowException(WebServices.java:530) > at > org.apache.hadoop.yarn.server.webapp.WebServices.getContainer(WebServices.java:405) > at > org.apache.hadoop.yarn.server.webapp.WebServices.getNodeHttpAddress(WebServices.java:373) > at > org.apache.hadoop.yarn.server.webapp.LogServlet.getContainerLogsInfo(LogServlet.java:268) > at > org.apache.hadoop.mapreduce.v2.hs.webapp.HsWebServices.getContainerLogs(HsWebServices.java:461) > > {code} > On analyzing this, we found that WebServices#getContainer uses doAs with a UGI created by > createRemoteUser(end user) to access RM#ApplicationClientProtocol, which does > not work. It needs to use createProxyUser instead. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
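The crux is the difference between the two UGI constructions. A sketch under the assumption that the history server's login user is configured as a proxy user for end users (hadoop.proxyuser.* settings); the class and method here are illustrative:
{code:java}
import java.io.IOException;
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.security.UserGroupInformation;

// Illustrative sketch, not the HsWebServices code.
public final class ProxyUserSketch {
  public static void callAsEndUser(String remoteUser)
      throws IOException, InterruptedException {
    // Broken pattern (shown for contrast): createRemoteUser yields a UGI
    // with no Kerberos credentials, so the RPC to the RM ends up
    // authenticating as the daemon's login user (mapred) instead of the
    // web caller.
    UserGroupInformation broken =
        UserGroupInformation.createRemoteUser(remoteUser);

    // Working pattern: a proxy UGI keeps the daemon's login credentials
    // as the real user while the RM sees and authorizes the end user.
    UserGroupInformation proxy = UserGroupInformation.createProxyUser(
        remoteUser, UserGroupInformation.getLoginUser());
    proxy.doAs((PrivilegedExceptionAction<Void>) () -> {
      // invoke ApplicationClientProtocol#getContainerReport(...) here
      return null;
    });
  }
}
{code}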
[jira] [Updated] (YARN-10345) HsWebServices containerlogs does not honor ACLs for completed jobs
[ https://issues.apache.org/jira/browse/YARN-10345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated YARN-10345: - Parent: YARN-10025 Issue Type: Sub-task (was: Bug) > HsWebServices containerlogs does not honor ACLs for completed jobs > -- > > Key: YARN-10345 > URL: https://issues.apache.org/jira/browse/YARN-10345 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Affects Versions: 3.3.0, 3.2.2, 3.4.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Critical > Attachments: Screen Shot 2020-07-08 at 12.54.21 PM.png > > > HsWebServices containerlogs does not honor ACLs. User who does not have > permission to view a job is allowed to view the job logs for completed jobs > from YARN UI2 through HsWebServices. > *Repro:* > Secure cluster + yarn.admin.acl=yarn,mapred + Root Queue ACLs set to " " + > HistoryServer runs as mapred > # Run a sample MR job using systest user > # Once the job is complete, access the job logs using hue user from YARN > UI2. > !Screen Shot 2020-07-08 at 12.54.21 PM.png|height=300! > > YARN CLI works fine and does not allow hue user to view systest user job logs. > {code:java} > [hue@pjoseph-cm-2 /]$ > [hue@pjoseph-cm-2 /]$ yarn logs -applicationId application_1594188841761_0002 > WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS. > 20/07/08 07:23:08 INFO client.RMProxy: Connecting to ResourceManager at > rmhostname:8032 > Permission denied: user=hue, access=EXECUTE, > inode="/tmp/logs/systest":systest:hadoop:drwxrwx--- > at > org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:496) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10345) HsWebServices containerlogs does not honor ACLs for completed jobs
[ https://issues.apache.org/jira/browse/YARN-10345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated YARN-10345: - Affects Version/s: (was: 3.2.0) 3.2.2 3.3.0 > HsWebServices containerlogs does not honor ACLs for completed jobs > -- > > Key: YARN-10345 > URL: https://issues.apache.org/jira/browse/YARN-10345 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 3.3.0, 3.2.2, 3.4.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Critical > Attachments: Screen Shot 2020-07-08 at 12.54.21 PM.png > > > HsWebServices containerlogs does not honor ACLs. User who does not have > permission to view a job is allowed to view the job logs for completed jobs > from YARN UI2 through HsWebServices. > *Repro:* > Secure cluster + yarn.admin.acl=yarn,mapred + Root Queue ACLs set to " " + > HistoryServer runs as mapred > # Run a sample MR job using systest user > # Once the job is complete, access the job logs using hue user from YARN > UI2. > !Screen Shot 2020-07-08 at 12.54.21 PM.png|height=300! > > YARN CLI works fine and does not allow hue user to view systest user job logs. > {code:java} > [hue@pjoseph-cm-2 /]$ > [hue@pjoseph-cm-2 /]$ yarn logs -applicationId application_1594188841761_0002 > WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS. > 20/07/08 07:23:08 INFO client.RMProxy: Connecting to ResourceManager at > rmhostname:8032 > Permission denied: user=hue, access=EXECUTE, > inode="/tmp/logs/systest":systest:hadoop:drwxrwx--- > at > org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:496) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10345) HsWebServices containerlogs does not honor ACLs for completed jobs
[ https://issues.apache.org/jira/browse/YARN-10345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated YARN-10345: - Description: HsWebServices containerlogs does not honor ACLs. User who does not have permission to view a job is allowed to view the job logs for completed jobs from YARN UI2 through HsWebServices. *Repro:* Secure cluster + yarn.admin.acl=yarn,mapred + Root Queue ACLs set to " " + HistoryServer runs as mapred # Run a sample MR job using systest user # Once the job is complete, access the job logs using hue user from YARN UI2. !Screen Shot 2020-07-08 at 12.54.21 PM.png|height=300! YARN CLI works fine and does not allow hue user to view systest user job logs. {code:java} [hue@pjoseph-cm-2 /]$ [hue@pjoseph-cm-2 /]$ yarn logs -applicationId application_1594188841761_0002 WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS. 20/07/08 07:23:08 INFO client.RMProxy: Connecting to ResourceManager at rmhostname:8032 Permission denied: user=hue, access=EXECUTE, inode="/tmp/logs/systest":systest:hadoop:drwxrwx--- at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:496) {code} was: HsWebServices containerlogs does not honor ACLs. User who does not have permission to view a job is allowed to view the job logs from YARN UI2 through HsWebServices. *Repro:* Secure cluster + yarn.admin.acl=yarn,mapred + Root Queue ACLs set to " " + HistoryServer runs as mapred # Run a sample MR job using systest user # Once the job is complete, access the job logs using hue user from YARN UI2. !Screen Shot 2020-07-08 at 12.54.21 PM.png|height=300! YARN CLI works fine and does not allow hue user to view systest user job logs. {code:java} [hue@pjoseph-cm-2 /]$ [hue@pjoseph-cm-2 /]$ yarn logs -applicationId application_1594188841761_0002 WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS. 20/07/08 07:23:08 INFO client.RMProxy: Connecting to ResourceManager at rmhostname:8032 Permission denied: user=hue, access=EXECUTE, inode="/tmp/logs/systest":systest:hadoop:drwxrwx--- at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:496) {code} > HsWebServices containerlogs does not honor ACLs for completed jobs > -- > > Key: YARN-10345 > URL: https://issues.apache.org/jira/browse/YARN-10345 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 3.2.0, 3.4.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Critical > Attachments: Screen Shot 2020-07-08 at 12.54.21 PM.png > > > HsWebServices containerlogs does not honor ACLs. User who does not have > permission to view a job is allowed to view the job logs for completed jobs > from YARN UI2 through HsWebServices. > *Repro:* > Secure cluster + yarn.admin.acl=yarn,mapred + Root Queue ACLs set to " " + > HistoryServer runs as mapred > # Run a sample MR job using systest user > # Once the job is complete, access the job logs using hue user from YARN > UI2. > !Screen Shot 2020-07-08 at 12.54.21 PM.png|height=300! > > YARN CLI works fine and does not allow hue user to view systest user job logs. > {code:java} > [hue@pjoseph-cm-2 /]$ > [hue@pjoseph-cm-2 /]$ yarn logs -applicationId application_1594188841761_0002 > WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS. 
> 20/07/08 07:23:08 INFO client.RMProxy: Connecting to ResourceManager at > rmhostname:8032 > Permission denied: user=hue, access=EXECUTE, > inode="/tmp/logs/systest":systest:hadoop:drwxrwx--- > at > org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:496) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-10345) HsWebServices containerlogs does not honor ACLs for completed jobs
[ https://issues.apache.org/jira/browse/YARN-10345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prabhu Joseph updated YARN-10345:
---------------------------------
    Description:
HsWebServices containerlogs does not honor ACLs. A user who does not have permission to view a job can still view the job's logs from YARN UI2 through HsWebServices.

*Repro:*
Secure cluster + yarn.admin.acl=yarn,mapred + Root Queue ACLs set to " " + HistoryServer runs as mapred
# Run a sample MR job as the systest user.
# Once the job is complete, access the job logs as the hue user from YARN UI2.

!Screen Shot 2020-07-08 at 12.54.21 PM.png|height=300!

The YARN CLI works fine and does not allow the hue user to view the systest user's job logs.
{code:java}
[hue@pjoseph-cm-2 /]$
[hue@pjoseph-cm-2 /]$ yarn logs -applicationId application_1594188841761_0002
WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.
20/07/08 07:23:08 INFO client.RMProxy: Connecting to ResourceManager at rmhostname:8032
Permission denied: user=hue, access=EXECUTE, inode="/tmp/logs/systest":systest:hadoop:drwxrwx---
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:496)
{code}

  was:
HsWebServices containerlogs does not honor ACLs. A user who does not have permission to view a job can still view the job's logs from YARN UI2 through HsWebServices.

*Repro:*
Secure cluster + yarn.admin.acl=yarn,mapred + Root Queue ACLs set to " " + HistoryServer runs as mapred
1. Run a sample MR job as the systest user.
2. Once the job is complete, access the job logs as the hue user from YARN UI2.

The YARN CLI works fine.
{code}
[hue@pjoseph-cm-2 /]$
[hue@pjoseph-cm-2 /]$ yarn logs -applicationId application_1594188841761_0002
WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.
20/07/08 07:23:08 INFO client.RMProxy: Connecting to ResourceManager at rmhostname:8032
Permission denied: user=hue, access=EXECUTE, inode="/tmp/logs/systest":systest:hadoop:drwxrwx---
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:496)
{code}

> HsWebServices containerlogs does not honor ACLs for completed jobs
> ------------------------------------------------------------------
>
>                 Key: YARN-10345
>                 URL: https://issues.apache.org/jira/browse/YARN-10345
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>    Affects Versions: 3.2.0, 3.4.0
>            Reporter: Prabhu Joseph
>            Assignee: Prabhu Joseph
>            Priority: Critical
>         Attachments: Screen Shot 2020-07-08 at 12.54.21 PM.png
>
> HsWebServices containerlogs does not honor ACLs. A user who does not have permission to view a job can still view the job's logs from YARN UI2 through HsWebServices.
> *Repro:*
> Secure cluster + yarn.admin.acl=yarn,mapred + Root Queue ACLs set to " " + HistoryServer runs as mapred
> # Run a sample MR job as the systest user.
> # Once the job is complete, access the job logs as the hue user from YARN UI2.
> !Screen Shot 2020-07-08 at 12.54.21 PM.png|height=300!
>
> The YARN CLI works fine and does not allow the hue user to view the systest user's job logs.
> {code:java}
> [hue@pjoseph-cm-2 /]$
> [hue@pjoseph-cm-2 /]$ yarn logs -applicationId application_1594188841761_0002
> WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.
> 20/07/08 07:23:08 INFO client.RMProxy: Connecting to ResourceManager at rmhostname:8032
> Permission denied: user=hue, access=EXECUTE, inode="/tmp/logs/systest":systest:hadoop:drwxrwx---
>     at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:496)
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
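For context on the fix direction: the authorization check that the CLI path enforces, and that the HsWebServices path misses for completed jobs, follows the standard ApplicationACLsManager pattern used elsewhere in YARN. The sketch below is purely illustrative and is not the committed patch; the ContainerLogAclCheck wrapper and its canViewLogs method are hypothetical names.

{code:java}
// Illustrative sketch only, not the YARN-10345 patch. It shows the standard
// ApplicationACLsManager check a log-serving endpoint needs before streaming
// aggregated logs. ContainerLogAclCheck and canViewLogs are hypothetical names.
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.security.ApplicationAccessType;
import org.apache.hadoop.yarn.server.security.ApplicationACLsManager;

public class ContainerLogAclCheck {

  private final ApplicationACLsManager aclsManager;

  public ContainerLogAclCheck(ApplicationACLsManager aclsManager) {
    this.aclsManager = aclsManager;
  }

  // checkAccess returns true when ACLs are disabled, the caller is an admin,
  // the caller owns the application, or the app's VIEW_APP ACL admits them.
  public boolean canViewLogs(UserGroupInformation callerUgi, String appOwner,
      ApplicationId appId) {
    if (callerUgi == null) {
      return false; // no authenticated caller, deny on a secure cluster
    }
    return aclsManager.checkAccess(callerUgi, ApplicationAccessType.VIEW_APP,
        appOwner, appId);
  }
}
{code}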
[jira] [Created] (YARN-10345) HsWebServices containerlogs does not honor ACLs for completed jobs
Prabhu Joseph created YARN-10345:
------------------------------------

             Summary: HsWebServices containerlogs does not honor ACLs for completed jobs
                 Key: YARN-10345
                 URL: https://issues.apache.org/jira/browse/YARN-10345
             Project: Hadoop YARN
          Issue Type: Bug
          Components: yarn
    Affects Versions: 3.2.0, 3.4.0
            Reporter: Prabhu Joseph
            Assignee: Prabhu Joseph
         Attachments: Screen Shot 2020-07-08 at 12.54.21 PM.png

HsWebServices containerlogs does not honor ACLs. A user who does not have permission to view a job can still view the job's logs from YARN UI2 through HsWebServices.

*Repro:*
Secure cluster + yarn.admin.acl=yarn,mapred + Root Queue ACLs set to " " + HistoryServer runs as mapred
1. Run a sample MR job as the systest user.
2. Once the job is complete, access the job logs as the hue user from YARN UI2.

The YARN CLI works fine.
{code}
[hue@pjoseph-cm-2 /]$
[hue@pjoseph-cm-2 /]$ yarn logs -applicationId application_1594188841761_0002
WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.
20/07/08 07:23:08 INFO client.RMProxy: Connecting to ResourceManager at rmhostname:8032
Permission denied: user=hue, access=EXECUTE, inode="/tmp/logs/systest":systest:hadoop:drwxrwx---
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:496)
{code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
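Incidentally, the CLI denial in the repro comes from plain HDFS permission checking rather than from YARN ACLs: traversing into the per-user aggregated-log directory /tmp/logs/systest requires the EXECUTE bit, and drwxrwx--- grants nothing to "other". A minimal sketch of that rule follows, assuming hue is neither the owner (systest) nor a member of the hadoop group; the WhyCliFails class is a hypothetical name.

{code:java}
// Minimal sketch, assuming the caller is neither the owner nor in the group.
// WhyCliFails is a hypothetical name; FsPermission and FsAction are real
// Hadoop APIs from hadoop-common.
import org.apache.hadoop.fs.permission.FsAction;
import org.apache.hadoop.fs.permission.FsPermission;

public class WhyCliFails {
  public static void main(String[] args) {
    // /tmp/logs/systest is systest:hadoop:drwxrwx---
    FsPermission perm = FsPermission.valueOf("drwxrwx---");
    // hue is neither the owner nor in the hadoop group, so the permission
    // check falls through to "other", which carries no EXECUTE bit.
    boolean canTraverse = perm.getOtherAction().implies(FsAction.EXECUTE);
    System.out.println(canTraverse); // false -> Permission denied: access=EXECUTE
  }
}
{code}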
[jira] [Updated] (YARN-10345) HsWebServices containerlogs does not honor ACLs for completed jobs
[ https://issues.apache.org/jira/browse/YARN-10345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prabhu Joseph updated YARN-10345:
---------------------------------
    Attachment: Screen Shot 2020-07-08 at 12.54.21 PM.png

> HsWebServices containerlogs does not honor ACLs for completed jobs
> ------------------------------------------------------------------
>
>                 Key: YARN-10345
>                 URL: https://issues.apache.org/jira/browse/YARN-10345
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>    Affects Versions: 3.2.0, 3.4.0
>            Reporter: Prabhu Joseph
>            Assignee: Prabhu Joseph
>            Priority: Critical
>         Attachments: Screen Shot 2020-07-08 at 12.54.21 PM.png
>
> HsWebServices containerlogs does not honor ACLs. A user who does not have permission to view a job can still view the job's logs from YARN UI2 through HsWebServices.
> *Repro:*
> Secure cluster + yarn.admin.acl=yarn,mapred + Root Queue ACLs set to " " + HistoryServer runs as mapred
> 1. Run a sample MR job as the systest user.
> 2. Once the job is complete, access the job logs as the hue user from YARN UI2.
> The YARN CLI works fine.
> {code}
> [hue@pjoseph-cm-2 /]$
> [hue@pjoseph-cm-2 /]$ yarn logs -applicationId application_1594188841761_0002
> WARNING: YARN_OPTS has been replaced by HADOOP_OPTS. Using value of YARN_OPTS.
> 20/07/08 07:23:08 INFO client.RMProxy: Connecting to ResourceManager at rmhostname:8032
> Permission denied: user=hue, access=EXECUTE, inode="/tmp/logs/systest":systest:hadoop:drwxrwx---
>     at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:496)
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8047) RMWebApp make external class pluggable
[ https://issues.apache.org/jira/browse/YARN-8047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17153315#comment-17153315 ]

Hudson commented on YARN-8047:
------------------------------

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #18418 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/18418/])
YARN-8047. RMWebApp make external class pluggable. (pjoseph: rev 3a4d05b850449c51a13f3a15fe0d756fdf50b4b2)
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebApp.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RmController.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml

> RMWebApp make external class pluggable
> ---------------------------------------
>
>                 Key: YARN-8047
>                 URL: https://issues.apache.org/jira/browse/YARN-8047
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Bibin Chundatt
>            Assignee: Bilwa S T
>            Priority: Minor
>             Fix For: 3.4.0
>
>         Attachments: YARN-8047-001.patch, YARN-8047-002.patch, YARN-8047-003.patch, YARN-8047.004.patch, YARN-8047.005.patch, YARN-8047.006.patch
>
>
> This Jira should make it possible to plug in the scheduler's web services and web pages in the ResourceManager:
> * RMWebApp allows binding external classes
> * RMController allows plugging in scheduler classes

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
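For readers following the change, the edited files suggest the usual Hadoop idiom for pluggable classes: a key defined in YarnConfiguration (documented in yarn-default.xml) resolved via Configuration.getClass and instantiated with ReflectionUtils. A minimal sketch of that pattern follows; the PLUGGABLE_CONTROLLER_KEY property name is hypothetical, not the actual key the patch adds.

{code:java}
// Minimal sketch of the Configuration.getClass + ReflectionUtils.newInstance
// idiom, not the committed code. PLUGGABLE_CONTROLLER_KEY is a hypothetical
// key name; the real one lives in YarnConfiguration / yarn-default.xml.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.ReflectionUtils;

public class PluggableWebAppLoader {

  // Hypothetical key name, for illustration only.
  static final String PLUGGABLE_CONTROLLER_KEY =
      "yarn.resourcemanager.webapp.custom-controller.class";

  // Loads the class named in the configuration, falling back to defaultClass
  // when the key is unset. ReflectionUtils.newInstance also injects the
  // Configuration when the loaded class implements Configurable.
  static <T> T loadController(Configuration conf,
      Class<? extends T> defaultClass, Class<T> xface) {
    Class<? extends T> clazz =
        conf.getClass(PLUGGABLE_CONTROLLER_KEY, defaultClass, xface);
    return ReflectionUtils.newInstance(clazz, conf);
  }
}
{code}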
[jira] [Commented] (YARN-8047) RMWebApp make external class pluggable
[ https://issues.apache.org/jira/browse/YARN-8047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17153309#comment-17153309 ]

Prabhu Joseph commented on YARN-8047:
-------------------------------------

Thanks [~BilwaST] for the patch. I have committed the latest patch [^YARN-8047.006.patch] to trunk. Could you file a separate Jira to handle the test case?

> RMWebApp make external class pluggable
> ---------------------------------------
>
>                 Key: YARN-8047
>                 URL: https://issues.apache.org/jira/browse/YARN-8047
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Bibin Chundatt
>            Assignee: Bilwa S T
>            Priority: Minor
>         Attachments: YARN-8047-001.patch, YARN-8047-002.patch, YARN-8047-003.patch, YARN-8047.004.patch, YARN-8047.005.patch, YARN-8047.006.patch
>
>
> This Jira should make it possible to plug in the scheduler's web services and web pages in the ResourceManager:
> * RMWebApp allows binding external classes
> * RMController allows plugging in scheduler classes

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org