[jira] [Commented] (YARN-10329) Flaky test cases in Fair Scheduler

2020-07-09 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17154945#comment-17154945
 ] 

Jim Brennan commented on YARN-10329:


I included a possible fix for the TestFairScheduler failure in [YARN-10297].  
It is the same failure as the one that was being addressed in that ticket.
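
For reference, the kind of test-side cleanup involved is sketched below. This is only an illustration, under the assumption that the fix clears the statically registered queue metrics between test methods; the actual change is the one attached to YARN-10297, and the class/method names here are placeholders.
{code:java}
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics;
import org.junit.After;

public class MetricsCleanupSketch {
  @After
  public void cleanUpMetrics() {
    // QueueMetrics (including PartitionQueueMetrics) are registered in static
    // state, so a source left behind by one test can trigger
    // "Metrics source PartitionQueueMetrics,partition= already exists!" in the
    // next one. Clearing the cache and shutting down the metrics system between
    // tests avoids the collision.
    QueueMetrics.clearQueueMetrics();
    DefaultMetricsSystem.shutdown();
  }
}
{code}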


> Flaky test cases in Fair Scheduler
> --
>
> Key: YARN-10329
> URL: https://issues.apache.org/jira/browse/YARN-10329
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Hudáky Márton Gyula
>Priority: Minor
>
> The following 2 test cases are failing on unrelated patches very often:
> hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler
> hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerPreemption
> Here is an example of both failures
> {code:java}
> [ERROR] Tests run: 105, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 
> 27.481 s <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler
> [ERROR] 
> testNormalizationUsingQueueMaximumAllocation(org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler)
>   Time elapsed: 0.178 s  <<< ERROR!
> org.apache.hadoop.metrics2.MetricsException: Metrics source 
> PartitionQueueMetrics,partition= already exists!
>   at 
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:152)
>   at 
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:125)
>   at 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:229)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics.getPartitionMetrics(QueueMetrics.java:360)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics.incrPendingResources(QueueMetrics.java:599)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.updatePendingResources(AppSchedulingInfo.java:399)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.internalAddResourceRequests(AppSchedulingInfo.java:331)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.internalAddResourceRequests(AppSchedulingInfo.java:358)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.updateResourceRequests(AppSchedulingInfo.java:194)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.updateResourceRequests(SchedulerApplicationAttempt.java:462)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.allocate(FairScheduler.java:931)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler.allocateAppAttempt(TestFairScheduler.java:435)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler.testNormalizationUsingQueueMaximumAllocation(TestFairScheduler.java:409)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)

[jira] [Commented] (YARN-4575) ApplicationResourceUsageReport should return ALL reserved resource

2020-07-09 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17154944#comment-17154944
 ] 

Hadoop QA commented on YARN-4575:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m 12s{color} 
| {color:red} YARN-4575 does not apply to trunk. Rebase required? Wrong Branch? 
See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-4575 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12782086/0002-YARN-4575.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/26266/console |
| versions | git=2.17.1 |
| Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |


This message was automatically generated.



> ApplicationResourceUsageReport should return ALL  reserved resource
> ---
>
> Key: YARN-4575
> URL: https://issues.apache.org/jira/browse/YARN-4575
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin Chundatt
>Priority: Major
>  Labels: oct16-easy
> Attachments: 0001-YARN-4575.patch, 0002-YARN-4575.patch
>
>
> ApplicationResourceUsageReport's reserved resource report covers only the 
> default partition; it should cover all partitions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10297) TestContinuousScheduling#testFairSchedulerContinuousSchedulingInitTime fails intermittently

2020-07-09 Thread Eric Payne (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17154942#comment-17154942
 ] 

Eric Payne commented on YARN-10297:
---

bq. Would there be any objection to me including the TestFairScheduler fix in 
this patch and noting that in YARN-10329? Or would it be better to file a new 
Jira?
[~Jim_Brennan]
Generally I like to keep issues in separate JIRAs, but since these are related, 
the fixes are in the unit tests, and they are small, I think it's fine to 
combine them in this case.

> TestContinuousScheduling#testFairSchedulerContinuousSchedulingInitTime fails 
> intermittently
> ---
>
> Key: YARN-10297
> URL: https://issues.apache.org/jira/browse/YARN-10297
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jonathan Hung
>Assignee: Jim Brennan
>Priority: Major
> Attachments: YARN-10297.001.patch, YARN-10297.002.patch, 
> YARN-10297.003.patch
>
>
> After YARN-6492, testFairSchedulerContinuousSchedulingInitTime fails 
> intermittently when running {{mvn test -Dtest=TestContinuousScheduling}}
> {noformat}[INFO] Running 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestContinuousScheduling
> [ERROR] Tests run: 7, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 6.682 
> s <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestContinuousScheduling
> [ERROR] 
> testFairSchedulerContinuousSchedulingInitTime(org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestContinuousScheduling)
>   Time elapsed: 0.194 s  <<< ERROR!
> org.apache.hadoop.metrics2.MetricsException: Metrics source 
> PartitionQueueMetrics,partition= already exists!
>   at 
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:152)
>   at 
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:125)
>   at 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:229)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics.getPartitionMetrics(QueueMetrics.java:362)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics.incrPendingResources(QueueMetrics.java:601)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.updatePendingResources(AppSchedulingInfo.java:388)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.internalAddResourceRequests(AppSchedulingInfo.java:320)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.internalAddResourceRequests(AppSchedulingInfo.java:347)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.updateResourceRequests(AppSchedulingInfo.java:183)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.updateResourceRequests(SchedulerApplicationAttempt.java:456)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.allocate(FairScheduler.java:898)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestContinuousScheduling.testFairSchedulerContinuousSchedulingInitTime(TestContinuousScheduling.java:375)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4575) ApplicationResourceUsageReport should return ALL reserved resource

2020-07-09 Thread Eric Payne (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17154938#comment-17154938
 ] 

Eric Payne commented on YARN-4575:
--

[~bibinchundatt], [~sunilg], [~Naganarasimha], [~rohithsharma],

I fear that this has gone stale, but I believe it should be upmerged and 
committed. Do you have any objections if I work on that?

> ApplicationResourceUsageReport should return ALL  reserved resource
> ---
>
> Key: YARN-4575
> URL: https://issues.apache.org/jira/browse/YARN-4575
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin Chundatt
>Priority: Major
>  Labels: oct16-easy
> Attachments: 0001-YARN-4575.patch, 0002-YARN-4575.patch
>
>
> ApplicationResourceUsageReport's reserved resource report covers only the 
> default partition; it should cover all partitions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10297) TestContinuousScheduling#testFairSchedulerContinuousSchedulingInitTime fails intermittently

2020-07-09 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17154892#comment-17154892
 ] 

Hadoop QA commented on YARN-10297:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
27s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
52s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
17m  9s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
31s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  1m 
50s{color} | {color:blue} Used deprecated FindBugs config; considering 
switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
48s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 33s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
45s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 92m  5s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
35s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}160m  8s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerPreemption |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | ClientAPI=1.40 ServerAPI=1.40 base: 
https://builds.apache.org/job/PreCommit-YARN-Build/26265/artifact/out/Dockerfile
 |
| JIRA Issue | YARN-10297 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/13007390/YARN-10297.003.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite 
unit shadedclient findbugs checkstyle |
| uname | Linux c111f1ad112e 4.15.0-101-generic #102-Ubuntu SMP Mon May 11 
10:07:26 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | trunk / 5dd270e2085 |
| Default Java | Private Build-1.8.0_252-8u252-b09-1~18.04-b09 |
| whitespace | 

[jira] [Commented] (YARN-10268) TestFederationInterceptor#testRecoverWithoutAMRMProxyHA fails intermittently.

2020-07-09 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17154762#comment-17154762
 ] 

Hadoop QA commented on YARN-10268:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
18s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m 33s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
34s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  1m 
39s{color} | {color:blue} Used deprecated FindBugs config; considering 
switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
37s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 51s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
23s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 22m 33s{color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
32s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 87m 50s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.nodemanager.amrmproxy.TestFederationInterceptor |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | ClientAPI=1.40 ServerAPI=1.40 base: 
https://builds.apache.org/job/PreCommit-YARN-Build/26264/artifact/out/Dockerfile
 |
| JIRA Issue | YARN-10268 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/13007387/YARN-10268.001.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite 
unit shadedclient findbugs checkstyle |
| uname | Linux f5f5212b82bd 4.15.0-101-generic #102-Ubuntu SMP Mon May 11 
10:07:26 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | trunk / 5dd270e2085 |
| Default Java | Private Build-1.8.0_252-8u252-b09-1~18.04-b09 |
| unit | 

[jira] [Commented] (YARN-8631) YARN RM fails to add the application to the delegation token renewer on recovery

2020-07-09 Thread Andras Gyori (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17154737#comment-17154737
 ] 

Andras Gyori commented on YARN-8631:


[~umittal] I was trying to reproduce the issue; however, the attached patch is 
already outdated. Could you give a more up-to-date example, please?

> YARN RM fails to add the application to the delegation token renewer on 
> recovery
> 
>
> Key: YARN-8631
> URL: https://issues.apache.org/jira/browse/YARN-8631
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.1.0
>Reporter: Sanjay Divgi
>Assignee: Umesh Mittal
>Priority: Blocker
> Attachments: YARN-8631.001.patch, 
> hadoop-yarn-resourcemanager-ctr-e138-1518143905142-429059-01-04.log
>
>
> On HA cluster we have observed that yarn resource manager fails to add the 
> application to the delegation token renewer on recovery.
> Below is the error:
> {code:java}
> 2018-08-07 08:41:23,850 INFO security.DelegationTokenRenewer 
> (DelegationTokenRenewer.java:renewToken(635)) - Renewed delegation-token= 
> [Kind: TIMELINE_DELEGATION_TOKEN, Service: 172.27.84.192:8188, Ident: 
> (TIMELINE_DELEGATION_TOKEN owner=hrt_qa_hive_spark, renewer=yarn, realUser=, 
> issueDate=1533624642302, maxDate=1534229442302, sequenceNumber=18, 
> masterKeyId=4);exp=1533717683478; apps=[application_1533623972681_0001]]
> 2018-08-07 08:41:23,855 WARN security.DelegationTokenRenewer 
> (DelegationTokenRenewer.java:handleDTRenewerAppRecoverEvent(955)) - Unable to 
> add the application to the delegation token renewer on recovery.
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:522)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleDTRenewerAppRecoverEvent(DelegationTokenRenewer.java:953)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$700(DelegationTokenRenewer.java:79)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:912)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10297) TestContinuousScheduling#testFairSchedulerContinuousSchedulingInitTime fails intermittently

2020-07-09 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17154732#comment-17154732
 ] 

Jim Brennan commented on YARN-10297:


I put up patch 003 which includes the fix for TestFairScheduler.  If we decide 
to do that separately, we can go with patch 002.


> TestContinuousScheduling#testFairSchedulerContinuousSchedulingInitTime fails 
> intermittently
> ---
>
> Key: YARN-10297
> URL: https://issues.apache.org/jira/browse/YARN-10297
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jonathan Hung
>Assignee: Jim Brennan
>Priority: Major
> Attachments: YARN-10297.001.patch, YARN-10297.002.patch, 
> YARN-10297.003.patch
>
>
> After YARN-6492, testFairSchedulerContinuousSchedulingInitTime fails 
> intermittently when running {{mvn test -Dtest=TestContinuousScheduling}}
> {noformat}[INFO] Running 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestContinuousScheduling
> [ERROR] Tests run: 7, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 6.682 
> s <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestContinuousScheduling
> [ERROR] 
> testFairSchedulerContinuousSchedulingInitTime(org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestContinuousScheduling)
>   Time elapsed: 0.194 s  <<< ERROR!
> org.apache.hadoop.metrics2.MetricsException: Metrics source 
> PartitionQueueMetrics,partition= already exists!
>   at 
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:152)
>   at 
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:125)
>   at 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:229)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics.getPartitionMetrics(QueueMetrics.java:362)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics.incrPendingResources(QueueMetrics.java:601)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.updatePendingResources(AppSchedulingInfo.java:388)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.internalAddResourceRequests(AppSchedulingInfo.java:320)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.internalAddResourceRequests(AppSchedulingInfo.java:347)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.updateResourceRequests(AppSchedulingInfo.java:183)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.updateResourceRequests(SchedulerApplicationAttempt.java:456)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.allocate(FairScheduler.java:898)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestContinuousScheduling.testFairSchedulerContinuousSchedulingInitTime(TestContinuousScheduling.java:375)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10297) TestContinuousScheduling#testFairSchedulerContinuousSchedulingInitTime fails intermittently

2020-07-09 Thread Jim Brennan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Brennan updated YARN-10297:
---
Attachment: YARN-10297.003.patch

> TestContinuousScheduling#testFairSchedulerContinuousSchedulingInitTime fails 
> intermittently
> ---
>
> Key: YARN-10297
> URL: https://issues.apache.org/jira/browse/YARN-10297
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jonathan Hung
>Assignee: Jim Brennan
>Priority: Major
> Attachments: YARN-10297.001.patch, YARN-10297.002.patch, 
> YARN-10297.003.patch
>
>
> After YARN-6492, testFairSchedulerContinuousSchedulingInitTime fails 
> intermittently when running {{mvn test -Dtest=TestContinuousScheduling}}
> {noformat}[INFO] Running 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestContinuousScheduling
> [ERROR] Tests run: 7, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 6.682 
> s <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestContinuousScheduling
> [ERROR] 
> testFairSchedulerContinuousSchedulingInitTime(org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestContinuousScheduling)
>   Time elapsed: 0.194 s  <<< ERROR!
> org.apache.hadoop.metrics2.MetricsException: Metrics source 
> PartitionQueueMetrics,partition= already exists!
>   at 
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:152)
>   at 
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:125)
>   at 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:229)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics.getPartitionMetrics(QueueMetrics.java:362)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics.incrPendingResources(QueueMetrics.java:601)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.updatePendingResources(AppSchedulingInfo.java:388)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.internalAddResourceRequests(AppSchedulingInfo.java:320)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.internalAddResourceRequests(AppSchedulingInfo.java:347)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.updateResourceRequests(AppSchedulingInfo.java:183)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.updateResourceRequests(SchedulerApplicationAttempt.java:456)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.allocate(FairScheduler.java:898)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestContinuousScheduling.testFairSchedulerContinuousSchedulingInitTime(TestContinuousScheduling.java:375)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10342) [UI1] Provide a way to hide Tools section in Web UIv1

2020-07-09 Thread Andras Gyori (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andras Gyori updated YARN-10342:

Attachment: Screenshot 2020-07-09 at 14.13.19.png

> [UI1] Provide a way to hide Tools section in Web UIv1
> -
>
> Key: YARN-10342
> URL: https://issues.apache.org/jira/browse/YARN-10342
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Andras Gyori
>Assignee: Andras Gyori
>Priority: Minor
> Attachments: Screenshot 2020-07-09 at 14.13.19.png, 
> YARN-10342.001.patch
>
>
> The Tools section in web UI1 might contain sensitive information, which 
> should ideally be hidden from end users. We should provide a configurable 
> value to hide it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10297) TestContinuousScheduling#testFairSchedulerContinuousSchedulingInitTime fails intermittently

2020-07-09 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17154720#comment-17154720
 ] 

Jim Brennan commented on YARN-10297:


As reported in [YARN-10329], TestFairScheduler is failing in the same way.  
That Jira actually covers two different unit tests, TestFairScheduler and 
TestFairSchedulerPreemption, and the failures do not look related to each other.

I have tested the fix from this patch in TestFairScheduler, and it seems to fix 
it.  Would there be any objection to me including the TestFairScheduler fix in 
this patch and noting that in [YARN-10329]?
Or would it be better to file a new Jira?

cc: [~epayne], [~jhung], [~maniraj...@gmail.com]

> TestContinuousScheduling#testFairSchedulerContinuousSchedulingInitTime fails 
> intermittently
> ---
>
> Key: YARN-10297
> URL: https://issues.apache.org/jira/browse/YARN-10297
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jonathan Hung
>Assignee: Jim Brennan
>Priority: Major
> Attachments: YARN-10297.001.patch, YARN-10297.002.patch
>
>
> After YARN-6492, testFairSchedulerContinuousSchedulingInitTime fails 
> intermittently when running {{mvn test -Dtest=TestContinuousScheduling}}
> {noformat}[INFO] Running 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestContinuousScheduling
> [ERROR] Tests run: 7, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 6.682 
> s <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestContinuousScheduling
> [ERROR] 
> testFairSchedulerContinuousSchedulingInitTime(org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestContinuousScheduling)
>   Time elapsed: 0.194 s  <<< ERROR!
> org.apache.hadoop.metrics2.MetricsException: Metrics source 
> PartitionQueueMetrics,partition= already exists!
>   at 
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:152)
>   at 
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:125)
>   at 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:229)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics.getPartitionMetrics(QueueMetrics.java:362)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.QueueMetrics.incrPendingResources(QueueMetrics.java:601)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.updatePendingResources(AppSchedulingInfo.java:388)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.internalAddResourceRequests(AppSchedulingInfo.java:320)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.internalAddResourceRequests(AppSchedulingInfo.java:347)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.updateResourceRequests(AppSchedulingInfo.java:183)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt.updateResourceRequests(SchedulerApplicationAttempt.java:456)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.allocate(FairScheduler.java:898)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestContinuousScheduling.testFairSchedulerContinuousSchedulingInitTime(TestContinuousScheduling.java:375)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10268) TestFederationInterceptor#testRecoverWithoutAMRMProxyHA fails intermittently.

2020-07-09 Thread Bilwa S T (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T updated YARN-10268:
-
Attachment: YARN-10268.001.patch

> TestFederationInterceptor#testRecoverWithoutAMRMProxyHA fails intermittently.
> -
>
> Key: YARN-10268
> URL: https://issues.apache.org/jira/browse/YARN-10268
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bilwa S T
>Assignee: Bilwa S T
>Priority: Major
> Attachments: YARN-10268.001.patch
>
>
> [ERROR] 
> testRecoverWithoutAMRMProxyHA(org.apache.hadoop.yarn.server.nodemanager.amrmproxy.TestFederationInterceptor)
>   Time elapsed: 0.029 s  <<< FAILURE!
> java.lang.AssertionError: expected:<1> but was:<0>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:834)
>   at org.junit.Assert.assertEquals(Assert.java:645)
>   at org.junit.Assert.assertEquals(Assert.java:631)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.amrmproxy.TestFederationInterceptor$3.run(TestFederationInterceptor.java:580)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1845)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.amrmproxy.TestFederationInterceptor.testRecover(TestFederationInterceptor.java:520)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.amrmproxy.TestFederationInterceptor.testRecoverWithoutAMRMProxyHA(TestFederationInterceptor.java:513)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-9371) Better logging for initialization of cgroups in CGroupsHandlerImpl

2020-07-09 Thread Bilwa S T (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T reassigned YARN-9371:
---

Assignee: (was: Bilwa S T)

> Better logging for initialization of cgroups in CGroupsHandlerImpl
> --
>
> Key: YARN-9371
> URL: https://issues.apache.org/jira/browse/YARN-9371
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Priority: Major
>
> We already have YARN-9098 (not yet merged, though) that dealt with code 
> duplication in the cgroups setup code.
> However, there's still room for improvement in the area of cgroups 
> initialization, especially in CGroupsHandlerImpl.
> 1. 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsHandlerImpl#initializeControllerPaths:
>  There's a decision in this method: Either cgroups are parsed from the 
> configured mount path or from the mounted cgroups descriptor file 
> (/proc/mounts). This decision needs to be logged, along with some details, 
> e.g. the cgroups mount path.
> 2. In ResourceHandlerModule#parseConfiguredCGroupPath: Every candidate file 
> could be logged at debug level. Moreover, the returned pathSubsystemMappings 
> could also be logged at info level.
> 3. In CGroupsHandlerImpl#parseMtab: This method iterates over lines of the 
> mtab file, tries to match the line and parses data of cgroups. 
> We need better logging for: 
> - If the line does not match: Print the line (I think debug level is fine for 
> that)
> - We could log the entire result map to info log.
> 4. In CGroupsHandlerImpl#initializeControllerPaths, 
> initializeControllerPathsFromMtab is called. This method returns mapping of 
> cgroups controllers (cpu, memory, devices, etc.) to file system paths. This 
> is an important piece of information for troubleshooting, so the map needs to 
> be logged to info.
> 5. In CGroupsHandlerImpl#initializeCGroupController: There's a condition for 
> enableCGroupMount; I think it's worth logging whether we are dealing with a 
> YARN-driven cgroups mount or with pre-mounted controllers.
> 6. In CGroupsHandlerImpl#mountCGroupController: 
> - existingMountPath needs to be logged for better troubleshooting
> - Other debug logs could also be added, this is up to the owner of this jira.
> 7. In CGroupsHandlerImpl#initializePreMountedCGroupController: 
> - controllerPath needs to be logged for better troubleshooting.
> - Other debug logs could also be added, this is up to the owner of this jira.
>  
> A side-note: Implementing this Jira could be way easier after YARN-9098 is 
> merged as that Jira eliminated lots of code duplication and made the 
> initialization of cgroups more clear and concise.
>  
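
As a rough illustration of item 1 above, a log line of the following shape would capture the decision. The names and structure here are assumptions made for the sketch, not the actual CGroupsHandlerImpl code.
{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Illustrative only: log which source the cgroup controller paths are parsed from.
class CGroupsInitLoggingSketch {
  private static final Logger LOG =
      LoggerFactory.getLogger(CGroupsInitLoggingSketch.class);

  void logControllerPathSource(boolean useConfiguredMountPath, String mountPath) {
    if (useConfiguredMountPath) {
      LOG.info("Initializing cgroup controller paths from configured mount path {}",
          mountPath);
    } else {
      LOG.info("Initializing cgroup controller paths from /proc/mounts");
    }
  }
}
{code}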



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10348) Allow RM to always cancel tokens after app completes

2020-07-09 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17154647#comment-17154647
 ] 

Jim Brennan commented on YARN-10348:


The TestFairSchedulerPreemption unit test failure is unrelated to this Jira.
I believe this is ready for review.


> Allow RM to always cancel tokens after app completes
> 
>
> Key: YARN-10348
> URL: https://issues.apache.org/jira/browse/YARN-10348
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.10.0, 3.1.3
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Major
> Attachments: YARN-10348.001.patch, YARN-10348.002.patch
>
>
> (Note: this change was originally done on our internal branch by [~daryn]).
> The RM currently has an option for a client to specify disabling token 
> cancellation when a job completes. This feature was an initial attempt to 
> address the use case of a job launching sub-jobs (ie. oozie launcher) and the 
> original job finishing prior to the sub-job(s) completion - ex. original job 
> completion triggered premature cancellation of tokens needed by the sub-jobs.
> Many years ago, [~daryn] added a more robust implementation to ref count 
> tokens ([YARN-3055]). This prevented premature cancellation of the token 
> until all apps using the token complete, and invalidated the need for a 
> client to specify cancel=false. Unfortunately the config option was not 
> removed.
> We have seen cases where oozie "java actions" and some users were explicitly 
> disabling token cancellation. This can lead to a buildup of defunct tokens 
> that may overwhelm the ZK buffer used by the KDC's backing store, at which 
> point the KMS fails to connect to ZK and is unable to issue/validate new 
> tokens, rendering the KDC only able to authenticate pre-existing tokens. 
> Production incidents have occurred due to the buffer size issue.
> To avoid these issues, the RM should have the option to ignore/override the 
> client's request to not cancel tokens.
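
A conceptual sketch of the proposed override is shown below. The property name and helper are assumptions made purely for illustration; the actual option is whatever the attached patches introduce.
{code:java}
import org.apache.hadoop.conf.Configuration;

// Hypothetical: an RM-side switch that forces token cancellation at app
// completion even when the submitting client asked for cancelTokens=false.
// The property name below is assumed, not taken from the patch.
class TokenCancelOverrideSketch {
  static boolean shouldCancelAtAppCompletion(Configuration conf,
      boolean clientRequestedCancel) {
    boolean alwaysCancel = conf.getBoolean(
        "yarn.resourcemanager.delegation-token.always-cancel", false);
    return alwaysCancel || clientRequestedCancel;
  }
}
{code}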



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10229) [Federation] Client should be able to submit application to RM directly using normal client conf

2020-07-09 Thread Bilwa S T (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17154620#comment-17154620
 ] 

Bilwa S T commented on YARN-10229:
--

[~elgoiri] Please check this when you have free time. Thanks

> [Federation] Client should be able to submit application to RM directly using 
> normal client conf
> 
>
> Key: YARN-10229
> URL: https://issues.apache.org/jira/browse/YARN-10229
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: amrmproxy, federation
>Affects Versions: 3.1.1
>Reporter: JohnsonGuo
>Assignee: Bilwa S T
>Priority: Major
> Attachments: YARN-10229.001.patch, YARN-10229.002.patch, 
> YARN-10229.003.patch, YARN-10229.004.patch, YARN-10229.005.patch, 
> YARN-10229.006.patch, YARN-10229.007.patch, YARN-10229.008.patch
>
>
> Scenario: When the YARN federation feature is enabled with multiple YARN 
> clusters, one can submit jobs to the yarn-router by *modifying* the client 
> configuration to point at the yarn-router address.
> But if one still wants to submit jobs directly to the RM via the original 
> client (as used before enabling federation), the submission hits an AMRMToken 
> exception. That means that once federation is enabled, anyone who wants to 
> submit a job has to modify the client configuration.
>  
> One possible solution for this scenario is, in the NodeManager, when the 
> client ApplicationMaster request comes in:
>  * get the client's job.xml from HDFS "".
>  * parse the "yarn.resourcemanager.scheduler.address" parameter in job.xml
>  * if the value of the parameter is "localhost:8049" (the AMRM proxy address), 
> then perform the AMRMToken validation
>  * if the value of the parameter is "rm:port" (the RM address), then skip the 
> AMRMToken validation
>  
>  
>  
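
A rough sketch of the check proposed in the description above follows; all names are hypothetical, and this is not the attached patch.
{code:java}
import org.apache.hadoop.conf.Configuration;

// Hypothetical illustration of the proposed NodeManager-side decision: run the
// AMRMToken validation only when the job's configured scheduler address points
// at the local AMRMProxy, and skip it when the client targeted the RM directly.
class FederationTokenCheckSketch {
  static boolean needsAmRmTokenValidation(Configuration jobConf,
      String amRmProxyAddress) {
    String schedulerAddr = jobConf.get("yarn.resourcemanager.scheduler.address");
    return amRmProxyAddress.equals(schedulerAddr);
  }
}
{code}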



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10106) Yarn logs CLI filtering by application attempt

2020-07-09 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17154528#comment-17154528
 ] 

Hadoop QA commented on YARN-10106:
--

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
19s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m 53s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
24s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  0m 
53s{color} | {color:blue} Used deprecated FindBugs config; considering 
switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
52s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 18s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client: The patch generated 18 new 
+ 126 unchanged - 0 fixed = 144 total (was 126) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 16s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
51s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 26m 
39s{color} | {color:green} hadoop-yarn-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
28s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 88m  4s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | ClientAPI=1.40 ServerAPI=1.40 base: 
https://builds.apache.org/job/PreCommit-YARN-Build/26263/artifact/out/Dockerfile
 |
| JIRA Issue | YARN-10106 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/13007367/YARN-10106.006.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite 
unit shadedclient findbugs checkstyle |
| uname | Linux 03a43611bd7d 4.15.0-101-generic #102-Ubuntu SMP Mon May 11 
10:07:26 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | trunk / 5dd270e2085 |
| Default Java | Private Build-1.8.0_252-8u252-b09-1~18.04-b09 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/26263/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-client.txt
 |
|  Test Results 

[jira] [Created] (YARN-10349) YARN_AM_RM_TOKEN, Localizer token does not have a service name

2020-07-09 Thread Prabhu Joseph (Jira)
Prabhu Joseph created YARN-10349:


 Summary: YARN_AM_RM_TOKEN, Localizer token does not have a service 
name
 Key: YARN-10349
 URL: https://issues.apache.org/jira/browse/YARN-10349
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Prabhu Joseph
Assignee: Prabhu Joseph


UGI Credentials#addToken silently overrides a token that has the same service 
name (HADOOP-17121). This causes tokens such as the YARN_AM_RM_TOKEN and the 
Localizer token, which have an empty service name, to get overridden. It is 
safer to give these tokens a service name.
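
A minimal sketch of the collision described above (illustrative only; the empty Token instances stand in for the real YARN_AM_RM_TOKEN and localizer tokens):
{code:java}
import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.security.token.Token;
import org.apache.hadoop.security.token.TokenIdentifier;

public class EmptyServiceTokenSketch {
  public static void main(String[] args) {
    // Both tokens have an empty service name, so they map to the same key and
    // the second addToken silently replaces the first.
    Token<TokenIdentifier> amRmToken = new Token<TokenIdentifier>();
    Token<TokenIdentifier> localizerToken = new Token<TokenIdentifier>();
    Credentials creds = new Credentials();
    creds.addToken(amRmToken.getService(), amRmToken);
    creds.addToken(localizerToken.getService(), localizerToken); // overwrites the first
    System.out.println(creds.numberOfTokens()); // prints 1, not 2
  }
}
{code}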



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-10342) [UI1] Provide a way to hide Tools section in Web UIv1

2020-07-09 Thread Andras Gyori (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17154502#comment-17154502
 ] 

Andras Gyori edited comment on YARN-10342 at 7/9/20, 12:36 PM:
---

A patch has been uploaded, which puts a constraint on the tools section in 
applicationhistory, nodemanager, resourcemanager and mapreduce v2 historyserver 
webapp. It is vital to emphasize, however, that the links are still accessible 
in case the user knows the address (like /conf or /logs). The name of the 
configuration value is:
{noformat}
yarn.webapp.ui1.tools.enable{noformat}


was (Author: gandras):
A patch has been uploaded, which puts a constraint on the tools section in 
applicationhistory, nodemanager, resourcemanager and mapreduce v2 historyserver 
webapp. It is vital to emphasize, however, that the links are still accessible 
in case the user knows the address (like /conf or /logs).
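
For illustration, a minimal way to exercise the property named above might look like the following; this assumes it is read as a boolean, and the default value is not stated in this comment.
{code:java}
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Sketch only: hide the Tools section in the v1 web UIs by disabling the
// property introduced by this patch.
public class HideToolsSectionSketch {
  public static void main(String[] args) {
    YarnConfiguration conf = new YarnConfiguration();
    conf.setBoolean("yarn.webapp.ui1.tools.enable", false);
    System.out.println(conf.getBoolean("yarn.webapp.ui1.tools.enable", true));
  }
}
{code}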

> [UI1] Provide a way to hide Tools section in Web UIv1
> -
>
> Key: YARN-10342
> URL: https://issues.apache.org/jira/browse/YARN-10342
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Andras Gyori
>Assignee: Andras Gyori
>Priority: Minor
> Attachments: YARN-10342.001.patch
>
>
> The Tools section in web UI1 might contain sensitive information, which 
> should ideally be hidden from end users. We should provide a configurable 
> value to hide it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10342) [UI1] Provide a way to hide Tools section in Web UIv1

2020-07-09 Thread Andras Gyori (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154502#comment-17154502
 ] 

Andras Gyori commented on YARN-10342:
-

A patch has been uploaded, which puts a constraint on the tools section in 
applicationhistory, nodemanager, resourcemanager and mapreduce v2 historyserver 
webapp. It is vital to emphasize, however, that the links are still accessible 
in case the user knows the address (like /conf or /logs).

> [UI1] Provide a way to hide Tools section in Web UIv1
> -
>
> Key: YARN-10342
> URL: https://issues.apache.org/jira/browse/YARN-10342
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Andras Gyori
>Assignee: Andras Gyori
>Priority: Minor
> Attachments: YARN-10342.001.patch
>
>
> The Tools section in web UI1 might contain sensitive information, which 
> should ideally be hidden from end users. We should provide a configurable 
> value to hide it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10342) [UI1] Provide a way to hide Tools section in Web UIv1

2020-07-09 Thread Andras Gyori (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andras Gyori updated YARN-10342:

Attachment: YARN-10342.001.patch

> [UI1] Provide a way to hide Tools section in Web UIv1
> -
>
> Key: YARN-10342
> URL: https://issues.apache.org/jira/browse/YARN-10342
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Andras Gyori
>Assignee: Andras Gyori
>Priority: Minor
> Attachments: YARN-10342.001.patch
>
>
> The Tools section in web UI1 might contain sensitive information, which 
> should ideally be hidden from end users. We should provide a configurable 
> value to hide it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10106) Yarn logs CLI filtering by application attempt

2020-07-09 Thread Jira


[ 
https://issues.apache.org/jira/browse/YARN-10106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154450#comment-17154450
 ] 

Hudáky Márton Gyula commented on YARN-10106:


Thank you for the review [~bteke]!
I fixed the mentioned items and uploaded the new patch.

> Yarn logs CLI filtering by application attempt
> --
>
> Key: YARN-10106
> URL: https://issues.apache.org/jira/browse/YARN-10106
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Adam Antal
>Assignee: Hudáky Márton Gyula
>Priority: Trivial
> Attachments: YARN-10106.001.patch, YARN-10106.002.patch, 
> YARN-10106.003.patch, YARN-10106.004.patch, YARN-10106.005.patch, 
> YARN-10106.006.patch
>
>
> {{ContainerLogsRequest}} got a new parameter in YARN-10101, which is the 
> {{applicationAttempt}} - we can use this new parameter in Yarn logs CLI as 
> well to filter by application attempt.
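
As an illustration of the kind of filtering this enables, here is a small sketch that keeps only the container IDs belonging to a given application attempt. It uses only the public ContainerId/ApplicationAttemptId accessors and is not the code from the patch; the helper name is hypothetical.
{code:java}
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.yarn.api.records.ContainerId;

public class AttemptLogFilterSketch {
  // Keep only containers whose attempt number matches the requested one.
  public static List<ContainerId> filterByAttempt(List<ContainerId> containers,
      int attemptNumber) {
    List<ContainerId> matched = new ArrayList<>();
    for (ContainerId id : containers) {
      if (id.getApplicationAttemptId().getAttemptId() == attemptNumber) {
        matched.add(id);
      }
    }
    return matched;
  }
}
{code}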



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10106) Yarn logs CLI filtering by application attempt

2020-07-09 Thread Jira


 [ 
https://issues.apache.org/jira/browse/YARN-10106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hudáky Márton Gyula updated YARN-10106:
---
Attachment: YARN-10106.006.patch

> Yarn logs CLI filtering by application attempt
> --
>
> Key: YARN-10106
> URL: https://issues.apache.org/jira/browse/YARN-10106
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Adam Antal
>Assignee: Hudáky Márton Gyula
>Priority: Trivial
> Attachments: YARN-10106.001.patch, YARN-10106.002.patch, 
> YARN-10106.003.patch, YARN-10106.004.patch, YARN-10106.005.patch, 
> YARN-10106.006.patch
>
>
> {{ContainerLogsRequest}} got a new parameter in YARN-10101, which is the 
> {{applicationAttempt}} - we can use this new parameter in Yarn logs CLI as 
> well to filter by application attempt.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10347) Fix double locking in CapacityScheduler#reinitialize in branch-3.1

2020-07-09 Thread Masatake Iwasaki (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154321#comment-17154321
 ] 

Masatake Iwasaki commented on YARN-10347:
-

cherry-picked to branch-3.2.

> Fix double locking in CapacityScheduler#reinitialize in branch-3.1
> --
>
> Key: YARN-10347
> URL: https://issues.apache.org/jira/browse/YARN-10347
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 3.1.4
>Reporter: Masatake Iwasaki
>Assignee: Masatake Iwasaki
>Priority: Critical
> Attachments: YARN-10347-branch-3.1.001.patch
>
>
> Double locking blocks other ResourceManager threads waiting for the lock.
> I found the issue while testing hadoop-3.1.4-RC2 with an RM-HA enabled deployment. 
> The ResourceManager blocks on {{submitApplication}} waiting for the lock when I 
> run example MR applications.
> {noformat}
> "IPC Server handler 45 on default port 8032" #211 daemon prio=5 os_prio=0 
> tid=0x7f0e45a40200 nid=0x418 waiting on condition [0x7f0e14abe000]
>java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x85d56510> (a 
> java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
> at 
> java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.checkAndGetApplicationPriority(CapacityScheduler.java:2521)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:417)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.submitApplication(RMAppManager.java:342)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitApplication(ClientRMService.java:678)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitApplication(ApplicationClientProtocolPBServiceImpl.java:277)
> at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:563)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:527)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1036)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1015)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:943)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2943)
> {noformat}
>  
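
A standalone illustration (not the CapacityScheduler code) of why an unbalanced double write-lock acquisition starves readers: ReentrantReadWriteLock is reentrant, so the double lock itself succeeds, but unless every lock() is matched by an unlock() the write lock stays held and all read-lock acquisitions park, which is what the stack trace above shows.
{code:java}
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class DoubleLockSketch {
  private static final ReentrantReadWriteLock LOCK = new ReentrantReadWriteLock();

  public static void main(String[] args) throws InterruptedException {
    // Simulates a reinitialize-style method that takes the write lock twice
    // (e.g. directly and again via a nested call) but unlocks only once.
    LOCK.writeLock().lock();
    LOCK.writeLock().lock();   // reentrant acquisition succeeds, hold count = 2
    LOCK.writeLock().unlock(); // hold count back to 1, the lock is still held

    Thread reader = new Thread(() -> {
      LOCK.readLock().lock();  // parks forever, like the IPC handler in the trace
      try {
        System.out.println("reader acquired lock");
      } finally {
        LOCK.readLock().unlock();
      }
    });
    reader.start();

    reader.join(2000);
    System.out.println("reader still blocked: " + reader.isAlive()); // prints true
  }
}
{code}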



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10333) YarnClient obtain Delegation Token for Log Aggregation Path

2020-07-09 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154290#comment-17154290
 ] 

Hudson commented on YARN-10333:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #18422 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/18422/])
YARN-10333. YarnClient obtain Delegation Token for Log Aggregation Path. 
(sunilg: rev 5dd270e2085c8e8c3428287ed6f0c541a5548a31)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClientImpl.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/api/impl/YarnClientImpl.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/pom.xml


> YarnClient obtain Delegation Token for Log Aggregation Path
> ---
>
> Key: YARN-10333
> URL: https://issues.apache.org/jira/browse/YARN-10333
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: log-aggregation
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: YARN-10333-001.patch, YARN-10333-002.patch, 
> YARN-10333-003.patch
>
>
> There are use cases where the YARN log aggregation path is configured on a 
> FileSystem such as S3 or ABFS, different from what is configured in fs.defaultFS 
> (HDFS). Log aggregation fails because the client has a token only for fs.defaultFS 
> and not for the log aggregation path.
> This Jira improves YarnClient by obtaining a delegation token for the log 
> aggregation path and adding it to the credentials of the container launch context, 
> similar to how it does for the Timeline delegation token.
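
A rough sketch of the idea, assuming the remote log directory is read from yarn.nodemanager.remote-app-log-dir and that FileSystem#addDelegationTokens is used to fetch the token for that filesystem; the committed patch wires this inside YarnClientImpl and may differ in detail.
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.security.token.Token;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class LogAggregationTokenSketch {
  // Obtains a delegation token for the log aggregation filesystem and adds it
  // to the credentials that will back the container launch context.
  public static void addLogAggregationToken(Configuration conf, Credentials credentials,
      String renewer) throws Exception {
    // The log aggregation root dir may point at S3/ABFS rather than fs.defaultFS.
    String remoteLogDir = conf.get(YarnConfiguration.NM_REMOTE_APP_LOG_DIR);
    if (remoteLogDir == null) {
      return; // nothing configured, keep the default behaviour
    }
    FileSystem logFs = new Path(remoteLogDir).getFileSystem(conf);
    Token<?>[] tokens = logFs.addDelegationTokens(renewer, credentials);
    if (tokens != null) {
      for (Token<?> token : tokens) {
        System.out.println("Obtained token for log aggregation path: " + token.getService());
      }
    }
  }
}
{code}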



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10341) Yarn Service Container Completed event doesn't get processed

2020-07-09 Thread Bilwa S T (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154282#comment-17154282
 ] 

Bilwa S T commented on YARN-10341:
--

Thanks [~brahmareddy]

> Yarn Service Container Completed event doesn't get processed 
> -
>
> Key: YARN-10341
> URL: https://issues.apache.org/jira/browse/YARN-10341
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bilwa S T
>Assignee: Bilwa S T
>Priority: Critical
> Fix For: 3.4.0, 3.3.1
>
> Attachments: YARN-10341.001.patch, YARN-10341.002.patch, 
> YARN-10341.003.patch, YARN-10341.004.patch
>
>
> If there are 10 workers running and containers get killed, after a while we 
> see that only 9 workers are running. This is because the CONTAINER 
> COMPLETED event is not processed on the AM side. 
> The issue is in the code below:
> {code:java}
> public void onContainersCompleted(List<ContainerStatus> statuses) {
>   for (ContainerStatus status : statuses) {
> ContainerId containerId = status.getContainerId();
> ComponentInstance instance = 
> liveInstances.get(status.getContainerId());
> if (instance == null) {
>   LOG.warn(
>   "Container {} Completed. No component instance exists. 
> exitStatus={}. diagnostics={} ",
>   containerId, status.getExitStatus(), status.getDiagnostics());
>   return;
> }
> ComponentEvent event =
> new ComponentEvent(instance.getCompName(), CONTAINER_COMPLETED)
> .setStatus(status).setInstance(instance)
> .setContainerId(containerId);
> dispatcher.getEventHandler().handle(event);
>   }
> {code}
> If no component instance exists for a container, the loop does not iterate over 
> the remaining containers because the method returns.
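
A sketch of how the loop might be corrected, continuing with the remaining statuses instead of returning when a container has no live component instance. This mirrors the quoted method and is illustrative only; see the attached patches for the actual fix.
{code:java}
// Sketch: same method as above, with the early return replaced by continue so
// the rest of the completed-container batch is still processed.
public void onContainersCompleted(List<ContainerStatus> statuses) {
  for (ContainerStatus status : statuses) {
    ContainerId containerId = status.getContainerId();
    ComponentInstance instance = liveInstances.get(containerId);
    if (instance == null) {
      LOG.warn("Container {} Completed. No component instance exists. exitStatus={}. diagnostics={} ",
          containerId, status.getExitStatus(), status.getDiagnostics());
      continue; // was: return; which dropped the rest of the batch
    }
    ComponentEvent event =
        new ComponentEvent(instance.getCompName(), CONTAINER_COMPLETED)
            .setStatus(status).setInstance(instance)
            .setContainerId(containerId);
    dispatcher.getEventHandler().handle(event);
  }
}
{code}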



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10341) Yarn Service Container Completed event doesn't get processed

2020-07-09 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154281#comment-17154281
 ] 

Hudson commented on YARN-10341:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #18421 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/18421/])
YARN-10341. Yarn Service Container Completed event doesn't get (brahma: rev 
dfe60392c91be21f574c1659af22f5c381b2675a)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/test/java/org/apache/hadoop/yarn/service/TestServiceAM.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core/src/main/java/org/apache/hadoop/yarn/service/ServiceScheduler.java


> Yarn Service Container Completed event doesn't get processed 
> -
>
> Key: YARN-10341
> URL: https://issues.apache.org/jira/browse/YARN-10341
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bilwa S T
>Assignee: Bilwa S T
>Priority: Critical
> Fix For: 3.4.0, 3.3.1
>
> Attachments: YARN-10341.001.patch, YARN-10341.002.patch, 
> YARN-10341.003.patch, YARN-10341.004.patch
>
>
> If there are 10 workers running and containers get killed, after a while we 
> see that only 9 workers are running. This is because the CONTAINER 
> COMPLETED event is not processed on the AM side. 
> The issue is in the code below:
> {code:java}
> public void onContainersCompleted(List<ContainerStatus> statuses) {
>   for (ContainerStatus status : statuses) {
> ContainerId containerId = status.getContainerId();
> ComponentInstance instance = 
> liveInstances.get(status.getContainerId());
> if (instance == null) {
>   LOG.warn(
>   "Container {} Completed. No component instance exists. 
> exitStatus={}. diagnostics={} ",
>   containerId, status.getExitStatus(), status.getDiagnostics());
>   return;
> }
> ComponentEvent event =
> new ComponentEvent(instance.getCompName(), CONTAINER_COMPLETED)
> .setStatus(status).setInstance(instance)
> .setContainerId(containerId);
> dispatcher.getEventHandler().handle(event);
>   }
> {code}
> If no component instance exists for a container, the loop does not iterate over 
> the remaining containers because the method returns.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10341) Yarn Service Container Completed event doesn't get processed

2020-07-09 Thread Brahma Reddy Battula (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154250#comment-17154250
 ] 

Brahma Reddy Battula commented on YARN-10341:
-

[~BilwaST] thanks, Bilwa, for addressing the checkstyle issue; going to commit shortly.

 

> Yarn Service Container Completed event doesn't get processed 
> -
>
> Key: YARN-10341
> URL: https://issues.apache.org/jira/browse/YARN-10341
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bilwa S T
>Assignee: Bilwa S T
>Priority: Critical
> Attachments: YARN-10341.001.patch, YARN-10341.002.patch, 
> YARN-10341.003.patch, YARN-10341.004.patch
>
>
> If there are 10 workers running and containers get killed, after a while we 
> see that only 9 workers are running. This is because the CONTAINER 
> COMPLETED event is not processed on the AM side. 
> The issue is in the code below:
> {code:java}
> public void onContainersCompleted(List<ContainerStatus> statuses) {
>   for (ContainerStatus status : statuses) {
> ContainerId containerId = status.getContainerId();
> ComponentInstance instance = 
> liveInstances.get(status.getContainerId());
> if (instance == null) {
>   LOG.warn(
>   "Container {} Completed. No component instance exists. 
> exitStatus={}. diagnostics={} ",
>   containerId, status.getExitStatus(), status.getDiagnostics());
>   return;
> }
> ComponentEvent event =
> new ComponentEvent(instance.getCompName(), CONTAINER_COMPLETED)
> .setStatus(status).setInstance(instance)
> .setContainerId(containerId);
> dispatcher.getEventHandler().handle(event);
>   }
> {code}
> If no component instance exists for a container, the loop does not iterate over 
> the remaining containers because the method returns.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10341) Yarn Service Container Completed event doesn't get processed

2020-07-09 Thread Bilwa S T (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154242#comment-17154242
 ] 

Bilwa S T commented on YARN-10341:
--

Fixed Checkstyle issues

> Yarn Service Container Completed event doesn't get processed 
> -
>
> Key: YARN-10341
> URL: https://issues.apache.org/jira/browse/YARN-10341
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bilwa S T
>Assignee: Bilwa S T
>Priority: Critical
> Attachments: YARN-10341.001.patch, YARN-10341.002.patch, 
> YARN-10341.003.patch, YARN-10341.004.patch
>
>
> If there are 10 workers running and containers get killed, after a while we 
> see that only 9 workers are running. This is because the CONTAINER 
> COMPLETED event is not processed on the AM side. 
> The issue is in the code below:
> {code:java}
> public void onContainersCompleted(List<ContainerStatus> statuses) {
>   for (ContainerStatus status : statuses) {
> ContainerId containerId = status.getContainerId();
> ComponentInstance instance = 
> liveInstances.get(status.getContainerId());
> if (instance == null) {
>   LOG.warn(
>   "Container {} Completed. No component instance exists. 
> exitStatus={}. diagnostics={} ",
>   containerId, status.getExitStatus(), status.getDiagnostics());
>   return;
> }
> ComponentEvent event =
> new ComponentEvent(instance.getCompName(), CONTAINER_COMPLETED)
> .setStatus(status).setInstance(instance)
> .setContainerId(containerId);
> dispatcher.getEventHandler().handle(event);
>   }
> {code}
> If no component instance exists for a container, the loop does not iterate over 
> the remaining containers because the method returns.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10341) Yarn Service Container Completed event doesn't get processed

2020-07-09 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154235#comment-17154235
 ] 

Hadoop QA commented on YARN-10341:
--

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
19s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 
 3s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m 23s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
19s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  0m 
55s{color} | {color:blue} Used deprecated FindBugs config; considering 
switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
53s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 44s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
57s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 19m 
27s{color} | {color:green} hadoop-yarn-services-core in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
28s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 81m  8s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | ClientAPI=1.40 ServerAPI=1.40 base: 
https://builds.apache.org/job/PreCommit-YARN-Build/26262/artifact/out/Dockerfile
 |
| JIRA Issue | YARN-10341 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/13007346/YARN-10341.004.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite 
unit shadedclient findbugs checkstyle |
| uname | Linux 760c568266eb 4.15.0-101-generic #102-Ubuntu SMP Mon May 11 
10:07:26 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | trunk / 10d218934c9 |
| Default Java | Private Build-1.8.0_252-8u252-b09-1~18.04-b09 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/26262/testReport/ |
| Max. process+thread count | 777 (vs. ulimit of 5500) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-core
 U: 

[jira] [Commented] (YARN-10324) Fetch data from NodeManager may case read timeout when disk is busy

2020-07-09 Thread Yao Guangdong (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17154217#comment-17154217
 ] 

Yao Guangdong commented on YARN-10324:
--

 Fixed a bug where the cached memory size was calculated incorrectly. Added new patch YARN-10324.002.patch.

> Fetch data from NodeManager may case read timeout when disk is busy
> ---
>
> Key: YARN-10324
> URL: https://issues.apache.org/jira/browse/YARN-10324
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: auxservices
>Affects Versions: 2.7.0, 3.2.1
>Reporter: Yao Guangdong
>Priority: Minor
>  Labels: patch
> Attachments: YARN-10324.001.patch, YARN-10324.002.patch
>
>
>  As the cluster size grows, the time a reducer spends fetching map output 
> from the NodeManager keeps increasing. We often see WARN logs like the 
> following in the reducer's logs.
> {quote}2020-06-19 15:43:15,522 WARN [fetcher#8] 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher: Failed to connect to 
> TX-196-168-211.com:13562 with 5 map outputs
> java.net.SocketTimeoutException: Read timed out
> at java.net.SocketInputStream.socketRead0(Native Method)
> at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
> at java.net.SocketInputStream.read(SocketInputStream.java:171)
> at java.net.SocketInputStream.read(SocketInputStream.java:141)
> at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
> at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
> at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:735)
> at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:678)
> at 
> sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1587)
> at 
> sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1492)
> at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
> at 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher.verifyConnection(Fetcher.java:434)
> at 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher.setupConnectionsWithRetry(Fetcher.java:400)
> at 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher.openShuffleUrl(Fetcher.java:271)
> at 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:330)
> at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:198)
> {quote}
>  Checking the NodeManager hosts, we found that disk IO utilization and connection 
> counts became very high when the read timeouts happened. With, say, 20,000 maps 
> and 1,000 reduces, the NodeManager performs roughly 20 million IO stream operations 
> during the shuffle phase. When the amount of data a reducer fetches from each map 
> output file is very small, this drives disk IO utilization very high on a big 
> cluster, read timeouts happen frequently, and application completion time grows.
> ShuffleHandler already has an IndexCache for the file.out.index files. We want to 
> turn many small IOs into fewer large IOs to reduce the number of small disk reads, 
> so we try to cache all the data of small map output files (file.out) in memory when 
> the first fetch request arrives. Subsequent fetch requests then read from memory 
> and avoid disk IO. After caching the data in memory, the read timeouts disappeared.
>  
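
A minimal, self-contained sketch of the caching idea described above, using a size-bounded LRU map keyed by map output path. This is illustrative only and is not the ShuffleHandler or IndexCache code; the class name, size limit, and entry cap are assumptions.
{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

public class SmallMapOutputCache {
  // Assumption: only cache map output files below this size, and cap total entries.
  private static final long MAX_CACHEABLE_BYTES = 1024 * 1024; // 1 MB
  private static final int MAX_ENTRIES = 1000;

  // LRU map: the first fetch pays the disk read, later fetches are served from memory.
  private final Map<String, byte[]> cache =
      new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
        @Override
        protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
          return size() > MAX_ENTRIES;
        }
      };

  public synchronized byte[] readMapOutput(String fileOutPath) throws IOException {
    byte[] data = cache.get(fileOutPath);
    if (data != null) {
      return data; // served from memory, no disk IO
    }
    // One large sequential read instead of many small reads per fetch request.
    data = Files.readAllBytes(Paths.get(fileOutPath));
    if (data.length <= MAX_CACHEABLE_BYTES) {
      cache.put(fileOutPath, data);
    }
    return data;
  }
}
{code}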



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10324) Fetch data from NodeManager may case read timeout when disk is busy

2020-07-09 Thread Yao Guangdong (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yao Guangdong updated YARN-10324:
-
Attachment: YARN-10324.002.patch

> Fetch data from NodeManager may case read timeout when disk is busy
> ---
>
> Key: YARN-10324
> URL: https://issues.apache.org/jira/browse/YARN-10324
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: auxservices
>Affects Versions: 2.7.0, 3.2.1
>Reporter: Yao Guangdong
>Priority: Minor
>  Labels: patch
> Attachments: YARN-10324.001.patch, YARN-10324.002.patch
>
>
>  As the cluster size grows, the time a reducer spends fetching map output 
> from the NodeManager keeps increasing. We often see WARN logs like the 
> following in the reducer's logs.
> {quote}2020-06-19 15:43:15,522 WARN [fetcher#8] 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher: Failed to connect to 
> TX-196-168-211.com:13562 with 5 map outputs
> java.net.SocketTimeoutException: Read timed out
> at java.net.SocketInputStream.socketRead0(Native Method)
> at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
> at java.net.SocketInputStream.read(SocketInputStream.java:171)
> at java.net.SocketInputStream.read(SocketInputStream.java:141)
> at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
> at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
> at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:735)
> at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:678)
> at 
> sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1587)
> at 
> sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1492)
> at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
> at 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher.verifyConnection(Fetcher.java:434)
> at 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher.setupConnectionsWithRetry(Fetcher.java:400)
> at 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher.openShuffleUrl(Fetcher.java:271)
> at 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:330)
> at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:198)
> {quote}
>  Checking the NodeManager hosts, we found that disk IO utilization and connection 
> counts became very high when the read timeouts happened. With, say, 20,000 maps 
> and 1,000 reduces, the NodeManager performs roughly 20 million IO stream operations 
> during the shuffle phase. When the amount of data a reducer fetches from each map 
> output file is very small, this drives disk IO utilization very high on a big 
> cluster, read timeouts happen frequently, and application completion time grows.
> ShuffleHandler already has an IndexCache for the file.out.index files. We want to 
> turn many small IOs into fewer large IOs to reduce the number of small disk reads, 
> so we try to cache all the data of small map output files (file.out) in memory when 
> the first fetch request arrives. Subsequent fetch requests then read from memory 
> and avoid disk IO. After caching the data in memory, the read timeouts disappeared.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org