[jira] [Commented] (YARN-9640) Slow event processing could cause too many attempt unregister events

2019-06-27 Thread Zhankun Tang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16874723#comment-16874723
 ] 

Zhankun Tang commented on YARN-9640:


[~bibinchundatt], yeah, agreed.

> Slow event processing could cause too many attempt unregister events
> 
>
> Key: YARN-9640
> URL: https://issues.apache.org/jira/browse/YARN-9640
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
>  Labels: scalability
> Attachments: YARN-9640.001.patch, YARN-9640.002.patch, 
> YARN-9640.003.patch
>
>
> During verification on one of our test clusters, we found that the number of 
> attempt unregister events was about 300k+.
>  # All of the AM's containers completed.
>  # AMRMClientImpl sends finishApplicationMaster.
>  # AMRMClientImpl polls the finish status every 100 ms using a 
> finishApplicationMaster request.
>  # AMRMClientImpl#unregisterApplicationMaster:
> {code:java}
>   while (true) {
> FinishApplicationMasterResponse response =
> rmClient.finishApplicationMaster(request);
> if (response.getIsUnregistered()) {
>   break;
> }
> LOG.info("Waiting for application to be successfully unregistered.");
> Thread.sleep(100);
>   }
> {code}
>  # The ApplicationMasterService finishApplicationMaster interface sends an 
> unregister event on every status update.
> We should send the unregister event only once and cache the fact that it was 
> sent; subsequent calls should be ignored and a not-yet-unregistered response 
> returned to the AM, instead of overloading the event queue.
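
A minimal sketch of that de-duplication idea, using hypothetical names (this is
illustrative only, not taken from the attached patches):

{code:java}
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative only: remember which attempts already had an unregister event
// dispatched, so repeated finishApplicationMaster polls do not enqueue new events.
class UnregisterEventDeduper {
  private final Set<String> unregisterEventSent = ConcurrentHashMap.newKeySet();

  // Returns true only for the first call per attempt; later calls should just
  // reply with isUnregistered=false instead of dispatching another event.
  boolean shouldDispatchUnregisterEvent(String appAttemptId) {
    return unregisterEventSent.add(appAttemptId);
  }
}
{code}

With something along these lines, the AM's 100 ms polling loop still gets a
not-yet-unregistered response each time, but the RM dispatcher only ever sees
one unregister event per attempt.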






[jira] [Commented] (YARN-9480) createAppDir() in LogAggregationService shouldn't block dispatcher thread of ContainerManagerImpl

2019-06-27 Thread liyakun (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16874719#comment-16874719
 ] 

liyakun commented on YARN-9480:
---

[~tangzhankun] please help add [~Yunyao Zhang] as a contributor; he will 
contribute to this issue.

> createAppDir() in LogAggregationService shouldn't block dispatcher thread of 
> ContainerManagerImpl
> -
>
> Key: YARN-9480
> URL: https://issues.apache.org/jira/browse/YARN-9480
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: liyakun
>Assignee: liyakun
>Priority: Major
>
> At present, in startContainers(), if the NM does not yet know about the 
> application, it enters the INIT_APPLICATION step. In the application init step, 
> createAppDir() is executed, and it is a blocking operation.
> createAppDir() is an operation that needs to interact with an external file 
> system, so it is affected by the SLA of that file system. 
> Once the external file system has high latency, the NM dispatcher thread of 
> ContainerManagerImpl will be stuck. (In fact, I have seen the NM stuck here for 
> more than an hour.)
> I think it would be more reasonable to move createAppDir() to the actual time 
> of uploading logs (in other threads). And according to the logRetentionPolicy, 
> many of the containers may never get to this step, which would save a lot of 
> interactions with the external file system.
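
A rough sketch of taking that call off the dispatcher thread, with made-up
names (this is not the actual LogAggregationService code):

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustrative only: hand the remote-filesystem work to a small pool so the
// ContainerManagerImpl dispatcher thread never waits on a slow external file system.
class AppDirCreatorSketch {
  private final ExecutorService createAppDirPool = Executors.newFixedThreadPool(4);

  void scheduleCreateAppDir(String appId) {
    createAppDirPool.submit(() -> {
      // The real createAppDir(appId) call would go here; high latency now only
      // delays log aggregation for this app, not application init for every app.
      System.out.println("creating remote log dir for " + appId);
    });
  }
}
{code}

Deferring the call all the way to upload time, as proposed, goes one step
further: directories would only be created for containers that actually pass
the logRetentionPolicy.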






[jira] [Commented] (YARN-9480) createAppDir() in LogAggregationService shouldn't block dispatcher thread of ContainerManagerImpl

2019-06-27 Thread Yunyao Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16874717#comment-16874717
 ] 

Yunyao Zhang commented on YARN-9480:


Please assign this to me.

> createAppDir() in LogAggregationService shouldn't block dispatcher thread of 
> ContainerManagerImpl
> -
>
> Key: YARN-9480
> URL: https://issues.apache.org/jira/browse/YARN-9480
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: liyakun
>Assignee: liyakun
>Priority: Major
>
> At present, in startContainers(), if the NM does not yet know about the 
> application, it enters the INIT_APPLICATION step. In the application init step, 
> createAppDir() is executed, and it is a blocking operation.
> createAppDir() is an operation that needs to interact with an external file 
> system, so it is affected by the SLA of that file system. 
> Once the external file system has high latency, the NM dispatcher thread of 
> ContainerManagerImpl will be stuck. (In fact, I have seen the NM stuck here for 
> more than an hour.)
> I think it would be more reasonable to move createAppDir() to the actual time 
> of uploading logs (in other threads). And according to the logRetentionPolicy, 
> many of the containers may never get to this step, which would save a lot of 
> interactions with the external file system.






[jira] [Commented] (YARN-9560) Restructure DockerLinuxContainerRuntime to extend a new OCIContainerRuntime

2019-06-27 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16874687#comment-16874687
 ] 

Eric Yang commented on YARN-9560:
-

[~ebadger], the formatOciEnvKey function is helpful in this case. Thank you.

Another question: OCIContainerRuntime imports the static method 
DockerLinuxContainerRuntime.isDockerContainerRequested. It looks a bit odd that 
the base class imports a static method of its subclass, and it seems like both 
classes will return the same answer. How does the JVM identify whether it should 
use RuncRuntime vs DockerLinuxContainerRuntime? Does OCIContainerRuntime need to 
import another static method from RuncRuntime to combine that logic in 
isOCICompliantContainerRequested? Can we push the isDockerContainerRequested 
logic up into OCIContainerRuntime so that the base class does not import a 
method from its subclass?
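
A hypothetical sketch of the kind of restructuring being asked about here (class
names and the selector environment variable value are assumptions for
illustration, not the patch itself):

{code:java}
import java.util.Map;

// Illustrative only: each concrete runtime answers "was I requested?" through an
// abstract method, so the abstract base never imports a static method from a subclass.
abstract class OciRuntimeSketch {
  abstract boolean isRuntimeRequested(Map<String, String> env);
}

class DockerRuntimeSketch extends OciRuntimeSketch {
  @Override
  boolean isRuntimeRequested(Map<String, String> env) {
    // "docker" as the selector value is an assumption for this sketch.
    return "docker".equalsIgnoreCase(env.get("YARN_CONTAINER_RUNTIME_TYPE"));
  }
}

class RuncRuntimeSketch extends OciRuntimeSketch {
  @Override
  boolean isRuntimeRequested(Map<String, String> env) {
    return "runc".equalsIgnoreCase(env.get("YARN_CONTAINER_RUNTIME_TYPE"));
  }
}
{code}

That would let the runtime-selection logic live with each runtime while the base
class stays agnostic.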

 

> Restructure DockerLinuxContainerRuntime to extend a new OCIContainerRuntime
> ---
>
> Key: YARN-9560
> URL: https://issues.apache.org/jira/browse/YARN-9560
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
>  Labels: Docker
> Attachments: YARN-9560.001.patch, YARN-9560.002.patch, 
> YARN-9560.003.patch, YARN-9560.004.patch, YARN-9560.005.patch, 
> YARN-9560.006.patch, YARN-9560.007.patch, YARN-9560.008.patch, 
> YARN-9560.009.patch, YARN-9560.010.patch, YARN-9560.011.patch, 
> YARN-9560.012.patch
>
>
> Since the new RuncContainerRuntime will be using a lot of the same code as 
> DockerLinuxContainerRuntime, it would be good to move a bunch of the 
> DockerLinuxContainerRuntime code up a level to an abstract class that both of 
> the runtimes can extend. 
> The new structure will look like:
> {noformat}
> OCIContainerRuntime (abstract class)
>   - DockerLinuxContainerRuntime
>   - RuncContainerRuntime
> {noformat}
> This JIRA should only change the structure of the code, not the actual 
> semantics






[jira] [Commented] (YARN-9560) Restructure DockerLinuxContainerRuntime to extend a new OCIContainerRuntime

2019-06-27 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16874670#comment-16874670
 ] 

Hadoop QA commented on YARN-9560:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
32s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
57s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
7s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 31s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
28s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
1s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 25s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:
 The patch generated 2 new + 22 unchanged - 2 fixed = 24 total (was 24) {color} 
|
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 14s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 20m 
49s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 75m 34s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=18.09.5 Server=18.09.5 Image:yetus/hadoop:bdbca0e53b4 |
| JIRA Issue | YARN-9560 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12973132/YARN-9560.012.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux bf8150d01bf7 4.15.0-48-generic #51-Ubuntu SMP Wed Apr 3 
08:28:49 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 4a21224 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_212 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/24332/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/24332/testReport/ |
| Max. process+thread count | 307 (vs. ulimit of 5500) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U:

[jira] [Commented] (YARN-9655) AllocateResponse in FederationInterceptor lost applicationPriority

2019-06-27 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16874634#comment-16874634
 ] 

Hadoop QA commented on YARN-9655:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
53s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
1s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
7s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 42s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
29s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  1m 
18s{color} | {color:blue} Used deprecated FindBugs config; considering 
switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
15s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 36s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
17s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 21m 
53s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
32s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 76m  4s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce base: 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1023/2/artifact/out/Dockerfile
 |
| GITHUB PR | https://github.com/apache/hadoop/pull/1023 |
| JIRA Issue | YARN-9655 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite 
unit shadedclient findbugs checkstyle |
| uname | Linux adc7662a227d 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 
17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | trunk / 4a21224 |
| Default Java | 1.8.0_212 |
|  Test Results | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1023/2/testReport/ |
| Max. process+thread count | 446 (vs. ulimit of 5500) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn

[jira] [Commented] (YARN-9560) Restructure DockerLinuxContainerRuntime to extend a new OCIContainerRuntime

2019-06-27 Thread Eric Badger (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16874630#comment-16874630
 ] 

Eric Badger commented on YARN-9560:
---

Patch 012 keeps the environment variables as static in OCIContainerRuntime and 
moves the formatting to the subclasses where the runtime type is known and 
static. The formatting of the strings is a static function in 
OCIContainerRuntime. That method is called from DockerLinuxContainerRuntime to 
get the actual keys of the environment variables. These results are passed back 
to OCIContainerRuntime via abstract methods to get the environment variables.

I don't know if this is what was envisioned or not, but it works and leaves the 
variables as static. I'm not too particular about exactly how this piece of 
code is done. I'd just like to get something we can all agree on so that we can 
move to the next phase of patches with the actual runtime.
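
To make the described flow concrete, here is a hypothetical reduction of the
pattern (the template string and method names are assumptions, not the actual
patch 012 code):

{code:java}
// Illustrative only: the base class owns the key template and a static formatter;
// each subclass knows its runtime name and returns the formatted key via an
// abstract getter that the base class can call.
abstract class OciEnvKeySketch {
  static final String IMAGE_KEY_TEMPLATE = "YARN_CONTAINER_RUNTIME_%s_IMAGE";

  static String formatEnvKey(String runtimeType, String template) {
    return String.format(template, runtimeType.toUpperCase(java.util.Locale.ROOT));
  }

  abstract String getImageEnvKey();
}

class DockerEnvKeySketch extends OciEnvKeySketch {
  // Resolves to YARN_CONTAINER_RUNTIME_DOCKER_IMAGE in this sketch.
  private static final String IMAGE_ENV_KEY = formatEnvKey("docker", IMAGE_KEY_TEMPLATE);

  @Override
  String getImageEnvKey() {
    return IMAGE_ENV_KEY;
  }
}
{code}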

> Restructure DockerLinuxContainerRuntime to extend a new OCIContainerRuntime
> ---
>
> Key: YARN-9560
> URL: https://issues.apache.org/jira/browse/YARN-9560
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
>  Labels: Docker
> Attachments: YARN-9560.001.patch, YARN-9560.002.patch, 
> YARN-9560.003.patch, YARN-9560.004.patch, YARN-9560.005.patch, 
> YARN-9560.006.patch, YARN-9560.007.patch, YARN-9560.008.patch, 
> YARN-9560.009.patch, YARN-9560.010.patch, YARN-9560.011.patch, 
> YARN-9560.012.patch
>
>
> Since the new RuncContainerRuntime will be using a lot of the same code as 
> DockerLinuxContainerRuntime, it would be good to move a bunch of the 
> DockerLinuxContainerRuntime code up a level to an abstract class that both of 
> the runtimes can extend. 
> The new structure will look like:
> {noformat}
> OCIContainerRuntime (abstract class)
>   - DockerLinuxContainerRuntime
>   - RuncContainerRuntime
> {noformat}
> This JIRA should only change the structure of the code, not the actual 
> semantics






[jira] [Updated] (YARN-9560) Restructure DockerLinuxContainerRuntime to extend a new OCIContainerRuntime

2019-06-27 Thread Eric Badger (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-9560:
--
Attachment: YARN-9560.012.patch

> Restructure DockerLinuxContainerRuntime to extend a new OCIContainerRuntime
> ---
>
> Key: YARN-9560
> URL: https://issues.apache.org/jira/browse/YARN-9560
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
>  Labels: Docker
> Attachments: YARN-9560.001.patch, YARN-9560.002.patch, 
> YARN-9560.003.patch, YARN-9560.004.patch, YARN-9560.005.patch, 
> YARN-9560.006.patch, YARN-9560.007.patch, YARN-9560.008.patch, 
> YARN-9560.009.patch, YARN-9560.010.patch, YARN-9560.011.patch, 
> YARN-9560.012.patch
>
>
> Since the new RuncContainerRuntime will be using a lot of the same code as 
> DockerLinuxContainerRuntime, it would be good to move a bunch of the 
> DockerLinuxContainerRuntime code up a level to an abstract class that both of 
> the runtimes can extend. 
> The new structure will look like:
> {noformat}
> OCIContainerRuntime (abstract class)
>   - DockerLinuxContainerRuntime
>   - RuncContainerRuntime
> {noformat}
> This JIRA should only change the structure of the code, not the actual 
> semantics






[jira] [Comment Edited] (YARN-9655) AllocateResponse in FederationInterceptor lost applicationPriority

2019-06-27 Thread hunshenshi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16874595#comment-16874595
 ] 

hunshenshi edited comment on YARN-9655 at 6/28/19 1:54 AM:
---

OK, thanks [~cheersyang], I will fix it.


was (Author: hunhun):
OK,I will fix it

> AllocateResponse in FederationInterceptor lost  applicationPriority
> ---
>
> Key: YARN-9655
> URL: https://issues.apache.org/jira/browse/YARN-9655
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation
>Affects Versions: 3.2.0
>Reporter: hunshenshi
>Priority: Major
>
> In YARN Federation mode using FederationInterceptor, when submitting an 
> application, the AM will report an error.
> {code:java}
> 2019-06-25 11:44:00,977 ERROR [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator: ERROR IN CONTACTING RM. 
> java.lang.NullPointerException at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.handleJobPriorityChange(RMContainerAllocator.java:1025)
>  at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.getResources(RMContainerAllocator.java:880)
>  at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.heartbeat(RMContainerAllocator.java:286)
>  at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator$AllocatorRunnable.run(RMCommunicator.java:280)
>  at java.lang.Thread.run(Thread.java:748)
> {code}
> The reason is that applicationPriority is lost.
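
For illustration, the kind of fix this points at might look roughly like the
following in the response-merging path (a sketch only, not the actual change in
the pull request):

{code:java}
import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;

// Illustrative only: when FederationInterceptor merges sub-cluster responses,
// carry the home cluster's application priority over to the merged response so
// the MapReduce AM's handleJobPriorityChange() never dereferences a null priority.
final class PriorityMergeSketch {
  static void mergeApplicationPriority(AllocateResponse homeResponse,
      AllocateResponse mergedResponse) {
    if (mergedResponse.getApplicationPriority() == null) {
      mergedResponse.setApplicationPriority(homeResponse.getApplicationPriority());
    }
  }
}
{code}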






[jira] [Commented] (YARN-9655) AllocateResponse in FederationInterceptor lost applicationPriority

2019-06-27 Thread hunshenshi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16874595#comment-16874595
 ] 

hunshenshi commented on YARN-9655:
--

OK, I will fix it.

> AllocateResponse in FederationInterceptor lost  applicationPriority
> ---
>
> Key: YARN-9655
> URL: https://issues.apache.org/jira/browse/YARN-9655
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation
>Affects Versions: 3.2.0
>Reporter: hunshenshi
>Priority: Major
>
> In YARN Federation mode using FederationInterceptor, when submitting an 
> application, the AM will report an error.
> {code:java}
> 2019-06-25 11:44:00,977 ERROR [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator: ERROR IN CONTACTING RM. 
> java.lang.NullPointerException at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.handleJobPriorityChange(RMContainerAllocator.java:1025)
>  at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.getResources(RMContainerAllocator.java:880)
>  at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.heartbeat(RMContainerAllocator.java:286)
>  at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator$AllocatorRunnable.run(RMCommunicator.java:280)
>  at java.lang.Thread.run(Thread.java:748)
> {code}
> The reason is that applicationPriority is lost.






[jira] [Commented] (YARN-6629) NPE occurred when container allocation proposal is applied but its resource requests are removed before

2019-06-27 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16874571#comment-16874571
 ] 

Weiwei Yang commented on YARN-6629:
---

Hi [~aihuaxu], that's correct, this will be included in 2.10. But if you need 
it in the next 2.9.x release, then we need to backport to branch-2.9.

> NPE occurred when container allocation proposal is applied but its resource 
> requests are removed before
> ---
>
> Key: YARN-6629
> URL: https://issues.apache.org/jira/browse/YARN-6629
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.0, 3.0.0-alpha2
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Critical
> Fix For: 3.1.0, 2.10.0
>
> Attachments: YARN-6629.001.patch, YARN-6629.002.patch, 
> YARN-6629.003.patch, YARN-6629.004.patch, YARN-6629.005.patch, 
> YARN-6629.006.patch, YARN-6629.branch-2.001.patch
>
>
> I wrote a test case to reproduce another problem for branch-2 and found a new 
> NPE error; log: 
> {code}
> FATAL event.EventDispatcher (EventDispatcher.java:run(75)) - Error in 
> handling event type NODE_UPDATE to the Event Dispatcher
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocate(AppSchedulingInfo.java:446)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.apply(FiCaSchedulerApp.java:516)
> at 
> org.apache.hadoop.yarn.client.TestNegativePendingResource$1.answer(TestNegativePendingResource.java:225)
> at 
> org.mockito.internal.stubbing.StubbedInvocationMatcher.answer(StubbedInvocationMatcher.java:31)
> at org.mockito.internal.MockHandler.handle(MockHandler.java:97)
> at 
> org.mockito.internal.creation.MethodInterceptorFilter.intercept(MethodInterceptorFilter.java:47)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp$$EnhancerByMockitoWithCGLIB$$29eb8afc.apply()
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.tryCommit(CapacityScheduler.java:2396)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.submitResourceCommitRequest(CapacityScheduler.java:2281)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateOrReserveNewContainers(CapacityScheduler.java:1247)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainerOnSingleNode(CapacityScheduler.java:1236)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1325)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1112)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:987)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1367)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:143)
> at 
> org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:66)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> Reproduce this error in chronological order:
> 1. AM started and requested 1 container with schedulerRequestKey#1 : 
> ApplicationMasterService#allocate -->  CapacityScheduler#allocate --> 
> SchedulerApplicationAttempt#updateResourceRequests --> 
> AppSchedulingInfo#updateResourceRequests 
> Added schedulerRequestKey#1 into schedulerKeyToPlacementSets
> 2. Scheduler allocated 1 container for this request and accepted the proposal
> 3. AM removed this request
> ApplicationMasterService#allocate -->  CapacityScheduler#allocate --> 
> SchedulerApplicationAttempt#updateResourceRequests --> 
> AppSchedulingInfo#updateResourceRequests --> 
> AppSchedulingInfo#addToPlacementSets --> 
> AppSchedulingInfo#updatePendingResources
> Removed schedulerRequestKey#1 from schedulerKeyToPlacementSets
> 4. Scheduler applied this proposal
> CapacityScheduler#tryCommit --> FiCaSchedulerApp#apply --> 
> AppSchedulingInfo#allocate 
> An NPE is thrown when calling 
> schedulerKeyToPlacementSets.get(schedulerRequestKey).allocate(schedulerKey, 
> type, node);
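
A simplified, hypothetical guard for that race (not the committed fix; the map
value type is left generic here):

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative only: the request can be removed between proposal acceptance and
// commit, so the commit path should look the placement set up once and bail out
// if it is already gone instead of chaining get(...).allocate(...).
class AllocateGuardSketch {
  private final Map<Integer, Object> schedulerKeyToPlacementSets = new ConcurrentHashMap<>();

  boolean allocateIfStillRequested(int schedulerRequestKey) {
    Object placementSet = schedulerKeyToPlacementSets.get(schedulerRequestKey);
    if (placementSet == null) {
      // The AM already cancelled this request; skip allocation instead of throwing NPE.
      return false;
    }
    // The real code would call placementSet.allocate(schedulerKey, type, node) here.
    return true;
  }
}
{code}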




[jira] [Updated] (YARN-9564) Create docker-to-squash tool for image conversion

2019-06-27 Thread Eric Badger (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-9564:
--
Description: The new runc runtime uses docker images that are converted 
into multiple squashfs images. Each layer of the docker image will get its own 
squashfs image. We need a tool to help automate the creation of these squashfs 
images when all we have is a docker image  (was: The new fsimage runtime uses 
docker images that are converted into multiple squashfs images. Each layer of 
the docker image will get its own squashfs image. We need a tool to help 
automate the creation of these squashfs images when all we have is a docker 
image)

> Create docker-to-squash tool for image conversion
> -
>
> Key: YARN-9564
> URL: https://issues.apache.org/jira/browse/YARN-9564
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
>
> The new runc runtime uses docker images that are converted into multiple 
> squashfs images. Each layer of the docker image will get its own squashfs 
> image. We need a tool to help automate the creation of these squashfs images 
> when all we have is a docker image






[jira] [Commented] (YARN-6046) Documentation correction in YarnApplicationSecurity

2019-06-27 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16874391#comment-16874391
 ] 

Hadoop QA commented on YARN-6046:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
40s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 
12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
35m 42s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 22s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
30s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 53m 27s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=18.09.5 Server=18.09.5 Image:yetus/hadoop:bdbca0e53b4 |
| JIRA Issue | YARN-6046 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12973107/yarn-6046.002.patch |
| Optional Tests |  dupname  asflicense  mvnsite  |
| uname | Linux 13915e9921df 4.15.0-48-generic #51-Ubuntu SMP Wed Apr 3 
08:28:49 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 4a21224 |
| maven | version: Apache Maven 3.3.9 |
| Max. process+thread count | 306 (vs. ulimit of 5500) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/24331/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Documentation correction in YarnApplicationSecurity
> ---
>
> Key: YARN-6046
> URL: https://issues.apache.org/jira/browse/YARN-6046
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Yousef Abu-Salah
>Priority: Trivial
>  Labels: newbie
> Attachments: YARN-6046.001.patch, yarn-6046.002.patch
>
>
> Few documentation correction required in 
> hadoop-yarn/hadoop-yarn-site/YarnApplicationSecurity.html
> {code}
> 1. Suring AM startup, log in to Kerberos.
> {code}
> {code}
> Don’t. Rely on the lifespan of the 
> {code}
> {code}
> renewed automatically; the AM pushes out 
> {code}
> {code}
> In an insecure cluster, the application will run as the identity of the 
> account of the node manager, typically something such as yarn or mapred. By 
> default, the application will access HDFS as that user, with a different home 
> directory, and with a different user identified in audit logs and on file 
> system owner attributes.
> {code}
> Need to reframe sentence.






[jira] [Commented] (YARN-9648) Improve serialization of Server side errors for YARN ApiServiceClient

2019-06-27 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16874369#comment-16874369
 ] 

Hadoop QA commented on YARN-9648:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
18s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 57s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
16s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 39s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
16s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
53s{color} | {color:green} hadoop-yarn-services-api in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
33s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 49m  1s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e |
| JIRA Issue | YARN-9648 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12973105/YARN-9648.002.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 15535ce79440 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 
17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / be80334 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_212 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/24330/testReport/ |
| Max. process+thread count | 561 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-api
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-api
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/24330/conso

[jira] [Updated] (YARN-6046) Documentation correction in YarnApplicationSecurity

2019-06-27 Thread Yousef Abu-Salah (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yousef Abu-Salah updated YARN-6046:
---
Attachment: yarn-6046.002.patch

> Documentation correction in YarnApplicationSecurity
> ---
>
> Key: YARN-6046
> URL: https://issues.apache.org/jira/browse/YARN-6046
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Yousef Abu-Salah
>Priority: Trivial
>  Labels: newbie
> Attachments: YARN-6046.001.patch, yarn-6046.002.patch
>
>
> Few documentation correction required in 
> hadoop-yarn/hadoop-yarn-site/YarnApplicationSecurity.html
> {code}
> 1. Suring AM startup, log in to Kerberos.
> {code}
> {code}
> Don’t. Rely on the lifespan of the 
> {code}
> {code}
> renewed automatically; the AM pushes out 
> {code}
> {code}
> In an insecure cluster, the application will run as the identity of the 
> account of the node manager, typically something such as yarn or mapred. By 
> default, the application will access HDFS as that user, with a different home 
> directory, and with a different user identified in audit logs and on file 
> system owner attributes.
> {code}
> Need to reframe sentence.






[jira] [Commented] (YARN-6629) NPE occurred when container allocation proposal is applied but its resource requests are removed before

2019-06-27 Thread Aihua Xu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16874316#comment-16874316
 ] 

Aihua Xu commented on YARN-6629:


[~cheersyang] for this particular issue, since it's already in 2.10, I think we 
don't need an additional backport, given that 2.10 will be the next release on 
branch-2. Is that correct?

> NPE occurred when container allocation proposal is applied but its resource 
> requests are removed before
> ---
>
> Key: YARN-6629
> URL: https://issues.apache.org/jira/browse/YARN-6629
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.0, 3.0.0-alpha2
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Critical
> Fix For: 3.1.0, 2.10.0
>
> Attachments: YARN-6629.001.patch, YARN-6629.002.patch, 
> YARN-6629.003.patch, YARN-6629.004.patch, YARN-6629.005.patch, 
> YARN-6629.006.patch, YARN-6629.branch-2.001.patch
>
>
> I wrote a test case to reproduce another problem for branch-2 and found a new 
> NPE error; log: 
> {code}
> FATAL event.EventDispatcher (EventDispatcher.java:run(75)) - Error in 
> handling event type NODE_UPDATE to the Event Dispatcher
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocate(AppSchedulingInfo.java:446)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.apply(FiCaSchedulerApp.java:516)
> at 
> org.apache.hadoop.yarn.client.TestNegativePendingResource$1.answer(TestNegativePendingResource.java:225)
> at 
> org.mockito.internal.stubbing.StubbedInvocationMatcher.answer(StubbedInvocationMatcher.java:31)
> at org.mockito.internal.MockHandler.handle(MockHandler.java:97)
> at 
> org.mockito.internal.creation.MethodInterceptorFilter.intercept(MethodInterceptorFilter.java:47)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp$$EnhancerByMockitoWithCGLIB$$29eb8afc.apply()
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.tryCommit(CapacityScheduler.java:2396)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.submitResourceCommitRequest(CapacityScheduler.java:2281)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateOrReserveNewContainers(CapacityScheduler.java:1247)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainerOnSingleNode(CapacityScheduler.java:1236)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1325)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1112)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:987)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1367)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:143)
> at 
> org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:66)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> Reproduce this error in chronological order:
> 1. AM started and requested 1 container with schedulerRequestKey#1 : 
> ApplicationMasterService#allocate -->  CapacityScheduler#allocate --> 
> SchedulerApplicationAttempt#updateResourceRequests --> 
> AppSchedulingInfo#updateResourceRequests 
> Added schedulerRequestKey#1 into schedulerKeyToPlacementSets
> 2. Scheduler allocated 1 container for this request and accepted the proposal
> 3. AM removed this request
> ApplicationMasterService#allocate -->  CapacityScheduler#allocate --> 
> SchedulerApplicationAttempt#updateResourceRequests --> 
> AppSchedulingInfo#updateResourceRequests --> 
> AppSchedulingInfo#addToPlacementSets --> 
> AppSchedulingInfo#updatePendingResources
> Removed schedulerRequestKey#1 from schedulerKeyToPlacementSets
> 4. Scheduler applied this proposal
> CapacityScheduler#tryCommit --> FiCaSchedulerApp#apply --> 
> AppSchedulingInfo#allocate 
> An NPE is thrown when calling 
> schedulerKeyToPlacementSets.get(schedulerRequestKey).allocate(schedulerKey, 
> type, node);




[jira] [Updated] (YARN-9648) Improve serialization of Server side errors for YARN ApiServiceClient

2019-06-27 Thread Eric Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated YARN-9648:

Attachment: YARN-9648.002.patch

> Improve serialization of Server side errors for YARN ApiServiceClient
> -
>
> Key: YARN-9648
> URL: https://issues.apache.org/jira/browse/YARN-9648
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.0, 3.2.0, 3.1.1, 3.1.2
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
> Attachments: YARN-9648.001.patch, YARN-9648.002.patch
>
>
> When the server side throws an exception, the output may have one of the following formats:
> # A ServiceStatus object in JSON form
> # A generic exception class serialized in JSON form
> # A plain text/html output
> The current client will attempt to serialize all three forms of responses, 
> but response.getEntity does not always work when reading the response more 
> than once.






[jira] [Commented] (YARN-6046) Documentation correction in YarnApplicationSecurity

2019-06-27 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16874297#comment-16874297
 ] 

Hadoop QA commented on YARN-6046:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
16s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
28m  7s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 12s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 40m 59s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:bdbca0e |
| JIRA Issue | YARN-6046 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12859811/YARN-6046.001.patch |
| Optional Tests |  dupname  asflicense  mvnsite  |
| uname | Linux 4cbec3156228 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 
10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / be80334 |
| maven | version: Apache Maven 3.3.9 |
| Max. process+thread count | 413 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/24329/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Documentation correction in YarnApplicationSecurity
> ---
>
> Key: YARN-6046
> URL: https://issues.apache.org/jira/browse/YARN-6046
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Yousef Abu-Salah
>Priority: Trivial
>  Labels: newbie
> Attachments: YARN-6046.001.patch
>
>
> Few documentation correction required in 
> hadoop-yarn/hadoop-yarn-site/YarnApplicationSecurity.html
> {code}
> 1. Suring AM startup, log in to Kerberos.
> {code}
> {code}
> Don’t. Rely on the lifespan of the 
> {code}
> {code}
> renewed automatically; the AM pushes out 
> {code}
> {code}
> In an insecure cluster, the application will run as the identity of the 
> account of the node manager, typically something such as yarn or mapred. By 
> default, the application will access HDFS as that user, with a different home 
> directory, and with a different user identified in audit logs and on file 
> system owner attributes.
> {code}
> Need to reframe sentence.






[jira] [Commented] (YARN-9560) Restructure DockerLinuxContainerRuntime to extend a new OCIContainerRuntime

2019-06-27 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16874288#comment-16874288
 ] 

Eric Yang commented on YARN-9560:
-

[~Jim_Brennan] From the OCIRuntime and DockerLinuxContainerRuntime perspective, 
they are static strings. The config key strings do not change when used in those 
classes. They only change when a third implementation, say RuncRuntime, is 
implemented and DockerLinuxContainerRuntime configuration keys are allowed to 
work in RuncRuntime. Instance variables only make sense for handling the config 
key overload in RuncRuntime, because there the config keys become variables 
based on additional cluster config rather than static strings. This makes it 
clearer to developers that they need to handle config key shading in RuncRuntime 
with care, while keeping the DockerLinuxContainerRuntime logic as close to how 
it was written as possible. This helps the community maintain the existing code 
base with the least risk of fragmenting configuration across multiple runtimes. 
Disruptive retrofitting of the Docker runtime requires a human reviewer to 
figure out whether code sharing is put in the right place. Without the ability 
to statically define Docker configuration keys in DockerLinuxContainerRuntime, 
there is a risk of Docker config keys popping up in RuncRuntime, or Runc config 
popping up in Docker. This goes back to my original concern that the config key 
fragmentation issue hasn't been addressed correctly in patch 11.

> Restructure DockerLinuxContainerRuntime to extend a new OCIContainerRuntime
> ---
>
> Key: YARN-9560
> URL: https://issues.apache.org/jira/browse/YARN-9560
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
>  Labels: Docker
> Attachments: YARN-9560.001.patch, YARN-9560.002.patch, 
> YARN-9560.003.patch, YARN-9560.004.patch, YARN-9560.005.patch, 
> YARN-9560.006.patch, YARN-9560.007.patch, YARN-9560.008.patch, 
> YARN-9560.009.patch, YARN-9560.010.patch, YARN-9560.011.patch
>
>
> Since the new RuncContainerRuntime will be using a lot of the same code as 
> DockerLinuxContainerRuntime, it would be good to move a bunch of the 
> DockerLinuxContainerRuntime code up a level to an abstract class that both of 
> the runtimes can extend. 
> The new structure will look like:
> {noformat}
> OCIContainerRuntime (abstract class)
>   - DockerLinuxContainerRuntime
>   - RuncContainerRuntime
> {noformat}
> This JIRA should only change the structure of the code, not the actual 
> semantics






[jira] [Commented] (YARN-9629) Support configurable MIN_LOG_ROLLING_INTERVAL

2019-06-27 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16874265#comment-16874265
 ] 

Hadoop QA commented on YARN-9629:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
50s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
48s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 
42s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 11m 
53s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
17m 53s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
58s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
15s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m  
6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  8m  
6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
22s{color} | {color:green} hadoop-yarn-project/hadoop-yarn: The patch generated 
0 new + 382 unchanged - 1 fixed = 382 total (was 383) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 36s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m  
9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m  
1s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
57s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
55s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 21m 
17s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
40s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}123m 10s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=18.09.5 Server=18.09.5 Image:yetus/hadoop:bdbca0e53b4 |
| JIRA Issue | YARN-9629 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12973085/YARN-9629.004.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  xml  |
| uname | Linux a1452de6d9ee 4.15.0-48-generic #51-Ubuntu SMP Wed Apr 3 
08:28:49 UTC 2019 x86_64 

[jira] [Assigned] (YARN-6046) Documentation correction in YarnApplicationSecurity

2019-06-27 Thread Daniel Templeton (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Templeton reassigned YARN-6046:
--

Assignee: Yousef Abu-Salah  (was: Prashant Jha)

> Documentation correction in YarnApplicationSecurity
> ---
>
> Key: YARN-6046
> URL: https://issues.apache.org/jira/browse/YARN-6046
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Yousef Abu-Salah
>Priority: Trivial
>  Labels: newbie
> Attachments: YARN-6046.001.patch
>
>
> A few documentation corrections are required in 
> hadoop-yarn/hadoop-yarn-site/YarnApplicationSecurity.html
> {code}
> 1. Suring AM startup, log in to Kerberos.
> {code}
> {code}
> Don’t. Rely on the lifespan of the 
> {code}
> {code}
> renewed automatically; the AM pushes out 
> {code}
> {code}
> In an insecure cluster, the application will run as the identity of the 
> account of the node manager, typically something such as yarn or mapred. By 
> default, the application will access HDFS as that user, with a different home 
> directory, and with a different user identified in audit logs and on file 
> system owner attributes.
> {code}
> This sentence needs to be reframed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9560) Restructure DockerLinuxContainerRuntime to extend a new OCIContainerRuntime

2019-06-27 Thread Jim Brennan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16874203#comment-16874203
 ] 

Jim Brennan commented on YARN-9560:
---

I think that since these are not static strings (they are determined at runtime 
by calling getRuntimeType()), the current camelCase naming is appropriate: it 
makes it clear to the developer that these strings are NOT static and depend on 
the class of the runtime. I think going to the extra effort of making them 
static doesn't really make things clearer.

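A minimal sketch of the pattern being referred to, with illustrative names only 
(not the actual patch), might look like this:
{code:java}
// The key is assembled from getRuntimeType() at runtime, so it is not a
// compile-time constant; camelCase signals that it varies per runtime class.
abstract class OciRuntimeSketch {
  abstract String getRuntimeType();   // e.g. "docker" or "runc"

  String allowedNetworksKey() {
    return "yarn.nodemanager.runtime.linux." + getRuntimeType()
        + ".allowed-container-networks";
  }
}

class DockerVariantSketch extends OciRuntimeSketch {
  @Override
  String getRuntimeType() { return "docker"; }
}

class RuncVariantSketch extends OciRuntimeSketch {
  @Override
  String getRuntimeType() { return "runc"; }
}
{code}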

> Restructure DockerLinuxContainerRuntime to extend a new OCIContainerRuntime
> ---
>
> Key: YARN-9560
> URL: https://issues.apache.org/jira/browse/YARN-9560
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
>  Labels: Docker
> Attachments: YARN-9560.001.patch, YARN-9560.002.patch, 
> YARN-9560.003.patch, YARN-9560.004.patch, YARN-9560.005.patch, 
> YARN-9560.006.patch, YARN-9560.007.patch, YARN-9560.008.patch, 
> YARN-9560.009.patch, YARN-9560.010.patch, YARN-9560.011.patch
>
>
> Since the new RuncContainerRuntime will be using a lot of the same code as 
> DockerLinuxContainerRuntime, it would be good to move a bunch of the 
> DockerLinuxContainerRuntime code up a level to an abstract class that both of 
> the runtimes can extend. 
> The new structure will look like:
> {noformat}
> OCIContainerRuntime (abstract class)
>   - DockerLinuxContainerRuntime
>   - RuncContainerRuntime
> {noformat}
> This JIRA should only change the structure of the code, not the actual 
> semantics



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9655) AllocateResponse in FederationInterceptor lost applicationPriority

2019-06-27 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16874185#comment-16874185
 ] 

Weiwei Yang commented on YARN-9655:
---

LGTM.

Not sure if we can get some folks more familiar with this code to review it.

[~hunhun], can you fix the checkstyle issue? It's simple: you just need to keep 
the line under 80 chars.

Thanks

> AllocateResponse in FederationInterceptor lost  applicationPriority
> ---
>
> Key: YARN-9655
> URL: https://issues.apache.org/jira/browse/YARN-9655
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: federation
>Affects Versions: 3.2.0
>Reporter: hunshenshi
>Priority: Major
>
> In YARN Federation mode using FederationInterceptor, when submitting an 
> application, the AM will report an error.
> {code:java}
> 2019-06-25 11:44:00,977 ERROR [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator: ERROR IN CONTACTING RM. 
> java.lang.NullPointerException at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.handleJobPriorityChange(RMContainerAllocator.java:1025)
>  at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.getResources(RMContainerAllocator.java:880)
>  at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.heartbeat(RMContainerAllocator.java:286)
>  at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator$AllocatorRunnable.run(RMCommunicator.java:280)
>  at java.lang.Thread.run(Thread.java:748)
> {code}
> The reason is that applicationPriority is lost.
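
A minimal sketch of the kind of fix the description implies: when the 
interceptor merges per-sub-cluster responses, the application priority reported 
by the home sub-cluster has to be carried into the merged response. The classes 
below are simplified stand-ins, not the real AllocateResponse or 
FederationInterceptor code:
{code:java}
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for AllocateResponse with only the fields relevant here.
class AllocateResponseSketch {
  Integer applicationPriority;                      // null if not reported
  List<String> allocatedContainers = new ArrayList<>();
}

class FederationMergeSketch {
  // Merge secondary responses into the home response without dropping the
  // application priority that only the home RM reports.
  static AllocateResponseSketch merge(AllocateResponseSketch home,
      List<AllocateResponseSketch> secondaries) {
    AllocateResponseSketch merged = new AllocateResponseSketch();
    merged.applicationPriority = home.applicationPriority;  // the missing copy
    merged.allocatedContainers.addAll(home.allocatedContainers);
    for (AllocateResponseSketch s : secondaries) {
      merged.allocatedContainers.addAll(s.allocatedContainers);
    }
    return merged;
  }
}
{code}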



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9655) AllocateResponse in FederationInterceptor lost applicationPriority

2019-06-27 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16874180#comment-16874180
 ] 

Hadoop QA commented on YARN-9655:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
59s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
7s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
42s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m  1s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
26s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  1m 
27s{color} | {color:blue} Used deprecated FindBugs config; considering 
switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
25s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
6s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 21s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:
 The patch generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 1s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 37s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
27s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 22m 
45s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
32s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 81m 32s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce base: 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1023/1/artifact/out/Dockerfile
 |
| GITHUB PR | https://github.com/apache/hadoop/pull/1023 |
| JIRA Issue | YARN-9655 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite 
unit shadedclient findbugs checkstyle |
| uname | Linux 0bef0840670c 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 
17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | trunk / be80334 |
| Default Java | 1.8.0_212 |
| checkstyle | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-

[jira] [Commented] (YARN-9642) AbstractYarnScheduler#clearPendingContainerCache could run even after transitiontostandby

2019-06-27 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16874181#comment-16874181
 ] 

Weiwei Yang commented on YARN-9642:
---

Sorry for getting to this late. It's a good catch, +1.

Thanks [~bibinchundatt], [~sunilg].

 

> AbstractYarnScheduler#clearPendingContainerCache could run even after 
> transitiontostandby
> -
>
> Key: YARN-9642
> URL: https://issues.apache.org/jira/browse/YARN-9642
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: YARN-9642.001.patch, image-2019-06-22-16-05-24-114.png
>
>
> The TimerTask could also hold a reference to the scheduler in the case of a 
> fast switch-over.
>  AbstractYarnScheduler should make sure the scheduled Timer is cancelled on 
> serviceStop.
> This also causes a memory leak:
> !image-2019-06-22-16-05-24-114.png!

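A minimal sketch of the pattern the description asks for, using plain 
java.util.Timer; the class and task names are illustrative, not the actual 
AbstractYarnScheduler code:
{code:java}
import java.util.Timer;
import java.util.TimerTask;

class SchedulerServiceSketch {
  private Timer releaseCache;

  void serviceStart() {
    releaseCache = new Timer("PendingContainerCacheCleaner", true);
    releaseCache.schedule(new TimerTask() {
      @Override
      public void run() {
        // clearPendingContainerCache() would run here in the real scheduler.
      }
    }, 10_000L, 10_000L);
  }

  void serviceStop() {
    if (releaseCache != null) {
      // Cancelling drops the scheduled task and releases the reference to the
      // scheduler, so a standby RM does not keep running it or leak memory.
      releaseCache.cancel();
      releaseCache = null;
    }
  }
}
{code}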


--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6629) NPE occurred when container allocation proposal is applied but its resource requests are removed before

2019-06-27 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16874176#comment-16874176
 ] 

Weiwei Yang commented on YARN-6629:
---

hi [~aihuaxu], [~Tao Yang]

Feel free to create another Jira for the backport, loop me in and I'll help to 
review/commit.

Thanks.

> NPE occurred when container allocation proposal is applied but its resource 
> requests are removed before
> ---
>
> Key: YARN-6629
> URL: https://issues.apache.org/jira/browse/YARN-6629
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.0, 3.0.0-alpha2
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Critical
> Fix For: 3.1.0, 2.10.0
>
> Attachments: YARN-6629.001.patch, YARN-6629.002.patch, 
> YARN-6629.003.patch, YARN-6629.004.patch, YARN-6629.005.patch, 
> YARN-6629.006.patch, YARN-6629.branch-2.001.patch
>
>
> I wrote a test case to reproduce another problem for branch-2 and found new 
> NPE error,  log: 
> {code}
> FATAL event.EventDispatcher (EventDispatcher.java:run(75)) - Error in 
> handling event type NODE_UPDATE to the Event Dispatcher
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AppSchedulingInfo.allocate(AppSchedulingInfo.java:446)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.apply(FiCaSchedulerApp.java:516)
> at 
> org.apache.hadoop.yarn.client.TestNegativePendingResource$1.answer(TestNegativePendingResource.java:225)
> at 
> org.mockito.internal.stubbing.StubbedInvocationMatcher.answer(StubbedInvocationMatcher.java:31)
> at org.mockito.internal.MockHandler.handle(MockHandler.java:97)
> at 
> org.mockito.internal.creation.MethodInterceptorFilter.intercept(MethodInterceptorFilter.java:47)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp$$EnhancerByMockitoWithCGLIB$$29eb8afc.apply()
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.tryCommit(CapacityScheduler.java:2396)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.submitResourceCommitRequest(CapacityScheduler.java:2281)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateOrReserveNewContainers(CapacityScheduler.java:1247)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainerOnSingleNode(CapacityScheduler.java:1236)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1325)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1112)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:987)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1367)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:143)
> at 
> org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:66)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> Reproduce this error in chronological order:
> 1. AM started and requested 1 container with schedulerRequestKey#1 : 
> ApplicationMasterService#allocate -->  CapacityScheduler#allocate --> 
> SchedulerApplicationAttempt#updateResourceRequests --> 
> AppSchedulingInfo#updateResourceRequests 
> Added schedulerRequestKey#1 into schedulerKeyToPlacementSets
> 2. Scheduler allocatd 1 container for this request and accepted the proposal
> 3. AM removed this request
> ApplicationMasterService#allocate -->  CapacityScheduler#allocate --> 
> SchedulerApplicationAttempt#updateResourceRequests --> 
> AppSchedulingInfo#updateResourceRequests --> 
> AppSchedulingInfo#addToPlacementSets --> 
> AppSchedulingInfo#updatePendingResources
> Removed schedulerRequestKey#1 from schedulerKeyToPlacementSets)
> 4. Scheduler applied this proposal
> CapacityScheduler#tryCommit --> FiCaSchedulerApp#apply --> 
> AppSchedulingInfo#allocate 
> Throw NPE when called 
> schedulerKeyToPlacementSets.get(schedulerRequestKey).allocate(schedulerKey, 
> type, node);
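
A hedged sketch of the defensive check the trace suggests: if the request's 
placement set was removed between the accepted proposal and the commit, the 
allocation should be rejected instead of dereferencing null. Names below are 
simplified stand-ins, not the actual AppSchedulingInfo code:
{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class AppSchedulingInfoSketch {
  final Map<Integer, Object> schedulerKeyToPlacementSets =
      new ConcurrentHashMap<>();

  boolean allocate(Integer schedulerRequestKey) {
    Object placementSet = schedulerKeyToPlacementSets.get(schedulerRequestKey);
    if (placementSet == null) {
      // The AM removed the request after the proposal was accepted; reject the
      // commit rather than throw a NullPointerException.
      return false;
    }
    // The real code would now call placementSet.allocate(schedulerKey, type, node).
    return true;
  }
}
{code}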



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn

[jira] [Commented] (YARN-9623) Auto adjust max queue length of app activities to make sure activities on all nodes can be covered

2019-06-27 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16874168#comment-16874168
 ] 

Weiwei Yang commented on YARN-9623:
---

Hi [~Tao Yang]

OK, I am fine with that. However, we still need the configuration 
{{yarn.resourcemanager.activities-manager.app-activities.max-queue-length}} to 
be there. If this configuration is set, then its value should be enforced for 
the queue size and the auto-adjustment disabled. Can you add that logic?

This is to ensure we have a workaround if the auto-calculation is suboptimal. 
Hope that makes sense.

Thanks
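
A rough sketch of the requested behaviour, assuming the property name from the 
discussion; the default value and the wiring are illustrative only:
{code:java}
import java.util.Properties;

class AppActivitiesQueueSizeSketch {
  static final String MAX_QUEUE_LENGTH_KEY =
      "yarn.resourcemanager.activities-manager.app-activities.max-queue-length";
  static final int DEFAULT_MAX_QUEUE_LENGTH = 100;  // illustrative default

  final boolean autoAdjustEnabled;
  volatile int maxQueueLength;

  AppActivitiesQueueSizeSketch(Properties conf) {
    String configured = conf.getProperty(MAX_QUEUE_LENGTH_KEY);
    // An explicitly configured value wins and turns auto-adjustment off.
    autoAdjustEnabled = (configured == null);
    maxQueueLength = (configured == null)
        ? DEFAULT_MAX_QUEUE_LENGTH : Integer.parseInt(configured);
  }

  // Called periodically by the cleanup thread when auto-adjustment is active.
  void maybeAdjust(int numNodes) {
    if (autoAdjustEnabled) {
      maxQueueLength = Math.max(DEFAULT_MAX_QUEUE_LENGTH, (int) (1.2 * numNodes));
    }
  }
}
{code}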

> Auto adjust max queue length of app activities to make sure activities on all 
> nodes can be covered
> --
>
> Key: YARN-9623
> URL: https://issues.apache.org/jira/browse/YARN-9623
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Attachments: YARN-9623.001.patch
>
>
> Currently we can use the configuration entry 
> "yarn.resourcemanager.activities-manager.app-activities.max-queue-length" to 
> control the max queue length of app activities, but in some scenarios this 
> configuration may need to be updated as the cluster grows. Moreover, it's 
> better for users to be able to ignore that conf, therefore it should be auto 
> adjusted internally.
>  There are some differences among the scheduling modes:
>  * multi-node placement disabled
>  ** Heartbeat-driven scheduling: the max queue length of app activities should 
> not be less than the number of nodes; since nodes are not always visited in 
> order, we should leave some room for misordering, for example by guaranteeing 
> that the max queue length is not less than 1.2 * numNodes
>  ** Async scheduling: every async scheduling thread goes through all nodes in 
> order, so in this mode we should guarantee that the max queue length is 
> numThreads * numNodes.
>  * multi-node placement enabled: activities on all nodes can be involved in a 
> single app allocation, therefore there's no need to adjust for this mode.
> To sum up, we can adjust the max queue length of app activities like this:
> {code}
> int configuredMaxQueueLength;
> int maxQueueLength;
> serviceInit(){
>   ...
>   configuredMaxQueueLength = ...; //read configured max queue length
>   maxQueueLength = configuredMaxQueueLength; //take configured value as 
> default
> }
> CleanupThread#run(){
>   ...
>   if (multiNodeDisabled) {
> if (asyncSchedulingEnabled) {
>maxQueueLength = max(configuredMaxQueueLength, numSchedulingThreads * 
> numNodes);
> } else {
>maxQueueLength = max(configuredMaxQueueLength, 1.2 * numNodes);
> }
>   } else if (maxQueueLength != configuredMaxQueueLength) {
> maxQueueLength = configuredMaxQueueLength;
>   }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9629) Support configurable MIN_LOG_ROLLING_INTERVAL

2019-06-27 Thread Adam Antal (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16874150#comment-16874150
 ] 

Adam Antal commented on YARN-9629:
--

Thanks for the suggestions [~snemeth]. Rephrased the texts in patch v4.

> Support configurable MIN_LOG_ROLLING_INTERVAL
> -
>
> Key: YARN-9629
> URL: https://issues.apache.org/jira/browse/YARN-9629
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: log-aggregation, nodemanager, yarn
>Affects Versions: 3.2.0
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Minor
> Attachments: YARN-9629.001.patch, YARN-9629.002.patch, 
> YARN-9629.003.patch, YARN-9629.004.patch
>
>
> One of the log-aggregation parameters, the minimum valid value for 
> {{yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds}}, is 
> MIN_LOG_ROLLING_INTERVAL - it has been hardcoded since its addition in 
> YARN-2583. 
> It was empirically set to 1 hour, as lower values would put the NodeManagers 
> under pressure too frequently. For bigger clusters that is indeed a valid 
> limitation, but for smaller clusters it makes sense, and is a valid customer 
> use case, to use lower values such as 30 minutes. At this point this can only 
> be achieved by setting 
> {{yarn.nodemanager.log-aggregation.debug-enabled}}, which I believe should be 
> kept for debug purposes.
> I'm suggesting to make this minimum configurable, although a warning should be 
> logged at NodeManager startup when the value is lower than 1 hour.

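A hedged sketch of the proposal: the minimum rolling interval becomes 
configurable and the NodeManager warns at startup when it is set below one 
hour. The method and constant names are hypothetical, not the actual 
NodeManager code:
{code:java}
import java.util.concurrent.TimeUnit;

class LogRollingIntervalSketch {
  static final long RECOMMENDED_MIN_SECONDS = TimeUnit.HOURS.toSeconds(1);

  static long effectiveInterval(long requestedSeconds, long configuredMinSeconds) {
    if (configuredMinSeconds < RECOMMENDED_MIN_SECONDS) {
      System.err.println("WARN: configured minimum roll-monitoring interval ("
          + configuredMinSeconds + "s) is below 1 hour; low values can put "
          + "NodeManagers under pressure on large clusters.");
    }
    // Clamp the requested interval to the configured minimum.
    return Math.max(requestedSeconds, configuredMinSeconds);
  }
}
{code}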


--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9629) Support configurable MIN_LOG_ROLLING_INTERVAL

2019-06-27 Thread Adam Antal (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Antal updated YARN-9629:
-
Attachment: YARN-9629.004.patch

> Support configurable MIN_LOG_ROLLING_INTERVAL
> -
>
> Key: YARN-9629
> URL: https://issues.apache.org/jira/browse/YARN-9629
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: log-aggregation, nodemanager, yarn
>Affects Versions: 3.2.0
>Reporter: Adam Antal
>Assignee: Adam Antal
>Priority: Minor
> Attachments: YARN-9629.001.patch, YARN-9629.002.patch, 
> YARN-9629.003.patch, YARN-9629.004.patch
>
>
> One of the log-aggregation parameters, the minimum valid value for 
> {{yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds}}, is 
> MIN_LOG_ROLLING_INTERVAL - it has been hardcoded since its addition in 
> YARN-2583. 
> It was empirically set to 1 hour, as lower values would put the NodeManagers 
> under pressure too frequently. For bigger clusters that is indeed a valid 
> limitation, but for smaller clusters it makes sense, and is a valid customer 
> use case, to use lower values such as 30 minutes. At this point this can only 
> be achieved by setting 
> {{yarn.nodemanager.log-aggregation.debug-enabled}}, which I believe should be 
> kept for debug purposes.
> I'm suggesting to make this minimum configurable, although a warning should be 
> logged at NodeManager startup when the value is lower than 1 hour.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9643) Federation: Add subClusterID in nodes page of Router web

2019-06-27 Thread hunshenshi (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hunshenshi updated YARN-9643:
-
Issue Type: Improvement  (was: Bug)

> Federation: Add subClusterID in nodes page of Router web
> 
>
> Key: YARN-9643
> URL: https://issues.apache.org/jira/browse/YARN-9643
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 3.2.0
>Reporter: hunshenshi
>Priority: Major
> Attachments: nodes.png
>
>
> On the nodes page of the Router web UI there is only node info; there is no 
> sub-cluster id shown for each node.
> [http://127.0.0.1:8089/cluster/nodes|http://192.168.169.72:8089/cluster/nodes]
> !nodes.png!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-9655) AllocateResponse in FederationInterceptor lost applicationPriority

2019-06-27 Thread hunshenshi (JIRA)
hunshenshi created YARN-9655:


 Summary: AllocateResponse in FederationInterceptor lost  
applicationPriority
 Key: YARN-9655
 URL: https://issues.apache.org/jira/browse/YARN-9655
 Project: Hadoop YARN
  Issue Type: Bug
  Components: federation
Affects Versions: 3.2.0
Reporter: hunshenshi


In YARN Federation mode using FederationInterceptor, when submitting an 
application, the AM will report an error.
{code:java}
2019-06-25 11:44:00,977 ERROR [RMCommunicator Allocator] 
org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator: ERROR IN CONTACTING RM. 
java.lang.NullPointerException at 
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.handleJobPriorityChange(RMContainerAllocator.java:1025)
 at 
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.getResources(RMContainerAllocator.java:880)
 at 
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.heartbeat(RMContainerAllocator.java:286)
 at 
org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator$AllocatorRunnable.run(RMCommunicator.java:280)
 at java.lang.Thread.run(Thread.java:748)
{code}
The reason is that applicationPriority is lost.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-9654) "Null Pointer Exception" when there is Disk Error Exception occurs during Localization

2019-06-27 Thread Akshay Agarwal (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akshay Agarwal reassigned YARN-9654:


Assignee: Akshay Agarwal  (was: Bilwa S T)

> "Null Pointer Exception" when there is Disk Error Exception occurs during 
> Localization
> --
>
> Key: YARN-9654
> URL: https://issues.apache.org/jira/browse/YARN-9654
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Akshay Agarwal
>Assignee: Akshay Agarwal
>Priority: Minor
>
> Currently a Null Pointer Exception is thrown when a Disk Error Exception 
> occurs during Localization.
>  
> !http://dts.huawei.com/net/dts/fckeditor/download.ashx?Path=4Ycc0VSuBbmQDP4v02iGYJn%2bVPgAqwGG720AiKXBplWJ7wBLEazrAQ%3d%3d!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-9654) "Null Pointer Exception" when there is Disk Error Exception occurs during Localization

2019-06-27 Thread Akshay Agarwal (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akshay Agarwal reassigned YARN-9654:


Assignee: Bilwa S T

> "Null Pointer Exception" when there is Disk Error Exception occurs during 
> Localization
> --
>
> Key: YARN-9654
> URL: https://issues.apache.org/jira/browse/YARN-9654
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Akshay Agarwal
>Assignee: Bilwa S T
>Priority: Minor
>
> Currently a Null Pointer Exception is thrown when a Disk Error Exception 
> occurs during Localization.
>  
> !http://dts.huawei.com/net/dts/fckeditor/download.ashx?Path=4Ycc0VSuBbmQDP4v02iGYJn%2bVPgAqwGG720AiKXBplWJ7wBLEazrAQ%3d%3d!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-9654) "Null Pointer Exception" when there is Disk Error Exception occurs during Localization

2019-06-27 Thread Akshay Agarwal (JIRA)
Akshay Agarwal created YARN-9654:


 Summary: "Null Pointer Exception" when there is Disk Error 
Exception occurs during Localization
 Key: YARN-9654
 URL: https://issues.apache.org/jira/browse/YARN-9654
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Akshay Agarwal


Currently a Null Pointer Exception is thrown when a Disk Error Exception 
occurs during Localization.

 

!http://dts.huawei.com/net/dts/fckeditor/download.ashx?Path=4Ycc0VSuBbmQDP4v02iGYJn%2bVPgAqwGG720AiKXBplWJ7wBLEazrAQ%3d%3d!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9623) Auto adjust max queue length of app activities to make sure activities on all nodes can be covered

2019-06-27 Thread Tao Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16874001#comment-16874001
 ] 

Tao Yang commented on YARN-9623:


Thanks [~cheersyang] for the feedback.
{quote}
However, the activity manager should be a general service, it should not be 
depending on CS's configuration.
{quote}
Yes, I had this concern before, but the required number of app activities is 
indeed decided by a specific scheduler, and even by a specific scheduling 
policy inside the scheduler. So the patch does the same as some general 
services like QueueACLsManager/SchedulerPlacementProcessor/... (using {{if 
scheduler instanceof CapacityScheduler}}). The scheduler-specific handling can 
only be avoided if we simply set maxQueueLength to 
max(configuredMaxQueueLength, 1.2 * numOfNodes), but that may waste a lot in a 
large cluster with multi-node placement enabled. Thoughts?

{quote}
Another thing is appActivitiesMaxQueueLength, do we need to make it atomic 
because it is being modified in another thread.
{quote}
There is no need to make it atomic since there are no ordering or consistency 
requirements, but volatile is necessary for this variable.

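A tiny sketch of the volatile point: there is a single writer (the cleanup 
thread) and the readers only need visibility of the latest value, so volatile 
is sufficient and an AtomicInteger adds nothing. The field name mirrors the 
discussion; the class is illustrative:
{code:java}
class QueueLengthVisibilitySketch {
  private volatile int appActivitiesMaxQueueLength = 100;

  // Called only from the cleanup thread.
  void adjust(int numNodes) {
    appActivitiesMaxQueueLength = Math.max(100, (int) (1.2 * numNodes));
  }

  // Called from request-handling threads; volatile guarantees they observe the
  // most recent value written by the cleanup thread.
  int currentLimit() {
    return appActivitiesMaxQueueLength;
  }
}
{code}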
> Auto adjust max queue length of app activities to make sure activities on all 
> nodes can be covered
> --
>
> Key: YARN-9623
> URL: https://issues.apache.org/jira/browse/YARN-9623
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Attachments: YARN-9623.001.patch
>
>
> Currently we can use the configuration entry 
> "yarn.resourcemanager.activities-manager.app-activities.max-queue-length" to 
> control the max queue length of app activities, but in some scenarios this 
> configuration may need to be updated as the cluster grows. Moreover, it's 
> better for users to be able to ignore that conf, therefore it should be auto 
> adjusted internally.
>  There are some differences among the scheduling modes:
>  * multi-node placement disabled
>  ** Heartbeat-driven scheduling: the max queue length of app activities should 
> not be less than the number of nodes; since nodes are not always visited in 
> order, we should leave some room for misordering, for example by guaranteeing 
> that the max queue length is not less than 1.2 * numNodes
>  ** Async scheduling: every async scheduling thread goes through all nodes in 
> order, so in this mode we should guarantee that the max queue length is 
> numThreads * numNodes.
>  * multi-node placement enabled: activities on all nodes can be involved in a 
> single app allocation, therefore there's no need to adjust for this mode.
> To sum up, we can adjust the max queue length of app activities like this:
> {code}
> int configuredMaxQueueLength;
> int maxQueueLength;
> serviceInit(){
>   ...
>   configuredMaxQueueLength = ...; //read configured max queue length
>   maxQueueLength = configuredMaxQueueLength; //take configured value as 
> default
> }
> CleanupThread#run(){
>   ...
>   if (multiNodeDisabled) {
> if (asyncSchedulingEnabled) {
>maxQueueLength = max(configuredMaxQueueLength, numSchedulingThreads * 
> numNodes);
> } else {
>maxQueueLength = max(configuredMaxQueueLength, 1.2 * numNodes);
> }
>   } else if (maxQueueLength != configuredMaxQueueLength) {
> maxQueueLength = configuredMaxQueueLength;
>   }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9623) Auto adjust max queue length of app activities to make sure activities on all nodes can be covered

2019-06-27 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16873966#comment-16873966
 ] 

Weiwei Yang commented on YARN-9623:
---

Hi [~Tao Yang]

Generally, I think this is a good approach, to have fewer configs.

However, the activity manager should be a general service; it should not depend 
on CS's configuration, for example the number of async threads. How about 
letting it just be {{1.2 * numOfNodes}} for both cases and seeing how that 
works? We can continue to tune this after we have more experience using it in 
real clusters.

Another thing is {{appActivitiesMaxQueueLength}}: do we need to make it atomic, 
given that it is being modified in another thread?

Thanks

 

> Auto adjust max queue length of app activities to make sure activities on all 
> nodes can be covered
> --
>
> Key: YARN-9623
> URL: https://issues.apache.org/jira/browse/YARN-9623
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Attachments: YARN-9623.001.patch
>
>
> Currently we can use the configuration entry 
> "yarn.resourcemanager.activities-manager.app-activities.max-queue-length" to 
> control the max queue length of app activities, but in some scenarios this 
> configuration may need to be updated as the cluster grows. Moreover, it's 
> better for users to be able to ignore that conf, therefore it should be auto 
> adjusted internally.
>  There are some differences among the scheduling modes:
>  * multi-node placement disabled
>  ** Heartbeat-driven scheduling: the max queue length of app activities should 
> not be less than the number of nodes; since nodes are not always visited in 
> order, we should leave some room for misordering, for example by guaranteeing 
> that the max queue length is not less than 1.2 * numNodes
>  ** Async scheduling: every async scheduling thread goes through all nodes in 
> order, so in this mode we should guarantee that the max queue length is 
> numThreads * numNodes.
>  * multi-node placement enabled: activities on all nodes can be involved in a 
> single app allocation, therefore there's no need to adjust for this mode.
> To sum up, we can adjust the max queue length of app activities like this:
> {code}
> int configuredMaxQueueLength;
> int maxQueueLength;
> serviceInit(){
>   ...
>   configuredMaxQueueLength = ...; //read configured max queue length
>   maxQueueLength = configuredMaxQueueLength; //take configured value as 
> default
> }
> CleanupThread#run(){
>   ...
>   if (multiNodeDisabled) {
> if (asyncSchedulingEnabled) {
>maxQueueLength = max(configuredMaxQueueLength, numSchedulingThreads * 
> numNodes);
> } else {
>maxQueueLength = max(configuredMaxQueueLength, 1.2 * numNodes);
> }
>   } else if (maxQueueLength != configuredMaxQueueLength) {
> maxQueueLength = configuredMaxQueueLength;
>   }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6740) Federation Router (hiding multiple RMs for ApplicationClientProtocol) phase 2

2019-06-27 Thread wangxiangchun (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16873919#comment-16873919
 ] 

wangxiangchun commented on YARN-6740:
-

Sorry, could I ask how long it will take?

> Federation Router (hiding multiple RMs for ApplicationClientProtocol) phase 2
> -
>
> Key: YARN-6740
> URL: https://issues.apache.org/jira/browse/YARN-6740
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Giovanni Matteo Fumarola
>Assignee: Abhishek Modi
>Priority: Major
>
> This JIRA tracks the implementation of the layer for routing 
> ApplicationClientProtocol requests to the appropriate RM(s) in a federated 
> YARN cluster.
> Under the YARN-3659 we only implemented getNewApplication, submitApplication, 
> forceKillApplication and getApplicationReport to execute applications E2E.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org