[jira] [Updated] (YARN-8795) QueuePlacementRule move to separate files

2018-09-18 Thread Shuai Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shuai Zhang updated YARN-8795:
--
Attachment: YARN-8795.003.patch

> QueuePlacementRule move to separate files
> -
>
> Key: YARN-8795
> URL: https://issues.apache.org/jira/browse/YARN-8795
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Affects Versions: 3.1.1
>Reporter: Shuai Zhang
>Priority: Major
> Attachments: YARN-8795.002.patch, YARN-8795.003.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8795) QueuePlacementRule move to separate files

2018-09-18 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16620131#comment-16620131
 ] 

Hadoop QA commented on YARN-8795:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m 21s{color} 
| {color:red} YARN-8795 does not apply to trunk. Rebase required? Wrong Branch? 
See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-8795 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12940350/YARN-8795.002.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/21875/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> QueuePlacementRule move to separate files
> -
>
> Key: YARN-8795
> URL: https://issues.apache.org/jira/browse/YARN-8795
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Affects Versions: 3.1.1
>Reporter: Shuai Zhang
>Priority: Major
> Attachments: YARN-8795.002.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8795) QueuePlacementRule move to separate files

2018-09-18 Thread Shuai Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shuai Zhang updated YARN-8795:
--
Attachment: YARN-8795.002.patch

> QueuePlacementRule move to separate files
> -
>
> Key: YARN-8795
> URL: https://issues.apache.org/jira/browse/YARN-8795
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Affects Versions: 3.1.1
>Reporter: Shuai Zhang
>Priority: Major
> Attachments: YARN-8795.002.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8795) QueuePlacementRule move to separate files

2018-09-18 Thread Shuai Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shuai Zhang updated YARN-8795:
--
Attachment: (was: YARN-8795.001.patch)

> QueuePlacementRule move to separate files
> -
>
> Key: YARN-8795
> URL: https://issues.apache.org/jira/browse/YARN-8795
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Affects Versions: 3.1.1
>Reporter: Shuai Zhang
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8793) QueuePlacementPolicy bind more information to assigning result

2018-09-18 Thread Shuai Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shuai Zhang updated YARN-8793:
--
Attachment: (was: YARN-8793.001.patch)

> QueuePlacementPolicy bind more information to assigning result
> --
>
> Key: YARN-8793
> URL: https://issues.apache.org/jira/browse/YARN-8793
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Affects Versions: 3.1.1
>Reporter: Shuai Zhang
>Priority: Major
> Attachments: YARN-8793.001.patch
>
>
> Fair scheduler's QueuePlacementPolicy should bind more information to the 
> assigning result:
>  # Whether to terminate the chain of responsibility
>  # The reason to reject a request



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8793) QueuePlacementPolicy bind more information to assigning result

2018-09-18 Thread Shuai Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shuai Zhang updated YARN-8793:
--
Attachment: YARN-8793.001.patch

> QueuePlacementPolicy bind more information to assigning result
> --
>
> Key: YARN-8793
> URL: https://issues.apache.org/jira/browse/YARN-8793
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Affects Versions: 3.1.1
>Reporter: Shuai Zhang
>Priority: Major
> Attachments: YARN-8793.001.patch
>
>
> Fair scheduler's QueuePlacementPolicy should bind more information to the 
> assigning result:
>  # Whether to terminate the chain of responsibility
>  # The reason to reject a request



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8795) QueuePlacementRule move to separate files

2018-09-18 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16620102#comment-16620102
 ] 

Hadoop QA commented on YARN-8795:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  6s{color} 
| {color:red} YARN-8795 does not apply to trunk. Rebase required? Wrong Branch? 
See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-8795 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12940347/YARN-8795.001.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/21873/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> QueuePlacementRule move to separate files
> -
>
> Key: YARN-8795
> URL: https://issues.apache.org/jira/browse/YARN-8795
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Affects Versions: 3.1.1
>Reporter: Shuai Zhang
>Priority: Major
> Attachments: YARN-8795.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8777) Container Executor C binary change to execute interactive docker command

2018-09-18 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16620077#comment-16620077
 ] 

Eric Yang commented on YARN-8777:
-

[~Zian Chen] {quote}
The method param list have out an outlen which didn't match the signature, and 
we miss description for param args, is this typo?
{quote}

Good catch, I will make the correction.

{quote}we can probably give an enum to index several common used command 
options, and ask node manager only pass index which can be matched with one of 
these enum elements, in this way we can have some kind of flexibility without 
open up bigger attack interface. {quote}

The enum approach can be used for a fixed number of parameters or a small set of 
parameters.  It is probably not an ideal interface for passing arbitrary commands 
to container-executor for docker exec.  One possible danger is sending hex code 
as argv to trigger a buffer overflow in container-executor or docker, where there 
is no logic to validate the arbitrary command.

{quote}should we also take care of passing shell commands inside the container 
?{quote}

The entire pipeline looks like websocket > node manager > container-executor > 
docker exec -it bash.  Every keystroke is written from the web socket to bash, 
which interprets the incoming input stream via stdin.  All output is written from 
bash to stdout and back to the web socket.  This simulates terminal behavior.  
There is no need to do additional processing of shell commands with the current 
arrangement.
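
Not part of the attached patch — the block below is a simplified, self-contained 
sketch of the relay idea described above, using a plain local process in place of 
the websocket / node manager / container-executor / docker exec chain (the actual 
{{docker exec -it <container> bash}} invocation and TTY allocation are assumptions 
of the real setup and are omitted here):

{code:java}
// Simplified sketch: pump bytes between this process and an interactive child,
// the way the websocket <-> bash relay is described above.
import java.io.InputStream;
import java.io.OutputStream;

public class InteractiveRelay {
  public static void main(String[] args) throws Exception {
    // In the real pipeline this would be "docker exec -it <container> bash"
    // launched by container-executor; here we just start a local shell.
    Process shell = new ProcessBuilder("bash").redirectErrorStream(true).start();

    // Keystrokes (stdin here, websocket frames in YARN) go to the shell's stdin.
    Thread toShell = pump(System.in, shell.getOutputStream());
    // Shell output (stdout here) flows back to the caller, i.e. the web socket.
    Thread fromShell = pump(shell.getInputStream(), System.out);

    toShell.start();
    fromShell.start();
    shell.waitFor();
  }

  private static Thread pump(InputStream in, OutputStream out) {
    return new Thread(() -> {
      byte[] buf = new byte[4096];
      try {
        int n;
        while ((n = in.read(buf)) != -1) {
          out.write(buf, 0, n);
          out.flush(); // keep the session interactive: no buffering of keystrokes
        }
      } catch (Exception ignored) {
        // the relay simply ends when either side closes its stream
      }
    });
  }
}
{code}

The point is the same as stated above: bash interprets whatever arrives on stdin, 
so the relay itself never needs to parse or validate shell commands.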

> Container Executor C binary change to execute interactive docker command
> 
>
> Key: YARN-8777
> URL: https://issues.apache.org/jira/browse/YARN-8777
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zian Chen
>Assignee: Eric Yang
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8777.001.patch
>
>
> Since the Container Executor provides container execution using the native 
> container-executor binary, we also need to make changes to accept a new 
> “dockerExec” method that invokes the corresponding native function to execute 
> the docker exec command against the running container.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8793) QueuePlacementPolicy bind more information to assigning result

2018-09-18 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16620071#comment-16620071
 ] 

Hadoop QA commented on YARN-8793:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  5s{color} 
| {color:red} YARN-8793 does not apply to trunk. Rebase required? Wrong Branch? 
See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-8793 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12940342/YARN-8793.001.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/21872/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> QueuePlacementPolicy bind more information to assigning result
> --
>
> Key: YARN-8793
> URL: https://issues.apache.org/jira/browse/YARN-8793
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Affects Versions: 3.1.1
>Reporter: Shuai Zhang
>Priority: Major
> Attachments: YARN-8793.001.patch
>
>
> Fair scheduler's QueuePlacementPolicy should bind more information to the 
> assigning result:
>  # Whether to terminate the chain of responsibility
>  # The reason to reject a request



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8795) QueuePlacementRule move to separate files

2018-09-18 Thread Shuai Zhang (JIRA)
Shuai Zhang created YARN-8795:
-

 Summary: QueuePlacementRule move to separate files
 Key: YARN-8795
 URL: https://issues.apache.org/jira/browse/YARN-8795
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: fairscheduler
Affects Versions: 3.1.1
Reporter: Shuai Zhang






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8771) CapacityScheduler fails to unreserve when cluster resource contains empty resource type

2018-09-18 Thread Tao Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16620042#comment-16620042
 ] 

Tao Yang commented on YARN-8771:


Attached the v3 patch to improve the unit test without adding a new 
"resource-types-1.xml" file.
I found the {{yarn.test.reset-resource-types}} configuration item in 
TestCapacitySchedulerWithMultiResourceTypes; it can avoid reloading resource 
types in MockRM.
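
Not taken from the attached patch — just a hedged sketch of how such a flag might 
be wired into a test, assuming it is read as a boolean and that {{false}} means 
"keep the resource types the test already registered" instead of re-reading 
resource-types.xml when MockRM starts:

{code:java}
// Assumption-level sketch only, not the v3 patch.
Configuration conf = new Configuration();
conf.setBoolean("yarn.test.reset-resource-types", false);
// ... register the custom resource types needed by the test (e.g. a zero-capacity "gpu") ...
MockRM rm = new MockRM(conf);
rm.start();
{code}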

> CapacityScheduler fails to unreserve when cluster resource contains empty 
> resource type
> ---
>
> Key: YARN-8771
> URL: https://issues.apache.org/jira/browse/YARN-8771
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 3.2.0
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Critical
> Attachments: YARN-8771.001.patch, YARN-8771.002.patch, 
> YARN-8771.003.patch
>
>
> We found this problem when the cluster was almost, but not fully, exhausted 
> (93% used): the scheduler kept allocating for an app but always failed to 
> commit. This can block requests from other apps, and parts of the cluster 
> resource can't be used.
> Reproduce this problem:
> (1) use DominantResourceCalculator
> (2) cluster resource has an empty resource type, for example: gpu=0
> (3) the scheduler allocates a container for app1, which has reserved containers 
> and whose queue limit or user limit is reached (used + required > limit).
> Reference code in RegularContainerAllocator#assignContainer:
> {code:java}
> // How much need to unreserve equals to:
> // max(required - headroom, amountNeedUnreserve)
> Resource headRoom = Resources.clone(currentResoureLimits.getHeadroom());
> Resource resourceNeedToUnReserve =
> Resources.max(rc, clusterResource,
> Resources.subtract(capability, headRoom),
> currentResoureLimits.getAmountNeededUnreserve());
> boolean needToUnreserve =
> Resources.greaterThan(rc, clusterResource,
> resourceNeedToUnReserve, Resources.none());
> {code}
> For example, resourceNeedToUnReserve can be <8GB, -6 vcores, 0 gpu> when 
> {{headRoom=<0GB, 8 vcores, 0 gpu>}} and {{capacity=<8GB, 2 vcores, 0 gpu>}}; 
> needToUnreserve, which is the result of {{Resources#greaterThan}}, will be 
> {{false}}. This is not reasonable because the required resource did exceed the 
> headroom and unreserving is needed.
> After that, when reaching the unreserve process in 
> RegularContainerAllocator#assignContainer, the unreserve step will be skipped 
> because shouldAllocOrReserveNewContainer is true (required containers > 
> reserved containers) and needToUnreserve is wrongly calculated to be false:
> {code:java}
> if (availableContainers > 0) {
>  if (rmContainer == null && reservationsContinueLooking
>   && node.getLabels().isEmpty()) {
>   // unreserve process can be wrongly skipped when 
> shouldAllocOrReserveNewContainer=true and needToUnreserve=false but required 
> resource did exceed the headroom
>   if (!shouldAllocOrReserveNewContainer || needToUnreserve) { 
> ... 
>   }
>  }
> }
> {code}
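
As a self-contained toy model (plain Java, not the real 
{{DominantResourceCalculator}}), the snippet below illustrates how a zero-capacity 
resource type can make a greater-than-zero check evaluate to false: the zero 
cluster value turns one share into NaN, and NaN comparisons are always false.

{code:java}
// Toy model only: dominant share = max over resource types of value/clusterTotal.
public class ZeroResourceTypeDemo {
  static float dominantShare(float[] res, float[] cluster) {
    float share = Float.NEGATIVE_INFINITY;
    for (int i = 0; i < res.length; i++) {
      share = Math.max(share, res[i] / cluster[i]); // 0f/0f == NaN, Math.max(x, NaN) == NaN
    }
    return share;
  }

  public static void main(String[] args) {
    float[] cluster = {100 * 1024, 100, 0};      // <memory MB, vcores, gpu>, gpu capacity is 0
    float[] needToUnreserve = {8 * 1024, -6, 0}; // the <8GB, -6 vcores, 0 gpu> example above
    float share = dominantShare(needToUnreserve, cluster);
    // Prints "NaN > 0 ? false": a greaterThan-style check fails even though
    // the memory component clearly exceeds the headroom.
    System.out.println(share + " > 0 ? " + (share > 0f));
  }
}
{code}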



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8771) CapacityScheduler fails to unreserve when cluster resource contains empty resource type

2018-09-18 Thread Tao Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Yang updated YARN-8771:
---
Attachment: YARN-8771.003.patch

> CapacityScheduler fails to unreserve when cluster resource contains empty 
> resource type
> ---
>
> Key: YARN-8771
> URL: https://issues.apache.org/jira/browse/YARN-8771
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 3.2.0
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Critical
> Attachments: YARN-8771.001.patch, YARN-8771.002.patch, 
> YARN-8771.003.patch
>
>
> We found this problem when the cluster was almost, but not fully, exhausted 
> (93% used): the scheduler kept allocating for an app but always failed to 
> commit. This can block requests from other apps, and parts of the cluster 
> resource can't be used.
> Reproduce this problem:
> (1) use DominantResourceCalculator
> (2) cluster resource has an empty resource type, for example: gpu=0
> (3) the scheduler allocates a container for app1, which has reserved containers 
> and whose queue limit or user limit is reached (used + required > limit).
> Reference code in RegularContainerAllocator#assignContainer:
> {code:java}
> // How much need to unreserve equals to:
> // max(required - headroom, amountNeedUnreserve)
> Resource headRoom = Resources.clone(currentResoureLimits.getHeadroom());
> Resource resourceNeedToUnReserve =
> Resources.max(rc, clusterResource,
> Resources.subtract(capability, headRoom),
> currentResoureLimits.getAmountNeededUnreserve());
> boolean needToUnreserve =
> Resources.greaterThan(rc, clusterResource,
> resourceNeedToUnReserve, Resources.none());
> {code}
> For example, resourceNeedToUnReserve can be <8GB, -6 vcores, 0 gpu> when 
> {{headRoom=<0GB, 8 vcores, 0 gpu>}} and {{capacity=<8GB, 2 vcores, 0 gpu>}}; 
> needToUnreserve, which is the result of {{Resources#greaterThan}}, will be 
> {{false}}. This is not reasonable because the required resource did exceed the 
> headroom and unreserving is needed.
> After that, when reaching the unreserve process in 
> RegularContainerAllocator#assignContainer, the unreserve step will be skipped 
> because shouldAllocOrReserveNewContainer is true (required containers > 
> reserved containers) and needToUnreserve is wrongly calculated to be false:
> {code:java}
> if (availableContainers > 0) {
>  if (rmContainer == null && reservationsContinueLooking
>   && node.getLabels().isEmpty()) {
>   // unreserve process can be wrongly skipped when 
> shouldAllocOrReserveNewContainer=true and needToUnreserve=false but required 
> resource did exceed the headroom
>   if (!shouldAllocOrReserveNewContainer || needToUnreserve) { 
> ... 
>   }
>  }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8752) yarn-registry.md has wrong word ong-lived, it should be long-lived

2018-09-18 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16620022#comment-16620022
 ] 

Hadoop QA commented on YARN-8752:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
39s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 
32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
36m 41s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 44s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
28s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 51m 37s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 |
| JIRA Issue | YARN-8752 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12940336/YARN-8752-1.patch |
| Optional Tests |  dupname  asflicense  mvnsite  |
| uname | Linux 7c52c69daeb9 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 17f5651 |
| maven | version: Apache Maven 3.3.9 |
| Max. process+thread count | 332 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/21870/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> yarn-registry.md has wrong word ong-lived, it should be long-lived
> -
>
> Key: YARN-8752
> URL: https://issues.apache.org/jira/browse/YARN-8752
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 3.1.0
>Reporter: leiqiang
>Priority: Major
>  Labels: documentation
> Attachments: YARN-8752-1.patch
>
>
> In yarn-registry.md line 88,
> deploy {color:#FF}ong-lived{color} services instances, this word should 
> be {color:#FF}long-lived{color}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8468) Limit container sizes per queue in FairScheduler

2018-09-18 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1662#comment-1662
 ] 

Weiwei Yang commented on YARN-8468:
---

Thanks [~bsteinbach] for the updates. A general comment on your patch: please 
minimize the changes in the patch to keep it neat. Formatting changes, line 
reshuffles, etc. should be avoided; otherwise it is not easy to review because we 
can't focus on the changes required for this feature. Detailed review comments 
below:

1. DefaultAMSProcessor
All the changes in DefaultAMSProcessor seem unnecessary.

2. FairScheduler.md
{noformat}
maximum resources a queue can allocate for a single container, expressed  in 
the form of
{noformat}
There is an extra space between "expressed" and "in".

3. I think we should use consistent names, but some places have "Max 
Container Allocation/maxContainerAllocation" and other places have "Max 
Container Resources/maxContainerResources". A bit confusing.

4. The import changes in {{FSLeafQueue}} seem unnecessary; please do not include 
changes such as moving an import to another line. The same comment applies to 
classes like {{FSParentQueue}}, {{PlacementConstraintProcessor}}, 
{{QueueProperties}}, {{TestAllocationFileLoaderService}}, {{TestAppManager}}, 
and {{TestCapacityScheduler}}.

5. RMServiceUtils.java, lines 331 - 337: these changes are not necessary.

6. SchedulerUtils.java: except for the changes in {{normalizeAndvalidateRequest}} 
and {{validateResourceRequest}}, please remove the rest of the changes.

7. Lots of unnecessary changes in {{TestRMServerUtils}}.

8. Why does the patch even change {{TestSchedulerNegotiator}} and 
{{TestAMLaunchFailure}}? Both of them are no longer used, right?

Besides, there are still 18 checkstyle issues to fix.

Thanks

> Limit container sizes per queue in FairScheduler
> 
>
> Key: YARN-8468
> URL: https://issues.apache.org/jira/browse/YARN-8468
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 3.1.0
>Reporter: Antal Bálint Steinbach
>Assignee: Antal Bálint Steinbach
>Priority: Critical
> Attachments: YARN-8468.000.patch, YARN-8468.001.patch, 
> YARN-8468.002.patch, YARN-8468.003.patch, YARN-8468.004.patch, 
> YARN-8468.005.patch, YARN-8468.006.patch, YARN-8468.007.patch, 
> YARN-8468.008.patch, YARN-8468.009.patch, YARN-8468.010.patch, 
> YARN-8468.011.patch, YARN-8468.012.patch, YARN-8468.013.patch
>
>
> When using any scheduler, you can use "yarn.scheduler.maximum-allocation-mb" 
> to limit the overall size of a container. This applies globally to all 
> containers, cannot be limited per queue, and is not scheduler dependent.
>  
> The goal of this ticket is to allow this value to be set on a per queue basis.
>  
> The use case: a user has two pools, one for ad hoc jobs and one for enterprise 
> apps. The user wants to limit ad hoc jobs to small containers but allow 
> enterprise apps to request as many resources as needed. Setting 
> yarn.scheduler.maximum-allocation-mb provides a default value for the maximum 
> container size for all queues, while the maximum resources per queue are set 
> with the “maxContainerResources” queue config value.
>  
> Suggested solution:
>  
> All the infrastructure is already in the code. We need to do the following:
>  * add the setting to the queue properties for all queue types (parent and 
> leaf), this will cover dynamically created queues.
>  * if we set it on the root we override the scheduler setting and we should 
> not allow that.
>  * make sure that queue resource cap can not be larger than scheduler max 
> resource cap in the config.
>  * implement getMaximumResourceCapability(String queueName) in the 
> FairScheduler (a simplified sketch follows after this list)
>  * implement getMaximumResourceCapability() in both FSParentQueue and 
> FSLeafQueue
>  * expose the setting in the queue information in the RM web UI.
>  * expose the setting in the metrics etc for the queue.
>  * write JUnit tests.
>  * update the scheduler documentation.
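
As referenced in the list above, here is a simplified, self-contained sketch of 
the per-queue lookup with a fallback to the scheduler-wide maximum. The types and 
the per-queue "maxContainerAllocation" map are made up for illustration; this is 
neither the FairScheduler API nor the attached patch:

{code:java}
import java.util.HashMap;
import java.util.Map;

public class PerQueueMaxAllocationDemo {
  static final class Res {
    final long memoryMb; final int vcores;
    Res(long memoryMb, int vcores) { this.memoryMb = memoryMb; this.vcores = vcores; }
    Res componentwiseMin(Res o) {
      return new Res(Math.min(memoryMb, o.memoryMb), Math.min(vcores, o.vcores));
    }
    public String toString() { return "<" + memoryMb + " MB, " + vcores + " vcores>"; }
  }

  // Equivalent of yarn.scheduler.maximum-allocation-mb / -vcores.
  static final Res SCHEDULER_MAX = new Res(8192, 4);
  // Hypothetical per-queue "maxContainerAllocation" settings.
  static final Map<String, Res> QUEUE_MAX = new HashMap<>();

  static Res getMaximumResourceCapability(String queueName) {
    Res queueMax = QUEUE_MAX.get(queueName);
    if (queueMax == null) {
      return SCHEDULER_MAX;                          // queue did not set its own limit
    }
    return queueMax.componentwiseMin(SCHEDULER_MAX); // queue cap may not exceed the scheduler cap
  }

  public static void main(String[] args) {
    QUEUE_MAX.put("root.adhoc", new Res(2048, 1));   // small containers for ad hoc jobs
    System.out.println(getMaximumResourceCapability("root.adhoc"));      // <2048 MB, 1 vcores>
    System.out.println(getMaximumResourceCapability("root.enterprise")); // falls back to <8192 MB, 4 vcores>
  }
}
{code}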



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8771) CapacityScheduler fails to unreserve when cluster resource contains empty resource type

2018-09-18 Thread Tao Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16620001#comment-16620001
 ] 

Tao Yang commented on YARN-8771:


Thanks [~leftnoteasy] for your review and suggestion. 
I will update the patch later to improve the unit test.

> CapacityScheduler fails to unreserve when cluster resource contains empty 
> resource type
> ---
>
> Key: YARN-8771
> URL: https://issues.apache.org/jira/browse/YARN-8771
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 3.2.0
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Critical
> Attachments: YARN-8771.001.patch, YARN-8771.002.patch
>
>
> We found this problem when the cluster was almost, but not fully, exhausted 
> (93% used): the scheduler kept allocating for an app but always failed to 
> commit. This can block requests from other apps, and parts of the cluster 
> resource can't be used.
> Reproduce this problem:
> (1) use DominantResourceCalculator
> (2) cluster resource has an empty resource type, for example: gpu=0
> (3) the scheduler allocates a container for app1, which has reserved containers 
> and whose queue limit or user limit is reached (used + required > limit).
> Reference code in RegularContainerAllocator#assignContainer:
> {code:java}
> // How much need to unreserve equals to:
> // max(required - headroom, amountNeedUnreserve)
> Resource headRoom = Resources.clone(currentResoureLimits.getHeadroom());
> Resource resourceNeedToUnReserve =
> Resources.max(rc, clusterResource,
> Resources.subtract(capability, headRoom),
> currentResoureLimits.getAmountNeededUnreserve());
> boolean needToUnreserve =
> Resources.greaterThan(rc, clusterResource,
> resourceNeedToUnReserve, Resources.none());
> {code}
> For example, resourceNeedToUnReserve can be <8GB, -6 vcores, 0 gpu> when 
> {{headRoom=<0GB, 8 vcores, 0 gpu>}} and {{capacity=<8GB, 2 vcores, 0 gpu>}}; 
> needToUnreserve, which is the result of {{Resources#greaterThan}}, will be 
> {{false}}. This is not reasonable because the required resource did exceed the 
> headroom and unreserving is needed.
> After that, when reaching the unreserve process in 
> RegularContainerAllocator#assignContainer, the unreserve step will be skipped 
> because shouldAllocOrReserveNewContainer is true (required containers > 
> reserved containers) and needToUnreserve is wrongly calculated to be false:
> {code:java}
> if (availableContainers > 0) {
>  if (rmContainer == null && reservationsContinueLooking
>   && node.getLabels().isEmpty()) {
>   // unreserve process can be wrongly skipped when 
> shouldAllocOrReserveNewContainer=true and needToUnreserve=false but required 
> resource did exceed the headroom
>   if (!shouldAllocOrReserveNewContainer || needToUnreserve) { 
> ... 
>   }
>  }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8794) QueuePlacementPolicy add more rules

2018-09-18 Thread Shuai Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shuai Zhang updated YARN-8794:
--
Description: 
Still need more useful rules:
 # RejectNonLeafQueue
 # RejectDefaultQueue
 # RejectUsers
 # RejectQueues
 # DefaultByUser

> QueuePlacementPolicy add more rules
> ---
>
> Key: YARN-8794
> URL: https://issues.apache.org/jira/browse/YARN-8794
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Shuai Zhang
>Priority: Major
>
> Still need more useful rules:
>  # RejectNonLeafQueue
>  # RejectDefaultQueue
>  # RejectUsers
>  # RejectQueues
>  # DefaultByUser



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8794) QueuePlacementPolicy add more rules

2018-09-18 Thread Shuai Zhang (JIRA)
Shuai Zhang created YARN-8794:
-

 Summary: QueuePlacementPolicy add more rules
 Key: YARN-8794
 URL: https://issues.apache.org/jira/browse/YARN-8794
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: fairscheduler
Reporter: Shuai Zhang






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8793) QueuePlacementPolicy bind more information to assigning result

2018-09-18 Thread Shuai Zhang (JIRA)
Shuai Zhang created YARN-8793:
-

 Summary: QueuePlacementPolicy bind more information to assigning result
 Key: YARN-8793
 URL: https://issues.apache.org/jira/browse/YARN-8793
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: fairscheduler
Affects Versions: 3.1.1
Reporter: Shuai Zhang


Fair scheduler's QueuePlacementPolicy should bind more information to the 
assigning result:
 # Whether to terminate the chain of responsibility
 # The reason to reject a request



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8792) Revisit FairScheduler QueuePlacementPolicy

2018-09-18 Thread Shuai Zhang (JIRA)
Shuai Zhang created YARN-8792:
-

 Summary: Revisit FairScheduler QueuePlacementPolicy 
 Key: YARN-8792
 URL: https://issues.apache.org/jira/browse/YARN-8792
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Affects Versions: 3.1.1
Reporter: Shuai Zhang


Fair scheduler uses `QueuePlacementPolicy` to map a request to a queue. There are 
several problems:
 # The termination of the responsibility chain should be bound to the assigning 
result instead of the rule (see the sketch after this list).
 # It should provide a reason when rejecting a request.
 # More useful rules are still needed:
 ## RejectNonLeafQueue
 ## RejectDefaultQueue
 ## RejectUsers
 ## RejectQueues
 ## DefaultByUser
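
A minimal, self-contained sketch of what binding that information to the result 
could look like; all names are hypothetical (plain Java, not the classes proposed 
in these sub-tasks):

{code:java}
// Hypothetical sketch: a placement result that carries whether the rule chain
// should terminate and why a request was rejected.
import java.util.List;
import java.util.Optional;

final class PlacementResult {
  final Optional<String> queue;        // empty => no queue assigned by this rule
  final boolean terminateChain;        // stop evaluating further rules
  final Optional<String> rejectReason; // populated when the request is rejected

  private PlacementResult(Optional<String> queue, boolean terminate, Optional<String> reason) {
    this.queue = queue;
    this.terminateChain = terminate;
    this.rejectReason = reason;
  }

  static PlacementResult assign(String queue)  { return new PlacementResult(Optional.of(queue), true, Optional.empty()); }
  static PlacementResult reject(String reason) { return new PlacementResult(Optional.empty(), true, Optional.of(reason)); }
  static PlacementResult skip()                { return new PlacementResult(Optional.empty(), false, Optional.empty()); }
}

interface PlacementRule {
  PlacementResult assignQueue(String user, String requestedQueue);
}

public class QueuePlacementPolicyDemo {
  // The policy walks the chain; termination is decided by the result, not the rule.
  static PlacementResult place(List<PlacementRule> rules, String user, String requestedQueue) {
    for (PlacementRule rule : rules) {
      PlacementResult result = rule.assignQueue(user, requestedQueue);
      if (result.terminateChain) {
        return result;
      }
    }
    return PlacementResult.reject("no rule matched the request");
  }

  public static void main(String[] args) {
    // A RejectDefaultQueue-style rule and a "use the requested queue" rule.
    PlacementRule rejectDefault = (user, queue) ->
        "root.default".equals(queue)
            ? PlacementResult.reject("submissions to the default queue are not allowed")
            : PlacementResult.skip();
    PlacementRule specified = (user, queue) ->
        queue == null ? PlacementResult.skip() : PlacementResult.assign(queue);

    PlacementResult r = place(List.of(rejectDefault, specified), "alice", "root.default");
    System.out.println(r.rejectReason.orElse("assigned to " + r.queue.orElse("?")));
  }
}
{code}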



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Reopened] (YARN-8752) yarn-registry.md has wrong word ong-lived, it should be long-lived

2018-09-18 Thread leiqiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leiqiang reopened YARN-8752:


> yarn-registry.md has wrong word ong-lived, it should be long-lived
> -
>
> Key: YARN-8752
> URL: https://issues.apache.org/jira/browse/YARN-8752
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 3.1.0
>Reporter: leiqiang
>Priority: Major
>  Labels: documentation
> Attachments: YARN-8752-1.patch
>
>
> In yarn-registry.md line 88,
> deploy {color:#FF}ong-lived{color} services instances, this word should 
> be {color:#FF}long-lived{color}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8752) yarn-registry.md has wrong word ong-lived, it should be long-lived

2018-09-18 Thread leiqiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leiqiang updated YARN-8752:
---
Attachment: (was: YARN-8752-1.patch)

> yarn-registry.md has wrong word ong-lived, it should be long-lived
> -
>
> Key: YARN-8752
> URL: https://issues.apache.org/jira/browse/YARN-8752
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 3.1.0
>Reporter: leiqiang
>Priority: Major
>  Labels: documentation
> Attachments: YARN-8752-1.patch
>
>
> In yarn-registry.md line 88,
> deploy {color:#FF}ong-lived{color} services instances, this word should 
> be {color:#FF}long-lived{color}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8789) Add BoundedQueue to AsyncDispatcher

2018-09-18 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619930#comment-16619930
 ] 

Hadoop QA commented on YARN-8789:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
21s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 8 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
51s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 14m 
53s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
52s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m 10s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
19s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
19s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 14m  
5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 14m  
5s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
2m 56s{color} | {color:orange} root: The patch generated 8 new + 814 unchanged 
- 11 fixed = 822 total (was 825) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m  
4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 11s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
10s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
32s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 72m 33s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  4m 
20s{color} | {color:green} hadoop-mapreduce-client-core in the patch passed. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  0m 50s{color} 
| {color:red} hadoop-mapreduce-client-app in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
32s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}182m 10s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.reservation.TestCapacityOverTimePolicy |
|   | hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStore |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 |
| JIRA Issue | YARN-8789 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12940311/YARN-8789.2.patch |
| Optional Tests |  dupname  asflicense  compile  javac  

[jira] [Updated] (YARN-8791) When STOPSIGNAL is not present then docker inspect returns an extra line feed

2018-09-18 Thread Chandni Singh (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated YARN-8791:

Attachment: YARN-8791.001.patch

> When STOPSIGNAL is not present then docker inspect returns an extra line feed
> -
>
> Key: YARN-8791
> URL: https://issues.apache.org/jira/browse/YARN-8791
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8791.001.patch
>
>
> When the STOPSIGNAL is missing, then an extra line feed is appended to the 
> output. This messes with the signal sent to the docker container.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8791) When STOPSIGNAL is not present then docker inspect returns an extra line feed

2018-09-18 Thread Chandni Singh (JIRA)
Chandni Singh created YARN-8791:
---

 Summary: When STOPSIGNAL is not present then docker inspect 
returns an extra line feed
 Key: YARN-8791
 URL: https://issues.apache.org/jira/browse/YARN-8791
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Chandni Singh
Assignee: Chandni Singh


When the STOPSIGNAL is missing, then an extra line feed is appended to the 
output. This messes with the signal sent to the docker container.
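
A hedged, self-contained sketch of the trimming idea; the {{docker inspect}} 
format string, the container name, and the SIGTERM fallback are illustrative 
assumptions, not the attached patch:

{code:java}
// Assumption-level sketch: read the configured stop signal and strip trailing
// whitespace so a missing STOPSIGNAL (just a line feed) falls back to a default.
import java.io.ByteArrayOutputStream;
import java.io.InputStream;

public class StopSignalDemo {
  public static void main(String[] args) throws Exception {
    // Hypothetical container name; .Config.StopSignal is empty when no STOPSIGNAL is set.
    Process p = new ProcessBuilder(
        "docker", "inspect", "--format", "{{.Config.StopSignal}}", "my-container").start();

    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try (InputStream in = p.getInputStream()) {
      byte[] buf = new byte[256];
      int n;
      while ((n = in.read(buf)) != -1) {
        out.write(buf, 0, n);
      }
    }
    p.waitFor();

    String signal = out.toString("UTF-8").trim();   // drop the trailing line feed
    if (signal.isEmpty()) {
      signal = "SIGTERM";                           // assumed default when STOPSIGNAL is absent
    }
    System.out.println("stop signal: " + signal);
  }
}
{code}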



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8696) [AMRMProxy] FederationInterceptor upgrade: home sub-cluster heartbeat async

2018-09-18 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619845#comment-16619845
 ] 

Hadoop QA commented on YARN-8696:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
15s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m  
7s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  9m 
28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
42s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  4m 
52s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
18m 49s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  7m  
9s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m 
17s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
15s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  3m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  8m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  4m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 53s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  8m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m 
14s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
52s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
33s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
53s{color} | {color:green} hadoop-yarn-server-common in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 19m 
48s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 68m  4s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
36s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}201m 24s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.reservation.TestCapacityOverTimePolicy |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 |
| JIRA Issue | YARN-8696 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12940281/YARN-8696.v5.patch |
| Optional Tests |  dupname  asflicense  

[jira] [Updated] (YARN-8771) CapacityScheduler fails to unreserve when cluster resource contains empty resource type

2018-09-18 Thread Wangda Tan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8771:
-
Target Version/s: 3.1.1, 3.2.0

> CapacityScheduler fails to unreserve when cluster resource contains empty 
> resource type
> ---
>
> Key: YARN-8771
> URL: https://issues.apache.org/jira/browse/YARN-8771
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 3.2.0
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Critical
> Attachments: YARN-8771.001.patch, YARN-8771.002.patch
>
>
> We found this problem when the cluster was almost, but not fully, exhausted 
> (93% used): the scheduler kept allocating for an app but always failed to 
> commit. This can block requests from other apps, and parts of the cluster 
> resource can't be used.
> Reproduce this problem:
> (1) use DominantResourceCalculator
> (2) cluster resource has an empty resource type, for example: gpu=0
> (3) the scheduler allocates a container for app1, which has reserved containers 
> and whose queue limit or user limit is reached (used + required > limit).
> Reference code in RegularContainerAllocator#assignContainer:
> {code:java}
> // How much need to unreserve equals to:
> // max(required - headroom, amountNeedUnreserve)
> Resource headRoom = Resources.clone(currentResoureLimits.getHeadroom());
> Resource resourceNeedToUnReserve =
> Resources.max(rc, clusterResource,
> Resources.subtract(capability, headRoom),
> currentResoureLimits.getAmountNeededUnreserve());
> boolean needToUnreserve =
> Resources.greaterThan(rc, clusterResource,
> resourceNeedToUnReserve, Resources.none());
> {code}
> For example, resourceNeedToUnReserve can be <8GB, -6 vcores, 0 gpu> when 
> {{headRoom=<0GB, 8 vcores, 0 gpu>}} and {{capacity=<8GB, 2 vcores, 0 gpu>}}; 
> needToUnreserve, which is the result of {{Resources#greaterThan}}, will be 
> {{false}}. This is not reasonable because the required resource did exceed the 
> headroom and unreserving is needed.
> After that, when reaching the unreserve process in 
> RegularContainerAllocator#assignContainer, the unreserve step will be skipped 
> because shouldAllocOrReserveNewContainer is true (required containers > 
> reserved containers) and needToUnreserve is wrongly calculated to be false:
> {code:java}
> if (availableContainers > 0) {
>  if (rmContainer == null && reservationsContinueLooking
>   && node.getLabels().isEmpty()) {
>   // unreserve process can be wrongly skipped when 
> shouldAllocOrReserveNewContainer=true and needToUnreserve=false but required 
> resource did exceed the headroom
>   if (!shouldAllocOrReserveNewContainer || needToUnreserve) { 
> ... 
>   }
>  }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8771) CapacityScheduler fails to unreserve when cluster resource contains empty resource type

2018-09-18 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619835#comment-16619835
 ] 

Wangda Tan commented on YARN-8771:
--

Nice catch! Thanks [~Tao Yang]. 

Patch LGTM as well. For the test, you can check 
TestCapacitySchedulerWithMultiResourceTypes as an example of how to do unit 
tests for multiple resource types without adding resource-types.xml. 

And I think we should put this into branch-3.1 as well. 

> CapacityScheduler fails to unreserve when cluster resource contains empty 
> resource type
> ---
>
> Key: YARN-8771
> URL: https://issues.apache.org/jira/browse/YARN-8771
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 3.2.0
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Critical
> Attachments: YARN-8771.001.patch, YARN-8771.002.patch
>
>
> We found this problem when the cluster was almost, but not fully, exhausted 
> (93% used): the scheduler kept allocating for an app but always failed to 
> commit. This can block requests from other apps, and parts of the cluster 
> resource can't be used.
> Reproduce this problem:
> (1) use DominantResourceCalculator
> (2) cluster resource has an empty resource type, for example: gpu=0
> (3) the scheduler allocates a container for app1, which has reserved containers 
> and whose queue limit or user limit is reached (used + required > limit).
> Reference code in RegularContainerAllocator#assignContainer:
> {code:java}
> // How much need to unreserve equals to:
> // max(required - headroom, amountNeedUnreserve)
> Resource headRoom = Resources.clone(currentResoureLimits.getHeadroom());
> Resource resourceNeedToUnReserve =
> Resources.max(rc, clusterResource,
> Resources.subtract(capability, headRoom),
> currentResoureLimits.getAmountNeededUnreserve());
> boolean needToUnreserve =
> Resources.greaterThan(rc, clusterResource,
> resourceNeedToUnReserve, Resources.none());
> {code}
> For example, resourceNeedToUnReserve can be <8GB, -6 vcores, 0 gpu> when 
> {{headRoom=<0GB, 8 vcores, 0 gpu>}} and {{capacity=<8GB, 2 vcores, 0 gpu>}}; 
> needToUnreserve, which is the result of {{Resources#greaterThan}}, will be 
> {{false}}. This is not reasonable because the required resource did exceed the 
> headroom and unreserving is needed.
> After that, when reaching the unreserve process in 
> RegularContainerAllocator#assignContainer, the unreserve step will be skipped 
> because shouldAllocOrReserveNewContainer is true (required containers > 
> reserved containers) and needToUnreserve is wrongly calculated to be false:
> {code:java}
> if (availableContainers > 0) {
>  if (rmContainer == null && reservationsContinueLooking
>   && node.getLabels().isEmpty()) {
>   // unreserve process can be wrongly skipped when 
> shouldAllocOrReserveNewContainer=true and needToUnreserve=false but required 
> resource did exceed the headroom
>   if (!shouldAllocOrReserveNewContainer || needToUnreserve) { 
> ... 
>   }
>  }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8784) DockerLinuxContainerRuntime prevents access to distributed cache entries on a full disk

2018-09-18 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619817#comment-16619817
 ] 

Hadoop QA commented on YARN-8784:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
17s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 28s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
25s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 23s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:
 The patch generated 3 new + 141 unchanged - 0 fixed = 144 total (was 141) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 23s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 18m 
23s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
26s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 71m 59s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 |
| JIRA Issue | YARN-8784 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12940298/YARN-8784.001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 451cfea3df3f 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / e71f61e |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/21868/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/21868/testReport/ |
| Max. process+thread count | 332 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U: 

[jira] [Commented] (YARN-1013) CS should watch resource utilization of containers and allocate speculative containers if appropriate

2018-09-18 Thread Arun Suresh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619809#comment-16619809
 ] 

Arun Suresh commented on YARN-1013:
---

Thanks for taking a quick look, [~elgoiri].

The patch was more of a POC (I should have named it as such) that I built on 
top of current branch-2 plus some YARN-1011 patches I pulled from that branch, 
in order to vet the approach. But yes, I shall clean it up and put up a patch 
for trunk.


> CS should watch resource utilization of containers and allocate speculative 
> containers if appropriate
> -
>
> Key: YARN-1013
> URL: https://issues.apache.org/jira/browse/YARN-1013
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun C Murthy
>Assignee: Arun Suresh
>Priority: Major
> Attachments: YARN-1013-001.branch-2.patch
>
>
> CS should watch resource utilization of containers (provided by NM in 
> heartbeat) and allocate speculative containers (at lower OS priority) if 
> appropriate.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7599) [GPG] ApplicationCleaner in Global Policy Generator

2018-09-18 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619796#comment-16619796
 ] 

Hadoop QA commented on YARN-7599:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 18m 
37s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} YARN-7402 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  3m 
24s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 29m 
28s{color} | {color:green} YARN-7402 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
44s{color} | {color:green} YARN-7402 passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 20s{color} | {color:orange} The patch fails to run checkstyle in hadoop-yarn 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
35s{color} | {color:green} YARN-7402 passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 31s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
36s{color} | {color:green} YARN-7402 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m  
6s{color} | {color:green} YARN-7402 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
12s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m  
6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  8m  
6s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 15s{color} | {color:orange} The patch fails to run checkstyle in hadoop-yarn 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
2s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 27s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  5m  
5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m  
0s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
45s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
24s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
25s{color} | {color:green} hadoop-yarn-server-common in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
45s{color} | {color:green} hadoop-yarn-server-globalpolicygenerator in the 
patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
35s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}122m  9s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 |
| JIRA Issue | YARN-7599 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12940285/YARN-7599-YARN-7402.v5.patch
 |
| Optional Tests |  dupname  asflicense  compile  javac  

[jira] [Updated] (YARN-8789) Add BoundedQueue to AsyncDispatcher

2018-09-18 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated YARN-8789:
--
Attachment: YARN-8789.2.patch

> Add BoundedQueue to AsyncDispatcher
> ---
>
> Key: YARN-8789
> URL: https://issues.apache.org/jira/browse/YARN-8789
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: applications
>Affects Versions: 3.2.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Major
> Attachments: YARN-8789.1.patch, YARN-8789.2.patch
>
>
> I recently came across a scenario where an MR ApplicationMaster was failing 
> with an OOM exception.  It had many thousands of Mappers and thousands of 
> Reducers.  It was noted in the logging that the event queue of 
> {{AsyncDispatcher}} had a very large number of items in it and was seemingly 
> never decreasing.
> I started looking at the code and thought it could use some clean-up, 
> simplification, and the ability to specify a bounded queue so that any 
> incoming events are throttled until they can be processed.  This will protect 
> the ApplicationMaster from a flood of events.
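
(Not from the attached patch; just a minimal sketch of the bounded-queue idea described above. A capacity-bounded {{BlockingQueue}} makes producers block instead of letting the event queue grow without limit.)

{code:java}
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical sketch: a bounded event queue that throttles producers. */
public class BoundedEventQueueSketch<E> {
  private final BlockingQueue<E> queue;

  public BoundedEventQueueSketch(int capacity) {
    // Bounded capacity: put() blocks when the queue is full, so a flood of
    // events slows producers down instead of exhausting the AM heap.
    this.queue = new ArrayBlockingQueue<>(capacity);
  }

  public void dispatch(E event) throws InterruptedException {
    queue.put(event); // blocks (throttles) when the queue is full
  }

  public E next() throws InterruptedException {
    return queue.take(); // consumed by the single dispatcher thread
  }
}
{code}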



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8789) Add BoundedQueue to AsyncDispatcher

2018-09-18 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated YARN-8789:
--
Attachment: (was: YARN-8789.2.patch)

> Add BoundedQueue to AsyncDispatcher
> ---
>
> Key: YARN-8789
> URL: https://issues.apache.org/jira/browse/YARN-8789
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: applications
>Affects Versions: 3.2.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Major
> Attachments: YARN-8789.1.patch, YARN-8789.2.patch
>
>
> I recently came across a scenario where an MR ApplicationMaster was failing 
> with an OOM exception.  It had many thousands of Mappers and thousands of 
> Reducers.  It was noted in the logging that the event queue of 
> {{AsyncDispatcher}} had a very large number of items in it and was seemingly 
> never decreasing.
> I started looking at the code and thought it could use some clean-up, 
> simplification, and the ability to specify a bounded queue so that any 
> incoming events are throttled until they can be processed.  This will protect 
> the ApplicationMaster from a flood of events.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8789) Add BoundedQueue to AsyncDispatcher

2018-09-18 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated YARN-8789:
--
Attachment: YARN-8789.2.patch

> Add BoundedQueue to AsyncDispatcher
> ---
>
> Key: YARN-8789
> URL: https://issues.apache.org/jira/browse/YARN-8789
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: applications
>Affects Versions: 3.2.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Major
> Attachments: YARN-8789.1.patch, YARN-8789.2.patch
>
>
> I recently came across a scenario where an MR ApplicationMaster was failing 
> with an OOM exception.  It had many thousands of Mappers and thousands of 
> Reducers.  It was noted in the logging that the event queue of 
> {{AsyncDispatcher}} had a very large number of items in it and was seemingly 
> never decreasing.
> I started looking at the code and thought it could use some clean-up, 
> simplification, and the ability to specify a bounded queue so that any 
> incoming events are throttled until they can be processed.  This will protect 
> the ApplicationMaster from a flood of events.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8789) Add BoundedQueue to AsyncDispatcher

2018-09-18 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated YARN-8789:
--
Attachment: (was: YARN-8789.2.patch)

> Add BoundedQueue to AsyncDispatcher
> ---
>
> Key: YARN-8789
> URL: https://issues.apache.org/jira/browse/YARN-8789
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: applications
>Affects Versions: 3.2.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Major
> Attachments: YARN-8789.1.patch, YARN-8789.2.patch
>
>
> I recently came across a scenario where an MR ApplicationMaster was failing 
> with an OOM exception.  It had many thousands of Mappers and thousands of 
> Reducers.  It was noted in the logging that the event queue of 
> {{AsyncDispatcher}} had a very large number of items in it and was seemingly 
> never decreasing.
> I started looking at the code and thought it could use some clean-up, 
> simplification, and the ability to specify a bounded queue so that any 
> incoming events are throttled until they can be processed.  This will protect 
> the ApplicationMaster from a flood of events.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8789) Add BoundedQueue to AsyncDispatcher

2018-09-18 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated YARN-8789:
--
Attachment: YARN-8789.2.patch

> Add BoundedQueue to AsyncDispatcher
> ---
>
> Key: YARN-8789
> URL: https://issues.apache.org/jira/browse/YARN-8789
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: applications
>Affects Versions: 3.2.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Major
> Attachments: YARN-8789.1.patch, YARN-8789.2.patch
>
>
> I recently came across a scenario where an MR ApplicationMaster was failing 
> with an OOM exception.  It had many thousands of Mappers and thousands of 
> Reducers.  It was noted in the logging that the event queue of 
> {{AsyncDispatcher}} had a very large number of items in it and was seemingly 
> never decreasing.
> I started looking at the code and thought it could use some clean-up, 
> simplification, and the ability to specify a bounded queue so that any 
> incoming events are throttled until they can be processed.  This will protect 
> the ApplicationMaster from a flood of events.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-1013) CS should watch resource utilization of containers and allocate speculative containers if appropriate

2018-09-18 Thread JIRA


[ 
https://issues.apache.org/jira/browse/YARN-1013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619771#comment-16619771
 ] 

Íñigo Goiri commented on YARN-1013:
---

Thanks [~asuresh] for [^YARN-1013-001.branch-2.patch].
A couple of general questions:
* Can we get a patch for trunk so that Yetus is able to run (branch-2 has 
issues)?
* Can you give an overview comparing this to the FS approach? I went through 
the patch and it is hard to compare, as this uses the allocator.

Comments on the patch itself:
* Some of the debug messages seem to be for development. Should we keep all of 
them?
* Can you add more comments to {{testContainerOverAllocation()}}? For example, 
we set up one node without overallocation and one with it. Why those numbers, 
and what is the goal?
* Can we add a couple of lower-level unit tests, just testing the allocator or 
the scheduler?
* There are many whitespace fixes; can we avoid most of them? Specifically, 
pass null by default as the second parameter to registerNode for TestAMRestart 
and TestReservations.

> CS should watch resource utilization of containers and allocate speculative 
> containers if appropriate
> -
>
> Key: YARN-1013
> URL: https://issues.apache.org/jira/browse/YARN-1013
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun C Murthy
>Assignee: Arun Suresh
>Priority: Major
> Attachments: YARN-1013-001.branch-2.patch
>
>
> CS should watch resource utilization of containers (provided by NM in 
> heartbeat) and allocate speculative containers (at lower OS priority) if 
> appropriate.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8789) Add BoundedQueue to AsyncDispatcher

2018-09-18 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated YARN-8789:
--
Attachment: (was: YARN-8789.2.patch)

> Add BoundedQueue to AsyncDispatcher
> ---
>
> Key: YARN-8789
> URL: https://issues.apache.org/jira/browse/YARN-8789
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: applications
>Affects Versions: 3.2.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Major
> Attachments: YARN-8789.1.patch
>
>
> I recently came across a scenario where an MR ApplicationMaster was failing 
> with an OOM exception.  It had many thousands of Mappers and thousands of 
> Reducers.  It was noted in the logging that the event queue of 
> {{AsyncDispatcher}} had a very large number of items in it and was seemingly 
> never decreasing.
> I started looking at the code and thought it could use some clean-up, 
> simplification, and the ability to specify a bounded queue so that any 
> incoming events are throttled until they can be processed.  This will protect 
> the ApplicationMaster from a flood of events.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8789) Add BoundedQueue to AsyncDispatcher

2018-09-18 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated YARN-8789:
--
Attachment: YARN-8789.2.patch

> Add BoundedQueue to AsyncDispatcher
> ---
>
> Key: YARN-8789
> URL: https://issues.apache.org/jira/browse/YARN-8789
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: applications
>Affects Versions: 3.2.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Major
> Attachments: YARN-8789.1.patch, YARN-8789.2.patch
>
>
> I recently came across a scenario where an MR ApplicationMaster was failing 
> with an OOM exception.  It had many thousands of Mappers and thousands of 
> Reducers.  It was noted in the logging that the event queue of 
> {{AsyncDispatcher}} had a very large number of items in it and was seemingly 
> never decreasing.
> I started looking at the code and thought it could use some clean-up, 
> simplification, and the ability to specify a bounded queue so that any 
> incoming events are throttled until they can be processed.  This will protect 
> the ApplicationMaster from a flood of events.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8784) DockerLinuxContainerRuntime prevents access to distributed cache entries on a full disk

2018-09-18 Thread Eric Badger (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-8784:
--
Attachment: YARN-8784.001.patch

> DockerLinuxContainerRuntime prevents access to distributed cache entries on a 
> full disk
> ---
>
> Key: YARN-8784
> URL: https://issues.apache.org/jira/browse/YARN-8784
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 3.2.0, 3.1.1
>Reporter: Jason Lowe
>Assignee: Eric Badger
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8784.001.patch
>
>
> DockerLinuxContainerRuntime bind-mounts the filecache and usercache 
> directories into the container to allow tasks to access entries in the 
> distributed cache.  However, it only bind-mounts directories on disks that 
> are considered good, and disks that are full or bad are not in that list.  If 
> a container tries to run with a distributed cache entry that has been 
> previously localized to a disk that is now considered full/bad, the dist 
> cache directory will _not_ be bind-mounted into the container's filesystem 
> namespace.  At that point any symlinks in the container's current working 
> directory that point to those disks will reference invalid paths.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8648) Container cgroups are leaked when using docker

2018-09-18 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619730#comment-16619730
 ] 

Hudson commented on YARN-8648:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14997 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14997/])
YARN-8648. Container cgroups are leaked when using docker. Contributed (jlowe: 
rev 2df0a8dcb3dfde15d216481cc1296d97d2cb5d43)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.h
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/test/test-container-executor.c
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/docker/TestDockerCommandExecutor.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/main.c
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/DockerLinuxContainerRuntime.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/docker/DockerRmCommand.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/docker/TestDockerRmCommand.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/ResourceHandlerModule.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c


> Container cgroups are leaked when using docker
> --
>
> Key: YARN-8648
> URL: https://issues.apache.org/jira/browse/YARN-8648
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Major
>  Labels: Docker
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-8648.001.patch, YARN-8648.002.patch, 
> YARN-8648.003.patch, YARN-8648.004.patch, YARN-8648.005.patch, 
> YARN-8648.006.patch
>
>
> When you run with docker and enable cgroups for cpu, docker creates cgroups 
> for all resources on the system, not just for cpu.  For instance, if 
> {{yarn.nodemanager.linux-container-executor.cgroups.hierarchy=/hadoop-yarn}}, 
> the nodemanager will create a cgroup for each container under 
> {{/sys/fs/cgroup/cpu/hadoop-yarn}}.  In the docker case, we pass this path 
> via the {{--cgroup-parent}} command line argument.  Docker then creates a 
> cgroup for the docker container under that, for instance: 
> {{/sys/fs/cgroup/cpu/hadoop-yarn/container_id/docker_container_id}}.
> When the container exits, docker cleans up the {{docker_container_id}} 
> cgroup, and the nodemanager cleans up the {{container_id}} cgroup.  All is 
> good under {{/sys/fs/cgroup/hadoop-yarn}}.
> The problem is that docker also creates that same hierarchy under every 
> resource under {{/sys/fs/cgroup}}.  On the rhel7 system I am using, these 
> are: blkio, cpuset, devices, freezer, hugetlb, memory, net_cls, net_prio, 
> perf_event, and systemd.  So, for instance, docker creates 
> {{/sys/fs/cgroup/cpuset/hadoop-yarn/container_id/docker_container_id}}, but 
> it only cleans up the leaf cgroup {{docker_container_id}}.  Nobody cleans up 
> the {{container_id}} cgroups for these other resources.  On one of our busy 
> clusters, we found > 100,000 of these leaked cgroups.
> I found this in our 2.8-based version of hadoop, but I have been able to 
> reproduce it with current hadoop.
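
(Illustrative only, and not the committed fix: a rough sketch of what sweeping the leftover per-controller {{container_id}} cgroups could look like, assuming the {{/sys/fs/cgroup/<controller>/hadoop-yarn/<container_id>}} layout described above and that the leaked cgroups have no remaining tasks or child cgroups.)

{code:java}
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical sketch: remove a leaked container cgroup under every controller. */
public class CgroupLeakSweepSketch {
  public static void sweep(String containerId) throws IOException {
    Path cgroupRoot = Paths.get("/sys/fs/cgroup");
    try (DirectoryStream<Path> controllers = Files.newDirectoryStream(cgroupRoot)) {
      for (Path controller : controllers) {
        // e.g. /sys/fs/cgroup/cpuset/hadoop-yarn/container_id
        Path leaked = controller.resolve("hadoop-yarn").resolve(containerId);
        if (Files.isDirectory(leaked)) {
          // rmdir succeeds for a cgroup with no tasks and no child cgroups,
          // even though the kernel shows control files inside it.
          Files.deleteIfExists(leaked);
        }
      }
    }
  }
}
{code}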



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8789) Add BoundedQueue to AsyncDispatcher

2018-09-18 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619716#comment-16619716
 ] 

Hadoop QA commented on YARN-8789:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
28s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 7 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
51s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 14m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m 34s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
51s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
31s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
20s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 14m 
14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 14m 
14s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
3m 30s{color} | {color:orange} root: The patch generated 9 new + 788 unchanged 
- 10 fixed = 797 total (was 798) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
47s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 1 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 35s{color} | {color:green} patch has no errors when building and testing our 
client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m  
6s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
22s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 79m 42s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
55s{color} | {color:green} hadoop-mapreduce-client-core in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  9m 
16s{color} | {color:green} hadoop-mapreduce-client-app in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
41s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}198m 12s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueManagementDynamicEditPolicy
 |
|   | hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStore |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 |
| JIRA Issue | YARN-8789 |
| JIRA Patch URL | 

[jira] [Commented] (YARN-1013) CS should watch resource utilization of containers and allocate speculative containers if appropriate

2018-09-18 Thread Arun Suresh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-1013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619709#comment-16619709
 ] 

Arun Suresh commented on YARN-1013:
---

Attached an initial version of the patch for branch-2.
Kindly review.

> CS should watch resource utilization of containers and allocate speculative 
> containers if appropriate
> -
>
> Key: YARN-1013
> URL: https://issues.apache.org/jira/browse/YARN-1013
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun C Murthy
>Assignee: Arun Suresh
>Priority: Major
> Attachments: YARN-1013-001.branch-2.patch
>
>
> CS should watch resource utilization of containers (provided by NM in 
> heartbeat) and allocate speculative containers (at lower OS priority) if 
> appropriate.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-1013) CS should watch resource utilization of containers and allocate speculative containers if appropriate

2018-09-18 Thread Arun Suresh (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-1013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-1013:
--
Attachment: YARN-1013-001.branch-2.patch

> CS should watch resource utilization of containers and allocate speculative 
> containers if appropriate
> -
>
> Key: YARN-1013
> URL: https://issues.apache.org/jira/browse/YARN-1013
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun C Murthy
>Assignee: Arun Suresh
>Priority: Major
> Attachments: YARN-1013-001.branch-2.patch
>
>
> CS should watch resource utilization of containers (provided by NM in 
> heartbeat) and allocate speculative containers (at lower OS priority) if 
> appropriate.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8648) Container cgroups are leaked when using docker

2018-09-18 Thread Jason Lowe (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619670#comment-16619670
 ] 

Jason Lowe commented on YARN-8648:
--

Thanks, [~billie.rinaldi]!  Looks like this is good to go then.  Committing 
this.

> Container cgroups are leaked when using docker
> --
>
> Key: YARN-8648
> URL: https://issues.apache.org/jira/browse/YARN-8648
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8648.001.patch, YARN-8648.002.patch, 
> YARN-8648.003.patch, YARN-8648.004.patch, YARN-8648.005.patch, 
> YARN-8648.006.patch
>
>
> When you run with docker and enable cgroups for cpu, docker creates cgroups 
> for all resources on the system, not just for cpu.  For instance, if 
> {{yarn.nodemanager.linux-container-executor.cgroups.hierarchy=/hadoop-yarn}}, 
> the nodemanager will create a cgroup for each container under 
> {{/sys/fs/cgroup/cpu/hadoop-yarn}}.  In the docker case, we pass this path 
> via the {{--cgroup-parent}} command line argument.  Docker then creates a 
> cgroup for the docker container under that, for instance: 
> {{/sys/fs/cgroup/cpu/hadoop-yarn/container_id/docker_container_id}}.
> When the container exits, docker cleans up the {{docker_container_id}} 
> cgroup, and the nodemanager cleans up the {{container_id}} cgroup.  All is 
> good under {{/sys/fs/cgroup/hadoop-yarn}}.
> The problem is that docker also creates that same hierarchy under every 
> resource under {{/sys/fs/cgroup}}.  On the rhel7 system I am using, these 
> are: blkio, cpuset, devices, freezer, hugetlb, memory, net_cls, net_prio, 
> perf_event, and systemd.  So, for instance, docker creates 
> {{/sys/fs/cgroup/cpuset/hadoop-yarn/container_id/docker_container_id}}, but 
> it only cleans up the leaf cgroup {{docker_container_id}}.  Nobody cleans up 
> the {{container_id}} cgroups for these other resources.  On one of our busy 
> clusters, we found > 100,000 of these leaked cgroups.
> I found this in our 2.8-based version of hadoop, but I have been able to 
> reproduce it with current hadoop.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8790) Authentication Filter change to force security check

2018-09-18 Thread Zian Chen (JIRA)
Zian Chen created YARN-8790:
---

 Summary: Authentication Filter change to force security check 
 Key: YARN-8790
 URL: https://issues.apache.org/jira/browse/YARN-8790
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zian Chen


The Hadoop NodeManager REST API is authenticated using the AuthenticationFilter 
from the Hadoop-auth project. The AuthenticationFilter is added to the new 
WebSocket URL path spec. The remote user of the request is verified to match 
the container owner before the WebSocket connection is allowed to be 
established. The WebSocket servlet code enforces the username match check.
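
(A hedged sketch of that username-match check; the class and method names below are hypothetical and not taken from the patch.)

{code:java}
import javax.servlet.http.HttpServletRequest;

/** Hypothetical sketch of the username-match enforcement for the WebSocket endpoint. */
public final class ContainerOwnerCheckSketch {

  private ContainerOwnerCheckSketch() {
  }

  /**
   * @param request request already authenticated by the AuthenticationFilter
   * @param containerOwner user that owns the container being attached to
   * @return true only when the authenticated remote user matches the owner
   */
  public static boolean isAllowed(HttpServletRequest request, String containerOwner) {
    // getRemoteUser() is populated once the AuthenticationFilter on the
    // WebSocket path spec has authenticated the caller.
    String remoteUser = request.getRemoteUser();
    return remoteUser != null && remoteUser.equals(containerOwner);
  }
}
{code}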



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8648) Container cgroups are leaked when using docker

2018-09-18 Thread Billie Rinaldi (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619655#comment-16619655
 ] 

Billie Rinaldi commented on YARN-8648:
--

Thanks for checking in, [~Jim_Brennan] and [~jlowe]. I don't have any issues 
with the new behavior of the remove operation failing when cgroups fail to be 
cleaned up.

> Container cgroups are leaked when using docker
> --
>
> Key: YARN-8648
> URL: https://issues.apache.org/jira/browse/YARN-8648
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8648.001.patch, YARN-8648.002.patch, 
> YARN-8648.003.patch, YARN-8648.004.patch, YARN-8648.005.patch, 
> YARN-8648.006.patch
>
>
> When you run with docker and enable cgroups for cpu, docker creates cgroups 
> for all resources on the system, not just for cpu.  For instance, if 
> {{yarn.nodemanager.linux-container-executor.cgroups.hierarchy=/hadoop-yarn}}, 
> the nodemanager will create a cgroup for each container under 
> {{/sys/fs/cgroup/cpu/hadoop-yarn}}.  In the docker case, we pass this path 
> via the {{--cgroup-parent}} command line argument.  Docker then creates a 
> cgroup for the docker container under that, for instance: 
> {{/sys/fs/cgroup/cpu/hadoop-yarn/container_id/docker_container_id}}.
> When the container exits, docker cleans up the {{docker_container_id}} 
> cgroup, and the nodemanager cleans up the {{container_id}} cgroup.  All is 
> good under {{/sys/fs/cgroup/hadoop-yarn}}.
> The problem is that docker also creates that same hierarchy under every 
> resource under {{/sys/fs/cgroup}}.  On the rhel7 system I am using, these 
> are: blkio, cpuset, devices, freezer, hugetlb, memory, net_cls, net_prio, 
> perf_event, and systemd.  So, for instance, docker creates 
> {{/sys/fs/cgroup/cpuset/hadoop-yarn/container_id/docker_container_id}}, but 
> it only cleans up the leaf cgroup {{docker_container_id}}.  Nobody cleans up 
> the {{container_id}} cgroups for these other resources.  On one of our busy 
> clusters, we found > 100,000 of these leaked cgroups.
> I found this in our 2.8-based version of hadoop, but I have been able to 
> reproduce it with current hadoop.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7599) [GPG] ApplicationCleaner in Global Policy Generator

2018-09-18 Thread Botong Huang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619631#comment-16619631
 ] 

Botong Huang commented on YARN-7599:


Thanks [~bibinchundatt] for the review!  [^YARN-7599-YARN-7402.v5.patch] 
uploaded with fixes. 

bq. 4. Rename testcase name -> testBasicCase
This is already the test case name, typo? 

bq. 1. During a long maintenance period of the cluster, we might require an 
option to disable the cleaner at run time, right?
The application cleaner is disabled when 
YarnConfiguration.GPG_APPCLEANER_INTERVAL_MS is set to a zero or negative value:

{code:java}
long appCleanerIntervalMs =
    getConfig().getLong(YarnConfiguration.GPG_APPCLEANER_INTERVAL_MS,
        YarnConfiguration.DEFAULT_GPG_APPCLEANER_INTERVAL_MS);
if (appCleanerIntervalMs > 0) {
  this.scheduledExecutorService.scheduleAtFixedRate(this.applicationCleaner,
      0, appCleanerIntervalMs, TimeUnit.MILLISECONDS);
{code}
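
For reference, a hedged usage sketch (not part of the patch): given the check above, setting that key to a non-positive value is what disables the cleaner.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class DisableAppCleanerSketch {
  public static void main(String[] args) {
    // Hypothetical illustration: a non-positive interval means the
    // ApplicationCleaner is never scheduled.
    Configuration conf = new YarnConfiguration();
    conf.setLong(YarnConfiguration.GPG_APPCLEANER_INTERVAL_MS, -1L);
  }
}
{code}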


> [GPG] ApplicationCleaner in Global Policy Generator
> ---
>
> Key: YARN-7599
> URL: https://issues.apache.org/jira/browse/YARN-7599
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Botong Huang
>Assignee: Botong Huang
>Priority: Minor
>  Labels: federation, gpg
> Attachments: YARN-7599-YARN-7402.v1.patch, 
> YARN-7599-YARN-7402.v2.patch, YARN-7599-YARN-7402.v3.patch, 
> YARN-7599-YARN-7402.v4.patch, YARN-7599-YARN-7402.v5.patch
>
>
> In Federation, we need a cleanup service for StateStore as well as Yarn 
> Registry. For the former, we need to remove old application records. For the 
> latter, failed and killed applications might leave records in the Yarn 
> Registry (see YARN-6128). We plan to do both kinds of cleanup work in the 
> ApplicationCleaner in GPG.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7599) [GPG] ApplicationCleaner in Global Policy Generator

2018-09-18 Thread Botong Huang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Botong Huang updated YARN-7599:
---
Attachment: YARN-7599-YARN-7402.v5.patch

> [GPG] ApplicationCleaner in Global Policy Generator
> ---
>
> Key: YARN-7599
> URL: https://issues.apache.org/jira/browse/YARN-7599
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Botong Huang
>Assignee: Botong Huang
>Priority: Minor
>  Labels: federation, gpg
> Attachments: YARN-7599-YARN-7402.v1.patch, 
> YARN-7599-YARN-7402.v2.patch, YARN-7599-YARN-7402.v3.patch, 
> YARN-7599-YARN-7402.v4.patch, YARN-7599-YARN-7402.v5.patch
>
>
> In Federation, we need a cleanup service for StateStore as well as Yarn 
> Registry. For the former, we need to remove old application records. For the 
> latter, failed and killed applications might leave records in the Yarn 
> Registry (see YARN-6128). We plan to do both kinds of cleanup work in the 
> ApplicationCleaner in GPG.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8696) [AMRMProxy] FederationInterceptor upgrade: home sub-cluster heartbeat async

2018-09-18 Thread Botong Huang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Botong Huang updated YARN-8696:
---
Attachment: YARN-8696.v5.patch

> [AMRMProxy] FederationInterceptor upgrade: home sub-cluster heartbeat async
> ---
>
> Key: YARN-8696
> URL: https://issues.apache.org/jira/browse/YARN-8696
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Botong Huang
>Assignee: Botong Huang
>Priority: Major
> Attachments: YARN-8696.v1.patch, YARN-8696.v2.patch, 
> YARN-8696.v3.patch, YARN-8696.v4.patch, YARN-8696.v5.patch
>
>
> Today in _FederationInterceptor_, the heartbeat to the home sub-cluster is 
> synchronous. After the heartbeat is sent out to the home sub-cluster, it 
> waits for the home response to come back before merging and returning the 
> (merged) heartbeat result back to the AM. If the home sub-cluster is 
> suffering from connection issues, or is down during a YarnRM master-slave 
> switch, all heartbeat threads in _FederationInterceptor_ will be blocked 
> waiting for the home response. As a result, the successful UAM heartbeats 
> from secondary sub-clusters will not be returned to the AM at all. 
> Additionally, because we kept the same heartbeat responseId between the AM 
> and the home RM, lots of tricky handling is needed for the responseId resync 
> when it comes to _FederationInterceptor_ (part of AMRMProxy, NM) work 
> preserving restart (YARN-6127, YARN-1336), home RM master-slave switch, etc. 
> In this patch, we change the heartbeat to the home sub-cluster to be 
> asynchronous, the same way we handle UAM heartbeats in the secondaries, so 
> that any sub-cluster outage or connection issue won't impact the AM getting 
> responses from the other sub-clusters. The responseId is also managed 
> separately for the home sub-cluster and the AM, and they increment 
> independently. The resync logic becomes much cleaner. 
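
(A rough, hypothetical sketch of the asynchronous home-heartbeat pattern described above; the names below are made up and this is not the actual FederationInterceptor code. The home heartbeat is handed to an executor, and the responseId toward the AM advances independently.)

{code:java}
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: fire the home-sub-cluster heartbeat without blocking the AM. */
public class AsyncHomeHeartbeatSketch {

  private final ExecutorService homeHeartbeatExecutor =
      Executors.newSingleThreadExecutor();
  // The responseId toward the AM increments independently of the home RM's.
  private final AtomicInteger amResponseId = new AtomicInteger(0);

  public int heartbeat(Runnable sendHomeHeartbeat) {
    // The home heartbeat goes out asynchronously, so a slow or failed home RM
    // no longer blocks merging and returning the secondaries' responses.
    CompletableFuture.runAsync(sendHomeHeartbeat, homeHeartbeatExecutor)
        .exceptionally(t -> null); // failures are retried/resynced separately
    return amResponseId.incrementAndGet();
  }
}
{code}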



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7599) [GPG] ApplicationCleaner in Global Policy Generator

2018-09-18 Thread Bibin A Chundatt (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619572#comment-16619572
 ] 

Bibin A Chundatt commented on YARN-7599:


Thank you, [~botong], for the clarification.

A few comments on the patch:

# Can you change this to a single configuration similar to 
{{dfs.http.client.retry.policy.spec}} {min,max,interval}? Advantage: it reduces 
the number of configs.
{code}
3381  public static final String GPG_APPCLEANER_MIN_ROUTER_SUCCESS =
3382  FEDERATION_GPG_PREFIX + "application.cleaner.min-router-success";
3389  public static final String GPG_APPCLEANER_MAX_ROUTER_RETRY =
3390  FEDERATION_GPG_PREFIX + "application.cleaner.max-router-retry";
3391  public static final int DEFAULT_GPG_APPCLEANER_MAX_ROUTER_RETRY = 10;
3397  public static final String GPG_APPCLEANER_ROUTER_RETRY_INTEVAL_MS =
3398  FEDERATION_GPG_PREFIX + "application.cleaner.router-retry-interval-ms";
{code}
# The logger class should be {{DefaultApplicationCleaner}}:
{code}
38  public class DefaultApplicationCleaner extends ApplicationCleaner {
39    private static final Logger LOG =
40        LoggerFactory.getLogger(ApplicationCleaner.class);
{code}
# Log flooding will happen in debug mode with the current implementation. A 
comma-separated list is better: 
{{routerApps.stream().map(Object::toString).collect(Collectors.joining(","))}}
{code}
59    for (ApplicationId appId : routerApps) {
60      LOG.debug("Running application {} in the cluster", appId.toString());
61    }
{code}
# Rename testcase name -> {{testBasicCase}}

Clarification

# During a long maintenance period of the cluster, we might require an option 
to disable the cleaner at run time, right?

> [GPG] ApplicationCleaner in Global Policy Generator
> ---
>
> Key: YARN-7599
> URL: https://issues.apache.org/jira/browse/YARN-7599
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Botong Huang
>Assignee: Botong Huang
>Priority: Minor
>  Labels: federation, gpg
> Attachments: YARN-7599-YARN-7402.v1.patch, 
> YARN-7599-YARN-7402.v2.patch, YARN-7599-YARN-7402.v3.patch, 
> YARN-7599-YARN-7402.v4.patch
>
>
> In Federation, we need a cleanup service for StateStore as well as Yarn 
> Registry. For the former, we need to remove old application records. For the 
> latter, failed and killed applications might leave records in the Yarn 
> Registry (see YARN-6128). We plan to do both kinds of cleanup work in the 
> ApplicationCleaner in GPG.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8777) Container Executor C binary change to execute interactive docker command

2018-09-18 Thread Zian Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619569#comment-16619569
 ] 

Zian Chen commented on YARN-8777:
-

Hi [~eyang], thanks for the patch. Some quick suggestions and questions:

1. 
{code:java}
 /**
+ * Get the Docker exec command line string. The function will verify that
+ * the params file is meant for the exec command.
+ * @param command_file File containing the params for the Docker start command
+ * @param conf Configuration struct containing the container-executor.cfg details
+ * @param out Buffer to fill with the exec command
+ * @param outlen Size of the output buffer
+ * @return Return code with 0 indicating success and non-zero codes indicating error
+ */
+int get_docker_exec_command(const char* command_file,
+    const struct configuration* conf, args *args);
{code}
The method's param list has {{out}} and {{outlen}}, which do not match the 
signature, and the description for param {{args}} is missing. Is this a typo?

2. For the code reuse you discussed with [~ebadger], my quick thought is that 
instead of passing raw parameters from the node manager, we could define an 
enum indexing several commonly used command options and have the node manager 
pass only an index that matches one of those enum elements. That way we keep 
some flexibility without opening up a bigger attack surface (a rough sketch 
follows below).

3. This patch seems focused on running the {{docker exec -it}} command to 
attach to a running container, but later on, when the pipeline is built, should 
we also take care of passing shell commands inside the container?
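
(The sketch below is purely hypothetical and not from container-executor or the patch; it only illustrates the allow-listed enum idea from point 2, where the node manager hands over an index instead of free-form parameters.)

{code:java}
/** Hypothetical allow-list of docker operations the node manager may request. */
public enum DockerCommandOption {
  EXEC_INTERACTIVE(0),  // docker exec -it
  INSPECT(1),           // docker inspect
  LOGS(2);              // docker logs

  private final int index;

  DockerCommandOption(int index) {
    this.index = index;
  }

  /** Only this index is passed down to the container-executor binary. */
  public int getIndex() {
    return index;
  }

  /** The receiving side maps the index back; anything unknown is rejected. */
  public static DockerCommandOption fromIndex(int idx) {
    for (DockerCommandOption op : values()) {
      if (op.index == idx) {
        return op;
      }
    }
    throw new IllegalArgumentException("Unknown docker command option: " + idx);
  }
}
{code}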

 

> Container Executor C binary change to execute interactive docker command
> 
>
> Key: YARN-8777
> URL: https://issues.apache.org/jira/browse/YARN-8777
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zian Chen
>Assignee: Eric Yang
>Priority: Major
>  Labels: Docker
> Attachments: YARN-8777.001.patch
>
>
> Since Container Executor provides container execution using the native 
> container-executor binary, we also need to make changes to accept a new 
> “dockerExec” method that invokes the corresponding native function to execute 
> the docker exec command against the running container.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8757) [Submarine] Add Tensorboard component when --tensorboard is specified

2018-09-18 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619558#comment-16619558
 ] 

Hadoop QA commented on YARN-8757:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
18s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 34s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
33s{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-submarine 
in trunk has 4 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
20s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
19s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 12s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-submarine: 
The patch generated 8 new + 52 unchanged - 3 fixed = 60 total (was 55) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 16s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
36s{color} | {color:green} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-submarine 
generated 0 new + 2 unchanged - 2 fixed = 2 total (was 4) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
17s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
29s{color} | {color:green} hadoop-yarn-submarine in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 50m 11s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 |
| JIRA Issue | YARN-8757 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12940269/YARN-8757.003.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 6e3853f8c8ee 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / f938925 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
| findbugs | 
https://builds.apache.org/job/PreCommit-YARN-Build/21865/artifact/out/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-applications_hadoop-yarn-submarine-warnings.html
 |
| checkstyle | 

[jira] [Commented] (YARN-8767) TestStreamingStatus fails

2018-09-18 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619536#comment-16619536
 ] 

Hadoop QA commented on YARN-8767:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
24s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 12s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
17s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
26s{color} | {color:green} hadoop-tools_hadoop-streaming generated 0 new + 78 
unchanged - 5 fixed = 78 total (was 83) {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 16s{color} | {color:orange} hadoop-tools/hadoop-streaming: The patch 
generated 1 new + 49 unchanged - 16 fixed = 50 total (was 65) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m  8s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
16s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  5m 
47s{color} | {color:green} hadoop-streaming in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
21s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 49m 48s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 |
| JIRA Issue | YARN-8767 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12940255/YARN-8767.003.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux d58f9c7c208f 4.4.0-133-generic #159-Ubuntu SMP Fri Aug 10 
07:31:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / f938925 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/21863/artifact/out/diff-checkstyle-hadoop-tools_hadoop-streaming.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/21863/testReport/ |
| Max. process+thread count | 750 (vs. ulimit of 1) |
| modules | C: hadoop-tools/hadoop-streaming U: hadoop-tools/hadoop-streaming |
| Console output | 

[jira] [Commented] (YARN-8696) [AMRMProxy] FederationInterceptor upgrade: home sub-cluster heartbeat async

2018-09-18 Thread Giovanni Matteo Fumarola (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619511#comment-16619511
 ] 

Giovanni Matteo Fumarola commented on YARN-8696:


Thanks [~botong]. [^YARN-8696.v4.patch] looks good. 
Please rebase it.

> [AMRMProxy] FederationInterceptor upgrade: home sub-cluster heartbeat async
> ---
>
> Key: YARN-8696
> URL: https://issues.apache.org/jira/browse/YARN-8696
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Botong Huang
>Assignee: Botong Huang
>Priority: Major
> Attachments: YARN-8696.v1.patch, YARN-8696.v2.patch, 
> YARN-8696.v3.patch, YARN-8696.v4.patch
>
>
> Today in _FederationInterceptor_, the heartbeat to home sub-cluster is 
> synchronous. After the heartbeat is sent out to home sub-cluster, it waits 
> for the home response to come back before merging and returning the (merged) 
> heartbeat result to back AM. If home sub-cluster is suffering from connection 
> issues, or down during an YarnRM master-slave switch, all heartbeat threads 
> in _FederationInterceptor_ will be blocked waiting for home response. As a 
> result, the successful UAM heartbeats from secondary sub-clusters will not be 
> returned to AM at all. Additionally, because of the fact that we kept the 
> same heartbeat responseId between AM and home RM, lots of tricky handling are 
> needed regarding the responseId resync when it comes to 
> _FederationInterceptor_ (part of AMRMProxy, NM) work preserving restart 
> (YARN-6127, YARN-1336), home RM master-slave switch etc. 
> In this patch, we change the heartbeat to home sub-cluster to asynchronous, 
> same as the way we handle UAM heartbeats in secondaries. So that any 
> sub-cluster down or connection issues won't impact AM getting responses from 
> other sub-clusters. The responseId is also managed separately for home 
> sub-cluster and AM, and they increment independently. The resync logic 
> becomes much cleaner. 
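To make the async change concrete, here is a minimal sketch of the idea only; the class, field and method names below are illustrative assumptions, not the actual FederationInterceptor code. The home heartbeat runs on its own single-threaded executor, its responseId is tracked independently of the AM-facing one, and whatever home responses have already arrived are merged on the next allocate call.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Illustration only: async home heartbeat with its own responseId. */
public class AsyncHomeHeartbeat<Req, Resp> {

  private final ExecutorService homeExecutor = Executors.newSingleThreadExecutor();
  // responseId between the interceptor and the home RM; it increments
  // independently of the responseId the interceptor maintains towards the AM.
  private final AtomicInteger homeResponseId = new AtomicInteger(0);
  private final BlockingQueue<Resp> pendingHomeResponses = new LinkedBlockingQueue<>();

  /** Fire the heartbeat to the home RM without blocking the AM-facing thread. */
  public void sendHeartbeat(Req request, Function<Req, Resp> homeRmClient) {
    homeExecutor.submit(() -> {
      try {
        Resp response = homeRmClient.apply(request); // may block or fail
        homeResponseId.incrementAndGet();
        pendingHomeResponses.offer(response);        // merged on the next allocate()
      } catch (RuntimeException e) {
        // A slow or failed home RM only delays its own response; responses
        // from secondary sub-clusters can still be returned to the AM.
      }
    });
  }

  /** Called from the AM-facing allocate path: drain whatever has arrived. */
  public List<Resp> drainHomeResponses() {
    List<Resp> out = new ArrayList<>();
    pendingHomeResponses.drainTo(out);
    return out;
  }

  public int lastHomeResponseId() {
    return homeResponseId.get();
  }
}
{code}

Because the AM-facing path only drains this queue, a home RM outage no longer blocks the merged responses coming back from the secondary sub-clusters.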



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Reopened] (YARN-8786) LinuxContainerExecutor fails sporadically in create_local_dirs

2018-09-18 Thread Jason Lowe (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe reopened YARN-8786:
--

This should be left open to track the sporadic failure in creating directories. 
YARN-8751 may keep this sporadic problem from taking out the NM, but it still 
causes the requested container launch to fail. That should be fixed, and this 
JIRA can track that effort.

> LinuxContainerExecutor fails sporadically in create_local_dirs
> --
>
> Key: YARN-8786
> URL: https://issues.apache.org/jira/browse/YARN-8786
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Jon Bender
>Priority: Major
>
> We started using CGroups with LinuxContainerExecutor recently, running Apache 
> Hadoop 3.0.0. Occasionally (once out of many millions of tasks) a yarn 
> container will fail with a message like the following:
> {code:java}
> [2018-09-02 23:48:02.458691] 18/09/02 23:48:02 INFO container.ContainerImpl: 
> Container container_1530684675517_516620_01_020846 transitioned from 
> SCHEDULED to RUNNING
> [2018-09-02 23:48:02.458874] 18/09/02 23:48:02 INFO 
> monitor.ContainersMonitorImpl: Starting resource-monitoring for 
> container_1530684675517_516620_01_020846
> [2018-09-02 23:48:02.506114] 18/09/02 23:48:02 WARN 
> privileged.PrivilegedOperationExecutor: Shell execution returned exit code: 
> 35. Privileged Execution Operation Stderr:
> [2018-09-02 23:48:02.506159] Could not create container dirsCould not create 
> local files and directories
> [2018-09-02 23:48:02.506220]
> [2018-09-02 23:48:02.506238] Stdout: main : command provided 1
> [2018-09-02 23:48:02.506258] main : run as user is nobody
> [2018-09-02 23:48:02.506282] main : requested yarn user is root
> [2018-09-02 23:48:02.506294] Getting exit code file...
> [2018-09-02 23:48:02.506307] Creating script paths...
> [2018-09-02 23:48:02.506330] Writing pid file...
> [2018-09-02 23:48:02.506366] Writing to tmp file 
> /path/to/hadoop/yarn/local/nmPrivate/application_1530684675517_516620/container_1530684675517_516620_01_020846/container_1530684675517_516620_01_020846.pid.tmp
> [2018-09-02 23:48:02.506389] Writing to cgroup task files...
> [2018-09-02 23:48:02.506402] Creating local dirs...
> [2018-09-02 23:48:02.506414] Getting exit code file...
> [2018-09-02 23:48:02.506435] Creating script paths...
> {code}
> Looking at the container executor source it's traceable to errors here: 
> [https://github.com/apache/hadoop/blob/release-3.0.0-RC1/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c#L1604]
>  And ultimately to 
> [https://github.com/apache/hadoop/blob/release-3.0.0-RC1/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c#L672]
> The root failure seems to be in the underlying mkdir call, but that exit code 
> / errno is swallowed so we don't have more details. We tend to see this when 
> many containers start at the same time for the same application on a host, 
> and suspect it may be related to some race conditions around those shared 
> directories between containers for the same application.
> For example, this is a typical pattern in the audit logs:
> {code:java}
> [2018-09-07 17:16:38.447654] 18/09/07 17:16:38 INFO 
> nodemanager.NMAuditLogger: USER=root  IP=<> Container Request 
> TARGET=ContainerManageImpl  RESULT=SUCCESS  
> APPID=application_1530684675517_559126  
> CONTAINERID=container_1530684675517_559126_01_012871
> [2018-09-07 17:16:38.492298] 18/09/07 17:16:38 INFO 
> nodemanager.NMAuditLogger: USER=root  IP=<> Container Request 
> TARGET=ContainerManageImpl  RESULT=SUCCESS  
> APPID=application_1530684675517_559126  
> CONTAINERID=container_1530684675517_559126_01_012870
> [2018-09-07 17:16:38.614044] 18/09/07 17:16:38 WARN 
> nodemanager.NMAuditLogger: USER=root  OPERATION=Container Finished - 
> Failed   TARGET=ContainerImplRESULT=FAILURE  DESCRIPTION=Container failed 
> with state: EXITED_WITH_FAILUREAPPID=application_1530684675517_559126  
> CONTAINERID=container_1530684675517_559126_01_012871
> {code}
> Two containers for the same application starting in quick succession followed 
> by the EXITED_WITH_FAILURE step (exit code 35).
> We plan to upgrade to 3.1.x soon, but I don't expect that to fix this: the 
> only major JIRAs that affected the executor since 3.0.0 seem unrelated 
> ([https://github.com/apache/hadoop/commit/bc285da107bb84a3c60c5224369d7398a41db2d8]
>  and 
> [https://github.com/apache/hadoop/commit/a82be7754d74f4d16b206427b91e700bb5f44d56])
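The real logic lives in the native container-executor, but the behaviour the report argues for can be sketched in Java, under the assumption that "directory already exists" should be tolerated and that any other failure should carry its root cause instead of being swallowed:

{code:java}
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Illustration only: the real code path is in the native container-executor. */
public final class RaceTolerantMkdir {

  /**
   * Create a directory shared by containers of the same application. Two
   * launches racing on the same path must both succeed, and a genuine failure
   * must surface the underlying cause instead of a generic message.
   */
  public static void ensureDir(Path dir) throws IOException {
    try {
      Files.createDirectory(dir);
    } catch (FileAlreadyExistsException e) {
      // Lost the race to another container of the same app: fine, as long as
      // a directory really is there now.
      if (!Files.isDirectory(dir)) {
        throw e;
      }
    } catch (IOException e) {
      // Equivalent of keeping errno: rethrow with the path and the root cause.
      throw new IOException("Could not create container dir " + dir, e);
    }
  }

  public static void main(String[] args) throws IOException {
    Path base = Files.createTempDirectory("yarn-local-");
    Path appDir = base.resolve("appcache").resolve("application_123");
    Files.createDirectories(appDir.getParent());
    ensureDir(appDir);
    ensureDir(appDir); // simulates the racing second container: still succeeds
  }
}
{code}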



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (YARN-8725) Submarine job staging directory has a lot of useless PRIMARY_WORKER-launch-script-***.sh scripts when submitting a job multiple times

2018-09-18 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619480#comment-16619480
 ] 

Wangda Tan commented on YARN-8725:
--

[~tangzhankun], cleaning up the whole staging dir seems overkill, because models, 
etc. are by default placed under that directory as well. 

Also, the logic in your patch cleans up the dirs after the job is submitted; it is 
possible that workers get launched after the dir has been deleted.

I'm not sure we can do much that is meaningful here in the client code. It might 
be better to do this on the server side, but I don't have a clear idea about how 
to handle the service part. Maybe it should be a plugin of ApiServer, or a 
completely new service like a system service.
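Wherever the cleanup eventually lives (client side or a server-side plugin), one narrower option than wiping the staging dir would be to delete only the stale launch scripts. A rough sketch against the Hadoop FileSystem API follows; the staging path and the retention period are made-up example parameters.

{code:java}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/** Sketch only: deletes stale launch scripts, leaves models and configs alone. */
public class StaleLaunchScriptCleaner {

  public static void cleanup(Configuration conf, Path stagingDir, long maxAgeMillis)
      throws IOException {
    FileSystem fs = stagingDir.getFileSystem(conf);
    long cutoff = System.currentTimeMillis() - maxAgeMillis;
    // Only the randomly named launch scripts accumulate across submissions.
    FileStatus[] scripts =
        fs.globStatus(new Path(stagingDir, "PRIMARY_WORKER-launch-script*.sh"));
    if (scripts == null) {
      return;
    }
    for (FileStatus s : scripts) {
      if (s.getModificationTime() < cutoff) {
        fs.delete(s.getPath(), false); // non-recursive: single file
      }
    }
  }

  public static void main(String[] args) throws IOException {
    // Hypothetical staging path and a 7-day retention, for illustration only.
    cleanup(new Configuration(),
        new Path("hdfs://submarine/user/hadoop/submarine/jobs/standlone-tf/staging"),
        7L * 24 * 60 * 60 * 1000);
  }
}
{code}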

> Submarine job staging directory has a lot of useless 
> PRIMARY_WORKER-launch-script-***.sh  scripts when submitting a job multiple 
> times
> --
>
> Key: YARN-8725
> URL: https://issues.apache.org/jira/browse/YARN-8725
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zac Zhou
>Assignee: Zhankun Tang
>Priority: Major
> Attachments: YARN-8725-trunk.001.patch
>
>
> Submarine jobs upload core-site.xml, hdfs-site.xml, job.info and 
> PRIMARY_WORKER-launch-script.sh to staging dir.
> The core-site.xml, hdfs-site.xml and job.info would be overwritten if a job 
> is submitted multiple times.
> But PRIMARY_WORKER-launch-script.sh would not be overwritten, as it has 
> random numbers in its name.
> The files in the staging dir are as follows:
> {code:java}
> -rw-r- 2 hadoop hdfs 580 2018-08-17 10:11 
> hdfs://submarine/user/hadoop/submarine/jobs/standlone-tf/staging/PRIMARY_WORKER-launch-script6954941665090337726.sh
> -rw-r- 2 hadoop hdfs 580 2018-08-17 10:02 
> hdfs://submarine/user/hadoop/submarine/jobs/standlone-tf/staging/PRIMARY_WORKER-launch-script7037369696166769734.sh
> -rw-r- 2 hadoop hdfs 580 2018-08-17 10:06 
> hdfs://submarine/user/hadoop/submarine/jobs/standlone-tf/staging/PRIMARY_WORKER-launch-script8047707294763488040.sh
> -rw-r- 2 hadoop hdfs 15225 2018-08-17 18:46 
> hdfs://submarine/user/hadoop/submarine/jobs/standlone-tf/staging/PRIMARY_WORKER-launch-script8122565781159446375.sh
> -rw-r- 2 hadoop hdfs 580 2018-08-16 20:48 
> hdfs://submarine/user/hadoop/submarine/jobs/standlone-tf/staging/PRIMARY_WORKER-launch-script8598604480700049845.sh
> -rw-r- 2 hadoop hdfs 580 2018-08-17 14:53 
> hdfs://submarine/user/hadoop/submarine/jobs/standlone-tf/staging/PRIMARY_WORKER-launch-script971703616848859353.sh
> -rw-r- 2 hadoop hdfs 580 2018-08-17 10:16 
> hdfs://submarine/user/hadoop/submarine/jobs/standlone-tf/staging/PRIMARY_WORKER-launch-script990214235580089093.sh
> -rw-r- 2 hadoop hdfs 8815 2018-08-27 15:54 
> hdfs://submarine/user/hadoop/submarine/jobs/standlone-tf/staging/core-site.xml
> -rw-r- 2 hadoop hdfs 11583 2018-08-27 15:54 
> hdfs://submarine/user/hadoop/submarine/jobs/standlone-tf/staging/hdfs-site.xml
> -rw-rw-rw- 2 hadoop hdfs 846 2018-08-22 10:56 
> hdfs://submarine/user/hadoop/submarine/jobs/standlone-tf/staging/job.info
> {code}
>  
> We should stop the staging dir from growing or have a way to clean it up



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8757) [Submarine] Add Tensorboard component when --tensorboard is specified

2018-09-18 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619469#comment-16619469
 ] 

Wangda Tan commented on YARN-8757:
--

Thanks [~sunilg], 

For 1. The default case is already handled by service spec's artifact. 

For 2. It is intentional, no issues here. 

For 3. I intentionally added this since only http(s) links are allowed. The 
quicklink is intentionally used to redirect in the browser, so I think it is fine. 

For 4. Updated.

Attaching ver.3 patch.

> [Submarine] Add Tensorboard component when --tensorboard is specified
> -
>
> Key: YARN-8757
> URL: https://issues.apache.org/jira/browse/YARN-8757
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Critical
> Attachments: YARN-8757.001.patch, YARN-8757.002.patch, 
> YARN-8757.003.patch
>
>
> We need to have a Tensorboard component when --tensorboard is specified. And 
> we need to set quicklinks to let users view tensorboard.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8757) [Submarine] Add Tensorboard component when --tensorboard is specified

2018-09-18 Thread Wangda Tan (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-8757:
-
Attachment: YARN-8757.003.patch

> [Submarine] Add Tensorboard component when --tensorboard is specified
> -
>
> Key: YARN-8757
> URL: https://issues.apache.org/jira/browse/YARN-8757
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Critical
> Attachments: YARN-8757.001.patch, YARN-8757.002.patch, 
> YARN-8757.003.patch
>
>
> We need to have a Tensorboard component when --tensorboard is specified. And 
> we need to set quicklinks to let users view tensorboard.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8786) LinuxContainerExecutor fails sporadically in create_local_dirs

2018-09-18 Thread Jon Bender (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619451#comment-16619451
 ] 

Jon Bender commented on YARN-8786:
--

Ah, I figured we wanted to leave this open until the race conditions in the 
executor were resolved, but if we feel it's infrequent enough and the 
user-facing impact is minimal on 3.1.2+, I'm OK duping this into YARN-8751.

> LinuxContainerExecutor fails sporadically in create_local_dirs
> --
>
> Key: YARN-8786
> URL: https://issues.apache.org/jira/browse/YARN-8786
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Jon Bender
>Priority: Major
>
> We started using CGroups with LinuxContainerExecutor recently, running Apache 
> Hadoop 3.0.0. Occasionally (once out of many millions of tasks) a yarn 
> container will fail with a message like the following:
> {code:java}
> [2018-09-02 23:48:02.458691] 18/09/02 23:48:02 INFO container.ContainerImpl: 
> Container container_1530684675517_516620_01_020846 transitioned from 
> SCHEDULED to RUNNING
> [2018-09-02 23:48:02.458874] 18/09/02 23:48:02 INFO 
> monitor.ContainersMonitorImpl: Starting resource-monitoring for 
> container_1530684675517_516620_01_020846
> [2018-09-02 23:48:02.506114] 18/09/02 23:48:02 WARN 
> privileged.PrivilegedOperationExecutor: Shell execution returned exit code: 
> 35. Privileged Execution Operation Stderr:
> [2018-09-02 23:48:02.506159] Could not create container dirsCould not create 
> local files and directories
> [2018-09-02 23:48:02.506220]
> [2018-09-02 23:48:02.506238] Stdout: main : command provided 1
> [2018-09-02 23:48:02.506258] main : run as user is nobody
> [2018-09-02 23:48:02.506282] main : requested yarn user is root
> [2018-09-02 23:48:02.506294] Getting exit code file...
> [2018-09-02 23:48:02.506307] Creating script paths...
> [2018-09-02 23:48:02.506330] Writing pid file...
> [2018-09-02 23:48:02.506366] Writing to tmp file 
> /path/to/hadoop/yarn/local/nmPrivate/application_1530684675517_516620/container_1530684675517_516620_01_020846/container_1530684675517_516620_01_020846.pid.tmp
> [2018-09-02 23:48:02.506389] Writing to cgroup task files...
> [2018-09-02 23:48:02.506402] Creating local dirs...
> [2018-09-02 23:48:02.506414] Getting exit code file...
> [2018-09-02 23:48:02.506435] Creating script paths...
> {code}
> Looking at the container executor source it's traceable to errors here: 
> [https://github.com/apache/hadoop/blob/release-3.0.0-RC1/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c#L1604]
>  And ultimately to 
> [https://github.com/apache/hadoop/blob/release-3.0.0-RC1/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c#L672]
> The root failure seems to be in the underlying mkdir call, but that exit code 
> / errno is swallowed so we don't have more details. We tend to see this when 
> many containers start at the same time for the same application on a host, 
> and suspect it may be related to some race conditions around those shared 
> directories between containers for the same application.
> For example, this is a typical pattern in the audit logs:
> {code:java}
> [2018-09-07 17:16:38.447654] 18/09/07 17:16:38 INFO 
> nodemanager.NMAuditLogger: USER=root  IP=<> Container Request 
> TARGET=ContainerManageImpl  RESULT=SUCCESS  
> APPID=application_1530684675517_559126  
> CONTAINERID=container_1530684675517_559126_01_012871
> [2018-09-07 17:16:38.492298] 18/09/07 17:16:38 INFO 
> nodemanager.NMAuditLogger: USER=root  IP=<> Container Request 
> TARGET=ContainerManageImpl  RESULT=SUCCESS  
> APPID=application_1530684675517_559126  
> CONTAINERID=container_1530684675517_559126_01_012870
> [2018-09-07 17:16:38.614044] 18/09/07 17:16:38 WARN 
> nodemanager.NMAuditLogger: USER=root  OPERATION=Container Finished - 
> Failed   TARGET=ContainerImplRESULT=FAILURE  DESCRIPTION=Container failed 
> with state: EXITED_WITH_FAILUREAPPID=application_1530684675517_559126  
> CONTAINERID=container_1530684675517_559126_01_012871
> {code}
> Two containers for the same application starting in quick succession followed 
> by the EXITED_WITH_FAILURE step (exit code 35).
> We plan to upgrade to 3.1.x soon, but I don't expect that to fix this: the 
> only major JIRAs that affected the executor since 3.0.0 seem unrelated 
> ([https://github.com/apache/hadoop/commit/bc285da107bb84a3c60c5224369d7398a41db2d8]
>  and 
> [https://github.com/apache/hadoop/commit/a82be7754d74f4d16b206427b91e700bb5f44d56])



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (YARN-7599) [GPG] ApplicationCleaner in Global Policy Generator

2018-09-18 Thread Botong Huang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619449#comment-16619449
 ] 

Botong Huang commented on YARN-7599:


bq. One concern is the response size
Yeah, we've hit this exact same problem in our federated cluster. That's why I 
added "RMWSConsts.DESELECTS" (introduced in YARN-6280) when calling getApps 
against the Router in _GPGUtils_. The _ResourceRequest_ entries occupy a 
significant portion of the appInfo response. In our prod cluster, this deselect 
reduces the getApps call latency from 20 minutes to seconds.
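For readers unfamiliar with the deselect mechanism, this is roughly what it looks like as a plain REST call against the Router/RM apps endpoint; the host name is a placeholder and this sketch is not the GPGUtils code itself.

{code:java}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

/** Illustration of deSelects=resourceRequests when listing apps over REST. */
public class GetAppsWithDeselects {
  public static void main(String[] args) throws Exception {
    // deSelects=resourceRequests drops the bulky ResourceRequest entries from
    // each appInfo in the response, which is what shrinks the payload.
    URL url = new URL(
        "http://router.example.com:8088/ws/v1/cluster/apps?deSelects=resourceRequests");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestProperty("Accept", "application/json");
    try (BufferedReader in = new BufferedReader(
        new InputStreamReader(conn.getInputStream()))) {
      in.lines().forEach(System.out::println);
    } finally {
      conn.disconnect();
    }
  }
}
{code}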

> [GPG] ApplicationCleaner in Global Policy Generator
> ---
>
> Key: YARN-7599
> URL: https://issues.apache.org/jira/browse/YARN-7599
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Botong Huang
>Assignee: Botong Huang
>Priority: Minor
>  Labels: federation, gpg
> Attachments: YARN-7599-YARN-7402.v1.patch, 
> YARN-7599-YARN-7402.v2.patch, YARN-7599-YARN-7402.v3.patch, 
> YARN-7599-YARN-7402.v4.patch
>
>
> In Federation, we need a cleanup service for StateStore as well as Yarn 
> Registry. For the former, we need to remove old application records. For the 
> latter, failed and killed applications might leave records in the Yarn 
> Registry (see YARN-6128). We plan to do both cleanup work in 
> ApplicationCleaner in GPG



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8789) Add BoundedQueue to AsyncDispatcher

2018-09-18 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated YARN-8789:
--
Attachment: YARN-8789.1.patch

> Add BoundedQueue to AsyncDispatcher
> ---
>
> Key: YARN-8789
> URL: https://issues.apache.org/jira/browse/YARN-8789
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: applications
>Affects Versions: 3.2.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Major
> Attachments: YARN-8789.1.patch
>
>
> I recently came across a scenario where an MR ApplicationMaster was failing 
> with an OOM exception.  It had many thousands of Mappers and thousands of 
> Reducers.  It was noted in the logging that the event-queue of 
> {{AsyncDispatcher}} had a very large number of items in it and was seemingly 
> never decreasing.
> I started looking at the code and thought it could use some clean up, 
> simplification, and the ability to specify a bounded queue so that any 
> incoming events are throttled until they can be processed.  This will protect 
> the ApplicationMaster from a flood of events.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-8789) Add BoundedQueue to AsyncDispatcher

2018-09-18 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR reassigned YARN-8789:
-

Assignee: BELUGA BEHR

> Add BoundedQueue to AsyncDispatcher
> ---
>
> Key: YARN-8789
> URL: https://issues.apache.org/jira/browse/YARN-8789
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: applications
>Affects Versions: 3.2.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Major
>
> I recently came across a scenario where an MR ApplicationMaster was failing 
> with an OOM exception.  It had many thousands of Mappers and thousands of 
> Reducers.  It was noted in the logging that the event-queue of 
> {{AsyncDispatcher}} had a very large number of items in it and was seemingly 
> never decreasing.
> I started looking at the code and thought it could use some clean up, 
> simplification, and the ability to specify a bounded queue so that any 
> incoming events are throttled until they can be processed.  This will protect 
> the ApplicationMaster from a flood of events.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Resolved] (YARN-8786) LinuxContainerExecutor fails sporadically in create_local_dirs

2018-09-18 Thread Eric Badger (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger resolved YARN-8786.
---
Resolution: Duplicate

Sounds good. Resolving this as a duplicate of YARN-8751

> LinuxContainerExecutor fails sporadically in create_local_dirs
> --
>
> Key: YARN-8786
> URL: https://issues.apache.org/jira/browse/YARN-8786
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Jon Bender
>Priority: Major
>
> We started using CGroups with LinuxContainerExecutor recently, running Apache 
> Hadoop 3.0.0. Occasionally (once out of many millions of tasks) a yarn 
> container will fail with a message like the following:
> {code:java}
> [2018-09-02 23:48:02.458691] 18/09/02 23:48:02 INFO container.ContainerImpl: 
> Container container_1530684675517_516620_01_020846 transitioned from 
> SCHEDULED to RUNNING
> [2018-09-02 23:48:02.458874] 18/09/02 23:48:02 INFO 
> monitor.ContainersMonitorImpl: Starting resource-monitoring for 
> container_1530684675517_516620_01_020846
> [2018-09-02 23:48:02.506114] 18/09/02 23:48:02 WARN 
> privileged.PrivilegedOperationExecutor: Shell execution returned exit code: 
> 35. Privileged Execution Operation Stderr:
> [2018-09-02 23:48:02.506159] Could not create container dirsCould not create 
> local files and directories
> [2018-09-02 23:48:02.506220]
> [2018-09-02 23:48:02.506238] Stdout: main : command provided 1
> [2018-09-02 23:48:02.506258] main : run as user is nobody
> [2018-09-02 23:48:02.506282] main : requested yarn user is root
> [2018-09-02 23:48:02.506294] Getting exit code file...
> [2018-09-02 23:48:02.506307] Creating script paths...
> [2018-09-02 23:48:02.506330] Writing pid file...
> [2018-09-02 23:48:02.506366] Writing to tmp file 
> /path/to/hadoop/yarn/local/nmPrivate/application_1530684675517_516620/container_1530684675517_516620_01_020846/container_1530684675517_516620_01_020846.pid.tmp
> [2018-09-02 23:48:02.506389] Writing to cgroup task files...
> [2018-09-02 23:48:02.506402] Creating local dirs...
> [2018-09-02 23:48:02.506414] Getting exit code file...
> [2018-09-02 23:48:02.506435] Creating script paths...
> {code}
> Looking at the container executor source it's traceable to errors here: 
> [https://github.com/apache/hadoop/blob/release-3.0.0-RC1/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c#L1604]
>  And ultimately to 
> [https://github.com/apache/hadoop/blob/release-3.0.0-RC1/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c#L672]
> The root failure seems to be in the underlying mkdir call, but that exit code 
> / errno is swallowed so we don't have more details. We tend to see this when 
> many containers start at the same time for the same application on a host, 
> and suspect it may be related to some race conditions around those shared 
> directories between containers for the same application.
> For example, this is a typical pattern in the audit logs:
> {code:java}
> [2018-09-07 17:16:38.447654] 18/09/07 17:16:38 INFO 
> nodemanager.NMAuditLogger: USER=root  IP=<> Container Request 
> TARGET=ContainerManageImpl  RESULT=SUCCESS  
> APPID=application_1530684675517_559126  
> CONTAINERID=container_1530684675517_559126_01_012871
> [2018-09-07 17:16:38.492298] 18/09/07 17:16:38 INFO 
> nodemanager.NMAuditLogger: USER=root  IP=<> Container Request 
> TARGET=ContainerManageImpl  RESULT=SUCCESS  
> APPID=application_1530684675517_559126  
> CONTAINERID=container_1530684675517_559126_01_012870
> [2018-09-07 17:16:38.614044] 18/09/07 17:16:38 WARN 
> nodemanager.NMAuditLogger: USER=root  OPERATION=Container Finished - 
> Failed   TARGET=ContainerImplRESULT=FAILURE  DESCRIPTION=Container failed 
> with state: EXITED_WITH_FAILUREAPPID=application_1530684675517_559126  
> CONTAINERID=container_1530684675517_559126_01_012871
> {code}
> Two containers for the same application starting in quick succession followed 
> by the EXITED_WITH_FAILURE step (exit code 35).
> We plan to upgrade to 3.1.x soon, but I don't expect that to fix this: the 
> only major JIRAs that affected the executor since 3.0.0 seem unrelated 
> ([https://github.com/apache/hadoop/commit/bc285da107bb84a3c60c5224369d7398a41db2d8]
>  and 
> [https://github.com/apache/hadoop/commit/a82be7754d74f4d16b206427b91e700bb5f44d56])



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-1013) CS should watch resource utilization of containers and allocate speculative containers if appropriate

2018-09-18 Thread Arun Suresh (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-1013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh reassigned YARN-1013:
-

Assignee: Arun Suresh  (was: Weiwei Yang)

> CS should watch resource utilization of containers and allocate speculative 
> containers if appropriate
> -
>
> Key: YARN-1013
> URL: https://issues.apache.org/jira/browse/YARN-1013
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Arun C Murthy
>Assignee: Arun Suresh
>Priority: Major
>
> CS should watch resource utilization of containers (provided by NM in 
> heartbeat) and allocate speculative containers (at lower OS priority) if 
> appropriate.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8767) TestStreamingStatus fails

2018-09-18 Thread Andras Bokor (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andras Bokor updated YARN-8767:
---
Attachment: YARN-8767.003.patch

> TestStreamingStatus fails
> -
>
> Key: YARN-8767
> URL: https://issues.apache.org/jira/browse/YARN-8767
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Andras Bokor
>Assignee: Andras Bokor
>Priority: Major
> Attachments: YARN-8767.001.patch, YARN-8767.002.patch, 
> YARN-8767.003.patch
>
>
> The test tries to connect to RM through 0.0.0.0:8032, but it cannot.
> On the console I see the following error message:
> {code}Your endpoint configuration is wrong; For more details see:  
> http://wiki.apache.org/hadoop/UnsetHostnameOrPort, while invoking 
> ApplicationClientProtocolPBClientImpl.getNewApplication over null after 1 
> failover attempts. Trying to failover after sleeping for 44892ms.{code}
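The quoted error is the generic unset-hostname-or-port failure: the client ends up trying the bind-all address 0.0.0.0:8032. As an illustration only (not necessarily how the attached patch fixes the test), the usual remedy is to give the test configuration a concrete RM address:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

/** Sketch of the kind of configuration fix the error message points at. */
public class RmEndpointConfig {

  public static Configuration withExplicitRmAddress(String hostPort) {
    Configuration conf = new YarnConfiguration();
    // 0.0.0.0 is a bind address, not something a client can connect to, so a
    // test needs a resolvable RM address (for example one taken from a
    // MiniYARNCluster it started itself).
    conf.set(YarnConfiguration.RM_ADDRESS, hostPort);
    return conf;
  }

  public static void main(String[] args) {
    Configuration conf = withExplicitRmAddress("localhost:8032");
    System.out.println(conf.get(YarnConfiguration.RM_ADDRESS));
  }
}
{code}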



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8786) LinuxContainerExecutor fails sporadically in create_local_dirs

2018-09-18 Thread Jon Bender (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619361#comment-16619361
 ] 

Jon Bender commented on YARN-8786:
--

Yep, YARN-8751 should definitely lessen the severity of the issue for us and I 
wouldn't be as concerned about this bug, I just wanted to file it with all the 
info I had for tracking purposes. I plan to upgrade us to 3.1.2+ anyway in the 
coming weeks so I don't expect a backport will be necessary for our purposes 
anyway.

> LinuxContainerExecutor fails sporadically in create_local_dirs
> --
>
> Key: YARN-8786
> URL: https://issues.apache.org/jira/browse/YARN-8786
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Jon Bender
>Priority: Major
>
> We started using CGroups with LinuxContainerExecutor recently, running Apache 
> Hadoop 3.0.0. Occasionally (once out of many millions of tasks) a yarn 
> container will fail with a message like the following:
> {code:java}
> [2018-09-02 23:48:02.458691] 18/09/02 23:48:02 INFO container.ContainerImpl: 
> Container container_1530684675517_516620_01_020846 transitioned from 
> SCHEDULED to RUNNING
> [2018-09-02 23:48:02.458874] 18/09/02 23:48:02 INFO 
> monitor.ContainersMonitorImpl: Starting resource-monitoring for 
> container_1530684675517_516620_01_020846
> [2018-09-02 23:48:02.506114] 18/09/02 23:48:02 WARN 
> privileged.PrivilegedOperationExecutor: Shell execution returned exit code: 
> 35. Privileged Execution Operation Stderr:
> [2018-09-02 23:48:02.506159] Could not create container dirsCould not create 
> local files and directories
> [2018-09-02 23:48:02.506220]
> [2018-09-02 23:48:02.506238] Stdout: main : command provided 1
> [2018-09-02 23:48:02.506258] main : run as user is nobody
> [2018-09-02 23:48:02.506282] main : requested yarn user is root
> [2018-09-02 23:48:02.506294] Getting exit code file...
> [2018-09-02 23:48:02.506307] Creating script paths...
> [2018-09-02 23:48:02.506330] Writing pid file...
> [2018-09-02 23:48:02.506366] Writing to tmp file 
> /path/to/hadoop/yarn/local/nmPrivate/application_1530684675517_516620/container_1530684675517_516620_01_020846/container_1530684675517_516620_01_020846.pid.tmp
> [2018-09-02 23:48:02.506389] Writing to cgroup task files...
> [2018-09-02 23:48:02.506402] Creating local dirs...
> [2018-09-02 23:48:02.506414] Getting exit code file...
> [2018-09-02 23:48:02.506435] Creating script paths...
> {code}
> Looking at the container executor source it's traceable to errors here: 
> [https://github.com/apache/hadoop/blob/release-3.0.0-RC1/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c#L1604]
>  And ultimately to 
> [https://github.com/apache/hadoop/blob/release-3.0.0-RC1/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c#L672]
> The root failure seems to be in the underlying mkdir call, but that exit code 
> / errno is swallowed so we don't have more details. We tend to see this when 
> many containers start at the same time for the same application on a host, 
> and suspect it may be related to some race conditions around those shared 
> directories between containers for the same application.
> For example, this is a typical pattern in the audit logs:
> {code:java}
> [2018-09-07 17:16:38.447654] 18/09/07 17:16:38 INFO 
> nodemanager.NMAuditLogger: USER=root  IP=<> Container Request 
> TARGET=ContainerManageImpl  RESULT=SUCCESS  
> APPID=application_1530684675517_559126  
> CONTAINERID=container_1530684675517_559126_01_012871
> [2018-09-07 17:16:38.492298] 18/09/07 17:16:38 INFO 
> nodemanager.NMAuditLogger: USER=root  IP=<> Container Request 
> TARGET=ContainerManageImpl  RESULT=SUCCESS  
> APPID=application_1530684675517_559126  
> CONTAINERID=container_1530684675517_559126_01_012870
> [2018-09-07 17:16:38.614044] 18/09/07 17:16:38 WARN 
> nodemanager.NMAuditLogger: USER=root  OPERATION=Container Finished - 
> Failed   TARGET=ContainerImplRESULT=FAILURE  DESCRIPTION=Container failed 
> with state: EXITED_WITH_FAILUREAPPID=application_1530684675517_559126  
> CONTAINERID=container_1530684675517_559126_01_012871
> {code}
> Two containers for the same application starting in quick succession followed 
> by the EXITED_WITH_FAILURE step (exit code 35).
> We plan to upgrade to 3.1.x soon, but I don't expect that to fix this: the 
> only major JIRAs that affected the executor since 3.0.0 seem unrelated 
> ([https://github.com/apache/hadoop/commit/bc285da107bb84a3c60c5224369d7398a41db2d8]
>  and 
> [https://github.com/apache/hadoop/commit/a82be7754d74f4d16b206427b91e700bb5f44d56])



--
This message was 

[jira] [Comment Edited] (YARN-8786) LinuxContainerExecutor fails sporadically in create_local_dirs

2018-09-18 Thread Jon Bender (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619361#comment-16619361
 ] 

Jon Bender edited comment on YARN-8786 at 9/18/18 4:28 PM:
---

Yep, YARN-8751 should definitely lessen the severity of the issue for us and I 
wouldn't be as concerned about this bug, I just wanted to file it with all the 
info I had for tracking purposes. I plan to upgrade us to 3.1.2+ anyway in the 
coming weeks so I don't expect a backport will be necessary for our purposes.


was (Author: jonbender-stripe):
Yep, YARN-8751 should definitely lessen the severity of the issue for us and I 
wouldn't be as concerned about this bug, I just wanted to file it with all the 
info I had for tracking purposes. I plan to upgrade us to 3.1.2+ anyway in the 
coming weeks so I don't expect a backport will be necessary for our purposes 
anyway.

> LinuxContainerExecutor fails sporadically in create_local_dirs
> --
>
> Key: YARN-8786
> URL: https://issues.apache.org/jira/browse/YARN-8786
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Jon Bender
>Priority: Major
>
> We started using CGroups with LinuxContainerExecutor recently, running Apache 
> Hadoop 3.0.0. Occasionally (once out of many millions of tasks) a yarn 
> container will fail with a message like the following:
> {code:java}
> [2018-09-02 23:48:02.458691] 18/09/02 23:48:02 INFO container.ContainerImpl: 
> Container container_1530684675517_516620_01_020846 transitioned from 
> SCHEDULED to RUNNING
> [2018-09-02 23:48:02.458874] 18/09/02 23:48:02 INFO 
> monitor.ContainersMonitorImpl: Starting resource-monitoring for 
> container_1530684675517_516620_01_020846
> [2018-09-02 23:48:02.506114] 18/09/02 23:48:02 WARN 
> privileged.PrivilegedOperationExecutor: Shell execution returned exit code: 
> 35. Privileged Execution Operation Stderr:
> [2018-09-02 23:48:02.506159] Could not create container dirsCould not create 
> local files and directories
> [2018-09-02 23:48:02.506220]
> [2018-09-02 23:48:02.506238] Stdout: main : command provided 1
> [2018-09-02 23:48:02.506258] main : run as user is nobody
> [2018-09-02 23:48:02.506282] main : requested yarn user is root
> [2018-09-02 23:48:02.506294] Getting exit code file...
> [2018-09-02 23:48:02.506307] Creating script paths...
> [2018-09-02 23:48:02.506330] Writing pid file...
> [2018-09-02 23:48:02.506366] Writing to tmp file 
> /path/to/hadoop/yarn/local/nmPrivate/application_1530684675517_516620/container_1530684675517_516620_01_020846/container_1530684675517_516620_01_020846.pid.tmp
> [2018-09-02 23:48:02.506389] Writing to cgroup task files...
> [2018-09-02 23:48:02.506402] Creating local dirs...
> [2018-09-02 23:48:02.506414] Getting exit code file...
> [2018-09-02 23:48:02.506435] Creating script paths...
> {code}
> Looking at the container executor source it's traceable to errors here: 
> [https://github.com/apache/hadoop/blob/release-3.0.0-RC1/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c#L1604]
>  And ultimately to 
> [https://github.com/apache/hadoop/blob/release-3.0.0-RC1/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c#L672]
> The root failure seems to be in the underlying mkdir call, but that exit code 
> / errno is swallowed so we don't have more details. We tend to see this when 
> many containers start at the same time for the same application on a host, 
> and suspect it may be related to some race conditions around those shared 
> directories between containers for the same application.
> For example, this is a typical pattern in the audit logs:
> {code:java}
> [2018-09-07 17:16:38.447654] 18/09/07 17:16:38 INFO 
> nodemanager.NMAuditLogger: USER=root  IP=<> Container Request 
> TARGET=ContainerManageImpl  RESULT=SUCCESS  
> APPID=application_1530684675517_559126  
> CONTAINERID=container_1530684675517_559126_01_012871
> [2018-09-07 17:16:38.492298] 18/09/07 17:16:38 INFO 
> nodemanager.NMAuditLogger: USER=root  IP=<> Container Request 
> TARGET=ContainerManageImpl  RESULT=SUCCESS  
> APPID=application_1530684675517_559126  
> CONTAINERID=container_1530684675517_559126_01_012870
> [2018-09-07 17:16:38.614044] 18/09/07 17:16:38 WARN 
> nodemanager.NMAuditLogger: USER=root  OPERATION=Container Finished - 
> Failed   TARGET=ContainerImplRESULT=FAILURE  DESCRIPTION=Container failed 
> with state: EXITED_WITH_FAILUREAPPID=application_1530684675517_559126  
> CONTAINERID=container_1530684675517_559126_01_012871
> {code}
> Two containers for the same application starting in quick succession followed 
> by the 

[jira] [Created] (YARN-8789) Add BoundedQueue to AsyncDispatcher

2018-09-18 Thread BELUGA BEHR (JIRA)
BELUGA BEHR created YARN-8789:
-

 Summary: Add BoundedQueue to AsyncDispatcher
 Key: YARN-8789
 URL: https://issues.apache.org/jira/browse/YARN-8789
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: applications
Affects Versions: 3.2.0
Reporter: BELUGA BEHR


I recently came across a scenario where an MR ApplicationMaster was failing 
with an OOM exception.  It had many thousands of Mappers and thousands of 
Reducers.  It was noted in the logging that the event-queue of 
{{AsyncDispatcher}} had a very large number of items in it and was seemingly 
never decreasing.

I started looking at the code and thought it could use some clean up, 
simplification, and the ability to specify a bounded queue so that any incoming 
events are throttled until they can be processed.  This will protect the 
ApplicationMaster from a flood of events.
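As a minimal sketch of the bounded-queue idea (not the AsyncDispatcher patch itself): a capacity-limited BlockingQueue makes producers block when the queue is full, so incoming events are throttled instead of the queue growing until the heap is exhausted.

{code:java}
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Sketch of a bounded event queue with back-pressure on the producers. */
public class BoundedEventQueueDemo {

  private final BlockingQueue<Runnable> eventQueue;

  public BoundedEventQueueDemo(int capacity) {
    // A bounded queue makes put() block when full, throttling producers
    // instead of letting the queue (and the heap) grow without limit.
    this.eventQueue = new ArrayBlockingQueue<>(capacity);
  }

  public void dispatch(Runnable event) throws InterruptedException {
    eventQueue.put(event); // blocks while the queue is at capacity
  }

  public void startConsumer() {
    Thread t = new Thread(() -> {
      try {
        while (!Thread.currentThread().isInterrupted()) {
          eventQueue.take().run(); // handle one event at a time
        }
      } catch (InterruptedException ignored) {
        Thread.currentThread().interrupt();
      }
    }, "event-handler");
    t.setDaemon(true);
    t.start();
  }

  public static void main(String[] args) throws InterruptedException {
    BoundedEventQueueDemo dispatcher = new BoundedEventQueueDemo(10_000);
    dispatcher.startConsumer();
    for (int i = 0; i < 100_000; i++) {
      dispatcher.dispatch(() -> { /* handle one event */ });
    }
  }
}
{code}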



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-7599) [GPG] ApplicationCleaner in Global Policy Generator

2018-09-18 Thread Bibin A Chundatt (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16618519#comment-16618519
 ] 

Bibin A Chundatt edited comment on YARN-7599 at 9/18/18 3:30 PM:
-

Thank you [~botong] for the updated patch

{quote}
I should have excluded all apps that are still in YarnRM memory. That should 
eliminate the race condition you mentioned. What do you think?
{quote}
Yes, I agree with the same for fixing the race.

One concern is the response size, since all applications are fetched from the 
overall cluster: in the worst case we could be fetching 5k x subclusters 
applications.


was (Author: bibinchundatt):
Thank you [~botong] for the updated patch

{quote}
I should have excluded all apps that are still in YarnRM memory. That should 
eliminate the race condition you mentioned. What do you think?
{quote}
Yes , I aggree with the same.

> [GPG] ApplicationCleaner in Global Policy Generator
> ---
>
> Key: YARN-7599
> URL: https://issues.apache.org/jira/browse/YARN-7599
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Botong Huang
>Assignee: Botong Huang
>Priority: Minor
>  Labels: federation, gpg
> Attachments: YARN-7599-YARN-7402.v1.patch, 
> YARN-7599-YARN-7402.v2.patch, YARN-7599-YARN-7402.v3.patch, 
> YARN-7599-YARN-7402.v4.patch
>
>
> In Federation, we need a cleanup service for StateStore as well as Yarn 
> Registry. For the former, we need to remove old application records. For the 
> latter, failed and killed applications might leave records in the Yarn 
> Registry (see YARN-6128). We plan to do both cleanup work in 
> ApplicationCleaner in GPG



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8635) Container Resource localization fails if umask is 077

2018-09-18 Thread Jason Lowe (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619284#comment-16619284
 ] 

Jason Lowe commented on YARN-8635:
--

Thanks for the patch!

It would be nice to have a short comment explaining why the umask setting is 
necessary, as it will not be obvious to many why it's there.

There should also be a unit test so we don't accidentally regress this fix.  It 
should be easy to extend the existing test_init_app test in 
test-container-executor.
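The fix itself is in the native container-executor, but the point behind the requested comment translates directly: create the application directory with an explicit mode instead of inheriting whatever the process umask leaves. A Java-flavoured illustration of the 750 requirement from the error message, under the assumption that group access is what the localizer needs:

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

/** Illustration of "don't rely on the process umask" for the 750 requirement. */
public class ExplicitAppDirPermissions {

  public static Path createAppDir(Path appDir) throws IOException {
    Files.createDirectories(appDir);
    // With umask 077 the directory would come out as 700; the group (for the
    // NM side) needs to traverse it, so force 750 explicitly (POSIX only).
    Set<PosixFilePermission> perms = PosixFilePermissions.fromString("rwxr-x---");
    Files.setPosixFilePermissions(appDir, perms);
    return appDir;
  }

  public static void main(String[] args) throws IOException {
    Path dir = createAppDir(Files.createTempDirectory("appcache-").resolve("app_1"));
    System.out.println(Files.getPosixFilePermissions(dir));
  }
}
{code}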


> Container Resource localization fails if umask is 077
> -
>
> Key: YARN-8635
> URL: https://issues.apache.org/jira/browse/YARN-8635
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: Bibin A Chundatt
>Assignee: Bilwa S T
>Priority: Major
> Attachments: YARN-8635-001.patch
>
>
> {code}
> java.io.IOException: Application application_1533652359071_0001 
> initialization failed (exitCode=255) with output: main : command provided 0
> main : run as user is mapred
> main : requested yarn user is mapred
> Path 
> /opt/HA/OSBR310/nmlocal/usercache/mapred/appcache/application_1533652359071_0001
>  has permission 700 but needs permission 750.
> Did not create any app directories
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.startLocalizer(LinuxContainerExecutor.java:411)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1229)
> Caused by: 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException:
>  ExitCodeException exitCode=255:
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:180)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.startLocalizer(LinuxContainerExecutor.java:402)
> ... 1 more
> Caused by: ExitCodeException exitCode=255:
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:1009)
> at org.apache.hadoop.util.Shell.run(Shell.java:902)
> at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1227)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:152)
> ... 2 more
> 2018-08-08 17:43:26,918 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
>  Container container_e04_1533652359071_0001_01_27 transitioned from 
> LOCALIZING to LOCALIZATION_FAILED
> 2018-08-08 17:43:26,916 WARN 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code 
> from container container_e04_1533652359071_0001_01_31 startLocalizer is : 
> 255
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationException:
>  ExitCodeException exitCode=255:
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:180)
> at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.startLocalizer(LinuxContainerExecutor.java:402)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1229)
> Caused by: ExitCodeException exitCode=255:
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:1009)
> at org.apache.hadoop.util.Shell.run(Shell.java:902)
> at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1227)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:152)
> ... 2 more
> 2018-08-08 17:43:26,923 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  Localizer failed for containe
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8786) LinuxContainerExecutor fails sporadically in create_local_dirs

2018-09-18 Thread Eric Badger (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619248#comment-16619248
 ] 

Eric Badger commented on YARN-8786:
---

As [~shaneku...@gmail.com] said on the mailing list, this should be fixed in 
3.1.2 by YARN-8751. The container-executor will continue to report the same 
error, but the Nodemanager will not mark itself as unhealthy if it receives an 
error code that corresponds to not being able to create directories. We could 
pull the patch back to 3.0.x if you think it's necessary. 

> LinuxContainerExecutor fails sporadically in create_local_dirs
> --
>
> Key: YARN-8786
> URL: https://issues.apache.org/jira/browse/YARN-8786
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Jon Bender
>Priority: Major
>
> We started using CGroups with LinuxContainerExecutor recently, running Apache 
> Hadoop 3.0.0. Occasionally (once out of many millions of tasks) a yarn 
> container will fail with a message like the following:
> {code:java}
> [2018-09-02 23:48:02.458691] 18/09/02 23:48:02 INFO container.ContainerImpl: 
> Container container_1530684675517_516620_01_020846 transitioned from 
> SCHEDULED to RUNNING
> [2018-09-02 23:48:02.458874] 18/09/02 23:48:02 INFO 
> monitor.ContainersMonitorImpl: Starting resource-monitoring for 
> container_1530684675517_516620_01_020846
> [2018-09-02 23:48:02.506114] 18/09/02 23:48:02 WARN 
> privileged.PrivilegedOperationExecutor: Shell execution returned exit code: 
> 35. Privileged Execution Operation Stderr:
> [2018-09-02 23:48:02.506159] Could not create container dirsCould not create 
> local files and directories
> [2018-09-02 23:48:02.506220]
> [2018-09-02 23:48:02.506238] Stdout: main : command provided 1
> [2018-09-02 23:48:02.506258] main : run as user is nobody
> [2018-09-02 23:48:02.506282] main : requested yarn user is root
> [2018-09-02 23:48:02.506294] Getting exit code file...
> [2018-09-02 23:48:02.506307] Creating script paths...
> [2018-09-02 23:48:02.506330] Writing pid file...
> [2018-09-02 23:48:02.506366] Writing to tmp file 
> /path/to/hadoop/yarn/local/nmPrivate/application_1530684675517_516620/container_1530684675517_516620_01_020846/container_1530684675517_516620_01_020846.pid.tmp
> [2018-09-02 23:48:02.506389] Writing to cgroup task files...
> [2018-09-02 23:48:02.506402] Creating local dirs...
> [2018-09-02 23:48:02.506414] Getting exit code file...
> [2018-09-02 23:48:02.506435] Creating script paths...
> {code}
> Looking at the container executor source it's traceable to errors here: 
> [https://github.com/apache/hadoop/blob/release-3.0.0-RC1/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c#L1604]
>  And ultimately to 
> [https://github.com/apache/hadoop/blob/release-3.0.0-RC1/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c#L672]
> The root failure seems to be in the underlying mkdir call, but that exit code 
> / errno is swallowed so we don't have more details. We tend to see this when 
> many containers start at the same time for the same application on a host, 
> and suspect it may be related to some race conditions around those shared 
> directories between containers for the same application.
> For example, this is a typical pattern in the audit logs:
> {code:java}
> [2018-09-07 17:16:38.447654] 18/09/07 17:16:38 INFO 
> nodemanager.NMAuditLogger: USER=root  IP=<> Container Request 
> TARGET=ContainerManageImpl  RESULT=SUCCESS  
> APPID=application_1530684675517_559126  
> CONTAINERID=container_1530684675517_559126_01_012871
> [2018-09-07 17:16:38.492298] 18/09/07 17:16:38 INFO 
> nodemanager.NMAuditLogger: USER=root  IP=<> Container Request 
> TARGET=ContainerManageImpl  RESULT=SUCCESS  
> APPID=application_1530684675517_559126  
> CONTAINERID=container_1530684675517_559126_01_012870
> [2018-09-07 17:16:38.614044] 18/09/07 17:16:38 WARN 
> nodemanager.NMAuditLogger: USER=root  OPERATION=Container Finished - 
> Failed   TARGET=ContainerImplRESULT=FAILURE  DESCRIPTION=Container failed 
> with state: EXITED_WITH_FAILUREAPPID=application_1530684675517_559126  
> CONTAINERID=container_1530684675517_559126_01_012871
> {code}
> Two containers for the same application starting in quick succession followed 
> by the EXITED_WITH_FAILURE step (exit code 35).
> We plan to upgrade to 3.1.x soon, but I don't expect that to fix this: the 
> only major JIRAs that affected the executor since 3.0.0 seem unrelated 
> ([https://github.com/apache/hadoop/commit/bc285da107bb84a3c60c5224369d7398a41db2d8]
>  and 
> 

[jira] [Comment Edited] (YARN-8553) Reduce complexity of AHSWebService getApps method

2018-09-18 Thread JIRA


[ 
https://issues.apache.org/jira/browse/YARN-8553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619232#comment-16619232
 ] 

Antal Bálint Steinbach edited comment on YARN-8553 at 9/18/18 2:59 PM:
---

Hi [~snemeth] ,

Thanks for creating the patch. Your changes clean up the code for sure.

I found some minor things to consider:

1) I would set the boolean fields default value in 
_ApplicationsRequestBuilderCommons_ to false. It is more straightforward in 
that way.
{code:java}
private boolean startedTimeSpecified;
private boolean finishedTimeSpecified;
private boolean appStatesSpecified;
private boolean appTypesSpecified;{code}
2) in _ApplicationsRequestBuilderCommons_ 
{code:java}
Set appStates = ApplicationsRequestValueParser
.parseApplicationStates(this.appStates);
{code}
appStates is already a field in the class.

3) _TestApplicationsRequestBuilderCommons_ it would be easier to maintain and 
read if you don't check all the "builder setters" one by one. In my opinion, it 
is enough to keep the one which tests all of them together, plus the ones which 
are testing edge cases or invalid values.


was (Author: bsteinbach):
Hi [~snemeth] ,

Thanks for creating the patch. Your changes clean up the code for sure.

I found some minor things to consider:

1) I would set the boolean fields default value in 
_ApplicationsRequestBuilderCommons_ to false. It is more straightforward in 
that way.
{code:java}
private boolean startedTimeSpecified;
private boolean finishedTimeSpecified;
private boolean appStatesSpecified;
private boolean appTypesSpecified;{code}
2) in _ApplicationsRequestBuilderCommons_ 
{code:java}
// Set appStates = ApplicationsRequestValueParser
.parseApplicationStates(this.appStates);
{code}
appStates is already a field in the class.

3) In _TestApplicationsRequestBuilderCommons_, it would be easier to maintain and 
read if you didn't check all the "builder setters" one by one. In my opinion, it 
is enough to keep the one that tests all of them together, plus the ones that 
test edge cases or invalid values.

> Reduce complexity of AHSWebService getApps method
> -
>
> Key: YARN-8553
> URL: https://issues.apache.org/jira/browse/YARN-8553
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Rohith Sharma K S
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-8553.001.patch, YARN-8553.001.patch, 
> YARN-8553.002.patch
>
>
> YARN-8501 refactor the RMWebService#getApp. Similar refactoring required in 
> AHSWebservice. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8553) Reduce complexity of AHSWebService getApps method

2018-09-18 Thread JIRA


[ 
https://issues.apache.org/jira/browse/YARN-8553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619232#comment-16619232
 ] 

Antal Bálint Steinbach commented on YARN-8553:
--

Hi [~snemeth] ,

Thanks for creating the patch. Your changes definitely clean up the code.

I found some minor things to consider:

1) I would explicitly initialize the boolean fields in 
_ApplicationsRequestBuilderCommons_ to false. It is more straightforward 
that way.
{code:java}
private boolean startedTimeSpecified;
private boolean finishedTimeSpecified;
private boolean appStatesSpecified;
private boolean appTypesSpecified;{code}
2) in _ApplicationsRequestBuilderCommons_ 
{code:java}
// Set appStates = ApplicationsRequestValueParser
.parseApplicationStates(this.appStates);
{code}
appStates is already a field in the class.

3) In _TestApplicationsRequestBuilderCommons_, it would be easier to maintain and 
read if you didn't check all the "builder setters" one by one. In my opinion, it 
is enough to keep the one that tests all of them together, plus the ones that 
test edge cases or invalid values.

> Reduce complexity of AHSWebService getApps method
> -
>
> Key: YARN-8553
> URL: https://issues.apache.org/jira/browse/YARN-8553
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Rohith Sharma K S
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-8553.001.patch, YARN-8553.001.patch, 
> YARN-8553.002.patch
>
>
> YARN-8501 refactor the RMWebService#getApp. Similar refactoring required in 
> AHSWebservice. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8771) CapacityScheduler fails to unreserve when cluster resource contains empty resource type

2018-09-18 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619208#comment-16619208
 ] 

Weiwei Yang commented on YARN-8771:
---

OK, if that's the case, I am fine with the patch.

LGTM, +1, if we don't get any further comments, I will commit the patch by 
tomorrow.

Thanks [~Tao Yang].

> CapacityScheduler fails to unreserve when cluster resource contains empty 
> resource type
> ---
>
> Key: YARN-8771
> URL: https://issues.apache.org/jira/browse/YARN-8771
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 3.2.0
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Critical
> Attachments: YARN-8771.001.patch, YARN-8771.002.patch
>
>
> We found this problem when the cluster was almost, but not fully, exhausted 
> (93% used): the scheduler kept allocating for an app but always failed to 
> commit. This can block requests from other apps, and parts of the cluster 
> resource can't be used.
> To reproduce this problem:
> (1) use DominantResourceCalculator
> (2) the cluster resource has an empty resource type, for example gpu=0
> (3) the scheduler allocates a container for app1, which has reserved containers 
> and whose queue limit or user limit is reached (used + required > limit).
> Reference code in RegularContainerAllocator#assignContainer:
> {code:java}
> // How much need to unreserve equals to:
> // max(required - headroom, amountNeedUnreserve)
> Resource headRoom = Resources.clone(currentResoureLimits.getHeadroom());
> Resource resourceNeedToUnReserve =
> Resources.max(rc, clusterResource,
> Resources.subtract(capability, headRoom),
> currentResoureLimits.getAmountNeededUnreserve());
> boolean needToUnreserve =
> Resources.greaterThan(rc, clusterResource,
> resourceNeedToUnReserve, Resources.none());
> {code}
> For example, resourceNeedToUnReserve can be <8GB, -6 vcores, 0 gpu> when 
> {{headRoom=<0GB, 8 vcores, 0 gpu>}} and {{capacity=<8GB, 2 vcores, 0 gpu>}}; 
> needToUnreserve, which is the result of {{Resources#greaterThan}}, will then be 
> {{false}}. This is not reasonable because the required resource did exceed the 
> headroom and an unreserve is needed.
> After that, when reaching the unreserve step in 
> RegularContainerAllocator#assignContainer, the unreserve process will be skipped 
> because shouldAllocOrReserveNewContainer is true (required containers > 
> reserved containers) and needToUnreserve is wrongly calculated to be false:
> {code:java}
> if (availableContainers > 0) {
>  if (rmContainer == null && reservationsContinueLooking
>   && node.getLabels().isEmpty()) {
>   // unreserve process can be wrongly skipped when 
> shouldAllocOrReserveNewContainer=true and needToUnreserve=false but required 
> resource did exceed the headroom
>   if (!shouldAllocOrReserveNewContainer || needToUnreserve) { 
> ... 
>   }
>  }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8771) CapacityScheduler fails to unreserve when cluster resource contains empty resource type

2018-09-18 Thread Tao Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619181#comment-16619181
 ] 

Tao Yang commented on YARN-8771:


Thanks [~cheersyang] for the review.
{quote}
Instead of adding a new "resource-types-1.xml", can we use 
TestResourceUtils#addNewTypesToResources for the tests? I think it doesn't 
matter to test with existing gpu or fpga resource correct?
{quote}
IIUC, MockRM resets resource types and reloads resource-types.xml internally; 
without a resource-types.xml, MockRM will only have two resource types (memory 
and vcores), so we can't simulate a cluster that contains an empty resource 
type. Thoughts?
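
For reference, a minimal resource-types.xml for such a test could look roughly 
like this (the gpu type name is only an example; the file attached to the patch 
may differ):
{code:xml}
<?xml version="1.0"?>
<configuration>
  <!-- Registers one additional resource type so MockRM loads it; since no node
       reports it, the cluster total for this type stays 0. -->
  <property>
    <name>yarn.resource-types</name>
    <value>yarn.io/gpu</value>
  </property>
</configuration>
{code}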

> CapacityScheduler fails to unreserve when cluster resource contains empty 
> resource type
> ---
>
> Key: YARN-8771
> URL: https://issues.apache.org/jira/browse/YARN-8771
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 3.2.0
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Critical
> Attachments: YARN-8771.001.patch, YARN-8771.002.patch
>
>
> We found this problem when the cluster was almost, but not fully, exhausted 
> (93% used): the scheduler kept allocating for an app but always failed to 
> commit. This can block requests from other apps, and parts of the cluster 
> resource can't be used.
> To reproduce this problem:
> (1) use DominantResourceCalculator
> (2) the cluster resource has an empty resource type, for example gpu=0
> (3) the scheduler allocates a container for app1, which has reserved containers 
> and whose queue limit or user limit is reached (used + required > limit).
> Reference code in RegularContainerAllocator#assignContainer:
> {code:java}
> // How much need to unreserve equals to:
> // max(required - headroom, amountNeedUnreserve)
> Resource headRoom = Resources.clone(currentResoureLimits.getHeadroom());
> Resource resourceNeedToUnReserve =
> Resources.max(rc, clusterResource,
> Resources.subtract(capability, headRoom),
> currentResoureLimits.getAmountNeededUnreserve());
> boolean needToUnreserve =
> Resources.greaterThan(rc, clusterResource,
> resourceNeedToUnReserve, Resources.none());
> {code}
> For example, resourceNeedToUnReserve can be <8GB, -6 vcores, 0 gpu> when 
> {{headRoom=<0GB, 8 vcores, 0 gpu>}} and {{capacity=<8GB, 2 vcores, 0 gpu>}}; 
> needToUnreserve, which is the result of {{Resources#greaterThan}}, will then be 
> {{false}}. This is not reasonable because the required resource did exceed the 
> headroom and an unreserve is needed.
> After that, when reaching the unreserve step in 
> RegularContainerAllocator#assignContainer, the unreserve process will be skipped 
> because shouldAllocOrReserveNewContainer is true (required containers > 
> reserved containers) and needToUnreserve is wrongly calculated to be false:
> {code:java}
> if (availableContainers > 0) {
>  if (rmContainer == null && reservationsContinueLooking
>   && node.getLabels().isEmpty()) {
>   // unreserve process can be wrongly skipped when 
> shouldAllocOrReserveNewContainer=true and needToUnreserve=false but required 
> resource did exceed the headroom
>   if (!shouldAllocOrReserveNewContainer || needToUnreserve) { 
> ... 
>   }
>  }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8468) Limit container sizes per queue in FairScheduler

2018-09-18 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619110#comment-16619110
 ] 

Hadoop QA commented on YARN-8468:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
22s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 17 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
14s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
52s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m  5s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
4s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
14s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m 
21s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m 33s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 18 new + 936 unchanged - 24 fixed = 954 total (was 960) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 31s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
0s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 73m 
41s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch 
passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
15s{color} | {color:green} hadoop-yarn-site in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
31s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}144m 23s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 |
| JIRA Issue | YARN-8468 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12940195/YARN-8468.013.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  

[jira] [Commented] (YARN-8750) Refactor TestQueueMetrics

2018-09-18 Thread Zoltan Siegl (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619061#comment-16619061
 ] 

Zoltan Siegl commented on YARN-8750:


Thanks for the patch, +1 LGTM (non-binding)

> Refactor TestQueueMetrics
> -
>
> Key: YARN-8750
> URL: https://issues.apache.org/jira/browse/YARN-8750
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Minor
> Attachments: YARN-8750.001.patch, YARN-8750.002.patch, 
> YARN-8750.003.patch
>
>
> {{TestQueueMetrics#checkApps}} and {{TestQueueMetrics#checkResources}} have 8 
> and 14 parameters, respectively.
> It is very hard to read the test cases that use these methods. 
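
The usual cure for parameter lists of that size is a small expectation object 
with a fluent builder, so each call site only names the values it cares about. 
A rough sketch of the idea, with placeholder names rather than the ones used in 
the patch:
{code:java}
// Illustrative sketch only; field and method names are placeholders.
final class AppMetricsExpectation {
  int appsSubmitted, appsPending, appsRunning, appsCompleted;

  AppMetricsExpectation submitted(int n) { appsSubmitted = n; return this; }
  AppMetricsExpectation pending(int n)   { appsPending = n;   return this; }
  AppMetricsExpectation running(int n)   { appsRunning = n;   return this; }
  AppMetricsExpectation completed(int n) { appsCompleted = n; return this; }
}

// A test could then call, for example:
//   checkApps(metrics, new AppMetricsExpectation().submitted(1).running(1));
{code}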



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8783) Improve the documentation for the docker.trusted.registries configuration

2018-09-18 Thread Shane Kumpf (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shane Kumpf updated YARN-8783:
--
Summary: Improve the documentation for the docker.trusted.registries 
configuration  (was: Property docker.trusted.registries does not work when 
using a list)

> Improve the documentation for the docker.trusted.registries configuration
> -
>
> Key: YARN-8783
> URL: https://issues.apache.org/jira/browse/YARN-8783
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.1.1
>Reporter: Simon Prewo
>Priority: Major
>  Labels: Docker, container-executor, docker
>
> I am deploying the default yarn distributed shell example:
> {code:java}
> yarn jar hadoop-yarn-applications-distributedshell.jar -shell_env 
> YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=centos -shell_command "sleep 90" -jar 
> hadoop-yarn-applications-distributedshell.jar -num_containers 1{code}
> Having a *single trusted registry configured like this works*:
> {code:java}
> docker.trusted.registries=centos{code}
> But having *a list of trusted registries configured fails* ("Shell error 
> output: image: centos is not trusted."):
> {code:java}
> docker.trusted.registries=centos,ubuntu{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8468) Limit container sizes per queue in FairScheduler

2018-09-18 Thread JIRA


[ 
https://issues.apache.org/jira/browse/YARN-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16618935#comment-16618935
 ] 

Antal Bálint Steinbach commented on YARN-8468:
--

Hi [~cheersyang] ,

Thanks for the feedback.

I rebased the patch onto the current trunk and resolved the conflicts.

1) Fixed a lot of related checkstyle issues.

2) Added a null check for appAttempt, since queueName can be null. If it is 
null, then _scheduler.getMaximumResourceCapability()_ will be called by default 
in the schedulers.
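
In other words, the fallback is roughly of the following shape (a self-contained 
sketch with stand-in types, not the actual patch code):
{code:java}
// Self-contained sketch with stand-in types; not the actual patch code.
class MaxAllocationSketch {
  interface Scheduler {
    int getMaximumResourceCapability();             // cluster-wide maximum, e.g. in MB
    int getMaximumResourceCapability(String queue); // per-queue maximum, e.g. in MB
  }

  static int maxAllocation(Scheduler scheduler, String queueName) {
    if (queueName == null) {
      // No queue known for the attempt: fall back to the scheduler-wide maximum.
      return scheduler.getMaximumResourceCapability();
    }
    return scheduler.getMaximumResourceCapability(queueName);
  }
}
{code}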

 

> Limit container sizes per queue in FairScheduler
> 
>
> Key: YARN-8468
> URL: https://issues.apache.org/jira/browse/YARN-8468
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 3.1.0
>Reporter: Antal Bálint Steinbach
>Assignee: Antal Bálint Steinbach
>Priority: Critical
> Attachments: YARN-8468.000.patch, YARN-8468.001.patch, 
> YARN-8468.002.patch, YARN-8468.003.patch, YARN-8468.004.patch, 
> YARN-8468.005.patch, YARN-8468.006.patch, YARN-8468.007.patch, 
> YARN-8468.008.patch, YARN-8468.009.patch, YARN-8468.010.patch, 
> YARN-8468.011.patch, YARN-8468.012.patch, YARN-8468.013.patch
>
>
> When using any scheduler, you can use "yarn.scheduler.maximum-allocation-mb" 
> to limit the overall size of a container. This applies globally to all 
> containers, cannot be limited per queue, and is not scheduler dependent.
>  
> The goal of this ticket is to allow this value to be set on a per queue basis.
>  
> The use case: the user has two pools, one for ad hoc jobs and one for enterprise 
> apps. The user wants to limit ad hoc jobs to small containers but allow 
> enterprise apps to request as many resources as needed. 
> yarn.scheduler.maximum-allocation-mb sets a default value for the maximum 
> container size for all queues, while the maximum resources per queue are set 
> with the “maxContainerResources” queue config value.
>  
> Suggested solution:
>  
> All the infrastructure is already in the code. We need to do the following:
>  * add the setting to the queue properties for all queue types (parent and 
> leaf), this will cover dynamically created queues.
>  * if we set it on the root we override the scheduler setting and we should 
> not allow that.
>  * make sure that queue resource cap can not be larger than scheduler max 
> resource cap in the config.
>  * implement getMaximumResourceCapability(String queueName) in the 
> FairScheduler
>  * implement getMaximumResourceCapability() in both FSParentQueue and 
> FSLeafQueue as follows
>  * expose the setting in the queue information in the RM web UI.
>  * expose the setting in the metrics etc for the queue.
>  * write JUnit tests.
>  * update the scheduler documentation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8468) Limit container sizes per queue in FairScheduler

2018-09-18 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/YARN-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antal Bálint Steinbach updated YARN-8468:
-
Attachment: YARN-8468.013.patch

> Limit container sizes per queue in FairScheduler
> 
>
> Key: YARN-8468
> URL: https://issues.apache.org/jira/browse/YARN-8468
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 3.1.0
>Reporter: Antal Bálint Steinbach
>Assignee: Antal Bálint Steinbach
>Priority: Critical
> Attachments: YARN-8468.000.patch, YARN-8468.001.patch, 
> YARN-8468.002.patch, YARN-8468.003.patch, YARN-8468.004.patch, 
> YARN-8468.005.patch, YARN-8468.006.patch, YARN-8468.007.patch, 
> YARN-8468.008.patch, YARN-8468.009.patch, YARN-8468.010.patch, 
> YARN-8468.011.patch, YARN-8468.012.patch, YARN-8468.013.patch
>
>
> When using any scheduler, you can use "yarn.scheduler.maximum-allocation-mb" 
> to limit the overall size of a container. This applies globally to all 
> containers, cannot be limited per queue, and is not scheduler dependent.
>  
> The goal of this ticket is to allow this value to be set on a per queue basis.
>  
> The use case: the user has two pools, one for ad hoc jobs and one for enterprise 
> apps. The user wants to limit ad hoc jobs to small containers but allow 
> enterprise apps to request as many resources as needed. 
> yarn.scheduler.maximum-allocation-mb sets a default value for the maximum 
> container size for all queues, while the maximum resources per queue are set 
> with the “maxContainerResources” queue config value.
>  
> Suggested solution:
>  
> All the infrastructure is already in the code. We need to do the following:
>  * add the setting to the queue properties for all queue types (parent and 
> leaf), this will cover dynamically created queues.
>  * if we set it on the root we override the scheduler setting and we should 
> not allow that.
>  * make sure that queue resource cap can not be larger than scheduler max 
> resource cap in the config.
>  * implement getMaximumResourceCapability(String queueName) in the 
> FairScheduler
>  * implement getMaximumResourceCapability() in both FSParentQueue and 
> FSLeafQueue as follows
>  * expose the setting in the queue information in the RM web UI.
>  * expose the setting in the metrics etc for the queue.
>  * write JUnit tests.
>  * update the scheduler documentation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8749) Restrict job submission to queue based on apptype

2018-09-18 Thread Oleksandr Shevchenko (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Oleksandr Shevchenko updated YARN-8749:
---
Attachment: YARN-8749.001.patch

> Restrict job submission to queue based on apptype
> -
>
> Key: YARN-8749
> URL: https://issues.apache.org/jira/browse/YARN-8749
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: RM, scheduler
>Reporter: Oleksandr Shevchenko
>Assignee: Oleksandr Shevchenko
>Priority: Minor
> Attachments: YARN-8749.001.patch
>
>
> The proposal here is to add a new queue property so that an application can 
> be submitted to a queue only if its type is among the allowed types. If an 
> application's type is not in the queue's allowed types, the application 
> should be rejected.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8749) Restrict job submission to queue based on apptype

2018-09-18 Thread Oleksandr Shevchenko (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16618929#comment-16618929
 ] 

Oleksandr Shevchenko commented on YARN-8749:


Attached patch to demonstrate the approach. [^YARN-8749.001.patch]
Could someone evaluate this feature and approach?
Thanks.
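
For readers skimming the thread, the core of the proposed check is roughly the 
following (a sketch with placeholder names and assumed semantics, not the 
attached patch):
{code:java}
import java.util.Set;

// Placeholder names; the attached patch may use a different property and API,
// and treating an empty allow-list as "accept everything" is an assumption here.
final class AllowedAppTypesCheck {
  static void checkApplicationType(String appType, Set<String> allowedTypes) {
    if (!allowedTypes.isEmpty() && !allowedTypes.contains(appType)) {
      throw new IllegalArgumentException(
          "Application type '" + appType + "' is not allowed in this queue");
    }
  }
}
{code}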

> Restrict job submission to queue based on apptype
> -
>
> Key: YARN-8749
> URL: https://issues.apache.org/jira/browse/YARN-8749
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: RM, scheduler
>Reporter: Oleksandr Shevchenko
>Assignee: Oleksandr Shevchenko
>Priority: Minor
> Attachments: YARN-8749.001.patch
>
>
> The proposal here is to add a new queue property so that an application can 
> be submitted to a queue only if its type is among the allowed types. If an 
> application's type is not in the queue's allowed types, the application 
> should be rejected.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8775) TestDiskFailures.testLocalDirsFailures sometimes can fail on concurrent File modifications

2018-09-18 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16618910#comment-16618910
 ] 

Hadoop QA commented on YARN-8775:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
19s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 
 7s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 16s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests 
{color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m  
0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
19s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
11s{color} | {color:green} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests: 
The patch generated 0 new + 7 unchanged - 1 fixed = 7 total (was 8) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m  6s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests 
{color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m  
0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
11s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m  
7s{color} | {color:green} hadoop-yarn-server-tests in the patch passed. {color} 
|
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 46m 28s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 |
| JIRA Issue | YARN-8775 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12940176/YARN-8775.002.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 038b6e61f4b1 4.4.0-133-generic #159-Ubuntu SMP Fri Aug 10 
07:31:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / f796cfd |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
|  Test Results | 

[jira] [Comment Edited] (YARN-8783) Property docker.trusted.registries does not work when using a list

2018-09-18 Thread Simon Prewo (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16618872#comment-16618872
 ] 

Simon Prewo edited comment on YARN-8783 at 9/18/18 10:35 AM:
-

[~shaneku...@gmail.com], [~eyang] Thanks a lot for your great support.

[~shaneku...@gmail.com]: I tried the library approach and it works. However, I 
stayed with tagging for now.

Let me summarize what helped, for others finding this on Google:

1) Pull centos image and tag it:
{code:java}
docker pull centos && docker tag centos local/centos{code}
2) Add _local_ repository to docker.trusted.registries in container-executor.cfg
{code:java}
[docker]
  ...
  docker.trusted.registries=local 
{code}
3) Set YARN_CONTAINER_RUNTIME_DOCKER_IMAGE to the name of your tag (in our case 
local/centos). I.e. execute distributed shell like this:
{code:java}
yarn jar hadoop-yarn-applications-distributedshell.jar -shell_env 
YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env 
YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=local/centos -shell_command "sleep 90" -jar 
hadoop-yarn-applications-distributedshell.jar -num_containers 1 {code}


was (Author: simonprewo):
[~shaneku...@gmail.com], [~eyang] Thanks a lot for your great support.

[~shaneku...@gmail.com]: I tried the library approach and it works. However, I 
stayed with tagging for now.

Let me summarize what helped, for others finding this on Google:

1) Pull centos image and tag it:
{code:java}
docker pull centos && docker tag centos local/centos:latest{code}
2) Add _local_ repository to docker.trusted.registries in container-executor.cfg
{code:java}
[docker]
  ...
  docker.trusted.registries=local 
{code}
3) Set YARN_CONTAINER_RUNTIME_DOCKER_IMAGE to the name of your tag (in our case 
local/centos:latest). I.e. execute distributed shell like this:
{code:java}
yarn jar hadoop-yarn-applications-distributedshell.jar -shell_env 
YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env 
YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=local/centos:latest -shell_command "sleep 
90" -jar hadoop-yarn-applications-distributedshell.jar -num_containers 1 {code}

> Property docker.trusted.registries does not work when using a list
> --
>
> Key: YARN-8783
> URL: https://issues.apache.org/jira/browse/YARN-8783
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.1.1
>Reporter: Simon Prewo
>Priority: Major
>  Labels: Docker, container-executor, docker
>
> I am deploying the default yarn distributed shell example:
> {code:java}
> yarn jar hadoop-yarn-applications-distributedshell.jar -shell_env 
> YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=centos -shell_command "sleep 90" -jar 
> hadoop-yarn-applications-distributedshell.jar -num_containers 1{code}
> Having a *single trusted registry configured like this works*:
> {code:java}
> docker.trusted.registries=centos{code}
> But having *a list of trusted registries configured fails* ("Shell error 
> output: image: centos is not trusted."):
> {code:java}
> docker.trusted.registries=centos,ubuntu{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8653) Wrong display of resources when cluster resources are less than min resources

2018-09-18 Thread Jinjiang Ling (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16618876#comment-16618876
 ] 

Jinjiang Ling commented on YARN-8653:
-

Hi [~haibochen], could you help to review this ?

> Wrong display of resources when cluster resources are less than min resources
> -
>
> Key: YARN-8653
> URL: https://issues.apache.org/jira/browse/YARN-8653
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.2.0
>Reporter: Jinjiang Ling
>Assignee: Jinjiang Ling
>Priority: Major
> Attachments: YARN-8653.001.patch, YARN-8653.001.patch, 
> wrong_resource_in_fairscheduler.JPG
>
>
> If the cluster resources are less than the min resources of the Fair Scheduler, 
> a display error like this will happen.
>  
> !wrong_resource_in_fairscheduler.JPG!
> In this case, I configured my queue with max resources of 48 vcores, 49152 MB 
> and min resources of 36 vcores, 36864 MB, but the cluster resources are only 24 
> vcores and 24576 MB. The max resources are then capped to the cluster 
> resources, while the min resources and steady fair share still show the 
> configured values.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8783) Property docker.trusted.registries does not work when using a list

2018-09-18 Thread Simon Prewo (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16618872#comment-16618872
 ] 

Simon Prewo edited comment on YARN-8783 at 9/18/18 10:26 AM:
-

[~shaneku...@gmail.com], [~eyang] Thanks a lot for your great support.

[~shaneku...@gmail.com]: I tried the library approach and it works. However, I 
stayed with tagging for now.

Let me summarize what helped, for others finding this on Google:

1) Pull centos image and tag it:
{code:java}
docker pull centos && docker tag centos local/centos:latest{code}
2) Add _local_ repository to docker.trusted.registries in container-executor.cfg
{code:java}
[docker]
  ...
  docker.trusted.registries=local 
{code}
3) Set YARN_CONTAINER_RUNTIME_DOCKER_IMAGE to the name of your tag (in our case 
local/centos:latest). I.e. execute distributed shell like this:
{code:java}
yarn jar hadoop-yarn-applications-distributedshell.jar -shell_env 
YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env 
YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=local/centos:latest -shell_command "sleep 
90" -jar hadoop-yarn-applications-distributedshell.jar -num_containers 1 {code}


was (Author: simonprewo):
[~shaneku...@gmail.com], [~eyang] Thanks a lot for your great support.

Let me summarize what helped, for others finding this on Google:

1) Pull centos image and tag it:

 
{code:java}
docker pull centos && docker tag centos local/centos:latest{code}
2) Add _local_ repository to docker.trusted.registries in container-executor.cfg

 
{code:java}
[docker]
  ...
  docker.trusted.registries=local 
{code}
3) Set YARN_CONTAINER_RUNTIME_DOCKER_IMAGE to the name of your tag (in our case 
local/centos:latest). I.e. execute distributed shell like this:

 
{code:java}
yarn jar hadoop-yarn-applications-distributedshell.jar -shell_env 
YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env 
YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=local/centos:latest -shell_command "sleep 
90" -jar hadoop-yarn-applications-distributedshell.jar -num_containers 1{code}
 

 

> Property docker.trusted.registries does not work when using a list
> --
>
> Key: YARN-8783
> URL: https://issues.apache.org/jira/browse/YARN-8783
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.1.1
>Reporter: Simon Prewo
>Priority: Major
>  Labels: Docker, container-executor, docker
>
> I am deploying the default yarn distributed shell example:
> {code:java}
> yarn jar hadoop-yarn-applications-distributedshell.jar -shell_env 
> YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=centos -shell_command "sleep 90" -jar 
> hadoop-yarn-applications-distributedshell.jar -num_containers 1{code}
> Having a *single trusted registry configured like this works*:
> {code:java}
> docker.trusted.registries=centos{code}
> But having *a list of trusted registries configured fails* ("Shell error 
> output: image: centos is not trusted."):
> {code:java}
> docker.trusted.registries=centos,ubuntu{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8783) Property docker.trusted.registries does not work when using a list

2018-09-18 Thread Simon Prewo (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16618872#comment-16618872
 ] 

Simon Prewo commented on YARN-8783:
---

[~shaneku...@gmail.com], [~eyang] Thanks a lot for your great support.

Let me summarize what helped, for others finding this on Google:

1) Pull centos image and tag it:

 
{code:java}
docker pull centos && docker tag centos local/centos:latest{code}
2) Add _local_ repository to docker.trusted.registries in container-executor.cfg

 
{code:java}
[docker]
  ...
  docker.trusted.registries=local 
{code}
3) Set YARN_CONTAINER_RUNTIME_DOCKER_IMAGE to the name of your tag (in our case 
local/centos:latest). I.e. execute distributed shell like this:

 
{code:java}
yarn jar hadoop-yarn-applications-distributedshell.jar -shell_env 
YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env 
YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=local/centos:latest -shell_command "sleep 
90" -jar hadoop-yarn-applications-distributedshell.jar -num_containers 1{code}
 

 

> Property docker.trusted.registries does not work when using a list
> --
>
> Key: YARN-8783
> URL: https://issues.apache.org/jira/browse/YARN-8783
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.1.1
>Reporter: Simon Prewo
>Priority: Major
>  Labels: Docker, container-executor, docker
>
> I am deploying the default yarn distributed shell example:
> {code:java}
> yarn jar hadoop-yarn-applications-distributedshell.jar -shell_env 
> YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env 
> YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=centos -shell_command "sleep 90" -jar 
> hadoop-yarn-applications-distributedshell.jar -num_containers 1{code}
> Having a *single trusted registry configured like this works*:
> {code:java}
> docker.trusted.registries=centos{code}
> But having *a list of trusted registries configured fails* ("Shell error 
> output: image: centos is not trusted."):
> {code:java}
> docker.trusted.registries=centos,ubuntu{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8787) Fix broken list items in PlacementConstraints documentation

2018-09-18 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16618810#comment-16618810
 ] 

Hudson commented on YARN-8787:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14992 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14992/])
YARN-8787. Fix broken list items in PlacementConstraints documentation. (wwei: 
rev 78a0d173e4f0c2f2679a04edd62a60fb76dde4f0)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/PlacementConstraints.md.vm


> Fix broken list items in PlacementConstraints documentation
> ---
>
> Key: YARN-8787
> URL: https://issues.apache.org/jira/browse/YARN-8787
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 3.1.1
>Reporter: Masahiro Tanaka
>Assignee: Masahiro Tanaka
>Priority: Minor
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-8787.0.patch, YARN-8787.0.patch, listitems0.PNG, 
> listitems1.PNG
>
>
> It looks like some parts of the document below should be list items.
> https://hadoop.apache.org/docs/r3.1.1/hadoop-yarn/hadoop-yarn-site/PlacementConstraints.html
> It might be because of missing newlines before listing.
> https://github.com/apache/hadoop/blob/ee051ef9fec1fddb612aa1feae9fd3df7091354f/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/PlacementConstraints.md.vm#L89-L92



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8775) TestDiskFailures.testLocalDirsFailures sometimes can fail on concurrent File modifications

2018-09-18 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/YARN-8775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antal Bálint Steinbach updated YARN-8775:
-
Attachment: YARN-8775.002.patch

> TestDiskFailures.testLocalDirsFailures sometimes can fail on concurrent File 
> modifications
> --
>
> Key: YARN-8775
> URL: https://issues.apache.org/jira/browse/YARN-8775
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test, yarn
>Affects Versions: 3.0.0
>Reporter: Antal Bálint Steinbach
>Assignee: Antal Bálint Steinbach
>Priority: Major
> Attachments: YARN-8775.001.patch, YARN-8775.002.patch
>
>
> The test can sometimes fail when file operations are performed while the check 
> by the _LocalDirsHandlerService_ thread is in progress.
> {code:java}
> java.lang.AssertionError: NodeManager could not identify disk failure.
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at 
> org.apache.hadoop.yarn.server.TestDiskFailures.verifyDisksHealth(TestDiskFailures.java:239)
>   at 
> org.apache.hadoop.yarn.server.TestDiskFailures.testDirsFailures(TestDiskFailures.java:202)
>   at 
> org.apache.hadoop.yarn.server.TestDiskFailures.testLocalDirsFailures(TestDiskFailures.java:99)
> Stderr
> 2018-09-13 08:21:49,822 INFO [main] server.TestDiskFailures 
> (TestDiskFailures.java:prepareDirToFail(277)) - Prepared 
> /tmp/dist-test-taskjUrf0_/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/target/org.apache.hadoop.yarn.server.TestDiskFailures/org.apache.hadoop.yarn.server.TestDiskFailures-logDir-nm-0_1
>  to fail.
> 2018-09-13 08:21:49,823 INFO [main] server.TestDiskFailures 
> (TestDiskFailures.java:prepareDirToFail(277)) - Prepared 
> /tmp/dist-test-taskjUrf0_/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/target/org.apache.hadoop.yarn.server.TestDiskFailures/org.apache.hadoop.yarn.server.TestDiskFailures-logDir-nm-0_3
>  to fail.
> 2018-09-13 08:21:49,823 WARN [DiskHealthMonitor-Timer] 
> nodemanager.DirectoryCollection (DirectoryCollection.java:checkDirs(283)) - 
> Directory 
> /tmp/dist-test-taskjUrf0_/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/target/org.apache.hadoop.yarn.server.TestDiskFailures/org.apache.hadoop.yarn.server.TestDiskFailures-logDir-nm-0_1
>  error, Not a directory: 
> /tmp/dist-test-taskjUrf0_/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/target/org.apache.hadoop.yarn.server.TestDiskFailures/org.apache.hadoop.yarn.server.TestDiskFailures-logDir-nm-0_1,
>  removing from list of valid directories
> 2018-09-13 08:21:49,824 WARN [DiskHealthMonitor-Timer] 
> localizer.ResourceLocalizationService 
> (ResourceLocalizationService.java:initializeLogDir(1329)) - Could not 
> initialize log dir 
> /tmp/dist-test-taskjUrf0_/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/target/org.apache.hadoop.yarn.server.TestDiskFailures/org.apache.hadoop.yarn.server.TestDiskFailures-logDir-nm-0_3
> java.io.FileNotFoundException: Destination exists and is not a directory: 
> /tmp/dist-test-taskjUrf0_/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/target/org.apache.hadoop.yarn.server.TestDiskFailures/org.apache.hadoop.yarn.server.TestDiskFailures-logDir-nm-0_3
> at 
> org.apache.hadoop.fs.RawLocalFileSystem.mkdirsWithOptionalPermission(RawLocalFileSystem.java:515)
> at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:496)
> at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:1081)
> at 
> org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:178)
> at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:205)
> at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:747)
> at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:743)
> at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
> at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:743)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.initializeLogDir(ResourceLocalizationService.java:1324)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.initializeLogDirs(ResourceLocalizationService.java:1318)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.access$000(ResourceLocalizationService.java:141)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$2.onDirsChanged(ResourceLocalizationService.java:269)
> at 
> 

[jira] [Commented] (YARN-8775) TestDiskFailures.testLocalDirsFailures sometimes can fail on concurrent File modifications

2018-09-18 Thread JIRA


[ 
https://issues.apache.org/jira/browse/YARN-8775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16618803#comment-16618803
 ] 

Antal Bálint Steinbach commented on YARN-8775:
--

Hi [~snemeth] ,

Thanks for the comments. All issues fixed.

> TestDiskFailures.testLocalDirsFailures sometimes can fail on concurrent File 
> modifications
> --
>
> Key: YARN-8775
> URL: https://issues.apache.org/jira/browse/YARN-8775
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test, yarn
>Affects Versions: 3.0.0
>Reporter: Antal Bálint Steinbach
>Assignee: Antal Bálint Steinbach
>Priority: Major
> Attachments: YARN-8775.001.patch, YARN-8775.002.patch
>
>
> The test can sometimes fail when file operations are performed while the check 
> by the _LocalDirsHandlerService_ thread is in progress.
> {code:java}
> java.lang.AssertionError: NodeManager could not identify disk failure.
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at 
> org.apache.hadoop.yarn.server.TestDiskFailures.verifyDisksHealth(TestDiskFailures.java:239)
>   at 
> org.apache.hadoop.yarn.server.TestDiskFailures.testDirsFailures(TestDiskFailures.java:202)
>   at 
> org.apache.hadoop.yarn.server.TestDiskFailures.testLocalDirsFailures(TestDiskFailures.java:99)
> Stderr
> 2018-09-13 08:21:49,822 INFO [main] server.TestDiskFailures 
> (TestDiskFailures.java:prepareDirToFail(277)) - Prepared 
> /tmp/dist-test-taskjUrf0_/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/target/org.apache.hadoop.yarn.server.TestDiskFailures/org.apache.hadoop.yarn.server.TestDiskFailures-logDir-nm-0_1
>  to fail.
> 2018-09-13 08:21:49,823 INFO [main] server.TestDiskFailures 
> (TestDiskFailures.java:prepareDirToFail(277)) - Prepared 
> /tmp/dist-test-taskjUrf0_/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/target/org.apache.hadoop.yarn.server.TestDiskFailures/org.apache.hadoop.yarn.server.TestDiskFailures-logDir-nm-0_3
>  to fail.
> 2018-09-13 08:21:49,823 WARN [DiskHealthMonitor-Timer] 
> nodemanager.DirectoryCollection (DirectoryCollection.java:checkDirs(283)) - 
> Directory 
> /tmp/dist-test-taskjUrf0_/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/target/org.apache.hadoop.yarn.server.TestDiskFailures/org.apache.hadoop.yarn.server.TestDiskFailures-logDir-nm-0_1
>  error, Not a directory: 
> /tmp/dist-test-taskjUrf0_/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/target/org.apache.hadoop.yarn.server.TestDiskFailures/org.apache.hadoop.yarn.server.TestDiskFailures-logDir-nm-0_1,
>  removing from list of valid directories
> 2018-09-13 08:21:49,824 WARN [DiskHealthMonitor-Timer] 
> localizer.ResourceLocalizationService 
> (ResourceLocalizationService.java:initializeLogDir(1329)) - Could not 
> initialize log dir 
> /tmp/dist-test-taskjUrf0_/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/target/org.apache.hadoop.yarn.server.TestDiskFailures/org.apache.hadoop.yarn.server.TestDiskFailures-logDir-nm-0_3
> java.io.FileNotFoundException: Destination exists and is not a directory: 
> /tmp/dist-test-taskjUrf0_/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/target/org.apache.hadoop.yarn.server.TestDiskFailures/org.apache.hadoop.yarn.server.TestDiskFailures-logDir-nm-0_3
> at 
> org.apache.hadoop.fs.RawLocalFileSystem.mkdirsWithOptionalPermission(RawLocalFileSystem.java:515)
> at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:496)
> at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:1081)
> at 
> org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:178)
> at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:205)
> at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:747)
> at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:743)
> at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
> at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:743)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.initializeLogDir(ResourceLocalizationService.java:1324)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.initializeLogDirs(ResourceLocalizationService.java:1318)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.access$000(ResourceLocalizationService.java:141)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$2.onDirsChanged(ResourceLocalizationService.java:269)
> at 
> 

[jira] [Commented] (YARN-8653) Wrong display of resources when cluster resources are less than min resources

2018-09-18 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16618786#comment-16618786
 ] 

Weiwei Yang commented on YARN-8653:
---

Hi [~lingjinjiang]

I am not familiar with the fair scheduler code; pinging [~haibochen], who should 
be able to help review this.

Thanks

> Wrong display of resources when cluster resources are less than min resources
> -
>
> Key: YARN-8653
> URL: https://issues.apache.org/jira/browse/YARN-8653
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.2.0
>Reporter: Jinjiang Ling
>Assignee: Jinjiang Ling
>Priority: Major
> Attachments: YARN-8653.001.patch, YARN-8653.001.patch, 
> wrong_resource_in_fairscheduler.JPG
>
>
> If the cluster resources are less than the min resources of the Fair Scheduler, 
> a display error like this will happen.
>  
> !wrong_resource_in_fairscheduler.JPG!
> In this case, I configured my queue with max resources of 48 vcores, 49152 MB 
> and min resources of 36 vcores, 36864 MB, but the cluster resources are only 24 
> vcores and 24576 MB. The max resources are then capped to the cluster 
> resources, while the min resources and steady fair share still show the 
> configured values.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8787) Fix broken list items in PlacementConstraints documentation

2018-09-18 Thread Masahiro Tanaka (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16618781#comment-16618781
 ] 

Masahiro Tanaka commented on YARN-8787:
---

Thank you for re-uploading and reviewing [~ajisakaa] ! 
Thank you for committing [~cheersyang] !

> Fix broken list items in PlacementConstraints documentation
> ---
>
> Key: YARN-8787
> URL: https://issues.apache.org/jira/browse/YARN-8787
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 3.1.1
>Reporter: Masahiro Tanaka
>Assignee: Masahiro Tanaka
>Priority: Minor
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-8787.0.patch, YARN-8787.0.patch, listitems0.PNG, 
> listitems1.PNG
>
>
> It looks like some parts of the document below should be list items.
> https://hadoop.apache.org/docs/r3.1.1/hadoop-yarn/hadoop-yarn-site/PlacementConstraints.html
> It might be because of missing newlines before listing.
> https://github.com/apache/hadoop/blob/ee051ef9fec1fddb612aa1feae9fd3df7091354f/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/PlacementConstraints.md.vm#L89-L92



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8653) Wrong display of resources when cluster resources are less than min resources

2018-09-18 Thread Jinjiang Ling (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16618768#comment-16618768
 ] 

Jinjiang Ling commented on YARN-8653:
-

Hi [~cheersyang], could you help to review this ?

> Wrong display of resources when cluster resources are less than min resources
> -
>
> Key: YARN-8653
> URL: https://issues.apache.org/jira/browse/YARN-8653
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.2.0
>Reporter: Jinjiang Ling
>Assignee: Jinjiang Ling
>Priority: Major
> Attachments: YARN-8653.001.patch, YARN-8653.001.patch, 
> wrong_resource_in_fairscheduler.JPG
>
>
> If the cluster resources are less than the min resources of the Fair Scheduler, 
> a display error like this will happen.
>  
> !wrong_resource_in_fairscheduler.JPG!
> In this case, I configured my queue with max resources of 48 vcores, 49152 MB 
> and min resources of 36 vcores, 36864 MB, but the cluster resources are only 24 
> vcores and 24576 MB. The max resources are then capped to the cluster 
> resources, while the min resources and steady fair share still show the 
> configured values.
>  
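
For reference, a minimal fair-scheduler.xml sketch of the configuration
described above; the queue name is hypothetical and the values are taken from
the description, not from the attached patch:

{code:xml}
<?xml version="1.0"?>
<!-- Illustrative allocation file only: queue name "root.test" is assumed.
     Min/max values follow the description (min 36 vcores / 36864 MB,
     max 48 vcores / 49152 MB) on a 24-vcore / 24576 MB cluster. -->
<allocations>
  <queue name="root.test">
    <minResources>36864 mb, 36 vcores</minResources>
    <maxResources>49152 mb, 48 vcores</maxResources>
  </queue>
</allocations>
{code}

With such an allocation file on a cluster of only 24 vcores and 24576 MB, the
scheduler page caps the displayed max resources at the cluster total while the
min resources and steady fair share still show the configured values, which is
the mismatch visible in the attached screenshot.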



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8787) Fix broken list items in PlacementConstraints documentation

2018-09-18 Thread Weiwei Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated YARN-8787:
--
Summary: Fix broken list items in PlacementConstraints documentation  (was: 
Fix broken list items in PlacementConstraints.md.vm)

> Fix broken list items in PlacementConstraints documentation
> ---
>
> Key: YARN-8787
> URL: https://issues.apache.org/jira/browse/YARN-8787
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 3.1.1
>Reporter: Masahiro Tanaka
>Assignee: Masahiro Tanaka
>Priority: Minor
> Attachments: YARN-8787.0.patch, YARN-8787.0.patch, listitems0.PNG, 
> listitems1.PNG
>
>
> It looks like some parts of the document below should be list items.
> https://hadoop.apache.org/docs/r3.1.1/hadoop-yarn/hadoop-yarn-site/PlacementConstraints.html
> It might be because of missing newlines before the list items.
> https://github.com/apache/hadoop/blob/ee051ef9fec1fddb612aa1feae9fd3df7091354f/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/PlacementConstraints.md.vm#L89-L92



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8787) Fix broken list items in PlacementConstraints.md.vm

2018-09-18 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16618746#comment-16618746
 ] 

Weiwei Yang commented on YARN-8787:
---

+1 as well, thanks [~masatana], [~ajisakaa], I'll help to commit this shortly.

> Fix broken list items in PlacementConstraints.md.vm
> ---
>
> Key: YARN-8787
> URL: https://issues.apache.org/jira/browse/YARN-8787
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 3.1.1
>Reporter: Masahiro Tanaka
>Assignee: Masahiro Tanaka
>Priority: Minor
> Attachments: YARN-8787.0.patch, YARN-8787.0.patch, listitems0.PNG, 
> listitems1.PNG
>
>
> It looks like some parts of the document below should be list items.
> https://hadoop.apache.org/docs/r3.1.1/hadoop-yarn/hadoop-yarn-site/PlacementConstraints.html
> It might be because of missing newlines before the list items.
> https://github.com/apache/hadoop/blob/ee051ef9fec1fddb612aa1feae9fd3df7091354f/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/PlacementConstraints.md.vm#L89-L92



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8787) Fix broken list items in PlacementConstraints.md.vm

2018-09-18 Thread Akira Ajisaka (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16618720#comment-16618720
 ] 

Akira Ajisaka commented on YARN-8787:
-

+1, thanks [~masatana].

> Fix broken list items in PlacementConstraints.md.vm
> ---
>
> Key: YARN-8787
> URL: https://issues.apache.org/jira/browse/YARN-8787
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 3.1.1
>Reporter: Masahiro Tanaka
>Assignee: Masahiro Tanaka
>Priority: Minor
> Attachments: YARN-8787.0.patch, YARN-8787.0.patch, listitems0.PNG, 
> listitems1.PNG
>
>
> It looks like some parts of the document below should be list items.
> https://hadoop.apache.org/docs/r3.1.1/hadoop-yarn/hadoop-yarn-site/PlacementConstraints.html
> It might be because of missing newlines before the list items.
> https://github.com/apache/hadoop/blob/ee051ef9fec1fddb612aa1feae9fd3df7091354f/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/PlacementConstraints.md.vm#L89-L92



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8787) Fix broken list items in PlacementConstraints.md.vm

2018-09-18 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16618633#comment-16618633
 ] 

Hadoop QA commented on YARN-8787:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
20s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
28m 20s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 59s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
22s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 40m 52s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 |
| JIRA Issue | YARN-8787 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12940144/YARN-8787.0.patch |
| Optional Tests |  dupname  asflicense  mvnsite  |
| uname | Linux b512f64ce90c 4.4.0-133-generic #159-Ubuntu SMP Fri Aug 10 
07:31:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / bbeca01 |
| maven | version: Apache Maven 3.3.9 |
| Max. process+thread count | 440 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/21860/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Fix broken list items in PlacementConstraints.md.vm
> ---
>
> Key: YARN-8787
> URL: https://issues.apache.org/jira/browse/YARN-8787
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 3.1.1
>Reporter: Masahiro Tanaka
>Assignee: Masahiro Tanaka
>Priority: Minor
> Attachments: YARN-8787.0.patch, YARN-8787.0.patch, listitems0.PNG, 
> listitems1.PNG
>
>
> It looks like some parts of the document below should be list items.
> https://hadoop.apache.org/docs/r3.1.1/hadoop-yarn/hadoop-yarn-site/PlacementConstraints.html
> It might be because of missing newlines before the list items.
> https://github.com/apache/hadoop/blob/ee051ef9fec1fddb612aa1feae9fd3df7091354f/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/PlacementConstraints.md.vm#L89-L92



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8652) [UI2] YARN UI2 breaks if getUserInfo REST API is not available in older versions.

2018-09-18 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16618625#comment-16618625
 ] 

Hudson commented on YARN-8652:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14989 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14989/])
YARN-8652. [UI2] YARN UI2 breaks if getUserInfo REST API is not (sunilg: rev 
bbeca0107e247ae14cfe96761f9e5fbb1f02e53d)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/routes/application.js
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/controllers/application.js


> [UI2] YARN UI2 breaks if getUserInfo REST API is not available in older 
> versions.
> -
>
> Key: YARN-8652
> URL: https://issues.apache.org/jira/browse/YARN-8652
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Akhil PB
>Assignee: Akhil PB
>Priority: Critical
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-8652.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8726) [UI2] YARN UI2 is not accessible when config.env file failed to load

2018-09-18 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16618600#comment-16618600
 ] 

Hudson commented on YARN-8726:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14988 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14988/])
YARN-8726. [UI2] YARN UI2 is not accessible when config.env file failed 
(sunilg: rev 0cc6e039454127984a3aa5b2ba5d9151e4a72dd4)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/initializers/loader.js


> [UI2] YARN UI2 is not accessible when config.env file failed to load
> 
>
> Key: YARN-8726
> URL: https://issues.apache.org/jira/browse/YARN-8726
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Reporter: Akhil PB
>Assignee: Akhil PB
>Priority: Critical
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-8726.001.patch
>
>
> It is observed that YARN UI2 is not accessible. When UI2 is inspected, it 
> gives the error below:
> {code:java}
> index.html:1 Refused to execute script from 
> 'http://ctr-e138-1518143905142-456429-01-05.hwx.site:8088/ui2/config/configs.env'
>  because its MIME type ('text/plain') is not executable, and strict MIME type 
> checking is enabled.
> yarn-ui.js:219 base url:
> vendor.js:1978 ReferenceError: ENV is not defined
>  at updateConfigs (yarn-ui.js:212)
>  at Object.initialize (yarn-ui.js:218)
>  at vendor.js:824
>  at vendor.js:825
>  at visit (vendor.js:3025)
>  at Object.visit [as default] (vendor.js:3024)
>  at DAG.topsort (vendor.js:750)
>  at Class._runInitializer (vendor.js:825)
>  at Class.runInitializers (vendor.js:824)
>  at Class._bootSync (vendor.js:823)
> onerrorDefault @ vendor.js:1978
> trigger @ vendor.js:2967
> (anonymous) @ vendor.js:3006
> invoke @ vendor.js:626
> flush @ vendor.js:629
> flush @ vendor.js:619
> end @ vendor.js:642
> run @ vendor.js:648
> join @ vendor.js:648
> run.join @ vendor.js:1510
> (anonymous) @ vendor.js:1512
> fire @ vendor.js:230
> fireWith @ vendor.js:235
> ready @ vendor.js:242
> completed @ vendor.js:242
> vendor.js:823 Uncaught ReferenceError: ENV is not defined
>  at updateConfigs (yarn-ui.js:212)
>  at Object.initialize (yarn-ui.js:218)
>  at vendor.js:824
>  at vendor.js:825
>  at visit (vendor.js:3025)
>  at Object.visit [as default] (vendor.js:3024)
>  at DAG.topsort (vendor.js:750)
>  at Class._runInitializer (vendor.js:825)
>  at Class.runInitializers (vendor.js:824)
>  at Class._bootSync (vendor.js:823)
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8753) [UI2] Lost nodes representation missing from Nodemanagers Chart

2018-09-18 Thread Sunil Govindan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16618592#comment-16618592
 ] 

Sunil Govindan commented on YARN-8753:
--

This needs a rebase to trunk. [~yeshavora], please help.

> [UI2] Lost nodes representation missing from Nodemanagers Chart
> ---
>
> Key: YARN-8753
> URL: https://issues.apache.org/jira/browse/YARN-8753
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Assignee: Yesha Vora
>Priority: Major
> Attachments: Screen Shot 2018-09-06 at 6.16.02 PM.png, Screen Shot 
> 2018-09-06 at 6.16.14 PM.png, Screen Shot 2018-09-07 at 11.59.02 AM.png, 
> YARN-8753.001.patch
>
>
> The Nodemanagers chart is present on the Cluster Overview and Nodes -> Nodes 
> Status pages.
> This chart does not show NodeManagers that are LOST.
> Due to this issue, the Node Information page and the Node Status page show 
> different NodeManager counts.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org


