[jira] [Updated] (YARN-10501) Can't remove all node labels after add node label without nodemanager port

2020-12-12 Thread caozhiqiang (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

caozhiqiang updated YARN-10501:
---
Component/s: (was: resourcemanager)
 yarn

> Can't remove all node labels after add node label without nodemanager port
> --
>
> Key: YARN-10501
> URL: https://issues.apache.org/jira/browse/YARN-10501
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.4.0
>Reporter: caozhiqiang
>Assignee: caozhiqiang
>Priority: Critical
> Attachments: YARN-10501.002.patch, YARN-10501.003.patch
>
>
> When a label is added to nodes without a nodemanager port, or with the 
> WILDCARD_PORT (0) port, it is not possible to remove all label info from these nodes.
> Reproduce process:
> {code:java}
> 1.yarn rmadmin -addToClusterNodeLabels "cpunode(exclusive=true)"
> 2.yarn rmadmin -replaceLabelsOnNode "server001=cpunode"
> 3.curl http://RM_IP:8088/ws/v1/cluster/label-mappings
> {"labelsToNodes":{"entry":{"key":{"name":"cpunode","exclusivity":"true"},"value":{"nodes":["server001:0","server001:45454"],"partitionInfo":{"resourceAvailable":{"memory":"510","vCores":"1","resourceInformations":{"resourceInformation":[{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"memory-mb","resourceType":"COUNTABLE","units":"Mi","value":"510"},{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"vcores","resourceType":"COUNTABLE","units":"","value":"1"}]}}}
> 4.yarn rmadmin -replaceLabelsOnNode "server001"
> 5.curl http://RM_IP:8088/ws/v1/cluster/label-mappings
> {"labelsToNodes":{"entry":{"key":{"name":"cpunode","exclusivity":"true"},"value":{"nodes":"server001:45454","partitionInfo":{"resourceAvailable":{"memory":"0","vCores":"0","resourceInformations":{"resourceInformation":[{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"memory-mb","resourceType":"COUNTABLE","units":"Mi","value":"0"},{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"vcores","resourceType":"COUNTABLE","units":"","value":"0"}]}}}
>  {code}
> You can see that after step 4, which removes the nodemanager labels, the label 
> info is still present in the node info.
> {code:java}
>  641   case REPLACE:
>  642     replaceNodeForLabels(nodeId, host.labels, labels);
>  643     replaceLabelsForNode(nodeId, host.labels, labels);
>  644     host.labels.clear();
>  645     host.labels.addAll(labels);
>  646     for (Node node : host.nms.values()) {
>  647       replaceNodeForLabels(node.nodeId, node.labels, labels);
>  649       node.labels = null;
>  650     }
>  651     break;{code}
> The cause is at line 647: when labels are added to a node without a port, both 
> the 0 port and the real NM port are added to the node info, but when the labels 
> are removed, the node.labels parameter passed at line 647 is null, so the old 
> label is never removed. 
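Based on that analysis, one possible shape of a fix is sketched below. This is a
sketch only, not the attached YARN-10501 patches: the oldHostLabels copy and the
null fallback are assumptions layered on the quoted REPLACE branch, and the
java.util Set/HashSet imports are assumed to be in scope.
{code:java}
case REPLACE:
  replaceNodeForLabels(nodeId, host.labels, labels);
  replaceLabelsForNode(nodeId, host.labels, labels);
  // Keep a copy of the host's old labels before they are overwritten, so they
  // can serve as a fallback for NM entries whose own labels field is null.
  Set<String> oldHostLabels = new HashSet<>(host.labels); // assumption, not the patch
  host.labels.clear();
  host.labels.addAll(labels);
  for (Node node : host.nms.values()) {
    // If node.labels is null the node was labelled through the host (wildcard
    // port), so fall back to the host's old labels; passing null here is what
    // leaves the stale label-to-node mapping behind.
    Set<String> oldNodeLabels =
        (node.labels != null) ? node.labels : oldHostLabels;
    replaceNodeForLabels(node.nodeId, oldNodeLabels, labels);
    node.labels = null;
  }
  break;
{code}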





[jira] [Commented] (YARN-9833) Race condition when DirectoryCollection.checkDirs() runs during container launch

2020-12-12 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17248343#comment-17248343
 ] 

Peter Bacsko commented on YARN-9833:


[~ebadger] [~Jim_Brennan] thanks for sharing some thoughts on this.

1. We weren't thinking about errorDirs because, as we were tracking down the 
issue, only {{localDirs}} seemed to be problematic, although I agree that it is 
inconsistent this way. Shall we follow up on this?

2. What Jim said is interesting. Does it mean that we potentially introduced a 
new bug by fixing this? That would be really bad. If this is really an issue, 
perhaps we can follow up on it as well by creating a new JIRA to examine the 
call hierarchies.

> Race condition when DirectoryCollection.checkDirs() runs during container 
> launch
> 
>
> Key: YARN-9833
> URL: https://issues.apache.org/jira/browse/YARN-9833
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.2.0
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Fix For: 3.3.0, 3.2.2, 3.1.4
>
> Attachments: YARN-9833-001.patch
>
>
> During endurance testing, we found a race condition that causes an empty 
> {{localDirs}} to be passed to container-executor.
> The problem is that {{DirectoryCollection.checkDirs()}} clears three 
> collections:
> {code:java}
> this.writeLock.lock();
> try {
>   localDirs.clear();
>   errorDirs.clear();
>   fullDirs.clear();
>   ...
> {code}
> This happens in critical section guarded by a write lock. When we start a 
> container, we retrieve the local dirs by calling 
> {{dirsHandler.getLocalDirs();}} which in turn invokes 
> {{DirectoryCollection.getGoodDirs()}}. The implementation of this method is:
> {code:java}
> List<String> getGoodDirs() {
>   this.readLock.lock();
>   try {
>     return Collections.unmodifiableList(localDirs);
>   } finally {
>     this.readLock.unlock();
>   }
> }
> {code}
> So we're also in a critical section guarded by the lock. But 
> {{Collections.unmodifiableList()}} only returns a _view_ of the collection, 
> not a copy. After we get the view, {{MonitoringTimerTask.run()}} might be 
> scheduled to run and immediately clear {{localDirs}}.
> This caused a weird behaviour in container-executor, which exited with error 
> code 35 (COULD_NOT_CREATE_WORK_DIRECTORIES).
> Therefore we can't just return a view; we must return a copy with 
> {{ImmutableList.copyOf()}}.
> Credits to [~snemeth] for analyzing and determining the root cause.
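For reference, a minimal sketch of what the copy-based accessor could look like.
The exact change is in the attached patch; the Guava import and the placement
inside DirectoryCollection are assumptions taken from the description above.
{code:java}
import java.util.List;
import com.google.common.collect.ImmutableList;

// Inside DirectoryCollection (sketch only; lock and field names as quoted above).
List<String> getGoodDirs() {
  this.readLock.lock();
  try {
    // Return a snapshot copy instead of a live view, so a concurrent
    // checkDirs() that clears localDirs cannot empty the caller's list.
    return ImmutableList.copyOf(localDirs);
  } finally {
    this.readLock.unlock();
  }
}
{code}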





[jira] [Commented] (YARN-10531) Be able to disable user limit factor for CapacityScheduler Leaf Queue

2020-12-12 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17248318#comment-17248318
 ] 

Hadoop QA commented on YARN-10531:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 25m 
43s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} No case conflicting files 
found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} The patch does not contain any 
@author tags. {color} |
| {color:green}+1{color} | {color:green} {color} | {color:green}  0m  0s{color} 
| {color:green}test4tests{color} | {color:green} The patch appears to include 1 
new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 
12s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
59s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
52s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
49s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
56s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m 45s{color} | {color:green}{color} | {color:green} branch has no errors when 
building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
42s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
39s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  1m 
45s{color} | {color:blue}{color} | {color:blue} Used deprecated FindBugs 
config; considering switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
42s{color} | {color:green}{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
50s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
53s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
53s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
45s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
45s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 43s{color} | 
{color:orange}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/383/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt{color}
 | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 13 new + 486 unchanged - 0 fixed = 499 total (was 486) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
47s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green}{color} | {color:green} The patch has no whitespace 
issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 43s{color} | {color:green}{color} | {color:green} patch has no errors when 
building and testing our client artifacts. {color} |
| 

[jira] [Commented] (YARN-10031) Create a general purpose log request with additional query parameters

2020-12-12 Thread Adam Antal (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17248315#comment-17248315
 ] 

Adam Antal commented on YARN-10031:
---

Committed to trunk. Thanks for the contribution [~gandras]!

> Create a general purpose log request with additional query parameters
> -
>
> Key: YARN-10031
> URL: https://issues.apache.org/jira/browse/YARN-10031
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Adam Antal
>Assignee: Andras Gyori
>Priority: Major
> Attachments: YARN-10031-WIP.001.patch, YARN-10031.001.patch, 
> YARN-10031.002.patch, YARN-10031.003.patch, YARN-10031.004.patch, 
> YARN-10031.005.patch, YARN-10031.005.patch, YARN-10031.006.patch
>
>
> The current endpoints are robust but not very flexible with regard to 
> filtering options. I suggest adding an endpoint which provides filtering 
> options.
> E.g.:
> In ATS we have multiple endpoints:
> /containers/{containerid}/logs/{filename}
> /containerlogs/{containerid}/{filename}
> We could add @QueryParam parameters to the REST endpoints, like this:
> /containers/{containerid}/logs?fileName=stderr=FAILED=nm45
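A hypothetical JAX-RS sketch of such a filtering endpoint is shown below. Only
the fileName parameter comes from the example above; the resource class name and
the other query-parameter names are illustrative placeholders, not the committed
API.
{code:java}
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.QueryParam;
import javax.ws.rs.core.Response;

// Hypothetical resource class, for illustration only.
@Path("/containers")
public class ContainerLogsResource {

  @GET
  @Path("/{containerid}/logs")
  public Response getLogs(
      @PathParam("containerid") String containerId,
      @QueryParam("fileName") String fileName,              // e.g. "stderr"
      @QueryParam("containerState") String containerState,  // placeholder name
      @QueryParam("nodeId") String nodeId) {                 // placeholder name
    // Filtering logic would go here; omitted in this sketch.
    return Response.ok().build();
  }
}
{code}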


