[jira] [Updated] (YARN-10501) Can't remove all node labels after add node label without nodemanager port
[ https://issues.apache.org/jira/browse/YARN-10501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] caozhiqiang updated YARN-10501: --- Component/s: (was: resourcemanager) yarn > Can't remove all node labels after add node label without nodemanager port > -- > > Key: YARN-10501 > URL: https://issues.apache.org/jira/browse/YARN-10501 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 3.4.0 >Reporter: caozhiqiang >Assignee: caozhiqiang >Priority: Critical > Attachments: YARN-10501.002.patch, YARN-10501.003.patch > > > When add a label to nodes without nodemanager port or use WILDCARD_PORT (0) > port, it can't remove all label info in these nodes > Reproduce process: > {code:java} > 1.yarn rmadmin -addToClusterNodeLabels "cpunode(exclusive=true)" > 2.yarn rmadmin -replaceLabelsOnNode "server001=cpunode" > 3.curl http://RM_IP:8088/ws/v1/cluster/label-mappings > {"labelsToNodes":{"entry":{"key":{"name":"cpunode","exclusivity":"true"},"value":{"nodes":["server001:0","server001:45454"],"partitionInfo":{"resourceAvailable":{"memory":"510","vCores":"1","resourceInformations":{"resourceInformation":[{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"memory-mb","resourceType":"COUNTABLE","units":"Mi","value":"510"},{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"vcores","resourceType":"COUNTABLE","units":"","value":"1"}]}}} > 4.yarn rmadmin -replaceLabelsOnNode "server001" > 5.curl http://RM_IP:8088/ws/v1/cluster/label-mappings > {"labelsToNodes":{"entry":{"key":{"name":"cpunode","exclusivity":"true"},"value":{"nodes":"server001:45454","partitionInfo":{"resourceAvailable":{"memory":"0","vCores":"0","resourceInformations":{"resourceInformation":[{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"memory-mb","resourceType":"COUNTABLE","units":"Mi","value":"0"},{"attributes":null,"maximumAllocation":"9223372036854775807","minimumAllocation":"0","name":"vcores","resourceType":"COUNTABLE","units":"","value":"0"}]}}} > {code} > You can see after the 4 process to remove nodemanager labels, the label info > is still in the node info. > {code:java} > 641 case REPLACE: > 642 replaceNodeForLabels(nodeId, host.labels, labels); > 643 replaceLabelsForNode(nodeId, host.labels, labels); > 644 host.labels.clear(); > 645 host.labels.addAll(labels); > 646 for (Node node : host.nms.values()) { > 647 replaceNodeForLabels(node.nodeId, node.labels, labels); > 649 node.labels = null; > 650 } > 651 break;{code} > The cause is in 647 line, when add labels to node without port, the 0 port > and the real nm port with be both add to node info, and when remove labels, > the parameter node.labels in 647 line is null, so it will not remove the old > label. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9833) Race condition when DirectoryCollection.checkDirs() runs during container launch
[ https://issues.apache.org/jira/browse/YARN-9833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17248343#comment-17248343 ] Peter Bacsko commented on YARN-9833: [~ebadger] [~Jim_Brennan] thanks for sharing some thougts on this. 1. We were not thinking about errorDirs because as we were tracking down the issue, only {{localDirs}} seemed to be problematic, although I agree that it is inconsistent this way. Shall we follow-up on this? 2. What Jim said is interesting. Does it mean that we potentially introduced a new bug by fixing this? That would be really bad. If this is really an issue, perhaps we can also follow-up on this, too, by creating a new JIRA to examine call hierarchies. > Race condition when DirectoryCollection.checkDirs() runs during container > launch > > > Key: YARN-9833 > URL: https://issues.apache.org/jira/browse/YARN-9833 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.2.0 >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Fix For: 3.3.0, 3.2.2, 3.1.4 > > Attachments: YARN-9833-001.patch > > > During endurance testing, we found a race condition that cause an empty > {{localDirs}} being passed to container-executor. > The problem is that {{DirectoryCollection.checkDirs()}} clears three > collections: > {code:java} > this.writeLock.lock(); > try { > localDirs.clear(); > errorDirs.clear(); > fullDirs.clear(); > ... > {code} > This happens in critical section guarded by a write lock. When we start a > container, we retrieve the local dirs by calling > {{dirsHandler.getLocalDirs();}} which in turn invokes > {{DirectoryCollection.getGoodDirs()}}. The implementation of this method is: > {code:java} > List getGoodDirs() { > this.readLock.lock(); > try { > return Collections.unmodifiableList(localDirs); > } finally { > this.readLock.unlock(); > } > } > {code} > So we're also in a critical section guarded by the lock. But > {{Collections.unmodifiableList()}} only returns a _view_ of the collection, > not a copy. After we get the view, {{MonitoringTimerTask.run()}} might be > scheduled to run and immediately clears {{localDirs}}. > This caused a weird behaviour in container-executor, which exited with error > code 35 (COULD_NOT_CREATE_WORK_DIRECTORIES). > Therefore we can't just return a view, we must return a copy with > {{ImmutableList.copyOf()}}. > Credits to [~snemeth] for analyzing and determining the root cause. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-10531) Be able to disable user limit factor for CapacityScheduler Leaf Queue
[ https://issues.apache.org/jira/browse/YARN-10531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17248318#comment-17248318 ] Hadoop QA commented on YARN-10531: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Logfile || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 25m 43s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || || | {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} No case conflicting files found. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} {color} | {color:green} 0m 0s{color} | {color:green}test4tests{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 12s{color} | {color:green}{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 59s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 52s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 49s{color} | {color:green}{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 56s{color} | {color:green}{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 16m 45s{color} | {color:green}{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 42s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 39s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 {color} | | {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 1m 45s{color} | {color:blue}{color} | {color:blue} Used deprecated FindBugs config; considering switching to SpotBugs. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 42s{color} | {color:green}{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 50s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 53s{color} | {color:green}{color} | {color:green} the patch passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 53s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 45s{color} | {color:green}{color} | {color:green} the patch passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 45s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 43s{color} | {color:orange}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/383/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 13 new + 486 unchanged - 0 fixed = 499 total (was 486) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 47s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 43s{color} | {color:green}{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | |
[jira] [Commented] (YARN-10031) Create a general purpose log request with additional query parameters
[ https://issues.apache.org/jira/browse/YARN-10031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17248315#comment-17248315 ] Adam Antal commented on YARN-10031: --- Committed to trunk. Thanks for the contribution [~gandras]! > Create a general purpose log request with additional query parameters > - > > Key: YARN-10031 > URL: https://issues.apache.org/jira/browse/YARN-10031 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Adam Antal >Assignee: Andras Gyori >Priority: Major > Attachments: YARN-10031-WIP.001.patch, YARN-10031.001.patch, > YARN-10031.002.patch, YARN-10031.003.patch, YARN-10031.004.patch, > YARN-10031.005.patch, YARN-10031.005.patch, YARN-10031.006.patch > > > The current endpoints are robust but not very flexible with regards to > filtering options. I suggest to add an endpoint which provides filtering > options. > E.g.: > In ATS we have multiple endpoints: > /containers/{containerid}/logs/{filename} > /containerlogs/{containerid}/{filename} > We could add @QueryParams parameters to the REST endpoints like this: > /containers/{containerid}/logs?fileName=stderr=FAILED=nm45 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org