[jira] [Commented] (YARN-9635) Nodes page displayed duplicate nodes
[ https://issues.apache.org/jira/browse/YARN-9635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16889640#comment-16889640 ] Hadoop QA commented on YARN-9635: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 34s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 44s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 35s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 46s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 14s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 3s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 4s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 56s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 46s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 23s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 57s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 2s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 50s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 1s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 24s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 1s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 4m 6s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 27s{color} | {color:green} hadoop-yarn-site in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 51s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 91m 12s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker |
[jira] [Commented] (YARN-9688) Variable description error of method in stateMachine class
[ https://issues.apache.org/jira/browse/YARN-9688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16889637#comment-16889637 ] runzhou wu commented on YARN-9688: -- i want to fix this bug.thx > Variable description error of method in stateMachine class > --- > > Key: YARN-9688 > URL: https://issues.apache.org/jira/browse/YARN-9688 > Project: Hadoop YARN > Issue Type: Bug >Reporter: runzhou wu >Priority: Trivial > > StateMachineFactory class > /** > * Effect a transition due to the effecting stimulus. > * @param {color:#FF}state{color} current state > * @param eventType trigger to initiate the transition > * @param {color:#FF}cause{color} causal eventType context > * @return transitioned state > */ > private STATE doTransition > (OPERAND operand, STATE oldState, EVENTTYPE eventType, EVENT event) > throws InvalidStateTransitionException { -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-9688) Variable description error of method in stateMachine class
runzhou wu created YARN-9688: Summary: Variable description error of method in stateMachine class Key: YARN-9688 URL: https://issues.apache.org/jira/browse/YARN-9688 Project: Hadoop YARN Issue Type: Bug Reporter: runzhou wu StateMachineFactory class /** * Effect a transition due to the effecting stimulus. * @param {color:#FF}state{color} current state * @param eventType trigger to initiate the transition * @param {color:#FF}cause{color} causal eventType context * @return transitioned state */ private STATE doTransition (OPERAND operand, STATE oldState, EVENTTYPE eventType, EVENT event) throws InvalidStateTransitionException { -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9635) Nodes page displayed duplicate nodes
[ https://issues.apache.org/jira/browse/YARN-9635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wanqiang Ji updated YARN-9635: -- Attachment: YARN-9635.002.patch > Nodes page displayed duplicate nodes > > > Key: YARN-9635 > URL: https://issues.apache.org/jira/browse/YARN-9635 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager, resourcemanager >Affects Versions: 3.2.0 >Reporter: Wanqiang Ji >Assignee: Wanqiang Ji >Priority: Major > Attachments: UI2-nodes.jpg, YARN-9635.001.patch, YARN-9635.002.patch > > > Steps: > * shutdown nodes > * start nodes > Nodes Page: > !UI2-nodes.jpg! -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9217) Nodemanager will fail to start if GPU is misconfigured on the node or GPU drivers missing
[ https://issues.apache.org/jira/browse/YARN-9217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16889525#comment-16889525 ] Hadoop QA commented on YARN-9217: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 35s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 49s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 5s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 25s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 41s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 36s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 0s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 58s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 58s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 21s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 7 new + 15 unchanged - 2 fixed = 22 total (was 17) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 37s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 43s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 7s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 20m 54s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 29s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 73m 54s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager | | | Possible null pointer dereference in org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.gpu.GpuDiscoverer.getFileNameFromFile(File) due to return value of called method Dereferenced at GpuDiscoverer.java:org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.gpu.GpuDiscoverer.getFileNameFromFile(File) due to return value of called method Dereferenced at GpuDiscoverer.java:[line 347] | \\ \\ || Subsystem || Report/Notes || | Docker | Client=18.09.8 Server=18.09.8 Image:yetus/hadoop:bdbca0e53b4 | | JIRA Issue | YARN-9217 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12975315/YARN-9217.005.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 279c71782a1a 4.15.0-48-generic #51-Ubuntu SMP Wed Apr 3 08:28:49 UTC 2019 x86_64 x86_64 x86_64
[jira] [Commented] (YARN-9217) Nodemanager will fail to start if GPU is misconfigured on the node or GPU drivers missing
[ https://issues.apache.org/jira/browse/YARN-9217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16889508#comment-16889508 ] Peter Bacsko commented on YARN-9217: [~snemeth] I rebased the patch but the amount of conflicts forced me to abandon some changes present in the original patches, especially in {{GpuDiscoverer}}. A lot of changes have been added in the meantime. v5 is still not the final patch, I expect checkstyle, findbugs or even javac errors to show up. > Nodemanager will fail to start if GPU is misconfigured on the node or GPU > drivers missing > - > > Key: YARN-9217 > URL: https://issues.apache.org/jira/browse/YARN-9217 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 3.0.0, 3.1.0 >Reporter: Antal Bálint Steinbach >Assignee: Peter Bacsko >Priority: Major > Attachments: YARN-9217.001.patch, YARN-9217.002.patch, > YARN-9217.003.patch, YARN-9217.004.patch, YARN-9217.005.patch > > > Nodemanager will not start > 1. If Autodiscovery is enabled: > * If nvidia-smi path is misconfigured or the file does not exist. > * There is 0 GPU found > * If the file exists but it is not pointing to an nvidia-smi > * if the binary is ok but there is an IOException > 2. If the manually configured GPU devices are misconfigured > * Any index:minor number format failure will cause a problem > * 0 configured device will cause a problem > * NumberFormatException is not handled > It would be a better option to add warnings about the configuration, set 0 > available GPUs and let the node work and run non-gpu jobs. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-9217) Nodemanager will fail to start if GPU is misconfigured on the node or GPU drivers missing
[ https://issues.apache.org/jira/browse/YARN-9217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko updated YARN-9217: --- Attachment: YARN-9217.005.patch > Nodemanager will fail to start if GPU is misconfigured on the node or GPU > drivers missing > - > > Key: YARN-9217 > URL: https://issues.apache.org/jira/browse/YARN-9217 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 3.0.0, 3.1.0 >Reporter: Antal Bálint Steinbach >Assignee: Peter Bacsko >Priority: Major > Attachments: YARN-9217.001.patch, YARN-9217.002.patch, > YARN-9217.003.patch, YARN-9217.004.patch, YARN-9217.005.patch > > > Nodemanager will not start > 1. If Autodiscovery is enabled: > * If nvidia-smi path is misconfigured or the file does not exist. > * There is 0 GPU found > * If the file exists but it is not pointing to an nvidia-smi > * if the binary is ok but there is an IOException > 2. If the manually configured GPU devices are misconfigured > * Any index:minor number format failure will cause a problem > * 0 configured device will cause a problem > * NumberFormatException is not handled > It would be a better option to add warnings about the configuration, set 0 > available GPUs and let the node work and run non-gpu jobs. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-9596) QueueMetrics has incorrect metrics when labelled partitions are involved
[ https://issues.apache.org/jira/browse/YARN-9596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16889407#comment-16889407 ] Manikandan R commented on YARN-9596: IIUC, This case would arise in this example: Node N has been mapped to Label X (Non exclusive). Queue A has been configured with ANY Node label. App A requested resources from Queue A and its containers ran on Node N for some reasons. During {{AbstractCSQueue#allocateResource}} call, Node partition (using {{SchedulerNode}} ) would get used for calculation. As you explained, code will calculate {{available}} to {{}} and there is no use as anyways {{QueueMetrics#setAvailableResourcesToQueue}} process only "default" partition as of now. But, with YARN-6492 coming in, I think we will revisit this particular {{if}} check as we need {{available}} for all used partitions. I guess, it would be bit tricky to do. {{Queue#getNodeLabelsForQueue}} will have only the labels for which some min and max resource has been configured. Based on the above example, metrics computation should happen for partition X also properly. Can you please validate my understanding? > QueueMetrics has incorrect metrics when labelled partitions are involved > > > Key: YARN-9596 > URL: https://issues.apache.org/jira/browse/YARN-9596 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Affects Versions: 2.8.0, 3.3.0 >Reporter: Muhammad Samir Khan >Assignee: Muhammad Samir Khan >Priority: Major > Attachments: Screen Shot 2019-06-03 at 4.41.45 PM.png, Screen Shot > 2019-06-03 at 4.44.15 PM.png, YARN-9596.001.patch, YARN-9596.002.patch, > YARN-9596.003.patch > > > After YARN-6467, QueueMetrics should only be tracking metrics for the default > partition. However, the metrics are incorrect when labelled partitions are > involved. > Steps to reproduce > == > # Configure capacity-scheduler.xml with label configuration > # Add label "test" to cluster and replace label on node1 to be "test" > # Note down "totalMB" at > /ws/v1/cluster/metrics > # Start first job on test queue. > # Start second job on default queue (does not work if the order of two jobs > is swapped). > # While the two applications are running, the "totalMB" at > /ws/v1/cluster/metrics will go down by > the amount of MB used by the first job (screenshots attached). > Alternately: > In > TestNodeLabelContainerAllocation.testQueueMetricsWithLabelsOnDefaultLabelNode(), > add the following line at the end of the test before rm1.close(): > CSQueue rootQueue = cs.getRootQueue(); > assertEquals(10*GB, > rootQueue.getMetrics().getAvailableMB() + > rootQueue.getMetrics().getAllocatedMB()); > There are two nodes of 10GB each and only one of them have a non-default > label. The test will also fail against 20*GB check. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org