[ https://issues.apache.org/jira/browse/YARN-11844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18012718#comment-18012718 ]

ASF GitHub Bot commented on YARN-11844:
---------------------------------------

hadoop-yetus commented on PR #7857:
URL: https://github.com/apache/hadoop/pull/7857#issuecomment-3166379902

   :broken_heart: **-1 overall**
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 51s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  |
   | +0 :ok: |  xmllint  |   0m  0s |  |  xmllint was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 1 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |   9m  7s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  37m 37s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   5m 57s |  |  trunk passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |   5m  7s |  |  trunk passed with JDK Private Build-1.8.0_452-8u452-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  checkstyle  |   2m  0s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   2m 51s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   2m 55s |  |  trunk passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   2m 44s |  |  trunk passed with JDK Private Build-1.8.0_452-8u452-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  spotbugs  |   5m 29s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  42m 12s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 59s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   1m 44s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   5m 25s |  |  the patch passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   5m 25s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   5m  4s |  |  the patch passed with JDK Private Build-1.8.0_452-8u452-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  javac  |   5m  4s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | -0 :warning: |  checkstyle  |   1m 55s | [/results-checkstyle-hadoop-yarn-project_hadoop-yarn.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7857/5/artifact/out/results-checkstyle-hadoop-yarn-project_hadoop-yarn.txt) |  hadoop-yarn-project/hadoop-yarn: The patch generated 2 new + 164 unchanged - 0 fixed = 166 total (was 164)  |
   | +1 :green_heart: |  mvnsite  |   2m 34s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   2m 37s |  |  the patch passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   2m 28s |  |  the patch passed with JDK Private Build-1.8.0_452-8u452-ga~us1-0ubuntu1~20.04-b09  |
   | +1 :green_heart: |  spotbugs  |   5m 47s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  42m 26s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |   1m  5s |  |  hadoop-yarn-api in the patch passed.  |
   | +1 :green_heart: |  unit  |   5m 48s |  |  hadoop-yarn-common in the patch passed.  |
   | -1 :x: |  unit  |  27m 14s | [/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7857/5/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt) |  hadoop-yarn-server-nodemanager in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   1m  0s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 226m 16s |  |  |
   
   
   | Reason | Tests |
   |-------:|:------|
   | Failed junit tests | hadoop.yarn.server.nodemanager.containermanager.resourceplugin.gpu.TestGpuDiscoverer |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.51 ServerAPI=1.51 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7857/5/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/7857 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint |
   | uname | Linux b2f5bc390399 5.15.0-144-generic #157-Ubuntu SMP Mon Jun 16 07:33:10 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 7608230d90d8ded10d5937ad9694c750460df070 |
   | Default Java | Private Build-1.8.0_452-8u452-ga~us1-0ubuntu1~20.04-b09 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_452-8u452-ga~us1-0ubuntu1~20.04-b09 |
   | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7857/5/testReport/ |
   | Max. process+thread count | 589 (vs. ulimit of 5500) |
   | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn |
   | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7857/5/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
> Support configuration of retry policy on GPU discovery
> ------------------------------------------------------
>
>                 Key: YARN-11844
>                 URL: https://issues.apache.org/jira/browse/YARN-11844
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: gpu, nodemanager
>            Reporter: Chris Nauroth
>            Assignee: Chris Nauroth
>            Priority: Major
>              Labels: pull-request-available
>
> The NodeManager invokes an external binary (e.g. {{nvidia-smi}}) to discover 
> attached GPUs. Right now, there is a hard-coded 10-second timeout on 
> execution of this binary and a hard-coded max error count of 10, beyond which 
> the NodeManager will stop attempting discovery. This change will provide new 
> configuration properties to control both the timeout and the max errors, 
> which is useful in environments where there may be a delay in binding the GPU 
> to the host. Default values for the new configuration properties will be set 
> so as to maintain the existing behavior.
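
The behavior described above could be sketched roughly as follows. This is an illustration only, not the code from the pull request: the class name, the two property keys, and the use of `java.util.Properties` in place of Hadoop's configuration classes are all assumptions made for the example. The defaults (10 seconds, 10 errors) come from the description; the exact boundary at which discovery stops (here, once the error count reaches the maximum) is also an assumption.

```java
import java.util.Properties;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Illustrative sketch only; not the NodeManager's actual discovery code. */
class GpuDiscoveryRetrySketch {
  // Hypothetical property names, invented for this example.
  static final String TIMEOUT_MS_KEY = "example.gpu.discovery.timeout-ms";
  static final String MAX_ERRORS_KEY = "example.gpu.discovery.max-errors";

  // Defaults mirror the hard-coded values named in the issue description.
  static final long DEFAULT_TIMEOUT_MS = 10_000L;
  static final int DEFAULT_MAX_ERRORS = 10;

  private final long timeoutMs;
  private final int maxErrors;
  private int errorCount = 0;

  GpuDiscoveryRetrySketch(Properties conf) {
    this.timeoutMs = Long.parseLong(
        conf.getProperty(TIMEOUT_MS_KEY, String.valueOf(DEFAULT_TIMEOUT_MS)));
    this.maxErrors = Integer.parseInt(
        conf.getProperty(MAX_ERRORS_KEY, String.valueOf(DEFAULT_MAX_ERRORS)));
  }

  long getTimeoutMs() { return timeoutMs; }
  int getMaxErrors() { return maxErrors; }
  int getErrorCount() { return errorCount; }

  /**
   * Runs one discovery attempt, bounded by the configured timeout.
   * Returns the discovery output, or null if the attempt failed, timed out,
   * or was skipped because the error budget is already used up.
   */
  String tryDiscover(Callable<String> discovery) {
    if (errorCount >= maxErrors) {
      return null; // Assumed boundary: stop attempting once the limit is reached.
    }
    ExecutorService executor = Executors.newSingleThreadExecutor();
    try {
      Future<String> future = executor.submit(discovery);
      return future.get(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (ExecutionException | TimeoutException e) {
      errorCount++;
      return null;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // Preserve the interrupt for callers.
      errorCount++;
      return null;
    } finally {
      executor.shutdownNow(); // Interrupts an attempt that is still running.
    }
  }
}
```

In this sketch the `Callable` stands in for the invocation of the external binary. An environment where the GPU binds to the host slowly would raise both values, while an empty configuration keeps the existing 10-second and 10-error behavior.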



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
