[
https://issues.apache.org/jira/browse/YARN-9139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16760748#comment-16760748
]
Szilard Nemeth commented on YARN-9139:
--------------------------------------
I changed how the existence of the GPU discovery binary is validated.
With the old code, GpuDiscoverer.initialize() did not throw an exception if the
binary was not found; the exception was thrown later, when
GpuDiscoverer.getGpusUsableByYarn was called.
Most of the tests in TestGpuResourceHandler relied on the exception being
thrown only later, from GpuDiscoverer.getGpusUsableByYarn. Patch002 therefore
broke almost all of them: the tests only call initialize, and the exception is
now thrown at that earlier stage (fail-fast).
Because binaryPath falls back to "/usr/local/nvidia/bin/nvidia-smi" when the
Configuration object has no explicit setting for the path, I had to modify all
the tests to set the path explicitly in the Configuration object. This makes
the tests independent of the runtime environment, since Jenkins and most
development environments most likely do not have nvidia-smi installed under
the default path.
Patch003 fixes these issues.
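
A minimal sketch of the fail-fast check described above, and of how a test can
supply the binary path explicitly instead of relying on the default. This is
not the Hadoop code: the class name, the configuration key, the helper that
creates a fake binary, and the Map standing in for the Configuration object
are all assumptions made for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

/** Stand-alone illustration; not the actual GpuDiscoverer. */
public class FailFastDiscovererSketch {
  // Hypothetical configuration key, used only in this sketch.
  public static final String PATH_KEY = "sketch.gpu.discovery.binary.path";
  // Default path quoted in the comment above.
  public static final String DEFAULT_BINARY = "/usr/local/nvidia/bin/nvidia-smi";

  /** Resolves the binary path and fails immediately if no such file exists. */
  public static String initialize(Map<String, String> conf) {
    String binaryPath = conf.getOrDefault(PATH_KEY, DEFAULT_BINARY);
    if (!new File(binaryPath).isFile()) {
      throw new IllegalStateException(
          "GPU discovery binary not found: " + binaryPath);
    }
    return binaryPath;
  }

  /** Test helper: creates an empty temporary file standing in for the binary. */
  public static String createFakeBinary() {
    try {
      File fake = File.createTempFile("fake-nvidia-smi", ".bin");
      fake.deleteOnExit();
      return fake.getAbsolutePath();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();

    // A test sets the path explicitly, so it does not depend on the host.
    conf.put(PATH_KEY, createFakeBinary());
    System.out.println("initialized with " + initialize(conf));

    // With a missing binary, the error surfaces in initialize, not later.
    conf.put(PATH_KEY, "/nonexistent/nvidia-smi");
    try {
      initialize(conf);
    } catch (IllegalStateException e) {
      System.out.println("failed fast: " + e.getMessage());
    }
  }
}
```

With this shape, a test that never sets the path would fail in initialize on
any machine without nvidia-smi under the default location, which is why each
test needs to set the path itself.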
> Simplify initializer code of GpuDiscoverer
> ------------------------------------------
>
> Key: YARN-9139
> URL: https://issues.apache.org/jira/browse/YARN-9139
> Project: Hadoop YARN
> Issue Type: Improvement
> Reporter: Szilard Nemeth
> Assignee: Szilard Nemeth
> Priority: Major
> Attachments: YARN-9139.001.patch, YARN-9139.002.patch
>
>