[
https://issues.apache.org/jira/browse/YARN-9138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16775585#comment-16775585
]
Szilard Nemeth commented on YARN-9138:
--------------------------------------
Hi [~adam.antal]!
Thanks for your comments, they are very detailed and valuable.
1. Good point, extracted most of the repetitive stuff into methods.
2. As GpuDiscoverer finds out where nvidia-smi lives based on the path provided
in the config, I wanted to keep the behaviour in
tests as close as possible to the production code. As the script is invoked by
a call to Shell.execCommand(), we can count this as a hard dependency of this
class. It is quite hard to mock, and if I did that, it would change
GpuDiscoverer in a more fundamental way. To be precise, the bash script I
"generate" in the test is not creating any new files, just echoing the contents
of a very basic XML. I would like to keep this as it is. The only change I made
with my new patch regarding this is the extraction of common things into
methods.
3. In my experience, logging is not common in tests. I'm not
saying that it's good or bad, that's just what I have been seeing. Anyway,
I added some logging instead of the comments in
testGetGpuDeviceInformationFaultyNvidiaSmiScriptConsecutiveRun. If you have
ideas on how to have better logs in this test class, feel free to report a new
jira under YARN-9304.
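
To illustrate the idea in point 2, here is a minimal sketch of a generated
fake nvidia-smi script. It does not use the code from the patch: the class
name, method names, the XML content and the "-x -q" arguments are
assumptions made for illustration, and ProcessBuilder stands in for
Shell.execCommand() so that the example runs without Hadoop on the
classpath. It assumes a Unix-like system with bash available.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of the YARN-9138 patch.
class FakeNvidiaSmiSketch {

  // Writes an executable bash script named "nvidia-smi" into dir. The script
  // only echoes the given XML (which must not contain single quotes) and
  // then exits with the given code; it creates no files of its own.
  static Path createFakeNvidiaSmi(Path dir, String xml, int exitCode)
      throws IOException {
    Path script = dir.resolve("nvidia-smi");
    String body = "#!/usr/bin/env bash\n"
        + "echo '" + xml + "'\n"
        + "exit " + exitCode + "\n";
    Files.write(script, body.getBytes(StandardCharsets.UTF_8));
    if (!script.toFile().setExecutable(true)) {
      throw new IOException("Could not make " + script + " executable");
    }
    return script;
  }

  // Runs the script by its path and returns its trimmed output. The
  // arguments are ignored by the fake script. A non-zero exit code is
  // reported as an IOException.
  static String run(Path script) throws IOException, InterruptedException {
    Process process = new ProcessBuilder(script.toString(), "-x", "-q")
        .redirectErrorStream(true)
        .start();
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (InputStream in = process.getInputStream()) {
      byte[] chunk = new byte[4096];
      int read;
      while ((read = in.read(chunk)) != -1) {
        buffer.write(chunk, 0, read);
      }
    }
    int exitCode = process.waitFor();
    if (exitCode != 0) {
      throw new IOException("Fake nvidia-smi exited with code " + exitCode);
    }
    return new String(buffer.toByteArray(), StandardCharsets.UTF_8).trim();
  }

  public static void main(String[] args) throws Exception {
    Path dir = Files.createTempDirectory("fake-nvidia-smi");
    // Illustrative XML only; real nvidia-smi output is much larger.
    Path script = createFakeNvidiaSmi(dir,
        "<nvidia_smi_log><attached_gpus>1</attached_gpus></nvidia_smi_log>",
        0);
    System.out.println(run(script));
  }
}
```

Passing a non-zero exit code to createFakeNvidiaSmi() gives a "faulty"
script, which is roughly the kind of case the error handling tests need to
cover.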
About the less concerning things:
1. It was a great idea to extract the parent directory name to a constant so I
did that!
2. I guess "RunLinuxGpuResourceDiscoverPluginConfigTest" is set by either the
user running the JVM (with a system property) or by some jenkins job. Probably
[~sunilg] can tell you more on that as I didn't modify the code and he was the
committer of this back at the end of 2017.
3. Separation of testLinuxGpuResourceDiscoverPluginConfig: I agree, but I would
create a follow-up jira for that. The purpose of my change was not to refactor
but rather extend the test coverage.
4. I didn't get your comment about the separation of
"getNumberOfUsableGpusFromConfig".
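
Regarding point 2, a minimal sketch of how a JVM system property can act as
an on/off switch for a test. The class and method names are hypothetical,
and this is only an assumption about how such a flag could be read, not a
copy of the existing test code.

```java
// Hypothetical helper, not taken from the Hadoop code base.
class SystemPropertyGateSketch {

  static final String FLAG = "RunLinuxGpuResourceDiscoverPluginConfigTest";

  // True only when the property is set to "true" (case-insensitive).
  // An unset property yields false, so the test would be skipped by default.
  static boolean isEnabled() {
    return Boolean.parseBoolean(System.getProperty(FLAG));
  }

  public static void main(String[] args) {
    System.out.println(FLAG + " enabled: " + isEnabled());
  }
}
```

With such a check, the flag could be switched on by passing
-DRunLinuxGpuResourceDiscoverPluginConfigTest=true to the JVM.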
Please check my latest patch!
> Test error handling of nvidia-smi binary execution of GpuDiscoverer
> -------------------------------------------------------------------
>
> Key: YARN-9138
> URL: https://issues.apache.org/jira/browse/YARN-9138
> Project: Hadoop YARN
> Issue Type: Improvement
> Reporter: Szilard Nemeth
> Assignee: Szilard Nemeth
> Priority: Major
> Attachments: YARN-9138.001.patch, YARN-9138.002.patch,
> YARN-9138.003.patch
>
>
> The code that executes nvidia-smi (doing GPU device auto-discovery) doesn't
> have much test coverage.
> This patch adds tests to this part of the code.