[ 
https://issues.apache.org/jira/browse/YARN-9138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16775585#comment-16775585
 ] 

Szilard Nemeth commented on YARN-9138:
--------------------------------------

Hi [~adam.antal]!

Thanks for your comments, they are very detailed and valuable.

1. Good point, extracted most of the repetitive stuff into methods.

2. As GpuDiscoverer finds out where nvidia-smi lives based on the path provided 
in the config, I wanted to keep the behaviour in tests as close as possible to 
the production code. As the script is invoked by a call to Shell.execCommand(), 
we can count this as a hard dependency of this class. It's kinda hard to mock, 
and if I did that, it would change GpuDiscoverer in a more fundamental way. To 
be precise, the bash script I "generate" in the test does not create any new 
files, it just echoes the contents of a very basic XML. I would like to keep 
this as it is. The only change I made in my new patch regarding this is the 
extraction of common things into methods.
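
To illustrate the approach in point 2, here is a minimal sketch. It is not the 
code from the patch. The class name, method names and XML content are 
hypothetical, and it uses the plain JDK ProcessBuilder as a stand-in for 
Shell.execCommand() so that it runs without Hadoop on the classpath. It assumes 
/bin/bash exists and that the temp directory allows executing files.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, for illustration only (not part of YARN-9138).
public class FakeNvidiaSmiSketch {

  // Writes an executable script named "nvidia-smi" into dir.
  // The script creates no files; it only echoes a very basic XML document.
  public static Path createFakeNvidiaSmi(Path dir) throws IOException {
    Path script = dir.resolve("nvidia-smi");
    String body = "#!/bin/bash\n"
        + "echo '<nvidia_smi_log></nvidia_smi_log>'\n";
    Files.write(script, body.getBytes(StandardCharsets.UTF_8));
    if (!script.toFile().setExecutable(true)) {
      throw new IOException("Could not make " + script + " executable");
    }
    return script;
  }

  // Runs the script and returns its trimmed output.
  // Stand-in for Shell.execCommand(); requires Java 9+ for readAllBytes().
  public static String run(Path script, String... args)
      throws IOException, InterruptedException {
    String[] command = new String[args.length + 1];
    command[0] = script.toString();
    System.arraycopy(args, 0, command, 1, args.length);

    ProcessBuilder builder = new ProcessBuilder(command);
    builder.redirectErrorStream(true);
    Process process = builder.start();
    byte[] output = process.getInputStream().readAllBytes();
    int exitCode = process.waitFor();
    if (exitCode != 0) {
      throw new IOException("Script exited with code " + exitCode);
    }
    return new String(output, StandardCharsets.UTF_8).trim();
  }

  public static void main(String[] args) throws Exception {
    Path dir = Files.createTempDirectory("fake-nvidia-smi");
    Path script = createFakeNvidiaSmi(dir);
    // The arguments are ignored by the fake script.
    System.out.println(run(script, "-x", "-q"));
  }
}
```

In a test, the directory returned by createTempDirectory would be what gets 
put into the config as the binary path, so the production lookup logic stays 
untouched.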

3. Logging is not a common thing in tests, as far as my experience tells. I'm 
not saying that it's good or bad, that's just what I have been seeing. Anyway, 
I added some logging instead of the comments in 
testGetGpuDeviceInformationFaultyNvidiaSmiScriptConsecutiveRun. If you have 
ideas on how to have better logs in this test class, feel free to file a new 
jira under YARN-9304.

About the less concerning things: 
1. It was a great idea to extract the parent directory name to a constant so I 
did that!
2. I guess "RunLinuxGpuResourceDiscoverPluginConfigTest" is set either by the 
user running the JVM (with a system property) or by some Jenkins job. Probably 
[~sunilg] can tell you more on that, as I didn't modify the code and he was the 
committer of this back at the end of 2017.
3. Separation of testLinuxGpuResourceDiscoverPluginConfig: I agree, but I would 
create a follow-up jira for that. The purpose of my change was not to refactor 
but rather extend the test coverage.
4. I didn't get your comment about the separation of 
"getNumberOfUsableGpusFromConfig".

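Regarding the system property mentioned in point 2 of the list above, this is 
a rough sketch of how such a gate typically works. It is an assumption, not a 
copy of the existing test code, and the class and method names are 
hypothetical. The property would be passed as 
-DRunLinuxGpuResourceDiscoverPluginConfigTest=true when starting the JVM.

```java
// Hypothetical helper, for illustration only.
public class GpuConfigTestGate {

  static final String PROPERTY = "RunLinuxGpuResourceDiscoverPluginConfigTest";

  // Returns true only if the system property is set to "true"
  // (case-insensitive). An unset property counts as disabled.
  public static boolean isEnabled() {
    return Boolean.parseBoolean(System.getProperty(PROPERTY));
  }

  public static void main(String[] args) {
    if (!isEnabled()) {
      System.out.println("Skipping: " + PROPERTY + " is not set to true");
      return;
    }
    System.out.println("Running the config test");
  }
}
```
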
Please check my latest patch!

> Test error handling of nvidia-smi binary execution of GpuDiscoverer
> -------------------------------------------------------------------
>
>                 Key: YARN-9138
>                 URL: https://issues.apache.org/jira/browse/YARN-9138
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Szilard Nemeth
>            Assignee: Szilard Nemeth
>            Priority: Major
>         Attachments: YARN-9138.001.patch, YARN-9138.002.patch, 
> YARN-9138.003.patch
>
>
> The code that executes nvidia-smi (doing GPU device auto-discovery) doesn't 
> have much test coverage.
> This patch adds tests to this part of the code.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
