[
https://issues.apache.org/jira/browse/YARN-9100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16748059#comment-16748059
]
Peter Bacsko commented on YARN-9100:
------------------------------------
Thanks for the improvements Szilard. Some thoughts:
1.
{noformat}
87 } catch (InterruptedException e) {
88 // On any interrupt, break the loop and continue execution.
89 break;{noformat}
At least log something in case of an {{InterruptedException.}} Also, in cases
like this, restoring the interrupted flag with
{{Thread.currentThread.interrupt()}} is desirable.
2. In {{logStatement()}} you log twice if TRACE is enabled (I guess?)
3. Use SLF4J as an API, not Commons Logging.
4. You don't log anything in case of a timeout.
5. You can define both {{check}} and {{nonNullCheck}} at the same time. There
are two problems with this. First, ordinary {{check}} is not used in the code.
Second, if both are used, then the result of {{nonNullCheck}} is simply ignored.
In general I feel that having the retry logic in a separate class a bit of an
overengineering. It would be justified it the patch modified classes other than
{{GpuResourceAllocator}}. But for only a single class, it looks like an
overkill.
Also, check out this project, which might be good for us:
https://github.com/rholder/guava-retrying
> Add tests for GpuResourceAllocator and do minor code cleanup
> ------------------------------------------------------------
>
> Key: YARN-9100
> URL: https://issues.apache.org/jira/browse/YARN-9100
> Project: Hadoop YARN
> Issue Type: Improvement
> Reporter: Szilard Nemeth
> Assignee: Szilard Nemeth
> Priority: Major
> Attachments: YARN-9100.001.patch, YARN-9100.002.patch,
> YARN-9100.003.patch
>
>
> Add tests for GpuResourceAllocator and do minor code cleanup
> - Improved log and exception messages
> - Added some new debug logs
> - Some methods are named like *Copy, these are returning copies of internal
> data structures. The word "copy" is just a noise in their name, so they have
> been renamed. Additionally, the copied data structures modified to be
> immutable.
> - The waiting loop in method assignGpus were decoupled into a new class,
> RetryCommand.
> Some more words about the new class RetryCommand:
> There are some similar waiting loops in the code in: AMRMClient,
> AMRMClientAsync and even in GenericTestUtils (see waitFor method).
> RetryCommand could be a future replacement of these duplicated code, as it
> gives a solution to this waiting loop problem in a generic way.
> The only downside of the usage of RetryCommand in GpuResourceAllocator
> (startGpuAssignmentLoop) is the ugly exception handling part, but that's
> solely because how Java deals with checked exceptions vs. lambdas. If there's
> a cleaner way to solve the exception handling, I'm open for any suggestions.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]