[ https://issues.apache.org/jira/browse/YARN-9100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16748059#comment-16748059 ]

Peter Bacsko commented on YARN-9100:
------------------------------------

Thanks for the improvements, Szilard. Some thoughts:



1.
{noformat}
} catch (InterruptedException e) {
  // On any interrupt, break the loop and continue execution.
  break;
{noformat}
At least log something in case of an {{InterruptedException}}. Also, in cases
like this, it is desirable to restore the interrupted flag with
{{Thread.currentThread().interrupt()}}.
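A minimal sketch of what point 1 could look like, assuming a simple poll-and-sleep loop. {{InterruptAwareWait}} and {{waitFor}} are hypothetical names, not taken from the patch, and a {{Consumer<String>}} stands in for an SLF4J logger method (e.g. {{LOG::warn}}) so the sketch runs without extra dependencies:

```java
import java.util.function.BooleanSupplier;
import java.util.function.Consumer;

final class InterruptAwareWait {

  /**
   * Hypothetical helper: polls the condition every sleepMs milliseconds, at
   * most maxAttempts times. Returns true once the condition holds, false if
   * the attempts run out or the thread is interrupted. The log consumer
   * stands in for an SLF4J method reference such as LOG::warn.
   */
  static boolean waitFor(BooleanSupplier condition, int maxAttempts,
      long sleepMs, Consumer<String> log) {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      if (condition.getAsBoolean()) {
        return true;
      }
      try {
        Thread.sleep(sleepMs);
      } catch (InterruptedException e) {
        // Leave a trace of why the loop stopped early...
        log.accept("Interrupted while waiting, giving up after " + attempt
            + " attempt(s)");
        // ...and restore the flag that sleep() cleared, so callers can
        // still see that an interrupt was requested.
        Thread.currentThread().interrupt();
        break;
      }
    }
    return false;
  }
}
```

Restoring the flag matters because {{Thread.sleep()}} clears the interrupted status when it throws; without the {{interrupt()}} call, code further up the stack cannot tell that the thread was asked to stop.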

2. In {{logStatement()}}, you seem to log twice when TRACE is enabled.
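The body of {{logStatement()}} is not quoted here, so this is only a guess at the likely cause: two independent {{if}} checks. With the usual SLF4J backends, enabling TRACE also enables DEBUG, so both branches fire. The sketch uses a tiny hypothetical {{LevelLogger}} instead of SLF4J so it runs standalone; {{logTwice}} and {{logOnce}} are illustrative names:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a logger; as in SLF4J backends, TRACE implies DEBUG. */
final class LevelLogger {
  enum Level { TRACE, DEBUG, INFO }

  private final Level threshold;
  final List<String> lines = new ArrayList<>();

  LevelLogger(Level threshold) {
    this.threshold = threshold;
  }

  boolean isTraceEnabled() {
    return threshold == Level.TRACE;
  }

  boolean isDebugEnabled() {
    return threshold.compareTo(Level.DEBUG) <= 0;
  }

  void trace(String msg) {
    lines.add("TRACE " + msg);
  }

  void debug(String msg) {
    lines.add("DEBUG " + msg);
  }
}

final class LogStatementSketch {
  // Two independent ifs: with TRACE enabled, both run and the message is duplicated.
  static void logTwice(LevelLogger log, String msg) {
    if (log.isDebugEnabled()) {
      log.debug(msg);
    }
    if (log.isTraceEnabled()) {
      log.trace(msg);
    }
  }

  // else-if picks exactly one level: the most verbose one that is enabled.
  static void logOnce(LevelLogger log, String msg) {
    if (log.isTraceEnabled()) {
      log.trace(msg);
    } else if (log.isDebugEnabled()) {
      log.debug(msg);
    }
  }
}
```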

3. Use SLF4J as an API, not Commons Logging.

4. You don't log anything in case of a timeout.
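For point 4, one possible shape, assuming the loop has an overall deadline: log once when the deadline expires, so a loop that gives up leaves a trace in the logs. {{TimedRetry}} and {{retryUntil}} are hypothetical names, and the {{Consumer<String>}} again stands in for an SLF4J logger method:

```java
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;

final class TimedRetry {

  /**
   * Hypothetical helper: calls attempt until it returns a non-null value or
   * timeoutMs elapses. On timeout it logs once and returns null.
   */
  static <T> T retryUntil(Supplier<T> attempt, long timeoutMs, long sleepMs,
      Consumer<String> log) throws InterruptedException {
    final long deadline =
        System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    int attempts = 0;
    while (true) {
      attempts++;
      T result = attempt.get();
      if (result != null) {
        return result;
      }
      // Subtraction keeps the comparison correct even if nanoTime overflows.
      if (System.nanoTime() - deadline >= 0) {
        // Make the timeout visible instead of returning silently.
        log.accept("Timed out after " + timeoutMs + " ms and " + attempts
            + " attempt(s)");
        return null;
      }
      Thread.sleep(sleepMs);
    }
  }
}
```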

5. You can define both {{check}} and {{nonNullCheck}} at the same time. There 
are two problems with this. First, ordinary {{check}} is not used in the code. 
Second, if both are used, then the result of {{nonNullCheck}} is simply ignored.
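For point 5, one way to rule out the conflicting combination is to keep a single success predicate and express the non-null case as {{Objects::nonNull}}. This assumes {{RetryCommand}} could be reshaped along these lines; {{RetrySpec}}, {{until}} and {{untilNonNull}} are hypothetical names:

```java
import java.util.Objects;
import java.util.function.Predicate;
import java.util.function.Supplier;

/** Hypothetical retry spec that holds exactly one success condition. */
final class RetrySpec<T> {
  private final Supplier<T> action;
  private final Predicate<? super T> successCondition;

  private RetrySpec(Supplier<T> action, Predicate<? super T> successCondition) {
    this.action = Objects.requireNonNull(action, "action");
    this.successCondition =
        Objects.requireNonNull(successCondition, "successCondition");
  }

  /** General form: retry until the predicate accepts the result. */
  static <U> RetrySpec<U> until(Supplier<U> action, Predicate<? super U> condition) {
    return new RetrySpec<U>(action, condition);
  }

  /** The non-null check is a special case, not a second competing field. */
  static <U> RetrySpec<U> untilNonNull(Supplier<U> action) {
    return new RetrySpec<U>(action, Objects::nonNull);
  }

  /** Runs at most maxAttempts times (sleeping omitted for brevity). */
  T run(int maxAttempts) {
    for (int i = 0; i < maxAttempts; i++) {
      T result = action.get();
      if (successCondition.test(result)) {
        return result;
      }
    }
    return null;
  }
}
```

With a single field, a spec where one condition is silently ignored cannot be constructed, and {{untilNonNull}} is just a special case of {{until}}.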

 
In general, I feel that having the retry logic in a separate class is a bit of
overengineering. It would be justified if the patch modified classes other than
{{GpuResourceAllocator}}, but for a single class it looks like overkill.
Also, check out this project, which might be good for us: 
https://github.com/rholder/guava-retrying

 

> Add tests for GpuResourceAllocator and do minor code cleanup
> ------------------------------------------------------------
>
>                 Key: YARN-9100
>                 URL: https://issues.apache.org/jira/browse/YARN-9100
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Szilard Nemeth
>            Assignee: Szilard Nemeth
>            Priority: Major
>         Attachments: YARN-9100.001.patch, YARN-9100.002.patch, 
> YARN-9100.003.patch
>
>
> Add tests for GpuResourceAllocator and do minor code cleanup
> - Improved log and exception messages
> - Added some new debug logs
> - Some methods are named like *Copy; they return copies of internal data
> structures. The word "copy" is just noise in their names, so they have been
> renamed. Additionally, the copied data structures were made immutable.
> - The waiting loop in method assignGpus was decoupled into a new class,
> RetryCommand.
> Some more words about the new class RetryCommand:
> There are similar waiting loops elsewhere in the code: AMRMClient,
> AMRMClientAsync and even GenericTestUtils (see the waitFor method).
> RetryCommand could replace this duplicated code in the future, as it
> solves the waiting loop problem in a generic way.
> The only downside of using RetryCommand in GpuResourceAllocator
> (startGpuAssignmentLoop) is the ugly exception handling, but that is
> solely because of how Java deals with checked exceptions in lambdas. If
> there's a cleaner way to handle the exceptions, I'm open to suggestions.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
