[ 
https://issues.apache.org/jira/browse/YARN-5600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15649299#comment-15649299
 ] 

Daniel Templeton commented on YARN-5600:
----------------------------------------

New comments:

* Maybe define a constant for {{new Date(Long.MAX_VALUE)}} is make it a little 
more obvious.
* In {{DeletionService.deleteWithDelay()}}, if the delay is -1, you're doing a 
bunch of work for no reason.
* The javadoc description for the {{scheduleFileDeletionTask()}} methods should 
end with a period.
* Is throwing the {{IllegalArgumentException}} the right thing to do? That 
information is not going to make it back to the end user who set the bogus 
value.  I didn't verify what happens when the dispatcher gets the exception, 
but it may take the NM down.
* Here: {code}      debugDelayDefault = conf.getInt(
              YarnConfiguration.DEBUG_NM_DELETE_DELAY_SEC, 0);{code} it would 
be better to make the 0 a constant.
* In {{ResourceLocalizationService.handleDestroyApplicationResources()}} you 
have {code}        debugDelaySec =
                (int)(((applicationCleanupTime.getTime() - now.getTime()) +
                999) / 1000);{code}  I'd rather see you divide by 1000 and add 
1.  It's a little less clever/more obvious.
* The javadoc description for {{ApplicationImpl.delayedDeletionTime}} should 
end with a period.
* {{estimateRetention()}} should maybe be {{calculateRetention()}} or 
{{updateRetention()}} or {{recalculateRetention()}}.  There's no estimation 
happening.
* {{TestDeletionService.testCustomDisableDelete()}} has a spurious "remain" in 
the javadoc
* In {{TestDeletionService.testCustomDisableDelete()}}, I'm concerned about the 
change from a 20 sec wait to a 1 sec wait.  Are you sure that in all cases, 
even on slow ancient upstream infrastructure, that 1 sec is enough?  Can you 
sensibly expose some metrics from the {{DeletionService}} so you can tell when 
a delete is ignored, rather than waiting around to see if the file is still 
there?
* You have a {{println()}} in {{testCustomRetentionPolicy()}}.
* I'm still worried about flakiness with the sleeps at the end of 
{{testCustomRetentionPolicy()}}.  Can you sensibly expose some metrics from the 
{{DeletionService}} so you can be more assertive about what happened, rather 
than relying on timing across threads?
* You have a {{println()}} in {{testAPIError()}}.
* {{testCustomRetentionPolicy()}} and {{testAPIError()}} are almost identical.  
Maybe add a parameter and reuse the code?
* In {{TestContainerManager}}, {{verifyContainerDir()}} and {{verifyAppDir()}} 
have a lot of common code.  Maybe pull it out into another method?
* In {{waitForApplicationDirDeleted()}} you might be better off with monotonic 
time.


> Add a parameter to ContainerLaunchContext to emulate 
> yarn.nodemanager.delete.debug-delay-sec on a per-application basis
> -----------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-5600
>                 URL: https://issues.apache.org/jira/browse/YARN-5600
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Daniel Templeton
>            Assignee: Miklos Szegedi
>              Labels: oct16-medium
>         Attachments: YARN-5600.000.patch, YARN-5600.001.patch, 
> YARN-5600.002.patch, YARN-5600.003.patch, YARN-5600.004.patch
>
>
> To make debugging application launch failures simpler, I'd like to add a 
> parameter to the CLC to allow an application owner to request delayed 
> deletion of the application's launch artifacts.
> This JIRA solves largely the same problem as YARN-5599, but for cases where 
> ATS is not in use, e.g. branch-2.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to