[
https://issues.apache.org/jira/browse/YARN-5600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15649299#comment-15649299
]
Daniel Templeton commented on YARN-5600:
----------------------------------------
New comments:
* Maybe define a constant for {{new Date(Long.MAX_VALUE)}} is make it a little
more obvious.
* In {{DeletionService.deleteWithDelay()}}, if the delay is -1, you're doing a
bunch of work for no reason.
* The javadoc description for the {{scheduleFileDeletionTask()}} methods should
end with a period.
* Is throwing the {{IllegalArgumentException}} the right thing to do? That
information is not going to make it back to the end user who set the bogus
value. I didn't verify what happens when the dispatcher gets the exception,
but it may take the NM down.
* Here: {code} debugDelayDefault = conf.getInt(
YarnConfiguration.DEBUG_NM_DELETE_DELAY_SEC, 0);{code} it would
be better to make the 0 a constant.
* In {{ResourceLocalizationService.handleDestroyApplicationResources()}} you
have {code} debugDelaySec =
(int)(((applicationCleanupTime.getTime() - now.getTime()) +
999) / 1000);{code} I'd rather see you divide by 1000 and add
1. It's a little less clever/more obvious.
* The javadoc description for {{ApplicationImpl.delayedDeletionTime}} should
end with a period.
* {{estimateRetention()}} should maybe be {{calculateRetention()}} or
{{updateRetention()}} or {{recalculateRetention()}}. There's no estimation
happening.
* {{TestDeletionService.testCustomDisableDelete()}} has a spurious "remain" in
the javadoc
* In {{TestDeletionService.testCustomDisableDelete()}}, I'm concerned about the
change from a 20 sec wait to a 1 sec wait. Are you sure that in all cases,
even on slow ancient upstream infrastructure, that 1 sec is enough? Can you
sensibly expose some metrics from the {{DeletionService}} so you can tell when
a delete is ignored, rather than waiting around to see if the file is still
there?
* You have a {{println()}} in {{testCustomRetentionPolicy()}}.
* I'm still worried about flakiness with the sleeps at the end of
{{testCustomRetentionPolicy()}}. Can you sensibly expose some metrics from the
{{DeletionService}} so you can be more assertive about what happened, rather
than relying on timing across threads?
* You have a {{println()}} in {{testAPIError()}}.
* {{testCustomRetentionPolicy()}} and {{testAPIError()}} are almost identical.
Maybe add a parameter and reuse the code?
* In {{TestContainerManager}}, {{verifyContainerDir()}} and {{verifyAppDir()}}
have a lot of common code. Maybe pull it out into another method?
* In {{waitForApplicationDirDeleted()}} you might be better off with monotonic
time.
> Add a parameter to ContainerLaunchContext to emulate
> yarn.nodemanager.delete.debug-delay-sec on a per-application basis
> -----------------------------------------------------------------------------------------------------------------------
>
> Key: YARN-5600
> URL: https://issues.apache.org/jira/browse/YARN-5600
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: nodemanager
> Affects Versions: 3.0.0-alpha1
> Reporter: Daniel Templeton
> Assignee: Miklos Szegedi
> Labels: oct16-medium
> Attachments: YARN-5600.000.patch, YARN-5600.001.patch,
> YARN-5600.002.patch, YARN-5600.003.patch, YARN-5600.004.patch
>
>
> To make debugging application launch failures simpler, I'd like to add a
> parameter to the CLC to allow an application owner to request delayed
> deletion of the application's launch artifacts.
> This JIRA solves largely the same problem as YARN-5599, but for cases where
> ATS is not in use, e.g. branch-2.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]