[
https://issues.apache.org/jira/browse/YARN-5600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15730449#comment-15730449
]
Miklos Szegedi commented on YARN-5600:
--------------------------------------
Thank you, [~templedf]! I addressed most issues. Here are my replies on the
others.
{quote}
In serviceInit(), you're missing the space before the brace on both your if
statements. Also, do you think you can find a sane way to combine them or at
least reuse the error message?
{quote}
I think it might even be more readable, when it is expanded like this.
{quote}
I'm still hung up on +999 / 1000 in ResourceLocalizationService. How about
Math.ceil(... / 1000.0)?
{quote}
I think we are talking about personal preferences at this point. I think that
converting to float and back is a bit overkill. This pattern is a known one for
rounding an integer up to the nearest multiple of a number without casting to
float.
{quote}
In TestContainerManager, I'm worried that the new tests are pretty heavily
timing dependent. At the very least, the second wait should be for the full
timeout period. Even better would be to validate the internal state is as
expected, e.g. that the deletion thread is set to execute after the expected
amount of time.
{quote}
The two waits add up to the total wait time. This is how it is designed to be
consistent. Waiting half more is unnecessary, since
{code}waitForApplicationDirDeleted{code} does the same. It verifies the
internal state, whether the file was deleted. I see this pattern in other
places in the tests as well. However, to make it more robust, I increased this
timeout. The third test, {code}testDelayedKeep(){code} may cause false
positives in case of a busy server, but any errors will be caught by "normal"
test machines.
{quote}
In TestDeletionService.testCustomDisableDelete(), do you need to set
DEBUG_NM_MAX_PER_APPLICATION_DELETE_DELAY_SEC? Also, you're missing an assert
message in that test method further down.
{quote}
Are we looking at the same patch? I set it here:
{code}
204 conf.setInt(YarnConfiguration.DEBUG_NM_DELETE_DELAY_SEC,
almostForever);
205
conf.setInt(YarnConfiguration.DEBUG_NM_MAX_PER_APPLICATION_DELETE_DELAY_SEC,
206 Integer.MAX_VALUE);
{code}
{quote}
testCustomRetentionPolicy() should probably also test what happens if no
container override is given, if the container override is 0, and if it's 2.
{quote}
testEffectiveDelay() is testing these cases already.
What do you think?
> Add a parameter to ContainerLaunchContext to emulate
> yarn.nodemanager.delete.debug-delay-sec on a per-application basis
> -----------------------------------------------------------------------------------------------------------------------
>
> Key: YARN-5600
> URL: https://issues.apache.org/jira/browse/YARN-5600
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: nodemanager
> Affects Versions: 3.0.0-alpha1
> Reporter: Daniel Templeton
> Assignee: Miklos Szegedi
> Labels: oct16-medium
> Attachments: YARN-5600.000.patch, YARN-5600.001.patch,
> YARN-5600.002.patch, YARN-5600.003.patch, YARN-5600.004.patch,
> YARN-5600.005.patch, YARN-5600.006.patch, YARN-5600.007.patch,
> YARN-5600.008.patch, YARN-5600.009.patch, YARN-5600.010.patch,
> YARN-5600.011.patch, YARN-5600.012.patch, YARN-5600.013.patch,
> YARN-5600.014.patch
>
>
> To make debugging application launch failures simpler, I'd like to add a
> parameter to the CLC to allow an application owner to request delayed
> deletion of the application's launch artifacts.
> This JIRA solves largely the same problem as YARN-5599, but for cases where
> ATS is not in use, e.g. branch-2.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]