[
https://issues.apache.org/jira/browse/YARN-5600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15752624#comment-15752624
]
Miklos Szegedi commented on YARN-5600:
--------------------------------------
Thank you, [~kasha] for the review!
In general about the design: delete.debug-delay-sec is there for backward
compatibility, I think. I do not expect many people to use it after this new
functionality, since it affects the whole cluster.
Because of this I would not mix it's value much with
delete.max-per-application-debug-delay-sec.
I suggest filing a separate jira to deprecate delete.debug-delay-sec for v3.
This would make the design of this feature simpler.
{quote}
ApplicationConstants: typo in the javadoc.
{quote}
Fixed.
{quote}
YarnConfiguration: max-per-application-debug-delay-sec
"per-application" is redundant and makes the config long. Can we just have
max-debug-delay-sec or d
debug-delay-sec.max.
{quote}
[~vvasudev], You suggested originally this value along with differentiating the
default and the maximum. I am okay with changing it back, what is your opinion?
{quote}
I don't see the need to have a separate constant for the corresponding default
value. Can we use the
value of delete.debug-delay-sec or a multiple of it (say, 2) as default instead?
{quote}
I do not think it is a good idea to set a default to a variable value. I could
not even write anything to yarn-default.xml in this case. Moreover the default
is 0 in 99%+ of the cases.
I would expect people will use the new feature more often in the future, since
it does not affect the whole cluster. delete.debug-delay-sec is there for
backward compatibility. Can we keep it separate than delete.debug-delay-sec?
{quote}
Also, looks like max-debug-delay-sec could be smaller than debug-delay-sec.
That is confusing. Can we make sure max is actually max?
{quote}
I have to defer this to [~vvasudev], since it was his suggestion originally.
{quote}
Move the javadoc from ApplicationImpl to Application. Also, can we linkify
Application.NEVER.
{quote}
I added it to both.
{quote}
If the admin sets debug-delay-sec to a positive value (say, 5 mins) and the
user explicitly sets it to a smaller value (even 0), shouldn't the user's
setting override for all containers of that app?
{quote}
The idea is that the standard value is 0 in 99%+ of cases. Sometimes the user
or the admin wants to delay that by specifying a delay value. Using the max
fulfills both requests, there is no priority, no override. I thought this is
the simplest what I can do to make both of them happy, if I need to specify
both. Limiting the delay would not make much sense, since any smaller number
what requested prevents them to collect the necessary debug information for the
other party. There is a feature to disable the delay functionality in sensitive
applications and that is the cluster wide max delay.
{quote}
It appears we allow setting this value on a per-container basis through
ContainerLaunchContext, but consider only the largest value. Is there value in
allowing specifying this on a per-container basis. Why not specify it on a
per-app basis? Say, through ApplicationSubmissionContext?
{quote}
We consider all values specified in each CLC. However, if an application has
multiple containers on the same node, we delete the application dir only after
the latest CLC is deleted. The largest logic just makes sure that the
application dir is not deleted before the containers in it are deleted. Is not
it more flexible to specify a container specific value?
{quote}
Any specific reason for capturing the deletion time in Date instead of long?
Isn't Date more expensive?
{quote}
Date is just a wrapper above a long value. I feel the code is more readable
with Date than it would be otherwise. You can for example get the real time
value in the debugger without any additional calculations.
{quote}
yarn-default: s/mumber/number
{quote}
Fixed.
{quote}
DeletionServe
scheduleFileDeletionTask takes a delay, and then adjusts it based on
configuration. These seems error-prone. Callers of this method would assume the
file will be deleted at the provided time. Can the caller take care of the
adjustment?
{quote}
It depends. If this is true, it was already error prone before my change, since
we requested a delete that happened later overridden by the default delay. All
I do is adding a custom delay. The contract is that the task will be delayed
for at least the amount of time specified. I was thinking about changing the
function name or creating a separate function that reflects this design better
that the caller would call. Honestly, scheduleFileDeletionTask should be
refactored anyways to be private, so I just changed the documentation for now.
What do you think? (This problem would also go away, if we deprecate default
delay.)
{quote}
getEffectiveDelay: if the limitedDelay > maxDebugDelay, one could set
limitedDelay to maxDebugDelay without computing Math.min.
{quote}
Fixed.
> Allow setting yarn.nodemanager.delete.debug-delay-sec on a per-application
> basis
> --------------------------------------------------------------------------------
>
> Key: YARN-5600
> URL: https://issues.apache.org/jira/browse/YARN-5600
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: nodemanager
> Affects Versions: 3.0.0-alpha1
> Reporter: Daniel Templeton
> Assignee: Miklos Szegedi
> Labels: oct16-medium
> Attachments: YARN-5600.000.patch, YARN-5600.001.patch,
> YARN-5600.002.patch, YARN-5600.003.patch, YARN-5600.004.patch,
> YARN-5600.005.patch, YARN-5600.006.patch, YARN-5600.007.patch,
> YARN-5600.008.patch, YARN-5600.009.patch, YARN-5600.010.patch,
> YARN-5600.011.patch, YARN-5600.012.patch, YARN-5600.013.patch,
> YARN-5600.014.patch, YARN-5600.015.patch, YARN-5600.016.patch,
> YARN-5600.017.patch
>
>
> To make debugging application launch failures simpler, I'd like to add a
> parameter to the CLC to allow an application owner to request delayed
> deletion of the application's launch artifacts.
> This JIRA solves largely the same problem as YARN-5599, but for cases where
> ATS is not in use, e.g. branch-2.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]