[ 
https://issues.apache.org/jira/browse/YARN-5600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15752624#comment-15752624
 ] 

Miklos Szegedi commented on YARN-5600:
--------------------------------------

Thank you, [~kasha] for the review!

In general about the design: delete.debug-delay-sec is there for backward 
compatibility, I think. I do not expect many people to use it after this new 
functionality, since it affects the whole cluster.
Because of this I would not mix it's value much with 
delete.max-per-application-debug-delay-sec.
I suggest filing a separate jira to deprecate delete.debug-delay-sec for v3. 
This would make the design of this feature simpler.

{quote}
ApplicationConstants: typo in the javadoc.
{quote}
Fixed.
{quote}
YarnConfiguration: max-per-application-debug-delay-sec
"per-application" is redundant and makes the config long. Can we just have 
max-debug-delay-sec or d
debug-delay-sec.max.
{quote}
[~vvasudev], You suggested originally this value along with differentiating the 
default and the maximum. I am okay with changing it back, what is your opinion?
{quote}
I don't see the need to have a separate constant for the corresponding default 
value. Can we use the
value of delete.debug-delay-sec or a multiple of it (say, 2) as default instead?
{quote}
I do not think it is a good idea to set a default to a variable value. I could 
not even write anything to yarn-default.xml in this case. Moreover the default 
is 0 in 99%+ of the cases.
I would expect people will use the new feature more often in the future, since 
it does not affect the whole cluster. delete.debug-delay-sec is there for 
backward compatibility. Can we keep it separate than delete.debug-delay-sec?
{quote}
Also, looks like max-debug-delay-sec could be smaller than debug-delay-sec. 
That is confusing. Can we make sure max is actually max?
{quote}
I have to defer this to [~vvasudev], since it was his suggestion originally.
{quote}
Move the javadoc from ApplicationImpl to Application. Also, can we linkify 
Application.NEVER.
{quote}
I added it to both.
{quote}
If the admin sets debug-delay-sec to a positive value (say, 5 mins) and the 
user explicitly sets it to a smaller value (even 0), shouldn't the user's 
setting override for all containers of that app?
{quote}
The idea is that the standard value is 0 in 99%+ of cases. Sometimes the user 
or the admin wants to delay that by specifying a delay value. Using the max 
fulfills both requests, there is no priority, no override. I thought this is 
the simplest what I can do to make both of them happy, if I need to specify 
both. Limiting the delay would not make much sense, since any smaller number 
what requested prevents them to collect the necessary debug information for the 
other party. There is a feature to disable the delay functionality in sensitive 
applications and that is the cluster wide max delay.
{quote}
It appears we allow setting this value on a per-container basis through 
ContainerLaunchContext, but consider only the largest value. Is there value in 
allowing specifying this on a per-container basis. Why not specify it on a 
per-app basis? Say, through ApplicationSubmissionContext?
{quote}
We consider all values specified in each CLC. However, if an application has 
multiple containers on the same node, we delete the application dir only after 
the latest CLC is deleted. The largest logic just makes sure that the 
application dir is not deleted before the containers in it are deleted. Is not 
it more flexible to specify a container specific value?
{quote}
Any specific reason for capturing the deletion time in Date instead of long? 
Isn't Date more expensive?
{quote}
Date is just a wrapper above a long value. I feel the code is more readable 
with Date than it would be otherwise. You can for example get the real time 
value in the debugger without any additional calculations.
{quote}
yarn-default: s/mumber/number
{quote}
Fixed.
{quote}
DeletionServe
scheduleFileDeletionTask takes a delay, and then adjusts it based on 
configuration. These seems error-prone. Callers of this method would assume the 
file will be deleted at the provided time. Can the caller take care of the 
adjustment?
{quote}
It depends. If this is true, it was already error prone before my change, since 
we requested a delete that happened later overridden by the default delay. All 
I do is adding a custom delay. The contract is that the task will be delayed 
for at least the amount of time specified. I was thinking about changing the 
function name or creating a separate function that reflects this design better 
that the caller would call. Honestly, scheduleFileDeletionTask should be 
refactored anyways to be private, so I just changed the documentation for now. 
What do you think? (This problem would also go away, if we deprecate default 
delay.)
{quote}
getEffectiveDelay: if the limitedDelay > maxDebugDelay, one could set 
limitedDelay to maxDebugDelay without computing Math.min.
{quote}
Fixed.


> Allow setting yarn.nodemanager.delete.debug-delay-sec on a per-application 
> basis
> --------------------------------------------------------------------------------
>
>                 Key: YARN-5600
>                 URL: https://issues.apache.org/jira/browse/YARN-5600
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Daniel Templeton
>            Assignee: Miklos Szegedi
>              Labels: oct16-medium
>         Attachments: YARN-5600.000.patch, YARN-5600.001.patch, 
> YARN-5600.002.patch, YARN-5600.003.patch, YARN-5600.004.patch, 
> YARN-5600.005.patch, YARN-5600.006.patch, YARN-5600.007.patch, 
> YARN-5600.008.patch, YARN-5600.009.patch, YARN-5600.010.patch, 
> YARN-5600.011.patch, YARN-5600.012.patch, YARN-5600.013.patch, 
> YARN-5600.014.patch, YARN-5600.015.patch, YARN-5600.016.patch, 
> YARN-5600.017.patch
>
>
> To make debugging application launch failures simpler, I'd like to add a 
> parameter to the CLC to allow an application owner to request delayed 
> deletion of the application's launch artifacts.
> This JIRA solves largely the same problem as YARN-5599, but for cases where 
> ATS is not in use, e.g. branch-2.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to