[
https://issues.apache.org/jira/browse/YARN-5600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15663130#comment-15663130
]
Varun Vasudev commented on YARN-5600:
-------------------------------------
Thanks for the patch [[email protected]]! My apologies for the late
comments -
1)
{code}
+ String value = environment
+ .get(YarnConfiguration.DEBUG_NM_CUSTOM_DELETE_DELAY_SEC);
{code}
The environment variable name shouldn’t come from YarnConfiguration, which is
meant for entries in yarn-site.xml. Please add the variable to the Environment
class in ApplicationConstants.java
2)
{code}
+ /** Delay before deleting resource to ease debugging of NM issues.
+ * This is a custom key for the environment map in
+ * container launch context*/
+ public static final String DEBUG_NM_CUSTOM_DELETE_DELAY_SEC =
+ YarnConfiguration.NM_PREFIX + "delete.custom-debug-delay-sec";
+ public static final int DEFAULT_DEBUG_NM_CUSTOM_DELETE_DELAY_SEC = 0;
{code}
Not required
3)
{code}
+ /** Maximum delay before deleting the resource.
+ * This setting limits YarnConfiguration.DEFAULT_DEBUG_NM_DELETE_DELAY_SEC
+ * and any container specific delay as well for reliability and security*/
+ public static final String DEBUG_NM_MAX_DELETE_DELAY_SEC =
+ YarnConfiguration.NM_PREFIX + "delete.max-debug-delay-sec";
+ public static final int DEFAULT_DEBUG_NM_MAX_DELETE_DELAY_SEC =
+ -1;
{code}
{code}
+ <property>
+ <description>
+ This is a maximum cap on the following values
+ yarn.nodemanager.delete.debug-delay-sec
+ yarn.nodemanager.delete.custom-debug-delay-sec
+
+ Maximum mumber of seconds after an application finishes before the
nodemanager's
+ DeletionService will delete the application's localized file directory
+ and log directory. The value -1 means let the service keep the files
forever,
+ if requested by any of the settings above.
+ </description>
+ <name>yarn.nodemanager.delete.max-debug-delay-sec</name>
+ <value>-1</value>
+ </property>
{code}
The default value for this can’t be -1. This means that users can keep
resources around forever without admins knowing about it. The default value
should be 0. Admins should be required to explicitly set the value to a number
they’re comfortable with.
4)
To a larger point, I don’t think we should provide a keep forever option at
all. Admins can set the value to a really high number if they desire. The idea
behind the feature is to help debugging which implies a reasonable time
window(a few days to a week?) - not sure why we need to keep artifacts around
forever.
5)
{code}
+ if (debugDelaySec == KEEP_FOREVER &&
+ debugDelayDefaultMax != KEEP_FOREVER) {
+ LOG.warn(String.format("Limiting debug delay %d to %d",
+ debugDelaySec, debugDelayDefaultMax));
+ limitedDelay = debugDelayDefaultMax;
+ } else if (debugDelaySec > debugDelayDefaultMax &&
+ debugDelayDefaultMax != KEEP_FOREVER) {
+ LOG.warn(String.format("Limiting debug delay %d to %d",
+ debugDelaySec, debugDelayDefaultMax));
+ limitedDelay = Math.min(debugDelaySec, debugDelayDefaultMax);
+ }
{code}
Not sure these should be logged as warnings - info might be fine?
6)
{code}
-import static org.junit.Assert.assertEquals;
-import static org.junit.Assert.assertFalse;
-import static org.junit.Assert.assertNull;
-import static org.junit.Assert.assertTrue;
{code}
Any particular reason for changing the location of the import statements?
7)
{code}
+ private static final FileContext LFS = getLfs();
+ private static FileContext getLfs() {
{code}
{code}
- private static final Path base =
- lfs.makeQualified(new Path("target", TestDeletionService.class.getName()));
{code}
Similar to above - any reason for renaming the variables?
> Add a parameter to ContainerLaunchContext to emulate
> yarn.nodemanager.delete.debug-delay-sec on a per-application basis
> -----------------------------------------------------------------------------------------------------------------------
>
> Key: YARN-5600
> URL: https://issues.apache.org/jira/browse/YARN-5600
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: nodemanager
> Affects Versions: 3.0.0-alpha1
> Reporter: Daniel Templeton
> Assignee: Miklos Szegedi
> Labels: oct16-medium
> Attachments: YARN-5600.000.patch, YARN-5600.001.patch,
> YARN-5600.002.patch, YARN-5600.003.patch, YARN-5600.004.patch,
> YARN-5600.005.patch, YARN-5600.006.patch, YARN-5600.007.patch,
> YARN-5600.008.patch, YARN-5600.009.patch
>
>
> To make debugging application launch failures simpler, I'd like to add a
> parameter to the CLC to allow an application owner to request delayed
> deletion of the application's launch artifacts.
> This JIRA solves largely the same problem as YARN-5599, but for cases where
> ATS is not in use, e.g. branch-2.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]