[
https://issues.apache.org/jira/browse/YARN-7261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16212095#comment-16212095
]
Xiao Chen commented on YARN-7261:
---------------------------------
Thanks [~yufeigu] for creating the jira and providing a patch.
For context, Yufei and I have seen an intermittent issue where
localization took very long. We suspect that copying from HDFS took
long, but the HDFS metrics/logs don't show any smoking guns. We'd like to use
this jira to add more debugging information.
The log we collected currently looks like:
{noformat}
2017-09-15 10:55:50,738 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
Created localizer for container_e70_1505214525894_75227_01_000014
2017-09-15 10:55:50,738 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
Downloading public rsrc:{
hdfs://nameservice1/cached/pub/deviceDetailsQuery_1505472717000.xml,
1505472808731, FILE, null }
...
2017-09-15 10:58:38,760 DEBUG org.apache.hadoop.yarn.util.FSDownload: Changing
permissions for path
file:/var/hdfs/5/yarn/nm/filecache/7363_tmp/deviceDetailsQuery_1505472717000.xml
to perm r-xr-xr-x
2017-09-15 10:58:38,775 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
Container container_e70_1505214525894_75227_01_000014 transitioned from
LOCALIZING to LOCALIZED
{noformat}
But there are no details on what happened in the nearly 3 minutes between the
"Downloading public rsrc" message and the permission change.
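For reference, the size of that gap can be checked directly from the two timestamps in the log above. This is only a minimal sketch; the class and method names ({{LogGapSketch}}, {{gapMillis}}) are made up for illustration and are not part of Hadoop.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not Hadoop code: measures the time between two
// log4j-style timestamps such as "2017-09-15 10:55:50,738".
final class LogGapSketch {
  private static final DateTimeFormatter LOG_TIME =
      DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

  // Returns the elapsed time from 'start' to 'end' in milliseconds.
  static long gapMillis(String start, String end) {
    LocalDateTime from = LocalDateTime.parse(start, LOG_TIME);
    LocalDateTime to = LocalDateTime.parse(end, LOG_TIME);
    return Duration.between(from, to).toMillis();
  }

  public static void main(String[] args) {
    // "Downloading public rsrc" line vs. the FSDownload permission change.
    long ms = gapMillis("2017-09-15 10:55:50,738", "2017-09-15 10:58:38,760");
    System.out.println(ms + " ms"); // prints "168022 ms", i.e. about 2 min 48 s
  }
}
```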
The patch LGTM. One question:
Do you think it would be helpful to add a debug message to
{{ResourceLocalizationService#addResource}} to indicate when either of
conditions 1 and 2 below is false?
{code}
/*
* Here multiple containers may request the same resource. So we need
* to start downloading only when
* 1) ResourceState == DOWNLOADING
* 2) We are able to acquire non blocking semaphore lock.
* If not we will skip this resource as either it is getting downloaded
* or it FAILED / LOCALIZED.
*/
{code}
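To make the question concrete, here is a rough sketch of the kind of debug message I have in mind. It is not the actual {{ResourceLocalizationService}} code: the class, enum, and method names below are hypothetical, and it uses {{java.util.logging}} only so that it runs standalone (the real class uses Hadoop's own logger).

```java
import java.util.concurrent.Semaphore;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical stand-in for the check described in the comment above.
final class AddResourceDebugSketch {
  // Simplified resource states, assumed for illustration only.
  enum ResourceState { INIT, DOWNLOADING, LOCALIZED, FAILED }

  private static final Logger LOG =
      Logger.getLogger(AddResourceDebugSketch.class.getName());

  // Returns null when the download may start (the lock is then held by the
  // caller and must be released later), otherwise the reason for skipping.
  static String skipReason(ResourceState state, Semaphore lock) {
    if (state != ResourceState.DOWNLOADING) {
      return "state is " + state + ", not DOWNLOADING";   // condition 1 false
    }
    if (!lock.tryAcquire()) {
      return "non-blocking lock could not be acquired";   // condition 2 false
    }
    return null;
  }

  // Logs at debug (FINE) level why a resource is skipped, if it is.
  static boolean shouldDownload(String resource, ResourceState state,
      Semaphore lock) {
    String reason = skipReason(state, lock);
    if (reason == null) {
      return true;
    }
    if (LOG.isLoggable(Level.FINE)) {
      LOG.fine("Skip downloading resource " + resource + ": " + reason);
    }
    return false;
  }

  public static void main(String[] args) {
    Semaphore lock = new Semaphore(1);
    // First request wins the lock; the second is skipped and would be logged.
    System.out.println(shouldDownload("a.xml", ResourceState.DOWNLOADING, lock));
    System.out.println(shouldDownload("a.xml", ResourceState.DOWNLOADING, lock));
  }
}
```

With a message like this, a slow localization could be told apart from a request that was skipped because another container was already downloading the same resource.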
> Add debug message in class FSDownload for better download latency monitoring
> ----------------------------------------------------------------------------
>
> Key: YARN-7261
> URL: https://issues.apache.org/jira/browse/YARN-7261
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: nodemanager
> Reporter: Yufei Gu
> Assignee: Yufei Gu
> Attachments: YARN-7261.001.patch
>
>