[
https://issues.apache.org/jira/browse/YARN-3017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14604503#comment-14604503
]
zhihai xu commented on YARN-3017:
---------------------------------
I just found that this change may cause a problem in log aggregation during a
rolling upgrade with NM-Recovery-supervised enabled.
The following code in
{{AggregatedLogFormat#getPendingLogFilesToUploadForThisContainer}} locates the
container log directory by the containerId string. If the string format
changes, a directory created before the upgrade no longer matches, so we may
miss uploading the old log files after the upgrade.
{code}
File containerLogDir =
    new File(appLogDir, ConverterUtils.toString(this.containerId));
if (!containerLogDir.isDirectory()) {
  continue; // ContainerDir may have been deleted by the user.
}
pendingUploadFiles
    .addAll(getPendingLogFilesToUpload(containerLogDir));
{code}
To fix this issue, we would also need to change
{{getPendingLogFilesToUploadForThisContainer}} to compare container IDs parsed
with {{ContainerId#fromString}} instead of matching directory names as strings.
It looks like it makes sense to keep the old format for compatibility.
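To illustrate the comparison idea, here is a minimal standalone sketch, not the actual Hadoop code. It assumes the new format only widens the attempt field (for example {{container_1412150883650_0001_000002_000001}} instead of {{container_1412150883650_0001_02_000001}}), and it ignores other variants such as an epoch prefix. The class {{ContainerDirMatcher}} and its methods are hypothetical names; a real change would presumably rely on {{ContainerId#fromString}} and compare the resulting {{ContainerId}} objects.

```java
import java.util.Arrays;

/** Hypothetical helper (not part of Hadoop): matches container names by parsed fields. */
final class ContainerDirMatcher {

    private ContainerDirMatcher() {}

    /**
     * Parses "container_<clusterTimestamp>_<appId>_<attemptId>_<containerId>"
     * into its four numeric fields, so zero-padding differences are ignored.
     * Returns null if the name does not have that shape.
     */
    static long[] parse(String name) {
        String[] parts = name.split("_");
        if (parts.length != 5 || !parts[0].equals("container")) {
            return null;
        }
        long[] fields = new long[4];
        try {
            for (int i = 0; i < 4; i++) {
                fields[i] = Long.parseLong(parts[i + 1]);
            }
        } catch (NumberFormatException e) {
            return null;
        }
        return fields;
    }

    /** True if both names parse and denote the same container, whatever the padding. */
    static boolean sameContainer(String a, String b) {
        long[] pa = parse(a);
        long[] pb = parse(b);
        return pa != null && pb != null && Arrays.equals(pa, pb);
    }
}
```

With a check like this, the upload code could list the directories under {{appLogDir}} and pick the one whose name denotes the same container, rather than building a single expected path from the current string form.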
> ContainerID in ResourceManager Log Has Slightly Different Format From
> AppAttemptID
> ----------------------------------------------------------------------------------
>
> Key: YARN-3017
> URL: https://issues.apache.org/jira/browse/YARN-3017
> Project: Hadoop YARN
> Issue Type: Improvement
> Affects Versions: 2.8.0
> Reporter: MUFEED USMAN
> Assignee: Mohammad Shahid Khan
> Priority: Minor
> Labels: PatchAvailable
> Attachments: YARN-3017.patch, YARN-3017_1.patch, YARN-3017_2.patch,
> YARN-3017_3.patch
>
>
> Not sure if this should be filed as a bug or not.
> In the ResourceManager log in the events surrounding the creation of a new
> application attempt,
> ...
> ...
> 2014-11-14 17:45:37,258 INFO
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Launching
> masterappattempt_1412150883650_0001_000002
> ...
> ...
> The application attempt has the ID format "_1412150883650_0001_000002".
> Whereas the associated ContainerID goes by "_1412150883650_0001_02_".
> ...
> ...
> 2014-11-14 17:45:37,260 INFO
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Setting
> up
> container Container: [ContainerId: container_1412150883650_0001_02_000001,
> NodeId: n67:55933, NodeHttpAddress: n67:8042, Resource: <memory:2048,
> vCores:1,
> disks:0.0>, Priority: 0, Token: Token { kind: ContainerToken, service:
> 10.10.70.67:55933 }, ] for AM appattempt_1412150883650_0001_000002
> ...
> ...
> Curious to know if this is kept like that for a reason. If not, then while
> using filtering tools to, say, grep events surrounding a specific attempt by
> the numeric ID part, information may slip out during troubleshooting.