wy created YARN-11962:
-------------------------
Summary: LogAggregationIndexedFileController.write() does not
track uploaded files, breaking local cleanup and dedup in rolling log
aggregation
Key: YARN-11962
URL: https://issues.apache.org/jira/browse/YARN-11962
Project: Hadoop YARN
Issue Type: Bug
Components: log-aggregation, yarn
Affects Versions: 3.4.3, 3.5.0, 3.3.6, 3.2.4, 3.1.4, 3.0.3, 2.9.2
Environment: * OS: Ubuntu 24.04 (WSL2)
* Java: OpenJDK 21
* Hadoop: 3.4.2
* Cluster: Single-node (localhost), pseudo-distributed mode
Reporter: wy
h3. Problem
When YARN rolling log aggregation is enabled with the {{IndexedFormat}} file
controller ({{LogAggregationIndexedFileController}}), two features are
broken:
# *Local log file cleanup*: Uploaded local log files are never deleted
during rolling aggregation cycles, even when
{{yarn.log-aggregation.enable-local-cleanup=true}} (the default).
# *Upload deduplication*: The same log files are re-uploaded in every
rolling cycle, wasting HDFS storage in proportion to
{{number_of_cycles × total_log_size}}.
Both features work correctly when using {{TFile}} format.
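As a rough illustration of the storage cost in item 2, the sketch below compares the bytes written to HDFS with and without deduplication. It assumes a fixed set of log files that does not grow between cycles. The figures (100 MiB of logs, one hour of cycles at a 15-second roll interval) are made up for the example, and the class and method names are hypothetical, not part of YARN.
{code:java}
/** Back-of-the-envelope estimate only; not part of YARN. */
class ReuploadCostSketch {

  /** Bytes written when every cycle re-uploads the full log set (no dedup). */
  static long bytesWithoutDedup(long cycles, long totalLogBytes) {
    return cycles * totalLogBytes;
  }

  /** Bytes written when each file is uploaded exactly once (dedup works). */
  static long bytesWithDedup(long totalLogBytes) {
    return totalLogBytes;
  }

  public static void main(String[] args) {
    long mib = 1024L * 1024L;
    long totalLogBytes = 100L * mib; // assumed: 100 MiB of container logs
    long cycles = 3600 / 15;         // assumed: one hour at a 15-second roll interval
    System.out.println("without dedup: "
        + bytesWithoutDedup(cycles, totalLogBytes) / mib + " MiB");
    System.out.println("with dedup:    "
        + bytesWithDedup(totalLogBytes) / mib + " MiB");
  }
}
{code}
Under these assumptions the buggy path writes 240 copies of the same data, while a working dedup path writes it once.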
h3. Root Cause
{{LogAggregationIndexedFileController.write()}} (lines 360–424) never calls
{{logValue.uploadedFiles.add(logFile)}} after successfully writing a log file
to the aggregated output. In contrast, the TFile write path
({{AggregatedLogFormat.LogValue.write()}}, line 287) correctly calls
{{this.uploadedFiles.add(logFile)}} after each successful write.
Because {{uploadedFiles}} is never populated in the IndexedFormat path:
* {{LogValue.getCurrentUpLoadedFilesPath()}} always returns an empty set
* {{LogValue.getCurrentUpLoadedFileMeta()}} always returns an empty set
This cascades into {{AppLogAggregatorImpl.uploadLogsForContainers()}}:
{code:java}
Set<Path> uploadedFilePathsInThisCycle =
aggregator.doContainerLogAggregation(...);
// Returns Sets.union(getCurrentUpLoadedFilesPath() /*empty*/,
// getObsoleteRetentionLogFiles() /*usually empty*/)
// = empty set
if (uploadedFilePathsInThisCycle.size() > 0) { // always false
// Local deletion logic is NEVER entered
deletionTask = new FileDeletionTask(...); // never created
}
{code}
And in {{ContainerLogAggregator.doContainerLogAggregation()}}:
{code:java}
this.uploadedFileMeta.addAll(
logValue.getCurrentUpLoadedFileMeta()); // addAll(empty) → no change
// → alreadyUploadedLogFiles stays empty
// → dedup filter passes all files
// → same files re-uploaded every cycle
{code}
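The two cascades above can be reproduced in a small, self-contained toy model. This is a sketch under stated assumptions, not the real YARN code: {{UploadTrackingSketch}}, {{totalWrites}} and the {{trackUploads}} flag are hypothetical names, files are identified by name only, and the local files are assumed not to change between cycles. The flag stands in for whether the write path records each file in {{uploadedFiles}}.
{code:java}
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Toy model only; none of these names are real YARN classes or methods. */
class UploadTrackingSketch {

  /**
   * Runs the given number of rolling cycles over two local log files and
   * returns how many file writes reached the aggregated output.
   */
  static int totalWrites(int cycles, boolean trackUploads) {
    List<String> localFiles = new ArrayList<>(List.of("stderr", "stdout"));
    Set<String> alreadyUploaded = new HashSet<>(); // persists across cycles
    int writes = 0;

    for (int cycle = 0; cycle < cycles; cycle++) {
      Set<String> uploadedThisCycle = new LinkedHashSet<>();
      for (String file : localFiles) {
        if (alreadyUploaded.contains(file)) {
          continue; // dedup filter: skip files uploaded in an earlier cycle
        }
        writes++; // stands in for "write bytes to the aggregated log"
        if (trackUploads) {
          uploadedThisCycle.add(file); // the call the buggy path omits
        }
      }
      // addAll(empty) changes nothing, so dedup never takes effect
      alreadyUploaded.addAll(uploadedThisCycle);
      // local cleanup is gated on a non-empty set of uploaded files
      if (!uploadedThisCycle.isEmpty()) {
        localFiles.removeAll(uploadedThisCycle);
      }
    }
    return writes;
  }

  public static void main(String[] args) {
    System.out.println("tracked:   " + totalWrites(3, true));
    System.out.println("untracked: " + totalWrites(3, false));
  }
}
{code}
With the assumed two files and three cycles, the tracked variant writes each file once and then removes it locally, while the untracked variant writes both files again in every cycle and never cleans up.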
h3. Code Comparison
*TFile path (correct)*, in {{AggregatedLogFormat.LogValue.write()}}:
{code:java}
for (File logFile : fileList) {
// ... write bytes ...
this.uploadedFiles.add(logFile); // ← tracks uploaded file
}
{code}
*IndexedFormat path (buggy)*, in
{{LogAggregationIndexedFileController.write()}}:
{code:java}
for (File logFile : pendingUploadFiles) {
// ... write bytes ...
// ← missing: logValue.uploadedFiles.add(logFile)
metas.add(meta); // only IndexedFileLogMeta is tracked, not uploadedFiles
}
{code}
h3. Reproduction Steps
*Prerequisites:*
* Hadoop 3.4.2 single-node cluster (HDFS + YARN)
* TRACE logging for {{AppLogAggregatorImpl}} and DEBUG logging for
{{DefaultContainerExecutor}} in
{{$HADOOP_HOME/etc/hadoop/log4j.properties}}:
{noformat}
log4j.logger.org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl=TRACE
log4j.logger.org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor=DEBUG
{noformat}
*Step 1 — IndexedFormat test:*
1. Configure {{yarn-site.xml}}:
{code:xml}
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<property>
<name>yarn.log-aggregation.file-formats</name>
<value>IndexedFormat</value>
</property>
<property>
<name>yarn.log-aggregation.file-controller.IndexedFormat.class</name>
<value>org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController</value>
</property>
<property>
<name>yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds</name>
<value>15</value>
</property>
{code}
2. Submit a DistributedShell app that writes to stderr for 90 seconds:
{code:bash}
#!/bin/bash
# test-app.sh: writes one line per second to stderr for 90 seconds
for i in $(seq 1 90); do echo "line_$i" >&2; sleep 1; done
{code}
{code:bash}
yarn jar hadoop-yarn-applications-distributedshell-*.jar \
org.apache.hadoop.yarn.applications.distributedshell.Client \
--jar hadoop-yarn-applications-distributedshell-*.jar \
--shell_script test-app.sh --shell_args "90" \
--num_containers 1 --container_memory 256 --master_memory 256 \
--rolling_log_pattern "stderr"
{code}
3. After completion, check NM log:
{code:bash}
# APP_ID is the application ID of the run above; NM_LOG is the NodeManager log file
grep "$APP_ID" "$NM_LOG" | grep "Uploaded the following files"
# Expected: multiple TRACE entries (one per cycle per container)
# Actual: 0 entries
grep "$APP_ID" "$NM_LOG" | grep "Deleting path.*stderr"
# Expected: deletion events for stderr file during rolling cycles
# Actual: 0 events (only whole-directory deletion at app finish)
{code}
*Step 2 — TFile control (same config, only change format):*
# Change {{yarn.log-aggregation.file-formats}} to {{TFile}}.
# Repeat the same test.
# The NM log now shows TRACE "Uploaded the following files" entries *and*
"Deleting path" events for stderr during rolling cycles.
h3. Expected Behavior
When using IndexedFormat with rolling log aggregation and
{{yarn.log-aggregation.enable-local-cleanup=true}}:
* After each rolling cycle, uploaded local log files should be deleted
* Already-uploaded files should not be re-uploaded in subsequent cycles (dedup
via {{alreadyUploadedLogFiles}} metadata tracking)
h3. Actual Behavior
* No local log files are ever deleted during rolling cycles (only at app
finish via {{doAppLogAggregationPostCleanUp}})
* The same files are re-uploaded in every rolling cycle (no dedup)
* Zero "Uploaded the following files" TRACE log entries (the code path is
never entered)