AngersZhuuuu commented on code in PR #53406:
URL: https://github.com/apache/spark/pull/53406#discussion_r2645107472
##########
core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala:
##########
@@ -178,6 +178,19 @@ class HadoopMapReduceCommitProtocol(
val taskAttemptContext = new
TaskAttemptContextImpl(jobContext.getConfiguration, taskAttemptId)
committer = setupCommitter(taskAttemptContext)
committer.setupJob(jobContext)
+ try {
+ if (dynamicPartitionOverwrite) {
Review Comment:
`stagingDir` is used in the following situations:
- When there is an output file with an absolute path (via
`newTaskTempFileAbsPath`), it is unrelated to `dynamicPartitionOverwrite`.
- When `dynamicPartitionOverwrite=true`, it is used for the final movement
of partition files.
When `dynamicPartitionOverwrite=false`, `stagingDir` won't be created as the
output path after `committer.setupJob(jobContext)`, so calling
`fs.deleteOnExit(stagingDir)` here may fail (throw an exception, or return
`false` without registering the path), since `deleteOnExit()` first checks
that the file exists:
```
public boolean deleteOnExit(Path f) throws IOException {
    // Paths that do not exist yet are never registered.
    if (!this.exists(f)) {
        return false;
    } else {
        synchronized(this.deleteOnExit) {
            this.deleteOnExit.add(f);
            return true;
        }
    }
}
```
When `stagingDir` is used for absolute-path output (via
`newTaskTempFileAbsPath`), it is created by the executor, and we can't call
`fs.deleteOnExit(stagingDir)` there since these files are committed on the
driver.
So even after the current PR, a `stagingDir` used for absolute paths (via
`newTaskTempFileAbsPath`) may still be left behind. **(This kind of
scenario is usually very rare.)**
What we can do is first create `stagingDir` here and then add it to the
`deleteOnExit` paths. That would cover all cases.
This path is always deleted after `commitJob` or `abortJob`:
<img width="842" height="435" alt="Screenshot 2025-12-24 16 21 21"
src="https://github.com/user-attachments/assets/e76da80b-50cb-413d-8498-d35e3655c29e"
/>
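To illustrate why the order matters, here is a minimal, self-contained sketch.
It is only a toy model: `ToyFileSystem`, `StagingDirSketch`, and the path
string are hypothetical names made up for this example, not Hadoop or Spark
APIs. The toy `deleteOnExit` copies the shape of the method quoted above, and
`close()` stands in for the cleanup that would happen at process exit.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Toy in-memory stand-in for a file system (an assumption for illustration;
// this is NOT org.apache.hadoop.fs.FileSystem).
final class ToyFileSystem {
    private final Set<String> existing = new LinkedHashSet<>();
    private final Set<String> deleteOnExit = new LinkedHashSet<>();

    boolean exists(String path) {
        return existing.contains(path);
    }

    boolean mkdirs(String path) {
        existing.add(path);
        return true;
    }

    // Same shape as the quoted method: only existing paths are registered.
    boolean deleteOnExit(String path) {
        if (!exists(path)) {
            return false;
        }
        synchronized (deleteOnExit) {
            deleteOnExit.add(path);
            return true;
        }
    }

    // Models process exit: remove every registered path.
    void close() {
        synchronized (deleteOnExit) {
            existing.removeAll(deleteOnExit);
            deleteOnExit.clear();
        }
    }
}

final class StagingDirSketch {
    public static void main(String[] args) {
        ToyFileSystem fs = new ToyFileSystem();
        String stagingDir = "/out/.staging-job1"; // hypothetical path

        // Registering before the directory exists is a no-op.
        System.out.println(fs.deleteOnExit(stagingDir)); // prints "false"

        // Creating the directory first makes the registration stick.
        fs.mkdirs(stagingDir);
        System.out.println(fs.deleteOnExit(stagingDir)); // prints "true"

        // At "exit" the registered directory is cleaned up.
        fs.close();
        System.out.println(fs.exists(stagingDir)); // prints "false"
    }
}
```

In the real commit protocol the equivalent would presumably be an
`fs.mkdirs(stagingDir)` followed by `fs.deleteOnExit(stagingDir)` on the
driver during job setup, but that placement is an assumption and not something
this sketch verifies.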
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]