turboFei opened a new pull request #28989:
URL: https://github.com/apache/spark/pull/28989
### What changes were proposed in this pull request?
For dynamic partition overwrite, the working directory is `.spark-staging-{jobId}`.
Task output files are named `part-$taskId-$jobId$ext`; the name does not
include the task attempt ID.
Each task writes its output to:
- `.spark-staging-{jobId}/partitionPath1/taskFileName1`
- `.spark-staging-{jobId}/partitionPath2/taskFileName2`
- ...
- `.spark-staging-{jobId}/partitionPathN/taskFileNameN`
If speculation is enabled, several attempts of the same task (same taskId,
different attemptId) may write to the same files concurrently.
`DistributedFileSystem` allows only one writer to hold the lease on a file,
so if two attempts write the same file, an exception like `no lease on inode`
is thrown.
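The collision follows directly from the naming scheme. Below is a minimal sketch of it, assuming the file name is built exactly as `part-$taskId-$jobId$ext` with a zero-padded task ID (as in the stack trace in this description). The helper names `task_file_name` and `staging_path` are hypothetical and are not Spark APIs.

```python
def task_file_name(task_id: int, job_id: str, ext: str) -> str:
    # Hypothetical helper: mirrors `part-$taskId-$jobId$ext`.
    # Note that the attempt ID is not an input at all.
    return f"part-{task_id:05d}-{job_id}{ext}"


def staging_path(job_id: str, partition_path: str, task_id: int, ext: str) -> str:
    # Hypothetical helper: `.spark-staging-{jobId}/partitionPath/taskFileName`.
    return f".spark-staging-{job_id}/{partition_path}/{task_file_name(task_id, job_id, ext)}"


JOB_ID = "1f1efbfd-7e20-4e0f-a49c-a7fa3eae4cb1"

# Two attempts of task 0 compute their target path; the attempt ID never
# enters the computation, so both attempts resolve to the same file.
paths_by_attempt = {
    attempt_id: staging_path(JOB_ID, "part1=2/part2=2", 0, ".c000.snappy.parquet")
    for attempt_id in (0, 1)
}
```

Because nothing in the path depends on the attempt, any two attempts of the same task target the same file.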
Even when speculation is not enabled, if a task is aborted because of an
executor OOM, its output is not cleaned up.
When a new attempt is then launched to write the same file, a
`FileAlreadyExistsException` is thrown because Parquet disallows overwriting,
for example:
```
Caused by: org.apache.hadoop.fs.FileAlreadyExistsException: /user/hive/warehouse/t2/.spark-staging-1f1efbfd-7e20-4e0f-a49c-a7fa3eae4cb1/part1=2/part2=2/part-00000-1f1efbfd-7e20-4e0f-a49c-a7fa3eae4cb1.c000.snappy.parquet for client 127.0.0.1 already exists
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:2578)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2465)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2349)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:624)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:398)
```
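The retry failure can be imitated on a local file system. This is only an analogy, assuming that exclusive-create mode (`"x"`) stands in for a writer that refuses to overwrite. The file name and the helpers `create_exclusive` and `retry_collides` are hypothetical, and no HDFS or Parquet code is involved.

```python
import os
import tempfile


def create_exclusive(path: str, data: str) -> None:
    # Mode "x" fails if the file already exists, standing in for a writer
    # that disallows overwriting.
    with open(path, "x") as f:
        f.write(data)


def retry_collides() -> bool:
    # Returns True if the second attempt fails because the first attempt's
    # leftover file is still present.
    with tempfile.TemporaryDirectory() as staging:
        path = os.path.join(staging, "part-00000-job.c000.snappy.parquet")
        create_exclusive(path, "partial output")  # first attempt dies, file stays
        try:
            create_exclusive(path, "retry output")  # retry targets the same name
        except FileExistsError:
            return True
        return False
```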
This is a critical issue because it causes the job to fail.
In this PR, we fix the issue as follows:
1. Set the task's working path under the staging dir to `partitionPath-attemptId`.
2. After the task completes, rename `partitionPath-attemptId/fileName` to
`partitionPath/fileName`.
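The two steps can be sketched on a local file system as follows. This is an illustration under stated assumptions, not the code in this PR: the helpers `attempt_dir`, `write_attempt`, `commit_attempt` and `demo` are hypothetical, `os.rename` stands in for the file system rename, and the file name is made up.

```python
import os
import tempfile


def attempt_dir(staging_dir: str, partition_path: str, attempt_id: int) -> str:
    # Step 1 (assumed layout): <staging>/<partitionPath>-<attemptId>
    return os.path.join(staging_dir, f"{partition_path}-{attempt_id}")


def write_attempt(staging_dir, partition_path, attempt_id, file_name, data):
    work_dir = attempt_dir(staging_dir, partition_path, attempt_id)
    os.makedirs(work_dir, exist_ok=True)
    path = os.path.join(work_dir, file_name)
    with open(path, "x") as f:  # exclusive create, as in the failing case
        f.write(data)
    return path


def commit_attempt(staging_dir, partition_path, attempt_id, file_name):
    # Step 2: rename partitionPath-attemptId/fileName to partitionPath/fileName.
    src = os.path.join(attempt_dir(staging_dir, partition_path, attempt_id), file_name)
    dst_dir = os.path.join(staging_dir, partition_path)
    os.makedirs(dst_dir, exist_ok=True)
    dst = os.path.join(dst_dir, file_name)
    os.rename(src, dst)
    return dst


def demo() -> str:
    with tempfile.TemporaryDirectory() as staging:
        name = "part-00000-job.c000.snappy.parquet"
        # Attempt 0 writes partial output and dies without cleaning up.
        write_attempt(staging, "part1=2/part2=2", 0, name, "partial")
        # Attempt 1 has its own directory, so the exclusive create succeeds.
        write_attempt(staging, "part1=2/part2=2", 1, name, "complete")
        final = commit_attempt(staging, "part1=2/part2=2", 1, name)
        with open(final) as f:
            return f.read()
```

In this sketch the leftover file from attempt 0 stays in its own directory and never collides with attempt 1; only the attempt that completes is renamed into `partitionPath`.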
### Why are the changes needed?
Without this PR, a dynamic partition overwrite operation may fail.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Added UT.