dongjoon-hyun commented on a change in pull request #33811:
URL: https://github.com/apache/spark/pull/33811#discussion_r695297142
##########
File path:
core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala
##########
@@ -203,26 +203,36 @@ class HadoopMapReduceCommitProtocol(
}
if (dynamicPartitionOverwrite) {
- val partitionPaths = allPartitionPaths.foldLeft(Set[String]())(_ ++ _)
- logDebug(s"Clean up default partition directories for overwriting:
$partitionPaths")
- for (part <- partitionPaths) {
- val finalPartPath = new Path(path, part)
- if (!fs.delete(finalPartPath, true) &&
!fs.exists(finalPartPath.getParent)) {
- // According to the official hadoop FileSystem API spec, delete op
should assume
- // the destination is no longer present regardless of return
value, thus we do not
- // need to double check if finalPartPath exists before rename.
- // Also in our case, based on the spec, delete returns false only
when finalPartPath
- // does not exist. When this happens, we need to take action if
parent of finalPartPath
- // also does not exist(e.g. the scenario described on
SPARK-23815), because
- // FileSystem API spec on rename op says the rename
dest(finalPartPath) must have
- // a parent that exists, otherwise we may get unexpected result on
the rename.
- fs.mkdirs(finalPartPath.getParent)
- }
- val stagingPartPath = new Path(stagingDir, part)
- if (!fs.rename(stagingPartPath, finalPartPath)) {
- throw new IOException(s"Failed to rename $stagingPartPath to
$finalPartPath when " +
+ val targetPath = new Path(path)
+ val pathExisted = fs.exists(targetPath)
+ if (!pathExisted || fs.listStatus(targetPath).isEmpty) {
+ if ((!pathExisted || (pathExisted && fs.delete(targetPath, true))) &&
+ !fs.rename(stagingDir, targetPath)) {
Review comment:
indentation?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]