GitHub user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/20931#discussion_r179651049
--- Diff:
core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala
---
@@ -186,7 +186,9 @@ class HadoopMapReduceCommitProtocol(
logDebug(s"Clean up default partition directories for overwriting: $partitionPaths")
for (part <- partitionPaths) {
val finalPartPath = new Path(path, part)
- fs.delete(finalPartPath, true)
+ if (!fs.delete(finalPartPath, true) && !fs.exists(finalPartPath.getParent)) {
--- End diff --
I feel the code here is not safe. `fs.delete` can return false either
because `finalPartPath` doesn't exist or because of a real failure, and
checking the parent directory doesn't tell these two cases apart. We should
make sure `finalPartPath` itself doesn't exist before renaming.
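
A rough sketch of what I have in mind, not a drop-in patch: `OverwriteCleanup` and `deleteOrFail` are hypothetical names that don't exist in Spark. It only uses Hadoop's `FileSystem.delete` and `FileSystem.exists`, and assumes that failing the job is acceptable when the old partition directory can't be removed.

```scala
import java.io.IOException
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical helper for illustration only; not part of Spark.
object OverwriteCleanup {
  /** Deletes `finalPartPath` and verifies that it is really gone. */
  def deleteOrFail(fs: FileSystem, finalPartPath: Path): Unit = {
    // delete() returning false is ambiguous: the path may simply be absent,
    // or the delete may have failed. Re-checking the path itself separates
    // the two cases.
    if (!fs.delete(finalPartPath, true) && fs.exists(finalPartPath)) {
      throw new IOException(
        s"Failed to delete $finalPartPath before renaming the new partition into place")
    }
  }
}
```

Calling `OverwriteCleanup.deleteOrFail(fs, finalPartPath)` inside the `for (part <- partitionPaths)` loop would then guarantee the destination is absent when the rename runs, or fail loudly otherwise.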
---