Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/18714#discussion_r159352867
--- Diff: core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala ---
@@ -39,8 +39,19 @@ import org.apache.spark.mapred.SparkHadoopMapRedUtil
*
* @param jobId the job's or stage's id
* @param path the job's output path, or null if committer acts as a noop
+ * @param dynamicPartitionOverwrite If true, Spark will overwrite partition directories at runtime
+ *                                  dynamically, i.e., we first write files under a staging
+ *                                  directory with partition path, e.g.
+ *                                  /path/to/staging/a=1/b=1/xxx.parquet. When committing the job,
+ *                                  we first clean up the corresponding partition directories at
+ *                                  destination path, e.g. /path/to/destination/a=1/b=1, and move
+ *                                  files from staging directory to the corresponding partition
+ *                                  directories under destination path.
*/
-class HadoopMapReduceCommitProtocol(jobId: String, path: String)
+class HadoopMapReduceCommitProtocol(
+ jobId: String,
+ path: String,
+ dynamicPartitionOverwrite: Boolean = false)
--- End diff ---
Indents.
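
For illustration only, here is a minimal, hypothetical sketch of the commit-time move that the Scaladoc above describes, written against plain Hadoop FileSystem APIs. The object name, method signature, and the partitionPaths argument are invented for this example; this is not the code added by the PR, just the described behaviour in isolation.

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical sketch -- not Spark's actual commit protocol implementation.
object DynamicPartitionOverwriteSketch {

  /**
   * Moves staged partition directories into the destination, replacing any
   * existing data for those partitions (the "dynamic partition overwrite"
   * behaviour described in the Scaladoc above).
   *
   * @param stagingDir     e.g. /path/to/staging
   * @param destDir        e.g. /path/to/destination
   * @param partitionPaths relative partition paths written by the job, e.g. Seq("a=1/b=1")
   */
  def commitPartitions(
      stagingDir: Path,
      destDir: Path,
      partitionPaths: Seq[String],
      conf: Configuration): Unit = {
    val fs: FileSystem = stagingDir.getFileSystem(conf)
    partitionPaths.foreach { part =>
      val staged = new Path(stagingDir, part) // /path/to/staging/a=1/b=1
      val dest = new Path(destDir, part)      // /path/to/destination/a=1/b=1
      // First clean up the corresponding partition directory at the destination.
      if (fs.exists(dest)) {
        fs.delete(dest, true)
      }
      // Ensure the parent directory exists, then move the staged files into place.
      fs.mkdirs(dest.getParent)
      fs.rename(staged, dest)
    }
  }
}

If I read the PR correctly, users opt into this behaviour through the spark.sql.sources.partitionOverwriteMode=dynamic setting together with INSERT OVERWRITE; with the default (static) mode, the whole matching output path is overwritten rather than only the partitions the job actually wrote.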