zzzzming95 commented on code in PR #41000:
URL: https://github.com/apache/spark/pull/41000#discussion_r1191365410


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala:
##########
@@ -159,6 +159,17 @@ object FileFormatWriter extends Logging {
       statsTrackers = statsTrackers
     )
 
+    SQLExecution.checkSQLExecutionId(sparkSession)
+
+    // propagate the description UUID into the jobs, so that committers
+    // get an ID guaranteed to be unique.
+    job.getConfiguration.set("spark.sql.sources.writeJobUUID", description.uuid)
+
+    // This call shouldn't be put into the `try` block below because it only initializes and
+    // prepares the job, any exception thrown from here shouldn't cause abortJob() to be called.
+    // It must be run before `materializeAdaptiveSparkPlan()`

Review Comment:
   > What is the fallout of committer.setupJob(job) not being executed in the presence of an error?
   
   When running `insert overwrite`, Spark first deletes the partition location.
   
   https://github.com/apache/spark/pull/41000#issuecomment-1543974004
   
   It then creates the new location in `committer.setupJob(job)` and executes the job. But in https://github.com/apache/spark/pull/38358 we triggered the job execution in advance, before `setupJob` runs.
   
   So when the job execution fails, the partition location has already been deleted but is never recreated.
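
   The ordering problem can be sketched as a minimal, Spark-free simulation. All names below are illustrative (this is not the real `FileFormatWriter` API); the point is only that if the job fails between the delete and `setupJob`, the partition path ends up deleted and never recreated:
   
   ```scala
   // Hypothetical sketch of the insert-overwrite step ordering; none of
   // these names come from the actual Spark codebase.
   object OverwriteOrdering {
     sealed trait Step
     case object DeletePartition extends Step // insert overwrite removes the old location
     case object SetupJob        extends Step // committer.setupJob(job) creates the new one
     case object RunJob          extends Step // the write job executes, and here it fails
   
     /** Replays the steps, assuming the job fails at RunJob, and reports
       * whether the partition path still exists afterwards. */
     def pathExistsAfterFailure(steps: Seq[Step]): Boolean = {
       var exists = true
       for (step <- steps) step match {
         case DeletePartition => exists = false
         case SetupJob        => exists = true
         case RunJob          => return exists // failure: later steps never run
       }
       exists
     }
   }
   ```
   
   With the correct ordering, `pathExistsAfterFailure(Seq(DeletePartition, SetupJob, RunJob))` returns `true` (the path survives the failure); with the job triggered in advance, `pathExistsAfterFailure(Seq(DeletePartition, RunJob, SetupJob))` returns `false`, which is the data-loss scenario described above.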



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

