weixiuli commented on a change in pull request #35492:
URL: https://github.com/apache/spark/pull/35492#discussion_r805112929



##########
File path: core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala
##########
@@ -104,7 +104,7 @@ class HadoopMapReduceCommitProtocol(
    * The staging directory of this write job. Spark uses it to deal with files with absolute output
    * path, or writing data into partitioned directory with dynamicPartitionOverwrite=true.
    */
-  protected def stagingDir = getStagingDir(path, jobId)
+  @transient protected lazy val stagingDir = getStagingDir(path, jobId)

Review comment:
       stagingDir is accessed many times in commitJob, especially while
traversing partitionPaths when dynamicPartitionOverwrite is true. Because it
is declared as a def, every access calls getStagingDir(path, jobId) again.
We should make it a lazy val so the value is computed once and then reused.
   
https://github.com/apache/spark/blob/25a4c5fa84d64e37cf5c27c7b2f0f29867330bf2/core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala#L218-L236
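
To illustrate the difference, here is a minimal sketch rather than the actual Spark code. Counter, fakeGetStagingDir, WithDef, WithLazyVal and partitionTargets are all hypothetical names made up for this example, and the returned path format is an assumption. It only shows that a def body runs on every access, while a lazy val body runs once per instance.

```scala
object StagingDirDemo {
  // Hypothetical helper that records how often the path is computed.
  final class Counter { var n = 0 }

  // Hypothetical stand-in for getStagingDir(path, jobId); the path format is assumed.
  def fakeGetStagingDir(path: String, jobId: String, c: Counter): String = {
    c.n += 1
    s"$path/.staging-$jobId"
  }

  class WithDef(path: String, jobId: String, c: Counter) {
    // A def is re-evaluated on every access.
    protected def stagingDir: String = fakeGetStagingDir(path, jobId, c)

    // Mimics looping over partition paths, reading stagingDir once per partition.
    def partitionTargets(parts: Seq[String]): Seq[String] =
      parts.map(p => s"$stagingDir/$p")
  }

  class WithLazyVal(path: String, jobId: String, c: Counter) {
    // A lazy val is evaluated on first access and cached afterwards.
    // @transient keeps the cached value out of Java serialization, so a
    // deserialized copy would recompute it on first access.
    @transient protected lazy val stagingDir: String = fakeGetStagingDir(path, jobId, c)

    def partitionTargets(parts: Seq[String]): Seq[String] =
      parts.map(p => s"$stagingDir/$p")
  }

  def main(args: Array[String]): Unit = {
    val parts = Seq("dt=2022-01-01", "dt=2022-01-02", "dt=2022-01-03")

    val defCounter = new Counter
    new WithDef("/out", "job1", defCounter).partitionTargets(parts)

    val lazyCounter = new Counter
    new WithLazyVal("/out", "job1", lazyCounter).partitionTargets(parts)

    // One computation per partition for the def, a single one for the lazy val.
    println(s"def: ${defCounter.n} calls, lazy val: ${lazyCounter.n} call")
  }
}
```

With three partitions the def variant computes the path three times and the lazy val variant once. Both variants return the same paths, so the change only affects how often the helper runs.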




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


