AngersZhuuuu commented on pull request #33828:
URL: https://github.com/apache/spark/pull/33828#issuecomment-1066481822


   > oh, also, I'm thinking of making some gcs enhancements which turn off some 
checks under __temporary/ paths, breaking "strict" fs semantics but delivering 
performance through reduced io
   > 
   > * skipping all overwrite/parent is dir/dest is not a directory checks when 
creating a file
   > * not worrying about recreating parent dir markers after renaming or 
deleting files
   >   ... etc. S3A will do the same under paths with `__magic` an element 
above it, saves a HEAD and a LIST for every parquet file written (it sets 
overwrite=false when creating files, for no reason at all)
   > 
   > so you should always use _temporary as one path element in your staging 
dir to get any of those benefits
   
   I checked how to use the manifest commit protocol, but I found a problem:
   ```scala
   class PathOutputCommitProtocol(
       jobId: String,
       dest: String,
       dynamicPartitionOverwrite: Boolean = false)
     extends HadoopMapReduceCommitProtocol(jobId, dest, false) with 
Serializable {
   
     if (dynamicPartitionOverwrite) {
       // until there's explicit extensions to the PathOutputCommitProtocols
       // to support the spark mechanism, it's left to the individual committer
       // choice to handle partitioning.
       throw new IOException(PathOutputCommitProtocol.UNSUPPORTED)
     }
   ```
   
   In Spark's current code, `PathOutputCommitProtocol` rejects `dynamicPartitionOverwrite`, which means we can't use your feature when dynamically overwriting partitions.
   
   
   We need to make some changes to support this. WDYT? cc @steveloughran @HyukjinKwon @cloud-fan @viirya
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
