[GitHub] [spark] GabeChurch commented on pull request #32518: [SPARK-35383][CORE] Improve s3a magic committer support by inferring missing configs

GitBox Thu, 03 Mar 2022 21:25:45 -0800


GabeChurch commented on pull request #32518:
URL: https://github.com/apache/spark/pull/32518#issuecomment-1058840240



    @dongjoon-hyun I'm curious, have you done any benchmarks for the S3A magic
committer with ORC?
    I've been testing with Spark 3.2 and a 3.3 fork on Kubernetes (writes of a
couple of TB) for a while now, and I'm seeing worse performance when the magic
committer is enabled. Probably worth noting that I'm partitioning, bucketing
(1 col), and sorting on write.

    Is the magic committer simply a bad option for those of us using ORC? Or
maybe I'm missing something.
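
For reference, the write pattern described above might look roughly like the following PySpark sketch. The function name and all of its parameters are hypothetical, and sorting on the bucket column is an assumption made for illustration; this is not the actual job.

```python
def write_bucketed_orc(df, table_name, partition_col, bucket_col, num_buckets):
    """Write `df` as a partitioned, bucketed, sorted ORC table.

    `df` is expected to be a pyspark.sql.DataFrame. All arguments are
    placeholders; the real column names and bucket count are not given
    in the comment above.
    """
    (
        df.write
        .format("orc")
        .partitionBy(partition_col)         # one directory per partition value
        .bucketBy(num_buckets, bucket_col)  # bucketing on a single column
        .sortBy(bucket_col)                 # assumption: sort within each bucket
        .saveAsTable(table_name)            # bucketBy requires saveAsTable
    )
```

Bucketed output has to go through `saveAsTable`, so a sketch like this writes to a catalog table rather than directly to a path.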
   
   Property | Value
   -- | --
   spark.hadoop.fs.s3a.bucket.all.committer.magic.enabled | true
   spark.hadoop.fs.s3a.committer.magic.enabled | true
   spark.hadoop.fs.s3a.committer.name | magic
   spark.hadoop.fs.s3a.experimental.input.fadvise | random
   spark.hadoop.fs.s3a.impl | org.apache.hadoop.fs.s3a.S3AFileSystem
   spark.hadoop.fs.s3a.readahead.range | 157810688
   spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version | 2
   spark.hadoop.mapreduce.outputcommitter.factory.scheme.s3a | org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory
   spark.sql.hive.metastorePartitionPruning | True
   spark.sql.orc.filterPushdown | True
   spark.sql.parquet.output.committer.class | org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter
   spark.sql.sources.commitProtocolClass | org.apache.spark.internal.io.cloud.PathOutputCommitProtocol
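
For anyone who wants to reproduce this setup, here is a minimal sketch that turns the properties listed above into `spark-submit --conf` arguments. The property names and values are copied from the table; the dictionary name and the `to_submit_args` helper are hypothetical and exist only for illustration.

```python
# Properties and values exactly as listed in the table above.
MAGIC_COMMITTER_CONF = {
    "spark.hadoop.fs.s3a.bucket.all.committer.magic.enabled": "true",
    "spark.hadoop.fs.s3a.committer.magic.enabled": "true",
    "spark.hadoop.fs.s3a.committer.name": "magic",
    "spark.hadoop.fs.s3a.experimental.input.fadvise": "random",
    "spark.hadoop.fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
    "spark.hadoop.fs.s3a.readahead.range": "157810688",
    "spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version": "2",
    "spark.hadoop.mapreduce.outputcommitter.factory.scheme.s3a":
        "org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory",
    "spark.sql.hive.metastorePartitionPruning": "True",
    "spark.sql.orc.filterPushdown": "True",
    "spark.sql.parquet.output.committer.class":
        "org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter",
    "spark.sql.sources.commitProtocolClass":
        "org.apache.spark.internal.io.cloud.PathOutputCommitProtocol",
}


def to_submit_args(conf):
    """Flatten a property dict into a list of spark-submit --conf arguments."""
    args = []
    for key, value in conf.items():
        args.extend(["--conf", f"{key}={value}"])
    return args


if __name__ == "__main__":
    # Print one flag per line for easy copy-pasting into a launch script.
    flags = to_submit_args(MAGIC_COMMITTER_CONF)
    for flag, setting in zip(flags[0::2], flags[1::2]):
        print(flag, setting)
```

The same dictionary could be applied key by key through `SparkSession.builder.config(key, value)` instead of command-line flags.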
   
   
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


