GabeChurch commented on pull request #32518: URL: https://github.com/apache/spark/pull/32518#issuecomment-1058840240
@dongjoon-hyun I'm curious: have you done any benchmarks of the S3A magic committer with ORC? I've been testing with the Spark 3.2 release and a 3.3 fork on Kubernetes (writes of a couple of TB) for a while now, and I'm seeing worse performance when the magic committer is enabled. It's probably worth noting that I'm partitioning, bucketing (on one column), and sorting on write. Is the magic committer simply a bad option for those of us using ORC, or am I missing something?

| Property | Value |
| -- | -- |
| `spark.hadoop.fs.s3a.bucket.all.committer.magic.enabled` | `true` |
| `spark.hadoop.fs.s3a.committer.magic.enabled` | `true` |
| `spark.hadoop.fs.s3a.committer.name` | `magic` |
| `spark.hadoop.fs.s3a.experimental.input.fadvise` | `random` |
| `spark.hadoop.fs.s3a.impl` | `org.apache.hadoop.fs.s3a.S3AFileSystem` |
| `spark.hadoop.fs.s3a.readahead.range` | `157810688` |
| `spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version` | `2` |
| `spark.hadoop.mapreduce.outputcommitter.factory.scheme.s3a` | `org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory` |
| `spark.sql.hive.metastorePartitionPruning` | `True` |
| `spark.sql.orc.filterPushdown` | `True` |
| `spark.sql.parquet.output.committer.class` | `org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter` |
| `spark.sql.sources.commitProtocolClass` | `org.apache.spark.internal.io.cloud.PathOutputCommitProtocol` |
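For anyone trying to reproduce the comparison, here is a minimal sketch that turns the settings listed above into `spark-submit` `--conf key=value` arguments, so the same job can be run with and without the magic committer. The dictionary name `S3A_MAGIC_SETTINGS` and the helper `to_conf_args` are hypothetical, introduced only for illustration; the keys and values are copied verbatim from the reported configuration.

```python
import shlex

# Hypothetical name: the reported configuration, keys and values copied verbatim.
S3A_MAGIC_SETTINGS: dict[str, str] = {
    "spark.hadoop.fs.s3a.bucket.all.committer.magic.enabled": "true",
    "spark.hadoop.fs.s3a.committer.magic.enabled": "true",
    "spark.hadoop.fs.s3a.committer.name": "magic",
    "spark.hadoop.fs.s3a.experimental.input.fadvise": "random",
    "spark.hadoop.fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
    "spark.hadoop.fs.s3a.readahead.range": "157810688",
    "spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version": "2",
    "spark.hadoop.mapreduce.outputcommitter.factory.scheme.s3a":
        "org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory",
    "spark.sql.hive.metastorePartitionPruning": "True",
    "spark.sql.orc.filterPushdown": "True",
    "spark.sql.parquet.output.committer.class":
        "org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter",
    "spark.sql.sources.commitProtocolClass":
        "org.apache.spark.internal.io.cloud.PathOutputCommitProtocol",
}


def to_conf_args(settings: dict[str, str]) -> list[str]:
    """Hypothetical helper: expand settings into ["--conf", "key=value", ...]."""
    args: list[str] = []
    for key, value in settings.items():
        args.extend(["--conf", f"{key}={value}"])
    return args


if __name__ == "__main__":
    # Print a shell-quoted fragment that can be pasted into a spark-submit command.
    print(shlex.join(to_conf_args(S3A_MAGIC_SETTINGS)))
```

Dropping the committer-related keys from the dictionary gives the baseline run to compare against.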