To answer the question about the configuration of Spark 4.0.0-RC2, here is
the spark-defaults.conf used in the benchmark. Any suggestions on adding or
changing configuration values would be appreciated.

spark.driver.cores=36
spark.driver.maxResultSize=0
spark.driver.memory=196g
spark.dynamicAllocation.enabled=false
spark.executor.cores=36
spark.executor.instances=12
spark.executor.memory=196g
spark.executor.memoryOverhead=20g
spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2
spark.hadoop.yarn.timeline-service.enabled=false
spark.shuffle.io.clientThreads=32
spark.shuffle.io.serverThreads=1
spark.shuffle.service.enabled=true
spark.shuffle.service.name=spark2_shuffle
spark.sql.adaptive.enabled=true
spark.sql.files.maxPartitionBytes=256MB
spark.sql.files.minPartitionNum=50
spark.sql.hive.convertMetastoreOrc=true
spark.sql.orc.enableVectorizedReader=true
spark.sql.orc.impl=native
spark.task.cpus=1
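
For reference, the resource math implied by these settings: 12 executors x 36
cores = 432 concurrent task slots (with spark.task.cpus=1), and each executor
requests 196g heap + 20g memory overhead = 216g from YARN.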

spark.yarn.populateHadoopClasspath=false
spark.sql.hive.metastore.jars=maven
spark.sql.hive.metastore.version=4.0.0

spark.executorEnv.JAVA_HOME=/home/hive/jdk-21.0.1+12
spark.executor.extraJavaOptions=-XX:+AlwaysPreTouch -XX:+UseG1GC -XX:+UseNUMA -XX:InitiatingHeapOccupancyPercent=40 -XX:G1ReservePercent=20 -XX:MaxGCPauseMillis=200 -server
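
A quick way to confirm the executor JVMs actually pick up these flags is to
run jcmd <pid> VM.flags on a worker node and check the reported GC settings.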

spark.yarn.appMasterEnv.JAVA_HOME=/home/hive/jdk-21.0.1+12
spark.sql.ansi.enabled=false

spark.sql.autoBroadcastJoinThreshold=500000000
spark.sql.join.preferSortMergeJoin=true
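
As a sanity check, here is a minimal sketch (Scala, pasted into spark-shell on
the cluster) to confirm the defaults are picked up and that the 500 MB
broadcast threshold takes effect. The TPC-DS table names (store_sales,
date_dim) are from the benchmark schema; the query itself is only illustrative.

// `spark` is the session spark-shell provides.
// Confirm the values from spark-defaults.conf are in effect.
println(spark.conf.get("spark.sql.autoBroadcastJoinThreshold"))  // expect 500000000
println(spark.conf.get("spark.sql.adaptive.enabled"))            // expect true

// With a 500 MB threshold, a join against a small dimension table such as
// date_dim should compile to BroadcastHashJoin rather than SortMergeJoin
// (preferSortMergeJoin only matters once the broadcast threshold is exceeded).
spark.sql("""
  SELECT d_year, COUNT(*)
  FROM store_sales JOIN date_dim ON ss_sold_date_sk = d_date_sk
  GROUP BY d_year
""").explain()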

On Tue, Apr 22, 2025 at 7:08 PM Sungwoo Park <glap...@gmail.com> wrote:

> Hello,
>
> We published a blog that reports the performance evaluation of Trino 468,
> Spark 4.0.0-RC2, and Hive 4 on Tez/MR3 2.0 using the TPC-DS Benchmark, 10TB
> scale factor. Hope you find it useful.
>
> https://mr3docs.datamonad.com/blog/2025-04-18-performance-evaluation-2.0
>
> --- Sungwoo
>
