To answer the question about the configuration of Spark 4.0.0-RC2, here is the spark-defaults.conf used in the benchmark. Any suggestions on adding or changing configuration values would be appreciated.
spark.driver.cores=36
spark.driver.maxResultSize=0
spark.driver.memory=196g
spark.dynamicAllocation.enabled=false
spark.executor.cores=36
spark.executor.instances=12
spark.executor.memory=196g
spark.executor.memoryOverhead=20g
spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2
spark.hadoop.yarn.timeline-service.enabled=false
spark.shuffle.io.clientThreads=32
spark.shuffle.io.serverThreads=1
spark.shuffle.service.enabled=true
spark.shuffle.service.name=spark2_shuffle
spark.sql.adaptive.enabled=true
spark.sql.files.maxPartitionBytes=256MB
spark.sql.files.minPartitionNum=50
spark.sql.hive.convertMetastoreOrc=true
spark.sql.orc.enableVectorizedReader=true
spark.sql.orc.impl=native
spark.task.cpus=1
spark.yarn.populateHadoopClasspath=false
spark.sql.hive.metastore.jars=maven
spark.sql.hive.metastore.version=4.0.0
spark.executorEnv.JAVA_HOME=/home/hive/jdk-21.0.1+12
spark.executor.extraJavaOptions=-XX:+AlwaysPreTouch -XX:+UseG1GC -XX:+UseNUMA -XX:InitiatingHeapOccupancyPercent=40 -XX:G1ReservePercent=20 -XX:MaxGCPauseMillis=200 -server
spark.yarn.appMasterEnv.JAVA_HOME=/home/hive/jdk-21.0.1+12
spark.sql.ansi.enabled=false
spark.sql.autoBroadcastJoinThreshold=500000000
spark.sql.join.preferSortMergeJoin=true

On Tue, Apr 22, 2025 at 7:08 PM Sungwoo Park <glap...@gmail.com> wrote:

> Hello,
>
> We published a blog that reports the performance evaluation of Trino 468,
> Spark 4.0.0-RC2, and Hive 4 on Tez/MR3 2.0 using the TPC-DS Benchmark, 10TB
> scale factor. Hope you find it useful.
>
> https://mr3docs.datamonad.com/blog/2025-04-18-performance-evaluation-2.0
>
> --- Sungwoo
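
For anyone double-checking which of these values actually take effect, a minimal Scala sketch (the key list is just an illustrative subset, and it assumes a running SparkSession named `spark`, as in spark-shell):

    // Print the effective runtime value of a few of the settings above.
    // spark.conf.get(key, default) returns the default when the key is unset.
    val keys = Seq(
      "spark.sql.adaptive.enabled",
      "spark.sql.autoBroadcastJoinThreshold",
      "spark.sql.files.maxPartitionBytes",
      "spark.sql.orc.impl",
      "spark.sql.join.preferSortMergeJoin"
    )
    keys.foreach(k => println(s"$k = ${spark.conf.get(k, "<unset>")}"))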