Hello Sungwoo,

BTW, would you consider adding Hive 4 with LLAP as a comparison baseline in the evaluation? Thanks.

Lisoda
At 2025-04-22 16:37:29, "Sungwoo Park" <glap...@gmail.com> wrote:

From the average response time analysis:

Spark performs better than its total execution time suggests: its average response time is significantly lower than that of Hive on Tez.

For long-running complex queries (like query 24) on large datasets, Hive on Tez can be a better choice than Spark, even with its initial overhead of starting YARN containers.

--- Sungwoo

On Tue, Apr 22, 2025 at 2:52 PM ypeng <yp...@t-online.de> wrote:

Thanks for the doc. I am surprised to see that Spark 4 is even slower than Hive on Tez.

[Total Execution Time (Sequential). Trino is the fastest, followed closely by Hive on MR3, which significantly outperformed Hive on Tez. Spark is the slowest, skewed by a few outlier queries.]

Sungwoo Park:
> We published a blog that reports the performance evaluation of Trino
> 468, Spark 4.0.0-RC2, and Hive 4 on Tez/MR3 2.0 using the TPC-DS
> Benchmark, 10TB scale factor. Hope you find it useful.