Hello, Okumin. The current TPC-DS test results show that, in some scenarios, neither Hive nor Hive on MR3 achieves satisfactory performance. It appears that Hive 4 does not generate good execution plans for these queries.
The typical problematic queries are query 97, query 29, query 75, query 11, query 65, query 4, and query 78. Trino has recently introduced optimizations that affect some of them: compared to Trino 440, the execution time of query 11 and query 4 has dropped by nearly half. Would it be possible for us to port Trino's optimization strategies, or to investigate why Hive generates poor execution plans for these queries?

Thanks,
Lisoda

On 2025-04-22 16:37:29, "Sungwoo Park" <glap...@gmail.com> wrote:
> From the average response time analysis: Spark performs better than its total execution time suggests, with an average response time significantly lower than that of Hive on Tez. For long-running complex queries (like query 24) on large datasets, Hive on Tez can be a better choice than Spark, even with its initial overhead of starting YARN containers.
>
> --- Sungwoo
>
> On Tue, Apr 22, 2025 at 2:52 PM ypeng <yp...@t-online.de> wrote:
>> Thanks for the doc. I am surprised to see that Spark 4 is even slower than Hive on Tez.
>>
>> [Total Execution Time (Sequential). Trino is the fastest, followed closely by Hive on MR3, which significantly outperformed Hive on Tez. Spark is the slowest, skewed by a few outlier queries.]
>>
>> Sungwoo Park:
>>> We published a blog post that reports the performance evaluation of Trino 468, Spark 4.0.0-RC2, and Hive 4 on Tez/MR3 2.0 using the TPC-DS benchmark at the 10TB scale factor. Hope you find it useful.