Re:Re: Performance evaluation of Trino 468, Spark 4.0.0-RC2, and Hive 4 on Tez/MR3

lisoda Tue, 22 Apr 2025 02:16:22 -0700

Hello, Okumin.

From the current TPC-DS test results, we observe that for some queries neither 
Hive nor Hive on MR3 achieves satisfactory performance. It appears that Hive 4 
does not generate a good execution plan for them.

Typical examples of these problematic queries are query 97, query 29, 
query 75, query 11, query 65, query 4, and query 78.

Trino has recently introduced optimizations that affect some of these queries: 
the execution times of query 11 and query 4 have dropped by nearly half 
compared to Trino 440. Would it be possible for us to port Trino's 
optimization strategies, or to investigate why these typical SQL queries get 
poor execution plans?

Thanks.
Lisoda.

At 2025-04-22 16:37:29, "Sungwoo Park" <glap...@gmail.com> wrote:

From average response time analysis:

Spark performs better than its total execution time suggests, with an average 
response time significantly lower than that of Hive on Tez.
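
A minimal sketch of why the two metrics can disagree, assuming that "average 
response time" means the geometric mean of per-query running times (the thread 
does not state the definition). The timings below are hypothetical and are not 
taken from the benchmark; engine_a and engine_b are illustrative names only.

```python
from statistics import geometric_mean

# Hypothetical per-query running times in seconds (NOT benchmark data).
engine_a = [10.0, 12.0, 11.0, 9.0, 2000.0]  # fast on most queries, one outlier
engine_b = [40.0, 45.0, 50.0, 42.0, 60.0]   # uniformly slower, no outlier


def summarize(times):
    """Return (total time, geometric mean) for a list of per-query times."""
    return sum(times), geometric_mean(times)


total_a, geo_a = summarize(engine_a)
total_b, geo_b = summarize(engine_b)

# A single outlier dominates the total, but barely moves the geometric mean.
print(f"engine_a: total={total_a:.0f}s  geometric mean={geo_a:.1f}s")
print(f"engine_b: total={total_b:.0f}s  geometric mean={geo_b:.1f}s")
```

Here engine_a has the larger total but the smaller geometric mean, which is the 
kind of pattern that a few outlier queries can produce.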

For long-running complex queries (like query 24) on large datasets, Hive on Tez 
can be a better choice than Spark, even with its initial overhead of starting 
YARN containers.

--- Sungwoo

On Tue, Apr 22, 2025 at 2:52 PM ypeng <yp...@t-online.de> wrote:

Thanks for the doc.
I am surprised to see that Spark 4 is even slower than Hive on Tez.

[Total Execution Time (Sequential). Trino is the fastest, followed
closely by Hive on MR3, which significantly outperformed Hive on Tez.
Spark is the slowest, skewed by a few outlier queries.]

Sungwoo Park:
> We published a blog that reports the performance evaluation of Trino
> 468, Spark 4.0.0-RC2, and Hive 4 on Tez/MR3 2.0 using the TPC-DS
> Benchmark, 10TB scale factor. Hope you find it useful.
