I spent some time over the past couple of years making micro-optimizations in Avro, Parquet, and ORC.

I'm curious whether there is a way for you all to get timings at different levels of the stack, so we can compare those rather than just the top-line numbers. A finer breakdown could also help identify areas for improvement.

Thanks.

On Sat, Jan 7, 2023, 8:23 PM ypeng <yp...@t-online.de> wrote:

> from your posting, the result is amazing. glad to know hive on mr3 has
> that nice performance.
>
> regards.
>
>
> On Sat, Jan 7, 2023 at 11:29 PM Sungwoo Park <glap...@gmail.com> wrote:
>
>> In fact, Hive 3 has been much faster than Spark for a long time. For
>> complex queries, Hive 3 is much faster than Presto (or Trino) as well. The
>> reality is different from common beliefs on Hive, Spark, and Presto. If
>> interested, see the result of performance comparison using the TPC-DS
>> benchmark.
>>
>> Performance comparison in October 2018:
>> https://www.datamonad.com/post/2018-10-30-performance-evaluation-0.4/
>>
>> Performance comparison in April 2022:
>> https://www.datamonad.com/post/2022-04-01-spark-hive-performance-1.4/
>>
>> Sungwoo
>>
>>
>> On Fri, Jan 6, 2023 at 12:35 PM ypeng <yp...@t-online.de> wrote:
>>
>>> Hello,
>>>
>>> Just from my personal testing, Hive 3.1.3 has much better performance
>>> than the old ones.
>>> It's even as fast as Spark by using the default mr engine.
>>> My test process and dataset,
>>>
>>> https://blog.crypt.pw/Another-10-million-dataset-testing-for-Spark-and-Hive
>>>
>>> Thanks.