I spent some time over the past couple of years making micro-optimizations
within Avro, Parquet, and ORC.

I'm curious whether there's a way for you all to get timings at different
levels of the stack for comparison, rather than looking only at the top-line
numbers. A finer breakdown could also help identify areas for improvement.

Thanks.

On Sat, Jan 7, 2023, 8:23 PM ypeng <yp...@t-online.de> wrote:

> From your posting, the results are amazing. Glad to know Hive on MR3 has
> such good performance.
>
> Regards.
>
>
> On Sat, Jan 7, 2023 at 11:29 PM Sungwoo Park <glap...@gmail.com> wrote:
>
>> In fact, Hive 3 has been much faster than Spark for a long time. For
>> complex queries, Hive 3 is much faster than Presto (or Trino) as well. The
>> reality differs from common beliefs about Hive, Spark, and Presto. If you
>> are interested, see the results of our performance comparisons using the
>> TPC-DS benchmark.
>>
>> Performance comparison in October 2018:
>> https://www.datamonad.com/post/2018-10-30-performance-evaluation-0.4/
>>
>> Performance comparison in April 2022:
>> https://www.datamonad.com/post/2022-04-01-spark-hive-performance-1.4/
>>
>> Sungwoo
>>
>>
>> On Fri, Jan 6, 2023 at 12:35 PM ypeng <yp...@t-online.de> wrote:
>>
>>> Hello,
>>>
>>> From my personal testing, Hive 3.1.3 performs much better than older
>>> versions.
>>> It is even as fast as Spark when using the default MR engine.
>>> My test process and dataset:
>>>
>>> https://blog.crypt.pw/Another-10-million-dataset-testing-for-Spark-and-Hive
>>>
>>> Thanks.
>>>
>>
