Re: Hive 3 has big performance improvement from my test

2023-01-07 Thread Sungwoo Park
> From your posting, the result is amazing. Glad to know Hive on MR3 has
> that nice performance.
>

Hive on MR3 is similar to Hive-LLAP in performance, so we can interpret the
above result as Hive being much faster than SparkSQL. When executing
concurrent queries, the performance gap is even greater. In my (rather
biased) opinion, the key weaknesses of Spark are 1) its poor performance when
executing concurrent queries and 2) its poor resource utilization when
executing multiple Spark applications concurrently.

We released Hive on MR3 1.6 a couple of weeks ago. Now we have backported
about 700 patches to Hive 3.1. If interested, please check it out:
https://www.datamonad.com/

Sungwoo


Re: Hive 3 has big performance improvement from my test

2023-01-07 Thread David
I spent some time over the past couple of years making micro-optimizations
within Avro, Parquet, and ORC.

Curious to know if there's a way for you all to get timings at different
levels of the stack to compare and not just look at the top-line numbers. A
further breakdown could also help identify areas for improvement.

Thanks.



Re: Hive 3 has big performance improvement from my test

2023-01-07 Thread ypeng
From your posting, the result is amazing. Glad to know Hive on MR3 has that
nice performance.

Regards.




Re: Hive 3 has big performance improvement from my test

2023-01-07 Thread Mich Talebzadeh
Thanks for this insight, guys.

On your point below and I quote:

...  "It's even as fast as Spark by using the default mr engine"

OK, as we are all experimentalists, are we stating that classic MapReduce
computation can outdo Spark's in-memory computation? I would be curious to
know.

Thanks



 https://en.everybodywiki.com/Mich_Talebzadeh



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Fri, 6 Jan 2023 at 03:35, ypeng wrote:

> Hello,
>
> Just from my personal testing, Hive 3.1.3 has much better performance than
> the old ones.
> It's even as fast as Spark by using the default mr engine.
> My test process and dataset,
> https://blog.crypt.pw/Another-10-million-dataset-testing-for-Spark-and-Hive
>
> Thanks.
>


Re: Hive 3 has big performance improvement from my test

2023-01-07 Thread Sungwoo Park
In fact, Hive 3 has been much faster than Spark for a long time. For
complex queries, Hive 3 is much faster than Presto (or Trino) as well. The
reality is different from common beliefs about Hive, Spark, and Presto. If
interested, see the results of the performance comparisons using the TPC-DS
benchmark.

Performance comparison in October 2018:
https://www.datamonad.com/post/2018-10-30-performance-evaluation-0.4/

Performance comparison in April 2022:
https://www.datamonad.com/post/2022-04-01-spark-hive-performance-1.4/

Sungwoo


On Fri, Jan 6, 2023 at 12:35 PM ypeng wrote:

> Hello,
>
> Just from my personal testing, Hive 3.1.3 has much better performance than
> the old ones.
> It's even as fast as Spark by using the default mr engine.
> My test process and dataset,
> https://blog.crypt.pw/Another-10-million-dataset-testing-for-Spark-and-Hive
>
> Thanks.
>