Hive on Tez vs Impala

2019-04-15 Thread Artur Sukhenko
Hi,
We are using CDH 5 with Impala 2.7.0-cdh5.9.1 and Hive 1.1 (MapReduce).
I can't find any information on how Hive on Tez performs compared to Impala.
Does anyone know, or has anyone compared them?

Thanks

Artur Sukhenko


Re: Hive on Tez vs Impala

2019-04-15 Thread Edward Capriolo
Out of band question. Given:
https://hortonworks.com/blog/welcome-brand-new-cloudera/

Does CDH finally ship with a Tez you don't have to manually patch in?

-- 
Sorry this was sent from mobile. Will do less grammar and spell check than
usual.


Re: Hive on Tez vs Impala

2019-04-15 Thread Sungwoo Park
I tested the performance of Impala 2.12.0+cdh5.15.2+0 in Cloudera CDH
5.15.2 a while ago. I compared it with Hive 3.1.1 on MR3 (where MR3 is a
new execution engine for Hadoop and Kubernetes). You can find the result at:

https://mr3.postech.ac.kr/blog/2019/03/22/performance-evaluation-0.6/

On average, Hive on MR3 is about 30% faster than Hive on Tez on sequential
queries. For concurrent queries, the throughput of Hive on MR3 is about
three times higher than Hive on Tez (when tested with 16 concurrent
queries). You can find the result at:

https://mr3.postech.ac.kr/blog/2018/10/30/performance-evaluation-0.4/

--- Sungwoo Park


Re: Hive on Tez vs Impala

2019-04-15 Thread Manoj Murumkar
No, not yet. However, we have built Tez on CDH and it runs just fine.
The following blog post summarizes part of the work (it is a bit old; we
currently run Tez 0.9.1 on CDH 5.16.1).

https://blog.upala.com/2017/03/04/setting-up-tez-on-cdh-cluster/

The blog says to use ATS from open source Hadoop, which will not work if
you've kerberized the cluster. You'll have to build a version of ATS against
the CDH libraries that provides the classes needed to run the engine. We have
done this work as well and it runs pretty smoothly.




Re: Hive on Tez vs Impala

2019-04-15 Thread Artur Sukhenko
Thanks Sungwoo, very nice articles.


Re: Hive on Tez vs Impala

2019-04-15 Thread Edward Capriolo
Lol. I was hoping that the merger would unblock the "saltyness". I wonder
what the official position is now, because back in the day there was a
puff piece to the effect that Hive was not the way forward and Impala was
the bee's knees.


Re: Hive on Tez vs Impala

2019-04-15 Thread Gopal Vijayaraghavan


Hi,

>> However, we have built Tez on CDH and it runs just fine.

Down that path you'll also need to deploy a slightly newer version of Hive,
because Hive 1.1 is a bit ancient and has known bugs with the Tez planner
code.

You effectively end up building the hortonworks/hive-release builds, by
undoing the non-htrace tracing implementation, applying the htrace one back,
etc.

> Lol. I was hoping that the merger would unblock the "saltyness". 

Historically, I've unofficially supported folks using Tez on CDH in prod
(assuming they buy me enough coffee), though I might have to discontinue that.

https://github.com/t3rmin4t0r/tez-autobuild/blob/llap/vendor-repos.xml#L11

Cheers,
Gopal




Re: Hive on Tez vs Impala

2019-04-15 Thread Manoj Murumkar
If we install our own build of Hive, we'll lose support from CDH.

Tez is not supported anyway and we're not touching any CDH bits, so it's not a
big issue to have our own build of the Tez engine.
