Re: Hive on Tez vs Impala
> I wish the Hive team would keep things more backward-compatible as well. Hive is
> such an enormous system with a widespread impact that any
> backward-incompatible change could cause an uproar in the community.

The incompatibilities were unavoidable in some situations. A lot of them were already in Hive 2, but hidden away or deliberately disabled, and they are what make Hive 3 what it is.

Here's a quick run-down of how the table-level incompatibilities let an end user run more SQL:

https://www.slideshare.net/dbist/hive-3-a-new-horizon/10

These incompatibilities are the foundation for answering questions like "How do I offload Kafka streams to S3 cold data stores, but still query down to the last second without the small file problem?"

Cheers,
Gopal
Re: Hive on Tez vs Impala
I'm using Hive 3.1 on Tez/LLAP, and I must say the experience was not good, but it was worth it. We built Hive from HDP's hive-release, added the Tez UI back, and combined that with Hue 4.3 (also built from Cloudera's Hue). Now that the two companies have merged, I think things are going to get better. (I'm not an enterprise user of either CDH or HDP; we build our own distro based on their open-source versions.) Hue is now trying to integrate with Atlas and Ranger, which is a really good step.

We like Tez because it has been stable enough for batch processing jobs. The LLAP and vectorized side of things is a different story, and that's where the new Hive is heading. However, in our experience it has historically been less stable than plain Tez containers. LLAP + vectorized execution can bring query times down to sub-second if you have the hardware for it (at least an instance with 128 GB of memory and a good 10 Gbit network; i3.4xlarge on AWS, for example). In a few cases it's actually faster than Presto (AWS Athena as well, in our case), but overall I would say they are very comparable.

I like that we can use a single SQL dialect for both batch and interactive queries, using a combination of Hive 3.x on Tez and Hive 3.1 on LLAP. There's no context switching between dialects, wasting our time on LATERAL VIEW explode(...) vs. CROSS JOIN unnest(...).

One thing I must say, though: Hive 3 has a few backward-incompatible changes, so be careful. For example, the change that makes managed tables transactional by default has broken many of our assumptions. I wish the Hive team would keep things more backward-compatible as well. Hive is such an enormous system with a widespread impact that any backward-incompatible change could cause an uproar in the community.

On Tue, Apr 16, 2019 at 8:08 AM Edward Capriolo wrote:
> [...]

--
Thai
Re: Hive on Tez vs Impala
I have changed jobs three times since Tez was introduced. It is a true waste of compute resources and time that it was never patched into CDH. So I either have to waste my time patching it in, waste my time running a side deployment, or not install it and waste money having queries run longer on the MR/Spark engine.

Imagine how many compute hours have been lost worldwide.

On Tuesday, April 16, 2019, Manoj Murumkar wrote:
> [...]

--
Sorry this was sent from mobile. Will do less grammar and spell check than usual.
Re: Hive on Tez vs Impala
If we install our own build of Hive, we'll lose CDH support. Tez isn't supported anyway, and we're not touching any CDH bits, so having our own build of the Tez engine isn't a big issue.

On Apr 15, 2019, at 9:20 PM, Gopal Vijayaraghavan wrote:
> [...]
Re: Hive on Tez vs Impala
Hi,

>> However, we have built Tez on CDH and it runs just fine.

Down that path, you'll also need to deploy a slightly newer version of Hive, because Hive 1.1 is a bit ancient and has known bugs in the Tez planner code.

You effectively end up building the hortonworks/hive-release builds, by undoing the non-htrace tracing implementation and applying the htrace one back, etc.

> Lol. I was hoping that the merger would unblock the "saltyness".

Historically, I've unofficially supported folks using Tez on CDH in prod (assuming they buy me enough coffee), though I might have to discontinue that.

https://github.com/t3rmin4t0r/tez-autobuild/blob/llap/vendor-repos.xml#L11

Cheers,
Gopal
Re: Hive on Tez vs Impala
Lol. I was hoping that the merger would unblock the "saltyness". I wonder what the official position is now, because back in the day there was a puff piece to the effect that Hive was not the way forward and Impala was the bee's knees.

On Monday, April 15, 2019, Manoj Murumkar wrote:
> [...]
Re: Hive on Tez vs Impala
No, not yet. However, we have built Tez on CDH and it runs just fine. The following blog post summarizes part of the work (it's a bit old; we currently run Tez 0.9.1 on CDH 5.16.1).

https://blog.upala.com/2017/03/04/setting-up-tez-on-cdh-cluster/

The blog says to use ATS from open-source Hadoop, which will not work if you've kerberized the cluster. You'll have to build a version of ATS against the CDH libraries that provides the classes needed to run the engine. We have done this work as well, and it runs pretty smoothly.

On Mon, Apr 15, 2019 at 8:33 AM Edward Capriolo wrote:
> Does CDH finally ship with a Tez you don't have to manually patch in?
> [...]
Re: Hive on Tez vs Impala
Thanks Sungwoo, very nice articles.

On Mon, Apr 15, 2019 at 5:38 PM Sungwoo Park wrote:
> [...]
Re: Hive on Tez vs Impala
Out-of-band question. Given:

https://hortonworks.com/blog/welcome-brand-new-cloudera/

Does CDH finally ship with a Tez you don't have to manually patch in?

On Monday, April 15, 2019, Sungwoo Park wrote:
> [...]
Re: Hive on Tez vs Impala
I tested the performance of Impala 2.12.0+cdh5.15.2+0 in Cloudera CDH 5.15.2 a while ago and compared it with Hive 3.1.1 on MR3 (MR3 is a new execution engine for Hadoop and Kubernetes). You can find the results at:

https://mr3.postech.ac.kr/blog/2019/03/22/performance-evaluation-0.6/

On average, Hive on MR3 is about 30% faster than Hive on Tez on sequential queries. For concurrent queries, the throughput of Hive on MR3 is about three times that of Hive on Tez (tested with 16 concurrent queries). You can find those results at:

https://mr3.postech.ac.kr/blog/2018/10/30/performance-evaluation-0.4/

--- Sungwoo Park

On Mon, Apr 15, 2019 at 8:44 PM Artur Sukhenko wrote:
> Hi,
> We are using CDH 5, with Impala 2.7.0-cdh5.9.1 and Hive 1.1 (MapReduce).
> I can't find any info on Hive on Tez performance compared to Impala.
> Does anyone know, or has anyone compared them?
>
> Thanks
>
> Artur Sukhenko
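A figure like "30% faster" can be read as either "1.3x the speed" or "30% less running time", and the two are not the same number; the blog posts linked above define the exact metric they use. The Python sketch below only illustrates the arithmetic behind each reading, plus a queries-per-hour throughput figure. The helper names and the timings in the tests are hypothetical and are not taken from the benchmarks.

```python
def speedup(baseline_seconds: float, candidate_seconds: float) -> float:
    """How many times faster the candidate is: baseline time / candidate time."""
    return baseline_seconds / candidate_seconds


def time_saved(baseline_seconds: float, candidate_seconds: float) -> float:
    """Fraction of the baseline running time that the candidate saves."""
    return 1.0 - candidate_seconds / baseline_seconds


def queries_per_hour(completed_queries: int, wall_clock_seconds: float) -> float:
    """Throughput of a concurrent run: completed queries per hour of wall-clock time."""
    return completed_queries * 3600.0 / wall_clock_seconds


if __name__ == "__main__":
    # Hypothetical timings: a 1.3x speedup corresponds to only ~23% less time.
    print(f"speedup:    {speedup(130.0, 100.0):.2f}x")
    print(f"time saved: {time_saved(130.0, 100.0):.1%}")
```

A 1.3x speedup trims roughly 23% off the running time, whereas a 30% time reduction corresponds to roughly a 1.43x speedup, so it is worth checking which definition a benchmark uses before comparing numbers across reports.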