Re: Next Hive 4.0.1 minor release

2024-05-14 Thread Okumin
Hi Zhihua,

Thanks for driving the next release. We are actively testing 4.0.0 and
would like to give some suggestions.

# HIVE-27847: Prevent query Failures on Numeric <-> Timestamp
We hit this issue when running Hive 4 with the option enabled. I believe it is
worth resolving for those who want to try Hive 4 while keeping
compatibility with previous versions.
https://issues.apache.org/jira/browse/HIVE-27847
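For context, HIVE-27847 concerns casts between numeric and timestamp values, which Hive 4 checks more strictly than Hive 3. A minimal sketch of the kind of query involved follows; the property name is an assumption drawn from the ticket, since this thread does not spell out which option was enabled:

```sql
-- Hive 4 can reject numeric <-> timestamp conversions under strict checking.
-- Assumed compatibility switch (verify against HIVE-27847 / HiveConf):
SET hive.strict.timestamp.conversion=false;

-- With the legacy behavior, the integer is interpreted as seconds since
-- the epoch, matching earlier Hive versions:
SELECT CAST(1715644800 AS TIMESTAMP);
```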

# HIVE-28098: Fails to copy empty column statistics of materialized CTE
This follows up on HIVE-28080, but the current 4.0.0 includes only
HIVE-28080. The reasonable option, to me, is to either revert HIVE-28080 or
cherry-pick HIVE-28098: all or nothing.
https://issues.apache.org/jira/browse/HIVE-28098

Thanks,
Okumin

On Sat, May 11, 2024 at 9:45 AM dengzhhu653  wrote:
>
> Hello Community,
>
> As you have noticed, we are going to propose the next 4.0.1 release on top of
> 4.0.0, with some critical bug fixes and improvements [1]. As of now we are
> putting the label "hive-4.0.1-must" on the tickets, and we plan to make sure
> those get cherry-picked to branch-4.0 [2]. Please suggest other important
> fixes that can be included in this release, if any.
>
> We will get this minor release out as soon as possible once all the tickets
> marked with "hive-4.0.1-must" are resolved and tested.
>
> [1] https://lists.apache.org/thread/rkw2toj5d74t8n5jvnkrfw77hyzn7qh3
> [2] https://issues.apache.org/jira/browse/HIVE-28204?jql=labels%20%3D%20hive-4.0.1-must
>
> Thanks,
>
> Zhihua


Re: Re: [ANNOUNCE] New Committer: Simhadri Govindappa

2024-04-19 Thread Simhadri G
Thanks again everyone :)

On Fri, Apr 19, 2024, 2:15 AM Rajesh Balamohan 
wrote:

> Congratulations Simhadri. :)
>
> ~Rajesh.B
>
> On Fri, Apr 19, 2024 at 2:02 AM Aman Sinha  wrote:
>
>> Congrats Simhadri !
>>
>> On Thu, Apr 18, 2024 at 12:25 PM Naveen Gangam
>>  wrote:
>>
>>> Congrats Simhadri. Looking forward to many more contributions in the
>>> future.
>>>
>>> On Thu, Apr 18, 2024 at 12:25 PM Sai Hemanth Gantasala
>>>  wrote:
>>>
>>>> Congratulations Simhadri  well deserved
>>>>
>>>> On Thu, Apr 18, 2024 at 8:41 AM Pau Tallada  wrote:
>>>>
>>>>> Congratulations
>>>>>
>>>>> Message from Alessandro Solimando on Thu, 18 Apr 2024 at 17:40:
>>>>>
>>>>>> Great news, Simhadri, very well deserved!
>>>>>>
>>>>>> On Thu, 18 Apr 2024 at 15:07, Simhadri G 
>>>>>> wrote:
>>>>>>
>>>>>>> Thanks everyone!
>>>>>>> I really appreciate it, it means a lot to me :)
>>>>>>> The Apache Hive project and its community have truly inspired me.
>>>>>>> I'm grateful for the chance to contribute to such a remarkable project.
>>>>>>>
>>>>>>> Thanks!
>>>>>>> Simhadri Govindappa
>>>>>>>
>>>>>>> On Thu, Apr 18, 2024 at 6:18 PM Sankar Hariappan
>>>>>>>  wrote:
>>>>>>>
>>>>>>>> Congrats Simhadri!
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> -Sankar
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> *From:* Butao Zhang 
>>>>>>>> *Sent:* Thursday, April 18, 2024 5:39 PM
>>>>>>>> *To:* user@hive.apache.org; dev 
>>>>>>>> *Subject:* [EXTERNAL] Re: [ANNOUNCE] New Committer: Simhadri
>>>>>>>> Govindappa
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Congratulations Simhadri !!!
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks.
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>>
>>>>>>>> *From:* user-return-28075-butaozhang1=163@hive.apache.org on
>>>>>>>> behalf of Ayush Saxena
>>>>>>>> *Sent:* Thursday, April 18, 2024 7:50 PM
>>>>>>>> *To:* dev ; user@hive.apache.org
>>>>>>>> *Subject:* [ANNOUNCE] New Committer: Simhadri Govindappa
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Hi All,
>>>>>>>>
>>>>>>>> Apache Hive's Project Management Committee (PMC) has invited
>>>>>>>> Simhadri Govindappa to become a committer, and we are pleased to 
>>>>>>>> announce
>>>>>>>> that he has accepted.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Please join me in congratulating him, Congratulations Simhadri,
>>>>>>>> Welcome aboard!!!
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> -Ayush Saxena
>>>>>>>>
>>>>>>>> (On behalf of Apache Hive PMC)
>>>>>>>>
>>>>>>>
>>>>>
>>>>> --
>>>>> --
>>>>> Pau Tallada Crespí
>>>>> Departament de Serveis
>>>>> Port d'Informació Científica (PIC)
>>>>> Tel: +34 93 170 2729
>>>>> --
>>>>>
>>>>>


Re: Re: [ANNOUNCE] New Committer: Simhadri Govindappa

2024-04-18 Thread Rajesh Balamohan
Congratulations Simhadri. :)

~Rajesh.B



Re: Re: [ANNOUNCE] New Committer: Simhadri Govindappa

2024-04-18 Thread Aman Sinha
Congrats Simhadri !



Re: Re: [ANNOUNCE] New Committer: Simhadri Govindappa

2024-04-18 Thread Naveen Gangam
Congrats Simhadri. Looking forward to many more contributions in the future.



Re: Re: [ANNOUNCE] New Committer: Simhadri Govindappa

2024-04-18 Thread Sai Hemanth Gantasala
Congratulations Simhadri  well deserved



Re: Re: [ANNOUNCE] New Committer: Simhadri Govindappa

2024-04-18 Thread Pau Tallada
Congratulations


-- 
--
Pau Tallada Crespí
Departament de Serveis
Port d'Informació Científica (PIC)
Tel: +34 93 170 2729
--


Re: Re: [ANNOUNCE] New Committer: Simhadri Govindappa

2024-04-18 Thread Alessandro Solimando
Great news, Simhadri, very well deserved!



Re: [ANNOUNCE] New Committer: Simhadri Govindappa

2024-04-18 Thread Krisztian Kasa
Congratulations Simhadri!

Regards,
Krisztian

On Thu, Apr 18, 2024 at 3:25 PM kokila narayanan <
kokilanarayana...@gmail.com> wrote:

> Congratulations Simhadri 
>


Re: Re: [ANNOUNCE] New Committer: Simhadri Govindappa

2024-04-18 Thread Simhadri G
Thanks everyone!
I really appreciate it, it means a lot to me :)
The Apache Hive project and its community have truly inspired me. I'm
grateful for the chance to contribute to such a remarkable project.

Thanks!
Simhadri Govindappa



RE: Re: [ANNOUNCE] New Committer: Simhadri Govindappa

2024-04-18 Thread Sankar Hariappan via user
Congrats Simhadri!

-Sankar



Re: [ANNOUNCE] New Committer: Simhadri Govindappa

2024-04-18 Thread Butao Zhang
Congratulations Simhadri !!!

Thanks.




Re: [ANNOUNCE] Apache Hive 4.0.0 Released

2024-04-04 Thread Sungwoo Park
Congratulations and huge thanks to the Apache Hive team and contributors for
releasing Hive 4. We have been watching the development of Hive 4 since the
release of Hive 3.1, and it is truly satisfying to witness the resolution of
all the critical issues at last, after five years. Hive 4 comes with a lot of
great new features, and our initial performance benchmarking indicates a
significant improvement over Hive 3 in terms of speed.

--- Sungwoo

On Wed, Apr 3, 2024 at 10:30 PM Okumin  wrote:

> I'm really excited to see the news! I can easily imagine the
> difficulty of testing and shipping Hive 4.0.0 with more than 5k
> commits. I'm proud to have witnessed this moment here.
>
> Thank you!
>
> On Wed, Apr 3, 2024 at 3:07 AM Naveen Gangam  wrote:
> >
> > Thank you for the tremendous amount of work put in by many many folks to
> make this release happen, including projects hive is dependent upon like
> tez.
> >
> > Thank you to all the PMC members, committers and contributors for all
> the work over the past 5+ years in shaping this release.
> >
> > THANK YOU!!!
> >
> > On Sun, Mar 31, 2024 at 8:54 AM Battula, Brahma Reddy 
> wrote:
> >>
> >> Thank you for your hard work and dedication in releasing Apache Hive
> version 4.0.0.
> >>
> >>
> >>
> >> Congratulations to the entire team on this achievement. Keep up the
> great work!
> >>
> >>
> >>
> >> Does this consider as GA.?
> >>
> >>
> >>
> >> And Looks we need to update in the following location also.?
> >>
> >> https://hive.apache.org/general/downloads/
> >>
> >>
> >>
> >>
> >>
> >> From: Denys Kuzmenko 
> >> Date: Saturday, March 30, 2024 at 00:07
> >> To: user@hive.apache.org , d...@hive.apache.org <
> d...@hive.apache.org>
> >> Subject: [ANNOUNCE] Apache Hive 4.0.0 Released
> >>
> >> The Apache Hive team is proud to announce the release of Apache Hive
> >>
> >> version 4.0.0.
> >>
> >>
> >>
> >> The Apache Hive (TM) data warehouse software facilitates querying and
> >>
> >> managing large datasets residing in distributed storage. Built on top
> >>
> >> of Apache Hadoop (TM), it provides, among others:
> >>
> >>
> >>
> >> * Tools to enable easy data extract/transform/load (ETL)
> >>
> >>
> >>
> >> * A mechanism to impose structure on a variety of data formats
> >>
> >>
> >>
> >> * Access to files stored either directly in Apache HDFS (TM) or in other
> >>
> >>   data storage systems such as Apache HBase (TM)
> >>
> >>
> >>
> >> * Query execution via Apache Hadoop MapReduce, Apache Tez and Apache
> Spark frameworks. (MapReduce is deprecated, and Spark has been removed so
> the text needs to be modified depending on the release version)
> >>
> >>
> >>
> >> For Hive release details and downloads, please visit:
> >>
> >> https://hive.apache.org/downloads.html
> >>
> >>
> >>
> >> Hive 4.0.0 Release Notes are available here:
> >>
> >>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343343&styleName=Text&projectId=12310843
> >>
> >>
> >>
> >> We would like to thank the many contributors who made this release
> >>
> >> possible.
> >>
> >>
> >>
> >> Regards,
> >>
> >>
> >>
> >> The Apache Hive Team
>


Re: [ANNOUNCE] Apache Hive 4.0.0 Released

2024-04-03 Thread Okumin
I'm really excited to see the news! I can easily imagine the
difficulty of testing and shipping Hive 4.0.0 with more than 5k
commits. I'm proud to have witnessed this moment here.

Thank you!



Re: [ANNOUNCE] Apache Hive 4.0.0 Released

2024-04-02 Thread Naveen Gangam
Thank you for the tremendous amount of work put in by many many folks to
make this release happen, including projects hive is dependent upon like
tez.

Thank you to all the PMC members, committers and contributors for all the
work over the past 5+ years in shaping this release.

THANK YOU!!!



RE: [EXTERNAL] Re: [ANNOUNCE] Apache Hive 4.0.0 Released

2024-04-02 Thread Sankar Hariappan via user
Absolutely exciting news! Congrats to the entire Hive community for making this 
release happen!

-Sankar

From: Pau Tallada 
Sent: Tuesday, April 2, 2024 2:31 PM
To: user@hive.apache.org
Cc: d...@hive.apache.org
Subject: [EXTERNAL] Re: [ANNOUNCE] Apache Hive 4.0.0 Released

Congrats to all for the hard work

Message from Butao Zhang <butaozha...@163.com> on Tue, 2 Apr 2024 at 10:58:
I'm thrilled to see the official release of Apache Hive 4.0.0, marking another 
milestone in the development of the Hive community. I want to extend my 
gratitude to all the partners in the community for their hard work.
Also special thanks to Denys for your diligent code reviews and efforts in 
completing the version release process, which I deeply admire.

Wishing the Apache Hive community continued growth and success. Keep up the 
great work!


Thanks,
Butao Zhang


 Replied Message 
From: Stamatis Zampetakis
Date: 4/2/2024 16:39
To: d...@hive.apache.org
Cc: user@hive.apache.org
Subject: Re: [ANNOUNCE] Apache Hive 4.0.0 Released
The new Apache Hive 4.0.0 release brings roughly 5K new commits (since
Apache Hive 3.1.3) and it's probably the biggest release so far in the
history of the project. The numbers clearly show that this is a
collective effort that wouldn't be possible without a strong community
and many volunteers along the years. Many thanks to everyone involved!

A special mention to Denys who went above and beyond his role of
release manager triaging release blockers, reviewing and fixing many
of those tickets that were blocking us for the past few months.

Best,
Stamatis

On Sun, Mar 31, 2024 at 2:54 PM Battula, Brahma Reddy
mailto:bbatt...@visa.com.invalid>> wrote:

Thank you for your hard work and dedication in releasing Apache Hive version 
4.0.0.

Congratulations to the entire team on this achievement. Keep up the great work!

Does this consider as GA.?

And Looks we need to update in the following location also.?
https://hive.apache.org/general/downloads/


From: Denys Kuzmenko mailto:dkuzme...@apache.org>>
Date: Saturday, March 30, 2024 at 00:07
To: user@hive.apache.org<mailto:user@hive.apache.org> 
mailto:user@hive.apache.org>>, 
d...@hive.apache.org<mailto:d...@hive.apache.org> 
mailto:d...@hive.apache.org>>
Subject: [ANNOUNCE] Apache Hive 4.0.0 Released

The Apache Hive team is proud to announce the release of Apache Hive

version 4.0.0.



The Apache Hive (TM) data warehouse software facilitates querying and

managing large datasets residing in distributed storage. Built on top

of Apache Hadoop (TM), it provides, among others:



* Tools to enable easy data extract/transform/load (ETL)



* A mechanism to impose structure on a variety of data formats



* Access to files stored either directly in Apache HDFS (TM) or in other

data storage systems such as Apache HBase (TM)



* Query execution via Apache Hadoop MapReduce, Apache Tez and Apache Spark 
frameworks. (MapReduce is deprecated, and Spark has been removed so the text 
needs to be modified depending on the release version)



For Hive release details and downloads, please visit:

https://hive.apache.org/downloads.html



Hive 4.0.0 Release Notes are available here:

https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343343&styleName=Text&projectId=12310843



We would like to thank the many contributors who made this release

possible.



Regards,



The Apache Hive Team


--
--
Pau Tallada Crespí
Departament de Serveis
Port d'Informació Científica (PIC)
Tel: +34 93 170 2729
--



Re: [ANNOUNCE] Apache Hive 4.0.0 Released

2024-04-02 Thread Pau Tallada
Congrats to all for the hard work

On Tue, Apr 2, 2024 at 10:58, Butao Zhang wrote:

> I'm thrilled to see the official release of Apache Hive 4.0.0, marking
> another milestone in the development of the Hive community. I want to
> extend my gratitude to all the partners in the community for their hard
> work.
> Also special thanks to Denys for your diligent code reviews and efforts in
> completing the version release process, which I deeply admire.
>
> Wishing the Apache Hive community continued growth and success. Keep up
> the great work!
>
>
> Thanks,
> Butao Zhang
>
>
>  Replied Message 
> From Stamatis Zampetakis 
> Date 4/2/2024 16:39
> To  
> Cc user@hive.apache.org 
> Subject Re: [ANNOUNCE] Apache Hive 4.0.0 Released
> The new Apache Hive 4.0.0 release brings roughly 5K new commits (since
> Apache Hive 3.1.3) and it's probably the biggest release so far in the
> history of the project. The numbers clearly show that this is a
> collective effort that wouldn't be possible without a strong community
> and many volunteers along the years. Many thanks to everyone involved!
>
> A special mention to Denys who went above and beyond his role of
> release manager triaging release blockers, reviewing and fixing many
> of those tickets that were blocking us for the past few months.
>
> Best,
> Stamatis
>
> On Sun, Mar 31, 2024 at 2:54 PM Battula, Brahma Reddy
>  wrote:
>
>
> Thank you for your hard work and dedication in releasing Apache Hive
> version 4.0.0.
>
> Congratulations to the entire team on this achievement. Keep up the great
> work!
>
> Is this considered a GA release?
>
> And it looks like we need to update the following location as well:
> https://hive.apache.org/general/downloads/
>
>
> From: Denys Kuzmenko 
> Date: Saturday, March 30, 2024 at 00:07
> To: user@hive.apache.org , d...@hive.apache.org <
> d...@hive.apache.org>
> Subject: [ANNOUNCE] Apache Hive 4.0.0 Released
>
> The Apache Hive team is proud to announce the release of Apache Hive
>
> version 4.0.0.
>
>
>
> The Apache Hive (TM) data warehouse software facilitates querying and
>
> managing large datasets residing in distributed storage. Built on top
>
> of Apache Hadoop (TM), it provides, among others:
>
>
>
> * Tools to enable easy data extract/transform/load (ETL)
>
>
>
> * A mechanism to impose structure on a variety of data formats
>
>
>
> * Access to files stored either directly in Apache HDFS (TM) or in other
>
> data storage systems such as Apache HBase (TM)
>
>
>
> * Query execution via Apache Hadoop MapReduce, Apache Tez and Apache Spark
> frameworks. (MapReduce is deprecated, and Spark has been removed so the
> text needs to be modified depending on the release version)
>
>
>
> For Hive release details and downloads, please visit:
>
> https://hive.apache.org/downloads.html
>
>
>
> Hive 4.0.0 Release Notes are available here:
>
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343343&styleName=Text&projectId=12310843
>
>
>
> We would like to thank the many contributors who made this release
>
> possible.
>
>
>
> Regards,
>
>
>
> The Apache Hive Team
>
>

-- 
--
Pau Tallada Crespí
Departament de Serveis
Port d'Informació Científica (PIC)
Tel: +34 93 170 2729
--


Re: [ANNOUNCE] Apache Hive 4.0.0 Released

2024-04-02 Thread Butao Zhang
I'm thrilled to see the official release of Apache Hive 4.0.0, marking another 
milestone in the development of the Hive community. I want to extend my 
gratitude to all the partners in the community for their hard work. 

Also special thanks to Denys for your diligent code reviews and efforts in 
completing the version release process, which I deeply admire.


Wishing the Apache Hive community continued growth and success. Keep up the 
great work!




Thanks,
Butao Zhang




 Replied Message 
| From | Stamatis Zampetakis |
| Date | 4/2/2024 16:39 |
| To |  |
| Cc | user@hive.apache.org |
| Subject | Re: [ANNOUNCE] Apache Hive 4.0.0 Released |
The new Apache Hive 4.0.0 release brings roughly 5K new commits (since
Apache Hive 3.1.3) and it's probably the biggest release so far in the
history of the project. The numbers clearly show that this is a
collective effort that wouldn't be possible without a strong community
and many volunteers along the years. Many thanks to everyone involved!

A special mention to Denys who went above and beyond his role of
release manager triaging release blockers, reviewing and fixing many
of those tickets that were blocking us for the past few months.

Best,
Stamatis

On Sun, Mar 31, 2024 at 2:54 PM Battula, Brahma Reddy
 wrote:

Thank you for your hard work and dedication in releasing Apache Hive version 
4.0.0.

Congratulations to the entire team on this achievement. Keep up the great work!

Is this considered a GA release?

And it looks like we need to update the following location as well:
https://hive.apache.org/general/downloads/


From: Denys Kuzmenko 
Date: Saturday, March 30, 2024 at 00:07
To: user@hive.apache.org , d...@hive.apache.org 

Subject: [ANNOUNCE] Apache Hive 4.0.0 Released

The Apache Hive team is proud to announce the release of Apache Hive

version 4.0.0.



The Apache Hive (TM) data warehouse software facilitates querying and

managing large datasets residing in distributed storage. Built on top

of Apache Hadoop (TM), it provides, among others:



* Tools to enable easy data extract/transform/load (ETL)



* A mechanism to impose structure on a variety of data formats



* Access to files stored either directly in Apache HDFS (TM) or in other

data storage systems such as Apache HBase (TM)



* Query execution via Apache Hadoop MapReduce, Apache Tez and Apache Spark 
frameworks. (MapReduce is deprecated, and Spark has been removed so the text 
needs to be modified depending on the release version)



For Hive release details and downloads, please visit:

https://hive.apache.org/downloads.html



Hive 4.0.0 Release Notes are available here:

https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343343&styleName=Text&projectId=12310843



We would like to thank the many contributors who made this release

possible.



Regards,



The Apache Hive Team


Re: [ANNOUNCE] Apache Hive 4.0.0 Released

2024-04-02 Thread Stamatis Zampetakis
The new Apache Hive 4.0.0 release brings roughly 5K new commits (since
Apache Hive 3.1.3) and it's probably the biggest release so far in the
history of the project. The numbers clearly show that this is a
collective effort that wouldn't be possible without a strong community
and many volunteers along the years. Many thanks to everyone involved!

A special mention to Denys who went above and beyond his role of
release manager triaging release blockers, reviewing and fixing many
of those tickets that were blocking us for the past few months.

Best,
Stamatis

On Sun, Mar 31, 2024 at 2:54 PM Battula, Brahma Reddy
 wrote:
>
> Thank you for your hard work and dedication in releasing Apache Hive version 
> 4.0.0.
>
> Congratulations to the entire team on this achievement. Keep up the great 
> work!
>
> Is this considered a GA release?
>
> And it looks like we need to update the following location as well:
> https://hive.apache.org/general/downloads/
>
>
> From: Denys Kuzmenko 
> Date: Saturday, March 30, 2024 at 00:07
> To: user@hive.apache.org , d...@hive.apache.org 
> 
> Subject: [ANNOUNCE] Apache Hive 4.0.0 Released
>
> The Apache Hive team is proud to announce the release of Apache Hive
>
> version 4.0.0.
>
>
>
> The Apache Hive (TM) data warehouse software facilitates querying and
>
> managing large datasets residing in distributed storage. Built on top
>
> of Apache Hadoop (TM), it provides, among others:
>
>
>
> * Tools to enable easy data extract/transform/load (ETL)
>
>
>
> * A mechanism to impose structure on a variety of data formats
>
>
>
> * Access to files stored either directly in Apache HDFS (TM) or in other
>
>   data storage systems such as Apache HBase (TM)
>
>
>
> * Query execution via Apache Hadoop MapReduce, Apache Tez and Apache Spark 
> frameworks. (MapReduce is deprecated, and Spark has been removed so the text 
> needs to be modified depending on the release version)
>
>
>
> For Hive release details and downloads, please visit:
>
> https://hive.apache.org/downloads.html
>
>
>
> Hive 4.0.0 Release Notes are available here:
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343343&styleName=Text&projectId=12310843
>
>
>
> We would like to thank the many contributors who made this release
>
> possible.
>
>
>
> Regards,
>
>
>
> The Apache Hive Team


Re: [ANNOUNCE] Apache Hive 4.0.0 Released

2024-03-31 Thread Battula, Brahma Reddy
Thank you for your hard work and dedication in releasing Apache Hive version 
4.0.0.

Congratulations to the entire team on this achievement. Keep up the great work!

Is this considered a GA release?

And it looks like we need to update the following location as well:
https://hive.apache.org/general/downloads/


From: Denys Kuzmenko 
Date: Saturday, March 30, 2024 at 00:07
To: user@hive.apache.org , d...@hive.apache.org 

Subject: [ANNOUNCE] Apache Hive 4.0.0 Released

The Apache Hive team is proud to announce the release of Apache Hive

version 4.0.0.



The Apache Hive (TM) data warehouse software facilitates querying and

managing large datasets residing in distributed storage. Built on top

of Apache Hadoop (TM), it provides, among others:



* Tools to enable easy data extract/transform/load (ETL)



* A mechanism to impose structure on a variety of data formats



* Access to files stored either directly in Apache HDFS (TM) or in other

  data storage systems such as Apache HBase (TM)



* Query execution via Apache Hadoop MapReduce, Apache Tez and Apache Spark 
frameworks. (MapReduce is deprecated, and Spark has been removed so the text 
needs to be modified depending on the release version)



For Hive release details and downloads, please visit:

https://hive.apache.org/downloads.html



Hive 4.0.0 Release Notes are available here:

https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343343&styleName=Text&projectId=12310843



We would like to thank the many contributors who made this release

possible.



Regards,



The Apache Hive Team


Re: CachedStore for hive.metastore.rawstore.impl in Hive 3.0

2024-03-01 Thread Takanobu Asanuma
I should have mentioned earlier, but we encountered our problem with
queries between Trino and Hive3 MetaStore.
The tests I reported were also querying Hive1/3 MetaStore using Trino. The
problem might only exist between Trino and Hive3 MetaStore.

- Takanobu

On Fri, Mar 1, 2024 at 14:53, Takanobu Asanuma wrote:

> Yes, for now, we believe that HIVE-14187 has caused performance
> degradation in Hive3 MetaStore.
>
> We also use HiveServer2, but our HiveServer2 directly accesses the backend
> DB without going through the Hive MetaStore, because it enhances
> performance to directly access the DB in a heavily loaded cluster.
> Therefore, we might not encounter the issue of HIVE-20600. We provide the
> Hive MetaStore only for Trino/SparkSQL/etc.
>
> - Takanobu
>
> On Fri, Mar 1, 2024 at 11:32, Sungwoo Park wrote:
>
>> Thank you for sharing the result. (Does your result imply that HIVE-14187
>> is introducing an intended bug?)
>>
>> Another issue that could be of your interest is the connection leak
>> problem reported in HIVE-20600. Do you see the connection leak problem, or
>> is it not relevant to your environment (e.g., because you don't use
>> HiveServer2)?
>>
>> --- Sungwoo
>>
>> On Fri, Mar 1, 2024 at 9:45 AM Takanobu Asanuma <
>> takanobu.asan...@gmail.com> wrote:
>>
>>> Hi Pau and Sungwoo,
>>>
>>> Thanks for sharing the information.
>>>
>>> We tested a set of simple queries which just referenced the Hive table
>>> and didn't execute any Hive jobs. The result is below.
>>>
>>> No. Version rawstore.impl connectionPoolingType HIVE-14187  QueryTime
>>> ---------------------------------------------------------------------
>>> 1   1.2.1   ObjectStore   None                  Not Applied 11:38
>>> 2   3.1.3   ObjectStore   None                  Applied     34:00
>>> 3   3.1.3   CachedStore   None                  Applied     25:00
>>> 4   3.1.3   ObjectStore   HikariCP              Applied     21:10
>>> 5   3.1.3   CachedStore   HikariCP              Applied     14:30
>>> 6   3.1.3   ObjectStore   None                  Reverted    13:00
>>> 7   3.1.3   ObjectStore   HikariCP              Reverted    11:23
>>> ---------------------------------------------------------------------
>>>
>>> Initially, we encountered an issue of Hive MetaStore slowness when we
>>> upgraded from environment No.1 to No.2. As shown in the table, environment
>>> No.2 showed the worst test results.
>>>
>>> A unique aspect of our environment is that we don't use connection
>>> pooling. After some investigation, we thought that the combination of
>>> HIVE-14187 and connectionPoolingType=None was negatively impacting
>>> performance.
>>> The fastest case in our tests was when we reverted HIVE-14187 and set
>>> connectionPoolingType=HikariCP (see No.7). Even with connectionPoolingType
>>> set to None, the environment where we reverted HIVE-14187 still performed
>>> reasonably well (see No.6).
>>>
>>> Please note our investigation is still ongoing and we haven't yet come
>>> to a conclusion.
>>>
>>> Regards,
>>> - Takanobu
>>>
>>> On Thu, Feb 29, 2024 at 12:18, Sungwoo Park wrote:
>>>
 We didn't make any other attempt to fix the problem and just decided
 not to use CachedStore. However, I think our installation of Metastore
 based on Hive 3.1.3 is running without any serious problems.

 Could you share how long it takes to compile typical queries in your
 environment (with Hive 1 and with Hive 3)?

 FYI, in our environment, sometimes it takes about 10 seconds to compile
 a query on TPC-DS 10TB datasets. Specifically, the average compilation time
 of 103 queries is 1.7 seconds (as reported by Hive), and the longest
 compilation time is 9.6 seconds (query 49). The compilation time includes
 the time for accessing Metastore.

 Thanks,

 --- Sungwoo


 On Wed, Feb 28, 2024 at 9:59 PM Takanobu Asanuma 
 wrote:

> Thanks for your detailed answer!
>
> In the original email, you reported "the query compilation takes long"
> in Hive 3.0, but has this issue been resolved in your fork of Hive 3.1.3?
> Thank you for sharing the issue with CachedStore and the JIRA tickets.
> I will also try out metastore.stats.fetch.bitvector=true.
>
> Regards,
> - Takanobu
>
> On Wed, Feb 28, 2024 at 18:49, Sungwoo Park wrote:
>
>> Hello Takanobu,
>>
>> We did not test with vanilla Hive 3.1.3 and Metastore databases can
>> be different, so I don't know why Metastore responses are very slow. I 
>> can
>> only share some results of testing CachedStore in Metastore. Please note
>> that we did not use vanilla Hive 3.1.3 and instead used our own fork of
>> Hive 3.1.3 (which applies many additional patches).
>>
>> 1.
>> When CachedStore is enabled, column stats are not computed. As a
>> result, some queries generate very inefficient plans because of
>> wrong/inaccurate stats.
>>
>> Perhaps this is because not all patches for CachedStore have been
>> merged to Hive 3.1.3. For example, these 

Re: CachedStore for hive.metastore.rawstore.impl in Hive 3.0

2024-02-29 Thread Takanobu Asanuma
Yes, for now, we believe that HIVE-14187 has caused performance degradation
in Hive3 MetaStore.

We also use HiveServer2, but our HiveServer2 directly accesses the backend
DB without going through the Hive MetaStore, because it enhances
performance to directly access the DB in a heavily loaded cluster.
Therefore, we might not encounter the issue of HIVE-20600. We provide the
Hive MetaStore only for Trino/SparkSQL/etc.

- Takanobu

On Fri, Mar 1, 2024 at 11:32, Sungwoo Park wrote:

> Thank you for sharing the result. (Does your result imply that HIVE-14187
> is introducing an intended bug?)
>
> Another issue that could be of your interest is the connection leak
> problem reported in HIVE-20600. Do you see the connection leak problem, or
> is it not relevant to your environment (e.g., because you don't use
> HiveServer2)?
>
> --- Sungwoo
>
> On Fri, Mar 1, 2024 at 9:45 AM Takanobu Asanuma <
> takanobu.asan...@gmail.com> wrote:
>
>> Hi Pau and Sungwoo,
>>
>> Thanks for sharing the information.
>>
>> We tested a set of simple queries which just referenced the Hive table
>> and didn't execute any Hive jobs. The result is below.
>>
>> No. Version rawstore.impl connectionPoolingType HIVE-14187  QueryTime
>> ---------------------------------------------------------------------
>> 1   1.2.1   ObjectStore   None                  Not Applied 11:38
>> 2   3.1.3   ObjectStore   None                  Applied     34:00
>> 3   3.1.3   CachedStore   None                  Applied     25:00
>> 4   3.1.3   ObjectStore   HikariCP              Applied     21:10
>> 5   3.1.3   CachedStore   HikariCP              Applied     14:30
>> 6   3.1.3   ObjectStore   None                  Reverted    13:00
>> 7   3.1.3   ObjectStore   HikariCP              Reverted    11:23
>> ---------------------------------------------------------------------
>>
>> Initially, we encountered an issue of Hive MetaStore slowness when we
>> upgraded from environment No.1 to No.2. As shown in the table, environment
>> No.2 showed the worst test results.
>>
>> A unique aspect of our environment is that we don't use connection
>> pooling. After some investigation, we thought that the combination of
>> HIVE-14187 and connectionPoolingType=None was negatively impacting
>> performance.
>> The fastest case in our tests was when we reverted HIVE-14187 and set
>> connectionPoolingType=HikariCP (see No.7). Even with connectionPoolingType
>> set to None, the environment where we reverted HIVE-14187 still performed
>> reasonably well (see No.6).
>>
>> Please note our investigation is still ongoing and we haven't yet come to
>> a conclusion.
>>
>> Regards,
>> - Takanobu
>>
>> On Thu, Feb 29, 2024 at 12:18, Sungwoo Park wrote:
>>
>>> We didn't make any other attempt to fix the problem and just decided not
>>> to use CachedStore. However, I think our installation of Metastore based on
>>> Hive 3.1.3 is running without any serious problems.
>>>
>>> Could you share how long it takes to compile typical queries in your
>>> environment (with Hive 1 and with Hive 3)?
>>>
>>> FYI, in our environment, sometimes it takes about 10 seconds to compile
>>> a query on TPC-DS 10TB datasets. Specifically, the average compilation time
>>> of 103 queries is 1.7 seconds (as reported by Hive), and the longest
>>> compilation time is 9.6 seconds (query 49). The compilation time includes
>>> the time for accessing Metastore.
>>>
>>> Thanks,
>>>
>>> --- Sungwoo
>>>
>>>
>>> On Wed, Feb 28, 2024 at 9:59 PM Takanobu Asanuma 
>>> wrote:
>>>
 Thanks for your detailed answer!

 In the original email, you reported "the query compilation takes long"
 in Hive 3.0, but has this issue been resolved in your fork of Hive 3.1.3?
 Thank you for sharing the issue with CachedStore and the JIRA tickets.
 I will also try out metastore.stats.fetch.bitvector=true.

 Regards,
 - Takanobu

 On Wed, Feb 28, 2024 at 18:49, Sungwoo Park wrote:

> Hello Takanobu,
>
> We did not test with vanilla Hive 3.1.3 and Metastore databases can be
> different, so I don't know why Metastore responses are very slow. I can
> only share some results of testing CachedStore in Metastore. Please note
> that we did not use vanilla Hive 3.1.3 and instead used our own fork of
> Hive 3.1.3 (which applies many additional patches).
>
> 1.
> When CachedStore is enabled, column stats are not computed. As a
> result, some queries generate very inefficient plans because of
> wrong/inaccurate stats.
>
> Perhaps this is because not all patches for CachedStore have been
> merged to Hive 3.1.3. For example, these patches are not merged. Or, there
> might be some way to properly configure CachedStore so that it correctly
> computes column stats.
>
> HIVE-20896: CachedStore fail to cache stats in multiple code paths
> HIVE-21063: Support statistics in cachedStore for transactional table
> HIVE-24258: Data mismatch between CachedStore and ObjectStore for
> constraint
>
> So, we decided that CachedStore 

Re: CachedStore for hive.metastore.rawstore.impl in Hive 3.0

2024-02-29 Thread Sungwoo Park
Thank you for sharing the result. (Does your result imply that HIVE-14187
is introducing an intended bug?)

Another issue that could be of your interest is the connection leak problem
reported in HIVE-20600. Do you see the connection leak problem, or is it
not relevant to your environment (e.g., because you don't use HiveServer2)?

--- Sungwoo

On Fri, Mar 1, 2024 at 9:45 AM Takanobu Asanuma 
wrote:

> Hi Pau and Sungwoo,
>
> Thanks for sharing the information.
>
> We tested a set of simple queries which just referenced the Hive table and
> didn't execute any Hive jobs. The result is below.
>
> No. Version rawstore.impl connectionPoolingType HIVE-14187  QueryTime
> ---------------------------------------------------------------------
> 1   1.2.1   ObjectStore   None                  Not Applied 11:38
> 2   3.1.3   ObjectStore   None                  Applied     34:00
> 3   3.1.3   CachedStore   None                  Applied     25:00
> 4   3.1.3   ObjectStore   HikariCP              Applied     21:10
> 5   3.1.3   CachedStore   HikariCP              Applied     14:30
> 6   3.1.3   ObjectStore   None                  Reverted    13:00
> 7   3.1.3   ObjectStore   HikariCP              Reverted    11:23
> ---------------------------------------------------------------------
>
> Initially, we encountered an issue of Hive MetaStore slowness when we
> upgraded from environment No.1 to No.2. As shown in the table, environment
> No.2 showed the worst test results.
>
> A unique aspect of our environment is that we don't use connection
> pooling. After some investigation, we thought that the combination of
> HIVE-14187 and connectionPoolingType=None was negatively impacting
> performance.
> The fastest case in our tests was when we reverted HIVE-14187 and set
> connectionPoolingType=HikariCP (see No.7). Even with connectionPoolingType
> set to None, the environment where we reverted HIVE-14187 still performed
> reasonably well (see No.6).
>
> Please note our investigation is still ongoing and we haven't yet come to
> a conclusion.
>
> Regards,
> - Takanobu
>
> On Thu, Feb 29, 2024 at 12:18, Sungwoo Park wrote:
>
>> We didn't make any other attempt to fix the problem and just decided not
>> to use CachedStore. However, I think our installation of Metastore based on
>> Hive 3.1.3 is running without any serious problems.
>>
>> Could you share how long it takes to compile typical queries in your
>> environment (with Hive 1 and with Hive 3)?
>>
>> FYI, in our environment, sometimes it takes about 10 seconds to compile a
>> query on TPC-DS 10TB datasets. Specifically, the average compilation time
>> of 103 queries is 1.7 seconds (as reported by Hive), and the longest
>> compilation time is 9.6 seconds (query 49). The compilation time includes
>> the time for accessing Metastore.
>>
>> Thanks,
>>
>> --- Sungwoo
>>
>>
>> On Wed, Feb 28, 2024 at 9:59 PM Takanobu Asanuma 
>> wrote:
>>
>>> Thanks for your detailed answer!
>>>
>>> In the original email, you reported "the query compilation takes long"
>>> in Hive 3.0, but has this issue been resolved in your fork of Hive 3.1.3?
>>> Thank you for sharing the issue with CachedStore and the JIRA tickets.
>>> I will also try out metastore.stats.fetch.bitvector=true.
>>>
>>> Regards,
>>> - Takanobu
>>>
>>> On Wed, Feb 28, 2024 at 18:49, Sungwoo Park wrote:
>>>
 Hello Takanobu,

 We did not test with vanilla Hive 3.1.3 and Metastore databases can be
 different, so I don't know why Metastore responses are very slow. I can
 only share some results of testing CachedStore in Metastore. Please note
 that we did not use vanilla Hive 3.1.3 and instead used our own fork of
 Hive 3.1.3 (which applies many additional patches).

 1.
 When CachedStore is enabled, column stats are not computed. As a
 result, some queries generate very inefficient plans because of
 wrong/inaccurate stats.

 Perhaps this is because not all patches for CachedStore have been
 merged to Hive 3.1.3. For example, these patches are not merged. Or, there
 might be some way to properly configure CachedStore so that it correctly
 computes column stats.

 HIVE-20896: CachedStore fail to cache stats in multiple code paths
 HIVE-21063: Support statistics in cachedStore for transactional table
 HIVE-24258: Data mismatch between CachedStore and ObjectStore for
 constraint

 So, we decided that CachedStore should not be enabled in Hive 3.1.3.

 (If anyone is running Hive Metastore 3.1.3 in production with
 CachedStore enabled, please let us know how you configure it.)

 2.
 Setting metastore.stats.fetch.bitvector=true can also help generate
 more efficient query plans.

 --- Sungwoo


 On Wed, Feb 28, 2024 at 1:40 PM Takanobu Asanuma 
 wrote:

> Hi Sungwoo Park,
>
> I'm sorry for the late reply to this old email.
> We are attempting to upgrade Hive MetaStore from Hive1 to Hive3, and
> noticed that the response of the Hive3 MetaStore is 

Re: CachedStore for hive.metastore.rawstore.impl in Hive 3.0

2024-02-29 Thread Takanobu Asanuma
Hi Pau and Sungwoo,

Thanks for sharing the information.

We tested a set of simple queries which just referenced the Hive table and
didn't execute any Hive jobs. The result is below.

No. Version rawstore.impl connectionPoolingType HIVE-14187  QueryTime
---------------------------------------------------------------------
1   1.2.1   ObjectStore   None                  Not Applied 11:38
2   3.1.3   ObjectStore   None                  Applied     34:00
3   3.1.3   CachedStore   None                  Applied     25:00
4   3.1.3   ObjectStore   HikariCP              Applied     21:10
5   3.1.3   CachedStore   HikariCP              Applied     14:30
6   3.1.3   ObjectStore   None                  Reverted    13:00
7   3.1.3   ObjectStore   HikariCP              Reverted    11:23
---------------------------------------------------------------------

Initially, we encountered an issue of Hive MetaStore slowness when we
upgraded from environment No.1 to No.2. As shown in the table, environment
No.2 showed the worst test results.

A unique aspect of our environment is that we don't use connection pooling.
After some investigation, we thought that the combination of HIVE-14187 and
connectionPoolingType=None was negatively impacting performance.
The fastest case in our tests was when we reverted HIVE-14187 and set
connectionPoolingType=HikariCP (see No.7). Even with connectionPoolingType
set to None, the environment where we reverted HIVE-14187 still performed
reasonably well (see No.6).

Please note our investigation is still ongoing and we haven't yet come to a
conclusion.
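For anyone wanting to reproduce the pooled vs. unpooled configurations compared above: to my understanding, the pooling mode is selected through the DataNucleus property below in the metastore's hive-site.xml. This is a sketch only; verify the exact property name and accepted values against your Hive version.

```xml
<!-- Sketch: connection-pool setting for the metastore backend DB.
     Commonly accepted values include HikariCP, BoneCP, DBCP, and None. -->
<property>
  <name>datanucleus.connectionPoolingType</name>
  <!-- "None" disables pooling; HikariCP was the faster setting in our tests -->
  <value>HikariCP</value>
</property>
```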

Regards,
- Takanobu

On Thu, Feb 29, 2024 at 12:18, Sungwoo Park wrote:

> We didn't make any other attempt to fix the problem and just decided not
> to use CachedStore. However, I think our installation of Metastore based on
> Hive 3.1.3 is running without any serious problems.
>
> Could you share how long it takes to compile typical queries in your
> environment (with Hive 1 and with Hive 3)?
>
> FYI, in our environment, sometimes it takes about 10 seconds to compile a
> query on TPC-DS 10TB datasets. Specifically, the average compilation time
> of 103 queries is 1.7 seconds (as reported by Hive), and the longest
> compilation time is 9.6 seconds (query 49). The compilation time includes
> the time for accessing Metastore.
>
> Thanks,
>
> --- Sungwoo
>
>
> On Wed, Feb 28, 2024 at 9:59 PM Takanobu Asanuma 
> wrote:
>
>> Thanks for your detailed answer!
>>
>> In the original email, you reported "the query compilation takes long" in
>> Hive 3.0, but has this issue been resolved in your fork of Hive 3.1.3?
>> Thank you for sharing the issue with CachedStore and the JIRA tickets.
>> I will also try out metastore.stats.fetch.bitvector=true.
>>
>> Regards,
>> - Takanobu
>>
>> On Wed, Feb 28, 2024 at 18:49, Sungwoo Park wrote:
>>
>>> Hello Takanobu,
>>>
>>> We did not test with vanilla Hive 3.1.3 and Metastore databases can be
>>> different, so I don't know why Metastore responses are very slow. I can
>>> only share some results of testing CachedStore in Metastore. Please note
>>> that we did not use vanilla Hive 3.1.3 and instead used our own fork of
>>> Hive 3.1.3 (which applies many additional patches).
>>>
>>> 1.
>>> When CachedStore is enabled, column stats are not computed. As a result,
>>> some queries generate very inefficient plans because of wrong/inaccurate
>>> stats.
>>>
>>> Perhaps this is because not all patches for CachedStore have been merged
>>> to Hive 3.1.3. For example, these patches are not merged. Or, there might
>>> be some way to properly configure CachedStore so that it correctly computes
>>> column stats.
>>>
>>> HIVE-20896: CachedStore fail to cache stats in multiple code paths
>>> HIVE-21063: Support statistics in cachedStore for transactional table
>>> HIVE-24258: Data mismatch between CachedStore and ObjectStore for
>>> constraint
>>>
>>> So, we decided that CachedStore should not be enabled in Hive 3.1.3.
>>>
>>> (If anyone is running Hive Metastore 3.1.3 in production with
>>> CachedStore enabled, please let us know how you configure it.)
>>>
>>> 2.
>>> Setting metastore.stats.fetch.bitvector=true can also help generate more
>>> efficient query plans.
>>>
>>> --- Sungwoo
>>>
>>>
>>> On Wed, Feb 28, 2024 at 1:40 PM Takanobu Asanuma 
>>> wrote:
>>>
 Hi Sungwoo Park,

 I'm sorry for the late reply to this old email.
 We are attempting to upgrade Hive MetaStore from Hive1 to Hive3, and
 noticed that the response of the Hive3 MetaStore is very slow.
 We suspect that HIVE-14187 might be causing this slowness.
 Could you tell me if you have resolved this problem? Are there still
 any problems when you enable CachedStore?

 Regards,
 - Takanobu

 On Wed, Jun 13, 2018 at 0:37, Sungwoo Park wrote:

> Hello Hive users,
>
> I am experience a problem with MetaStore in Hive 3.0.
>
> 1. Start MetaStore
> with 
> hive.metastore.rawstore.impl=org.apache.hadoop.hive.metastore.ObjectStore.
>
> 2. Generate TPC-DS data.
>
> 3. 

Re: CachedStore for hive.metastore.rawstore.impl in Hive 3.0

2024-02-28 Thread Sungwoo Park
We didn't make any other attempt to fix the problem and just decided not to
use CachedStore. However, I think our installation of Metastore based on
Hive 3.1.3 is running without any serious problems.
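For context, the raw-store implementation discussed in this thread is chosen by a single metastore property; here is a minimal sketch using the two class names mentioned in the thread (verify against your deployment before relying on it).

```xml
<!-- Sketch: selects the metastore RawStore implementation. -->
<property>
  <name>hive.metastore.rawstore.impl</name>
  <!-- Plain ObjectStore is what we settled on; to test the cache instead, use
       org.apache.hadoop.hive.metastore.cache.CachedStore -->
  <value>org.apache.hadoop.hive.metastore.ObjectStore</value>
</property>
```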

Could you share how long it takes to compile typical queries in your
environment (with Hive 1 and with Hive 3)?

FYI, in our environment, sometimes it takes about 10 seconds to compile a
query on TPC-DS 10TB datasets. Specifically, the average compilation time
of 103 queries is 1.7 seconds (as reported by Hive), and the longest
compilation time is 9.6 seconds (query 49). The compilation time includes
the time for accessing Metastore.
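If it helps others collect comparable numbers, here is a hedged Python sketch that aggregates compile times from HiveServer2 logs. The "Completed compiling command(...); Time taken: N seconds" line shape is an assumption about the log layout and may need adjusting for your version; the sample lines are hypothetical.

```python
import re

# Assumed HiveServer2 log line shape -- adjust the pattern to your log layout.
COMPILE_RE = re.compile(
    r"Completed compiling command\(queryId=[^)]*\); Time taken: ([0-9.]+) seconds"
)

def compile_time_stats(log_lines):
    """Return (query count, average compile time, worst compile time)."""
    times = [float(m.group(1))
             for line in log_lines
             if (m := COMPILE_RE.search(line))]
    if not times:
        return 0, 0.0, 0.0
    return len(times), sum(times) / len(times), max(times)

# Hypothetical sample lines, mimicking the assumed format above.
sample = [
    "INFO ql.Driver: Completed compiling command(queryId=q1); Time taken: 1.7 seconds",
    "INFO ql.Driver: Completed compiling command(queryId=q2); Time taken: 9.6 seconds",
    "INFO ql.Driver: Executing command(queryId=q1)",  # ignored: not a compile line
]
count, avg, worst = compile_time_stats(sample)
print(count, round(avg, 2), worst)  # -> 2 5.65 9.6
```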

Thanks,

--- Sungwoo


On Wed, Feb 28, 2024 at 9:59 PM Takanobu Asanuma 
wrote:

> Thanks for your detailed answer!
>
> In the original email, you reported "the query compilation takes long" in
> Hive 3.0, but has this issue been resolved in your fork of Hive 3.1.3?
> Thank you for sharing the issue with CachedStore and the JIRA tickets.
> I will also try out metastore.stats.fetch.bitvector=true.
>
> Regards,
> - Takanobu
>
> On Wed, Feb 28, 2024 at 18:49, Sungwoo Park wrote:
>
>> Hello Takanobu,
>>
>> We did not test with vanilla Hive 3.1.3 and Metastore databases can be
>> different, so I don't know why Metastore responses are very slow. I can
>> only share some results of testing CachedStore in Metastore. Please note
>> that we did not use vanilla Hive 3.1.3 and instead used our own fork of
>> Hive 3.1.3 (which applies many additional patches).
>>
>> 1.
>> When CachedStore is enabled, column stats are not computed. As a result,
>> some queries generate very inefficient plans because of wrong/inaccurate
>> stats.
>>
>> Perhaps this is because not all patches for CachedStore have been merged
>> to Hive 3.1.3. For example, these patches are not merged. Or, there might
>> be some way to properly configure CachedStore so that it correctly computes
>> column stats.
>>
>> HIVE-20896: CachedStore fail to cache stats in multiple code paths
>> HIVE-21063: Support statistics in cachedStore for transactional table
>> HIVE-24258: Data mismatch between CachedStore and ObjectStore for
>> constraint
>>
>> So, we decided that CachedStore should not be enabled in Hive 3.1.3.
>>
>> (If anyone is running Hive Metastore 3.1.3 in production with CachedStore
>> enabled, please let us know how you configure it.)
>>
>> 2.
>> Setting metastore.stats.fetch.bitvector=true can also help generate more
>> efficient query plans.
>>
>> --- Sungwoo
>>
>>
>> On Wed, Feb 28, 2024 at 1:40 PM Takanobu Asanuma 
>> wrote:
>>
>>> Hi Sungwoo Park,
>>>
>>> I'm sorry for the late reply to this old email.
>>> We are attempting to upgrade Hive MetaStore from Hive1 to Hive3, and
>>> noticed that the response of the Hive3 MetaStore is very slow.
>>> We suspect that HIVE-14187 might be causing this slowness.
>>> Could you tell me if you have resolved this problem? Are there still any
>>> problems when you enable CachedStore?
>>>
>>> Regards,
>>> - Takanobu
>>>
>>> On Wed, Jun 13, 2018 at 0:37, Sungwoo Park wrote:
>>>
 Hello Hive users,

 I am experiencing a problem with MetaStore in Hive 3.0.

 1. Start MetaStore
 with 
 hive.metastore.rawstore.impl=org.apache.hadoop.hive.metastore.ObjectStore.

 2. Generate TPC-DS data.

 3. TPC-DS queries run okay and produce correct results. E.g., from
 query 1:
 +---+
 |   c_customer_id   |
 +---+
 | CHAA  |
 | DCAA  |
 | DDAA  |
 ...
 | AAAILIAA  |
 +---+
 100 rows selected (69.901 seconds)

 However, the query compilation takes long (
 https://issues.apache.org/jira/browse/HIVE-16520).

 4. Now, restart MetaStore with
 hive.metastore.rawstore.impl=org.apache.hadoop.hive.metastore.cache.CachedStore.

 5. TPC-DS queries run okay, but produce wrong results. E.g., from query
 1:
 ++
 | c_customer_id  |
 ++
 ++
 No rows selected (37.448 seconds)

 What I noticed is that with hive.metastore.rawstore.impl=CachedStore,
 HiveServer2 produces log messages such as:

 2018-06-12T23:50:04,223  WARN [b3041385-0290-492f-aef8-c0249de328ad
 HiveServer2-Handler-Pool: Thread-59] calcite.RelOptHiveTable: No Stats for
 tpcds_bin_partitioned_orc_1000@date_dim, Columns: d_date_sk, d_year
 2018-06-12T23:50:04,223  INFO [b3041385-0290-492f-aef8-c0249de328ad
 HiveServer2-Handler-Pool: Thread-59] SessionState: No Stats for
 tpcds_bin_partitioned_orc_1000@date_dim, Columns: d_date_sk, d_year
 2018-06-12T23:50:04,225  WARN [b3041385-0290-492f-aef8-c0249de328ad
 HiveServer2-Handler-Pool: Thread-59] calcite.RelOptHiveTable: No Stats for
 tpcds_bin_partitioned_orc_1000@store, Columns: s_state, s_store_sk
 2018-06-12T23:50:04,225  INFO [b3041385-0290-492f-aef8-c0249de328ad
 HiveServer2-Handler-Pool: Thread-59] 

Re: CachedStore for hive.metastore.rawstore.impl in Hive 3.0

2024-02-28 Thread Pau Tallada
Hi,

We also had to disable CachedStore as it was producing wrong results in our
queries.
I'm sorry I cannot provide more detailed info.

Cheers,

Pau.

Message from Takanobu Asanuma on Wed, Feb 28, 2024 at 13:59:

> Thanks for your detailed answer!
>
> In the original email, you reported "the query compilation takes long" in
> Hive 3.0, but has this issue been resolved in your fork of Hive 3.1.3?
> Thank you for sharing the issue with CachedStore and the JIRA tickets.
> I will also try out metastore.stats.fetch.bitvector=true.
>
> Regards,
> - Takanobu
>
> On Wed, Feb 28, 2024 at 18:49, Sungwoo Park wrote:
>
>> Hello Takanobu,
>>
>> We did not test with vanilla Hive 3.1.3 and Metastore databases can be
>> different, so I don't know why Metastore responses are very slow. I can
>> only share some results of testing CachedStore in Metastore. Please note
>> that we did not use vanilla Hive 3.1.3 and instead used our own fork of
>> Hive 3.1.3 (which applies many additional patches).
>>
>> 1.
>> When CachedStore is enabled, column stats are not computed. As a result,
>> some queries generate very inefficient plans because of wrong/inaccurate
>> stats.
>>
>> Perhaps this is because not all patches for CachedStore have been merged
>> to Hive 3.1.3. For example, these patches are not merged. Or, there might
>> be some way to properly configure CachedStore so that it correctly computes
>> column stats.
>>
>> HIVE-20896: CachedStore fail to cache stats in multiple code paths
>> HIVE-21063: Support statistics in cachedStore for transactional table
>> HIVE-24258: Data mismatch between CachedStore and ObjectStore for
>> constraint
>>
>> So, we decided that CachedStore should not be enabled in Hive 3.1.3.
>>
>> (If anyone is running Hive Metastore 3.1.3 in production with CachedStore
>> enabled, please let us know how you configure it.)
>>
>> 2.
>> Setting metastore.stats.fetch.bitvector=true can also help generate more
>> efficient query plans.
>>
>> --- Sungwoo
>>
>>
>> On Wed, Feb 28, 2024 at 1:40 PM Takanobu Asanuma 
>> wrote:
>>
>>> Hi Sungwoo Park,
>>>
>>> I'm sorry for the late reply to this old email.
>>> We are attempting to upgrade Hive MetaStore from Hive1 to Hive3, and
>>> noticed that the response of the Hive3 MetaStore is very slow.
>>> We suspect that HIVE-14187 might be causing this slowness.
>>> Could you tell me if you have resolved this problem? Are there still any
>>> problems when you enable CachedStore?
>>>
>>> Regards,
>>> - Takanobu
>>>
>>> On Wed, Jun 13, 2018 at 0:37, Sungwoo Park wrote:
>>>
 Hello Hive users,

 I am experiencing a problem with MetaStore in Hive 3.0.

 1. Start MetaStore
 with 
 hive.metastore.rawstore.impl=org.apache.hadoop.hive.metastore.ObjectStore.

 2. Generate TPC-DS data.

 3. TPC-DS queries run okay and produce correct results. E.g., from
 query 1:
 +---+
 |   c_customer_id   |
 +---+
 | CHAA  |
 | DCAA  |
 | DDAA  |
 ...
 | AAAILIAA  |
 +---+
 100 rows selected (69.901 seconds)

 However, the query compilation takes long (
 https://issues.apache.org/jira/browse/HIVE-16520).

 4. Now, restart MetaStore with
 hive.metastore.rawstore.impl=org.apache.hadoop.hive.metastore.cache.CachedStore.

 5. TPC-DS queries run okay, but produce wrong results. E.g., from query
 1:
 ++
 | c_customer_id  |
 ++
 ++
 No rows selected (37.448 seconds)

 What I noticed is that with hive.metastore.rawstore.impl=CachedStore,
 HiveServer2 produces log messages such as:

 2018-06-12T23:50:04,223  WARN [b3041385-0290-492f-aef8-c0249de328ad
 HiveServer2-Handler-Pool: Thread-59] calcite.RelOptHiveTable: No Stats for
 tpcds_bin_partitioned_orc_1000@date_dim, Columns: d_date_sk, d_year
 2018-06-12T23:50:04,223  INFO [b3041385-0290-492f-aef8-c0249de328ad
 HiveServer2-Handler-Pool: Thread-59] SessionState: No Stats for
 tpcds_bin_partitioned_orc_1000@date_dim, Columns: d_date_sk, d_year
 2018-06-12T23:50:04,225  WARN [b3041385-0290-492f-aef8-c0249de328ad
 HiveServer2-Handler-Pool: Thread-59] calcite.RelOptHiveTable: No Stats for
 tpcds_bin_partitioned_orc_1000@store, Columns: s_state, s_store_sk
 2018-06-12T23:50:04,225  INFO [b3041385-0290-492f-aef8-c0249de328ad
 HiveServer2-Handler-Pool: Thread-59] SessionState: No Stats for
 tpcds_bin_partitioned_orc_1000@store, Columns: s_state, s_store_sk
 2018-06-12T23:50:04,226  WARN [b3041385-0290-492f-aef8-c0249de328ad
 HiveServer2-Handler-Pool: Thread-59] calcite.RelOptHiveTable: No Stats for
 tpcds_bin_partitioned_orc_1000@customer, Columns: c_customer_sk,
 c_customer_id
 2018-06-12T23:50:04,226  INFO [b3041385-0290-492f-aef8-c0249de328ad
 HiveServer2-Handler-Pool: Thread-59] SessionState: No Stats for
 

Re: CachedStore for hive.metastore.rawstore.impl in Hive 3.0

2024-02-28 Thread Takanobu Asanuma
Thanks for your detailed answer!

In the original email, you reported "the query compilation takes long" in
Hive 3.0, but has this issue been resolved in your fork of Hive 3.1.3?
Thank you for sharing the issue with CachedStore and the JIRA tickets.
I will also try out metastore.stats.fetch.bitvector=true.

Regards,
- Takanobu

On Wed, Feb 28, 2024 at 18:49, Sungwoo Park wrote:

> Hello Takanobu,
>
> We did not test with vanilla Hive 3.1.3 and Metastore databases can be
> different, so I don't know why Metastore responses are very slow. I can
> only share some results of testing CachedStore in Metastore. Please note
> that we did not use vanilla Hive 3.1.3 and instead used our own fork of
> Hive 3.1.3 (which applies many additional patches).
>
> 1.
> When CachedStore is enabled, column stats are not computed. As a result,
> some queries generate very inefficient plans because of wrong/inaccurate
> stats.
>
> Perhaps this is because not all patches for CachedStore have been merged
> to Hive 3.1.3. For example, these patches are not merged. Or, there might
> be some way to properly configure CachedStore so that it correctly computes
> column stats.
>
> HIVE-20896: CachedStore fail to cache stats in multiple code paths
> HIVE-21063: Support statistics in cachedStore for transactional table
> HIVE-24258: Data mismatch between CachedStore and ObjectStore for
> constraint
>
> So, we decided that CachedStore should not be enabled in Hive 3.1.3.
>
> (If anyone is running Hive Metastore 3.1.3 in production with CachedStore
> enabled, please let us know how you configure it.)
>
> 2.
> Setting metastore.stats.fetch.bitvector=true can also help generate more
> efficient query plans.
>
> --- Sungwoo
>
>
> On Wed, Feb 28, 2024 at 1:40 PM Takanobu Asanuma 
> wrote:
>
>> Hi Sungwoo Park,
>>
>> I'm sorry for the late reply to this old email.
>> We are attempting to upgrade Hive MetaStore from Hive1 to Hive3, and
>> noticed that the response of the Hive3 MetaStore is very slow.
>> We suspect that HIVE-14187 might be causing this slowness.
>> Could you tell me if you have resolved this problem? Are there still any
>> problems when you enable CachedStore?
>>
>> Regards,
>> - Takanobu
>>
>> On Wed, Jun 13, 2018 at 0:37, Sungwoo Park wrote:
>>
>>> Hello Hive users,
>>>
>>> I am experiencing a problem with MetaStore in Hive 3.0.
>>>
>>> 1. Start MetaStore
>>> with 
>>> hive.metastore.rawstore.impl=org.apache.hadoop.hive.metastore.ObjectStore.
>>>
>>> 2. Generate TPC-DS data.
>>>
>>> 3. TPC-DS queries run okay and produce correct results. E.g., from query
>>> 1:
>>> +---+
>>> |   c_customer_id   |
>>> +---+
>>> | CHAA  |
>>> | DCAA  |
>>> | DDAA  |
>>> ...
>>> | AAAILIAA  |
>>> +---+
>>> 100 rows selected (69.901 seconds)
>>>
>>> However, the query compilation takes long (
>>> https://issues.apache.org/jira/browse/HIVE-16520).
>>>
>>> 4. Now, restart MetaStore with
>>> hive.metastore.rawstore.impl=org.apache.hadoop.hive.metastore.cache.CachedStore.
>>>
>>> 5. TPC-DS queries run okay, but produce wrong results. E.g., from query 1:
>>> ++
>>> | c_customer_id  |
>>> ++
>>> ++
>>> No rows selected (37.448 seconds)
>>>
>>> What I noticed is that with hive.metastore.rawstore.impl=CachedStore,
>>> HiveServer2 produces log messages such as:
>>>
>>> 2018-06-12T23:50:04,223  WARN [b3041385-0290-492f-aef8-c0249de328ad
>>> HiveServer2-Handler-Pool: Thread-59] calcite.RelOptHiveTable: No Stats for
>>> tpcds_bin_partitioned_orc_1000@date_dim, Columns: d_date_sk, d_year
>>> 2018-06-12T23:50:04,223  INFO [b3041385-0290-492f-aef8-c0249de328ad
>>> HiveServer2-Handler-Pool: Thread-59] SessionState: No Stats for
>>> tpcds_bin_partitioned_orc_1000@date_dim, Columns: d_date_sk, d_year
>>> 2018-06-12T23:50:04,225  WARN [b3041385-0290-492f-aef8-c0249de328ad
>>> HiveServer2-Handler-Pool: Thread-59] calcite.RelOptHiveTable: No Stats for
>>> tpcds_bin_partitioned_orc_1000@store, Columns: s_state, s_store_sk
>>> 2018-06-12T23:50:04,225  INFO [b3041385-0290-492f-aef8-c0249de328ad
>>> HiveServer2-Handler-Pool: Thread-59] SessionState: No Stats for
>>> tpcds_bin_partitioned_orc_1000@store, Columns: s_state, s_store_sk
>>> 2018-06-12T23:50:04,226  WARN [b3041385-0290-492f-aef8-c0249de328ad
>>> HiveServer2-Handler-Pool: Thread-59] calcite.RelOptHiveTable: No Stats for
>>> tpcds_bin_partitioned_orc_1000@customer, Columns: c_customer_sk,
>>> c_customer_id
>>> 2018-06-12T23:50:04,226  INFO [b3041385-0290-492f-aef8-c0249de328ad
>>> HiveServer2-Handler-Pool: Thread-59] SessionState: No Stats for
>>> tpcds_bin_partitioned_orc_1000@customer, Columns: c_customer_sk,
>>> c_customer_id
>>>
>>> 2018-06-12T23:50:05,158 ERROR [b3041385-0290-492f-aef8-c0249de328ad
>>> HiveServer2-Handler-Pool: Thread-59] annotation.StatsRulesProcFactory:
>>> Invalid column stats: No of nulls > cardinality
>>> 2018-06-12T23:50:05,159 ERROR [b3041385-0290-492f-aef8-c0249de328ad
>>> 

Re: CachedStore for hive.metastore.rawstore.impl in Hive 3.0

2024-02-28 Thread Sungwoo Park
Hello Takanobu,

We did not test with vanilla Hive 3.1.3 and Metastore databases can be
different, so I don't know why Metastore responses are very slow. I can
only share some results of testing CachedStore in Metastore. Please note
that we did not use vanilla Hive 3.1.3 and instead used our own fork of
Hive 3.1.3 (which applies many additional patches).

1.
When CachedStore is enabled, column stats are not computed. As a result,
some queries generate very inefficient plans because of wrong/inaccurate
stats.

Perhaps this is because not all patches for CachedStore have been merged to
Hive 3.1.3. For example, these patches are not merged. Or, there might be
some way to properly configure CachedStore so that it correctly computes
column stats.

HIVE-20896: CachedStore fail to cache stats in multiple code paths
HIVE-21063: Support statistics in cachedStore for transactional table
HIVE-24258: Data mismatch between CachedStore and ObjectStore for constraint

So, we decided that CachedStore should not be enabled in Hive 3.1.3.

(If anyone is running Hive Metastore 3.1.3 in production with CachedStore
enabled, please let us know how you configure it.)
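
(For clarity, keeping CachedStore disabled just means leaving the raw store
at the setting quoted at the start of this thread. A minimal sketch of that
metastore-side property, assuming standard hive-site.xml syntax:

```xml
<property>
  <name>hive.metastore.rawstore.impl</name>
  <value>org.apache.hadoop.hive.metastore.ObjectStore</value>
</property>
```
)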

2.
Setting metastore.stats.fetch.bitvector=true can also help generate more
efficient query plans.
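
As an illustration only, the bitvector setting would look like this in
property form; whether it belongs in hive-site.xml or metastore-site.xml
depends on the deployment, so treat this as a sketch rather than a tested
configuration:

```xml
<property>
  <name>metastore.stats.fetch.bitvector</name>
  <value>true</value>
</property>
```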

--- Sungwoo


On Wed, Feb 28, 2024 at 1:40 PM Takanobu Asanuma 
wrote:

> Hi Sungwoo Park,
>
> I'm sorry for the late reply to this old email.
> We are attempting to upgrade Hive MetaStore from Hive1 to Hive3, and
> noticed that the response of the Hive3 MetaStore is very slow.
> We suspect that HIVE-14187 might be causing this slowness.
> Could you tell me if you have resolved this problem? Are there still any
> problems when you enable CachedStore?
>
> Regards,
> - Takanobu
>
> On Wed, Jun 13, 2018 at 0:37, Sungwoo Park wrote:
>
>> Hello Hive users,
>>
>> I am experiencing a problem with MetaStore in Hive 3.0.
>>
>> 1. Start MetaStore
>> with 
>> hive.metastore.rawstore.impl=org.apache.hadoop.hive.metastore.ObjectStore.
>>
>> 2. Generate TPC-DS data.
>>
>> 3. TPC-DS queries run okay and produce correct results. E.g., from query
>> 1:
>> +---+
>> |   c_customer_id   |
>> +---+
>> | CHAA  |
>> | DCAA  |
>> | DDAA  |
>> ...
>> | AAAILIAA  |
>> +---+
>> 100 rows selected (69.901 seconds)
>>
>> However, the query compilation takes long (
>> https://issues.apache.org/jira/browse/HIVE-16520).
>>
>> 4. Now, restart MetaStore with
>> hive.metastore.rawstore.impl=org.apache.hadoop.hive.metastore.cache.CachedStore.
>>
>> 5. TPC-DS queries run okay, but produce wrong results. E.g., from query 1:
>> ++
>> | c_customer_id  |
>> ++
>> ++
>> No rows selected (37.448 seconds)
>>
>> What I noticed is that with hive.metastore.rawstore.impl=CachedStore,
>> HiveServer2 produces log messages such as:
>>
>> 2018-06-12T23:50:04,223  WARN [b3041385-0290-492f-aef8-c0249de328ad
>> HiveServer2-Handler-Pool: Thread-59] calcite.RelOptHiveTable: No Stats for
>> tpcds_bin_partitioned_orc_1000@date_dim, Columns: d_date_sk, d_year
>> 2018-06-12T23:50:04,223  INFO [b3041385-0290-492f-aef8-c0249de328ad
>> HiveServer2-Handler-Pool: Thread-59] SessionState: No Stats for
>> tpcds_bin_partitioned_orc_1000@date_dim, Columns: d_date_sk, d_year
>> 2018-06-12T23:50:04,225  WARN [b3041385-0290-492f-aef8-c0249de328ad
>> HiveServer2-Handler-Pool: Thread-59] calcite.RelOptHiveTable: No Stats for
>> tpcds_bin_partitioned_orc_1000@store, Columns: s_state, s_store_sk
>> 2018-06-12T23:50:04,225  INFO [b3041385-0290-492f-aef8-c0249de328ad
>> HiveServer2-Handler-Pool: Thread-59] SessionState: No Stats for
>> tpcds_bin_partitioned_orc_1000@store, Columns: s_state, s_store_sk
>> 2018-06-12T23:50:04,226  WARN [b3041385-0290-492f-aef8-c0249de328ad
>> HiveServer2-Handler-Pool: Thread-59] calcite.RelOptHiveTable: No Stats for
>> tpcds_bin_partitioned_orc_1000@customer, Columns: c_customer_sk,
>> c_customer_id
>> 2018-06-12T23:50:04,226  INFO [b3041385-0290-492f-aef8-c0249de328ad
>> HiveServer2-Handler-Pool: Thread-59] SessionState: No Stats for
>> tpcds_bin_partitioned_orc_1000@customer, Columns: c_customer_sk,
>> c_customer_id
>>
>> 2018-06-12T23:50:05,158 ERROR [b3041385-0290-492f-aef8-c0249de328ad
>> HiveServer2-Handler-Pool: Thread-59] annotation.StatsRulesProcFactory:
>> Invalid column stats: No of nulls > cardinality
>> 2018-06-12T23:50:05,159 ERROR [b3041385-0290-492f-aef8-c0249de328ad
>> HiveServer2-Handler-Pool: Thread-59] annotation.StatsRulesProcFactory:
>> Invalid column stats: No of nulls > cardinality
>> 2018-06-12T23:50:05,160 ERROR [b3041385-0290-492f-aef8-c0249de328ad
>> HiveServer2-Handler-Pool: Thread-59] annotation.StatsRulesProcFactory:
>> Invalid column stats: No of nulls > cardinality
>>
>> However, even after computing column stats, queries still return wrong
>> results, despite the fact that the above log messages disappear.
>>
>> I guess I am missing some configuration parameters 

Re: CachedStore for hive.metastore.rawstore.impl in Hive 3.0

2024-02-27 Thread Takanobu Asanuma
Hi Sungwoo Park,

I'm sorry for the late reply to this old email.
We are attempting to upgrade Hive MetaStore from Hive1 to Hive3, and
noticed that the response of the Hive3 MetaStore is very slow.
We suspect that HIVE-14187 might be causing this slowness.
Could you tell me if you have resolved this problem? Are there still any
problems when you enable CachedStore?

Regards,
- Takanobu

On Wed, Jun 13, 2018 at 0:37, Sungwoo Park wrote:

> Hello Hive users,
>
> I am experiencing a problem with MetaStore in Hive 3.0.
>
> 1. Start MetaStore
> with 
> hive.metastore.rawstore.impl=org.apache.hadoop.hive.metastore.ObjectStore.
>
> 2. Generate TPC-DS data.
>
> 3. TPC-DS queries run okay and produce correct results. E.g., from query 1:
> +---+
> |   c_customer_id   |
> +---+
> | CHAA  |
> | DCAA  |
> | DDAA  |
> ...
> | AAAILIAA  |
> +---+
> 100 rows selected (69.901 seconds)
>
> However, the query compilation takes long (
> https://issues.apache.org/jira/browse/HIVE-16520).
>
> 4. Now, restart MetaStore with
> hive.metastore.rawstore.impl=org.apache.hadoop.hive.metastore.cache.CachedStore.
>
> 5. TPC-DS queries run okay, but produce wrong results. E.g., from query 1:
> ++
> | c_customer_id  |
> ++
> ++
> No rows selected (37.448 seconds)
>
> What I noticed is that with hive.metastore.rawstore.impl=CachedStore,
> HiveServer2 produces log messages such as:
>
> 2018-06-12T23:50:04,223  WARN [b3041385-0290-492f-aef8-c0249de328ad
> HiveServer2-Handler-Pool: Thread-59] calcite.RelOptHiveTable: No Stats for
> tpcds_bin_partitioned_orc_1000@date_dim, Columns: d_date_sk, d_year
> 2018-06-12T23:50:04,223  INFO [b3041385-0290-492f-aef8-c0249de328ad
> HiveServer2-Handler-Pool: Thread-59] SessionState: No Stats for
> tpcds_bin_partitioned_orc_1000@date_dim, Columns: d_date_sk, d_year
> 2018-06-12T23:50:04,225  WARN [b3041385-0290-492f-aef8-c0249de328ad
> HiveServer2-Handler-Pool: Thread-59] calcite.RelOptHiveTable: No Stats for
> tpcds_bin_partitioned_orc_1000@store, Columns: s_state, s_store_sk
> 2018-06-12T23:50:04,225  INFO [b3041385-0290-492f-aef8-c0249de328ad
> HiveServer2-Handler-Pool: Thread-59] SessionState: No Stats for
> tpcds_bin_partitioned_orc_1000@store, Columns: s_state, s_store_sk
> 2018-06-12T23:50:04,226  WARN [b3041385-0290-492f-aef8-c0249de328ad
> HiveServer2-Handler-Pool: Thread-59] calcite.RelOptHiveTable: No Stats for
> tpcds_bin_partitioned_orc_1000@customer, Columns: c_customer_sk,
> c_customer_id
> 2018-06-12T23:50:04,226  INFO [b3041385-0290-492f-aef8-c0249de328ad
> HiveServer2-Handler-Pool: Thread-59] SessionState: No Stats for
> tpcds_bin_partitioned_orc_1000@customer, Columns: c_customer_sk,
> c_customer_id
>
> 2018-06-12T23:50:05,158 ERROR [b3041385-0290-492f-aef8-c0249de328ad
> HiveServer2-Handler-Pool: Thread-59] annotation.StatsRulesProcFactory:
> Invalid column stats: No of nulls > cardinality
> 2018-06-12T23:50:05,159 ERROR [b3041385-0290-492f-aef8-c0249de328ad
> HiveServer2-Handler-Pool: Thread-59] annotation.StatsRulesProcFactory:
> Invalid column stats: No of nulls > cardinality
> 2018-06-12T23:50:05,160 ERROR [b3041385-0290-492f-aef8-c0249de328ad
> HiveServer2-Handler-Pool: Thread-59] annotation.StatsRulesProcFactory:
> Invalid column stats: No of nulls > cardinality
>
> However, even after computing column stats, queries still return wrong
> results, despite the fact that the above log messages disappear.
>
> I guess I am missing some configuration parameters (because I imported
> hive-site.xml from Hive 2). Any suggestion would be appreciated.
>
> Thanks a lot,
>
> --- Sungwoo Park
>
>


Re: [EXTERNAL] Backport HIVE-21075 into 3.2.0 release

2024-02-07 Thread Aman Raj via user
Hi,

No, HIVE-21075 is not currently planned for the 3.2.0 release, but we will check
the feasibility and implications of including it and add it to the parent JIRA.

It's difficult to quote an exact date for the release since we are working on
backporting some important tickets (linked to the parent JIRA), but hopefully we
will come up with a release date soon.

Thanks,
Aman.

From: Daniel Cristian 
Sent: Wednesday, February 7, 2024 8:20 PM
To: user@hive.apache.org 
Subject: [EXTERNAL] Backport HIVE-21075 into 3.2.0 release

Hi,

I saw that HIVE-21075 has a
bug fix for a performance problem with dropping partitions on PostgreSQL or MySQL,
and that the fix was merged only into the 4.0.0-alpha-1 version.

I also saw that you plan to release a 3.2.0 version with 
HIVE-26751. Do you plan to 
backport HIVE-21075 into 
HIVE-26751 and have this bug
fixed in the next release?

Would you also let me know if you have an expected date for the new release?

I'm using an AWS RDS PostgreSQL DB for my Hive Metastore and suffer from this 
performance problem where my RDS keeps 100% CPU usage during the unregistering 
process.

Best Regards,
Daniel Cristian




Re: [Hive Support] Query about StandardStructObjectInspector converting field names to lowercase

2024-02-01 Thread Stamatis Zampetakis
Hi Chang,

The hive-hcatalog-core-1.1.0-cdh5.13.1.jar jar file is not something
maintained by Apache. For vendor specific problems you should reach
out to the respective support team from where you obtained the
product.

Apart from that, the version you are using (5.13.1) is quite old.
Please re-try your use-case with the latest Apache Hive 4.0.0-beta-1
release [1] and report back if you still observe unexpected behavior.

Best,
Stamatis

[1] https://hive.apache.org/general/downloads/

On Mon, Jan 29, 2024 at 5:52 AM chang.wd  wrote:
>
> Dear Hive Support Team,
>
> I hope you are doing well. I am writing to inquire about a specific behavior 
> I encountered in Hive, related to the 
> org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector 
> class.
>
> SQL to reproduce this behavior:
> ```
> -- add JsonSerDe jar
> ADD JAR hive-hcatalog-core-1.1.0-cdh5.13.1.jar;
> -- create a JSON table; the struct field names will be converted to
> lower case.
> CREATE TABLE `test.hive_json_struct_schema`(
>   `cond_keys` struct
> )
> ROW FORMAT SERDE
>   'org.apache.hive.hcatalog.data.JsonSerDe'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.mapred.TextInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
> ```
>
> When using the StandardStructObjectInspector class, it appears that field 
> names are being automatically converted to lowercase in the following code 
> snippet:
>
> ```
> this.fieldName = fieldName.toLowerCase();
> ```
>
> This behavior subsequently causes issues when querying JSON formatted tables, 
> particularly when nested Struct field names within the JSON data contain a 
> mix of uppercase and lowercase characters. Since field names are being 
> changed to lowercase by the StandardStructObjectInspector class, the actual 
> field names no longer match the expected field names, which leads to errors 
> when reading the data (not with SQL).
>
> I would appreciate if you could kindly provide an explanation for this design 
> choice and whether there are any available workarounds or alternative 
> solutions for this scenario. I understand that the class may have been 
> implemented to avoid case sensitivity issues, but in cases like mine where 
> field name case matters, it would be helpful to have a better understanding 
> of how to handle this situation.
>
> Thank you in advance for your assistance and guidance. I look forward to 
> hearing from you.
>
> Best regards,
>
> Chang


Re: Hive on Docker

2024-01-18 Thread Sanjay Gupta
Never mind. I had to set these properties on the Hive Metastore to resolve the
issue; I was setting them in HiveServer2.

On Wed, Jan 17, 2024 at 11:38 PM Sanjay Gupta  wrote:
>
> Hi,
> I get exactly same issue as described here in Docker container which
> is running Hive Metastore and HS2.
>
> Using Hive version 3.1.3 ( inside docker container)
>
> https://issues.apache.org/jira/browse/HIVE-19740
> It looks like close to above issue
>
> I have also set the following in hive-site.xml as per the suggestion, but I
> still get the same error in the log file:
>
> <property>
>   <name>hive.metastore.event.db.notification.api.auth</name>
>   <value>false</value>
> </property>
> <property>
>   <name>hadoop.proxyuser.hive.hosts</name>
>   <value>*</value>
> </property>
>
> <property>
>   <name>hadoop.proxyuser.hive.groups</name>
>   <value>*</value>
> </property>
>
>
> 2024-01-18T07:36:07,203  INFO [main] metastore.HiveMetaStoreClient:
> Connected to metastore.
> 2024-01-18T07:36:07,204  INFO [main] server.HiveServer2: Shutting down
> HiveServer2
> 2024-01-18T07:36:07,205  INFO [main] server.HiveServer2:
> Stopping/Disconnecting tez sessions.
> 2024-01-18T07:36:07,205  INFO [main] metastore.HiveMetaStoreClient:
> Closed a connection to metastore, current connections: 2
> 2024-01-18T07:36:07,205  WARN [main] server.HiveServer2: Error
> starting HiveServer2 on attempt 3, will retry in 6ms
> java.lang.RuntimeException: Error initializing notification event poll
> at org.apache.hive.service.server.HiveServer2.init(HiveServer2.java:274)
> ~[hive-service-3.1.3.jar:3.1.3]
> at 
> org.apache.hive.service.server.HiveServer2.startHiveServer2(HiveServer2.java:1038)
> [hive-service-3.1.3.jar:3.1.3]
> at 
> org.apache.hive.service.server.HiveServer2.access$1600(HiveServer2.java:139)
> [hive-service-3.1.3.jar:3.1.3]
> at 
> org.apache.hive.service.server.HiveServer2$StartOptionExecutor.execute(HiveServer2.java:1307)
> [hive-service-3.1.3.jar:3.1.3]
> at org.apache.hive.service.server.HiveServer2.main(HiveServer2.java:1151)
> [hive-service-3.1.3.jar:3.1.3]
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_342]
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> ~[?:1.8.0_342]
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> ~[?:1.8.0_342]
> at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_342]
> at org.apache.hadoop.util.RunJar.run(RunJar.java:308)
> [hadoop-common-3.1.0.jar:?]
> at org.apache.hadoop.util.RunJar.main(RunJar.java:222)
> [hadoop-common-3.1.0.jar:?]
> Caused by: java.io.IOException:
> org.apache.thrift.TApplicationException: Internal error processing
> get_current_notificationEventId
> at 
> org.apache.hadoop.hive.metastore.messaging.EventUtils$MSClientNotificationFetcher.getCurrentNotificationEventId(EventUtils.java:75)
> ~[hive-exec-3.1.3.jar:3.1.3]
> at 
> org.apache.hadoop.hive.ql.metadata.events.NotificationEventPoll.(NotificationEventPoll.java:103)
> ~[hive-exec-3.1.3.jar:3.1.3]
> at 
> org.apache.hadoop.hive.ql.metadata.events.NotificationEventPoll.initialize(NotificationEventPoll.java:59)
> ~[hive-exec-3.1.3.jar:3.1.3]
> at org.apache.hive.service.server.HiveServer2.init(HiveServer2.java:272)
> ~[hive-service-3.1.3.jar:3.1.3]
> ... 10 more
> Caused by: org.apache.thrift.TApplicationException: Internal error
> processing get_current_notificationEventId
> at 
> org.apache.thrift.TApplicationException.read(TApplicationException.java:111)
> ~[hive-exec-3.1.3.jar:3.1.3]
> at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:79)
> ~[hive-exec-3.1.3.jar:3.1.3]
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_current_notificationEventId(ThriftHiveMetastore.java:5575)
> ~[hive-exec-3.1.3.jar:3.1.3]
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_current_notificationEventId(ThriftHiveMetastore.java:5563)
> ~[hive-exec-3.1.3.jar:3.1.3]
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getCurrentNotificationEventId(HiveMetaStoreClient.java:2723)
> ~[hive-exec-3.1.3.jar:3.1.3]
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_342]
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> ~[?:1.8.0_342]
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> ~[?:1.8.0_342]
> at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_342]
> at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:212)
> ~[hive-exec-3.1.3.jar:3.1.3]
> at com.sun.proxy.$Proxy25.getCurrentNotificationEventId(Unknown Source) ~[?:?]
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_342]
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> ~[?:1.8.0_342]
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> ~[?:1.8.0_342]
> at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_342]
> at 
> 

Re: Contributing doc

2024-01-14 Thread Stamatis Zampetakis
Hi Henri,

I gave you the necessary permissions to the wiki. Please check and if
you encounter any issues let us know.

Best,
Stamatis

On Fri, Jan 12, 2024 at 10:56 AM Henri Biestro  wrote:
>
>
> My Apache Id is hen...@apache.org.
> Cheers
>
> On 2024/01/12 09:52:20 Henri Biestro wrote:
> > Hello;
> > I'd like to contribute some documentation on Hive 4 - (
> > https://issues.apache.org/jira/browse/HIVE-27186  for instance)
> > May I get write access to the Wiki ( ie
> > https://cwiki.apache.org/confluence/display/Hive/Apache+Hive+4.0.0 ) ?
> > Thanks
> > Henri
> >


Re: Contributing doc

2024-01-12 Thread Henri Biestro


My Apache Id is hen...@apache.org.
Cheers

On 2024/01/12 09:52:20 Henri Biestro wrote:
> Hello;
> I'd like to contribute some documentation on Hive 4 - (
> https://issues.apache.org/jira/browse/HIVE-27186  for instance)
> May I get write access to the Wiki ( ie
> https://cwiki.apache.org/confluence/display/Hive/Apache+Hive+4.0.0 ) ?
> Thanks
> Henri
> 


Re: "org.apache.thrift.transport.TTransportException: Invalid status -128" errors when SASL is enabled

2024-01-11 Thread Austin Hackett
For the benefit of anyone who comes across this error in future, it was solved 
by adding hive.metastore.sasl.enabled and hive.metastore.kerberos.principal to 
hive-site.xml on the client side, e.g. $SPARK_HOME/conf
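
A minimal sketch of those client-side hive-site.xml additions (the principal
shown is a placeholder; substitute the actual metastore principal for your
realm):

```xml
<property>
  <name>hive.metastore.sasl.enabled</name>
  <value>true</value>
</property>
<property>
  <name>hive.metastore.kerberos.principal</name>
  <value>hive/_HOST@EXAMPLE.NET</value>
</property>
```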


> On 8 Jan 2024, at 16:18, Austin Hackett  wrote:
> 
> Hi List
>  
> I'm having an issue where Hive Metastore operations (e.g. show databases) are 
> failing with "org.apache.thrift.transport.TTransportException: Invalid status 
> -128" errors when I enable SASL.
>  
> I am a bit stuck on how to go about troubleshooting this further, and any 
> pointers would be greatly appreciated...
>  
> Full details as follows:
>  
> - Ubuntu 22.04 & OpenJDK 8u342
> - Unpacked Hive 3.1.3 binary release 
> (https://dlcdn.apache.org/hive/hive-3.1.3/apache-hive-3.1.3-bin.tar.gz) to 
> /opt/hive
> - Unpacked Hadoop 3.1.0 binary release 
> (https://archive.apache.org/dist/hadoop/common/hadoop-3.1.0/hadoop-3.1.0.tar.gz)
>  to /opt/hadoop
> - Created /opt/hive/conf/metastore-site.xml (see below for contents) and 
> copied hdfs-site.xml and core-site.xml from the target HDFS cluster to 
> /opt/hive/conf
> - export HADOOP_HOME=/opt/hadoop
> - export HIVE_HOME=/opt/hive
> - Successfully started the metastore, i.e. hive --service metastore
> - Use a Hive Metastore client to "show databases" and get an error (see below 
> for the associated errors in the HMS log). I get the same error with 
> spark-shell running in local mode and the Python hive-metastore-client 
> (https://pypi.org/project/hive-metastore-client/)
>  
>  
> metastore-site.xml
> ==
> <configuration>
>   <property>
>     <name>metastore.warehouse.dir</name>
>     <value>/user/hive/warehouse</value>
>   </property>
>   <property>
>     <name>javax.jdo.option.ConnectionDriverName</name>
>     <value>org.postgresql.Driver</value>
>   </property>
>   <property>
>     <name>javax.jdo.option.ConnectionURL</name>
>     <value>jdbc:postgresql://postgres.example.net:5432/metastore_db</value>
>   </property>
>   <property>
>     <name>javax.jdo.option.ConnectionUserName</name>
>     <value>hive</value>
>   </property>
>   <property>
>     <name>javax.jdo.option.ConnectionPassword</name>
>     <value>password</value>
>   </property>
>   <property>
>     <name>metastore.kerberos.principal</name>
>     <value>hive/_h...@example.net</value>
>   </property>
>   <property>
>     <name>metastore.kerberos.keytab.file</name>
>     <value>/etc/security/keytabs/hive.keytab</value>
>   </property>
>   <property>
>     <name>hive.metastore.sasl.enabled</name>
>     <value>true</value>
>   </property>
> </configuration>
> ==
>  
> HMS log shows that it is able to authenticate using the specified keytab and 
> principal (and I have also checked this manually via the kinit command):
>  
> 
> 2024-01-08T13:12:33,463  WARN [main] security.HadoopThriftAuthBridge: 
> Client-facing principal not set. Using server-side setting: 
> hive/_h...@example.net
> 2024-01-08T13:12:33,464  INFO [main] security.HadoopThriftAuthBridge: Logging 
> in via CLIENT based principal
> 2024-01-08T13:12:33,471 DEBUG [main] security.UserGroupInformation: Hadoop 
> login
> 2024-01-08T13:12:33,472 DEBUG [main] security.UserGroupInformation: hadoop 
> login commit
> 2024-01-08T13:12:33,472 DEBUG [main] security.UserGroupInformation: Using 
> kerberos user: hive/metstore.example@example.net
> 2024-01-08T13:12:33,472 DEBUG [main] security.UserGroupInformation: Using 
> user: "hive/metstore.example@example.net" with name: 
> hive/metstore.example@example.net
> 2024-01-08T13:12:33,472 DEBUG [main] security.UserGroupInformation: User 
> entry: "hive/metstore.example@example.net"
> 2024-01-08T13:12:33,472  INFO [main] security.UserGroupInformation: Login 
> successful for user hive/metstore.example@example.net using keytab file 
> hive.keytab. Keytab auto renewal enabled : false
> 2024-01-08T13:12:33,472  INFO [main] security.HadoopThriftAuthBridge: Logging 
> in via SERVER based principal
> 2024-01-08T13:12:33,480 DEBUG [main] security.UserGroupInformation: Hadoop 
> login
> 2024-01-08T13:12:33,480 DEBUG [main] security.UserGroupInformation: hadoop 
> login commit
> 2024-01-08T13:12:33,480 DEBUG [main] security.UserGroupInformation: Using 
> kerberos user: hive/metstore.example@example.net
> 2024-01-08T13:12:33,480 DEBUG [main] security.UserGroupInformation: Using 
> user: "hive/metstore.example@example.net" with name: 
> hive/metstore.example@example.net
> 2024-01-08T13:12:33,480 DEBUG [main] security.UserGroupInformation: User 
> entry: "hive/metstore.example@example.net"
> 2024-01-08T13:12:33,480  INFO [main] security.UserGroupInformation: Login 
> successful for user hive/metstore.example@example.net using keytab file 
> hive.keytab. Keytab auto renewal enabled : false
> 

Re: Docker Hive using tez without hdfs

2024-01-11 Thread Sanjay Gupta
Thanks Attila & Ayush,
I don't have permission to open a Jira ticket yet, but I have initiated the process.
I have tried with Tez 0.9.1 and also version 0.10.2, with the same issue.
I have noticed that when I change the default to hive.execution.engine=mr in
hive-site.xml (and restart the hive service), then start the hive cli,
run set hive.execution.engine=tez on the command line, and run a
query, it doesn't give the error.
However, when the default engine is set to tez in hive-site.xml, the hive
cli exits with the error
>
> Exception in thread "main" java.lang.NoClassDefFoundError:
> >> org/apache/tez/dag/api/TezConfiguration
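
That symptom (a runtime `set` works while the startup default fails) is consistent with the Tez jars being absent from the classpath the CLI launcher builds at startup. A minimal sketch of a classpath sanity check — the paths below are illustrative, not Hive defaults:

```shell
# Sketch: report whether any Tez entry appears on a colon-separated classpath.
# This only greps entry names for "tez"; it does not open the jars themselves.
classpath_has_tez() {
  echo "$1" | tr ':' '\n' | grep -q 'tez' \
    && echo "tez entries present" \
    || echo "tez entries missing"
}

classpath_has_tez "/opt/hive/lib/hive-common-3.1.3.jar:/opt/hadoop/share/hadoop/common/*"
# -> tez entries missing
classpath_has_tez "/opt/hive/lib/*:/opt/tez/tez-api-0.10.2.jar:/opt/tez/lib/*"
# -> tez entries present
```

Run against the effective HADOOP_CLASSPATH in the failing container, this would show whether the launcher could ever resolve org.apache.tez.dag.api.TezConfiguration.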

Thanks

On Wed, Jan 10, 2024 at 5:40 AM Attila Turoczy  wrote:
>
> Agree with Ayush.
>
> Back to the original issue, is it not related to the latest Tez fix? As I 
> remember there was an incompatibility issue which the next Tez release will 
> fix; maybe this is related to that. Sanjay, could you please create a JIRA 
> for it so the community, or someone from it, can check? (I know most of you 
> don't like Jira, but it helps with tracking better than a mail thread.)
>
> -Attila
>
> On Wed, Jan 10, 2024 at 8:45 AM Ayush Saxena  wrote:
>>
>> Hive on MR3 isn’t an official Apache Hive thing, not even an Apache OSS 
>> thing; it is a vendor product being advertised in the ‘Apache’ Hive space.
>>
>> So it can be a mess, filled with security issues or bugs, and we at Apache 
>> Hive, for the record, are not responsible for it, nor do we endorse use of 
>> it or anything outside the scope of Apache.
>>
>> -Ayush
>>
>> On 10-Jan-2024, at 1:09 PM, Sungwoo Park  wrote:
>>
>> 
>> As far as I know, Hive-Tez supports local mode, but not standalone mode 
>> (like Spark). Hive-MR3 supports standalone mode, so you can run it in any 
>> type of cluster.
>>
>> --- Sungwoo
>>
>> On Wed, Jan 10, 2024 at 4:22 PM Sanjay Gupta  wrote:
>>>
>>> I can run hive with mr engine in local mode. Does Hive + Tez also
>>> work in standalone mode?
>>>
>>> On Tue, Jan 9, 2024 at 11:08 PM Sungwoo Park  wrote:
>>> >
>>> > Hello,
>>> >
>>> > I don't have an answer to your problem, but if your goal is to quickly 
>>> > test Hive 3 using Docker, there is an alternative way which uses Hive on 
>>> > MR3.
>>> >
>>> > https://mr3docs.datamonad.com/docs/quick/docker/
>>> >
>>> > You can also run Hive on MR3 on Kubernetes.
>>> >
>>> > Thanks,
>>> >
>>> > --- Sungwoo
>>> >
>>> >
>>> >
>>> > On Wed, Jan 10, 2024 at 3:25 PM Sanjay Gupta  wrote:
>>> >>
>>> >> Hi,
>>> >> Using following docker container to run meta , hiveserver2
>>> >>
>>> >> https://hub.docker.com/r/apache/hive
>>> >> https://github.com/apache/hive/blob/master/packaging/src/docker/
>>> >>
>>> >> I have configured hive-site.xml to use S3.
>>> >> When I set hive.execution.engine to mr in hive-site.xml, hive is
>>> >> running fine and I can perform queries, but setting it to tez fails with
>>> >> an error.
>>> >> There is no hdfs but it is running in local mode.
>>> >>
>>> >> <property>
>>> >>   <name>hive.execution.engine</name>
>>> >>   <value>tez</value>
>>> >> </property>
>>> >>
>>> >> Any idea how to fix this issue ?
>>> >>
>>> >> hive
>>> >> SLF4J: Actual binding is of type 
>>> >> [org.apache.logging.slf4j.Log4jLoggerFactory]
>>> >> Hive Session ID = 03368207-1904-4c4c-b63e-b29dd28e0a71
>>> >>
>>> >> Logging initialized using configuration in
>>> >> jar:file:/opt/hive/lib/hive-common-3.1.3.jar!/hive-log4j2.properties
>>> >> Async: true
>>> >> Exception in thread "main" java.lang.NoClassDefFoundError:
>>> >> org/apache/tez/dag/api/TezConfiguration
>>> >> at 
>>> >> org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:661)
>>> >> at 
>>> >> org.apache.hadoop.hive.ql.session.SessionState.beginStart(SessionState.java:591)
>>> >> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:747)
>>> >> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:683)
>>> >> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>> >> at 
>>> >> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>> >> at 
>>> >> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>> >> at java.lang.reflect.Method.invoke(Method.java:498)
>>> >> at org.apache.hadoop.util.RunJar.run(RunJar.java:308)
>>> >> at org.apache.hadoop.util.RunJar.main(RunJar.java:222)
>>> >> Caused by: java.lang.ClassNotFoundException:
>>> >> org.apache.tez.dag.api.TezConfiguration
>>> >> at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
>>> >> at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
>>> >> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
>>> >> at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
>>> >>
>>> >>
>>> >> --
>>> >>
>>> >> Thanks
>>> >> Sanjay Gupta
>>>
>>>
>>>
>>> --
>>>
>>> Thanks
>>> Sanjay Gupta



-- 

Thanks
Sanjay Gupta


Re: Docker Hive using tez without hdfs

2024-01-10 Thread Attila Turoczy
Agree with Ayush.

Back to the original issue, is it not related to the latest Tez fix? As I
remember there was an incompatibility issue which the next Tez release
will fix; maybe this is related to that. Sanjay, could you please create a
JIRA for it so the community, or someone from it, can check? (I know most
of you don't like Jira, but it helps with tracking better than a mail
thread.)

-Attila

On Wed, Jan 10, 2024 at 8:45 AM Ayush Saxena  wrote:

> Hive on MR3 isn’t an official Apache Hive thing, not even an Apache OSS
> thing; it is a vendor product being advertised in the ‘Apache’ Hive space.
>
> So it can be a mess, filled with security issues or bugs, and we at Apache
> Hive, for the record, are not responsible for it, nor do we endorse use of
> it or anything outside the scope of Apache.
>
> -Ayush
>
> On 10-Jan-2024, at 1:09 PM, Sungwoo Park  wrote:
>
> 
> As far as I know, Hive-Tez supports local mode, but not standalone
> mode (like Spark). Hive-MR3 supports standalone mode, so you can run it in
> any type of cluster.
>
> --- Sungwoo
>
> On Wed, Jan 10, 2024 at 4:22 PM Sanjay Gupta  wrote:
>
>> I can run hive with mr engine in local mode. Does Hive + Tez also
>> work in standalone mode?
>>
>> On Tue, Jan 9, 2024 at 11:08 PM Sungwoo Park  wrote:
>> >
>> > Hello,
>> >
>> > I don't have an answer to your problem, but if your goal is to quickly
>> test Hive 3 using Docker, there is an alternative way which uses Hive on
>> MR3.
>> >
>> > https://mr3docs.datamonad.com/docs/quick/docker/
>> >
>> > You can also run Hive on MR3 on Kubernetes.
>> >
>> > Thanks,
>> >
>> > --- Sungwoo
>> >
>> >
>> >
>> > On Wed, Jan 10, 2024 at 3:25 PM Sanjay Gupta 
>> wrote:
>> >>
>> >> Hi,
>> >> Using following docker container to run meta , hiveserver2
>> >>
>> >> https://hub.docker.com/r/apache/hive
>> >> https://github.com/apache/hive/blob/master/packaging/src/docker/
>> >>
>> >> I have configured hive-site.xml to use S3.
>> >> When I set hive.execution.engine to mr in hive-site.xml, hive is
>> >> running fine and I can perform queries, but setting it to tez fails with
>> >> an error.
>> >> There is no hdfs but it is running in local mode.
>> >>
>> >> <property>
>> >>   <name>hive.execution.engine</name>
>> >>   <value>tez</value>
>> >> </property>
>> >>
>> >> Any idea how to fix this issue ?
>> >>
>> >> hive
>> >> SLF4J: Actual binding is of type
>> [org.apache.logging.slf4j.Log4jLoggerFactory]
>> >> Hive Session ID = 03368207-1904-4c4c-b63e-b29dd28e0a71
>> >>
>> >> Logging initialized using configuration in
>> >> jar:file:/opt/hive/lib/hive-common-3.1.3.jar!/hive-log4j2.properties
>> >> Async: true
>> >> Exception in thread "main" java.lang.NoClassDefFoundError:
>> >> org/apache/tez/dag/api/TezConfiguration
>> >> at
>> org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:661)
>> >> at
>> org.apache.hadoop.hive.ql.session.SessionState.beginStart(SessionState.java:591)
>> >> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:747)
>> >> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:683)
>> >> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> >> at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>> >> at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> >> at java.lang.reflect.Method.invoke(Method.java:498)
>> >> at org.apache.hadoop.util.RunJar.run(RunJar.java:308)
>> >> at org.apache.hadoop.util.RunJar.main(RunJar.java:222)
>> >> Caused by: java.lang.ClassNotFoundException:
>> >> org.apache.tez.dag.api.TezConfiguration
>> >> at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
>> >> at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
>> >> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
>> >> at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
>> >>
>> >>
>> >> --
>> >>
>> >> Thanks
>> >> Sanjay Gupta
>>
>>
>>
>> --
>>
>> Thanks
>> Sanjay Gupta
>>
>


Re: Docker Hive using tez without hdfs

2024-01-10 Thread Zoltán Rátkai
Hi,

I am not sure if the officially built Docker image contains Tez, but if you
build it yourself, you can have a look here:
https://github.com/apache/hive/blob/master/packaging/src/docker/Dockerfile#L41

To use Tez you need to place it inside the Docker container and configure
it:
Download from here:
https://archive.apache.org/dist/tez/0.10.2/apache-tez-0.10.2-bin.tar.gz

and configure it with:
export TEZ_HOME="/tez"
export TEZ_CONF_DIR="/hive/conf/"
export HADOOP_CLASSPATH="$TEZ_HOME/*:$TEZ_HOME/lib/*:$HADOOP_CLASSPATH"

The above Dockerfile seems to do all of that, so try building the Docker
image with the official Dockerfile; it should work.
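
Collected into one place, Zoltan's steps amount to the following shell sketch. The /tez and /hive/conf paths come from his message and are assumptions about your container layout, not Hive defaults:

```shell
# Sketch: wire an unpacked Tez into Hive's environment inside the container.
# Assumes apache-tez-0.10.2-bin.tar.gz (URL above) has been unpacked so the
# jars sit directly under /tez.
export TEZ_HOME="/tez"
export TEZ_CONF_DIR="/hive/conf/"
# Prepend the Tez jars so the Hive CLI launcher can resolve TezConfiguration.
export HADOOP_CLASSPATH="$TEZ_HOME/*:$TEZ_HOME/lib/*:$HADOOP_CLASSPATH"

echo "$HADOOP_CLASSPATH"
```

The key line is the HADOOP_CLASSPATH export: without it, the CLI launcher never sees the Tez jars, which matches the NoClassDefFoundError earlier in this thread.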

Regards,

Zoltan Ratkai

On Wed, Jan 10, 2024 at 8:45 AM Ayush Saxena  wrote:

> Hive on MR3 isn’t an official Apache Hive thing, not even an Apache OSS
> thing; it is a vendor product being advertised in the ‘Apache’ Hive space.
>
> So it can be a mess, filled with security issues or bugs, and we at Apache
> Hive, for the record, are not responsible for it, nor do we endorse use of
> it or anything outside the scope of Apache.
>
> -Ayush
>
> On 10-Jan-2024, at 1:09 PM, Sungwoo Park  wrote:
>
> 
> As far as I know, Hive-Tez supports local mode, but not standalone
> mode (like Spark). Hive-MR3 supports standalone mode, so you can run it in
> any type of cluster.
>
> --- Sungwoo
>
> On Wed, Jan 10, 2024 at 4:22 PM Sanjay Gupta  wrote:
>
>> I can run hive with mr engine in local mode. Does Hive + Tez also
>> work in standalone mode?
>>
>> On Tue, Jan 9, 2024 at 11:08 PM Sungwoo Park  wrote:
>> >
>> > Hello,
>> >
>> > I don't have an answer to your problem, but if your goal is to quickly
>> test Hive 3 using Docker, there is an alternative way which uses Hive on
>> MR3.
>> >
>> > https://mr3docs.datamonad.com/docs/quick/docker/
>> >
>> > You can also run Hive on MR3 on Kubernetes.
>> >
>> > Thanks,
>> >
>> > --- Sungwoo
>> >
>> >
>> >
>> > On Wed, Jan 10, 2024 at 3:25 PM Sanjay Gupta 
>> wrote:
>> >>
>> >> Hi,
>> >> Using following docker container to run meta , hiveserver2
>> >>
>> >> https://hub.docker.com/r/apache/hive
>> >> https://github.com/apache/hive/blob/master/packaging/src/docker/
>> >>
>> >> I have configured hive-site.xml to use S3.
>> >> When I set hive.execution.engine to mr in hive-site.xml, hive is
>> >> running fine and I can perform queries, but setting it to tez fails with
>> >> an error.
>> >> There is no hdfs but it is running in local mode.
>> >>
>> >> <property>
>> >>   <name>hive.execution.engine</name>
>> >>   <value>tez</value>
>> >> </property>
>> >>
>> >> Any idea how to fix this issue ?
>> >>
>> >> hive
>> >> SLF4J: Actual binding is of type
>> [org.apache.logging.slf4j.Log4jLoggerFactory]
>> >> Hive Session ID = 03368207-1904-4c4c-b63e-b29dd28e0a71
>> >>
>> >> Logging initialized using configuration in
>> >> jar:file:/opt/hive/lib/hive-common-3.1.3.jar!/hive-log4j2.properties
>> >> Async: true
>> >> Exception in thread "main" java.lang.NoClassDefFoundError:
>> >> org/apache/tez/dag/api/TezConfiguration
>> >> at
>> org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:661)
>> >> at
>> org.apache.hadoop.hive.ql.session.SessionState.beginStart(SessionState.java:591)
>> >> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:747)
>> >> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:683)
>> >> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> >> at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>> >> at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> >> at java.lang.reflect.Method.invoke(Method.java:498)
>> >> at org.apache.hadoop.util.RunJar.run(RunJar.java:308)
>> >> at org.apache.hadoop.util.RunJar.main(RunJar.java:222)
>> >> Caused by: java.lang.ClassNotFoundException:
>> >> org.apache.tez.dag.api.TezConfiguration
>> >> at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
>> >> at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
>> >> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
>> >> at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
>> >>
>> >>
>> >> --
>> >>
>> >> Thanks
>> >> Sanjay Gupta
>>
>>
>>
>> --
>>
>> Thanks
>> Sanjay Gupta
>>
>


Re: Docker Hive using tez without hdfs

2024-01-09 Thread Ayush Saxena
Hive on MR3 isn’t an official Apache Hive thing, not even an Apache OSS
thing; it is a vendor product being advertised in the ‘Apache’ Hive space.

So it can be a mess, filled with security issues or bugs, and we at Apache
Hive, for the record, are not responsible for it, nor do we endorse use of
it or anything outside the scope of Apache.

-Ayush

On 10-Jan-2024, at 1:09 PM, Sungwoo Park  wrote:

As far as I know, Hive-Tez supports local mode, but not standalone mode
(like Spark). Hive-MR3 supports standalone mode, so you can run it in any
type of cluster.

--- Sungwoo

On Wed, Jan 10, 2024 at 4:22 PM Sanjay Gupta  wrote:

I can run hive with the mr engine in local mode. Does Hive + Tez also
work in standalone mode?

On Tue, Jan 9, 2024 at 11:08 PM Sungwoo Park  wrote:
>
> Hello,
>
> I don't have an answer to your problem, but if your goal is to quickly test Hive 3 using Docker, there is an alternative way which uses Hive on MR3.
>
> https://mr3docs.datamonad.com/docs/quick/docker/
>
> You can also run Hive on MR3 on Kubernetes.
>
> Thanks,
>
> --- Sungwoo
>
>
>
> On Wed, Jan 10, 2024 at 3:25 PM Sanjay Gupta  wrote:
>>
>> Hi,
>> Using following docker container to run meta , hiveserver2
>>
>> https://hub.docker.com/r/apache/hive
>> https://github.com/apache/hive/blob/master/packaging/src/docker/
>>
>> I have configured hive-site.xml to use S3.
>> When I set hive.execution.engine to mr in hive-site.xml, hive is
>> running fine and I can perform queries, but setting it to tez fails with
>> an error.
>> There is no hdfs but it is running in local mode.
>>
>> <property>
>>     <name>hive.execution.engine</name>
>>     <value>tez</value>
>> </property>
>>
>> Any idea how to fix this issue ?
>>
>> hive
>> SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
>> Hive Session ID = 03368207-1904-4c4c-b63e-b29dd28e0a71
>>
>> Logging initialized using configuration in
>> jar:file:/opt/hive/lib/hive-common-3.1.3.jar!/hive-log4j2.properties
>> Async: true
>> Exception in thread "main" java.lang.NoClassDefFoundError:
>> org/apache/tez/dag/api/TezConfiguration
>> at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:661)
>> at org.apache.hadoop.hive.ql.session.SessionState.beginStart(SessionState.java:591)
>> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:747)
>> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:683)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.lang.reflect.Method.invoke(Method.java:498)
>> at org.apache.hadoop.util.RunJar.run(RunJar.java:308)
>> at org.apache.hadoop.util.RunJar.main(RunJar.java:222)
>> Caused by: java.lang.ClassNotFoundException:
>> org.apache.tez.dag.api.TezConfiguration
>> at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
>> at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
>> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
>> at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
>>
>>
>> --
>>
>> Thanks
>> Sanjay Gupta



-- 

Thanks
Sanjay Gupta



Re: Docker Hive using tez without hdfs

2024-01-09 Thread Sungwoo Park
As far as I know, Hive-Tez supports local mode, but not standalone
mode (like Spark). Hive-MR3 supports standalone mode, so you can run it in
any type of cluster.

--- Sungwoo

On Wed, Jan 10, 2024 at 4:22 PM Sanjay Gupta  wrote:

> I can run hive with mr engine in local mode. Does Hive + Tez also
> work in standalone mode?
>
> On Tue, Jan 9, 2024 at 11:08 PM Sungwoo Park  wrote:
> >
> > Hello,
> >
> > I don't have an answer to your problem, but if your goal is to quickly
> test Hive 3 using Docker, there is an alternative way which uses Hive on
> MR3.
> >
> > https://mr3docs.datamonad.com/docs/quick/docker/
> >
> > You can also run Hive on MR3 on Kubernetes.
> >
> > Thanks,
> >
> > --- Sungwoo
> >
> >
> >
> > On Wed, Jan 10, 2024 at 3:25 PM Sanjay Gupta  wrote:
> >>
> >> Hi,
> >> Using following docker container to run meta , hiveserver2
> >>
> >> https://hub.docker.com/r/apache/hive
> >> https://github.com/apache/hive/blob/master/packaging/src/docker/
> >>
> >> I have configured hive-site.xml to use S3.
> >> When I set hive.execution.engine to mr in hive-site.xml, hive is
> >> running fine and I can perform queries, but setting it to tez fails with
> >> an error.
> >> There is no hdfs but it is running in local mode.
> >>
> >> <property>
> >>   <name>hive.execution.engine</name>
> >>   <value>tez</value>
> >> </property>
> >>
> >> Any idea how to fix this issue ?
> >>
> >> hive
> >> SLF4J: Actual binding is of type
> [org.apache.logging.slf4j.Log4jLoggerFactory]
> >> Hive Session ID = 03368207-1904-4c4c-b63e-b29dd28e0a71
> >>
> >> Logging initialized using configuration in
> >> jar:file:/opt/hive/lib/hive-common-3.1.3.jar!/hive-log4j2.properties
> >> Async: true
> >> Exception in thread "main" java.lang.NoClassDefFoundError:
> >> org/apache/tez/dag/api/TezConfiguration
> >> at
> org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:661)
> >> at
> org.apache.hadoop.hive.ql.session.SessionState.beginStart(SessionState.java:591)
> >> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:747)
> >> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:683)
> >> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> >> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> >> at java.lang.reflect.Method.invoke(Method.java:498)
> >> at org.apache.hadoop.util.RunJar.run(RunJar.java:308)
> >> at org.apache.hadoop.util.RunJar.main(RunJar.java:222)
> >> Caused by: java.lang.ClassNotFoundException:
> >> org.apache.tez.dag.api.TezConfiguration
> >> at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
> >> at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
> >> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
> >> at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
> >>
> >>
> >> --
> >>
> >> Thanks
> >> Sanjay Gupta
>
>
>
> --
>
> Thanks
> Sanjay Gupta
>


Re: Docker Hive using tez without hdfs

2024-01-09 Thread Sanjay Gupta
I can run hive with mr engine in local mode. Does Hive + Tez also
work in standalone mode?

On Tue, Jan 9, 2024 at 11:08 PM Sungwoo Park  wrote:
>
> Hello,
>
> I don't have an answer to your problem, but if your goal is to quickly test 
> Hive 3 using Docker, there is an alternative way which uses Hive on MR3.
>
> https://mr3docs.datamonad.com/docs/quick/docker/
>
> You can also run Hive on MR3 on Kubernetes.
>
> Thanks,
>
> --- Sungwoo
>
>
>
> On Wed, Jan 10, 2024 at 3:25 PM Sanjay Gupta  wrote:
>>
>> Hi,
>> Using following docker container to run meta , hiveserver2
>>
>> https://hub.docker.com/r/apache/hive
>> https://github.com/apache/hive/blob/master/packaging/src/docker/
>>
>> I have configured hive-site.xml to use S3.
>> When I set hive.execution.engine to mr in hive-site.xml, hive is
>> running fine and I can perform queries, but setting it to tez fails with
>> an error.
>> There is no hdfs but it is running in local mode.
>>
>> <property>
>>   <name>hive.execution.engine</name>
>>   <value>tez</value>
>> </property>
>>
>> Any idea how to fix this issue ?
>>
>> hive
>> SLF4J: Actual binding is of type 
>> [org.apache.logging.slf4j.Log4jLoggerFactory]
>> Hive Session ID = 03368207-1904-4c4c-b63e-b29dd28e0a71
>>
>> Logging initialized using configuration in
>> jar:file:/opt/hive/lib/hive-common-3.1.3.jar!/hive-log4j2.properties
>> Async: true
>> Exception in thread "main" java.lang.NoClassDefFoundError:
>> org/apache/tez/dag/api/TezConfiguration
>> at 
>> org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:661)
>> at 
>> org.apache.hadoop.hive.ql.session.SessionState.beginStart(SessionState.java:591)
>> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:747)
>> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:683)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at 
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>> at 
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.lang.reflect.Method.invoke(Method.java:498)
>> at org.apache.hadoop.util.RunJar.run(RunJar.java:308)
>> at org.apache.hadoop.util.RunJar.main(RunJar.java:222)
>> Caused by: java.lang.ClassNotFoundException:
>> org.apache.tez.dag.api.TezConfiguration
>> at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
>> at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
>> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
>> at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
>>
>>
>> --
>>
>> Thanks
>> Sanjay Gupta



-- 

Thanks
Sanjay Gupta


Re: Docker Hive using tez without hdfs

2024-01-09 Thread Sungwoo Park
Hello,

I don't have an answer to your problem, but if your goal is to quickly test
Hive 3 using Docker, there is an alternative way which uses Hive on MR3.

https://mr3docs.datamonad.com/docs/quick/docker/

You can also run Hive on MR3 on Kubernetes.

Thanks,

--- Sungwoo



On Wed, Jan 10, 2024 at 3:25 PM Sanjay Gupta  wrote:

> Hi,
> Using following docker container to run meta , hiveserver2
>
> https://hub.docker.com/r/apache/hive
> https://github.com/apache/hive/blob/master/packaging/src/docker/
>
> I have configured hive-site.xml to use S3.
> When I set hive.execution.engine to mr in hive-site.xml, hive is
> running fine and I can perform queries, but setting it to tez fails with
> an error.
> There is no hdfs but it is running in local mode.
>
> <property>
>   <name>hive.execution.engine</name>
>   <value>tez</value>
> </property>
>
> Any idea how to fix this issue ?
>
> hive
> SLF4J: Actual binding is of type
> [org.apache.logging.slf4j.Log4jLoggerFactory]
> Hive Session ID = 03368207-1904-4c4c-b63e-b29dd28e0a71
>
> Logging initialized using configuration in
> jar:file:/opt/hive/lib/hive-common-3.1.3.jar!/hive-log4j2.properties
> Async: true
> Exception in thread "main" java.lang.NoClassDefFoundError:
> org/apache/tez/dag/api/TezConfiguration
> at
> org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:661)
> at
> org.apache.hadoop.hive.ql.session.SessionState.beginStart(SessionState.java:591)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:747)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:683)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:308)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:222)
> Caused by: java.lang.ClassNotFoundException:
> org.apache.tez.dag.api.TezConfiguration
> at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
>
>
> --
>
> Thanks
> Sanjay Gupta
>


Re: [DISCUSS] Tez 0.10.3 Release Planning

2024-01-04 Thread Ayush Saxena
Thanx Laszlo,
I faced an issue here [1]; if it is not just me, maybe we can
either drop this ticket or reduce the log level to debug.

-Ayush

[1] 
https://issues.apache.org/jira/browse/TEZ-4039?focusedCommentId=17800336&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17800336

On Thu, 4 Jan 2024 at 21:10, László Bodor  wrote:
>
> Thanks for the feedback so far, I believe it's time to make the release.
> Please let me know about blockers if any, otherwise, I'm happy to volunteer 
> to start making the release next week.
>
> (Just added user@tez now, as I added user@hive originally, 
> accidentally...that was a feature in this context, not a bug.)
>
> Butao Zhang  ezt írta (időpont: 2024. jan. 2., K, 7:13):
>>
>> +1  (non-binding) Thanks Laszlo !
>>  Replied Message 
>> | From | Attila Turoczy |
>> | Date | 1/2/2024 00:06 |
>> | To |  |
>> | Cc |  |
>> | Subject | Re: [DISCUSS] Tez 0.10.3 Release Planning |
>> +1  (non-binding)
>> Thank you for the effort and happy new year!
>> -Attila
>>
>> On Mon, 1 Jan 2024 at 14:22, Ayush Saxena  wrote:
>>
>> +1 (non-binding),
>> Thanx Laszlo for starting the thread.
>>
>> -Ayush
>>
>> On Mon, 1 Jan 2024 at 18:30, László Bodor 
>> wrote:
>>
>> Hi Everyone!
>>
>> Happy New Year!
>>
>> I think it's time to create a new Tez release.
>> It's not a secret: Hive 4.0 GA would benefit from the latest bug fixes
>> since 0.10.2, which are:
>> https://issues.apache.org/jira/issues?jql=project%20%3D%20TEZ%20AND%20resolution%20!%3D%20null%20and%20fixVersion%20in%20(0.10.3)
>> .
>>
>> Please let me know your opinions.
>>
>> Regards,
>> Laszlo Bodor
>> Tez PMC Chair
>>
>>


Re: [DISCUSS] Tez 0.10.3 Release Planning

2024-01-04 Thread László Bodor
Thanks for the feedback so far, I believe it's time to make the release.
Please let me know about blockers if any, otherwise, I'm happy to volunteer
to start making the release next week.

(Just added user@tez now, as I added user@hive originally,
accidentally...that was a feature in this context, not a bug.)

Butao Zhang  ezt írta (időpont: 2024. jan. 2., K,
7:13):

> +1  (non-binding) Thanks Laszlo !
>  Replied Message 
> | From | Attila Turoczy |
> | Date | 1/2/2024 00:06 |
> | To |  |
> | Cc |  |
> | Subject | Re: [DISCUSS] Tez 0.10.3 Release Planning |
> +1  (non-binding)
> Thank you for the effort and happy new year!
> -Attila
>
> On Mon, 1 Jan 2024 at 14:22, Ayush Saxena  wrote:
>
> +1 (non-binding),
> Thanx Laszlo for starting the thread.
>
> -Ayush
>
> On Mon, 1 Jan 2024 at 18:30, László Bodor 
> wrote:
>
> Hi Everyone!
>
> Happy New Year!
>
> I think it's time to create a new Tez release.
> It's not a secret: Hive 4.0 GA would benefit from the latest bug fixes
> since 0.10.2, which are:
>
> https://issues.apache.org/jira/issues?jql=project%20%3D%20TEZ%20AND%20resolution%20!%3D%20null%20and%20fixVersion%20in%20(0.10.3)
> .
>
> Please let me know your opinions.
>
> Regards,
> Laszlo Bodor
> Tez PMC Chair
>
>
>


Re: [DISCUSS] Tez 0.10.3 Release Planning

2024-01-01 Thread Butao Zhang
+1  (non-binding) Thanks Laszlo !
 Replied Message 
| From | Attila Turoczy |
| Date | 1/2/2024 00:06 |
| To |  |
| Cc |  |
| Subject | Re: [DISCUSS] Tez 0.10.3 Release Planning |
+1  (non-binding)
Thank you for the effort and happy new year!
-Attila

On Mon, 1 Jan 2024 at 14:22, Ayush Saxena  wrote:

+1 (non-binding),
Thanx Laszlo for starting the thread.

-Ayush

On Mon, 1 Jan 2024 at 18:30, László Bodor 
wrote:

Hi Everyone!

Happy New Year!

I think it's time to create a new Tez release.
It's not a secret: Hive 4.0 GA would benefit from the latest bug fixes
since 0.10.2, which are:
https://issues.apache.org/jira/issues?jql=project%20%3D%20TEZ%20AND%20resolution%20!%3D%20null%20and%20fixVersion%20in%20(0.10.3)
.

Please let me know your opinions.

Regards,
Laszlo Bodor
Tez PMC Chair




Re: [DISCUSS] Tez 0.10.3 Release Planning

2024-01-01 Thread Attila Turoczy
+1  (non-binding)
Thank you for the effort and happy new year!
-Attila

On Mon, 1 Jan 2024 at 14:22, Ayush Saxena  wrote:

> +1 (non-binding),
> Thanx Laszlo for starting the thread.
>
> -Ayush
>
> On Mon, 1 Jan 2024 at 18:30, László Bodor 
> wrote:
> >
> > Hi Everyone!
> >
> > Happy New Year!
> >
> > I think it's time to create a new Tez release.
> > It's not a secret: Hive 4.0 GA would benefit from the latest bug fixes
> since 0.10.2, which are:
> https://issues.apache.org/jira/issues?jql=project%20%3D%20TEZ%20AND%20resolution%20!%3D%20null%20and%20fixVersion%20in%20(0.10.3)
> .
> >
> > Please let me know your opinions.
> >
> > Regards,
> > Laszlo Bodor
> > Tez PMC Chair
> >
>


Re: [DISCUSS] Tez 0.10.3 Release Planning

2024-01-01 Thread Ayush Saxena
+1 (non-binding),
Thanx Laszlo for starting the thread.

-Ayush

On Mon, 1 Jan 2024 at 18:30, László Bodor  wrote:
>
> Hi Everyone!
>
> Happy New Year!
>
> I think it's time to create a new Tez release.
> It's not a secret: Hive 4.0 GA would benefit from the latest bug fixes since 
> 0.10.2, which are: 
> https://issues.apache.org/jira/issues?jql=project%20%3D%20TEZ%20AND%20resolution%20!%3D%20null%20and%20fixVersion%20in%20(0.10.3).
>
> Please let me know your opinions.
>
> Regards,
> Laszlo Bodor
> Tez PMC Chair
>


Re: Hive 3.1.3 Hadoop Compatibility

2023-12-25 Thread Takanobu Asanuma
BigTop supports a specific version stack with some patches in place. It
should be helpful for you.

- Currently, the master branch consists of Hadoop-3.3.6, Hive-3.1.3,
Tez-0.10.2.
- https://github.com/apache/bigtop/blob/master/bigtop.bom
- Hive patches:
https://github.com/apache/bigtop/tree/master/bigtop-packages/src/common/hive
- Tez patches:
https://github.com/apache/bigtop/tree/master/bigtop-packages/src/common/tez

Disclaimer: I am not the developer of Hive/Tez/BigTop.

Thanks,
- Takanobu

On Fri, 22 Dec 2023 at 23:14, Austin Hackett  wrote:

> Many thanks for clarifying Ayush - much appreciated
>
> > On 22 Dec 2023, at 08:41, Ayush Saxena  wrote:
> >
> > Ideally the hadoop should be on 3.1.0 only, that is what we support,
> > rest if there are no incompatibilities it might or might not work with
> > higher versions of hadoop, we at "hive" don't claim that it can work,
> > mostly it will create issues with hadoop-3.3.x line due to thirdparty
> > libs and stuff like that, Guava IIRC does create some mess.
> >
> > So, short answer: we officially only support the above said hadoop
> > versions only for a particular hive release.
> >
> > -Ayush
> >
> >> On Fri, 22 Dec 2023 at 03:03, Austin Hackett 
> wrote:
> >>
> >> Hi Ayush
> >>
> >> Many thanks for your response.
> >>
> >> I’d really appreciate a clarification if that’s OK?
> >>
> >> Does this just mean that the Hadoop 3.1.0 libraries need to be deployed
> with Hive, or does it also mean the Hadoop cluster itself cannot be on a
> version later than 3.1.0 (if using Hive 3.1.3).
> >>
> >> For example, if running the Hive 3.1.3 Metastore in standalone mode,
> can the HMS work with a 3.3.6 HDFS cluster providing the Hadoop 3.1.0
> libraries are deployed alongside the HMS?
> >>
> >> Any help is much appreciated
> >>
> >> Thank you
> >>
> >>
> >>
>  On 21 Dec 2023, at 12:18, Ayush Saxena  wrote:
> >>>
> >>> Hi Austin,
> >>> Hive 3.1.3 & 4.0.0-alpha-1 works with Hadoop-3.1.0
> >>>
> >>> Hive 4.0.0-alpha-2 & 4.0.0-beta-1 works with Hadoop-3.3.1
> >>>
> >>> The upcoming Hive 4.0 GA release would be compatible with Hadoop-3.3.6
> >>>
> >>> -Ayush
> >>>
> >>> On Thu, 21 Dec 2023 at 17:39, Austin Hackett 
> wrote:
> 
>  Hi List
> 
>  I was hoping that someone might be able to clarify which Hadoop
> versions Hive 3.1.3 is compatible with?
> 
>  https://hive.apache.org/general/downloads/ says that Hive release
> 3.1.3 works with Hadoop 3.x.y which is straightforward enough.
> 
>  However, I notice the 4.0.0 releases only work with Hadoop 3.3.1,
> which makes me wonder if 3.1.3 doesn’t actually work with 3.3.1.
> 
>  Similarly, I see that HIVE-27757 upgrades Hadoop to 3.3.6 in Hive
> 4.0.0, which makes me wonder if Hive 4.0.0 actually works with 3.3.6 and
> not 3.3.1 as mentioned on the releases page.
> 
>  In summary: does Hive 3.1.3 work with Hadoop 3.3.6, and if not, which
> Hadoop 3.x.x versions are known to work?
> 
>  Any pointers would be greatly appreciated
> 
>  Thank you
> >>
>


Re: Blog article 'Performance Tuning for Single-table Queries'

2023-12-23 Thread lisoda




 Replied Message 
| From | Sungwoo Park |
| Date | 12/24/2023 00:06 |
| To | user@hive.apache.org |
| Cc | |
| Subject | Blog article 'Performance Tuning for Single-table Queries' |
Hello Hive users,


I have published a new blog article 'Performance Tuning for Single-table 
Queries'. It shows how to change configuration parameters of Hive and Tez in 
order to make simple queries run faster than Spark. Although it uses Hive on 
MR3, the technique equally applies to Hive on Tez and Hive-LLAP.



https://www.datamonad.com/post/2023-12-23-optimize-bi-1.8/


Hope you find it useful.


Cheers,


--- Sungwoo

Re: How to use SKIP_SCHEMA_INIT=TRUE from command line

2023-12-22 Thread Sanjay Gupta
Thanks, it solves the issue. Much appreciated.


Thanks
Sanjay Gupta

From: Akshat m 
Sent: Friday, December 22, 2023 5:55:38 AM
To: user@hive.apache.org 
Subject: Re: How to use SKIP_SCHEMA_INIT=TRUE from command line

Hi Sanjay,

Instead of using  --env SKIP_SCHEMA_INIT=TRUE,

Please use --env IS_RESUME="true" while running,

docker run -it -d -p 9083:9083 --env SERVICE_NAME=metastore
--add-host=host.docker.internal:host-gateway \
 --env IS_RESUME="true" \
 --env DB_DRIVER=mysql \
 --env 
SERVICE_OPTS="-Djavax.jdo.option.ConnectionDriverName=com.mysql.cj.jdbc.Driver
 
-Djavax.jdo.option.ConnectionURL=jdbc:mysql://host.docker.internal:3306/hive?createDatabaseIfNotExist=true
-Djavax.jdo.option.ConnectionUserName=hive
-Djavax.jdo.option.ConnectionPassword=" \
 --mount source=warehouse,target=/opt/hive/data/warehouse \
 --name metastore-standalone apache/hive:${HIVE_VERSION} /bin/bash

This should skip the schema initOrUpgrade process: 
https://github.com/apache/hive/blob/5022b85b5f50615f85da07bce42aebd414deb9b0/packaging/src/docker/entrypoint.sh#L24
from the 2nd time you run the container.

Regards,
Akshat


On Fri, Dec 22, 2023 at 11:53 AM Sanjay Gupta 
mailto:sanja...@gmail.com>> wrote:
Hi All,

If my metastore schema already exists with the correct version, what do I
need to do so that it doesn't run init or upgrade when starting the
metastore container?

I have tried following command line


On MAC environment variables
export HIVE_VERSION=3.1.3
and even
SKIP_SCHEMA_INIT=TRUE

docker run -it -d -p 9083:9083 --env SERVICE_NAME=metastore
--add-host=host.docker.internal:host-gateway \
 --env SKIP_SCHEMA_INIT=TRUE \
 --env DB_DRIVER=mysql \
 --env 
SERVICE_OPTS="-Djavax.jdo.option.ConnectionDriverName=com.mysql.cj.jdbc.Driver
 
-Djavax.jdo.option.ConnectionURL=jdbc:mysql://host.docker.internal:3306/hive?createDatabaseIfNotExist=true
-Djavax.jdo.option.ConnectionUserName=hive
-Djavax.jdo.option.ConnectionPassword=" \
 --mount source=warehouse,target=/opt/hive/data/warehouse \
 --name metastore-standalone apache/hive:${HIVE_VERSION} /bin/bash



---

Docker Logs , it still tries to initSchema

docker logs 1c
+ : mysql
+ SKIP_SCHEMA_INIT=false
+ export HIVE_CONF_DIR=/opt/hive/conf
+ HIVE_CONF_DIR=/opt/hive/conf
+ '[' -d '' ']'
+ export 'HADOOP_CLIENT_OPTS= -Xmx1G
-Djavax.jdo.option.ConnectionDriverName=com.mysql.cj.jdbc.Driver
-Djavax.jdo.option.ConnectionURL=jdbc:mysql://host.docker.internal:3306/hive?createDatabaseIfNotExist=true
-Djavax.jdo.option.ConnectionUserName=hive
-Djavax.jdo.option.ConnectionPassword=hive'
+ HADOOP_CLIENT_OPTS=' -Xmx1G
-Djavax.jdo.option.ConnectionDriverName=com.mysql.cj.jdbc.Driver
-Djavax.jdo.option.ConnectionURL=jdbc:mysql://host.docker.internal:3306/hive?createDatabaseIfNotExist=true
-Djavax.jdo.option.ConnectionUserName=hive
-Djavax.jdo.option.ConnectionPassword=hive'
+ [[ false == \f\a\l\s\e ]]
+ initialize_hive
+ COMMAND=-initOrUpgradeSchema
++ cut -d . -f1
++ echo 3.1.3
+ '[' 3 -lt 4 ']'
+ COMMAND=-initSchema
+ /opt/hive/bin/schematool -dbType mysql -initSchema
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in
[jar:file:/opt/hive/lib/log4j-slf4j-impl-2.17.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in
[jar:file:/opt/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Metastore connection URL:
jdbc:mysql://host.docker.internal:3306/hive?createDatabaseIfNotExist=true
Metastore Connection Driver : com.mysql.cj.jdbc.Driver
Metastore connection User: hive
Starting metastore schema initialization to 3.1.0
Initialization script hive-schema-3.1.0.mysql.sql


Error: Table 'ctlgs' already exists (state=42S01,code=1050)
org.apache.hadoop.hive.metastore.HiveMetaException: Schema
initialization FAILED! Metastore state would be inconsistent !!
Underlying cause: java.io.IOException : Schema script failed, errorcode 2
Use --verbose for detailed stacktrace.
*** schemaTool failed ***
[WARN] Failed to create directory: /home/hive/.beeline
No such file or directory
+ '[' 1 -eq 0 ']'
+ echo 'Schema initialization failed!'
Schema initialization failed!
+ exit 1
-



Docker entrypoint.sh have following code

SKIP_SCHEMA_INIT="${IS_RESUME:-false}"

function initialize_hive {
  COMMAND="-initOrUpgradeSchema"
  if [ "$(echo "$HIVE_VER" | cut -d '.' -f1)" -lt "4" ]; then
 COMMAND="-${SCHEMA_COMMAND:-initSchema}"
  fi
  $HIVE_HOME/bin/schematool -dbType $DB_DRIVER $COMMAND
  if [ $? -eq 0 ]; then
echo "Initialized schema successfully.."
  else
echo "Schema initialization failed!"
exit 1
  fi
}

export HIVE_CON
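The reason SKIP_SCHEMA_INIT=TRUE had no effect is visible in the entrypoint snippet above: the script unconditionally overwrites SKIP_SCHEMA_INIT from IS_RESUME. A minimal, standalone sketch of that parameter-expansion default (plain POSIX sh, not the actual entrypoint):

```shell
#!/bin/sh
# Sketch of the default used in entrypoint.sh above:
# SKIP_SCHEMA_INIT="${IS_RESUME:-false}" means "use $IS_RESUME if it is set
# and non-empty, otherwise fall back to the literal string false".

unset IS_RESUME
SKIP_SCHEMA_INIT="${IS_RESUME:-false}"
echo "$SKIP_SCHEMA_INIT"    # prints: false  (schematool will run)

IS_RESUME="true"
SKIP_SCHEMA_INIT="${IS_RESUME:-false}"
echo "$SKIP_SCHEMA_INIT"    # prints: true   (schema init is skipped)
```

This is why passing --env IS_RESUME="true" works while --env SKIP_SCHEMA_INIT=TRUE is silently discarded; the shell trace in the logs shows `+ SKIP_SCHEMA_INIT=false` despite the env var.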

Re: Hive 3.1.3 Hadoop Compatibility

2023-12-22 Thread Austin Hackett
Many thanks for clarifying Ayush - much appreciated 

> On 22 Dec 2023, at 08:41, Ayush Saxena  wrote:
> 
> Ideally the hadoop should be on 3.1.0 only, that is what we support,
> rest if there are no incompatibilities it might or might not work with
> higher versions of hadoop, we at "hive" don't claim that it can work,
> mostly it will create issues with hadoop-3.3.x line due to thirdparty
> libs and stuff like that, Guava IIRC does create some mess.
> 
> So, short answer: we officially only support the above said hadoop
> versions only for a particular hive release.
> 
> -Ayush
> 
>> On Fri, 22 Dec 2023 at 03:03, Austin Hackett  wrote:
>> 
>> Hi Ayush
>> 
>> Many thanks for your response.
>> 
>> I’d really appreciate a clarification if that’s OK?
>> 
>> Does this just mean that the Hadoop 3.1.0 libraries need to be deployed with 
>> Hive, or does it also mean the Hadoop cluster itself cannot be on a version 
>> later than 3.1.0 (if using Hive 3.1.3).
>> 
>> For example, if running the Hive 3.1.3 Metastore in standalone mode, can the 
>> HMS work with a 3.3.6 HDFS cluster providing the Hadoop 3.1.0 libraries are 
>> deployed alongside the HMS?
>> 
>> Any help is much appreciated
>> 
>> Thank you
>> 
>> 
>> 
 On 21 Dec 2023, at 12:18, Ayush Saxena  wrote:
>>> 
>>> Hi Austin,
>>> Hive 3.1.3 & 4.0.0-alpha-1 works with Hadoop-3.1.0
>>> 
>>> Hive 4.0.0-alpha-2 & 4.0.0-beta-1 works with Hadoop-3.3.1
>>> 
>>> The upcoming Hive 4.0 GA release would be compatible with Hadoop-3.3.6
>>> 
>>> -Ayush
>>> 
>>> On Thu, 21 Dec 2023 at 17:39, Austin Hackett  wrote:
 
 Hi List
 
 I was hoping that someone might be able to clarify which Hadoop versions 
 Hive 3.1.3 is compatible with?
 
 https://hive.apache.org/general/downloads/ says that Hive release 3.1.3 
 works with Hadoop 3.x.y which is straightforward enough.
 
 However, I notice the 4.0.0 releases only work with Hadoop 3.3.1, which 
makes me wonder if 3.1.3 doesn’t actually work with 3.3.1.
 
 Similarly, I see that HIVE-27757 upgrades Hadoop to 3.3.6 in Hive 4.0.0, 
 which makes me wonder if Hive 4.0.0 actually works with 3.3.6 and not 
 3.3.1 as mentioned on the releases page.
 
 In summary: does Hive 3.1.3 work with Hadoop 3.3.6, and if not, which 
 Hadoop 3.x.x versions are known to work?
 
 Any pointers would be greatly appreciated
 
 Thank you
>> 


Re: How to use SKIP_SCHEMA_INIT=TRUE from command line

2023-12-22 Thread Akshat m
Hi Sanjay,

Instead of using  --env SKIP_SCHEMA_INIT=TRUE,

Please use --env IS_RESUME="true" while running,

docker run -it -d -p 9083:9083 --env SERVICE_NAME=metastore
--add-host=host.docker.internal:host-gateway \
 --env IS_RESUME="true" \
 --env DB_DRIVER=mysql \
 --env
SERVICE_OPTS="-Djavax.jdo.option.ConnectionDriverName=com.mysql.cj.jdbc.Driver
 
-Djavax.jdo.option.ConnectionURL=jdbc:mysql://host.docker.internal:3306/hive?createDatabaseIfNotExist=true
-Djavax.jdo.option.ConnectionUserName=hive
-Djavax.jdo.option.ConnectionPassword=" \
 --mount source=warehouse,target=/opt/hive/data/warehouse \
 --name metastore-standalone apache/hive:${HIVE_VERSION} /bin/bash

This should skip the schema initOrUpgrade process:
https://github.com/apache/hive/blob/5022b85b5f50615f85da07bce42aebd414deb9b0/packaging/src/docker/entrypoint.sh#L24
from the 2nd time you run the container.

Regards,
Akshat


On Fri, Dec 22, 2023 at 11:53 AM Sanjay Gupta  wrote:

> Hi All,
>
> If my metastore schema already exists with the correct version, what do I
> need to do so that it doesn't run init or upgrade when starting the
> metastore container?
>
> I have tried following command line
>
>
> On MAC environment variables
> export HIVE_VERSION=3.1.3
> and even
> SKIP_SCHEMA_INIT=TRUE
>
> docker run -it -d -p 9083:9083 --env SERVICE_NAME=metastore
> --add-host=host.docker.internal:host-gateway \
>  --env SKIP_SCHEMA_INIT=TRUE \
>  --env DB_DRIVER=mysql \
>  --env
> SERVICE_OPTS="-Djavax.jdo.option.ConnectionDriverName=com.mysql.cj.jdbc.Driver
>
>  
> -Djavax.jdo.option.ConnectionURL=jdbc:mysql://host.docker.internal:3306/hive?createDatabaseIfNotExist=true
> -Djavax.jdo.option.ConnectionUserName=hive
> -Djavax.jdo.option.ConnectionPassword=" \
>  --mount source=warehouse,target=/opt/hive/data/warehouse \
>  --name metastore-standalone apache/hive:${HIVE_VERSION} /bin/bash
>
>
>
> ---
>
> Docker Logs , it still tries to initSchema
>
> docker logs 1c
> + : mysql
> + SKIP_SCHEMA_INIT=false
> + export HIVE_CONF_DIR=/opt/hive/conf
> + HIVE_CONF_DIR=/opt/hive/conf
> + '[' -d '' ']'
> + export 'HADOOP_CLIENT_OPTS= -Xmx1G
> -Djavax.jdo.option.ConnectionDriverName=com.mysql.cj.jdbc.Driver
>
> -Djavax.jdo.option.ConnectionURL=jdbc:mysql://host.docker.internal:3306/hive?createDatabaseIfNotExist=true
> -Djavax.jdo.option.ConnectionUserName=hive
> -Djavax.jdo.option.ConnectionPassword=hive'
> + HADOOP_CLIENT_OPTS=' -Xmx1G
> -Djavax.jdo.option.ConnectionDriverName=com.mysql.cj.jdbc.Driver
>
> -Djavax.jdo.option.ConnectionURL=jdbc:mysql://host.docker.internal:3306/hive?createDatabaseIfNotExist=true
> -Djavax.jdo.option.ConnectionUserName=hive
> -Djavax.jdo.option.ConnectionPassword=hive'
> + [[ false == \f\a\l\s\e ]]
> + initialize_hive
> + COMMAND=-initOrUpgradeSchema
> ++ cut -d . -f1
> ++ echo 3.1.3
> + '[' 3 -lt 4 ']'
> + COMMAND=-initSchema
> + /opt/hive/bin/schematool -dbType mysql -initSchema
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in
>
> [jar:file:/opt/hive/lib/log4j-slf4j-impl-2.17.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in
>
> [jar:file:/opt/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
> explanation.
> SLF4J: Actual binding is of type
> [org.apache.logging.slf4j.Log4jLoggerFactory]
> Metastore connection URL:
> jdbc:mysql://host.docker.internal:3306/hive?createDatabaseIfNotExist=true
> Metastore Connection Driver : com.mysql.cj.jdbc.Driver
> Metastore connection User: hive
> Starting metastore schema initialization to 3.1.0
> Initialization script hive-schema-3.1.0.mysql.sql
>
>
> Error: Table 'ctlgs' already exists (state=42S01,code=1050)
> org.apache.hadoop.hive.metastore.HiveMetaException: Schema
> initialization FAILED! Metastore state would be inconsistent !!
> Underlying cause: java.io.IOException : Schema script failed, errorcode 2
> Use --verbose for detailed stacktrace.
> *** schemaTool failed ***
> [WARN] Failed to create directory: /home/hive/.beeline
> No such file or directory
> + '[' 1 -eq 0 ']'
> + echo 'Schema initialization failed!'
> Schema initialization failed!
> + exit 1
> -
>
>
>
> Docker entrypoint.sh have following code
>
> SKIP_SCHEMA_INIT="${IS_RESUME:-false}"
>
> function initialize_hive {
>   COMMAND="-initOrUpgradeSchema"
>   if [ "$(echo "$HIVE_VER" | cut -d '.' -f1)" -lt "4" ]; then
>  COMMAND="-${SCHEMA_COMMAND:-initSchema}"
>   fi
>   $HIVE_HOME/bin/schematool -dbType $DB_DRIVER $COMMAND
>   if [ $? -eq 0 ]; then
> echo "Initialized schema successfully.."
>   else
> echo "Schema initialization failed!"
> exit 1
>   fi
> }
>
> export HIVE_CONF_DIR=$HIVE_HOME/conf
> if [ -d "${HIVE_CUSTOM_CONF_DIR:-}" ]; then
>   find "${HIVE_CUSTOM_CONF_DIR}" -type f -exec \
> ln -sfn {} "${HIVE_CONF_DIR}"/ \;
>   export HADOOP_CONF_DIR=$HIVE_CONF_DIR
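The trace in the logs above (`+ COMMAND=-initSchema`) comes from the version branch in initialize_hive: the major version is extracted with cut and compared to 4. A standalone sketch of just that branch (POSIX sh; variable names are taken from the quoted entrypoint.sh):

```shell
#!/bin/sh
# Command selection mirroring the quoted entrypoint.sh: Hive 4.x images run
# schematool -initOrUpgradeSchema, older images fall back to -initSchema.
unset SCHEMA_COMMAND
HIVE_VER="3.1.3"

COMMAND="-initOrUpgradeSchema"
if [ "$(echo "$HIVE_VER" | cut -d '.' -f1)" -lt "4" ]; then
  COMMAND="-${SCHEMA_COMMAND:-initSchema}"
fi
echo "$COMMAND"    # prints: -initSchema (matches "+ COMMAND=-initSchema" in the logs)
```

So on a 3.x image the -initOrUpgradeSchema path is never taken; judging by the script one could presumably set --env SCHEMA_COMMAND=upgradeSchema to run an upgrade instead of an init, but as Akshat notes, IS_RESUME="true" skips the whole step.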

Re: Hive 3.1.3 Hadoop Compatibility

2023-12-22 Thread Ayush Saxena
Ideally, Hadoop should be on 3.1.0 only; that is what we support. Beyond
that, if there are no incompatibilities it might or might not work with
higher versions of Hadoop, but we at "hive" don't claim that it works; it
will mostly create issues with the hadoop-3.3.x line due to third-party
libs and stuff like that (Guava, IIRC, does create some mess).

So, short answer: we officially support only the above-mentioned Hadoop
versions for a particular Hive release.

-Ayush

On Fri, 22 Dec 2023 at 03:03, Austin Hackett  wrote:
>
> Hi Ayush
>
> Many thanks for your response.
>
> I’d really appreciate a clarification if that’s OK?
>
> Does this just mean that the Hadoop 3.1.0 libraries need to be deployed with 
> Hive, or does it also mean the Hadoop cluster itself cannot be on a version 
> later than 3.1.0 (if using Hive 3.1.3).
>
> For example, if running the Hive 3.1.3 Metastore in standalone mode, can the 
> HMS work with a 3.3.6 HDFS cluster providing the Hadoop 3.1.0 libraries are 
> deployed alongside the HMS?
>
> Any help is much appreciated
>
> Thank you
>
>
>
> > On 21 Dec 2023, at 12:18, Ayush Saxena  wrote:
> >
> > Hi Austin,
> > Hive 3.1.3 & 4.0.0-alpha-1 works with Hadoop-3.1.0
> >
> > Hive 4.0.0-alpha-2 & 4.0.0-beta-1 works with Hadoop-3.3.1
> >
> > The upcoming Hive 4.0 GA release would be compatible with Hadoop-3.3.6
> >
> > -Ayush
> >
> > On Thu, 21 Dec 2023 at 17:39, Austin Hackett  wrote:
> >>
> >> Hi List
> >>
> >> I was hoping that someone might be able to clarify which Hadoop versions 
> >> Hive 3.1.3 is compatible with?
> >>
> >> https://hive.apache.org/general/downloads/ says that Hive release 3.1.3 
> >> works with Hadoop 3.x.y which is straightforward enough.
> >>
> >> However, I notice the 4.0.0 releases only work with Hadoop 3.3.1, which 
> >> makes me wonder if 3.1.3 doesn’t actually work with 3.3.1.
> >>
> >> Similarly, I see that HIVE-27757 upgrades Hadoop to 3.3.6 in Hive 4.0.0, 
> >> which makes me wonder if Hive 4.0.0 actually works with 3.3.6 and not 
> >> 3.3.1 as mentioned on the releases page.
> >>
> >> In summary: does Hive 3.1.3 work with Hadoop 3.3.6, and if not, which 
> >> Hadoop 3.x.x versions are known to work?
> >>
> >> Any pointers would be greatly appreciated
> >>
> >> Thank you
>


Re: Hive 3.1.3 Hadoop Compatibility

2023-12-21 Thread Austin Hackett
Hi Ayush

Many thanks for your response.

I’d really appreciate a clarification if that’s OK?

Does this just mean that the Hadoop 3.1.0 libraries need to be deployed with 
Hive, or does it also mean the Hadoop cluster itself cannot be on a version 
later than 3.1.0 (if using Hive 3.1.3)?

For example, if running the Hive 3.1.3 Metastore in standalone mode, can the 
HMS work with a 3.3.6 HDFS cluster providing the Hadoop 3.1.0 libraries are 
deployed alongside the HMS?

Any help is much appreciated

Thank you



> On 21 Dec 2023, at 12:18, Ayush Saxena  wrote:
> 
> Hi Austin,
> Hive 3.1.3 & 4.0.0-alpha-1 works with Hadoop-3.1.0
> 
> Hive 4.0.0-alpha-2 & 4.0.0-beta-1 works with Hadoop-3.3.1
> 
> The upcoming Hive 4.0 GA release would be compatible with Hadoop-3.3.6
> 
> -Ayush
> 
> On Thu, 21 Dec 2023 at 17:39, Austin Hackett  wrote:
>> 
>> Hi List
>> 
>> I was hoping that someone might be able to clarify which Hadoop versions 
>> Hive 3.1.3 is compatible with?
>> 
>> https://hive.apache.org/general/downloads/ says that Hive release 3.1.3 
>> works with Hadoop 3.x.y which is straightforward enough.
>> 
>> However, I notice the 4.0.0 releases only work with Hadoop 3.3.1, which 
>> makes me wonder if 3.1.3 doesn’t actually work with 3.3.1.
>> 
>> Similarly, I see that HIVE-27757 upgrades Hadoop to 3.3.6 in Hive 4.0.0, 
>> which makes me wonder if Hive 4.0.0 actually works with 3.3.6 and not 3.3.1 
>> as mentioned on the releases page.
>> 
>> In summary: does Hive 3.1.3 work with Hadoop 3.3.6, and if not, which Hadoop 
>> 3.x.x versions are known to work?
>> 
>> Any pointers would be greatly appreciated
>> 
>> Thank you



Re: Hive 3.1.3 Hadoop Compatibility

2023-12-21 Thread Ayush Saxena
Hi Austin,
Hive 3.1.3 & 4.0.0-alpha-1 work with Hadoop-3.1.0

Hive 4.0.0-alpha-2 & 4.0.0-beta-1 work with Hadoop-3.3.1

The upcoming Hive 4.0 GA release would be compatible with Hadoop-3.3.6

-Ayush

On Thu, 21 Dec 2023 at 17:39, Austin Hackett  wrote:
>
> Hi List
>
> I was hoping that someone might be able to clarify which Hadoop versions Hive 
> 3.1.3 is compatible with?
>
> https://hive.apache.org/general/downloads/ says that Hive release 3.1.3 works 
> with Hadoop 3.x.y which is straightforward enough.
>
> However, I notice the 4.0.0 releases only work with Hadoop 3.3.1, which makes 
> me wonder if 3.1.3 doesn’t actually work with 3.3.1.
>
> Similarly, I see that HIVE-27757 upgrades Hadoop to 3.3.6 in Hive 4.0.0, 
> which makes me wonder if Hive 4.0.0 actually works with 3.3.6 and not 3.3.1 
> as mentioned on the releases page.
>
> In summary: does Hive 3.1.3 work with Hadoop 3.3.6, and if not, which Hadoop 
> 3.x.x versions are known to work?
>
> Any pointers would be greatly appreciated
>
> Thank you


Re: Help with Docker Apache/Hive metastore using mysql remote database

2023-12-20 Thread Akshat m
Hi,

You can download it from here:
https://repo1.maven.org/maven2/mysql/mysql-connector-java/8.0.29/mysql-connector-java-8.0.29.jar

Regards,
Akshat

On Thu, Dec 21, 2023 at 1:40 AM Sanjay Gupta  wrote:

> There is no mysql driver file in following repo
> https://repo1.maven.org/maven2/org
>
> On Mon, Dec 18, 2023 at 3:10 AM Simhadri G  wrote:
> >
> > We can modify the Dockerfile to wget the necessary driver and copy it to
> /opt/hive/lib/ .  This should make it work. The diff is attached below:
> >
> >
> > diff --git a/packaging/src/docker/Dockerfile
> b/packaging/src/docker/Dockerfile
> > --- a/packaging/src/docker/Dockerfile (revision
> dceaf810b32fc266e3e657fdaefcd4507f2191b5)
> > +++ b/packaging/src/docker/Dockerfile (date 1702897518609)
> > @@ -80,6 +80,9 @@
> >
> >  ENV PATH=$HIVE_HOME/bin:$HADOOP_HOME/bin:$PATH
> >
> > +RUN wget
> https://repo1.maven.org/maven2/org/postgresql/postgresql/42.5.1/postgresql-42.5.1.jar
> > +RUN cp /postgresql-42.5.1.jar /opt/hive/lib/
> > +
> >  COPY entrypoint.sh /
> >  COPY conf $HIVE_HOME/conf
> >  RUN chmod +x /entrypoint.sh
> >
> > On Mon, Dec 18, 2023, 12:59 PM Ayush Saxena  wrote:
> >>
> >> I think the similar problem is being chased as part of
> >> https://github.com/apache/hive/pull/4948
> >>
> >> On Mon, 18 Dec 2023 at 09:48, Sanjay Gupta  wrote:
> >> >
> >> >
> >> >
> >> >
> >> > Issue with Docker container using mysql RDBMS ( Failed to load driver)
> >> >
> >> > https://hub.docker.com/r/apache/hive
> >> >
> >> > According to readme
> >> >
> >> > Launch Standalone Metastore With External RDBMS
> (Postgres/Oracle/MySql/MsSql)
> >> >
> >> > I want to use MySQL
> >> >
> >> > I tried com.mysql.jdbc.Driver or com.mysql.cj.jdbc.Driver
> >> >
> >> > docker run -it -d -p 9083:9083 --env SERVICE_NAME=metastore
> --add-host=host.docker.internal:host-gateway \
> >> >  --env DB_DRIVER=mysql \
> >> >  --env
> SERVICE_OPTS="-Djavax.jdo.option.ConnectionDriverName=com.mysql.jdbc.Driver
> -Djavax.jdo.option.ConnectionURL=jdbc:mysql://host.docker.internal:3306/hive?createDatabaseIfNotExist=true
> -Djavax.jdo.option.ConnectionUserName=hive
> -Djavax.jdo.option.ConnectionPassword=password" \
> >> >  --mount source=warehouse,target=/opt/hive/data/warehouse \
> >> >  --name metastore-standalone apache/hive:${HIVE_VERSION}
> >> >
> >> >
> >> > docker run -it -d -p 9083:9083 --env SERVICE_NAME=metastore
> --add-host=host.docker.internal:host-gateway \
> >> >  --env DB_DRIVER=mysql \
> >> >  --env
> SERVICE_OPTS="-Djavax.jdo.option.ConnectionDriverName=com.mysql.cj.jdbc.Driver
> -Djavax.jdo.option.ConnectionURL=jdbc:mysql://host.docker.internal:3306/hive?createDatabaseIfNotExist=true
> -Djavax.jdo.option.ConnectionUserName=hive
> -Djavax.jdo.option.ConnectionPassword=password" \
> >> >  --mount source=warehouse,target=/opt/hive/data/warehouse \
> >> >  --name metastore-standalone apache/hive:${HIVE_VERSION}
> >> >
> >> > Docker logs shows this for both drivers ( same error )
> >> >
> >> > docker logs f3
> >> > + : mysql
> >> > + SKIP_SCHEMA_INIT=false
> >> > + export HIVE_CONF_DIR=/opt/hive/conf
> >> > + HIVE_CONF_DIR=/opt/hive/conf
> >> > + '[' -d '' ']'
> >> > + export 'HADOOP_CLIENT_OPTS= -Xmx1G
> -Djavax.jdo.option.ConnectionDriverName=com.mysql.cj.jdbc.Driver
> -Djavax.jdo.option.ConnectionURL=jdbc:mysql://host.docker.internal:3306/hive?createDatabaseIfNotExist=true
> -Djavax.jdo.option.ConnectionUserName=hive
> -Djavax.jdo.option.ConnectionPassword=hive'
> >> > + HADOOP_CLIENT_OPTS=' -Xmx1G
> -Djavax.jdo.option.ConnectionDriverName=com.mysql.cj.jdbc.Driver
> -Djavax.jdo.option.ConnectionURL=jdbc:mysql://host.docker.internal:3306/hive?createDatabaseIfNotExist=true
> -Djavax.jdo.option.ConnectionUserName=hive
> -Djavax.jdo.option.ConnectionPassword=hive'
> >> > + [[ false == \f\a\l\s\e ]]
> >> > + initialize_hive
> >> > + /opt/hive/bin/schematool -dbType mysql -initSchema
> >> > SLF4J: Class path contains multiple SLF4J bindings.
> >> > SLF4J: Found binding in
> [jar:file:/opt/hive/lib/log4j-slf4j-impl-2.17.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> >> > SLF4J: Found binding in
> [jar:file:/opt/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> >> > SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
> explanation.
> >> > SLF4J: Actual binding is of type
> [org.apache.logging.slf4j.Log4jLoggerFactory]
> >> > Metastore connection URL:
> jdbc:mysql://host.docker.internal:3306/hive?createDatabaseIfNotExist=true
> >> > Metastore Connection Driver : com.mysql.cj.jdbc.Driver
> >> > Metastore connection User: hive
> >> > org.apache.hadoop.hive.metastore.HiveMetaException: Failed to load
> driver
> >> > Underlying cause: java.lang.ClassNotFoundException :
> com.mysql.cj.jdbc.Driver
> >> > Use --verbose for detailed stacktrace.
> >> > *** schemaTool failed ***
> >> > + '[' 1 -eq 0 ']'
> >> > + echo 'Schema initialization failed!'
> >> > Schema initialization failed!
> >> > + exit 1

Re: Help with Docker Apache/Hive metastore using mysql remote database

2023-12-20 Thread Sanjay Gupta
There is no mysql driver file in the following repo:
https://repo1.maven.org/maven2/org

On Mon, Dec 18, 2023 at 3:10 AM Simhadri G  wrote:
>
> We can modify the Dockerfile to wget the necessary driver and copy it to 
> /opt/hive/lib/ .  This should make it work. The diff is attached below:
>
>
> diff --git a/packaging/src/docker/Dockerfile b/packaging/src/docker/Dockerfile
> --- a/packaging/src/docker/Dockerfile (revision 
> dceaf810b32fc266e3e657fdaefcd4507f2191b5)
> +++ b/packaging/src/docker/Dockerfile (date 1702897518609)
> @@ -80,6 +80,9 @@
>
>  ENV PATH=$HIVE_HOME/bin:$HADOOP_HOME/bin:$PATH
>
> +RUN wget 
> https://repo1.maven.org/maven2/org/postgresql/postgresql/42.5.1/postgresql-42.5.1.jar
> +RUN cp /postgresql-42.5.1.jar /opt/hive/lib/
> +
>  COPY entrypoint.sh /
>  COPY conf $HIVE_HOME/conf
>  RUN chmod +x /entrypoint.sh
>
> On Mon, Dec 18, 2023, 12:59 PM Ayush Saxena  wrote:
>>
>> I think the similar problem is being chased as part of
>> https://github.com/apache/hive/pull/4948
>>
>> On Mon, 18 Dec 2023 at 09:48, Sanjay Gupta  wrote:
>> >
>> >
>> >
>> >
>> > Issue with Docker container using mysql RDBMS ( Failed to load driver)
>> >
>> > https://hub.docker.com/r/apache/hive
>> >
>> > According to readme
>> >
>> > Launch Standalone Metastore With External RDBMS 
>> > (Postgres/Oracle/MySql/MsSql)
>> >
>> > I want to use MySQL
>> >
>> > I tried com.mysql.jdbc.Driver or com.mysql.cj.jdbc.Driver
>> >
>> > docker run -it -d -p 9083:9083 --env SERVICE_NAME=metastore 
>> > --add-host=host.docker.internal:host-gateway \
>> >  --env DB_DRIVER=mysql \
>> >  --env 
>> > SERVICE_OPTS="-Djavax.jdo.option.ConnectionDriverName=com.mysql.jdbc.Driver
>> >  
>> > -Djavax.jdo.option.ConnectionURL=jdbc:mysql://host.docker.internal:3306/hive?createDatabaseIfNotExist=true
>> >  -Djavax.jdo.option.ConnectionUserName=hive 
>> > -Djavax.jdo.option.ConnectionPassword=password" \
>> >  --mount source=warehouse,target=/opt/hive/data/warehouse \
>> >  --name metastore-standalone apache/hive:${HIVE_VERSION}
>> >
>> >
>> > docker run -it -d -p 9083:9083 --env SERVICE_NAME=metastore 
>> > --add-host=host.docker.internal:host-gateway \
>> >  --env DB_DRIVER=mysql \
>> >  --env 
>> > SERVICE_OPTS="-Djavax.jdo.option.ConnectionDriverName=com.mysql.cj.jdbc.Driver
>> >   
>> > -Djavax.jdo.option.ConnectionURL=jdbc:mysql://host.docker.internal:3306/hive?createDatabaseIfNotExist=true
>> >  -Djavax.jdo.option.ConnectionUserName=hive 
>> > -Djavax.jdo.option.ConnectionPassword=password" \
>> >  --mount source=warehouse,target=/opt/hive/data/warehouse \
>> >  --name metastore-standalone apache/hive:${HIVE_VERSION}
>> >
>> > Docker logs shows this for both drivers ( same error )
>> >
>> > docker logs f3
>> > + : mysql
>> > + SKIP_SCHEMA_INIT=false
>> > + export HIVE_CONF_DIR=/opt/hive/conf
>> > + HIVE_CONF_DIR=/opt/hive/conf
>> > + '[' -d '' ']'
>> > + export 'HADOOP_CLIENT_OPTS= -Xmx1G 
>> > -Djavax.jdo.option.ConnectionDriverName=com.mysql.cj.jdbc.Driver  
>> > -Djavax.jdo.option.ConnectionURL=jdbc:mysql://host.docker.internal:3306/hive?createDatabaseIfNotExist=true
>> >  -Djavax.jdo.option.ConnectionUserName=hive 
>> > -Djavax.jdo.option.ConnectionPassword=hive'
>> > + HADOOP_CLIENT_OPTS=' -Xmx1G 
>> > -Djavax.jdo.option.ConnectionDriverName=com.mysql.cj.jdbc.Driver  
>> > -Djavax.jdo.option.ConnectionURL=jdbc:mysql://host.docker.internal:3306/hive?createDatabaseIfNotExist=true
>> >  -Djavax.jdo.option.ConnectionUserName=hive 
>> > -Djavax.jdo.option.ConnectionPassword=hive'
>> > + [[ false == \f\a\l\s\e ]]
>> > + initialize_hive
>> > + /opt/hive/bin/schematool -dbType mysql -initSchema
>> > SLF4J: Class path contains multiple SLF4J bindings.
>> > SLF4J: Found binding in 
>> > [jar:file:/opt/hive/lib/log4j-slf4j-impl-2.17.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>> > SLF4J: Found binding in 
>> > [jar:file:/opt/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>> > SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
>> > explanation.
>> > SLF4J: Actual binding is of type 
>> > [org.apache.logging.slf4j.Log4jLoggerFactory]
>> > Metastore connection URL: 
>> > jdbc:mysql://host.docker.internal:3306/hive?createDatabaseIfNotExist=true
>> > Metastore Connection Driver : com.mysql.cj.jdbc.Driver
>> > Metastore connection User: hive
>> > org.apache.hadoop.hive.metastore.HiveMetaException: Failed to load driver
>> > Underlying cause: java.lang.ClassNotFoundException : 
>> > com.mysql.cj.jdbc.Driver
>> > Use --verbose for detailed stacktrace.
>> > *** schemaTool failed ***
>> > + '[' 1 -eq 0 ']'
>> > + echo 'Schema initialization failed!'
>> > Schema initialization failed!
>> > + exit 1
>> >
>> > Any idea why I am getting "Failed to load driver" for the MySQL DB?
>> >
>> > Doesn't the Docker container come with the MySQL driver?
>> >
>> > The Docker container exits, so I can't check whether the driver is
>> > already installed.
>> >
>> > Let me know what I can do to make it work.

Re: Help with Docker Apache/Hive metastore using mysql remote database

2023-12-18 Thread Simhadri G
We can modify the Dockerfile to wget the necessary driver and copy it to
/opt/hive/lib/. This should make it work. The diff is attached below:


diff --git a/packaging/src/docker/Dockerfile
b/packaging/src/docker/Dockerfile
--- a/packaging/src/docker/Dockerfile (revision
dceaf810b32fc266e3e657fdaefcd4507f2191b5)
+++ b/packaging/src/docker/Dockerfile (date 1702897518609)
@@ -80,6 +80,9 @@

 ENV PATH=$HIVE_HOME/bin:$HADOOP_HOME/bin:$PATH

+RUN wget
https://repo1.maven.org/maven2/org/postgresql/postgresql/42.5.1/postgresql-42.5.1.jar
+RUN cp /postgresql-42.5.1.jar /opt/hive/lib/
+
 COPY entrypoint.sh /
 COPY conf $HIVE_HOME/conf
 RUN chmod +x /entrypoint.sh
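
For the MySQL case asked about in this thread, the same approach applies with
the Connector/J jar instead of the Postgres one. Alternatively, the jar can be
mounted into the stock image at run time without rebuilding it. A sketch,
untested; the driver version, Maven path, and local file name are assumptions:

```
# Fetch MySQL Connector/J (version 8.0.33 and the repo1 path are assumptions).
wget https://repo1.maven.org/maven2/com/mysql/mysql-connector-j/8.0.33/mysql-connector-j-8.0.33.jar

# Bind-mount it into the container's lib dir; add this flag to the docker run
# commands quoted below, keeping all the other options unchanged.
docker run ... \
  -v $PWD/mysql-connector-j-8.0.33.jar:/opt/hive/lib/mysql-connector-j-8.0.33.jar \
  ...
```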

On Mon, Dec 18, 2023, 12:59 PM Ayush Saxena  wrote:

> I think the similar problem is being chased as part of
> https://github.com/apache/hive/pull/4948
>
> On Mon, 18 Dec 2023 at 09:48, Sanjay Gupta  wrote:
> >
> >
> >
> >
> > Issue with Docker container using mysql RDBMS ( Failed to load driver)
> >
> > https://hub.docker.com/r/apache/hive
> >
> > According to readme
> >
> > Launch Standalone Metastore With External RDBMS
> (Postgres/Oracle/MySql/MsSql)
> >
> > I want to use MySQL
> >
> > I tried com.mysql.jdbc.Driver or com.mysql.cj.jdbc.Driver
> >
> > docker run -it -d -p 9083:9083 --env SERVICE_NAME=metastore
> --add-host=host.docker.internal:host-gateway \
> >  --env DB_DRIVER=mysql \
> >  --env
> SERVICE_OPTS="-Djavax.jdo.option.ConnectionDriverName=com.mysql.jdbc.Driver
> -Djavax.jdo.option.ConnectionURL=jdbc:mysql://host.docker.internal:3306/hive?createDatabaseIfNotExist=true
> -Djavax.jdo.option.ConnectionUserName=hive
> -Djavax.jdo.option.ConnectionPassword=password" \
> >  --mount source=warehouse,target=/opt/hive/data/warehouse \
> >  --name metastore-standalone apache/hive:${HIVE_VERSION}
> >
> >
> > docker run -it -d -p 9083:9083 --env SERVICE_NAME=metastore
> --add-host=host.docker.internal:host-gateway \
> >  --env DB_DRIVER=mysql \
> >  --env
> SERVICE_OPTS="-Djavax.jdo.option.ConnectionDriverName=com.mysql.cj.jdbc.Driver
> -Djavax.jdo.option.ConnectionURL=jdbc:mysql://host.docker.internal:3306/hive?createDatabaseIfNotExist=true
> -Djavax.jdo.option.ConnectionUserName=hive
> -Djavax.jdo.option.ConnectionPassword=password" \
> >  --mount source=warehouse,target=/opt/hive/data/warehouse \
> >  --name metastore-standalone apache/hive:${HIVE_VERSION}
> >
> > Docker logs shows this for both drivers ( same error )
> >
> > docker logs f3
> > + : mysql
> > + SKIP_SCHEMA_INIT=false
> > + export HIVE_CONF_DIR=/opt/hive/conf
> > + HIVE_CONF_DIR=/opt/hive/conf
> > + '[' -d '' ']'
> > + export 'HADOOP_CLIENT_OPTS= -Xmx1G
> -Djavax.jdo.option.ConnectionDriverName=com.mysql.cj.jdbc.Driver
> -Djavax.jdo.option.ConnectionURL=jdbc:mysql://host.docker.internal:3306/hive?createDatabaseIfNotExist=true
> -Djavax.jdo.option.ConnectionUserName=hive
> -Djavax.jdo.option.ConnectionPassword=hive'
> > + HADOOP_CLIENT_OPTS=' -Xmx1G
> -Djavax.jdo.option.ConnectionDriverName=com.mysql.cj.jdbc.Driver
> -Djavax.jdo.option.ConnectionURL=jdbc:mysql://host.docker.internal:3306/hive?createDatabaseIfNotExist=true
> -Djavax.jdo.option.ConnectionUserName=hive
> -Djavax.jdo.option.ConnectionPassword=hive'
> > + [[ false == \f\a\l\s\e ]]
> > + initialize_hive
> > + /opt/hive/bin/schematool -dbType mysql -initSchema
> > SLF4J: Class path contains multiple SLF4J bindings.
> > SLF4J: Found binding in
> [jar:file:/opt/hive/lib/log4j-slf4j-impl-2.17.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> > SLF4J: Found binding in
> [jar:file:/opt/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> > SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
> explanation.
> > SLF4J: Actual binding is of type
> [org.apache.logging.slf4j.Log4jLoggerFactory]
> > Metastore connection URL:
> jdbc:mysql://host.docker.internal:3306/hive?createDatabaseIfNotExist=true
> > Metastore Connection Driver : com.mysql.cj.jdbc.Driver
> > Metastore connection User: hive
> > org.apache.hadoop.hive.metastore.HiveMetaException: Failed to load driver
> > Underlying cause: java.lang.ClassNotFoundException :
> com.mysql.cj.jdbc.Driver
> > Use --verbose for detailed stacktrace.
> > *** schemaTool failed ***
> > + '[' 1 -eq 0 ']'
> > + echo 'Schema initialization failed!'
> > Schema initialization failed!
> > + exit 1
> >
> > Any idea why I am getting "Failed to load driver" for the MySQL DB?
> >
> > Doesn't the Docker container come with the MySQL driver?
> >
> > The Docker container exits, so I can't check whether the driver is
> already installed.
> >
> > Let me know what I can do to make it work.
> >
> > --
> >
> >
> > Thanks
> > Sanjay Gupta
> >
> >
> >
> > --
> >
> > Thanks
> > Sanjay Gupta
> >
> >
> >
> > --
> >
> > Thanks
> > Sanjay Gupta
> >
>


Re: Help with Docker Apache/Hive metastore using mysql remote database

2023-12-17 Thread Ayush Saxena
I think the similar problem is being chased as part of
https://github.com/apache/hive/pull/4948

On Mon, 18 Dec 2023 at 09:48, Sanjay Gupta  wrote:
>
>
>
>
> Issue with Docker container using mysql RDBMS ( Failed to load driver)
>
> https://hub.docker.com/r/apache/hive
>
> According to readme
>
> Launch Standalone Metastore With External RDBMS (Postgres/Oracle/MySql/MsSql)
>
> I want to use MySQL
>
> I tried com.mysql.jdbc.Driver or com.mysql.cj.jdbc.Driver
>
> docker run -it -d -p 9083:9083 --env SERVICE_NAME=metastore 
> --add-host=host.docker.internal:host-gateway \
>  --env DB_DRIVER=mysql \
>  --env 
> SERVICE_OPTS="-Djavax.jdo.option.ConnectionDriverName=com.mysql.jdbc.Driver 
> -Djavax.jdo.option.ConnectionURL=jdbc:mysql://host.docker.internal:3306/hive?createDatabaseIfNotExist=true
>  -Djavax.jdo.option.ConnectionUserName=hive 
> -Djavax.jdo.option.ConnectionPassword=password" \
>  --mount source=warehouse,target=/opt/hive/data/warehouse \
>  --name metastore-standalone apache/hive:${HIVE_VERSION}
>
>
> docker run -it -d -p 9083:9083 --env SERVICE_NAME=metastore 
> --add-host=host.docker.internal:host-gateway \
>  --env DB_DRIVER=mysql \
>  --env 
> SERVICE_OPTS="-Djavax.jdo.option.ConnectionDriverName=com.mysql.cj.jdbc.Driver
>   
> -Djavax.jdo.option.ConnectionURL=jdbc:mysql://host.docker.internal:3306/hive?createDatabaseIfNotExist=true
>  -Djavax.jdo.option.ConnectionUserName=hive 
> -Djavax.jdo.option.ConnectionPassword=password" \
>  --mount source=warehouse,target=/opt/hive/data/warehouse \
>  --name metastore-standalone apache/hive:${HIVE_VERSION}
>
> Docker logs shows this for both drivers ( same error )
>
> docker logs f3
> + : mysql
> + SKIP_SCHEMA_INIT=false
> + export HIVE_CONF_DIR=/opt/hive/conf
> + HIVE_CONF_DIR=/opt/hive/conf
> + '[' -d '' ']'
> + export 'HADOOP_CLIENT_OPTS= -Xmx1G 
> -Djavax.jdo.option.ConnectionDriverName=com.mysql.cj.jdbc.Driver  
> -Djavax.jdo.option.ConnectionURL=jdbc:mysql://host.docker.internal:3306/hive?createDatabaseIfNotExist=true
>  -Djavax.jdo.option.ConnectionUserName=hive 
> -Djavax.jdo.option.ConnectionPassword=hive'
> + HADOOP_CLIENT_OPTS=' -Xmx1G 
> -Djavax.jdo.option.ConnectionDriverName=com.mysql.cj.jdbc.Driver  
> -Djavax.jdo.option.ConnectionURL=jdbc:mysql://host.docker.internal:3306/hive?createDatabaseIfNotExist=true
>  -Djavax.jdo.option.ConnectionUserName=hive 
> -Djavax.jdo.option.ConnectionPassword=hive'
> + [[ false == \f\a\l\s\e ]]
> + initialize_hive
> + /opt/hive/bin/schematool -dbType mysql -initSchema
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/opt/hive/lib/log4j-slf4j-impl-2.17.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/opt/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
> Metastore connection URL: 
> jdbc:mysql://host.docker.internal:3306/hive?createDatabaseIfNotExist=true
> Metastore Connection Driver : com.mysql.cj.jdbc.Driver
> Metastore connection User: hive
> org.apache.hadoop.hive.metastore.HiveMetaException: Failed to load driver
> Underlying cause: java.lang.ClassNotFoundException : com.mysql.cj.jdbc.Driver
> Use --verbose for detailed stacktrace.
> *** schemaTool failed ***
> + '[' 1 -eq 0 ']'
> + echo 'Schema initialization failed!'
> Schema initialization failed!
> + exit 1
>
> Any idea why I am getting "Failed to load driver" for the MySQL DB?
>
> Doesn't the Docker container come with the MySQL driver?
>
> The Docker container exits, so I can't check whether the driver is already installed.
>
> Let me know what I can do to make it work.
>
> --
>
>
> Thanks
> Sanjay Gupta
>
>
>
> --
>
> Thanks
> Sanjay Gupta
>
>
>
> --
>
> Thanks
> Sanjay Gupta
>


Re: MR3 1.8 released

2023-12-15 Thread Sungwoo Park
For Chinese users, MR3 1.8 is now shipped in HiDataPlus (along with
Celeborn).

https://mp.weixin.qq.com/s/65bgrnFpXtORlb4FjlPMWA

--- Sungwoo

On Sat, Dec 9, 2023 at 9:08 PM Sungwoo Park  wrote:

> MR3 1.8 released
>
> On behalf of the MR3 team, I am pleased to announce the release of MR3 1.8.
>
> MR3 is an execution engine similar in spirit to MapReduce and Tez which
> has been under development since 2015. Its main application is Hive on MR3.
> You can run Hive on MR3 on Hadoop, on Kubernetes, in standalone mode (which
> does not require Hadoop/Kubernetes), or on a local machine. You can also
> test Hive on MR3 in a single Docker container.
>
> From MR3 1.8, we assume Java 17 by default. For running Hive on MR3 on
> Hadoop, we continue to support Java 8 as well. For Kubernetes and
> standalone mode, we release Hive on MR3 built with Java 17 only.
>
> Please see the release notes for changes new in MR3 1.8. A major new
> feature is that Hive on MR3 can use Apache Celeborn for remote shuffle
> service.
>
> https://mr3docs.datamonad.com/docs/release/
>
> For the performance of Hive on MR3 1.8, please see a blog article "Hive on
> MR3 - from Java 8 to Java 17 (and beating Trino)". On the 10TB TPC-DS
> benchmark, Hive on MR3 1.8 finishes all the queries faster than Trino 418.
>
> https://www.datamonad.com/post/2023-12-09-hivemr3-java17-1.8/
>
> Thank you,
>
> --- Sungwoo
>


Re: hive can not read iceberg-parquet table

2023-11-21 Thread Butao Zhang
Hi lisoda,


Thank you for trying the Hive4-beta and reporting this issue. Based
on the current information you provided, I cannot reproduce it. Could
you please give more clues? e.g.
1) Which Tez version are you using? Hive4-beta uses Tez 0.10.2 by
default.
2) Can we reproduce this issue with small data, or by inserting just
several rows? Does the iceberg data have delete files?
3) Does this problem only happen with parquet data? What about orc?
4) If you turn off vectorized execution (set
hive.vectorized.execution.enabled=false;), does the query succeed?


BTW, it is better to create a ticket in
https://issues.apache.org/jira/projects/HIVE/issues, and describe your problem
as well as reproducible steps.
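
For point 4, a minimal session-level sketch, untested, with the table name and
file-format settings taken from the original report:

```
-- Disable vectorization for this session only, then retry the failing CTAS.
set hive.vectorized.execution.enabled=false;
set hive.default.fileformat=orc;
set hive.default.fileformat.managed=orc;
create table test_parquet_as_orc as
select * from b_qqd_shop_rfm_parquet_snappy limit 100;
```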





Thanks,

Butao Zhang

 Replied Message 
| From | lisoda |
| Date | 11/22/2023 10:48 |
| To | user@hive.apache.org |
| Subject | hive can not read iceberg-parquet table |
Hi team.

I am currently testing HIVE-4.0.0-BETA.
For better read performance, we use the Iceberg-Parquet table.
However, we have found that HIVE is currently unable to handle iceberg-parquet 
tables correctly.


Example:


CREATE EXTERNAL TABLE iceberg_dwd.b_qqd_shop_rfm_parquet_snappy
STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
LOCATION 
'hdfs://xxx/iceberg-catalog/warehouse/test/b_qqd_shop_rfm_parquet_snappy/'
TBLPROPERTIES 
('iceberg.catalog'='location_based_table','engine.hive.enabled'='true');


set hive.default.fileformat=orc;
set hive.default.fileformat.managed=orc;
create table test_parquet_as_orc as select * from b_qqd_shop_rfm_parquet_snappy 
limit 100;






, TaskAttempt 2 failed, info=[Error: Node: /xxx..xx.xx: Error while 
running task ( failure ) : 
attempt_1696729618575_69586_1_00_00_2:java.lang.RuntimeException: 
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
Hive Runtime Error while processing row 
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:348)
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:276)
at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381)
at 
org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82)
at 
org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
at 
org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69)
at 
org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at 
com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131)
at 
com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:76)
at 
com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.RuntimeException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
processing row 
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:110)
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:83)
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:414)
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:293)
... 16 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error 
while processing row 
at 
org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:993)
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:101)
... 19 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
java.lang.NullPointerException
at 
org.apache.hadoop.hive.ql.exec.vector.reducesink.VectorReduceSinkEmptyKeyOperator.process(VectorReduceSinkEmptyKeyOperator.java:137)
at 
org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:919)
at 
org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:158)
at 
org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:919)

Re: [ANNOUNCE] New committer: Butao Zhang (zhangbutao)

2023-11-21 Thread Sai Hemanth Gantasala
Congratulations Butao Zhang, Very well deserved. Your contributions to the
Data connector feature are very impressive and much appreciated. Looking
forward to much more!!

Thanks,
Sai.

On Tue, Nov 21, 2023 at 1:06 PM Stamatis Zampetakis 
wrote:

> Congratulations Butao, well deserved! Very glad to see another Iceberg
> expert joining the team.
>
> Best,
> Stamatis
>
>
> On Tue, Nov 21, 2023, 4:47 PM Butao Zhang  wrote:
>
>> Thank you to the Hive community for this honor. I will continue to
>> contribute to the community with my efforts.
>> Thanks all!
>>
>>
>> Thanks,
>> Butao Zhang
>>  Replied Message 
>> | From | Ayush Saxena |
>> | Date | 11/21/2023 15:02 |
>> | To | dev ,
>>  ,
>> Butao Zhang |
>> | Subject | [ANNOUNCE] New committer: Butao Zhang (zhangbutao) |
>> Hi All,
>> Apache Hive's Project Management Committee (PMC) has invited Butao
>> Zhang  to become a committer, and we are pleased to announce that he
>> has accepted.
>>
>> Butao Zhang welcome, thank you for your contributions, and we look
>> forward to your further interactions with the community!
>>
>> Ayush Saxena
>> (On behalf of Apache Hive PMC)
>>
>


Re: [ANNOUNCE] New committer: Butao Zhang (zhangbutao)

2023-11-21 Thread Stamatis Zampetakis
Congratulations Butao, well deserved! Very glad to see another Iceberg
expert joining the team.

Best,
Stamatis


On Tue, Nov 21, 2023, 4:47 PM Butao Zhang  wrote:

> Thank you to the Hive community for this honor. I will continue to
> contribute to the community with my efforts.
> Thanks all!
>
>
> Thanks,
> Butao Zhang
>  Replied Message 
> | From | Ayush Saxena |
> | Date | 11/21/2023 15:02 |
> | To | dev ,
>  ,
> Butao Zhang |
> | Subject | [ANNOUNCE] New committer: Butao Zhang (zhangbutao) |
> Hi All,
> Apache Hive's Project Management Committee (PMC) has invited Butao
> Zhang  to become a committer, and we are pleased to announce that he
> has accepted.
>
> Butao Zhang welcome, thank you for your contributions, and we look
> forward to your further interactions with the community!
>
> Ayush Saxena
> (On behalf of Apache Hive PMC)
>


Re: [ANNOUNCE] New committer: Butao Zhang (zhangbutao)

2023-11-21 Thread Butao Zhang
Thank you to the Hive community for this honor. I will continue to contribute 
to the community with my efforts. 
Thanks all!


Thanks,
Butao Zhang
 Replied Message 
| From | Ayush Saxena |
| Date | 11/21/2023 15:02 |
| To | dev ,
 ,
Butao Zhang |
| Subject | [ANNOUNCE] New committer: Butao Zhang (zhangbutao) |
Hi All,
Apache Hive's Project Management Committee (PMC) has invited Butao
Zhang  to become a committer, and we are pleased to announce that he
has accepted.

Butao Zhang welcome, thank you for your contributions, and we look
forward to your further interactions with the community!

Ayush Saxena
(On behalf of Apache Hive PMC)


Re: [EXTERNAL] Re: Slow Hive query with a lot of 'get_materialized_views_for_rewriting'

2023-11-17 Thread Krisztian Kasa
Hi Eugene,

Hive has a feature called automatic query rewrite [1]. This feature needs
up-to-date information about the available materialized views. [2]
The feature can be disabled via the
setting hive.materializedview.rewriting [3].

Hope this helps.

regards,
Krisztian

[1]
https://cwiki.apache.org/confluence/display/Hive/Materialized+views#Materializedviews-Materializedview-basedqueryrewriting
[2]
https://github.com/apache/hive/blob/af7059e2bdc8b18af42e0b7f7163b923a0bfd424/ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java#L2091-L2094
[3]
https://github.com/apache/hive/blob/af7059e2bdc8b18af42e0b7f7163b923a0bfd424/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L1605C51-L1606
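
A minimal hive-site.xml sketch of the setting in [3]; the session-level form
(`set hive.materializedview.rewriting=false;`) works as well:

```
<property>
  <name>hive.materializedview.rewriting</name>
  <value>false</value>
  <description>Turn off automatic materialized-view-based query rewriting.</description>
</property>
```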

On Fri, Nov 17, 2023 at 2:06 AM Eugene Miretsky  wrote:

> Hey!
>
> Hive version is 3.1.3
>
> On Wed, Nov 15, 2023 at 9:23 PM Butao Zhang  wrote:
>
>> Hi, which version of hms are you using now? I have checked the master
>> branch  and beta-1 branch source code, but I can't find the place where
>> this method *get_materialized_views_for_rewriting*  is called by mistake.
>>
>> Thanks,
>>
>> Butao Zhang
>>  Replied Message 
>> From Eugene Miretsky 
>> Date 11/16/2023 02:21
>> To  
>> Subject Slow Hive query with a lot of
>> 'get_materialized_views_for_rewriting'
>> Hey!
>>
>> We have a catalog with quite a lot of databases and tables.
>>
>> When we do a simple query (select * from table limit 5;) on an idle
>> cluster, it takes around 20 seconds, sometimes longer (usually the first
>> run takes 40s+)
>>
>> Looking at the hive-metastore logs during most of the query time the logs
>> show "metastore.Hivemetastore: 13:  get_materialized_views_for_rewriting:
>> db = " for each database. When these calls finish, the query
>> executes pretty quickly.
>>
>> My interpretation of this is that most of the time is spent on analyzing
>> the metastore and building a query plan; perhaps some sort of Metastore
>> in-memory cache is being built (but it happens on every call). But I am
>> not really sure how to debug it further, nor could I find in the code where
>> this is happening.
>>
>> Any advice on what's causing it or how to troubleshoot?
>>
>> p.s
>> The metadata (i.e original tables and databases) is actually coming from
>> a very old Hive version (1.1), and we migrated it to the newest version of
>> the Metastore using the upgrade tool. In the original Hive version
>> materialized views were not even supported.
>>
>> Cheers,
>> Eugene
>>
>>
>>
>
> --
>


Re: [EXTERNAL] Re: Slow Hive query with a lot of 'get_materialized_views_for_rewriting'

2023-11-16 Thread Butao Zhang
Sorry, I'm not sure of the final release time, but I think it will be soon. :)
Maybe some other folks in the Hive community know more about the GA release.



Thanks,

Butao Zhang

 Replied Message 
| From | lisoda |
| Date | 11/17/2023 12:31 |
| To | user |
| Subject | Re: [EXTERNAL] Re: Slow Hive query with a lot of 
'get_materialized_views_for_rewriting' |
May I ask when hive4 can be released?



 Replied Message 
| From | Butao Zhang |
| Date | 11/17/2023 12:24 |
| To | user@hive.apache.org |
| Cc | |
| Subject | Re: [EXTERNAL] Re: Slow Hive query with a lot of 
'get_materialized_views_for_rewriting' |
Thanks for the info. I checked Hive 3.1.3, and there are performance
issues when HS2 invokes the method get_materialized_views_for_rewriting. You can
refer to this ticket https://issues.apache.org/jira/browse/HIVE-21631 which was
fixed in Hive4.


And if you do not need the MV ability, a workaround for this issue is
to turn off MV rewriting by setting
hive.materializedview.rewriting=false in hive-site.xml. Or you can port the
change HIVE-21631 into your Hive 3.1.3 source code.


BTW, you can also try Hive4 version to enjoy more interesting features, 
including Apache Iceberg integration, enhanced materialized view, etc.

Thanks,

Butao Zhang

 Replied Message 
| From | Eugene Miretsky |
| Date | 11/17/2023 09:06 |
| To |  |
| Subject | Re: [EXTERNAL] Re: Slow Hive query with a lot of 
'get_materialized_views_for_rewriting' |
Hey! 



Hive version is 3.1.3


On Wed, Nov 15, 2023 at 9:23 PM Butao Zhang  wrote:

Hi, which version of hms are you using now? I have checked the master branch  
and beta-1 branch source code, but I can't find the place where this method 
get_materialized_views_for_rewriting  is called by mistake.



Thanks,

Butao Zhang

 Replied Message 
| From | Eugene Miretsky |
| Date | 11/16/2023 02:21 |
| To |  |
| Subject | Slow Hive query with a lot of 
'get_materialized_views_for_rewriting' |
Hey! 


We have a catalog with quite a lot of databases and tables.


When we do a simple query (select * from table limit 5;) on an idle cluster,
it takes around 20 seconds, sometimes longer (usually the first run takes 40s+)


Looking at the hive-metastore logs during most of the query time the logs show 
"metastore.Hivemetastore: 13:  get_materialized_views_for_rewriting: db = 
" for each database. When these calls finish, the query executes 
pretty quickly. 


My interpretation of this is that most of the time is spent on analyzing the
metastore and building a query plan; perhaps some sort of Metastore in-memory
cache is being built (but it happens on every call). But I am not really
sure how to debug it further, nor could I find in the code where this is
happening.


Any advice on what's causing it or how to troubleshoot?


p.s
The metadata (i.e original tables and databases) is actually coming from a very 
old Hive version (1.1), and we migrated it to the newest version of the 
Metastore using the upgrade tool. In the original Hive version  materialized 
views were not even supported.  


Cheers,
Eugene






Re: Question on Hive Metastore catalog support

2023-11-16 Thread Butao Zhang
Hi, maybe you can check this ticket 
https://issues.apache.org/jira/browse/HIVE-26227



Thanks,

Butao Zhang

 Replied Message 
| From | Flavio Junqueira |
| Date | 11/15/2023 17:26 |
| To |  |
| Subject | Question on Hive Metastore catalog support |
Hello there,

I'm interested in understanding the Hive Metastore catalog support. I see 
references in the metastore code to catalogs, for example:

https://github.com/apache/hive/blob/17525f169b9a08cd715bfb42899e45b7c689c77a/standalone-metastore/metastore-common/src/main/protobuf/org/apache/hadoop/hive/metastore/hive_metastore.proto#L45

But I don't see create/drop catalog statements in the HiveQL DDL, neither in 
the documentation nor in the code:

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL
https://github.com/apache/hive/tree/17525f169b9a08cd715bfb42899e45b7c689c77a/ql/src/java/org/apache/hadoop/hive/ql/ddl

Could anyone clarify the state of support for catalogs in the metastore? Are 
there known applications leveraging this catalog support?

Thanks,
-Flavio

Re: [EXTERNAL] Re: Slow Hive query with a lot of 'get_materialized_views_for_rewriting'

2023-11-16 Thread lisoda
May I ask when hive4 can be released?



 Replied Message 
| From | Butao Zhang |
| Date | 11/17/2023 12:24 |
| To | user@hive.apache.org |
| Cc | |
| Subject | Re: [EXTERNAL] Re: Slow Hive query with a lot of 
'get_materialized_views_for_rewriting' |
Thanks for the info. I checked Hive 3.1.3, and there are performance
issues when HS2 invokes the method get_materialized_views_for_rewriting. You can
refer to this ticket https://issues.apache.org/jira/browse/HIVE-21631 which was
fixed in Hive4.


And if you do not need the MV ability, a workaround for this issue is
to turn off MV rewriting by setting
hive.materializedview.rewriting=false in hive-site.xml. Or you can port the
change HIVE-21631 into your Hive 3.1.3 source code.


BTW, you can also try Hive4 version to enjoy more interesting features, 
including Apache Iceberg integration, enhanced materialized view, etc.

Thanks,

Butao Zhang

 Replied Message 
| From | Eugene Miretsky |
| Date | 11/17/2023 09:06 |
| To |  |
| Subject | Re: [EXTERNAL] Re: Slow Hive query with a lot of 
'get_materialized_views_for_rewriting' |
Hey! 



Hive version is 3.1.3


On Wed, Nov 15, 2023 at 9:23 PM Butao Zhang  wrote:

Hi, which version of hms are you using now? I have checked the master branch  
and beta-1 branch source code, but I can't find the place where this method 
get_materialized_views_for_rewriting  is called by mistake.



Thanks,

Butao Zhang

 Replied Message 
| From | Eugene Miretsky |
| Date | 11/16/2023 02:21 |
| To |  |
| Subject | Slow Hive query with a lot of 
'get_materialized_views_for_rewriting' |
Hey! 


We have a catalog with quite a lot of databases and tables.


When we do a simple query (select * from table limit 5;) on an idle cluster,
it takes around 20 seconds, sometimes longer (usually the first run takes 40s+)


Looking at the hive-metastore logs during most of the query time the logs show 
"metastore.Hivemetastore: 13:  get_materialized_views_for_rewriting: db = 
" for each database. When these calls finish, the query executes 
pretty quickly. 


My interpretation of this is that most of the time is spent on analyzing the
metastore and building a query plan; perhaps some sort of Metastore in-memory
cache is being built (but it happens on every call). But I am not really
sure how to debug it further, nor could I find in the code where this is
happening.


Any advice on what's causing it or how to troubleshoot?


p.s
The metadata (i.e original tables and databases) is actually coming from a very 
old Hive version (1.1), and we migrated it to the newest version of the 
Metastore using the upgrade tool. In the original Hive version  materialized 
views were not even supported.  


Cheers,
Eugene






Re: [EXTERNAL] Re: Slow Hive query with a lot of 'get_materialized_views_for_rewriting'

2023-11-16 Thread Butao Zhang
Thanks for the info. I checked Hive 3.1.3, and there are performance
issues when HS2 invokes the method get_materialized_views_for_rewriting. You can
refer to this ticket https://issues.apache.org/jira/browse/HIVE-21631 which was
fixed in Hive4.


And if you do not need the MV ability, a workaround for this issue is
to turn off MV rewriting by setting
hive.materializedview.rewriting=false in hive-site.xml. Or you can port the
change HIVE-21631 into your Hive 3.1.3 source code.


BTW, you can also try Hive4 version to enjoy more interesting features, 
including Apache Iceberg integration, enhanced materialized view, etc.

Thanks,

Butao Zhang

 Replied Message 
| From | Eugene Miretsky |
| Date | 11/17/2023 09:06 |
| To |  |
| Subject | Re: [EXTERNAL] Re: Slow Hive query with a lot of 
'get_materialized_views_for_rewriting' |
Hey! 



Hive version is 3.1.3


On Wed, Nov 15, 2023 at 9:23 PM Butao Zhang  wrote:

Hi, which version of hms are you using now? I have checked the master branch  
and beta-1 branch source code, but I can't find the place where this method 
get_materialized_views_for_rewriting  is called by mistake.



Thanks,

Butao Zhang

 Replied Message 
| From | Eugene Miretsky |
| Date | 11/16/2023 02:21 |
| To |  |
| Subject | Slow Hive query with a lot of 
'get_materialized_views_for_rewriting' |
Hey! 


We have a catalog with quite a lot of databases and tables.


When we do a simple query (select * from table limit 5;) on an idle cluster,
it takes around 20 seconds, sometimes longer (usually the first run takes 40s+)


Looking at the hive-metastore logs during most of the query time the logs show 
"metastore.Hivemetastore: 13:  get_materialized_views_for_rewriting: db = 
" for each database. When these calls finish, the query executes 
pretty quickly. 


My interpretation of this is that most of the time is spent on analyzing the
metastore and building a query plan; perhaps some sort of Metastore in-memory
cache is being built (but it happens on every call). But I am not really
sure how to debug it further, nor could I find in the code where this is
happening.


Any advice on what's causing it or how to troubleshoot?


p.s
The metadata (i.e original tables and databases) is actually coming from a very 
old Hive version (1.1), and we migrated it to the newest version of the 
Metastore using the upgrade tool. In the original Hive version  materialized 
views were not even supported.  


Cheers,
Eugene






Re: [EXTERNAL] Re: Slow Hive query with a lot of 'get_materialized_views_for_rewriting'

2023-11-16 Thread Eugene Miretsky
Hey!

Hive version is 3.1.3

On Wed, Nov 15, 2023 at 9:23 PM Butao Zhang  wrote:

> Hi, which version of hms are you using now? I have checked the master
> branch  and beta-1 branch source code, but I can't find the place where
> this method *get_materialized_views_for_rewriting*  is called by mistake.
>
> Thanks,
>
> Butao Zhang
>  Replied Message 
> From Eugene Miretsky 
> Date 11/16/2023 02:21
> To  
> Subject Slow Hive query with a lot of
> 'get_materialized_views_for_rewriting'
> Hey!
>
> We have a catalog with quite a lot of databases and tables.
>
> When we do a simple query (select * from table limit 5;) on an idle
> cluster, it takes around 20 seconds, sometimes longer (usually the first
> run takes 40s+)
>
> Looking at the hive-metastore logs during most of the query time the logs
> show "metastore.Hivemetastore: 13:  get_materialized_views_for_rewriting:
> db = " for each database. When these calls finish, the query
> executes pretty quickly.
>
> My interpretation of this is that most of the time is spent on analyzing
> the metastore and building a query plan; perhaps some sort of Metastore
> in-memory cache is being built (but it happens on every call). But I am
> not really sure how to debug it further, nor could I find in the code where
> this is happening.
>
> Any advice on what's causing it or how to troubleshoot?
>
> P.S.
> The metadata (i.e. the original tables and databases) actually comes from a
> very old Hive version (1.1), and we migrated it to the newest version of
> the Metastore using the upgrade tool. In the original Hive version,
> materialized views were not even supported.
>
> Cheers,
> Eugene
>
>
>

--


Re: Slow Hive query with a lot of 'get_materialized_views_for_rewriting'

2023-11-15 Thread Butao Zhang
Hi, which version of HMS are you using now? I have checked the master branch 
and beta-1 branch source code, but I can't find the place where this method 
get_materialized_views_for_rewriting could be called incorrectly.



Thanks,

Butao Zhang

 Replied Message 
| From | Eugene Miretsky |
| Date | 11/16/2023 02:21 |
| To |  |
| Subject | Slow Hive query with a lot of 
'get_materialized_views_for_rewriting' |
Hey! 


We have a catalog with a fairly large number of databases and tables. 


When we do a simple query (select * from table limit 5;) on an idle cluster, 
it takes around 20 seconds, sometimes longer (the first run usually takes 40s+).


Looking at the hive-metastore logs during most of the query time the logs show 
"metastore.Hivemetastore: 13:  get_materialized_views_for_rewriting: db = 
" for each database. When these calls finish, the query executes 
pretty quickly. 


My interpretation of this is that most of the time is spent on analyzing the 
metastore and building a query plan, perhaps some sort of Metastore in-memory 
cache is being built (but it happens on every call). But I am not really 
sure how to debug it further, nor could I find in the code where this is 
happening. 


Any advice on what's causing it or how to troubleshoot?


P.S.
The metadata (i.e. the original tables and databases) actually comes from a very 
old Hive version (1.1), and we migrated it to the newest version of the 
Metastore using the upgrade tool. In the original Hive version, materialized 
views were not even supported.


Cheers,
Eugene



--

Eugene Miretsky
Managing Partner | Badal.io | Book a meeting /w me!
mobile: 416-568-9245
email: eug...@badal.io

Re: Re: Hive's performance for querying the Iceberg table is very poor.

2023-11-10 Thread Simhadri G
Please ensure hive.stats.autogather  is enabled as well.

On Fri, Nov 10, 2023, 2:57 PM Denys Kuzmenko  wrote:

> `hive.iceberg.stats.source` controls where the stats should be sourced
> from. When it's set to iceberg (default), we should go directly to iceberg
> and bypass HMS.
>


Re: Re: Hive's performance for querying the Iceberg table is very poor.

2023-11-10 Thread Denys Kuzmenko
`hive.iceberg.stats.source` controls where the stats should be sourced from. 
When it's set to iceberg (default), we should go directly to iceberg and bypass 
HMS. 
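Put together, the statistics knobs mentioned in this thread look like the following hedged sketch (property names as given above; defaults may vary by Hive version, and the table name is a placeholder):

```sql
-- Sketch: answer COUNT(*) on an Iceberg table from statistics/metadata.
SET hive.stats.autogather=true;            -- gather basic stats on writes
SET hive.compute.query.using.stats=true;   -- answer simple aggregates from stats
SET hive.iceberg.stats.source=iceberg;     -- read stats from Iceberg metadata, bypassing HMS

-- With these in place, a plain count should not need a full Tez table scan:
SELECT COUNT(*) FROM test_trade;
```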


Re: Re: Hive's performance for querying the Iceberg table is very poor.

2023-11-09 Thread Butao Zhang
Can you please check this property? We need to ensure it is set to true.
set hive.compute.query.using.stats=true;


In addition, it looks like the table created by Spark has a lot of data. Can you 
create a new table with Spark, insert a few values, and then create and 
count(*) that location_based_table in Hive? Does it also launch a Tez 
task to scan the table?
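The suggested experiment might look like the following (a minimal sketch; the table name, columns, and HDFS path are placeholders, not from the thread):

```sql
-- Spark SQL side: create a tiny Iceberg table and insert a few rows.
CREATE TABLE IF NOT EXISTS test.dwd.tiny_trade (id BIGINT, payment DOUBLE)
USING iceberg;
INSERT INTO test.dwd.tiny_trade VALUES (1, 10.0), (2, 20.5);

-- Hive side: expose it as a location-based table and count it.
SET hive.compute.query.using.stats=true;
CREATE EXTERNAL TABLE tiny_trade
STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
LOCATION 'hdfs://nn/path/to/tiny_trade'
TBLPROPERTIES ('iceberg.catalog'='location_based_table','engine.hive.enabled'='true');

SELECT COUNT(*) FROM tiny_trade;  -- observe whether a Tez scan is still launched
```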



Thanks,

Butao Zhang

 Replied Message 
| From | lisoda |
| Date | 11/9/2023 15:50 |
| To |  |
| Subject | Re:Re: Re: Hive's performance for querying the Iceberg table is 
very poor. |
STEP1:
CREATE TABLE USING SPARK:
CREATE TABLE IF NOT EXISTS test.dwd.test_trade_table(
  `uni_order_id` string,
  `data_from` bigint,
  `partner` string,
  `plat_code` string,
  `order_id` string,
  `uni_shop_id` string,
  `uni_id` string,
  `guide_id` string,
  `shop_id` string,
  `plat_account` string,
  `total_fee` double,
  `item_discount_fee` double,
  `trade_discount_fee` double,
  `adjust_fee` double,
  `post_fee` double,
  `discount_rate` double,
  `payment_no_postfee` double,
  `payment` double,
  `pay_time` string,
  `product_num` bigint,
  `order_status` string,
  `is_refund` string,
  `refund_fee` double,
  `insert_time` string,
  `created` string,
  `endtime` string,
  `modified` string,
  `trade_type` string,
  `receiver_name` string,
  `receiver_country` string,
  `receiver_state` string,
  `receiver_city` string,
  `receiver_district` string,
  `receiver_town` string,
  `receiver_address` string,
  `receiver_mobile` string,
  `trade_source` string,
  `delivery_type` string,
  `consign_time` string,
  `orders_num` bigint,
  `is_presale` bigint,
  `presale_status` string,
  `first_fee_paytime` string,
  `last_fee_paytime` string,
  `first_paid_fee` double,
  `tenant` string,
  `tidb_modified` string,
  `step_paid_fee` double,
  `seller_flag` string,
  `is_used_store_card` BIGINT,
  `store_card_used` DOUBLE,
  `store_card_basic_used` DOUBLE,
  `store_card_expand_used` DO

Re:Re: Re: Hive's performance for querying the Iceberg table is very poor.

2023-11-08 Thread lisoda
','write.orc.bloom.filter.columns'='order_id','write.orc.compression-codec'='zstd','write.metadata.previous-versions-max'='3','write.metadata.delete-after-commit.enabled'='true')
STORED AS iceberg;


STEP2:
HIVE CREATE EXTERNAL TABLE(location_based_table):
CREATE EXTERNAL TABLE hyt.test_trade
STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' 
LOCATION 'hdfs://xxx'
TBLPROPERTIES 
('iceberg.catalog'='location_based_table','engine.hive.enabled'='true');



STEP3:
select count(*) => scan all table







On 2023-11-09 15:36:50, "Butao Zhang" wrote:

Could you please provide detailed steps to reproduce this issue?  e.g. how do 
you create the table?



Thanks,

Butao Zhang

 Replied Message 
| From | lisoda |
| Date | 11/9/2023 14:25 |
| To |  |
| Subject | Re:Re: Re: Hive's performance for querying the Iceberg table is 
very poor. |
Incidentally, I'm using a COW table, so there is no DELETE_FILE.











On 2023-11-09 10:57:35, "Butao Zhang" wrote:

Hi lisoda. You can check this ticket 
https://issues.apache.org/jira/browse/HIVE-27347 which can use iceberg basic 
stats to optimize count(*) query. Note: it didn't take effect if having delete 
files.



Thanks,

Butao Zhang

 Replied Message 
| From | lisoda |
| Date | 11/9/2023 10:43 |
| To |  |
| Subject | Re:Re: Re: Hive's performance for querying the Iceberg table is 
very poor. |
Hi.
I am testing with the HIVE-4.0.0-BETA-1 version, and I am using a location_based_table.
So far I have found that Hive still can't push some queries, e.g. COUNT(*),
down to metadata.
Is HIVE 4.0.0-BETA-1 still not able to support query push-down?











On 2023-10-24 17:41:20, "Ayush Saxena" wrote:

HIVE-27734 is in progress; as I see, we have a POC attached to the ticket, so we 
should have it in 2-3 weeks, I believe. 



> Also, after the release of 4.0.0, will we be able to do all TPCDS queries on 
> ICEBERG except for normal HIVE tables?


Yep, I believe most of the TPCDS queries would be supported even today on Hive 
master, but 4.0.0 would have them running for sure.


-Ayush


On Tue, 24 Oct 2023 at 14:51, lisoda  wrote:

Thanks.
I would like to know whether Hive currently supports pushing JOIN conditions 
down to Iceberg table partitions.
I see HIVE-27734 is not yet complete; what is its progress so far?
Also, after the release of 4.0.0, will we be able to do all TPCDS queries on 
ICEBERG except for normal HIVE tables?











On 2023-10-24 11:03:07, "Ayush Saxena" wrote:

Hi Lisoda,


The Iceberg jar for Hive 3.1.3 doesn't have a lot of changes; we did a bunch of 
improvements on the 4.x line for Hive-Iceberg. You can give Iceberg a try on 
the 4.0.0-beta-1 release mentioned here [1]; we have a bunch of improvements 
like vectorization and stuff like that. If you wanna give it a quick try on 
Docker, we have a Docker image published for that here [2], and Iceberg works 
out of the box there.


Rest feel free to create tickets, if you find some specific queries or 
scenarios which are problematic, we will be happy to chase them & get them 
sorted.


P.S. Not sure about StarRocks, FWIW. That is something we don't develop as part 
of Apache Hive nor as part of the Apache Software Foundation, to the best of my 
knowledge, so I would refrain from commenting about that on the "Apache Hive" ML.


-Ayush




[1] https://hive.apache.org/general/downloads/
[2] https://hub.docker.com/r/apache/hive/tags


On Tue, 24 Oct 2023 at 05:28, Albert Wong  wrote:

Too bad.   Tencent Games used StarRocks with Apache Iceberg to power their 
analytics.   
https://medium.com/starrocks-engineering/tencent-games-inside-scoop-the-road-to-cloud-native-with-starrocks-d7dcb2438e25.
   


On Mon, Oct 23, 2023 at 10:55 AM lisoda  wrote:

We are not going to use StarRocks.
MPP-architecture databases have natural limitations, and StarRocks does not 
necessarily perform better than Hive LLAP.



 Replied Message 
| From | Albert Wong |
| Date | 10/24/2023 01:39 |
| To | user@hive.apache.org |
| Cc | |
| Subject | Re: Hive's performance for querying the Iceberg table is very poor. 
|
I would try http://starrocks.io.   StarRocks is an MPP OLAP database that can 
query Apache Iceberg and we can cache the data for faster performance.  We also 
have additional features like building materialized views that span across 
Apache Iceberg, Apache Hudi and Apache Hive.   Here is a video of connecting 
the 2 products through a webinar StarRocks did with Tabular (authors of Apache 
Iceberg).  https://www.youtube.com/watch?v=bAmcTrX7hCI=10s


On Mon, Oct 23, 2023 at 7:18 AM lisoda  wrote:

Hi Team.
  I was recently testing Hive queries against Iceberg tables, and I found that 
the performance is very poor, almost impossible to use in a production 
environment. Also, join conditions cannot be pushed down to the 
Iceberg partition.
  I'm using the 1.3.1 Hive runtime jar from the Iceberg community.
  Currently I'm using Hive 3.1.3 and Iceberg 1.3.1.
   

Re: Re: Hive's performance for querying the Iceberg table is very poor.

2023-11-08 Thread Butao Zhang
Could you please provide detailed steps to reproduce this issue?  e.g. how do 
you create the table?



Thanks,

Butao Zhang

 Replied Message 
| From | lisoda |
| Date | 11/9/2023 14:25 |
| To |  |
| Subject | Re:Re: Re: Hive's performance for querying the Iceberg table is 
very poor. |
Incidentally, I'm using a COW table, so there is no DELETE_FILE.











On 2023-11-09 10:57:35, "Butao Zhang" wrote:

Hi lisoda. You can check this ticket 
https://issues.apache.org/jira/browse/HIVE-27347 which can use iceberg basic 
stats to optimize count(*) query. Note: it didn't take effect if having delete 
files.



Thanks,

Butao Zhang

 Replied Message 
| From | lisoda |
| Date | 11/9/2023 10:43 |
| To |  |
| Subject | Re:Re: Re: Hive's performance for querying the Iceberg table is 
very poor. |
Hi.
I am testing with the HIVE-4.0.0-BETA-1 version, and I am using a location_based_table.
So far I have found that Hive still can't push some queries, e.g. COUNT(*),
down to metadata.
Is HIVE 4.0.0-BETA-1 still not able to support query push-down?











On 2023-10-24 17:41:20, "Ayush Saxena" wrote:

HIVE-27734 is in progress; as I see, we have a POC attached to the ticket, so we 
should have it in 2-3 weeks, I believe. 



> Also, after the release of 4.0.0, will we be able to do all TPCDS queries on 
> ICEBERG except for normal HIVE tables?


Yep, I believe most of the TPCDS queries would be supported even today on Hive 
master, but 4.0.0 would have them running for sure.


-Ayush


On Tue, 24 Oct 2023 at 14:51, lisoda  wrote:

Thanks.
I would like to know whether Hive currently supports pushing JOIN conditions 
down to Iceberg table partitions.
I see HIVE-27734 is not yet complete; what is its progress so far?
Also, after the release of 4.0.0, will we be able to do all TPCDS queries on 
ICEBERG except for normal HIVE tables?











On 2023-10-24 11:03:07, "Ayush Saxena" wrote:

Hi Lisoda,


The Iceberg jar for Hive 3.1.3 doesn't have a lot of changes; we did a bunch of 
improvements on the 4.x line for Hive-Iceberg. You can give Iceberg a try on 
the 4.0.0-beta-1 release mentioned here [1]; we have a bunch of improvements 
like vectorization and stuff like that. If you wanna give it a quick try on 
Docker, we have a Docker image published for that here [2], and Iceberg works 
out of the box there.


Rest feel free to create tickets, if you find some specific queries or 
scenarios which are problematic, we will be happy to chase them & get them 
sorted.


P.S. Not sure about StarRocks, FWIW. That is something we don't develop as part 
of Apache Hive nor as part of the Apache Software Foundation, to the best of my 
knowledge, so I would refrain from commenting about that on the "Apache Hive" ML.


-Ayush




[1] https://hive.apache.org/general/downloads/
[2] https://hub.docker.com/r/apache/hive/tags


On Tue, 24 Oct 2023 at 05:28, Albert Wong  wrote:

Too bad.   Tencent Games used StarRocks with Apache Iceberg to power their 
analytics.   
https://medium.com/starrocks-engineering/tencent-games-inside-scoop-the-road-to-cloud-native-with-starrocks-d7dcb2438e25.
   


On Mon, Oct 23, 2023 at 10:55 AM lisoda  wrote:

We are not going to use StarRocks.
MPP-architecture databases have natural limitations, and StarRocks does not 
necessarily perform better than Hive LLAP.



 Replied Message 
| From | Albert Wong |
| Date | 10/24/2023 01:39 |
| To | user@hive.apache.org |
| Cc | |
| Subject | Re: Hive's performance for querying the Iceberg table is very poor. 
|
I would try http://starrocks.io.   StarRocks is an MPP OLAP database that can 
query Apache Iceberg and we can cache the data for faster performance.  We also 
have additional features like building materialized views that span across 
Apache Iceberg, Apache Hudi and Apache Hive.   Here is a video of connecting 
the 2 products through a webinar StarRocks did with Tabular (authors of Apache 
Iceberg).  https://www.youtube.com/watch?v=bAmcTrX7hCI=10s


On Mon, Oct 23, 2023 at 7:18 AM lisoda  wrote:

Hi Team.
  I was recently testing Hive queries against Iceberg tables, and I found that 
the performance is very poor, almost impossible to use in a production 
environment. Also, join conditions cannot be pushed down to the 
Iceberg partition.
  I'm using the 1.3.1 Hive runtime jar from the Iceberg community.
  Currently I'm using Hive 3.1.3 and Iceberg 1.3.1.
  Now I'm very frustrated because the performance is so bad that I can't 
deliver to my customers. How can I solve this problem?
 Details:  
https://apache-iceberg.slack.com/archives/C025PH0G1D4/p1695050248606629
I would be grateful if someone could guide me.

Re:Re: Re: Hive's performance for querying the Iceberg table is very poor.

2023-11-08 Thread lisoda
Incidentally, I'm using a COW table, so there is no DELETE_FILE.











On 2023-11-09 10:57:35, "Butao Zhang" wrote:

Hi lisoda. You can check this ticket 
https://issues.apache.org/jira/browse/HIVE-27347 which can use iceberg basic 
stats to optimize count(*) query. Note: it didn't take effect if having delete 
files.



Thanks,

Butao Zhang

 Replied Message 
| From | lisoda |
| Date | 11/9/2023 10:43 |
| To |  |
| Subject | Re:Re: Re: Hive's performance for querying the Iceberg table is 
very poor. |
Hi.
I am testing with the HIVE-4.0.0-BETA-1 version, and I am using a location_based_table.
So far I have found that Hive still can't push some queries, e.g. COUNT(*),
down to metadata.
Is HIVE 4.0.0-BETA-1 still not able to support query push-down?











On 2023-10-24 17:41:20, "Ayush Saxena" wrote:

HIVE-27734 is in progress; as I see, we have a POC attached to the ticket, so we 
should have it in 2-3 weeks, I believe. 



> Also, after the release of 4.0.0, will we be able to do all TPCDS queries on 
> ICEBERG except for normal HIVE tables?


Yep, I believe most of the TPCDS queries would be supported even today on Hive 
master, but 4.0.0 would have them running for sure.


-Ayush


On Tue, 24 Oct 2023 at 14:51, lisoda  wrote:

Thanks.
I would like to know whether Hive currently supports pushing JOIN conditions 
down to Iceberg table partitions.
I see HIVE-27734 is not yet complete; what is its progress so far?
Also, after the release of 4.0.0, will we be able to do all TPCDS queries on 
ICEBERG except for normal HIVE tables?











On 2023-10-24 11:03:07, "Ayush Saxena" wrote:

Hi Lisoda,


The Iceberg jar for Hive 3.1.3 doesn't have a lot of changes; we did a bunch of 
improvements on the 4.x line for Hive-Iceberg. You can give Iceberg a try on 
the 4.0.0-beta-1 release mentioned here [1]; we have a bunch of improvements 
like vectorization and stuff like that. If you wanna give it a quick try on 
Docker, we have a Docker image published for that here [2], and Iceberg works 
out of the box there.


Rest feel free to create tickets, if you find some specific queries or 
scenarios which are problematic, we will be happy to chase them & get them 
sorted.


P.S. Not sure about StarRocks, FWIW. That is something we don't develop as part 
of Apache Hive nor as part of the Apache Software Foundation, to the best of my 
knowledge, so I would refrain from commenting about that on the "Apache Hive" ML.


-Ayush




[1] https://hive.apache.org/general/downloads/
[2] https://hub.docker.com/r/apache/hive/tags


On Tue, 24 Oct 2023 at 05:28, Albert Wong  wrote:

Too bad.   Tencent Games used StarRocks with Apache Iceberg to power their 
analytics.   
https://medium.com/starrocks-engineering/tencent-games-inside-scoop-the-road-to-cloud-native-with-starrocks-d7dcb2438e25.
   


On Mon, Oct 23, 2023 at 10:55 AM lisoda  wrote:

We are not going to use StarRocks.
MPP-architecture databases have natural limitations, and StarRocks does not 
necessarily perform better than Hive LLAP.



 Replied Message 
| From | Albert Wong |
| Date | 10/24/2023 01:39 |
| To | user@hive.apache.org |
| Cc | |
| Subject | Re: Hive's performance for querying the Iceberg table is very poor. 
|
I would try http://starrocks.io.   StarRocks is an MPP OLAP database that can 
query Apache Iceberg and we can cache the data for faster performance.  We also 
have additional features like building materialized views that span across 
Apache Iceberg, Apache Hudi and Apache Hive.   Here is a video of connecting 
the 2 products through a webinar StarRocks did with Tabular (authors of Apache 
Iceberg).  https://www.youtube.com/watch?v=bAmcTrX7hCI=10s


On Mon, Oct 23, 2023 at 7:18 AM lisoda  wrote:

Hi Team.
  I was recently testing Hive queries against Iceberg tables, and I found that 
the performance is very poor, almost impossible to use in a production 
environment. Also, join conditions cannot be pushed down to the 
Iceberg partition.
  I'm using the 1.3.1 Hive runtime jar from the Iceberg community.
  Currently I'm using Hive 3.1.3 and Iceberg 1.3.1.
  Now I'm very frustrated because the performance is so bad that I can't 
deliver to my customers. How can I solve this problem?
 Details:  
https://apache-iceberg.slack.com/archives/C025PH0G1D4/p1695050248606629
I would be grateful if someone could guide me.

Re: Re: Hive's performance for querying the Iceberg table is very poor.

2023-11-08 Thread Butao Zhang
Hi lisoda. You can check this ticket 
https://issues.apache.org/jira/browse/HIVE-27347 which can use iceberg basic 
stats to optimize count(*) query. Note: it didn't take effect if having delete 
files.



Thanks,

Butao Zhang

 Replied Message 
| From | lisoda |
| Date | 11/9/2023 10:43 |
| To |  |
| Subject | Re:Re: Re: Hive's performance for querying the Iceberg table is 
very poor. |
Hi.
I am testing with the HIVE-4.0.0-BETA-1 version, and I am using a location_based_table.
So far I have found that Hive still can't push some queries, e.g. COUNT(*),
down to metadata.
Is HIVE 4.0.0-BETA-1 still not able to support query push-down?











On 2023-10-24 17:41:20, "Ayush Saxena" wrote:

HIVE-27734 is in progress; as I see, we have a POC attached to the ticket, so we 
should have it in 2-3 weeks, I believe. 



> Also, after the release of 4.0.0, will we be able to do all TPCDS queries on 
> ICEBERG except for normal HIVE tables?


Yep, I believe most of the TPCDS queries would be supported even today on Hive 
master, but 4.0.0 would have them running for sure.


-Ayush


On Tue, 24 Oct 2023 at 14:51, lisoda  wrote:

Thanks.
I would like to know whether Hive currently supports pushing JOIN conditions 
down to Iceberg table partitions.
I see HIVE-27734 is not yet complete; what is its progress so far?
Also, after the release of 4.0.0, will we be able to do all TPCDS queries on 
ICEBERG except for normal HIVE tables?











On 2023-10-24 11:03:07, "Ayush Saxena" wrote:

Hi Lisoda,


The Iceberg jar for Hive 3.1.3 doesn't have a lot of changes; we did a bunch of 
improvements on the 4.x line for Hive-Iceberg. You can give Iceberg a try on 
the 4.0.0-beta-1 release mentioned here [1]; we have a bunch of improvements 
like vectorization and stuff like that. If you wanna give it a quick try on 
Docker, we have a Docker image published for that here [2], and Iceberg works 
out of the box there.


Rest feel free to create tickets, if you find some specific queries or 
scenarios which are problematic, we will be happy to chase them & get them 
sorted.


P.S. Not sure about StarRocks, FWIW. That is something we don't develop as part 
of Apache Hive nor as part of the Apache Software Foundation, to the best of my 
knowledge, so I would refrain from commenting about that on the "Apache Hive" ML.


-Ayush




[1] https://hive.apache.org/general/downloads/
[2] https://hub.docker.com/r/apache/hive/tags


On Tue, 24 Oct 2023 at 05:28, Albert Wong  wrote:

Too bad.   Tencent Games used StarRocks with Apache Iceberg to power their 
analytics.   
https://medium.com/starrocks-engineering/tencent-games-inside-scoop-the-road-to-cloud-native-with-starrocks-d7dcb2438e25.
   


On Mon, Oct 23, 2023 at 10:55 AM lisoda  wrote:

We are not going to use StarRocks.
MPP-architecture databases have natural limitations, and StarRocks does not 
necessarily perform better than Hive LLAP.



 Replied Message 
| From | Albert Wong |
| Date | 10/24/2023 01:39 |
| To | user@hive.apache.org |
| Cc | |
| Subject | Re: Hive's performance for querying the Iceberg table is very poor. 
|
I would try http://starrocks.io.   StarRocks is an MPP OLAP database that can 
query Apache Iceberg and we can cache the data for faster performance.  We also 
have additional features like building materialized views that span across 
Apache Iceberg, Apache Hudi and Apache Hive.   Here is a video of connecting 
the 2 products through a webinar StarRocks did with Tabular (authors of Apache 
Iceberg).  https://www.youtube.com/watch?v=bAmcTrX7hCI=10s


On Mon, Oct 23, 2023 at 7:18 AM lisoda  wrote:

Hi Team.
  I was recently testing Hive queries against Iceberg tables, and I found that 
the performance is very poor, almost impossible to use in a production 
environment. Also, join conditions cannot be pushed down to the 
Iceberg partition.
  I'm using the 1.3.1 Hive runtime jar from the Iceberg community.
  Currently I'm using Hive 3.1.3 and Iceberg 1.3.1.
  Now I'm very frustrated because the performance is so bad that I can't 
deliver to my customers. How can I solve this problem?
 Details:  
https://apache-iceberg.slack.com/archives/C025PH0G1D4/p1695050248606629
I would be grateful if someone could guide me.

Re:Re: Re: Hive's performance for querying the Iceberg table is very poor.

2023-11-08 Thread lisoda
Hi.
I am testing with the HIVE-4.0.0-BETA-1 version, and I am using a location_based_table.
So far I have found that Hive still can't push some queries, e.g. COUNT(*),
down to metadata.
Is HIVE 4.0.0-BETA-1 still not able to support query push-down?











On 2023-10-24 17:41:20, "Ayush Saxena" wrote:

HIVE-27734 is in progress; as I see, we have a POC attached to the ticket, so we 
should have it in 2-3 weeks, I believe. 



> Also, after the release of 4.0.0, will we be able to do all TPCDS queries on 
> ICEBERG except for normal HIVE tables?


Yep, I believe most of the TPCDS queries would be supported even today on Hive 
master, but 4.0.0 would have them running for sure.


-Ayush


On Tue, 24 Oct 2023 at 14:51, lisoda  wrote:

Thanks.
I would like to know whether Hive currently supports pushing JOIN conditions 
down to Iceberg table partitions.
I see HIVE-27734 is not yet complete; what is its progress so far?
Also, after the release of 4.0.0, will we be able to do all TPCDS queries on 
ICEBERG except for normal HIVE tables?











On 2023-10-24 11:03:07, "Ayush Saxena" wrote:

Hi Lisoda,


The Iceberg jar for Hive 3.1.3 doesn't have a lot of changes; we did a bunch of 
improvements on the 4.x line for Hive-Iceberg. You can give Iceberg a try on 
the 4.0.0-beta-1 release mentioned here [1]; we have a bunch of improvements 
like vectorization and stuff like that. If you wanna give it a quick try on 
Docker, we have a Docker image published for that here [2], and Iceberg works 
out of the box there.


Rest feel free to create tickets, if you find some specific queries or 
scenarios which are problematic, we will be happy to chase them & get them 
sorted.


P.S. Not sure about StarRocks, FWIW. That is something we don't develop as part 
of Apache Hive nor as part of the Apache Software Foundation, to the best of my 
knowledge, so I would refrain from commenting about that on the "Apache Hive" ML.


-Ayush




[1] https://hive.apache.org/general/downloads/
[2] https://hub.docker.com/r/apache/hive/tags


On Tue, 24 Oct 2023 at 05:28, Albert Wong  wrote:

Too bad.   Tencent Games used StarRocks with Apache Iceberg to power their 
analytics.   
https://medium.com/starrocks-engineering/tencent-games-inside-scoop-the-road-to-cloud-native-with-starrocks-d7dcb2438e25.
   


On Mon, Oct 23, 2023 at 10:55 AM lisoda  wrote:

We are not going to use StarRocks.
MPP-architecture databases have natural limitations, and StarRocks does not 
necessarily perform better than Hive LLAP.



 Replied Message 
| From | Albert Wong |
| Date | 10/24/2023 01:39 |
| To | user@hive.apache.org |
| Cc | |
| Subject | Re: Hive's performance for querying the Iceberg table is very poor. 
|
I would try http://starrocks.io.   StarRocks is an MPP OLAP database that can 
query Apache Iceberg and we can cache the data for faster performance.  We also 
have additional features like building materialized views that span across 
Apache Iceberg, Apache Hudi and Apache Hive.   Here is a video of connecting 
the 2 products through a webinar StarRocks did with Tabular (authors of Apache 
Iceberg).  https://www.youtube.com/watch?v=bAmcTrX7hCI=10s


On Mon, Oct 23, 2023 at 7:18 AM lisoda  wrote:

Hi Team.
  I was recently testing Hive queries against Iceberg tables, and I found that 
the performance is very poor, almost impossible to use in a production 
environment. Also, join conditions cannot be pushed down to the 
Iceberg partition.
  I'm using the 1.3.1 Hive runtime jar from the Iceberg community.
  Currently I'm using Hive 3.1.3 and Iceberg 1.3.1.
  Now I'm very frustrated because the performance is so bad that I can't 
deliver to my customers. How can I solve this problem?
 Details:  
https://apache-iceberg.slack.com/archives/C025PH0G1D4/p1695050248606629
I would be grateful if someone could guide me.

Re: Announce: Hive-MR3 with Celeborn,

2023-11-02 Thread Sungwoo Park
Celeborn and Uniffle can also be seen as a move to separate local storage
from compute nodes.

1. In the old days, Hadoop was based on the idea of collocating compute and
storage.
2. Later a new paradigm of separating compute and storage emerged and got
popularized.
3. Now people want to not just separate compute and storage, but also
separate local storage from compute nodes.

In the future, all of shuffle/spill files might be stored in a dedicated
system like Celeborn and Uniffle. In our case of developing Hive-MR3, we
completely removed spill files for unordered edges thanks to the efficient
buffering in Celeborn.

Thanks,

--- Sungwoo

On Thu, Nov 2, 2023 at 7:31 PM Keyong Zhou  wrote:

> I think both Celeborn and Uniffle are good alternatives as a general
> shuffle service.
> I recommend that you try them : ). For any question about Celeborn, we're
> very glad
> to discuss in Celeborn's mail lists[1][2] or slack[3].
>
> [1] u...@celeborn.apache.org
> [2] d...@celeborn.apache.org
> [3]
> https://join.slack.com/t/apachecelebor-kw08030/shared_invite/zt-1ju3hd5j8-4Z5keMdzpcVMspe4UJzF4Q
>
> Thanks,
> Keyong Zhou
>
> On 2023/10/31 14:24:38 "Battula, Brahma Reddy" wrote:
> > Thanks for bringing this up. Good to see that it supports Spark and
> > Flink.
> >
> > Have you done a comparison between Uniffle and Celeborn?
> >
> >
> > On 30/10/23, 8:01 AM, "Keyong Zhou"  zho...@apache.org>> wrote:
> >
> >
> > Great to hear this! It's encouraging that Celeborn helps MR3.
> >
> >
> > Celeborn is a general-purpose remote shuffle service that stores and serves
> > shuffle data (and other intermediate data in the future) to help compute
> > engines better use a disaggregated architecture, as well as become more
> > efficient and stable for jobs with huge shuffle sizes.
> >
> >
> > Currently Celeborn supports Hive on MR, and I think integrating with MR3
> > provides a good example to support Hive on Tez.
> >
> >
> > Thanks,
> > Keyong Zhou
> >
> >
> > On 2023/10/24 12:08:54 Sungwoo Park wrote:
> > > Hi Hive users,
> > >
> > > Before the impending release of MR3 1.8, we would like to announce the
> > > release of Hive-MR3 with Celeborn (Hive 3.1.3 on MR3 1.8 with Celeborn
> > > 0.3.1).
> > >
> > > Apache Celeborn [1] is a remote shuffle service, similar to Magnet [2]
> and
> > > Apache Uniffle [3] (which was discussed in this Hive mailing list a
> while
> > > ago). Celeborn officially supports Spark and Flink, and we have
> implemented
> > > an MR3-extension for Celeborn.
> > >
> > > In addition to all the benefits of using remote shuffle service,
> > > Hive-MR3-Celeborn supports direct processing of mapper output on the
> > > reducer side, which means that reducers do not store mapper output on
> local
> > > disks (for unordered edges). In this way, Hive-MR3-Celeborn can
> eliminate
> > > over 95% of local disk writes when tested on the 10TB TPC-DS benchmark.
> > > This can be particularly useful when running Hive-MR3 on public clouds
> > > where fast local disk storage is expensive or not available.
> > >
> > > We have documented the usage of Hive-MR3-Celeborn in [4]. You can
> download
> > > Hive-MR3-Celeborn in [5].
> > >
> > > FYI, MR3 is an execution engine providing native support for Hadoop,
> > > Kubernetes, and standalone mode [6]. Hive-MR3, its main application,
> > > provides the performance of LLAP yet is very easy to install and
> operate.
> > > If you are using Hive-Tez for running ETL jobs, switching to Hive-MR3
> will
> > > give you a much higher throughput thanks to its advanced resource
> sharing
> > > model.
> > >
> > > We have recently opened a Slack channel. If interested, please join the
> > > Slack channel and ask any question on MR3:
> > >
> > >
> https://join.slack.com/t/mr3-help/shared_invite/zt-1wpqztk35-AN8JRDznTkvxFIjtvhmiNg
> >
> > >
> > > Thank you,
> > >
> > > --- Sungwoo
> > >
> > > [1] https://celeborn.apache.org/
> > > [2] https://www.vldb.org/pvldb/vol13/p3382-shen.pdf
> > > [3] https://uniffle.apache.org/
> > > [4] https://mr3docs.datamonad.com/docs/mr3/features/celeborn/
> > > [5] https://github.com/mr3project/mr3-release/releases/tag/v1.8
> > > [6] https://mr3docs.datamonad.com/
> > >
> >
> >
> >
> >
>


Re: Announce: Hive-MR3 with Celeborn,

2023-11-02 Thread Keyong Zhou
I think both Celeborn and Uniffle are good alternatives as a general shuffle
service. I recommend that you try them :). For any question about Celeborn,
we're happy to discuss on Celeborn's mailing lists [1][2] or Slack [3].

[1] u...@celeborn.apache.org
[2] d...@celeborn.apache.org
[3] 
https://join.slack.com/t/apachecelebor-kw08030/shared_invite/zt-1ju3hd5j8-4Z5keMdzpcVMspe4UJzF4Q

Thanks,
Keyong Zhou

On 2023/10/31 14:24:38 "Battula, Brahma Reddy" wrote:
> Thanks for bringing up this. Good to see that it supports spark and flink.
> 
> Have you done comparison between uniffle and celeborn..?
> 
> 
> On 30/10/23, 8:01 AM, "Keyong Zhou" wrote:
> 
> 
> Great to hear this! It's encouraging that Celeborn helps MR3.
> 
> 
> Celeborn is a general purpose remote shuffle service that stores and serves
> shuffle data (and other intermediate data in the future) to help compute 
> engines
> better use disaggregated architecture, as well as become more efficient and
> stable for huge shuffle sized jobs.
> 
> 
> Currently Celeborn supports Hive on MR, and I think integrating with MR3
> provides a good example to support Hive on Tez.
> 
> 
> Thanks,
> Keyong Zhou
> 
> 
> On 2023/10/24 12:08:54 Sungwoo Park wrote:
> > Hi Hive users,
> >
> > Before the impending release of MR3 1.8, we would like to announce the
> > release of Hive-MR3 with Celeborn (Hive 3.1.3 on MR3 1.8 with Celeborn
> > 0.3.1).
> >
> > Apache Celeborn [1] is a remote shuffle service, similar to Magnet [2] and
> > Apache Uniffle [3] (which was discussed in this Hive mailing list a while
> > ago). Celeborn officially supports Spark and Flink, and we have implemented
> > an MR3-extension for Celeborn.
> >
> > In addition to all the benefits of using remote shuffle service,
> > Hive-MR3-Celeborn supports direct processing of mapper output on the
> > reducer side, which means that reducers do not store mapper output on local
> > disks (for unordered edges). In this way, Hive-MR3-Celeborn can eliminate
> > over 95% of local disk writes when tested on the 10TB TPC-DS benchmark.
> > This can be particularly useful when running Hive-MR3 on public clouds
> > where fast local disk storage is expensive or not available.
> >
> > We have documented the usage of Hive-MR3-Celeborn in [4]. You can download
> > Hive-MR3-Celeborn in [5].
> >
> > FYI, MR3 is an execution engine providing native support for Hadoop,
> > Kubernetes, and standalone mode [6]. Hive-MR3, its main application,
> > provides the performance of LLAP yet is very easy to install and operate.
> > If you are using Hive-Tez for running ETL jobs, switching to Hive-MR3 will
> > give you a much higher throughput thanks to its advanced resource sharing
> > model.
> >
> > We have recently opened a Slack channel. If interested, please join the
> > Slack channel and ask any question on MR3:
> >
> > https://join.slack.com/t/mr3-help/shared_invite/zt-1wpqztk35-AN8JRDznTkvxFIjtvhmiNg
> >
> > Thank you,
> >
> > --- Sungwoo
> >
> > [1] https://celeborn.apache.org/
> > [2] https://www.vldb.org/pvldb/vol13/p3382-shen.pdf
> > [3] https://uniffle.apache.org/
> > [4] https://mr3docs.datamonad.com/docs/mr3/features/celeborn/
> > [5] https://github.com/mr3project/mr3-release/releases/tag/v1.8
> > [6] https://mr3docs.datamonad.com/
> >
> 
> 
> 
> 


Re: Announce: Hive-MR3 with Celeborn,

2023-11-01 Thread Sungwoo Park
On Thu, Nov 2, 2023 at 1:43 PM Sungwoo Park  wrote:

> Have you done comparison between uniffle and celeborn..?
>>
>
> We did not compare the performance of Uniffle and Celeborn (because
> Hive-MR3-Celeborn has been released but Hive-MR3-Uniffle is not complete
> yet). Much of the code in Hive-MR3-Celeborn is currently reused in
> Hive-MR3-Uniffle, so we think there are many architectural similarities
> between the two systems.
>
> We implemented our Celeborn extension first because a user of Hive-MR3
> wanted to use Celeborn which was already running in production. If any
> industrial user of Hive-MR3 wants to use Uniffle in production, please let
> us know.
>
> BTW, if you are using Hive-on-MapReduce or Hive-on-Tez, consider switching
> to Hive-on-Tez. You will see a huge increase (x3 to x10) in throughput.
>

Ooops, I meant switching to Hive-on-MR3 :-)


> Regards,
>
> --- Sungwoo
>
>


Re: Announce: Hive-MR3 with Celeborn,

2023-11-01 Thread Sungwoo Park
>
> Have you done comparison between uniffle and celeborn..?
>

We did not compare the performance of Uniffle and Celeborn (because
Hive-MR3-Celeborn has been released but Hive-MR3-Uniffle is not complete
yet). Much of the code in Hive-MR3-Celeborn is currently reused in
Hive-MR3-Uniffle, so we think there are many architectural similarities
between the two systems.

We implemented our Celeborn extension first because a user of Hive-MR3
wanted to use Celeborn which was already running in production. If any
industrial user of Hive-MR3 wants to use Uniffle in production, please let
us know.

BTW, if you are using Hive-on-MapReduce or Hive-on-Tez, consider switching
to Hive-on-Tez. You will see a huge increase (x3 to x10) in throughput.

Regards,

--- Sungwoo


Re: Announce: Hive-MR3 with Celeborn,

2023-10-31 Thread Battula, Brahma Reddy
Thanks for bringing up this. Good to see that it supports spark and flink.

Have you done comparison between uniffle and celeborn..?


On 30/10/23, 8:01 AM, "Keyong Zhou" <zho...@apache.org> wrote:


Great to hear this! It's encouraging that Celeborn helps MR3.


Celeborn is a general purpose remote shuffle service that stores and serves
shuffle data (and other intermediate data in the future) to help compute engines
better use disaggregated architecture, as well as become more efficient and
stable for huge shuffle sized jobs.


Currently Celeborn supports Hive on MR, and I think integrating with MR3
provides a good example to support Hive on Tez.


Thanks,
Keyong Zhou


On 2023/10/24 12:08:54 Sungwoo Park wrote:
> Hi Hive users,
>
> Before the impending release of MR3 1.8, we would like to announce the
> release of Hive-MR3 with Celeborn (Hive 3.1.3 on MR3 1.8 with Celeborn
> 0.3.1).
>
> Apache Celeborn [1] is a remote shuffle service, similar to Magnet [2] and
> Apache Uniffle [3] (which was discussed in this Hive mailing list a while
> ago). Celeborn officially supports Spark and Flink, and we have implemented
> an MR3-extension for Celeborn.
>
> In addition to all the benefits of using remote shuffle service,
> Hive-MR3-Celeborn supports direct processing of mapper output on the
> reducer side, which means that reducers do not store mapper output on local
> disks (for unordered edges). In this way, Hive-MR3-Celeborn can eliminate
> over 95% of local disk writes when tested on the 10TB TPC-DS benchmark.
> This can be particularly useful when running Hive-MR3 on public clouds
> where fast local disk storage is expensive or not available.
>
> We have documented the usage of Hive-MR3-Celeborn in [4]. You can download
> Hive-MR3-Celeborn in [5].
>
> FYI, MR3 is an execution engine providing native support for Hadoop,
> Kubernetes, and standalone mode [6]. Hive-MR3, its main application,
> provides the performance of LLAP yet is very easy to install and operate.
> If you are using Hive-Tez for running ETL jobs, switching to Hive-MR3 will
> give you a much higher throughput thanks to its advanced resource sharing
> model.
>
> We have recently opened a Slack channel. If interested, please join the
> Slack channel and ask any question on MR3:
>
> https://join.slack.com/t/mr3-help/shared_invite/zt-1wpqztk35-AN8JRDznTkvxFIjtvhmiNg
>
> Thank you,
>
> --- Sungwoo
>
> [1] https://celeborn.apache.org/
> [2] https://www.vldb.org/pvldb/vol13/p3382-shen.pdf
> [3] https://uniffle.apache.org/
> [4] https://mr3docs.datamonad.com/docs/mr3/features/celeborn/
> [5] https://github.com/mr3project/mr3-release/releases/tag/v1.8
> [6] https://mr3docs.datamonad.com/
>





Re: Announce: Hive-MR3 with Celeborn,

2023-10-29 Thread Keyong Zhou
Great to hear this! It's encouraging that Celeborn helps MR3.

Celeborn is a general-purpose remote shuffle service that stores and serves
shuffle data (and, in the future, other intermediate data) to help compute
engines make better use of disaggregated architectures, as well as become
more efficient and stable for jobs with huge shuffle sizes.

Currently Celeborn supports Hive on MR, and I think integrating with MR3 
provides a good example to support Hive on Tez.

Thanks,
Keyong Zhou

On 2023/10/24 12:08:54 Sungwoo Park wrote:
> Hi Hive users,
> 
> Before the impending release of MR3 1.8, we would like to announce the
> release of Hive-MR3 with Celeborn (Hive 3.1.3 on MR3 1.8 with Celeborn
> 0.3.1).
> 
> Apache Celeborn [1] is a remote shuffle service, similar to Magnet [2] and
> Apache Uniffle [3] (which was discussed in this Hive mailing list a while
> ago). Celeborn officially supports Spark and Flink, and we have implemented
> an MR3-extension for Celeborn.
> 
> In addition to all the benefits of using remote shuffle service,
> Hive-MR3-Celeborn supports direct processing of mapper output on the
> reducer side, which means that reducers do not store mapper output on local
> disks (for unordered edges). In this way, Hive-MR3-Celeborn can eliminate
> over 95% of local disk writes when tested on the 10TB TPC-DS benchmark.
> This can be particularly useful when running Hive-MR3 on public clouds
> where fast local disk storage is expensive or not available.
> 
> We have documented the usage of Hive-MR3-Celeborn in [4]. You can download
> Hive-MR3-Celeborn in [5].
> 
> FYI, MR3 is an execution engine providing native support for Hadoop,
> Kubernetes, and standalone mode [6]. Hive-MR3, its main application,
> provides the performance of LLAP yet is very easy to install and operate.
> If you are using Hive-Tez for running ETL jobs, switching to Hive-MR3 will
> give you a much higher throughput thanks to its advanced resource sharing
> model.
> 
> We have recently opened a Slack channel. If interested, please join the
> Slack channel and ask any question on MR3:
> 
> https://join.slack.com/t/mr3-help/shared_invite/zt-1wpqztk35-AN8JRDznTkvxFIjtvhmiNg
> 
> Thank you,
> 
> --- Sungwoo
> 
> [1] https://celeborn.apache.org/
> [2] https://www.vldb.org/pvldb/vol13/p3382-shen.pdf
> [3] https://uniffle.apache.org/
> [4] https://mr3docs.datamonad.com/docs/mr3/features/celeborn/
> [5] https://github.com/mr3project/mr3-release/releases/tag/v1.8
> [6] https://mr3docs.datamonad.com/
> 


Re: Metastore: How is the unique ID of new databases and tables determined?

2023-10-24 Thread Venu Reddy
Hi Eugene,

HMS depends on DataNucleus for the identity value generation for the HMS
tables. It is generated by DataNucleus when an object is made persistent.
DataNucleus value generator will generate values uniquely across different
JVMs. As Zoltan said, DataNucleus tracks with the SEQUENCE_TABLE for each
model class id allocation. We don't have id generation code directly in
metastore code. Recently, to add dynamic partitions using direct SQL against
the DB, a method getDataStoreId(Class modelClass) was added in
https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/DirectSqlInsertPart.java.
It fetches the next available id to use from DataNucleus directly. It is
used for classes using datastore identity type.
Not sure about how you are going to replicate into the target cluster. You
might have to explore DataNucleus value generation further in case this is
what you are looking for.

Regards,
Venu

On Tue, Oct 24, 2023 at 6:00 PM Zoltán Rátkai  wrote:

> Hi Eugene,
>
> the TBL_ID in TBLS table is handled by Datanucleus, so AUTO_INCREMENT
> won't help, since the TBL_ID is not defined as AUTO_INCREMENT.
>
> Datanucleus uses SEQUENCE_TABLE to store the actual value for primary
> keys. In this table, these two rows are what you need to modify:
>
> org.apache.hadoop.hive.metastore.model.MDatabase
> org.apache.hadoop.hive.metastore.model.MTable
>
> e.g:
> update SEQUENCE_TABLE set NEXT_VAL = 1  where
> SEQUENCE_NAME='org.apache.hadoop.hive.metastore.model.MTable';
> and do it for org.apache.hadoop.hive.metastore.model.MDatabase as well.
>
> After that if you create a table the TBL_ID will be used from this value.
> Datanucleus uses caching (default 10) so maybe the next tables will still
> use the old value. Try creating 10 simple tables like this:
>
> create table test1 (i int);
> ...
> create table test10 (i int);
> and then drop them and check the TBL_ID.
>
> *Before doing this I recommend to create a backup from the Metastore DB!!*
>
> Also check this:
>
> https://community.cloudera.com/t5/Support-Questions/How-to-migrate-Hive-Table-From-one-cluster-to-another/m-p/235145
>
> Regards,
>
> Zoltan Ratkai
>
> On Sun, Oct 22, 2023 at 5:39 PM Eugene Miretsky  wrote:
>
>> Hey!
>>
>> Looking for a way to control the ids (DB_ID and TABLE_ID) of newly
>> created  databases and tables.
>>
>> We have a somewhat complicated use case where we replicate the metastore
>> (and data) from a source Hive cluster to a target cluster. However new
>> tables can be added on both source and target. We need a way to avoid
>> unique Id collision. One way would be to make sure all databases/tables
>> created in the target Hive start from a higher Id.
>>
>> We have tried to set AUTO_INCREMENT='1' on a metastore MySQL db, but
>> it doesn't work. This makes us think the Id is generated by the Metastore
>> code itself, but we cannot find the right place in the code, or if it is
>> possible to control the logic.
>>
>> Any advice would be appreciated.
>>
>> Cheers,
>>
>


Re: Metastore: How is the unique ID of new databases and tables determined?

2023-10-24 Thread Zoltán Rátkai
Hi Eugene,

the TBL_ID in TBLS table is handled by Datanucleus, so AUTO_INCREMENT won't
help, since the TBL_ID is not defined as AUTO_INCREMENT.

Datanucleus uses SEQUENCE_TABLE to store the actual value for primary keys.
In this table, these two rows are what you need to modify:

org.apache.hadoop.hive.metastore.model.MDatabase
org.apache.hadoop.hive.metastore.model.MTable

e.g:
update SEQUENCE_TABLE set NEXT_VAL = 1  where
SEQUENCE_NAME='org.apache.hadoop.hive.metastore.model.MTable';
and do it for org.apache.hadoop.hive.metastore.model.MDatabase as well.

After that if you create a table the TBL_ID will be used from this value.
Datanucleus uses caching (default 10) so maybe the next tables will still
use the old value. Try creating 10 simple tables like this:

create table test1 (i int);
...
create table test10 (i int);
and then drop them and check the TBL_ID.
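For intuition, the block-allocation behaviour described above (DataNucleus reads NEXT_VAL from SEQUENCE_TABLE, reserves a block of ids, default cache size 10, and then serves ids from that local block) can be sketched as follows. This is an illustrative model, not actual DataNucleus or HMS code; all class and variable names are made up:

```python
# Sketch of DataNucleus-style id allocation with a cached block.
# It shows why a manual UPDATE of SEQUENCE_TABLE only takes effect
# once the currently cached block of ids is exhausted.
class SequenceTable:
    """Stands in for the SEQUENCE_TABLE row backing one model class."""
    def __init__(self, next_val=1):
        self.next_val = next_val

    def allocate(self, block_size):
        """Reserve a block of ids: return the start, advance NEXT_VAL."""
        start = self.next_val
        self.next_val += block_size
        return start

class CachingIdGenerator:
    """Hands out ids from a locally cached block (default cache size 10)."""
    def __init__(self, table, cache_size=10):
        self.table = table
        self.cache_size = cache_size
        self.next_id = None
        self.remaining = 0

    def next(self):
        if self.remaining == 0:
            # Cache exhausted: reserve a fresh block from SEQUENCE_TABLE.
            self.next_id = self.table.allocate(self.cache_size)
            self.remaining = self.cache_size
        self.remaining -= 1
        value = self.next_id
        self.next_id += 1
        return value

seq = SequenceTable(next_val=1)
gen = CachingIdGenerator(seq)
first = gen.next()                       # caches ids 1..10, returns 1
seq.next_val = 1000                      # simulate the manual UPDATE above
cached = [gen.next() for _ in range(9)]  # still served from the old block
after = gen.next()                       # block exhausted: picks up at 1000
print(first, cached[-1], after)          # 1 10 1000
```

This is why creating (and dropping) ~10 throwaway tables after the UPDATE, as suggested above, flushes the old cached block before the new NEXT_VAL kicks in.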

*Before doing this I recommend to create a backup from the Metastore DB!!*

Also check this:
https://community.cloudera.com/t5/Support-Questions/How-to-migrate-Hive-Table-From-one-cluster-to-another/m-p/235145

Regards,

Zoltan Ratkai

On Sun, Oct 22, 2023 at 5:39 PM Eugene Miretsky  wrote:

> Hey!
>
> Looking for a way to control the ids (DB_ID and TABLE_ID) of newly
> created  databases and tables.
>
> We have a somewhat complicated use case where we replicate the metastore
> (and data) from a source Hive cluster to a target cluster. However new
> tables can be added on both source and target. We need a way to avoid
> unique Id collision. One way would be to make sure all databases/tables
> created in the target Hive start from a higher Id.
>
> We have tried to set AUTO_INCREMENT='1' on a metastore MySQL db, but
> it doesn't work. This makes us think the Id is generated by the Metastore
> code itself, but we cannot find the right place in the code, or if it is
> possible to control the logic.
>
> Any advice would be appreciated.
>
> Cheers,
>


Re: Announce: Hive-MR3 with Celeborn,

2023-10-24 Thread lisoda
Thanks. I will try.



 Replied Message 
| From | Sungwoo Park |
| Date | 10/24/2023 20:08 |
| To | user@hive.apache.org |
| Cc | |
| Subject | Announce: Hive-MR3 with Celeborn, |
Hi Hive users,


Before the impending release of MR3 1.8, we would like to announce the release 
of Hive-MR3 with Celeborn (Hive 3.1.3 on MR3 1.8 with Celeborn 0.3.1).

Apache Celeborn [1] is a remote shuffle service, similar to Magnet [2] and Apache 
Uniffle [3] (which was discussed in this Hive mailing list a while ago). 
Celeborn officially supports Spark and Flink, and we have implemented an 
MR3-extension for Celeborn.

In addition to all the benefits of using remote shuffle service, 
Hive-MR3-Celeborn supports direct processing of mapper output on the reducer 
side, which means that reducers do not store mapper output on local disks (for 
unordered edges). In this way, Hive-MR3-Celeborn can eliminate over 95% of 
local disk writes when tested on the 10TB TPC-DS benchmark. This can be 
particularly useful when running Hive-MR3 on public clouds where fast local 
disk storage is expensive or not available.

We have documented the usage of Hive-MR3-Celeborn in [4]. You can download 
Hive-MR3-Celeborn in [5].

FYI, MR3 is an execution engine providing native support for Hadoop, 
Kubernetes, and standalone mode [6]. Hive-MR3, its main application, provides 
the performance of LLAP yet is very easy to install and operate. If you are 
using Hive-Tez for running ETL jobs, switching to Hive-MR3 will give you a much 
higher throughput thanks to its advanced resource sharing model.

We have recently opened a Slack channel. If interested, please join the Slack 
channel and ask any question on MR3:

https://join.slack.com/t/mr3-help/shared_invite/zt-1wpqztk35-AN8JRDznTkvxFIjtvhmiNg

Thank you,

--- Sungwoo

[1] https://celeborn.apache.org/
[2] https://www.vldb.org/pvldb/vol13/p3382-shen.pdf
[3] https://uniffle.apache.org/
[4] https://mr3docs.datamonad.com/docs/mr3/features/celeborn/
[5] https://github.com/mr3project/mr3-release/releases/tag/v1.8
[6] https://mr3docs.datamonad.com/


Re: Re: Hive's performance for querying the Iceberg table is very poor.

2023-10-24 Thread Ayush Saxena
HIVE-27734 is in progress; as I see, we have a POC attached to the ticket, so
we should have it in 2-3 weeks, I believe.

> Also, after the release of 4.0.0, will we be able to do all TPCDS queries
on ICEBERG except for normal HIVE tables?

Yep, I believe most of the TPCDS queries would be supported even today on
Hive master, but 4.0.0 would have them running for sure.

-Ayush

On Tue, 24 Oct 2023 at 14:51, lisoda  wrote:

> Thanks.
> I would like to know if hive currently supports push to ICEBERG table
> partition under JOIN condition.
> Because I see HIVE-27734 is not yet complete, what is its progress so
> far?
> Also, after the release of 4.0.0, will we be able to do all TPCDS queries
> on ICEBERG except for normal HIVE tables?
>
>
>
>
>
> On 2023-10-24 11:03:07, "Ayush Saxena" wrote:
>
> Hi Lisoda,
>
> The iceberg jar for hive 3.1.3 doesn't have a lot of changes; we did a
> bunch of improvements on the 4.x line for Hive-Iceberg. You can give
> iceberg a try on the 4.0.0-beta-1 release mentioned here [1]; we have a
> bunch of improvements like vectorization and stuff like that. If you wanna
> give it a quick try on docker, we have docker image published for that here
> [2] & Iceberg works out of the box there.
>
> Rest feel free to create tickets, if you find some specific queries or
> scenarios which are problematic, we will be happy to chase them & get them
> sorted.
>
> PS. Not sure about StarRocks, FWIW. That is something we don't develop as
> part of Apache Hive nor as part of Apache Software Foundation to best of my
> knowledge, so I would refrain from commenting about it on the "Apache Hive"
> ML.
>
> -Ayush
>
>
> [1] https://hive.apache.org/general/downloads/
> [2] https://hub.docker.com/r/apache/hive/tags
>
> On Tue, 24 Oct 2023 at 05:28, Albert Wong 
> wrote:
>
>> Too bad.   Tencent Games used StarRocks with Apache Iceberg to power
>> their analytics.
>> https://medium.com/starrocks-engineering/tencent-games-inside-scoop-the-road-to-cloud-native-with-starrocks-d7dcb2438e25.
>>
>>
>> On Mon, Oct 23, 2023 at 10:55 AM lisoda  wrote:
>>
>>> We are not going to use starrocks.
>>> mpp architecture databases have natural limitations, and starrocks does
>>> not necessarily perform better than hive llap.
>>>
>>>
>>>  Replied Message 
>>> From Albert Wong 
>>> Date 10/24/2023 01:39
>>> To user@hive.apache.org
>>> Cc
>>> Subject Re: Hive's performance for querying the Iceberg table is very
>>> poor.
>>> I would try http://starrocks.io.   StarRocks is an MPP OLAP database
>>> that can query Apache Iceberg and we can cache the data for faster
>>> performance.  We also have additional features like building materialized
>>> views that span across Apache Iceberg, Apache Hudi and Apache Hive.   Here
>>> is a video of connecting the 2 products through a webinar StarRocks did
>>> with Tabular (authors of Apache Iceberg).
>>> https://www.youtube.com/watch?v=bAmcTrX7hCI&t=10s
>>>
>>> On Mon, Oct 23, 2023 at 7:18 AM lisoda  wrote:
>>>
>>>> Hi Team.
>>>>   I was recently testing Hive queries against an Iceberg table and found
>>>> that the query performance is very poor, almost unusable in a production
>>>> environment. Also, join conditions cannot be pushed down to the Iceberg
>>>> partitions.
>>>>   I'm using the 1.3.1 Hive Runtime Jar from the Iceberg community.
>>>>   Currently I'm using Hive 3.1.3, Iceberg 1.3.1.
>>>>   Now I'm very frustrated because the performance is so bad that I
>>>> can't deliver to my customers. How can I solve this problem?
>>>>  Details:
>>>> https://apache-iceberg.slack.com/archives/C025PH0G1D4/p1695050248606629
>>>> I would be grateful if someone could guide me.
>>>>
>>>


Re: Hive's performance for querying the Iceberg table is very poor.

2023-10-23 Thread Ayush Saxena
Hi Lisoda,

The iceberg jar for hive 3.1.3 doesn't have a lot of changes; we did a
bunch of improvements on the 4.x line for Hive-Iceberg. You can give
iceberg a try on the 4.0.0-beta-1 release mentioned here [1]; we have a
bunch of improvements like vectorization and stuff like that. If you wanna
give it a quick try on docker, we have docker image published for that here
[2] & Iceberg works out of the box there.

Rest feel free to create tickets, if you find some specific queries or
scenarios which are problematic, we will be happy to chase them & get them
sorted.

PS. Not sure about StarRocks, FWIW. That is something we don't develop as
part of Apache Hive nor as part of Apache Software Foundation to best of my
knowledge, so I would refrain from commenting about it on the "Apache Hive"
ML.

-Ayush


[1] https://hive.apache.org/general/downloads/
[2] https://hub.docker.com/r/apache/hive/tags

On Tue, 24 Oct 2023 at 05:28, Albert Wong  wrote:

> Too bad.   Tencent Games used StarRocks with Apache Iceberg to power their
> analytics.
> https://medium.com/starrocks-engineering/tencent-games-inside-scoop-the-road-to-cloud-native-with-starrocks-d7dcb2438e25.
>
>
> On Mon, Oct 23, 2023 at 10:55 AM lisoda  wrote:
>
>> We are not going to use starrocks.
>> mpp architecture databases have natural limitations, and starrocks does
>> not necessarily perform better than hive llap.
>>
>>
>>  Replied Message 
>> From Albert Wong 
>> Date 10/24/2023 01:39
>> To user@hive.apache.org
>> Cc
>> Subject Re: Hive's performance for querying the Iceberg table is very
>> poor.
>> I would try http://starrocks.io.   StarRocks is an MPP OLAP database
>> that can query Apache Iceberg and we can cache the data for faster
>> performance.  We also have additional features like building materialized
>> views that span across Apache Iceberg, Apache Hudi and Apache Hive.   Here
>> is a video of connecting the 2 products through a webinar StarRocks did
>> with Tabular (authors of Apache Iceberg).
>> https://www.youtube.com/watch?v=bAmcTrX7hCI&t=10s
>>
>> On Mon, Oct 23, 2023 at 7:18 AM lisoda  wrote:
>>
>>> Hi Team.
>>>   I was recently testing Hive queries against an Iceberg table and found
>>> that the query performance is very poor, almost unusable in a production
>>> environment. Also, join conditions cannot be pushed down to the Iceberg
>>> partitions.
>>>   I'm using the 1.3.1 Hive Runtime Jar from the Iceberg community.
>>>   Currently I'm using Hive 3.1.3, Iceberg 1.3.1.
>>>   Now I'm very frustrated because the performance is so bad that I
>>> can't deliver to my customers. How can I solve this problem?
>>>  Details:
>>> https://apache-iceberg.slack.com/archives/C025PH0G1D4/p1695050248606629
>>> I would be grateful if someone could guide me.
>>>
>>


Re: Hive's performance for querying the Iceberg table is very poor.

2023-10-23 Thread Albert Wong
Too bad.   Tencent Games used StarRocks with Apache Iceberg to power their
analytics.
https://medium.com/starrocks-engineering/tencent-games-inside-scoop-the-road-to-cloud-native-with-starrocks-d7dcb2438e25.


On Mon, Oct 23, 2023 at 10:55 AM lisoda  wrote:

> We are not going to use starrocks.
> mpp architecture databases have natural limitations, and starrocks does
> not necessarily perform better than hive llap.
>
>
>  Replied Message 
> From Albert Wong 
> Date 10/24/2023 01:39
> To user@hive.apache.org
> Cc
> Subject Re: Hive's performance for querying the Iceberg table is very
> poor.
> I would try http://starrocks.io.   StarRocks is an MPP OLAP database that
> can query Apache Iceberg and we can cache the data for faster performance.
> We also have additional features like building materialized views that span
> across Apache Iceberg, Apache Hudi and Apache Hive.   Here is a video of
> connecting the 2 products through a webinar StarRocks did with Tabular
> (authors of Apache Iceberg).
> https://www.youtube.com/watch?v=bAmcTrX7hCI&t=10s
>
> On Mon, Oct 23, 2023 at 7:18 AM lisoda  wrote:
>
>> Hi Team.
>>   I was recently testing Hive queries against an Iceberg table and found
>> that the query performance is very poor, almost unusable in a production
>> environment. Also, join conditions cannot be pushed down to the Iceberg
>> partitions.
>>   I'm using the 1.3.1 Hive Runtime Jar from the Iceberg community.
>>   Currently I'm using Hive 3.1.3, Iceberg 1.3.1.
>>   Now I'm very frustrated because the performance is so bad that I
>> can't deliver to my customers. How can I solve this problem?
>>  Details:
>> https://apache-iceberg.slack.com/archives/C025PH0G1D4/p1695050248606629
>> I would be grateful if someone could guide me.
>>
>


Re: Hive's performance for querying the Iceberg table is very poor.

2023-10-23 Thread lisoda
We are not going to use starrocks.
mpp architecture databases have natural limitations, and starrocks does not 
necessarily perform better than hive llap.



 Replied Message 
| From | Albert Wong |
| Date | 10/24/2023 01:39 |
| To | user@hive.apache.org |
| Cc | |
| Subject | Re: Hive's performance for querying the Iceberg table is very poor. 
|
I would try http://starrocks.io.   StarRocks is an MPP OLAP database that can 
query Apache Iceberg and we can cache the data for faster performance.  We also 
have additional features like building materialized views that span across 
Apache Iceberg, Apache Hudi and Apache Hive.   Here is a video of connecting 
the 2 products through a webinar StarRocks did with Tabular (authors of Apache 
Iceberg).  https://www.youtube.com/watch?v=bAmcTrX7hCI&t=10s


On Mon, Oct 23, 2023 at 7:18 AM lisoda  wrote:

Hi Team.
  I was recently testing Hive queries against an Iceberg table and found that
the query performance is very poor, almost unusable in a production
environment. Also, join conditions cannot be pushed down to the Iceberg
partitions.
  I'm using the 1.3.1 Hive Runtime Jar from the Iceberg community.
  Currently I'm using Hive 3.1.3, Iceberg 1.3.1. 
  Now I'm very frustrated because the performance is so bad that I can't 
deliver to my customers. How can I solve this problem?
 Details:  
https://apache-iceberg.slack.com/archives/C025PH0G1D4/p1695050248606629
I would be grateful if someone could guide me.

Re: Hive's performance for querying the Iceberg table is very poor.

2023-10-23 Thread Albert Wong
I would try http://starrocks.io.   StarRocks is an MPP OLAP database that
can query Apache Iceberg and we can cache the data for faster performance.
We also have additional features like building materialized views that span
across Apache Iceberg, Apache Hudi and Apache Hive.   Here is a video of
connecting the 2 products through a webinar StarRocks did with Tabular
(authors of Apache Iceberg).
https://www.youtube.com/watch?v=bAmcTrX7hCI&t=10s

On Mon, Oct 23, 2023 at 7:18 AM lisoda  wrote:

> Hi Team.
>   I was recently testing Hive queries against an Iceberg table and found
> that the query performance is very poor, almost unusable in a production
> environment. Also, join conditions cannot be pushed down to the Iceberg
> partitions.
>   I'm using the 1.3.1 Hive Runtime Jar from the Iceberg community.
>   Currently I'm using Hive 3.1.3, Iceberg 1.3.1.
>   Now I'm very frustrated because the performance is so bad that I
> can't deliver to my customers. How can I solve this problem?
>  Details:
> https://apache-iceberg.slack.com/archives/C025PH0G1D4/p1695050248606629
> I would be grateful if someone could guide me.
>


Re:

2023-10-12 Thread luckydog xf
Oh, I forgot to add the email subject; apologies for that.

On Thu, Oct 12, 2023 at 5:19 PM luckydog xf  wrote:

> Hi, list
> According to this link
> https://cwiki.apache.org/confluence/display/Hive/AdminManual+Metastore+3.0+Administration
> jump to the section "Running the Metastore Without Hive".
> In order to run the metastore without Hive, set the following:
>
> <property>
>   <name>metastore.task.threads.always</name>
>   <value>org.apache.hadoop.hive.metastore.events.EventCleanerTask,org.apache.hadoop.hive.metastore.MaterializationsCacheCleanerTask</value>
> </property>
>
> However, since hive-standalone-metastore 3.1.0, that setting has been
> replaced.
> I checked v3.1.0, 3.1.2 and 3.1.3. The new configuration is
> ===
> <property>
>   <name>metastore.task.threads.always</name>
>   <value>org.apache.hadoop.hive.metastore.events.EventCleanerTask,org.apache.hadoop.hive.metastore.RuntimeStatsCleanerTask,org.apache.hadoop.hive.metastore.repl.DumpDirCleanerTask</value>
>   <description>Comma separated list of tasks that will be started in
>   separate threads.  These will always be started, regardless of whether the
>   metastore is running in embedded mode or in server mode.  They must
>   implement org.apache.hadoop.hive.metastore.MetastoreTaskThread</description>
> </property>
> ===
> I googled the release notes and change log, but nothing was found.
>
> So I guess the documentation is out-of-date. What's the new setup if I
> use 3.1.x?
> Thanks.
>
>
>
>


Re: hive running udf in metastore can't load configuration with xinclude

2023-10-05 Thread Okumin
Hi Wojtek,

I've not checked, but I think your hive-site.xml has `<xi:include>`. Does it
still happen if you put all parameters directly in hive-site.xml? If that
resolves the issue, do you have a reason to use `include`?
https://github.com/apache/hadoop/blob/57100bba1bfd6963294181a2521396dc30c295f7/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/conf/Configuration.java#L3288-L3291

Regards,
Okumin
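(As a quick sanity check, the suggestion above can be verified before handing the file to Hadoop's restricted parser. This is an illustrative sketch, not part of Hive or Hadoop; the sample file content and function name are hypothetical.)

```python
# Detect XInclude directives in a Hadoop-style configuration file.
# If this reports True, inlining the included properties directly into
# hive-site.xml should avoid the "XInclude is not supported for
# restricted resources" error described in this thread.
import xml.etree.ElementTree as ET

XI_NS = "http://www.w3.org/2001/XInclude"

def uses_xinclude(xml_text: str) -> bool:
    """Return True if any element in the document is an XInclude directive."""
    root = ET.fromstring(xml_text)
    return any(el.tag == f"{{{XI_NS}}}include" for el in root.iter())

# Hypothetical hive-site.xml that relies on XInclude:
conf = """<configuration xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="common-site.xml"/>
  <property><name>hive.execution.engine</name><value>tez</value></property>
</configuration>"""

print(uses_xinclude(conf))  # True
```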

On Fri, Sep 22, 2023 at 3:52 AM Wojtek Meler  wrote:

> I've noticed strange behaviour of hive. When you run query against
> partitioned table like this:
>
> select * from mytable
> where log_date = date_add('2023-09-10',1)
> limit 3
>
> (mytable is partitioned by a log_date string column). Hive tries to
> evaluate date_add inside the metastore and throws an exception when it sees
> XInclude in the configuration file.
>
> java.lang.RuntimeException: Error parsing resource file:/etc/hive/
> conf.dist/hive-site.xml: XInclude is not supported for restricted
> resources
> at
> org.apache.hadoop.conf.Configuration$Parser.handleInclude(Configuration.java:3258)
> ~[hadoop-common-3.3.4.jar:?]
> at
> org.apache.hadoop.conf.Configuration$Parser.handleStartElement(Configuration.java:3202)
> ~[hadoop-common-3.3.4.jar:?]
> at
> org.apache.hadoop.conf.Configuration$Parser.parseNext(Configuration.java:3398)
> ~[hadoop-common-3.3.4.jar:?]
> at
> org.apache.hadoop.conf.Configuration$Parser.parse(Configuration.java:3182)
> ~[hadoop-common-3.3.4.jar:?]
> at
> org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:3075)
> ~[hadoop-common-3.3.4.jar:?]
> at
> org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:3041)
> ~[hadoop-common-3.3.4.jar:?]
> at
> org.apache.hadoop.conf.Configuration.loadProps(Configuration.java:2914)
> ~[hadoop-common-3.3.4.jar:?]
> at
> org.apache.hadoop.conf.Configuration.addResourceObject(Configuration.java:1034)
> ~[hadoop-common-3.3.4.jar:?]
> at
> org.apache.hadoop.conf.Configuration.addResource(Configuration.java:939)
> ~[hadoop-common-3.3.4.jar:?]
> at
> org.apache.hadoop.hive.conf.HiveConf.initialize(HiveConf.java:6353)
> ~[hive-common-4.0.0-alpha-2.jar:4.0.0-alpha-2]
> at org.apache.hadoop.hive.conf.HiveConf.(HiveConf.java:6302)
> ~[hive-common-4.0.0-alpha-2.jar:4.0.0-alpha-2]
> at
> org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.getBestAvailableConf(ExprNodeGenericFuncEvaluator.java:145)
> ~[hive-exec-4.0.0-alpha-2.jar:4.0.0-alpha-2]
> at
> org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.initialize(ExprNodeGenericFuncEvaluator.java:181)
> ~[hive-exec-4.0.0-alpha-2.jar:4.0.0-alpha-2]
> at
> org.apache.hadoop.hive.ql.optimizer.ppr.PartExprEvalUtils.prepareExpr(PartExprEvalUtils.java:118)
> ~[hive-exec-4.0.0-alpha-2.jar:4.0.0-alpha-2]
> at
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner.prunePartitionNames(PartitionPruner.java:556)
> ~[hive-exec-4.0.0-alpha-2.jar:4.0.0-alpha-2]
> at
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.filterPartitionsByExpr(PartitionExpressionForMetastore.java:96)
> ~[hive-exec-4.0.0-alpha-2.jar:4.0.0-alpha-2]
> at
> org.apache.hadoop.hive.metastore.ObjectStore.getPartitionNamesPrunedByExprNoTxn(ObjectStore.java:4105)
> ~[hive-exec-4.0.0-alpha-2.jar:4.0.0-alpha-2]
> at
> org.apache.hadoop.hive.metastore.ObjectStore.access$1700(ObjectStore.java:285)
> ~[hive-exec-4.0.0-alpha-2.jar:4.0.0-alpha-2]
> at
> org.apache.hadoop.hive.metastore.ObjectStore$11.getJdoResult(ObjectStore.java:4066)
> ~[hive-exec-4.0.0-alpha-2.jar:4.0.0-alpha-2]
> at
> org.apache.hadoop.hive.metastore.ObjectStore$11.getJdoResult(ObjectStore.java:4036)
> ~[hive-exec-4.0.0-alpha-2.jar:4.0.0-alpha-2]
> at
> org.apache.hadoop.hive.metastore.ObjectStore$GetHelper.run(ObjectStore.java:4362)
> ~[hive-exec-4.0.0-alpha-2.jar:4.0.0-alpha-2]
> at
> org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExprInternal(ObjectStore.java:4072)
> ~[hive-exec-4.0.0-alpha-2.jar:4.0.0-alpha-2]
> at
> org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExpr(ObjectStore.java:4016)
> ~[hive-exec-4.0.0-alpha-2.jar:4.0.0-alpha-2]
> at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native
> Method) ~[?:?]
> at
> jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> ~[?:?]
> at
> jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> ~[?:?]
> at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
> at
> org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:97)
> ~[hive-exec-4.0.0-alpha-2.jar:4.0.0-alpha-2]
> at com.sun.proxy.$Proxy27.getPartitionsByExpr(Unknown Source)
> ~[?:?]
> at
> org.apache.hadoop.hive.metastore.HMSHandler.get_partitions_spec_by_expr(HMSHandler.java:7366)
> 
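While the configuration issue persists, one way to sidestep the metastore-side UDF evaluation shown in the trace above is to compute the partition value on the client and pass a plain literal, so the partition pruner never needs to build a HiveConf to evaluate date_add. This is a workaround sketch only, not a fix for the XInclude parsing itself:

```sql
-- Workaround sketch: replace date_add('2023-09-10', 1) with its result
-- so partition pruning compares against a constant instead of invoking
-- a UDF inside the metastore.
SELECT * FROM mytable
WHERE log_date = '2023-09-11'
LIMIT 3;
```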

Re: [EXTERNAL] Re: [ANNOUNCE] New committer: Sourabh Badhya

2023-10-03 Thread butaozha...@163.com

  
  
  

	
Congratulations! Sourabh Badhya

From: user-return-27928-butaozhang1=163@hive.apache.org on behalf of Ayush Saxena
Sent: Wednesday, October 4, 2023 12:10 PM
To: d...@hive.apache.org
Cc: user@hive.apache.org
Subject: Re: [EXTERNAL] Re: [ANNOUNCE] New committer: Sourabh Badhya

Congratulations Sourabh!!!

-Ayush

> On 04-Oct-2023, at 9:28 AM, Sankar Hariappan  wrote:
> 
> Congratulations Sourabh! Welcome to the Hive committers club! 
> 
> 
> 
> Thanks,
> 
> Sankar
> 
> 
> 
> -Original Message-
> From: Sourabh Badhya 
> Sent: Wednesday, October 4, 2023 9:19 AM
> To: d...@hive.apache.org; user@hive.apache.org
> Subject: [EXTERNAL] Re: [ANNOUNCE] New committer: Sourabh Badhya
> 
> 
> 
> [You don't often get email from sbad...@cloudera.com.invalid. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ]
> 
> 
> 
> Thank you to the PMC members, committers and everyone who have helped me with their advice and reviews. It's been a pleasure working on Hive over the past couple of years. I hope to contribute and collaborate more for the project in the future.
> 
> 
> 
> About me: I am working at Cloudera for the past 2 years, mainly engaged in Apache Hive and related products. My current focus is on Iceberg support, however I have had the opportunity to work on other areas of Hive such as ACID compaction, optimising writes and related improvements.
> 
> 
> 
> Regards,
> 
> Sourabh Badhya
> 
> 
> 
>> On Tue, Oct 3, 2023 at 2:22 PM Stamatis Zampetakis >
>> 
>> wrote:
>> 
>> 
>> 
>> Apache Hive's Project Management Committee (PMC) has invited Sourabh
> 
>> Badhya to become a committer, and we are pleased to announce that he
> 
>> has accepted.
> 
>> 
> 
>> Sourabh has been doing some great work for the project. He has landed
> 
>> important fixes in critical parts of Hive and made significant
> 
>> contributions to the stabilization of ACID compactions, Direct Write
> 
>> functionality, and Iceberg support. Apart from code contributions,
> 
>> Sourabh has been regularly reviewing others' work and providing
> 
>> valuable feedback as well as testing and validating releases.
> 
>> 
> 
>> Sourabh, welcome, thank you for your contributions, and we look
> 
>> forward to your further interactions with the community! If you wish,
> 
>> please feel free to tell us more about yourself and what you are
> 
>> working on.
> 
>> 
> 
>> Stamatis (on behalf of the Apache Hive PMC)
> 
>> 



Re: [EXTERNAL] Re: [ANNOUNCE] New committer: Sourabh Badhya

2023-10-03 Thread Ayush Saxena
Congratulations Sourabh!!!

-Ayush

> On 04-Oct-2023, at 9:28 AM, Sankar Hariappan 
>  wrote:
> 
> Congratulations Sourabh! Welcome to the Hive committers club! 
> 
> 
> 
> Thanks,
> 
> Sankar
> 
> 
> 
> -Original Message-
> From: Sourabh Badhya 
> Sent: Wednesday, October 4, 2023 9:19 AM
> To: d...@hive.apache.org; user@hive.apache.org
> Subject: [EXTERNAL] Re: [ANNOUNCE] New committer: Sourabh Badhya
> 
> 
> 
> 
> 
> 
> Thank you to the PMC members, committers and everyone who have helped me with 
> their advice and reviews. It's been a pleasure working on Hive over the past 
> couple of years. I hope to contribute and collaborate more for the project in 
> the future.
> 
> 
> 
> About me: I am working at Cloudera for the past 2 years, mainly engaged in 
> Apache Hive and related products. My current focus is on Iceberg support, 
> however I have had the opportunity to work on other areas of Hive such as 
> ACID compaction, optimising writes and related improvements.
> 
> 
> 
> Regards,
> 
> Sourabh Badhya
> 
> 
> 
>> On Tue, Oct 3, 2023 at 2:22 PM Stamatis Zampetakis 
>> mailto:zabe...@gmail.com>>
>> 
>> wrote:
>> 
>> 
>> 
>> Apache Hive's Project Management Committee (PMC) has invited Sourabh
> 
>> Badhya to become a committer, and we are pleased to announce that he
> 
>> has accepted.
> 
>> 
> 
>> Sourabh has been doing some great work for the project. He has landed
> 
>> important fixes in critical parts of Hive and made significant
> 
>> contributions to the stabilization of ACID compactions, Direct Write
> 
>> functionality, and Iceberg support. Apart from code contributions,
> 
>> Sourabh has been regularly reviewing others' work and providing
> 
>> valuable feedback as well as testing and validating releases.
> 
>> 
> 
>> Sourabh, welcome, thank you for your contributions, and we look
> 
>> forward to your further interactions with the community! If you wish,
> 
>> please feel free to tell us more about yourself and what you are
> 
>> working on.
> 
>> 
> 
>> Stamatis (on behalf of the Apache Hive PMC)
> 
>> 


RE: [EXTERNAL] Re: [ANNOUNCE] New committer: Sourabh Badhya

2023-10-03 Thread Sankar Hariappan via user
Congratulations Sourabh! Welcome to the Hive committers club! 



Thanks,

Sankar



-Original Message-
From: Sourabh Badhya 
Sent: Wednesday, October 4, 2023 9:19 AM
To: d...@hive.apache.org; user@hive.apache.org
Subject: [EXTERNAL] Re: [ANNOUNCE] New committer: Sourabh Badhya






Thank you to the PMC members, committers and everyone who have helped me with 
their advice and reviews. It's been a pleasure working on Hive over the past 
couple of years. I hope to contribute and collaborate more for the project in 
the future.



About me: I am working at Cloudera for the past 2 years, mainly engaged in 
Apache Hive and related products. My current focus is on Iceberg support, 
however I have had the opportunity to work on other areas of Hive such as ACID 
compaction, optimising writes and related improvements.



Regards,

Sourabh Badhya



On Tue, Oct 3, 2023 at 2:22 PM Stamatis Zampetakis 
mailto:zabe...@gmail.com>>

wrote:



> Apache Hive's Project Management Committee (PMC) has invited Sourabh

> Badhya to become a committer, and we are pleased to announce that he

> has accepted.

>

> Sourabh has been doing some great work for the project. He has landed

> important fixes in critical parts of Hive and made significant

> contributions to the stabilization of ACID compactions, Direct Write

> functionality, and Iceberg support. Apart from code contributions,

> Sourabh has been regularly reviewing others' work and providing

> valuable feedback as well as testing and validating releases.

>

> Sourabh, welcome, thank you for your contributions, and we look

> forward to your further interactions with the community! If you wish,

> please feel free to tell us more about yourself and what you are

> working on.

>

> Stamatis (on behalf of the Apache Hive PMC)

>


Re: Request write access to the Hive wiki.

2023-09-21 Thread Albert Wong
In https://cwiki.apache.org/confluence/display/Hive/ on "user
documentation", I'd like to add "StarRocks Integration".   StarRocks is an
OLAP database that can query data in Apache Hive (
https://docs.starrocks.io/en-us/latest/data_source/catalog/hive_catalog).

On Thu, Sep 21, 2023 at 12:23 PM Ayush Saxena  wrote:

> Hi Albert,
>
> Can you share some more details like which page you want to modify and
> details around the content
>
> -Ayush
>
> On 22-Sep-2023, at 12:43 AM, Albert Wong 
> wrote:
>
> 
> username is albertatcelerdata.com
>
> --
> Albert Wong
>
> Community, Developer Relations, Technology Partnerships for
> StarRocks | CelerData
> 949 689 6412
> albert.w...@celerdata.com
>
>


Re: Request write access to the Hive wiki.

2023-09-21 Thread Ayush Saxena
Hi Albert,

Can you share some more details like which page you want to modify and details around the content?

-Ayush

On 22-Sep-2023, at 12:43 AM, Albert Wong  wrote:
> username is albertatcelerdata.com
> --
> Albert Wong
> Community, Developer Relations, Technology Partnerships for StarRocks | CelerData
> 949 689 6412
> albert.w...@celerdata.com

