from:"Stamatis Zampetakis"

CVE-2023-35701: Apache Hive: Arbitrary command execution via JDBC driver

2024-05-03 Thread Stamatis Zampetakis

Severity: moderate

Affected versions:

- Apache Hive 4.0.0-alpha-1 before 4.0.0

Description:

Improper Control of Generation of Code ('Code Injection') vulnerability in 
Apache Hive.

The vulnerability affects the Hive JDBC driver component and can potentially
lead to arbitrary code execution on the machine/endpoint on which the JDBC
driver (client) is running. The malicious user must have sufficient permissions
to specify/edit JDBC URL(s) in an endpoint relying on the Hive JDBC driver, and
the JDBC client process must run under a privileged user to fully exploit the
vulnerability.

The attacker can set up a malicious HTTP server and specify a JDBC URL pointing
towards this server. When a JDBC connection is attempted, the malicious HTTP
server can provide a special response with a customized payload that can trigger
the execution of certain commands in the JDBC client. This issue affects Apache
Hive: from 4.0.0-alpha-1 before 4.0.0.

Users are recommended to upgrade to version 4.0.0, which fixes the issue.

This issue is being tracked as HIVE-27554.

Credit:

Kostya Kortchinsky (reporter)

References:

https://hive.apache.org/
https://www.cve.org/CVERecord?id=CVE-2023-35701
https://issues.apache.org/jira/browse/HIVE-27554

Re: [ANNOUNCE] Apache Hive 4.0.0 Released

2024-04-02 Thread Stamatis Zampetakis

The new Apache Hive 4.0.0 release brings roughly 5K new commits (since
Apache Hive 3.1.3) and it's probably the biggest release so far in the
history of the project. The numbers clearly show that this is a
collective effort that wouldn't be possible without a strong community
and many volunteers over the years. Many thanks to everyone involved!

A special mention to Denys, who went above and beyond his role as
release manager, triaging release blockers and reviewing and fixing many
of the tickets that were blocking us for the past few months.

Best,
Stamatis

On Sun, Mar 31, 2024 at 2:54 PM Battula, Brahma Reddy
 wrote:
>
> Thank you for your hard work and dedication in releasing Apache Hive version 
> 4.0.0.
>
> Congratulations to the entire team on this achievement. Keep up the great 
> work!
>
> Is this considered GA?
>
> And it looks like we need to update the following location as well?
> https://hive.apache.org/general/downloads/
>
>
> From: Denys Kuzmenko 
> Date: Saturday, March 30, 2024 at 00:07
> To: user@hive.apache.org , d...@hive.apache.org 
> 
> Subject: [ANNOUNCE] Apache Hive 4.0.0 Released
>
> The Apache Hive team is proud to announce the release of Apache Hive
> version 4.0.0.
>
> The Apache Hive (TM) data warehouse software facilitates querying and
> managing large datasets residing in distributed storage. Built on top
> of Apache Hadoop (TM), it provides, among others:
>
> * Tools to enable easy data extract/transform/load (ETL)
>
> * A mechanism to impose structure on a variety of data formats
>
> * Access to files stored either directly in Apache HDFS (TM) or in other
>   data storage systems such as Apache HBase (TM)
>
> * Query execution via Apache Hadoop MapReduce, Apache Tez and Apache Spark
> frameworks. (MapReduce is deprecated, and Spark has been removed so the text
> needs to be modified depending on the release version)
>
> For Hive release details and downloads, please visit:
> https://hive.apache.org/downloads.html
>
> Hive 4.0.0 Release Notes are available here:
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343343=Text=12310843
>
> We would like to thank the many contributors who made this release
> possible.
>
> Regards,
>
> The Apache Hive Team

Re: [Hive Support] Query about StandardStructObjectInspector converting field names to lowercase

2024-02-01 Thread Stamatis Zampetakis

Hi Chang,

The hive-hcatalog-core-1.1.0-cdh5.13.1.jar file is not something
maintained by Apache. For vendor-specific problems you should reach
out to the respective support team from where you obtained the
product.

Apart from that, the version that you are using (5.13.1) is quite old.
Please retry your use case with the latest Apache Hive 4.0.0-beta-1
release [1] and report back if you still observe unexpected behavior.

Best,
Stamatis

[1] https://hive.apache.org/general/downloads/

On Mon, Jan 29, 2024 at 5:52 AM chang.wd  wrote:
>
> Dear Hive Support Team,
>
> I hope you are doing well. I am writing to inquire about a specific behavior 
> I encountered in Hive, related to the 
> org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector 
> class.
>
> SQL to reproduce this behavior:
> ```
> -- add JsonSerDe jar
> ADD JAR hive-hcatalog-core-1.1.0-cdh5.13.1.jar;
> -- create json table, the `struct` will be converted to lower case:
> `struct`.
> CREATE TABLE `test.hive_json_struct_schema`(
>   `cond_keys` struct
> )
> ROW FORMAT SERDE
>   'org.apache.hive.hcatalog.data.JsonSerDe'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.mapred.TextInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';
> ```
>
> When using the StandardStructObjectInspector class, it appears that field 
> names are being automatically converted to lowercase in the following code 
> snippet:
>
> ```
> this.fieldName = fieldName.toLowerCase();
> ```
>
> This behavior subsequently causes issues when querying JSON formatted tables, 
> particularly when nested Struct field names within the JSON data contain a 
> mix of uppercase and lowercase characters. Since field names are being 
> changed to lowercase by the StandardStructObjectInspector class, the actual 
> field names no longer match the expected field names, which leads to errors
> when reading the data (not with SQL).
>
> I would appreciate it if you could kindly provide an explanation for this design
> choice and whether there are any available workarounds or alternative 
> solutions for this scenario. I understand that the class may have been 
> implemented to avoid case sensitivity issues, but in cases like mine where 
> field name case matters, it would be helpful to have a better understanding 
> of how to handle this situation.
>
> Thank you in advance for your assistance and guidance. I look forward to 
> hearing from you.
>
> Best regards,
>
> Chang
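
The mismatch described above can be reproduced in a few lines of plain Java. This is a toy model, not Hive's JsonSerDe or ObjectInspector code: it assumes a parsed JSON row is exposed as a map keyed by the original, mixed-case key, and that the field name used for the lookup has been lower-cased as in the quoted `this.fieldName = fieldName.toLowerCase();` snippet. The class name, method names, and the key `condKey` are hypothetical.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class FieldNameCaseSketch {
    // Lower-cases the declared field name (mirroring the quoted
    // toLowerCase() call) and then does an exact, case-sensitive lookup.
    static Object lookupAfterLowercasing(Map<String, ?> row, String fieldName) {
        return row.get(fieldName.toLowerCase(Locale.ROOT));
    }

    // Hypothetical mitigation: copy the row into a map ordered by
    // String.CASE_INSENSITIVE_ORDER so the lower-cased name still matches.
    static Object lookupCaseInsensitive(Map<String, ?> row, String fieldName) {
        Map<String, Object> ci = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        ci.putAll(row);
        return ci.get(fieldName.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        // One JSON object as parsed from the data file; the key keeps its case.
        Map<String, Object> row = new HashMap<>();
        row.put("condKey", 42);

        System.out.println(lookupAfterLowercasing(row, "condKey")); // null
        System.out.println(lookupCaseInsensitive(row, "condKey"));  // 42
    }
}
```

A case-insensitive lookup is only one generic way to bridge the gap; it silently merges keys that differ only in case, so it is safe only when the data never relies on such distinctions.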

Re: Contributing doc

2024-01-14 Thread Stamatis Zampetakis

Hi Henri,

I gave you the necessary permissions to the wiki. Please check and if
you encounter any issues let us know.

Best,
Stamatis

On Fri, Jan 12, 2024 at 10:56 AM Henri Biestro  wrote:
>
>
> My Apache Id is hen...@apache.org.
> Cheers
>
> On 2024/01/12 09:52:20 Henri Biestro wrote:
> > Hello;
> > I'd like to contribute some documentation on Hive 4 - (
> > https://issues.apache.org/jira/browse/HIVE-27186  for instance)
> > May I get write access to the Wiki ( ie
> > https://cwiki.apache.org/confluence/display/Hive/Apache+Hive+4.0.0 ) ?
> > Thanks
> > Henri
> >

Re: [ANNOUNCE] New committer: Butao Zhang (zhangbutao)

2023-11-21 Thread Stamatis Zampetakis

Congratulations Butao, well deserved! Very glad to see another Iceberg
expert joining the team.

Best,
Stamatis


On Tue, Nov 21, 2023, 4:47 PM Butao Zhang  wrote:

> Thank you to the Hive community for this honor. I will continue to
> contribute to the community with my efforts.
> Thanks all!
>
>
> Thanks,
> Butao Zhang
>  Replied Message 
> | From | Ayush Saxena |
> | Date | 11/21/2023 15:02 |
> | To | dev ,
>  ,
> Butao Zhang |
> | Subject | [ANNOUNCE] New committer: Butao Zhang (zhangbutao) |
> Hi All,
> Apache Hive's Project Management Committee (PMC) has invited Butao
> Zhang  to become a committer, and we are pleased to announce that he
> has accepted.
>
> Butao Zhang welcome, thank you for your contributions, and we look
> forward to your further interactions with the community!
>
> Ayush Saxena
> (On behalf of Apache Hive PMC)
>

[ANNOUNCE] New committer: Sourabh Badhya

2023-10-03 Thread Stamatis Zampetakis

Apache Hive's Project Management Committee (PMC) has invited Sourabh
Badhya to become a committer, and we are pleased to announce that he
has accepted.

Sourabh has been doing some great work for the project. He has landed
important fixes in critical parts of Hive and made significant
contributions to the stabilization of ACID compactions, Direct Write
functionality, and Iceberg support. Apart from code contributions,
Sourabh has been regularly reviewing others' work and providing
valuable feedback as well as testing and validating releases.

Sourabh, welcome, thank you for your contributions, and we look
forward to your further interactions with the community! If you wish,
please feel free to tell us more about yourself and what you are
working on.

Stamatis (on behalf of the Apache Hive PMC)

[ANNOUNCE] Apache Hive 4.0.0-beta-1 Released

2023-08-15 Thread Stamatis Zampetakis

The Apache Hive team is proud to announce the release of Apache Hive
version 4.0.0-beta-1.

The Apache Hive (TM) data warehouse software facilitates querying and
managing large datasets residing in distributed storage. Built on top
of Apache Hadoop (TM), it provides, among others:

* Tools to enable easy data extract/transform/load (ETL)

* A mechanism to impose structure on a variety of data formats

* Access to files stored either directly in Apache HDFS (TM) or in other
  data storage systems such as Apache HBase (TM)

* Query execution via the Apache Tez framework.

For Hive release details and downloads, please visit:
https://hive.apache.org/downloads.html

Hive 4.0.0-beta-1 Release Notes are available here:
https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12353351=Text=12310843

We would like to thank the many contributors who made this release
possible.

Regards,

The Apache Hive Team

Re: Request write access to the Hive wiki

2023-05-12 Thread Stamatis Zampetakis

Hi Venu,

I just gave you permissions to view and add pages, attachments, and
comments. Please test and let me know if it works for you.
Please do not modify the wiki before the changes get committed to
master and coordinate with the reviewer to go over your modifications
to the wiki.

Thanks for contributing to Hive!
Stamatis

On Thu, May 11, 2023 at 2:59 PM Venu Reddy  wrote:
>
> Hi,
>
> I need to update hive wiki as a part of the issue fix - 
> https://issues.apache.org/jira/browse/HIVE-27308
> Could you please provide write access to me (Confluence username:
> venureddy)?
>
> Regards,
> Venu

Re: Kill the Pig 

2023-04-28 Thread Stamatis Zampetakis

I checked the Pig repo and I see some recent activity. Rohini is actively
leading the effort towards a new Pig release. Given that there is proven
interest in maintaining and contributing to this module, I would prefer to
keep it for the time being unless there are major issues that I am not aware of.

Best,
Stamatis


On Thu, Apr 20, 2023, 9:09 PM Rohini Palaniswamy  wrote:

> Hi Attila,
>We still use HCatLoader and HCatStorer heavily and would like to retain
> the support. We are also fixing it to work with Iceberg tables and will be
> contributing patches for both Hive 3 and Hive 4. So would like the support
> to be continued with Hive 4.
>
> Regards,
> Rohini
>
> On Thu, Apr 20, 2023 at 1:59 AM Alessandro Solimando <
> alessandro.solima...@gmail.com> wrote:
>
> > +1 from me, let's just make sure we make a good salame out of it :)
> >
> > Best regards,
> > Alessandro
> >
> > On Thu, 20 Apr 2023 at 10:50, Attila Turoczy 
> > wrote:
> >
> >> Hi All,
> >>
> >> In Hive we have a pretty old component from 1972 and this is the Pig. Pig
> >> was cool somewhere in 2008, but nowadays it does not have any value in the
> >> big data world. Even the last small release of Pig was 6 years ago in 2017,
> >> also the pig community has pretty much died. Because this component is
> >> obsolete I would suggest removing it from Hive 4.0. The hive 3 will still
> >> contain it, but I think this is a right time to remove those components
> >> that are not valuable for the community.
> >>
> >> What do you think about it?
> >>
> >> Ps: If nobody wrote it back, It would mean I could kill the pig (rof rof)
> >> :)
> >>
> >> -Attila
> >>
> >
>

Re: [DISCUSS] Jira Public Signup Disabled

2023-03-06 Thread Stamatis Zampetakis

I just updated the wiki [1] pointing to the new account creation form:
https://selfserve.apache.org/jira-account.html

I also logged INFRA-24306 [2] for the deletion of
jira-reque...@hive.apache.org mailing list.

Best,
Stamatis

[1]
https://cwiki.apache.org/confluence/display/Hive/HowToContribute#HowToContribute-JIRA
[2] https://issues.apache.org/jira/browse/INFRA-24306

On Fri, Mar 3, 2023 at 9:31 AM Stamatis Zampetakis 
wrote:

> Thanks for bringing this up Ayush.
>
> Yes we should update the wiki to reflect the new process; if nobody does
> it in the following days I will revise the respective page.
>
> We should also consider deleting the jira-request mailing list, if that's
> possible, to avoid confusion.
>
> Best,
> Stamatis
>
>
> On Thu, Mar 2, 2023, 8:27 AM Ayush Saxena  wrote:
>
>> Folks,
>> New stuff now: INFRA has introduced a new utility which can be used for
>> Jira id creation [1]. It is mentioned over here as well in the
>> announcement[2] from Infra team.
>>
>> Guess we should update our contributor docs[3] to reflect that and ask
>> folks to route their request via this util.
>>
>> -Ayush
>>
>> [1] https://selfserve.apache.org/jira-account.html
>> [2] https://infra.apache.org/blog/brand-new-selfserve-page.html
>> [3]
>> https://cwiki.apache.org/confluence/display/Hive/HowToContribute#HowToContribute-JIRA
>>
>> On Thu, 17 Nov 2022 at 16:43, Stamatis Zampetakis 
>> wrote:
>>
>>> The jira-reque...@hive.apache.org has been created and I added relevant
>>> instructions on how to request a JIRA account in the wiki [1]. Feel free to
>>> improve as you see fit!
>>>
>>> Best,
>>> Stamatis
>>>
>>> [1]
>>> https://cwiki.apache.org/confluence/display/Hive/HowToContribute#HowToContribute-JIRA
>>>
>>> On Tue, Nov 15, 2022 at 9:59 PM Stamatis Zampetakis 
>>> wrote:
>>>
>>>> Logged https://issues.apache.org/jira/browse/INFRA-23905 for the
>>>> creation of the new mailing list.
>>>>
>>>> On Tue, Nov 15, 2022 at 9:57 PM Abhay Chennagiri <
>>>> achennag...@cloudera.com> wrote:
>>>>
>>>>> +1, Thank you, Stamatis.
>>>>>
>>>>> On Tue, Nov 15, 2022 at 12:42 PM Pravin Sinha 
>>>>> wrote:
>>>>>
>>>>>> +1, Thanks, Stamatis.
>>>>>>
>>>>>> -Pravin
>>>>>>
>>>>>> On Tue, Nov 15, 2022 at 5:57 PM Stamatis Zampetakis <
>>>>>> zabe...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi everyone,
>>>>>>>
>>>>>>> Due to the large amount of spam account creation the ASF INFRA team
>>>>>>> has disabled the JIRA account creation [1].
>>>>>>>
>>>>>>> From the 11th of November, contributors who wish to have a JIRA
>>>>>>> account (to create, assign, watch, etc. issues) will need to request an
>>>>>>> account through an ASF PMC.
>>>>>>>
>>>>>>> Other projects, such as Calcite, have already taken the necessary
>>>>>>> actions to streamline the process for new contributors [2].
>>>>>>>
>>>>>>> I would suggest drawing inspiration from Calcite and taking similar
>>>>>>> actions in Hive.
>>>>>>>
>>>>>>> If you all agree we can start by creating a dedicated (private)
>>>>>>> mailing list for such requests:
>>>>>>> jira-reque...@hive.apache.org
>>>>>>>
>>>>>>> and then proceed with a brief documentation of the process in the
>>>>>>> wiki or website.
>>>>>>>
>>>>>>> What do you think?
>>>>>>>
>>>>>>> Best,
>>>>>>> Stamatis
>>>>>>>
>>>>>>> [1] https://blogs.apache.org/infra/entry/jira-public-signup-disabled
>>>>>>> [2] https://lists.apache.org/thread/5odg6wyvwfkryk96ls2w3vxnrkftw50s
>>>>>>>
>>>>>>

Re: [DISCUSS] Jira Public Signup Disabled

2023-03-03 Thread Stamatis Zampetakis

Thanks for bringing this up Ayush.

Yes we should update the wiki to reflect the new process; if nobody does it
in the following days I will revise the respective page.

We should also consider deleting the jira-request mailing list, if that's
possible, to avoid confusion.

Best,
Stamatis


On Thu, Mar 2, 2023, 8:27 AM Ayush Saxena  wrote:

> Folks,
> New stuff now: INFRA has introduced a new utility which can be used for
> Jira id creation [1]. It is mentioned over here as well in the
> announcement[2] from Infra team.
>
> Guess we should update our contributor docs[3] to reflect that and ask
> folks to route their request via this util.
>
> -Ayush
>
> [1] https://selfserve.apache.org/jira-account.html
> [2] https://infra.apache.org/blog/brand-new-selfserve-page.html
> [3]
> https://cwiki.apache.org/confluence/display/Hive/HowToContribute#HowToContribute-JIRA
>
> On Thu, 17 Nov 2022 at 16:43, Stamatis Zampetakis 
> wrote:
>
>> The jira-reque...@hive.apache.org has been created and I added relevant
>> instructions on how to request a JIRA account in the wiki [1]. Feel free to
>> improve as you see fit!
>>
>> Best,
>> Stamatis
>>
>> [1]
>> https://cwiki.apache.org/confluence/display/Hive/HowToContribute#HowToContribute-JIRA
>>
>> On Tue, Nov 15, 2022 at 9:59 PM Stamatis Zampetakis 
>> wrote:
>>
>>> Logged https://issues.apache.org/jira/browse/INFRA-23905 for the
>>> creation of the new mailing list.
>>>
>>> On Tue, Nov 15, 2022 at 9:57 PM Abhay Chennagiri <
>>> achennag...@cloudera.com> wrote:
>>>
>>>> +1, Thank you, Stamatis.
>>>>
>>>> On Tue, Nov 15, 2022 at 12:42 PM Pravin Sinha 
>>>> wrote:
>>>>
>>>>> +1, Thanks, Stamatis.
>>>>>
>>>>> -Pravin
>>>>>
>>>>> On Tue, Nov 15, 2022 at 5:57 PM Stamatis Zampetakis 
>>>>> wrote:
>>>>>
>>>>>> Hi everyone,
>>>>>>
>>>>>> Due to the large amount of spam account creation the ASF INFRA team
>>>>>> has disabled the JIRA account creation [1].
>>>>>>
>>>>>> From the 11th of November, contributors who wish to have a JIRA
>>>>>> account (to create, assign, watch, etc. issues) will need to request an
>>>>>> account through an ASF PMC.
>>>>>>
>>>>>> Other projects, such as Calcite, have already taken the necessary
>>>>>> actions to streamline the process for new contributors [2].
>>>>>>
>>>>>> I would suggest drawing inspiration from Calcite and taking similar
>>>>>> actions in Hive.
>>>>>>
>>>>>> If you all agree we can start by creating a dedicated (private)
>>>>>> mailing list for such requests:
>>>>>> jira-reque...@hive.apache.org
>>>>>>
>>>>>> and then proceed with a brief documentation of the process in the
>>>>>> wiki or website.
>>>>>>
>>>>>> What do you think?
>>>>>>
>>>>>> Best,
>>>>>> Stamatis
>>>>>>
>>>>>> [1] https://blogs.apache.org/infra/entry/jira-public-signup-disabled
>>>>>> [2] https://lists.apache.org/thread/5odg6wyvwfkryk96ls2w3vxnrkftw50s
>>>>>>
>>>>>

Re: [ANNOUNCE] New committer for Apache Hive: Laszlo Vegh

2023-02-08 Thread Stamatis Zampetakis

Congratulations Laszlo!

ACID and compactions are a complex beast and the slightest problem there
can have a huge impact on the system.
Many thanks for all your work in this area that makes the life of the rest
of us much easier.

Best,
Stamatis

On Wed, Feb 8, 2023 at 9:46 AM Akshat m  wrote:

> Congratulations Laszlo, Very well deserved :)
>
> Regards,
> Akshat Mathur
>
> On Tue, Feb 7, 2023 at 9:08 PM Sai Hemanth Gantasala
>  wrote:
>
>> Congratulations Laszlo Vegh, Great work on the compaction stuff!!
>>
>> Thanks,
>> Sai.
>>
>> On Tue, Feb 7, 2023 at 4:24 AM Naveen Gangam 
>> wrote:
>>
>> > The Project Management Committee (PMC) for Apache Hive has invited Laszlo
>> > Vegh (veghlaci05) to become a committer and we are pleased
>> > to announce that he has accepted.
>> >
>> > Contributions from Laszlo:
>> >
>> > He has authored 25 patches. Significant contributions to stabilization of
>> > ACID compaction. Helped review other patches as well.
>> >
>> > https://github.com/apache/hive/pulls?q=is%3Amerged+is%3Apr+author%3Aveghlaci05
>> >
>> > Being a committer enables easier contribution to the project since there
>> > is no need to go via the patch submission process. This should enable
>> > better productivity. A PMC member helps manage and guide the direction of
>> > the project.
>> >
>> > Congratulations
>> > Hive PMC
>> > Hive PMC
>> >
>>
>

Re: unsubscribe

2023-02-01 Thread Stamatis Zampetakis

Hi Darren,

In order to unsubscribe from the list, you need to send an email to:
user-unsubscr...@hive.apache.org
The same holds for other mailing lists as well [1].

Best,
Stamatis

[1] https://hive.apache.org/community/mailinglists/

On Wed, Feb 1, 2023 at 11:34 AM Darren Beckstand 
wrote:

> unsubscribe
>
> On Tue, Jan 31, 2023, 10:59 PM  wrote:
>
>> unsubscribe
>>
>

Re: [ANNOUNCE] New PMC Member: Krisztian Kasa

2023-01-31 Thread Stamatis Zampetakis

Krisztian's impact on the project has been immense, particularly in areas
such as rewriting, view maintenance, Iceberg integration, sub-query
processing, and top-k pushdown, to name a few.
His contributions go well beyond what the numbers can indicate.

Keep up the amazing work, Krisztian!

Best,
Stamatis

On Tue, Jan 31, 2023 at 7:58 AM Akshat m  wrote:

> Congratulations Krisztian :)
>
> Regards,
> Akshat
>
> On Mon, Jan 30, 2023 at 10:23 PM Alessandro Solimando <
> alessandro.solima...@gmail.com> wrote:
>
>> Congratulations Krisztian, very well deserved! :)
>>
>> On Mon, 30 Jan 2023 at 17:34, László Bodor 
>> wrote:
>>
>>> Yay! Very well deserved. Krisztian has a broad knowledge of Hive and an
>>> extremely deep level of experience with the compiler itself (which is a
>>> huge beast we all know), looking forward to seeing further contributions!
>>>
>>> Naveen Gangam wrote (on Mon, 30 Jan 2023,
>>> 17:23):
>>>
>>>> Hello Hive Community,
>>>> Apache Hive PMC is pleased to announce that Krisztian Kasa (username:
>>>> krisztiankasa) has accepted the Apache Hive PMC's invitation to become
>>>> PMC Member, and is now our newest PMC member. Please join me in
>>>> congratulating Krisztian !!!
>>>>
>>>> He has been an active member in the hive community across many aspects of
>>>> the project. Many thanks to Krisztian for all the contributions he has made
>>>> and looking forward to many more future contributions in the expanded role.
>>>>
>>>> https://github.com/apache/hive/commits?author=kasakrisz
>>>>
>>>> * 162 commits in master
>>>> * 124 reviews in master
>>>> * Reported 159 JIRAS
>>>>
>>>> Cheers,
>>>> Naveen (on behalf of Hive PMC)
>>>>
>>>

Re: [ANNOUNCE] New PMC Member: Laszlo Bodor

2023-01-30 Thread Stamatis Zampetakis

While the numbers give some insight, they do not tell the complete story of
how much Laszlo has helped drive the project forward.

Congrats Laszlo and thanks for everything that you have done for the
project.

Best,
Stamatis

On Sun, Jan 29, 2023 at 11:05 PM Sai Hemanth Gantasala <
saihema...@cloudera.com> wrote:

> Congratulations Laszlo!!
>
> On Sat, Jan 28, 2023 at 8:42 AM Simhadri G  wrote:
>
>> Congratulations Laszlo Bodor! :)
>>
>>
>>
>> On Sat, 28 Jan 2023, 20:26 Akshat m,  wrote:
>>
>>> Congratulations Laszlo
>>>
>>> Regards,
>>> Akshat
>>>
>>> On Sat, Jan 28, 2023 at 3:03 AM Naveen Gangam
>>> 
>>> wrote:
>>>
>>> > Hello Hive Community,
>>> > Apache Hive PMC is pleased to announce that Laszlo Bodor
>>> > (username:abstractdog) has accepted the Apache Hive PMC's invitation to
>>> > become PMC Member, and is now our newest PMC member. Please join me in
>>> > congratulating Laszlo !!!
>>> >
>>> > He has been an active member in the hive community across many aspects of
>>> > the project. Many thanks to Laszlo for all the contributions he has made
>>> > and looking forward to many more future contributions in the expanded role.
>>> >
>>> > https://github.com/apache/hive/commits?author=abstractdog
>>> >
>>> > * 96 commits in master [2]
>>> > * 66 reviews in master [3]
>>> > * Reported 163 JIRAS [6]
>>> >
>>> > Cheers,
>>> > Naveen (on behalf of Hive PMC)
>>> >
>>>
>>

Re: [ANNOUNCE] New PMC Member: Stamatis Zampetakis

2023-01-16 Thread Stamatis Zampetakis

Thanks everyone! I am very glad and honoured to join the PMC.

I really enjoy being part of this community and it is great interacting
with all of you on a daily basis; thank you for being part of this!

Best,
Stamatis

On Mon, Jan 16, 2023 at 2:12 PM Jiajun Xie 
wrote:

> Congratulations Stamatis :)
> Very well deserved!!!
>
> On Mon, 16 Jan 2023 at 13:51, Krisztian Kasa 
> wrote:
>
> > Congratulations Stamatis :)
> >
> > On Mon, Jan 16, 2023 at 6:27 AM S T  wrote:
> >
> > > Congrats Stamatis.
> > >
> > > Thanks
> > >
> > > On Sat, 14 Jan 2023 at 00:03, Naveen Gangam 
> > wrote:
> > >
> > >> Hello Hive Community,
> > >> Apache Hive PMC is pleased to announce that Stamatis Zampetakis has
> > >> accepted the Apache Hive PMC's invitation to become PMC Member, and is now
> > >> our newest PMC member. Please join me in congratulating Stamatis !!!
> > >>
> > >> He has been an active member in the hive community across many aspects of
> > >> the project. Many thanks to Stamatis for all the contributions he has made
> > >> and looking forward to many more future contributions in the expanded role.
> > >>
> > >> Cheers,
> > >> Naveen (on behalf of Hive PMC)
> > >>
> > >
> >
>

Re: Proposal: Revamp Apache Hive website.

2023-01-09 Thread Stamatis Zampetakis

Hi everyone,

Simhadri has been working hard to modernize the Hive website (HIVE-26565)
for the past few months and I am quite happy with the results.

I reviewed the respective PR [1] and will commit the changes in 24h unless
there are objections.

Best,
Stamatis

[1] https://github.com/apache/hive-site/pull/2

On Wed, Oct 5, 2022 at 8:46 PM Simhadri G  wrote:

> Thanks for the feedback Stamatis !
>
>    - I have updated the PR to include a README.md file with instructions
>    to build and view the site locally after making any new changes. This will
>    help us preview the changes locally before pushing the commit. (Docker is
>    not required here.)
>
>    - GitHub Pages was used to share the new website with the community
>    and it will most likely not be necessary later on.
>
>    - Regarding the role of GitHub Actions (gh-pages.yml):
>       - Whenever a PR is merged to the main branch, a GitHub Action is
>       triggered.
>       - The GitHub Action installs Hugo and builds the site with the new
>       changes. Once the build is successful, Hugo generates a set of static
>       files, and these files are automatically merged to the hive-site/asf-site
>       branch by the GitHub Actions bot.
>       - From here, to publish hive-site/asf-site to the project website
>       sub-domain (hive.apache.org), we need to set up a configuration
>       block called publish in our .asf.yaml file. (
>       https://cwiki.apache.org/confluence/display/INFRA/Git+-+.asf.yaml+features#Git.asf.yamlfeatures-Publishingabranchtoyourprojectwebsite).
>       - We will need help from Apache Infra - gmcdonald
>       <https://github.com/apache/hive-site/commits?author=gmcdonald> or
>       Humbedooh
>       <https://github.com/apache/hive-site/commits?author=Humbedooh> to
>       make sure that we have set this up correctly.
>
>    - I agree with your suggestion to keep the changes around the
>    revamp as minimal as possible and not mix the content update with the
>    framework change. In this case, we can make the other changes incrementally
>    at a later stage.
>
>
> Thanks!
> Simhadri G
>
> On Wed, Oct 5, 2022 at 3:41 PM Stamatis Zampetakis 
> wrote:
>
>> Thanks for staying on top of this Simhadri.
>>
>> I will try to help reviewing the PR once I get some time.
>>
>> What is not yet clear to me from this discussion or by looking at the PR
>> is the workflow for making a change appear on the web (
>> https://hive.apache.org/). Having a README which clearly states what
>> needs to be done is a must.
>>
>> I also think it is quite important to have instructions and possibly
>> docker images for someone to be able to test how the changes look locally
>> before committing a change to the repo.
>>
>> Another point that needs clarification is the role of github pages. I am
>> not sure why it is necessary at the moment and what exactly is the plan
>> going forward. If I understand correctly, currently it is used to preview the
>> changes but from my perspective we shouldn't need to commit something to
>> the repo to understand if something breaks or not; preview should happen
>> locally.
>>
>> I would suggest keeping the changes around the revamp as minimal as
>> possible and not mix the content update with the framework change. As
>> usual, smaller changes are easier to review and merge. It is definitely
>> worth updating and improving the content but let's do it incrementally so
>> that changes can get merged faster.
>>
>> The list of committers and PMC members for Hive can be found in the
>> apache phonebook [1]. The list can easily get outdated so maybe we can
>> consider adding links to [1] and/or github and other places instead of
>> duplicating the content. Anyway, let's first deal with the revamp and
>> discuss content changes later in separate JIRAs/PRs.
>>
>> Best,
>> Stamatis
>>
>> [1] https://home.apache.org/phonebook.html?project=hive
>>
>> On Sun, Oct 2, 2022 at 2:41 AM Simhadri G  wrote:
>>
>>> Hello Everyone,
>>>
>>> I have raised the PR for the revamped Hive Website here:
>>>  https://github.com/apache/hive-site/pull/2
>>>
>>> Could someone kindly help review this PR?
>>>
>>> Until the PR is merged, you can find the updated website here . Please
>>> have a look and any feedback is most welcome :)
>>> https://simhadri-g.github.io/hive-site/
>>>
>>> Few other things to note:
>>>
>>>- We will need help from someone who has write access to hive-site
>>>repo to update the gi

Re: [EXTERNAL] Re: [ANNOUNCE] New PMC Member: Ayush Saxena

2022-12-19 Thread Stamatis Zampetakis

Congrats Ayush! Very well deserved!

Thanks for all the hard work that you are putting into the project and
always being there when people ask for help.

Best,
Stamatis

On Tue, Dec 20, 2022 at 7:51 AM Sankar Hariappan via user <
user@hive.apache.org> wrote:

> Congrats Ayush!
>
> Thanks,
> Sankar
>
> *From:* Simhadri G
> *Sent:* Tuesday, December 20, 2022 12:16 PM
> *To:* user@hive.apache.org
> *Cc:* dev ; ayushsax...@apache.org
> *Subject:* [EXTERNAL] Re: [ANNOUNCE] New PMC Member: Ayush Saxena
>
> Congratulations Ayush
>
> On Tue, 20 Dec 2022, 06:42 Naveen Gangam,  wrote:
>
> Hello Hive Community,
>
> Apache Hive PMC is pleased to announce that Ayush Saxena has accepted the
> Apache Hive PMC's invitation to become PMC Member, and is now our newest
> PMC member. Many thanks to Ayush for all the contributions he has made and
> looking forward to many more future contributions in the expanded role.
>
> Please join me in congratulating Ayush !!!
>
> Cheers,
> Naveen (on behalf of Hive PMC)
>

Re: Roadmap information

2022-11-29 Thread Stamatis Zampetakis

Hi Cristian,

4.0.0-alpha-2 was released on 16 November 2022. The next scheduled
release is most likely the stable 4.0.0 [1].
The usual release cadence is 4 to 6 months, so the expected date should be
somewhere between March and June 2023.
A not-so-recent discussion about the content of 4.0.0 can be found here [2].

Best,
Stamatis

[1] https://issues.apache.org/jira/browse/HIVE-26645
[2] https://lists.apache.org/thread/n245dd23kb2v3qrrfp280w3pto89khxj

On Mon, Nov 28, 2022 at 9:54 AM Cristian Astorino <
cristian.astor...@gmail.com> wrote:

> Hi,
>
> Is there a roadmap or a release date for Hive 4.0.0?
>
>
> Thank you,
>
> Cristian
>

Re: Issue for discussion (slight change in behavior): HIVE-26683

2022-11-22 Thread Stamatis Zampetakis

Hello,

Regarding the inconsistency you describe in the window function, it does
seem to be a bug. However, I would double-check with the SQL standard
to be sure there is no intentional deviation and/or test the query in
other DBMSs.

As far as the behavior of the aggregate function SUM on string/varchar
types is concerned, the SQL standard forbids this operation (small extract
below).

10.9 <aggregate function>

Syntax Rules
5g) If SUM or AVG is specified, then:
i) DT shall be a numeric type or an interval type.

General Rules
6)d)v) If SUM is specified, then the result is the sum of the values in
TXA. If the sum is not within the range of the declared type of the result,
then an exception condition is raised: data exception — numeric value out
of range.

As you observed, Postgres is in line with the standard and forbids this
operation, but this is not the case for every DBMS. Note that Hive is closer
to MySQL than it is to Postgres, so in many cases it makes sense to use MySQL
as a reference.
Below are the results on MySQL Community Server 8.0.27.

select sum('a') from tblstrcol;
+----------+
| sum('a') |
+----------+
|        0 |
+----------+

select sum('a') from tblstrcol where false;
+----------+
| sum('a') |
+----------+
|     NULL |
+----------+

When there are rows, the result of SUM is zero, and when the result set is
empty it is NULL; thus I am a bit skeptical about changing the existing behavior.
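A toy Python model of the two behaviours discussed in this thread may help (a sketch only, not Hive's or MySQL's implementation). NULL inputs are eliminated before aggregation, so an empty input or an all-NULL window frame yields NULL (`None`). Non-NULL strings go through a hypothetical `mysql_like_number` helper that approximates MySQL's implicit string-to-number cast (leading numeric prefix, otherwise 0); `sql_sum` is likewise a hypothetical name.

```python
import re
from typing import Iterable, Optional, Union

Value = Union[int, float, str, None]

# Leading numeric prefix of a string, e.g. "12abc" -> "12". This approximates
# MySQL-style implicit casts for illustration only; it is not Hive code.
_NUMERIC_PREFIX = re.compile(r"\s*[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?")


def mysql_like_number(v: Union[int, float, str]) -> float:
    """Coerce a non-NULL value to a number; strings without a numeric prefix become 0."""
    if isinstance(v, (int, float)):
        return float(v)
    m = _NUMERIC_PREFIX.match(v)
    return float(m.group(0)) if m else 0.0


def sql_sum(values: Iterable[Value]) -> Optional[float]:
    """SUM with SQL NULL elimination: NULLs are dropped first, and if nothing
    remains (empty input or an all-NULL window frame) the result is NULL (None)."""
    non_null = [mysql_like_number(v) for v in values if v is not None]
    if not non_null:
        return None
    return sum(non_null)


print(sql_sum(["a", "a"]))    # rows present, non-numeric strings -> 0.0
print(sql_sum([]))            # empty result set -> None (NULL)
print(sql_sum([None, None]))  # all-NULL frame -> None (NULL)
```

Under this model, returning NULL for an all-NULL frame (whether "preceding" or "following") and 0 for `sum('a')` over existing rows are both consistent, which matches the MySQL results above.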

Best,
Stamatis

On Mon, Nov 21, 2022 at 3:53 PM Stephen Carlin  wrote:

> I wanted to throw this one out for discussion: a bug I found and how to
> fix it...
>
> We are inconsistent in how we handle sum() in windowing functions.
> If all the rows are null and the rows are all "preceding" rows, we
> return NULL. On "following" rows, however, if all the rows are null, we
> return 0. This is inconsistent, and I have a fix so that we always
> return NULL. The fix is here (not yet reviewed):
> https://github.com/apache/hive/pull/3789
>
> My question, though, concerns a different problem, which you can see in
> the patch I uploaded. My current fix changes the behavior of the following
> statement: "select sum('a') from my_table". If my_table has rows, right
> now we return 0.0.
>
> I've looked at Postgres and it doesn't even allow a sum on a string column,
> so I can't really compare with that database. My current fix doesn't
> disable this, but it does change the behavior to return NULL for this select.
>
> I kind of feel that returning NULL is more correct than returning 0, but I
> wanted to throw this out there to see what y'all think. This would be a
> change in behavior and that makes me nervous.
>
> Thanks!
>

Re: [DISCUSS] Jira Public Signup Disabled

2022-11-17 Thread Stamatis Zampetakis

The jira-reque...@hive.apache.org mailing list has been created and I added
instructions on how to request a JIRA account to the wiki [1]. Feel free to
improve them as you see fit!

Best,
Stamatis

[1]
https://cwiki.apache.org/confluence/display/Hive/HowToContribute#HowToContribute-JIRA

On Tue, Nov 15, 2022 at 9:59 PM Stamatis Zampetakis 
wrote:

> Logged https://issues.apache.org/jira/browse/INFRA-23905 for the creation
> of the new mailing list.
>
> On Tue, Nov 15, 2022 at 9:57 PM Abhay Chennagiri 
> wrote:
>
>> +1, Thank you, Stamatis.
>>
>> On Tue, Nov 15, 2022 at 12:42 PM Pravin Sinha 
>> wrote:
>>
>>> +1, Thanks, Stamatis.
>>>
>>> -Pravin
>>>
>>> On Tue, Nov 15, 2022 at 5:57 PM Stamatis Zampetakis 
>>> wrote:
>>>
>>>> Hi everyone,
>>>>
>>>> Due to the large number of spam accounts being created, the ASF INFRA
>>>> team has disabled public JIRA account creation [1].
>>>>
>>>> From the 11th of November, contributors who wish to have a JIRA account
>>>> (to create, assign, watch, etc. issues) will need to request an account
>>>> through an ASF PMC.
>>>>
>>>> Other projects, such as Calcite, have already taken the necessary
>>>> actions to streamline the process for new contributors [2].
>>>>
>>>> I would suggest drawing inspiration from Calcite and taking similar
>>>> actions in Hive.
>>>>
>>>> If you all agree, we can start by creating a dedicated (private) mailing
>>>> list for such requests:
>>>> jira-reque...@hive.apache.org
>>>>
>>>> and then briefly document the process in the wiki or on the
>>>> website.
>>>>
>>>> What do you think?
>>>>
>>>> Best,
>>>> Stamatis
>>>>
>>>> [1] https://blogs.apache.org/infra/entry/jira-public-signup-disabled
>>>> [2] https://lists.apache.org/thread/5odg6wyvwfkryk96ls2w3vxnrkftw50s
>>>>
>>>

Re: [ANNOUNCE] Apache Hive 4.0.0-alpha-2 Released

2022-11-17 Thread Stamatis Zampetakis

Many thanks to everyone who made this release happen and especially Denys
for leading this effort!

Best,
Stamatis

On Wed, Nov 16, 2022 at 5:25 PM Denys Kuzmenko  wrote:

> The Apache Hive team is proud to announce the release of Apache Hive
> version 4.0.0-alpha-2.
>
> The Apache Hive (TM) data warehouse software facilitates querying and
> managing large datasets residing in distributed storage. Built on top
> of Apache Hadoop (TM), it provides, among others:
>
> * Tools to enable easy data extract/transform/load (ETL)
>
> * A mechanism to impose structure on a variety of data formats
>
> * Access to files stored either directly in Apache HDFS (TM) or in other
>   data storage systems such as Apache HBase (TM)
>
> * Query execution via Apache Hadoop MapReduce, Apache Tez and Apache
> Spark frameworks.
>
> For Hive release details and downloads, please visit:
> https://hive.apache.org/downloads.html
>
> Hive 4.0.0-alpha-2 Release Notes are available here:
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12351489=Html=12310843
>
> We would like to thank the many contributors who made this release
> possible.
>
> Regards,
>
> The Apache Hive Team
>

Re: [DISCUSS] Jira Public Signup Disabled

2022-11-15 Thread Stamatis Zampetakis

Logged https://issues.apache.org/jira/browse/INFRA-23905 for the creation
of the new mailing list.

On Tue, Nov 15, 2022 at 9:57 PM Abhay Chennagiri 
wrote:

> +1, Thank you, Stamatis.
>
> On Tue, Nov 15, 2022 at 12:42 PM Pravin Sinha 
> wrote:
>
>> +1, Thanks, Stamatis.
>>
>> -Pravin
>>
>> On Tue, Nov 15, 2022 at 5:57 PM Stamatis Zampetakis 
>> wrote:
>>
>>> Hi everyone,
>>>
>>> Due to the large number of spam accounts being created, the ASF INFRA
>>> team has disabled public JIRA account creation [1].
>>>
>>> From the 11th of November, contributors who wish to have a JIRA account
>>> (to create, assign, watch, etc. issues) will need to request an account
>>> through an ASF PMC.
>>>
>>> Other projects, such as Calcite, have already taken the necessary
>>> actions to streamline the process for new contributors [2].
>>>
>>> I would suggest drawing inspiration from Calcite and taking similar
>>> actions in Hive.
>>>
>>> If you all agree, we can start by creating a dedicated (private) mailing
>>> list for such requests:
>>> jira-reque...@hive.apache.org
>>>
>>> and then briefly document the process in the wiki or on the
>>> website.
>>>
>>> What do you think?
>>>
>>> Best,
>>> Stamatis
>>>
>>> [1] https://blogs.apache.org/infra/entry/jira-public-signup-disabled
>>> [2] https://lists.apache.org/thread/5odg6wyvwfkryk96ls2w3vxnrkftw50s
>>>
>>

[DISCUSS] Jira Public Signup Disabled

2022-11-15 Thread Stamatis Zampetakis

Hi everyone,

Due to the large number of spam accounts being created, the ASF INFRA team
has disabled public JIRA account creation [1].

From the 11th of November, contributors who wish to have a JIRA account (to
create, assign, watch, etc. issues) will need to request an account through
an ASF PMC.

Other projects, such as Calcite, have already taken the necessary actions
to streamline the process for new contributors [2].

I would suggest drawing inspiration from Calcite and taking similar actions
in Hive.

If you all agree, we can start by creating a dedicated (private) mailing
list for such requests:
jira-reque...@hive.apache.org

and then briefly document the process in the wiki or on the
website.

What do you think?

Best,
Stamatis

[1] https://blogs.apache.org/infra/entry/jira-public-signup-disabled
[2] https://lists.apache.org/thread/5odg6wyvwfkryk96ls2w3vxnrkftw50s

Re: Consider using bi-directional links in JIRA

2022-10-21 Thread Stamatis Zampetakis

I added a few sentences about this to the JIRA Guidelines [1].

Best,
Stamatis

[1] https://cwiki.apache.org/confluence/display/Hive/HowToContribute

On Thu, Oct 20, 2022 at 4:56 AM Naveen Gangam  wrote:

> +1. I find this very useful to know the dependencies/relationships. Thank
> you for bringing this up.
>
> On Fri, Oct 14, 2022 at 5:06 AM Stamatis Zampetakis 
> wrote:
>
>> Hi all,
>>
>> This is a small tip/reminder for everyone using JIRA.
>>
>> It is very common and convenient to refer to other tickets by adding the
>> HIVE-X pattern in summary, description, and comments.
>>
>> The pattern allows someone to navigate quickly to an older JIRA from the
>> current one but not the other way around.
>>
>> Ideally, along with the mention (HIVE-X) pattern, it helps to add an
>> explicit link (relates to, causes, depends upon, etc.) so that the
>> relationship between tickets is visible from both ends.
>>
>> This is extremely useful when we are reporting a regression/breaking
>> change from a past commit, but it helps in other cases as well.
>>
>> Best,
>> Stamatis
>>
>

Consider using bi-directional links in JIRA

2022-10-14 Thread Stamatis Zampetakis

Hi all,

This is a small tip/reminder for everyone using JIRA.

It is very common and convenient to refer to other tickets by adding the
HIVE-X pattern in summary, description, and comments.

The pattern allows someone to navigate quickly to an older JIRA from the
current one but not the other way around.

Ideally, along with the mention (HIVE-X) pattern, it helps to add an
explicit link (relates to, causes, depends upon, etc.) so that the
relationship between tickets is visible from both ends.

This is extremely useful when we are reporting a regression/breaking change
from a past commit, but it helps in other cases as well.
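Adding those explicit links by hand in the JIRA UI is the simplest route. For anyone who scripts against Jira, here is a minimal Python sketch (hypothetical helpers, not part of any Hive tooling) that collects HIVE-NNNN mentions from a piece of text and builds the JSON body for Jira's REST issue-link endpoint (`POST /rest/api/2/issueLink`). The "Relates" link type and the ticket keys used below are assumptions for illustration, and no request is actually sent.

```python
import json
import re

# Ticket mentions such as "HIVE-20006"; the project key is hard-coded for illustration.
MENTION = re.compile(r"\bHIVE-\d+\b")


def mentioned_tickets(text: str, current: str) -> list[str]:
    """Return the distinct tickets mentioned in `text`, in order, excluding `current`."""
    found: list[str] = []
    for key in MENTION.findall(text):
        if key != current and key not in found:
            found.append(key)
    return found


def link_payload(current: str, other: str, link_type: str = "Relates") -> str:
    """Build a JSON body for Jira's issue-link endpoint. For directional link
    types, which issue is inward and which is outward matters."""
    return json.dumps({
        "type": {"name": link_type},
        "inwardIssue": {"key": current},
        "outwardIssue": {"key": other},
    })


# Hypothetical ticket keys, used only to show the flow.
comment = "Regression caused by HIVE-11111; see also HIVE-22222."
for other in mentioned_tickets(comment, current="HIVE-33333"):
    print(link_payload("HIVE-33333", other))
```

The regular expression only finds the mentions; deciding the right link type (relates to, causes, depends upon, etc.) still requires a human.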

Best,
Stamatis

Re: Proposal: Revamp Apache Hive website.

2022-10-05 Thread Stamatis Zampetakis

Thanks for staying on top of this, Simhadri.

I will try to help review the PR once I get some time.

What is not yet clear to me from this discussion, or from looking at the PR,
is the workflow for making a change appear on the web (https://hive.apache.org/).
Having a README which clearly states what needs to be done is a must.

I also think it is quite important to have instructions, and possibly docker
images, so that someone can test how the changes look locally before
committing a change to the repo.

Another point that needs clarification is the role of GitHub Pages. I am
not sure why it is necessary at the moment and what exactly the plan is
going forward. If I understand correctly, it is currently used to preview
the changes, but from my perspective we shouldn't need to commit something
to the repo to find out whether something breaks; previewing should happen
locally.

I would suggest keeping the changes around the revamp as minimal as
possible and not mixing the content update with the framework change. As
usual, smaller changes are easier to review and merge. It is definitely
worth updating and improving the content, but let's do it incrementally so
that changes can get merged faster.

The list of committers and PMC members for Hive can be found in the Apache
phonebook [1]. The list can easily get outdated, so maybe we can consider
adding links to [1] and/or GitHub and other places instead of duplicating
the content. Anyway, let's first deal with the revamp and discuss content
changes later in separate JIRAs/PRs.

Best,
Stamatis

[1] https://home.apache.org/phonebook.html?project=hive

On Sun, Oct 2, 2022 at 2:41 AM Simhadri G  wrote:

> Hello Everyone,
>
> I have raised the PR for the revamped Hive Website here:
>  https://github.com/apache/hive-site/pull/2
>
> I would kindly ask if someone could help review this PR.
>
> Until the PR is merged, you can find the updated website here. Please
> have a look; any feedback is most welcome :)
> https://simhadri-g.github.io/hive-site/
>
> Few other things to note:
>
>- We will need help from someone who has write access to the hive-site
>repo to update the GitHub workflow once the PR is merged.
>- One more important question: while moving the .md file to the new
>website, I came across this page (https://hive.apache.org/people.html),
>which lists the current PMC members and committers of Hive. I noticed
>that this list is not up to date; a lot of people seem to be missing from
>it. Where can I find the up-to-date list of committers and PMC members, so
>that I can refer to it and update the page?
>- Lastly, I plan to add a few more sections to the homepage soon. One
>of the sections I have in mind is an overview of all the Apache
>projects that use or integrate with Apache Hive. If there are any other
>suggestions in addition to this, please let me know.
>
>
> Thanks!
> Simhadri G
>
>
>
> On Sat, Sep 24, 2022 at 7:03 AM Simhadri G  wrote:
>
>> Thanks everyone,
>>
>>  I will begin with creating the PR and share the link in this thread soon.
>>
>> Thanks
>> Simhadri G
>>
>> On Sat, 24 Sep 2022, 04:52 Ayush Saxena,  wrote:
>>
>>> Thanks everyone,
>>> It has been almost a week and there don't seem to be any objections to
>>> starting the revamp task with the hive-site repo for now.
>>>
>>> Other things, as mentioned, can be followed up on, and we can ask folks
>>> to establish a PMC consensus for the further migration tasks if need be.
>>>
>>> Simhadri, it would be good to create a Jira, link the PR, and drop the
>>> link in this thread as well, so that interested people can leave
>>> suggestions regarding the design and content of the website there. For
>>> anything else, we can always come back here if we are blocked on
>>> something or if something more needs to be done in this context.
>>>
>>> -Ayush
>>>
>>> On 21-Sep-2022, at 6:35 PM, Stamatis Zampetakis 
>>> wrote:
>>>
>>>
>>> 
>>> The javadocs are currently in svn and they can remain there for the
>>> moment. Eventually, they could be moved to a hive-site repository and for
>>> sure we don't want them in the main hive repo. I don't see an immediate
>>> need to change the place where javadocs are stored but if needed we can
>>> raise a JIRA ticket and continue the discussion there. It's not a good idea
>>> to discuss under a closed issue/PR.
>>>
>>> The hive-site repo will always be the place for storing the
>>> generated website (html files etc). When you talk about moving back to the
>>> hive repo I guess you refer to the sour

Re: Proposal: Revamp Apache Hive website.

2022-09-21 Thread Stamatis Zampetakis

The javadocs are currently in svn and they can remain there for the moment.
Eventually, they could be moved to a hive-site repository and for sure we
don't want them in the main hive repo. I don't see an immediate need to
change the place where javadocs are stored but if needed we can raise a
JIRA ticket and continue the discussion there. It's not a good idea to
discuss under a closed issue/PR.

The hive-site repo will always be the place for storing the generated
website (HTML files, etc.). When you talk about moving back to the hive repo,
I guess you are referring to the source/markdown files. The decision to
change the process of publishing the website will probably require a PMC
vote with lazy consensus.

I agree that we can start by updating the current setup. Then we can kick
off the discussion about moving the website sources to the hive repo and
publishing from there. I don't know if we need to move the javadocs, so we
can postpone this discussion until we hit an obstacle.

Best,
Stamatis

On Mon, Sep 19, 2022 at 12:01 PM Simhadri G  wrote:

> Thanks Owen, Stamatis, Ayush and Alessandro for the feedback.
>
>- Regarding the javadocs and the discussion in the previous PR thread [1]
>about automatically building and deploying to GitHub Pages:
>   - Apache Iceberg-docs [2] has recently set up a GitHub workflow [3]
>   to publish the javadocs from a given javadocs dir [4]; I think we can
>   set up the same workflow for the Hive javadocs.
>   - As Ayush and Stamatis have mentioned, over the past 2 years Apache
>   Infra has added support for GitHub Actions, and we can confirm that
>   from the Apache Iceberg/Calcite docs that are currently using it.
>   - But I am not sure which branch or directory we will need to put
>   the Hive javadoc files in. This needs more discussion and we can
>   follow up on it [5].
>
>- I am not aware of the procedure or the approvals we need to move from
>the hive-site repo back to the main repository. We will need help with
>this.
>
>- I was able to set up the GitHub action on the POC repo:
>https://github.com/simhadri-g/hive-site/tree/new-site .
>- Any changes to this repo/new-site will automatically be reflected here
>once the GitHub workflow completes:
>https://simhadri-g.github.io/hive-site/ .
>
>- Considering the feedback, I think we can plan to do this in 3 phases:
>first, update the website in the present setup; second, move the
>javadocs to the hive-site repo; and third, migrate from the hive-site
>repo to the hive repo.
>
>- If everyone agrees, can we please go ahead with the first phase?
>
>
> [1] https://github.com/apache/hive/pull/1410
> [2] https://iceberg.apache.org/javadoc/latest/
> [3] https://github.com/apache/iceberg-docs/actions/runs/3062679467/jobs/4943928455
> [4] https://github.com/apache/iceberg-docs/tree/main/javadoc
> [5] https://github.com/apache/hive/pull/1410#issuecomment-680111530
> [6] https://github.com/apache/hive/pull/1410#issuecomment-680102815
>
>
> Thanks!
> Simhadri G
>
> On Mon, Sep 19, 2022 at 1:50 PM Alessandro Solimando <
> alessandro.solima...@gmail.com> wrote:
>
>> Hi everyone,
>> thanks Simhadri for pushing this forward.
>>
>> I like the look and feel of the new website, and I agree with Stamatis
>> that having the website sources in the Hive repo, and automatically
>> publishing the site upon commits would be very beneficial.
>>
>> Best regards,
>> Alessandro
>>
>> On Thu, 15 Sept 2022 at 23:11, Stamatis Zampetakis 
>> wrote:
>>
>>> Hi all,
>>>
>>> It's great to see some effort in improving the website. The POC from
>>> Simhadri looks really cool; I didn't check the content but I love the look
>>> and feel.
>>>
>>> Now regarding the current process for modifying and updating the website
>>> there is some info in this relatively recent thread [1].
>>>
>>> Moving forward, I would really like to have the source code of the
>>> website (markdown etc) in the main repo of the project [2], and use GitHub
>>> actions to automatically build and push the content to the site repo [3]
>>> per commit basis.
>>> This workflow is used in

Re: Proposal: Revamp Apache Hive website.

2022-09-15 Thread Stamatis Zampetakis

Hi all,

It's great to see some effort in improving the website. The POC from
Simhadri looks really cool; I didn't check the content but I love the look
and feel.

Now regarding the current process for modifying and updating the website
there is some info in this relatively recent thread [1].

Moving forward, I would really like to have the source code of the website
(markdown etc.) in the main repo of the project [2], and use GitHub Actions
to automatically build and push the content to the site repo [3] on a
per-commit basis.
This workflow is used in Apache Calcite and I find it extremely convenient.

Best,
Stamatis

[1] https://lists.apache.org/thread/4b6x4d6z4tgnv4mo0ycg30y4dlt0msbd
[2] https://github.com/apache/hive
[3] https://github.com/apache/hive-site

On Thu, Sep 15, 2022 at 10:50 PM Ayush Saxena  wrote:

> Owen,
> I am not sure I am understanding you correctly, but the repository for the
> website has changed: we no longer use our main *hive.git* repository for
> the website; we use the *hive-site* repository instead. The migration
> happened in January this year, I believe.
>
> You can check the set of commits from gmcdonald and Humbedooh here:
> https://github.com/apache/hive-site/commits/main
>
> Now, whatever you push to the main branch of hive-site
> (https://github.com/apache/hive-site) gets published to the *asf-site*
> branch by the buildbot (https://github.com/apache/hive-site/commits/asf-site).
>
> Simhadri's changes will be directed to the main branch of the hive-site
> repo and they will get auto-published to the asf-site branch; I tried this
> a couple of months back and it indeed worked that way. Let me know if we
> are missing anything. I tried to find threads about this but couldn't find
> any; they may be on private@. I will try again, and if something needs to
> be done, I will have a word with the Infra folks and get it sorted, if it
> isn't already.
>
> -Ayush
>
> On Fri, 16 Sept 2022 at 01:49, Owen O'Malley 
> wrote:
>
>> Look at the threads and talk to Apache Infra. They couldn't make it work
>> before. We would have needed to manually publish to the asf-site branch.
>>
>> On Thu, Sep 15, 2022 at 7:54 PM Simhadri G  wrote:
>>
>>> Thanks Ayush, Pau Tallada and Owen O'Malley for the feedback!
>>>
>>> @Owen, this website revamp indeed replaces the website with markdown, as
>>> you mentioned. I have referred to your PR for some of the content of the
>>> site.
>>> The actual code for the website is here:
>>> https://github.com/simhadri-g/hive-site/tree/new-site
>>>
>>> Once we add markdown files to the source code under /content/, Hugo
>>> will rebuild the files and generate the static HTML files in the ./public/
>>> directory.
>>> I have copied these static files over to a separate repo and temporarily
>>> hosted them with gh-pages to start the mail chain.
>>>
>>> For the final site, I am already trying to automate this with GitHub
>>> Actions. As soon as any new changes are made to the site branch, the
>>> GitHub Actions workflow will automatically trigger and update the site.
>>>
>>> Thanks!
>>>
>>> On Fri, Sep 16, 2022 at 12:17 AM Owen O'Malley 
>>> wrote:
>>>
 I found it - https://github.com/apache/hive/pull/1410

 On Thu, Sep 15, 2022 at 6:42 PM Owen O'Malley 
 wrote:

> I had a PR to replace the website with markdown. Apache Infra was
> supposed to make it autopublish. *sigh*
>
> .. Owen
>
> On Thu, Sep 15, 2022 at 4:23 PM Pau Tallada  wrote:
>
>> Hi,
>>
>> Great work!
>> +1 on updating it as well
>>
>> Message from Ayush Saxena on Thu, 15 Sep 2022 at 17:40:
>>
>>> Hi Simhadri,
>>> Thanks for the initiative, +1 on updating our current website.
>>> The new website looks way better than the existing one.
>>> We can create a Jira and link this to it after a couple of days if
>>> there aren't any objections to the move, so that people can drop further
>>> suggestions there.
>>>
>>> -Ayush
>>>
>>> > On 15-Sep-2022, at 8:33 PM, SG  wrote:
>>> >
>>> > Hi Everyone,
>>> >
>>> > The existing Apache Hive website https://hive.apache.org/ hasn't been
>>> > updated for a very long time. Additionally, I was not able to build the
>>> > docker image associated with the site to test out new changes.
>>> > https://github.com/apache/hive-site
>>> >
>>> > Since the website is the front page of the project, I believe it would be
>>> > good to revamp the Apache Hive website with the latest features and
>>> > releases.
>>> >
>>> > As a result, I have spent some time setting up an initial draft of the
>>> > website. There are still quite a few things that need to be
>>> >

Re: hive standalone can't find MaterializationsCacheCleanerTask

2022-09-07 Thread Stamatis Zampetakis

The task was removed by the following commit. Check HIVE-20006 and related
JIRAs for more details.

commit 1b5903b035c3b3ac02efbddf36d5438cda97cc91
Author: Jesus Camacho Rodriguez 
Date:   Tue Jun 26 11:37:27 2018 -0700

HIVE-20006: Make materializations invalidation cache work with multiple
active remote metastores (Jesus Camacho Rodriguez, reviewed by Ashutosh
Chauhan)

Best,
Stamatis

On Tue, Aug 30, 2022 at 4:59 AM second_comet.yahoo.com via user <
user@hive.apache.org> wrote:

> May I know whether MaterializationsCacheCleanerTask was removed in 3.1? I
> can find it in 3.0, but I can't find it in the latest version of Hive. Can
> you advise?
> <property>
>   <name>metastore.task.threads.always</name>
>   <value>org.apache.hadoop.hive.metastore.events.EventCleanerTask,org.apache.hadoop.hive.metastore.MaterializationsCacheCleanerTask</value>
> </property>
>
> Thank you
>
>
>
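If a configuration carried over from 3.0 still lists the removed class, that entry presumably needs to be dropped from `metastore.task.threads.always`. Below is a minimal Python sketch using the standard library's `xml.etree.ElementTree`, assuming the usual `<configuration>`/`<property>` layout of a Hive site file. The `drop_removed_tasks` helper and the `REMOVED_TASKS` set are hypothetical; the set only contains the class discussed in this thread.

```python
import xml.etree.ElementTree as ET

# Class named in this thread as removed (see HIVE-20006); extend if needed.
REMOVED_TASKS = {"org.apache.hadoop.hive.metastore.MaterializationsCacheCleanerTask"}
PROPERTY = "metastore.task.threads.always"


def drop_removed_tasks(config_xml: str) -> str:
    """Return the configuration with removed task classes filtered out of
    the comma-separated value of metastore.task.threads.always."""
    root = ET.fromstring(config_xml)
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() != PROPERTY:
            continue
        value = prop.find("value")
        if value is None:
            continue
        classes = [c.strip() for c in (value.text or "").split(",") if c.strip()]
        value.text = ",".join(c for c in classes if c not in REMOVED_TASKS)
    return ET.tostring(root, encoding="unicode")
```

ElementTree's default parser discards XML comments, so review the result before overwriting a real configuration file.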

Re: HiveServer2 slowly increasing background CPU usage until restarted

2022-09-07 Thread Stamatis Zampetakis

Hi Laurence,

It's hard to say just by looking at the graphs. Moreover, Hive 2.3.1 is quite
an old version, so there are many things that may go wrong.

I would suggest checking the logs and taking jstacks over time and/or using
a profiler (such as async-profiler [1]) to see what HS2 is actually doing
while CPU usage grows.
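Taking jstacks over time is easy to script. Here is a minimal Python sketch (a hypothetical helper, not part of Hive), assuming the JDK's `jstack` is on the PATH and is run as the same OS user as the HiveServer2 JVM. The `run` and `sleep` parameters exist only so that the loop can be exercised without a real JVM.

```python
import subprocess
import time
from pathlib import Path
from typing import Callable, List


def take_thread_dumps(pid: int, count: int, interval_s: float, out_dir: str,
                      run: Callable = subprocess.run,
                      sleep: Callable[[float], None] = time.sleep) -> List[Path]:
    """Take `count` thread dumps of JVM `pid`, `interval_s` seconds apart,
    writing each one to its own numbered file under `out_dir`."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    files: List[Path] = []
    for i in range(count):
        # `jstack -l <pid>` prints all thread stacks plus lock information.
        result = run(["jstack", "-l", str(pid)],
                     capture_output=True, text=True, check=True)
        path = out / f"hs2-{pid}-{i:03d}.txt"
        path.write_text(result.stdout)
        files.append(path)
        if i < count - 1:
            sleep(interval_s)  # no need to wait after the last dump
    return files
```

For example, `take_thread_dumps(pid=<HS2 pid>, count=10, interval_s=60, out_dir="hs2-dumps")` collects ten dumps one minute apart. Threads that stay RUNNABLE in the same frames across consecutive dumps while CPU keeps growing are natural candidates for a closer look with a profiler.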

Best,
Stamatis

[1] https://github.com/jvm-profiling-tools/async-profiler

On Mon, Sep 5, 2022 at 2:17 PM Laurence Brown via user 
wrote:

>
> Hi
>
> We’re using Hive 2.3.1. We recently migrated our production Amazon EC2
> instance types from r5.24xlarge to r6i.32xlarge.
>
> On the r6i instances we have seen steady CPU usage growth that can all be
> attributed to our org.apache.hive.service.server.HiveServer2 process.
>
> Even when this change is unreleased and this process is effectively
> unused, the CPU usage grows slowly until we restart the process.
>
> In the attached graph you can see that CPU usage grows until we restart
> HiveServer2; after that it remains stable for a while, and then usage
> starts growing on HiveServer2 again.
> After we restarted the process we failed back to our previous server
> (leaving this server unused), but the CPU usage of HiveServer2 on this
> server continued to grow.
>
> We’ve since built instances in dev with both r5 and r6i, and all the r6i
> instances have the above problem while none of the r5 instances do.
>
> Does anyone have any idea why this might be?
>

Re: [DISCUSS] Hive EOL question

2022-06-20 Thread Stamatis Zampetakis

Hi Guangming,

There was a recent discussion about EOL Hive releases [1] but it was not
conclusive.

Feel free to reopen that thread if you have some thoughts on the subject.

Best,
Stamatis

[1] https://lists.apache.org/thread/sxcrcf4v9j630tl9domp0bn4m33bdq0s

On Sun, Jun 19, 2022 at 11:20 AM Guangming Lu  wrote:

> Hi, does anyone know the EOL schedule for each Hive release? For example,
> when will 3.1.0 reach EOL?
>
> Best,
> Guangming
>

Re: [DISCUSS] Remove Druid dependency from Hive

2022-06-17 Thread Stamatis Zampetakis

Hi Simhadri,

Thanks for starting this discussion.

I am cc'ing the user list as well so that we have a better idea if there
are any active users.

Personally, I am not that familiar with the Druid module.
* Is it currently broken?
* Do we have active tests?
* Does it need significant effort to update the Druid version?

Best,
Stamatis

On Thu, Jun 16, 2022 at 10:33 AM SG  wrote:

> Hello Everyone,
>
> The last commits related to Druid were around early 2020 [1]. Since then,
> the version of Druid used by Hive has remained the same, 0.17.1 [2].
> Druid version 0.17.1 has a significant number of CVEs associated with it,
> some of which allow remote code execution. If no one is maintaining it or
> plans to do so in the near future, can we remove it from our code?
>
> Thoughts?
>
> -Simhadri
>
> [1] https://github.com/apache/hive/search?o=desc=druid=committer-date=commits
> [2] https://github.com/apache/hive/blob/0033675057a60d0a05a252854455e2b8835e89cc/pom.xml#L127
>

Re: [DISCUSS] End of life for Hive 1.x, 2.x, 3.x

2022-05-10 Thread Stamatis Zampetakis

>> Hi Team,
>>
>> My experience with the Iceberg community shows that there is a sizeable
>> user base around Hive 2.x. I have seen patches and contributions to the
>> Hive 2.3.x branches, and the tests are in much better shape there.
>>
>> I would definitely vote for EOL-ing Hive 1.x, but until we have a stable
>> 4.x, I would be cautious about slashing the 2.x and 3.x branches.
>>
>> Just my 2 cents.
>>
>> Peter
>>
>> On 2022. May 9., at 10:51, Alessandro Solimando <
>> alessandro.solima...@gmail.com> wrote:
>>
>> Hi Stamatis,
>> thanks for bringing up this topic; I basically agree with everything you
>> wrote.
>>
>> I just wanted to add that this kind of proposal might sound harsh,
>> because in many contexts upgrading is a complex process, but it's in
>> nobody's interest to keep release branches that are missing important
>> fixes/improvements and that might not meet the quality standards that
>> people expect, as mentioned.
>>
>> Since we don't yet have a stable 4.x release (only alpha for now), we
>> might want to keep supporting the 3.x branch until the first stable 4.x
>> release and EOL the < 3.x branches, WDYT?
>>
>> Best regards,
>> Alessandro
>>
>> On Fri, 6 May 2022 at 23:14, Stamatis Zampetakis 
>> wrote:
>>
>>
>> Hi all,
>>
>> The current master has many critical bug fixes as well as important
>> performance improvements that have not been backported (and most likely
>> never will be) to the maintenance branches.
>>
>> Backporting changes from master usually requires adapting the code and
>> tests in question, making it a non-trivial and time-consuming task.
>>
>> The ASF bylaws require PMCs to deliver high-quality software that
>> satisfies certain criteria. Cutting new releases from maintenance branches
>> with known critical bugs does not comply with these requirements.
>>
>> CI is unstable in all maintenance branches, making the quality of a
>> release questionable and merging new PRs rather difficult. Enabling and
>> running it frequently in all maintenance branches would require a large
>> amount of resources on top of what we already need for master.
>>
>> History has shown that it is very difficult or impossible to properly
>> maintain multiple release branches for Hive.
>>
>> I think it would be in the best interest of the project if the PMC
>> decided to drop support for maintenance branches and focused on releasing
>> exclusively from master.
>>
>> This mail is related to the discussion about the release cadence [1]
>> since it would certainly help make Hive releases more regular. I decided
>> to start a separate thread to avoid mixing multiple topics together.
>>
>> Looking forward to your thoughts.
>>
>> Best,
>> Stamatis
>>
>> [1] https://lists.apache.org/thread/n245dd23kb2v3qrrfp280w3pto89khxj
>>
>>
>>

[DISCUSS] End of life for Hive 1.x, 2.x, 3.x

2022-05-06 Thread Stamatis Zampetakis

Hi all,

The current master has many critical bug fixes as well as important
performance improvements that are not backported (and most likely never
will) to the maintenance branches.

Backporting changes from master usually requires adapting the code and
tests in questions making it a non-trivial and time consuming task.

The ASF bylaws require PMCs to deliver high quality software which satisfy
certain criteria. Cutting new releases from maintenance branches with known
critical bugs is not compliant with the ASF.

CI is unstable in all maintenance branches making the quality of a release
questionable and merging new PRs rather difficult. Enabling and running it
frequently in all maintenance branches would require a big amount of
resources on top of what we already need for master.

History has shown that it is very difficult or impossible to properly
maintain multiple release branches for Hive.

I think it would be in the best interest of the project if the PMC decided
to drop support for maintenance branches and focused on releasing
exclusively from master.

This mail is related to the discussion about the release cadence [1] since
it would certainly help make Hive releases more regular. I decided to
start a separate thread to avoid mixing multiple topics together.

Looking forward to your thoughts.

Best,
Stamatis

[1] https://lists.apache.org/thread/n245dd23kb2v3qrrfp280w3pto89khxj

Re: Is the web broken?

2022-03-15 Thread Stamatis Zampetakis

Hi,

From the discussion in INFRA-20776, it seems there has been some kind of
communication in the private Hive list. If it is possible to share this
information, please let us know.

Apart from that, please stop including the dev@calcite list in this thread.
I accidentally started that by cc'ing the wrong list in my previous email;
I meant to include dev@hive.

Best,
Stamatis

On Tue, Mar 15, 2022 at 8:54 PM Chao Sun  wrote:

> Hi Gavin,
>
> Do you have any idea why hive.apache.org is broken after the hive-site
> change?
>
> Thanks,
> Chao
>
> On Thu, Mar 10, 2022 at 7:16 PM Chao Sun  wrote:
> >
> > Sorry, we are in the process of migrating to hive-site.git from CMS.
> > There are a few issues to be fixed and I’m talking with ASF Infra on this.
> >
> > On Thu, Mar 10, 2022 at 6:05 PM 王道远(健身) wrote:
> >>
> >> Hi,
> >>
> >> Could we keep a redirect page at the old place, or a CNAME resolution?
> >>
> >> Best,
> >> Adrian
> >>
> >> --
> >> From: Stamatis Zampetakis
> >> Sent: Thursday, March 10, 2022 18:33
> >> To: user; dev
> >> Subject: Re: Is the web broken?
> >>
> >>
> >> Hi,
> >>
> >> I am not sure if I am missing something, but I get the impression [1]
> >> that the site from now on will be served from here:
> >> https://apache.github.io/hive-site/
> >>
> >> Best,
> >> Stamatis
> >>
> >> [1] https://issues.apache.org/jira/browse/INFRA-20776
> >>
> >> On Thu, Mar 10, 2022 at 10:21 AM Ming  wrote:
> >> I have the same situation
> >>
> >> Ming
> >> shezhimin...@gmail.com
> >>
> >> On 03/10/2022 17:12, Pau Tallada wrote:
> >> Is it only me, or is the https://hive.apache.org/ website showing a
> >> directory listing?!
> >>
> >> --
> >> --
> >> Pau Tallada Crespí
> >> Departament de Serveis
> >> Port d'Informació Científica (PIC)
> >> Tel: +34 93 170 2729
> >> --
> >>
>

Re: Hive 3 and Java 11 issue

2022-03-14 Thread Stamatis Zampetakis

I confirm what Pau said: the only supported JDK is 8. The upgrade to JDK 11
has started [1] but was paused due to various problems. I don't know if
anyone is actively working on fixing this at the moment.

Best,
Stamatis

[1] https://issues.apache.org/jira/browse/HIVE-22415

On Thu, Mar 10, 2022 at 10:34 AM Bitfox  wrote:

> That sounds bad. All our apps are running on JDK 11.
>
> On Thu, Mar 10, 2022 at 5:06 PM Pau Tallada  wrote:
>
>> I think only JDK 8 is supported so far.
>>
>> Message from Bitfox on Thu, 10 March 2022 at 2:39:
>>
>>> My Java version:
>>>
>>> openjdk version "11.0.13" 2021-10-19
>>>
>>>
>>> I can't run Hive 3.1.2.
>>>
>>> The error includes:
>>>
>>>
>>> Exception in thread "main" java.lang.ClassCastException: class
>>> jdk.internal.loader.ClassLoaders$AppClassLoader cannot be cast to class
>>> java.net.URLClassLoader (jdk.internal.loader.ClassLoaders$AppClassLoader
>>> and java.net.URLClassLoader are in module java.base of loader 'bootstrap')
>>>
>>>
>>> So I am asking: does Hive 3 not support Java 11 yet?
>>>
>>>
>>> Thanks.
>>>
>>
>>
>> --
>> --
>> Pau Tallada Crespí
>> Departament de Serveis
>> Port d'Informació Científica (PIC)
>> Tel: +34 93 170 2729
>> --
>>
>>
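The ClassCastException in this report is the typical symptom of code that casts the system class loader to java.net.URLClassLoader: the cast works on JDK 8, but on JDK 9 and later the application class loader is jdk.internal.loader.ClassLoaders$AppClassLoader, which is not a URLClassLoader. The standalone sketch below (the class name ClassPathProbe is made up; this is not Hive code) checks the loader type and, as one version-independent alternative, reads the class path from the java.class.path system property instead of from the loader.

```java
import java.io.File;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

public class ClassPathProbe {

    // Pre-JDK 9 code often did ((URLClassLoader) ClassLoader.getSystemClassLoader()).
    // This check is true on JDK 8 but false on JDK 9+, where the system loader is
    // jdk.internal.loader.ClassLoaders$AppClassLoader, so such a cast throws the
    // ClassCastException shown in the report.
    static boolean systemLoaderIsUrlClassLoader() {
        return ClassLoader.getSystemClassLoader() instanceof URLClassLoader;
    }

    // Version-independent alternative: read the class path entries from the
    // "java.class.path" system property instead of from the class loader.
    static List<String> classPathEntries() {
        List<String> entries = new ArrayList<>();
        String cp = System.getProperty("java.class.path", "");
        for (String entry : cp.split(File.pathSeparator)) {
            if (!entry.isEmpty()) {
                entries.add(entry);
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        System.out.println("java.specification.version = "
            + System.getProperty("java.specification.version"));
        System.out.println("system loader is a URLClassLoader: " + systemLoaderIsUrlClassLoader());
        System.out.println("class path entries: " + classPathEntries().size());
    }
}
```

Running it on JDK 8 and on JDK 11 shows the loader check flipping, which is why Hive 3.1.2 stays on JDK 8 until the upgrade work in HIVE-22415 is completed.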

Re: Is the web broken?

2022-03-10 Thread Stamatis Zampetakis

Hi,

I am not sure if I am missing something, but I get the impression [1] that
the site from now on will be served from here:
https://apache.github.io/hive-site/

Best,
Stamatis

[1] https://issues.apache.org/jira/browse/INFRA-20776

On Thu, Mar 10, 2022 at 10:21 AM Ming  wrote:

> I have the same situation
>
> Ming
> shezhimin...@gmail.com
>
>
> On 03/10/2022 17:12, Pau Tallada wrote:
>
> Is it only me, or is the https://hive.apache.org/ website showing a
> directory listing?!
>
> --
> --
> Pau Tallada Crespí
> Departament de Serveis
> Port d'Informació Científica (PIC)
> Tel: +34 93 170 2729
> --
>
>

Re: Release plans (relative to Log4j fixes/upgrades)

2022-03-02 Thread Stamatis Zampetakis

Hi Brent,

There are plans [1] to release 4.0.0-alpha-1 and 3.1.3, but no exact dates
yet. I don't know anything about 2.3.x.

Best,
Stamatis

[1] https://lists.apache.org/thread/xyvttddcjhk9ffg242wn1wkggsghpc5c

On Tue, Mar 1, 2022 at 9:36 PM Brent  wrote:

> Hi everyone,
>
> I've been trying to go through Jira issues and mailing list archives to
> understand ongoing plans for Log4j fixes/upgrades.  Assuming I have been
> digging properly, it seems like:
>
> Patched to Log4j 2.17.1 in 4.0.0 (unreleased):
>
>- https://issues.apache.org/jira/browse/HIVE-25839
>
> Patched to Log4j 2.16.0 in release 3.1.3 (unreleased):
>
>- https://issues.apache.org/jira/browse/HIVE-25810
>
> Patched to Log4j 2.17.0 in release 2.3.10 (unreleased):
>
>- https://issues.apache.org/jira/browse/HIVE-25824
>
>
> A couple of questions:
> - Does that seem accurate?
> - Are there expected release dates for any of those product lines?
>
> Thank you for all the hard work you all put in on Hive.
>
> ~Brent
>

Re: [DISCUSS] Properties for scheduling compactions on specific queues

2022-02-07 Thread Stamatis Zampetakis

Thanks Janos for the feedback.

If I understand correctly, your suggestion is to support all of the
properties below for table-level compactions and treat them as equivalent:
* compactor.mapred.job.queue.name
* compactor.mapreduce.job.queuename
* compactor.hive.compactor.job.queue

It is something that crossed my mind as well, but I am slightly skeptical
because this way we explicitly state that people are free to use whichever
they like. It might also result in MR properties affecting Tez (as already
happens to some extent with HIVE-25595), which from my perspective is not
that great. I also think that it will lead to more requests for accepting
these MR-specific properties in the query-based compactor, which cannot (and
probably never will) use MR as the underlying engine. We should also keep
in mind that the MR engine was deprecated ~6 years ago and the MR compactor
may follow soon.

I am fine with implementing this specific change (accepting all the
properties above) as long as one of the people contributing to the
compactor confirms it is the desired path going forward.

Best,
Stamatis


On Mon, Feb 7, 2022 at 11:50 AM Janos Kovacs  wrote:

> Hi Stamatis,
>
> I agree that the [compactor.]*hive.compactor.queue.name* is a better
> solution, as Hive now also supports query-based compaction, not only MR.
> ...although I think this needs to be backward compatible!
>
> What do you think about a logic similar to this:
>
> --- a/ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java	2022-02-07 10:31:28.0 +0100
> +++ b/ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java	2022-02-07 10:33:25.0 +0100
> @@ -145,10 +145,19 @@
>      overrideMRProps(job, t.getParameters()); // override MR properties from tblproperties if applicable
>      if (ci.properties != null) {
>        overrideTblProps(job, t.getParameters(), ci.properties);
>      }
>
> +    // make queue configuration backward compatible
> +    // at that point overrideMRProps and overrideTblProps already consolidated
> +    // the final value, just need to use job.get(TABLE_PROPS)
> +    String queueNameLegacy =
> +      (new StringableMap(job.get(TABLE_PROPS))).toProperties().getProperty("compactor.mapred.job.queue.name");
> +    if (queueNameLegacy != null && queueNameLegacy.length() > 0) {
> +      HiveConf.setVar(job, ConfVars.COMPACTOR_JOB_QUEUE, queueNameLegacy);
> +    }
> +
>      String queueName = HiveConf.getVar(job, ConfVars.COMPACTOR_JOB_QUEUE);
>      if (queueName != null && queueName.length() > 0) {
>        job.setQueueName(queueName);
>      }
>
>
> Of course, this can be wrapped with a new config if needed, like
> hive.compaction.queue.name.use.legacy or whatever...
> FYI: we might also want to check the legacy config not only for
> *"compactor.mapred.job.queue.name"* but also for
> *"compactor.mapreduce.job.queuename"*, as the first one was already on the
> deprecated list, as pointed out by Peter Vary.
>
> Please also note that the change introduced by HIVE-25595 is currently not
> compatible with the new config as it was developed for the old
> compactor.mapred... property:
>
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorUtil.java#L31
> This also needs to be handled - for both the new prop name and backward
> compatibility.
>
> R, Janos
>
>
> On 2022/01/31 09:50:49 Stamatis Zampetakis wrote:
> > Hi all,
> >
> > This email is an attempt to converge on which Hive/Tez/MR properties
> > someone should use in order to schedule a compaction on specific queues.
> > For those who are not familiar with how queues are used the YARN capacity
> > scheduler documentation [1] gives the general idea.
> >
> > Using specific queues for compaction jobs is necessary to be able to
> > efficiently allocate resources for maintenance tasks (compaction) and
> > production workloads. Hive provides various ways to control the queues used
> > by the compactor and there have been various tickets with improvements and
> > fixes in this area (see list below).
> >
> > The granularity we can select queues for compactions (all tables vs. per
> > table) currently depends on which compactor is in use (MR vs Query based)
> > and boils down to the following properties:
> >
> > Global configuration:
> > * hive.compactor.job.queue
> > * mapred.job.queue.name
> > * tez.queue.name
> >
> > Per table/statement configuration (table properties):
> > * compactor.mapred.job.queue.name (before HIVE-20723)
> > * co

[DISCUSS] Properties for scheduling compactions on specific queues

2022-01-31 Thread Stamatis Zampetakis

Hi all,

This email is an attempt to converge on which Hive/Tez/MR properties
someone should use in order to schedule a compaction on specific queues.
For those who are not familiar with how queues are used, the YARN Capacity
Scheduler documentation [1] gives the general idea.

Using specific queues for compaction jobs is necessary to be able to
efficiently allocate resources for maintenance tasks (compaction) and
production workloads. Hive provides various ways to control the queues used
by the compactor and there have been various tickets with improvements and
fixes in this area (see list below).

The granularity at which we can select queues for compactions (all tables
vs. per table) currently depends on which compactor is in use (MR vs.
query-based) and boils down to the following properties:

Global configuration:
* hive.compactor.job.queue
* mapred.job.queue.name
* tez.queue.name

Per table/statement configuration (table properties):
* compactor.mapred.job.queue.name (before HIVE-20723)
* compactor.hive.compactor.job.queue (after HIVE-20723)

It is a bit unclear which properties someone should use to achieve the
desired result. Some changes, such as HIVE-20723, raise backward
compatibility concerns, and other changes seem to have a larger impact than
the one they were specifically designed for. For example, after HIVE-25595,
MapReduce queue properties can have an impact on the compactor queues even
when Tez is in use.

In order to avoid confusion and ensure long-term support of these queue
selection features, we should clarify which of the above properties should
be used.

Given the current situation, I would propose to officially support only the
following:
* hive.compactor.job.queue
* compactor.hive.compactor.job.queue
and align the implementation based on these (if necessary). In other words,
Hive users should not use mapred.job.queue.name and tez.queue.name
explicitly at least when it comes to the compactor. Hive should set them
transparently (as it happens now in various places) based on
[compactor.]hive.compactor.job.queue.
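As a rough illustration of the proposed precedence, here is a standalone sketch that uses plain java.util.Map objects in place of Hadoop's Configuration and the table parameters. The class and method names are hypothetical and this is not the actual compactor code; it assumes the table-level value wins over the global one and shows the engine-specific queue properties (mapred.job.queue.name, tez.queue.name) being derived by Hive rather than set by users.

```java
import java.util.HashMap;
import java.util.Map;

public class CompactorQueueResolver {

    // The only two properties users would set under the proposal.
    static final String GLOBAL_QUEUE = "hive.compactor.job.queue";
    static final String TABLE_QUEUE = "compactor.hive.compactor.job.queue";

    // Table-level property takes precedence; otherwise fall back to the global
    // property; returns null when neither is set (engine default queue).
    static String resolveQueue(Map<String, String> globalConf, Map<String, String> tableProps) {
        String queue = tableProps.get(TABLE_QUEUE);
        if (queue == null || queue.isEmpty()) {
            queue = globalConf.get(GLOBAL_QUEUE);
        }
        return (queue == null || queue.isEmpty()) ? null : queue;
    }

    // Engine-specific properties derived transparently from the resolved queue.
    static Map<String, String> engineQueueProps(String queue) {
        Map<String, String> props = new HashMap<>();
        if (queue != null) {
            props.put("mapred.job.queue.name", queue);
            props.put("tez.queue.name", queue);
        }
        return props;
    }

    public static void main(String[] args) {
        Map<String, String> global = Map.of(GLOBAL_QUEUE, "maintenance");
        Map<String, String> table = Map.of(TABLE_QUEUE, "compaction_hot");
        System.out.println(engineQueueProps(resolveQueue(global, table)));    // table-level queue
        System.out.println(engineQueueProps(resolveQueue(global, Map.of()))); // global fallback
    }
}
```

Keeping the derivation in one place is the point of the proposal: whichever compactor runs (MR or query-based), it reads the same two Hive properties, and the MR/Tez queue names never need to be set by hand.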

What do people think? Are there other ideas?

Best,
Stamatis

[1] https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html

HIVE-11997: Add ability to send Compaction Jobs to specific queue
HIVE-13354: Add ability to specify Compaction options per table and per request
HIVE-20723: Allow per table specification of compaction yarn queue
HIVE-24781: Allow to use custom queue for query based compaction
HIVE-25801: Custom queue settings is not honoured by Query based compaction StatsUpdater
HIVE-25595: Custom queue settings is not honoured by compaction StatsUpdater

Re: Time to Remove Hive-on-Spark

2022-01-28 Thread Stamatis Zampetakis

Hi team,

Almost one year has passed since the last exchange in this discussion and,
if I am not wrong, there has been no effort to revive Hive-on-Spark. To be
more precise, I don't think I have seen any Spark-related JIRA for quite
some time now, and although I don't want to rush to conclusions, there
does not seem to be any community member involved in maintaining or adding
new features in this part of the code.

Keeping dead code in the repository does not do any good to the project and
puts a non-negligible burden on future maintainers.

Clearly, we cannot make a new Hive release where a major feature is
completely untested, so either someone commits to re-enabling/fixing the
respective tests soon or we move forward with the work started by David and
drop support for Hive-on-Spark.

I would like to ask the community: is there anyone who can take up this
maintenance task and enable/fix the Spark-related tests in the next month or so?

Best,
Stamatis

On Sat, Feb 27, 2021 at 4:17 AM Edward Capriolo wrote:

> I do not know how it works for most of the world, but in Cloudera, where the
> Tez options were never popular, Hive-on-Spark represents a solid way to get
> things done with lower latency for small datasets.
>
> As for Spark adoption: a while ago I came up with some ways to
> make Hive more Spark-like. One of them was a way to make "compile"
> a Hive keyword so folks could build UDFs on the fly. It was such an
> uphill climb. Folks found a way to disable it by default for security.
> Then later, when things moved from the CLI to Beeline, it was the ONLY thing
> that I found not ported. It was extremely frustrating.
>
> On Mon, Jul 27, 2020 at 3:19 PM David  wrote:
>
> > Hello Xuefu,
> >
> > I am not part of the Cloudera Hive product team, though I volunteer to
> > work on small projects from time to time. Perhaps someone from that team
> > can chime in with some of their thoughts, but personally, I think that in
> > the long run, there will be more of a merge between Hive-on-Spark and other
> > Spark-native offerings. I'm not sure what the differentiation will be
> > going forward. With that said, are there any developers on this mailing
> > list who are willing to take on the maintenance effort of keeping HoS
> > moving forward?
> >
> > http://www.russellspitzer.com/2017/05/19/Spark-Sql-Thriftserver/
> >
> > https://docs.cloudera.com/HDPDocuments/HDP2/HDP-2.6.4/bk_spark-component-guide/content/config-sts.html
> >
> >
> > Thanks.
> >
> > On Thu, Jul 23, 2020 at 12:35 PM Xuefu Zhang  wrote:
> >
> > > Previous reasoning seemed to suggest a lack of user adoption. Now we are
> > > concerned about ongoing maintenance effort. Both are valid considerations.
> > > However, I think we should have ways to find out the answers. Therefore, I
> > > suggest the following be carried out:
> > >
> > > 1. Send out the proposal (removing Hive on Spark) to users, including
> > > user@hive.apache.org, and get their feedback.
> > > 2. Ask if any developers on this mailing list are willing to take on the
> > > maintenance effort.
> > >
> > > I'm concerned about user impact because I can still see issues being
> > > reported on HoS from time to time. I'm more concerned about the future of
> > > Hive if we narrow Hive's neutrality on execution engines, which will possibly
> > > force more Hive users to migrate to other alternatives such as Spark SQL,
> > > which is already eroding Hive's user base.
> > >
> > > Being open and neutral used to be one of Hive's most admired strengths.
> > >
> > > Thanks,
> > > Xuefu
> > >
> > >
> > > On Wed, Jul 22, 2020 at 8:46 AM Alan Gates wrote:
> > >
> > > > An important point here is I don't believe David is proposing to remove
> > > > Hive on Spark from the 2 or 3 lines, but only from trunk. Continuing to
> > > > support it in existing 2 and 3 lines makes sense, but since no one has
> > > > maintained it on trunk for some time and it does not work with many of the
> > > > newer features, it should be removed from trunk.
> > > >
> > > > Alan.
> > > >
> > > > On Tue, Jul 21, 2020 at 4:10 PM Chao Sun  wrote:
> > > >
> > > > > Thanks David. FWIW Uber is still running Hive on Spark (2.3.4) on a very
> > > > > large scale in production right now and I don't think we have any plan to
> > > > > change it soon.
> > > > >
> > > > >
> > > > >
> > > > > On Tue, Jul 21, 2020 at 11:28 AM David  wrote:
> > > > >
> > > > > > Hello,
> > > > > >
> > > > > > Thanks for the feedback.
> > > > > >
> > > > > > Just a quick recap: I did propose this @dev and I received unanimous
> > > > > > +1's from the community. After a couple months, I created the PR.
> > > > > >
> > > > > > Certainly open to discussion, but there hasn't been any discussion thus
> > > > > > far because there have been no objections until this point.
> > > > > >
> > > > > > HoS has low adoption, heavy technical debt, and the manner in
>

Re: I can't mvn clean package

2021-10-18 Thread Stamatis Zampetakis

Hi igyu,

From some paths appearing in the stack trace, it seems you are running on
Windows.
I don't know of anybody developing Hive on Windows, so I am not
surprised the build fails.
The build should run fine on Linux and macOS environments.

I just tried running the package goal (see below) and it finishes without
problems on my machine (Ubuntu 20.04.3 LTS).

commit 08af0fc67c21dea253349c6486a30675b5eead26
mvn clean package -DskipTests

Best,
Stamatis

On Thu, Oct 14, 2021 at 7:23 AM igyu  wrote:

> I can't run "mvn clean package";
> I get an error:
>
>
> [INFO] Hive Standalone Metastore Common Code 4.0.0-SNAPSHOT FAILURE [  1.446 s]
>
>
> [ERROR] Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.8:run (generate-version-annotation) on project hive-standalone-metastore-common: An Ant BuildException has occured: exec returned: 1
> [ERROR] around Ant part .. @ 4:46 in D:\file\code\Java\hive\standalone-metastore\metastore-common\target\antrun\build-main.xml
> [ERROR] -> [Help 1]
> org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.8:run (generate-version-annotation) on project hive-standalone-metastore-common: An Ant BuildException has occured: exec returned: 1
> around Ant part .. @ 4:46 in D:\file\code\Java\hive\standalone-metastore\metastore-common\target\antrun\build-main.xml
> at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:215)
> at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:156)
> at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:148)
> at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:117)
> at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:81)
> at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build (SingleThreadedBuilder.java:56)
> at org.apache.maven.lifecycle.internal.LifecycleStarter.execute (LifecycleStarter.java:128)
> at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:305)
> at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:192)
> at org.apache.maven.DefaultMaven.execute (DefaultMaven.java:105)
> at org.apache.maven.cli.MavenCli.execute (MavenCli.java:957)
> at org.apache.maven.cli.MavenCli.doMain (MavenCli.java:289)
> at org.apache.maven.cli.MavenCli.main (MavenCli.java:193)
> at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke (Method.java:498)
> at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced (Launcher.java:282)
> at org.codehaus.plexus.classworlds.launcher.Launcher.launch (Launcher.java:225)
> at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode (Launcher.java:406)
> at org.codehaus.plexus.classworlds.launcher.Launcher.main (Launcher.java:347)
> Caused by: org.apache.maven.plugin.MojoExecutionException: An Ant BuildException has occured: exec returned: 1
> around Ant part .. @ 4:46 in D:\file\code\Java\hive\standalone-metastore\metastore-common\target\antrun\build-main.xml
> at org.apache.maven.plugin.antrun.AntRunMojo.execute (AntRunMojo.java:342)
> at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo (DefaultBuildPluginManager.java:137)
> at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:210)
> at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:156)
> at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:148)
> at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:117)
> at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:81)
> at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build (SingleThreadedBuilder.java:56)
> at org.apache.maven.lifecycle.internal.LifecycleStarter.execute (LifecycleStarter.java:128)
> at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:305)
> at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:192)
> at org.apache.maven.DefaultMaven.execute (DefaultMaven.java:105)
> at org.apache.maven.cli.MavenCli.execute (MavenCli.java:957)
> at org.apache.maven.cli.MavenCli.doMain (MavenCli.java:289)
> at org.apache.maven.cli.MavenCli.main (MavenCli.java:193)
> at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke
>

Re: [EXTERNAL] Re: Move Date and Timestamp parsing from ResolverStyle.LENIENT to ResolverStyle.STRICT

2021-07-21 Thread Stamatis Zampetakis

I am +1 on returning NULL for seemingly "invalid" dates/timestamps.
"Invalid" may not be the most appropriate term, since the parsing adheres to
the JDK APIs, but indeed the results may seem surprising.
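The surprising values discussed in this thread can be reproduced with plain java.time. The sketch below is not Hive's actual parsing code: it assumes the pattern uuuu-MM-dd and maps a DateTimeParseException to null to mimic the proposed NULL result (the class name DateCastSketch is made up). With ResolverStyle.LENIENT, an out-of-range month or day rolls over into the following months, so '2020-20-20' becomes 2021-08-20 and '2020-02-31' becomes 2020-03-02, matching the Hive 3.2 results reported below; with ResolverStyle.STRICT both inputs are rejected.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;

public class DateCastSketch {

    // "uuuu" (proleptic year) rather than "yyyy" (year-of-era): with
    // ResolverStyle.STRICT, "yyyy" would fail because no era is parsed.
    private static final String PATTERN = "uuuu-MM-dd";

    static final DateTimeFormatter LENIENT =
        DateTimeFormatter.ofPattern(PATTERN).withResolverStyle(ResolverStyle.LENIENT);
    static final DateTimeFormatter STRICT =
        DateTimeFormatter.ofPattern(PATTERN).withResolverStyle(ResolverStyle.STRICT);

    // Returns the parsed date, or null (standing in for SQL NULL) when the
    // input is null or cannot be parsed/resolved with the given formatter.
    static LocalDate castToDate(String s, DateTimeFormatter f) {
        if (s == null) {
            return null;
        }
        try {
            return LocalDate.parse(s, f);
        } catch (DateTimeParseException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String[] inputs = {"2020-20-20", "2020-02-31", "2020/02/20", "2020-02-20"};
        for (String in : inputs) {
            System.out.printf("%-12s lenient=%s strict=%s%n",
                in, castToDate(in, LENIENT), castToDate(in, STRICT));
        }
    }
}
```

Both formatters agree on well-formed dates such as 2020-02-20 and both reject '2020/02/20' because the separators do not match the pattern; they differ only on out-of-range fields, which is exactly the behavior the LENIENT-to-STRICT proposal targets.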

I guess we can merge this PR once all the comments are addressed, if nobody
raises a concern in the meantime.

Best,
Stamatis

On Tue, Jul 20, 2021 at 5:04 PM Sankar Hariappan <
sankar.hariap...@microsoft.com> wrote:

> +1
>
> Thanks, Ashish, for the comparison!
>
> I talked to a few Hive users (HDInsight) and they supported returning NULL
> for invalid date/timestamp inputs instead of returning incorrect results or
> an exception.
>
> Could others please share their thoughts?
>
> Thanks,
>
> Sankar
>
> *From:* Ashish Sharma 
> *Sent:* 20 July 2021 14:02
> *To:* d...@hive.apache.org
> *Cc:* sankar.hariap...@microsoft.com.invalid; sank...@apache.org;
> user@hive.apache.org; David 
> *Subject:* Re: [EXTERNAL] Re: Move Date and Timestamp parsing from
> ResolverStyle.LENIENT to ResolverStyle.STRICT
>
> Hi all,
>
> I also feel that adding more configuration doesn't make sense here, as we
> are tightening the date and timestamp format. We should decide upon a single
> solution even if it breaks compatibility. Below is a comparison of Hive 1.2,
> Hive 3.2, MySQL, PostgreSQL and Oracle:
>
> Query                              | Hive 1.2   | Hive 3.2   | MySQL      | PostgreSQL                                       | Oracle
> -----------------------------------+------------+------------+------------+--------------------------------------------------+--------------------
> select cast('2020-20-20' as date); | NULL       | 2021-08-20 | NULL       | date/time field value out of range: "2020-20-20" | not a valid month
> select cast(null as date);         | NULL       | NULL       | NULL       | NULL                                             | NULL
> select cast('2020-02-31' as date); | 2020-03-02 | 2020-03-02 | NULL       | date/time field value out of range: "2020-02-31" | date format picture ends before converting entire input string
> select cast('2020/02/20' as date); | NULL       | NULL       | 2020-02-20 | 2020-02-20                                       | literal does not match format string
> select cast('-00-00' as date);     | NULL       | 0002-11-30 | NULL       | date/time field value out of range: "-00-00"     | literal does not match format string
>
> From the comparison, it is quite clear that date and timestamp formatting
> was much tighter in older versions of Hive. For most invalid date inputs,
> *NULL* was the standard response instead of an exception.
>
> Also, when I went through the code, I found that while implementing the
> vectorized versions of some date-related UDFs, such as datediff, MySQL was
> taken as the gold standard
> <https://issues.apache.org/jira/browse/HIVE-15338?focusedCommentId=15727553&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-15727553>.
> So it makes more sense to comply with MySQL, since we already treat MySQL
> as the gold standard, and returning NULL as the result of casting invalid
> dates is also documented
> <https://cwiki.apache.org/confluence/display/hive/languagemanual+types#LanguageManualTypes-CastingDates>.
>
> *So I propose to make NULL the standard response for all parsing
> errors.*
>
> Thanks
>
> Ashish Sharma
>
>
>
> On Tue, Jul 13, 2021 at 9:52 PM Stamatis Zampetakis wrote:
>
> Hi all,
>
> Thanks for pushing this forward, Ashish!
>
> Actually, I am not in favor of creating a flag for this. Either we decide
> consciously to break backward compatibility in the hope that we are
> improving the expected results, or we keep the current behavior.
> Adding another flag means that we would maintain and support two variants,
> which makes the test coverage problem raised by David even worse.
>
> I second David's idea to run some tests against some widely adopted DBMSs
> (MySQL, Oracle, MSSQL, Postgres) to see what they return.
> I think Ashish already did some

Re: [EXTERNAL] Re: Move Date and Timestamp parsing from ResolverStyle.LENIENT to ResolverStyle.STRICT

2021-07-13 Thread Stamatis Zampetakis

Hi all,

Thanks for pushing this forward Ashish!

Actually I am not in favor of creating a flag for this. Either we decide
consciously to break backward compatibility in the hope that we are
improving the expected results or we keep the current behavior.
Adding another flag means that we would maintain and support two variants,
which makes the test coverage problem raised by David even worse.

I second David's idea to run some tests against some widely adopted DBMSs
(MySQL, Oracle, MSSQL, Postgres) to see what they return.
I think Ashish already did some tests with MySQL and MSSQL, but personally I
would like to see some more (dates + engines) in order to express
a preference.
We shouldn't forget that since Hive is implemented in Java, having
functions that are in line with the Java APIs is not such a bad idea.
The last comment is slightly supportive of the current behavior.

I am including user@ list in the discussion since we should definitely
consider the feedback of people that are using Hive for real.

Best,
Stamatis

On Tue, Jul 13, 2021 at 4:31 PM David  wrote:

> Hello,
>
> Is anyone able to try out a few different vendor RDBMSs to see how they
> handle invalid dates, or provide links to documentation, both for invalid
> formatting and for things like mm-dd-yyyy 12-40-2021?
>
> Thanks.
>
> On Tue, Jul 13, 2021 at 5:14 AM Sankar Hariappan
>  wrote:
>
>> I'm supporting this change to return "NULL" for invalid date/timestamp.
>> In the interest of backward compatibility, can we make all these changes
>> under a flag which can be enabled by default?
>>
>>
>> Thanks,
>> Sankar
>> -----Original Message-----
>> From: David 
>> Sent: 10 July 2021 07:35
>> To: dev 
>> Cc: sank...@apache.org; Stamatis Zampetakis 
>> Subject: [EXTERNAL] Re: Move Date and Timestamp parsing from
>> ResolverStyle.LENIENT to ResolverStyle.STRICT
>>
>> Hello,
>>
>> I too would be in favor of this. It drastically cuts down on the test
>> matrix for Hive if we can clamp down on timestamp formats. With that being
>> said, I've tried this and it's a big effort. I put it down without getting
>> consensus or buy-in or engagement on the effort. Please check out my work
>> here:
>>
>> https://issues.apache.org/jira/plugins/servlet/mobile#issue/HIVE-24814
>>
>>
>> On Fri, Jul 9, 2021, 9:49 PM Ashish Sharma wrote:
>>
>> > Hi,
>> >
>> > When casting incorrect date or timestamp literals to the DATE or TIMESTAMP
>> > data types, Hive returns wrong values:
>> >
>> > hive> select cast('2020-20-20' as date);
>> >
>> > OK
>> >
>> > 2021-08-20
>> >
>> > Time taken: 4.436 seconds, Fetched: 1 row(s)
>> >
>> >
>> > I have created a draft solution. Please review the draft and share
>> > your feedback.
>> >
>> >
>> >
>> > https://docs.google.com/document/d/1YTTPlNq3qyzlKfYVkSl3EFhVQ6-wa9WFRdkdIeCoc1Y/edit?usp=sharing
>> >
>> >
>> > https://issues.apache.org/jira/browse/HIVE-25306
>> >
>> >
>> > Thank you
>> >
>> > Ashish Sharma
>> >
>>
>
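The thread subject names the mechanism: dates are parsed with ResolverStyle.LENIENT, which rolls an out-of-range month or day over into later dates, whereas ResolverStyle.STRICT rejects them. A minimal sketch in plain java.time (not Hive's actual cast code): castToDate is a hypothetical helper that returns null on invalid input, mirroring the proposal to return NULL, and its strict parameter only stands in for the kind of compatibility flag Sankar suggests. The patterns use uuuu rather than yyyy because, under STRICT, yyyy (year-of-era) cannot resolve to a LocalDate without an era field.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;

public class DateCastSketch {

    // Hypothetical helper: parses a date literal and returns null (like SQL NULL)
    // when the text cannot be resolved to a date under the chosen resolver style.
    static LocalDate castToDate(String text, String pattern, boolean strict) {
        // 'strict' stands in for a hypothetical backward-compatibility flag
        ResolverStyle style = strict ? ResolverStyle.STRICT : ResolverStyle.LENIENT;
        DateTimeFormatter formatter = DateTimeFormatter.ofPattern(pattern).withResolverStyle(style);
        try {
            return LocalDate.parse(text, formatter);
        } catch (DateTimeParseException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // LENIENT rolls month 20 of 2020 over to August 2021
        System.out.println(castToDate("2020-20-20", "uuuu-MM-dd", false)); // 2021-08-20
        // STRICT rejects the out-of-range month
        System.out.println(castToDate("2020-20-20", "uuuu-MM-dd", true));  // null
        // mm-dd-yyyy input with day 40 of December
        System.out.println(castToDate("12-40-2021", "MM-dd-uuuu", false)); // 2022-01-09
        System.out.println(castToDate("12-40-2021", "MM-dd-uuuu", true));  // null
    }
}
```

The lenient result for 2020-20-20 matches the 2021-08-20 shown in Ashish's example; under the strict style both invalid inputs yield null, while valid dates parse as before.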

Re: Failed to compile apache hive tag - rel/release-2.3.8

2021-06-15 Thread Stamatis Zampetakis

Hi Apurwa,

I think it is related to HIVE-25173 [1].
The pentaho-aggdesigner-algorithm artifacts were present only in the Spring
repository and not in Maven Central. I think the Spring repository was
retired, so the artifacts can no longer be found.

Best,
Stamatis

[1] https://issues.apache.org/jira/browse/HIVE-25173

On Tue, Jun 15, 2021 at 12:18 AM Apurwa Jadhav wrote:

> Hello all,
>
> I’ve been trying to build Hive from source for the tag rel/release-2.3.8
> and I’m running into the following error when I run
> mvn -U clean package -Phadoop-2,dist -DskipTests -DskipITs.
>
>
> [ERROR] Failed to execute goal on project hive-exec: Could not resolve
> dependencies for project org.apache.hive:hive-exec:jar:2.3.8: Could not
> find artifact org.pentaho:pentaho-aggdesigner-algorithm:jar:5.1.5-jhyde in
> central (https://repo.maven.apache.org/maven2) -> [Help 1]
>
>
> Using maven-3/3.6.3
>
>
> I referred to
> https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-InstallationandConfiguration
>
>
> Thanks,
> Apurwa Jadhav
>
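Maven looks up each dependency at a path derived from its coordinates, so the failing lookup against Maven Central can be checked by hand. A minimal sketch that maps the coordinate from the error message (groupId:artifactId:extension:version) to its location under the standard Maven 2 repository layout; MavenArtifactPath is a hypothetical name, the code does not handle classifiers, and it only builds the URL without contacting the repository.

```java
public class MavenArtifactPath {

    // Maps "groupId:artifactId:extension:version" to its relative path in a
    // repository using the standard Maven 2 layout. Classifiers are not handled.
    static String pathFor(String coordinate) {
        String[] parts = coordinate.split(":");
        if (parts.length != 4) {
            throw new IllegalArgumentException(
                    "expected groupId:artifactId:extension:version but got " + coordinate);
        }
        String groupId = parts[0];
        String artifactId = parts[1];
        String extension = parts[2];
        String version = parts[3];
        // Dots in the groupId become directory separators
        return groupId.replace('.', '/') + "/" + artifactId + "/" + version + "/"
                + artifactId + "-" + version + "." + extension;
    }

    public static void main(String[] args) {
        String central = "https://repo.maven.apache.org/maven2/";
        // Coordinate taken from the build error above
        String coordinate = "org.pentaho:pentaho-aggdesigner-algorithm:jar:5.1.5-jhyde";
        System.out.println(central + pathFor(coordinate));
    }
}
```

If nothing is published at the printed URL, no amount of retrying (-U) will help; the artifact has to come from another repository or the dependency has to change, which is what HIVE-25173 tracks.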

Re: Running Hive 3.1.2 embedded in JVM for testing

2021-02-19 Thread Stamatis Zampetakis

Hi James,

I am doing something similar, with the difference that everything runs on
Docker [1].
I am using Hive 3.1 (HDP though), but things work fine, at least with
in-memory Derby:

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:derby:memory:metastore;create=true</value>
</property>

Best,
Stamatis

[1] https://github.com/zabetak/hs2-embedded

On Wed, Feb 17, 2021 at 10:55 PM James Baiera wrote:

> Hey folks,
>
> I have a project where I test with Hive using an embedded HiveServer2
> instance within a JVM running integration tests. This has worked for Hive
> 1.2.2 in the past, and I've been able to get it to work with Hive 2.3.8,
> but I have been having trouble getting it working on Hive 3.0+.
>
> The error I keep running into is that the metastore tables are not present
> in the local embedded metastore. I have set
> "hive.metastore.schema.verification" to "false" and
> "datanucleus.schema.autoCreateAll" to "true", but it seems like the
> latter setting is being ignored. Instead of starting up, HiveServer2
> fails while trying to read from the DBS table:
>
> Self-test query [select "DB_ID" from "DBS"] failed; direct SQL is disabled
> javax.jdo.JDODataStoreException: Error executing SQL query "select "DB_ID" from "DBS"".
>  at org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:543) ~[datanucleus-api-jdo-4.2.4.jar:?]
>  at org.datanucleus.api.jdo.JDOQuery.executeInternal(JDOQuery.java:391) ~[datanucleus-api-jdo-4.2.4.jar:?]
>  at org.datanucleus.api.jdo.JDOQuery.execute(JDOQuery.java:216) ~[datanucleus-api-jdo-4.2.4.jar:?]
>  at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.runTestQuery(MetaStoreDirectSql.java:276) [hive-exec-3.1.2.jar:3.1.2]
>  at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.<init>(MetaStoreDirectSql.java:184) [hive-exec-3.1.2.jar:3.1.2]
>  at org.apache.hadoop.hive.metastore.ObjectStore.initializeHelper(ObjectStore.java:498) [hive-exec-3.1.2.jar:3.1.2]
>  at org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:420) [hive-exec-3.1.2.jar:3.1.2]
>  at org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:375) [hive-exec-3.1.2.jar:3.1.2]
>  at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:77) [hadoop-common-3.1.2.jar:?]
>  at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:137) [hadoop-common-3.1.2.jar:?]
>  at org.apache.hadoop.hive.metastore.RawStoreProxy.<init>(RawStoreProxy.java:59) [hive-exec-3.1.2.jar:3.1.2]
>  at org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:67) [hive-exec-3.1.2.jar:3.1.2]
>  at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStoreForConf(HiveMetaStore.java:718) [hive-exec-3.1.2.jar:3.1.2]
>  at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMSForConf(HiveMetaStore.java:696) [hive-exec-3.1.2.jar:3.1.2]
>  at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:690) [hive-exec-3.1.2.jar:3.1.2]
>  at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:767) [hive-exec-3.1.2.jar:3.1.2]
>  at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:538) [hive-exec-3.1.2.jar:3.1.2]
>  <>
>
> Looking into the documentation, much of it mentions using schematool
> to set up the metastore the first time, but this is an embedded use
> case, and there is no local Hive installation to use for this.
>
> I've also tried using the Hive JDBC driver with "jdbc:hive2:///" as the URL
> to run on an embedded server, and I am getting the same errors.
>
> Is this use case not supported anymore in Hive 3? Am I missing something
> here?
>
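For the embedded setup discussed in this thread, the in-memory Derby URL from the reply and the two settings James mentions can be collected into a hive-site.xml placed on the test classpath. A minimal sketch, assuming the file is generated at test time; HiveSiteSketch and its methods are hypothetical names, and the thread does not confirm that these three properties alone make Hive 3.1.2 create the metastore schema.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class HiveSiteSketch {

    // Escapes XML special characters in element text content.
    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Renders a Hadoop-style configuration file with one <property> per entry.
    static String toHiveSite(Map<String, String> props) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\"?>\n<configuration>\n");
        for (Map.Entry<String, String> e : props.entrySet()) {
            sb.append("  <property>\n")
              .append("    <name>").append(escapeXml(e.getKey())).append("</name>\n")
              .append("    <value>").append(escapeXml(e.getValue())).append("</value>\n")
              .append("  </property>\n");
        }
        sb.append("</configuration>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> props = new LinkedHashMap<>();
        // In-memory Derby metastore, created on first connection
        props.put("javax.jdo.option.ConnectionURL", "jdbc:derby:memory:metastore;create=true");
        // Settings mentioned in the thread for letting DataNucleus create the schema
        props.put("hive.metastore.schema.verification", "false");
        props.put("datanucleus.schema.autoCreateAll", "true");
        System.out.print(toHiveSite(props));
    }
}
```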

Re: Hive SQL extension

2020-10-26 Thread Stamatis Zampetakis

I do like extensions and things that simplify our life when
writing queries.

Regarding the partitioning syntax for Iceberg, there may be better
alternatives.
I was also leaning towards a syntax like the one proposed by Jesus (in
another thread) based on virtual columns, which is also part of the SQL
standard.

Regarding the other use cases mentioned (temporal queries, time travel,
etc.), some of them are already covered by the SQL standard, so we could
start from there and then introduce extensions if needed.

Syntactic sugar is powerful, but in terms of design I find it more
appropriate to perform "desugaring" once we have an AST, either through
AST-to-AST transformations or at a later stage.
The syntax (sugared or not) is part and responsibility of the parser, so an
architecture with sub-parser hooks seems a bit brittle, especially if we
start using it extensively.
Having said that, you have thought about this much more than I have, so
maybe the hook approach is a better idea after all :)
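A toy illustration of desugaring on the AST rather than before parsing: the parser keeps the sugared construct as its own node, and a separate AST-to-AST pass rewrites it into core nodes. The sketch is hypothetical (a miniature Java 17 expression tree, not Hive's ASTNode) and uses SQL's x BETWEEN lo AND hi, equivalent to x >= lo AND x <= hi, as the sugar.

```java
public class DesugarSketch {

    // A toy expression AST (hypothetical; not Hive's ASTNode).
    sealed interface Expr permits Col, Lit, Between, Cmp, And {}
    record Col(String name) implements Expr {}
    record Lit(int value) implements Expr {}
    // Sugared node produced by the parser: x BETWEEN lo AND hi
    record Between(Expr x, Expr lo, Expr hi) implements Expr {}
    // Core nodes that later compilation stages understand
    record Cmp(String op, Expr left, Expr right) implements Expr {}
    record And(Expr left, Expr right) implements Expr {}

    // AST-to-AST pass: rewrites every Between into core nodes, recursively.
    static Expr desugar(Expr e) {
        if (e instanceof Between b) {
            Expr x = desugar(b.x());
            return new And(new Cmp(">=", x, desugar(b.lo())),
                           new Cmp("<=", x, desugar(b.hi())));
        } else if (e instanceof Cmp c) {
            return new Cmp(c.op(), desugar(c.left()), desugar(c.right()));
        } else if (e instanceof And a) {
            return new And(desugar(a.left()), desugar(a.right()));
        }
        return e; // Col and Lit are leaves
    }

    // Renders an expression back to SQL-like text for inspection.
    static String show(Expr e) {
        if (e instanceof Col c) return c.name();
        if (e instanceof Lit l) return Integer.toString(l.value());
        if (e instanceof Between b) {
            return show(b.x()) + " BETWEEN " + show(b.lo()) + " AND " + show(b.hi());
        }
        if (e instanceof Cmp c) return "(" + show(c.left()) + " " + c.op() + " " + show(c.right()) + ")";
        And a = (And) e;
        return "(" + show(a.left()) + " AND " + show(a.right()) + ")";
    }

    public static void main(String[] args) {
        Expr sugared = new Between(new Col("x"), new Lit(1), new Lit(10));
        System.out.println(show(sugared));          // x BETWEEN 1 AND 10
        System.out.println(show(desugar(sugared))); // ((x >= 1) AND (x <= 10))
    }
}
```

Because the rewrite runs on a tree the parser has already accepted, syntax errors in the sugared form are still reported by the regular parser.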

Best,
Stamatis

On Fri, Oct 23, 2020 at 2:26 PM Pau Tallada wrote:

> Hi all,
>
> I do not know if that may be of interest to you, but there are other
> projects that could benefit from this.
> For instance, ADQL
> <https://www.ivoa.net/documents/ADQL/20180112/PR-ADQL-2.1-20180112.html>
> (Astronomical Data Query Language) is a SQL-like language that defines some
> higher-level functions that enable powerful geospatial queries. Projects
> like queryparser <https://github.com/aipescience/queryparser> are able to
> translate from ADQL to vendor-specific SQL for MySQL or PostgreSQL. In this
> case, the syntactic sugar is implemented as an external layer on top, but it
> could just as well be implemented in a rewrite hook if one were available.
>
> Cheers,
>
> Pau.
>
> On Thu, Oct 22, 2020 at 16:21, Peter Vary wrote:
>
>>
>> Let's assume that this feature would be useful for Iceberg tables, but
>> useless and even problematic/forbidden for other tables. :)
>>
>> My thinking is that it could make Hive much more user-friendly if we
>> allowed extensions to the language.
>>
>> With the Iceberg integration we plan to add several extensions which might
>> not be useful for other tables. Some examples:
>>
>>- When creating tables we want to send additional information to the
>>storage layer, and pushing everything into properties is a pain (not really
>>user-friendly)
>>- We would like to allow querying table history for Iceberg tables
>>(previous snapshot IDs, timestamps, etc.)
>>- We would like to allow time travel for Iceberg tables based on the
>>data queried above
>>- We would like to allow the user to see / manage / remove old
>>snapshots
>>
>>
>> These are all very Iceberg-specific features, and most probably they will
>> not work or be useful for any other type of table, so I think adding them
>> to the Hive parser would be a stretch.
>>
>> On the other hand if we do not provide SQL interface for accessing these
>> features then the users will turn to Spark/Impala/Presto to be able to work
>> with Iceberg tables.
>>
>> As for your specific question for handling syntax errors (I have just
>> started to think about how would I do it, so feel free to suggest better
>> methods):
>>
>>- Let's assume that we have a hook which can get the sql command as
>>an input and can rewrite it to a new SQL command
>>- I would write a simplified parser which tries to be as simple as
>>possible for the specific command
>>- Based on the parsing I would return the same command / throw an
>>exception / rewrite the command
>>
>>
>> Admittedly this solution only works if we can make every feature
>> work without changing other parts of Hive, and we just want to add
>> "syntactic sugar" to it. (Do not underestimate the benefits of syntactic
>> sugar :))
>>
>> Thanks,
>> Peter
>>
>>

Re: Hive SQL extension

2020-10-22 Thread Stamatis Zampetakis

Hi Peter,

I am nowhere near being an expert but just wanted to share my thoughts.

If I understand correctly you would like some syntactic sugar in Hive to
support partitioning as per Iceberg. I cannot tell if that's really useful
or not but from my point of view it doesn't seem a very good idea to
introduce another layer of parsing before the actual parser (don't know if
there is one already). For instance, how are you going to handle the
situation where there are syntax errors in your sugared part, and what
should the end user see?

No matter how it is added, if users are able to write such queries, it
becomes part of the Hive syntax and, as such, the job of the
parser.

Best,
Stamatis

On Thu, Oct 22, 2020 at 9:49 AM Peter Vary wrote:

> Hi Hive experts,
>
> I would like to extend Hive SQL language to provide a way to create
> Iceberg partitioned tables like this:
>
> create table iceberg_test(
> level string,
> event_time timestamp,
> message string,
> register_time date,
> telephone array 
> )
> partition by spec(
> level identity,
> event_time identity,
> event_time hour,
> register_time day
> )
> stored as iceberg;
>
>
> The problem is that this syntax is very specific to Iceberg, and I think
> it is not a good idea to change the Hive syntax globally to accommodate a
> specific use case.
> The following CREATE TABLE statement could achieve the same thing:
>
> create table iceberg_test(
> level string,
> event_time timestamp,
> message string,
> register_time date,
> telephone array 
> )
> STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
> TBLPROPERTIES ('iceberg.mr.table.partition.spec'='...');
>
>
> I am looking for a way to rewrite the original (Hive syntactically not
> correct) query to a new (syntactically correct) one.
>
> I was checking the hooks as a possible solution, but I have found that:
>
>- HiveDriverRunHook.preDriverRun can get the original / syntactically
>not correct query, but I have found no way to rewrite it to a syntactically
>correct one (the query appears to be read-only)
>- HiveSemanticAnalyzerHook can rewrite the AST tree, but it needs a
>syntactically correct query to start with
>
>
> Any other ideas on how to achieve the goals above? Either with hooks, or in
> any other way?
>
> Thanks,
> Peter
>
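The pre-parse rewrite Peter outlines in his follow-up (return the command unchanged, throw an exception, or rewrite it) could look roughly like this. It is a hypothetical sketch, not a Hive hook API: PartitionSpecRewriter and rewrite are invented names, the regular expression only recognises the partition by spec(...) ... stored as iceberg clause shown above, and the column:transform encoding of the TBLPROPERTIES value is made up for illustration, since the format expected by iceberg.mr.table.partition.spec is not given here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PartitionSpecRewriter {

    // Recognises "partition by spec( ... ) stored as iceberg", case-insensitively.
    // Group 1 holds the comma-separated "column transform" entries.
    private static final Pattern SPEC_CLAUSE = Pattern.compile(
            "partition\\s+by\\s+spec\\s*\\(([^)]*)\\)\\s*stored\\s+as\\s+iceberg",
            Pattern.CASE_INSENSITIVE);

    // Returns the command unchanged, rewrites it, or throws on a malformed spec.
    static String rewrite(String sql) {
        Matcher m = SPEC_CLAUSE.matcher(sql);
        if (!m.find()) {
            return sql; // no sugar: let Hive's own parser handle the command
        }
        List<String> entries = new ArrayList<>();
        for (String item : m.group(1).split(",")) {
            String[] parts = item.trim().split("\\s+");
            if (parts.length != 2) {
                throw new IllegalArgumentException(
                        "Invalid partition spec entry: '" + item.trim() + "'");
            }
            // Hypothetical encoding: column:transform
            entries.add(parts[0] + ":" + parts[1]);
        }
        String replacement = "STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'\n"
                + "TBLPROPERTIES ('iceberg.mr.table.partition.spec'='"
                + String.join(",", entries) + "')";
        return sql.substring(0, m.start()) + replacement + sql.substring(m.end());
    }

    public static void main(String[] args) {
        String sugared = "create table iceberg_test(level string, event_time timestamp)\n"
                + "partition by spec(level identity, event_time hour)\n"
                + "stored as iceberg;";
        System.out.println(rewrite(sugared));
    }
}
```

A plain regular expression also matches inside string literals and comments and cannot report errors in terms of the full grammar, which illustrates why a separate pre-parser tends to be brittle.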

Request write access to the Hive wiki for zabetak

2020-06-05 Thread Stamatis Zampetakis

Hello,

Can somebody please give me write access to the Hive wiki?

My username is zabetak

Best,
Stamatis
