Re: [VOTE] Mark Hive 2.x EOL

2024-05-10 Thread Stamatis Zampetakis
+1 (binding)

On Fri, May 10, 2024 at 10:10 AM Denys Kuzmenko  wrote:
>
> +1 (binding)


Re: Voice Of Apache interview request

2024-05-04 Thread Stamatis Zampetakis
Hey Rich,

It would be great to have a podcast about Hive and the 4.0 release so
I am happy to volunteer. If there are more people willing to
participate I can definitely cede my place and let others do the
talking.

Best,
Stamatis

On Tue, Apr 30, 2024 at 4:25 PM Rich Bowen  wrote:
>
> I saw the article on news.apache.org about your recent release of 4.0
>
> As you may know, I produce a podcast about Apache projects, at 
> https://feathercast.apache.org/  I'd very much like to do a short interview 
> about the 4.0 release, and about Hive in general, and am looking for a 
> volunteer (or possibly two?) who would be willing to do such an interview.
>
> Please let me know if you're interested in speaking with me on these topics. 
> (Please copy me directly on any responses - rbo...@apache.org - as I am not 
> currently subscribed to this list.)


CVE-2023-35701: Apache Hive: Arbitrary command execution via JDBC driver

2024-05-03 Thread Stamatis Zampetakis
Severity: moderate

Affected versions:

- Apache Hive from 4.0.0-alpha-1 before 4.0.0

Description:

Improper Control of Generation of Code ('Code Injection') vulnerability in 
Apache Hive.

The vulnerability affects the Hive JDBC driver component and can potentially 
lead to arbitrary code execution on the machine/endpoint on which the JDBC 
driver (client) is running. To fully exploit the vulnerability, the malicious 
user must have sufficient permissions to specify/edit JDBC URL(s) in an 
endpoint relying on the Hive JDBC driver, and the JDBC client process must run 
under a privileged user.

The attacker can set up a malicious HTTP server and specify a JDBC URL pointing 
to this server. When a JDBC connection is attempted, the malicious HTTP server 
can provide a special response with a customized payload that can trigger the 
execution of certain commands in the JDBC client. This issue affects Apache 
Hive from 4.0.0-alpha-1 before 4.0.0.

Users are recommended to upgrade to version 4.0.0, which fixes the issue.
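Upgrading to 4.0.0 is the fix. As an extra, defense-in-depth measure for applications that let users edit JDBC URLs, the application can refuse to connect to hosts it does not expect. The sketch below is hypothetical and not part of Hive: the class JdbcUrlAllowlist, its isAllowed method and the example host names are illustrative. It parses URLs with java.net.URI, which may not parse every URL exactly as the Hive JDBC driver does, so it fails closed on anything it cannot pin to a single allowlisted host (including multi-host ZooKeeper discovery URLs).

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical defense-in-depth guard (not part of Hive): accept a Hive JDBC URL
 * only if it names exactly one host and that host is on an application-controlled
 * allowlist. This does NOT fix CVE-2023-35701; upgrading to 4.0.0 does.
 */
final class JdbcUrlAllowlist {
    private static final String PREFIX = "jdbc:hive2://";

    private final Set<String> allowedHosts; // host names, expected in lower case

    JdbcUrlAllowlist(Set<String> allowedHosts) {
        this.allowedHosts = allowedHosts;
    }

    boolean isAllowed(String jdbcUrl) {
        if (jdbcUrl == null || !jdbcUrl.regionMatches(true, 0, PREFIX, 0, PREFIX.length())) {
            return false; // not a hive2 URL at all
        }
        try {
            // Strip "jdbc:" so java.net.URI sees "hive2://host:port/db;...".
            URI uri = new URI(jdbcUrl.substring("jdbc:".length()));
            String host = uri.getHost();
            // Multi-host authorities (e.g. ZooKeeper discovery) yield no single host,
            // and "user@host" forms are rejected: fail closed in both cases.
            if (host == null || uri.getUserInfo() != null) {
                return false;
            }
            return allowedHosts.contains(host.toLowerCase(Locale.ROOT));
        } catch (URISyntaxException e) {
            return false; // unparseable URL
        }
    }

    public static void main(String[] args) {
        JdbcUrlAllowlist guard = new JdbcUrlAllowlist(Set.of("hs2.internal.example.com"));
        System.out.println(guard.isAllowed("jdbc:hive2://hs2.internal.example.com:10001/default;transportMode=http"));
        System.out.println(guard.isAllowed("jdbc:hive2://attacker.example.net:80/default;transportMode=http"));
    }
}
```

Running main prints true for the allowlisted HTTP-mode URL and false for the other one. Failing closed is deliberate: a URL the guard cannot fully understand is treated as untrusted.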

This issue is being tracked as HIVE-27554.

Credit:

Kostya Kortchinsky (reporter)

References:

https://hive.apache.org/
https://www.cve.org/CVERecord?id=CVE-2023-35701
https://issues.apache.org/jira/browse/HIVE-27554



Re: [Discussion] HIVE-28211: Restore hive-exec:core jar

2024-04-29 Thread Stamatis Zampetakis
I shared the reasons behind the removal of the jar and my concerns around
bringing it back. I'm still not convinced that it's needed but if the rest
of the community feels that it's the right path forward then I am ok with
this.

Best,
Stamatis

On Fri, Apr 26, 2024, 2:42 PM Ayush Saxena  wrote:

> Stamatis,
> Isn't the removal itself an incompatible change? There are a lot of
> projects using it & we suddenly removed a jar because there were some
> people not sure how to properly use it and were complaining about it.
>
> What about the projects which are now stuck? Reading the thread at [1],
> promises were made that everything would be relocated and sorted out
> before the release, but we couldn't do it; AFAIK it isn't a trivial task to
> just relocate all the dependencies.
>
> As I see here, @Chao Sun even raised concerns [2] that the removal just
> blocks downstream projects from upgrading, and it got countered with the
> argument that the folks chasing the removal would help get all the
> dependencies relocated or solve the issues for downstream. I think no one
> volunteered.
>
> I would recommend either:
> * Best case: we relocate all the dependencies present in hive-exec, not
> just one or two. Somebody volunteers to raise one PR relocating "all",
> we commit that, and we should be sorted.
> * Restore the core jar, because a lot of projects depend on it and the
> removal itself was incompatible. I don't think the removal had a clear
> community agreement; it was a conditional agreement, and the conditions
> never got sorted out, so we should roll back.
>
> On a lighter note, we might release with some 5000+ commits, with the best
> performance or so, but if nobody is able to consume those release bits, I
> think those efforts are just going to waste. Eventually people will just
> stick to their older versions and not even try to upgrade, and we will be
> releasing for nobody, or maybe for the few folks who have only Hive in
> their stack (I don't know if there are folks like that). No matter how good
> a product is, if people don't use it, it is gonna die :-(
>
>
> I think we have a ticket which talks about relocating all dependencies. I
> agree we should drop the core jar for sure, since it leads to all the
> problems Stamatis mentioned, but let's restore the core jar for now and drop
> it when that relocation ticket is resolved. Does that sound convincing, or
> even worth a thought?
>
> BTW, having jars with a set of dependencies shaded and others unshaded
> is done in Hadoop as well (hadoop-minicluster vs hadoop-client-minicluster),
> and users keep running into such problems, e.g. [3]
>
> Anyone else, any thoughts?
>
> -Ayush
>
> [1] https://lists.apache.org/thread/cwtxnffoqpwgmdtlc9hyor2cm22djpkg
> [2] https://lists.apache.org/thread/23sshgolmbpcc01npqgt03woljdy6hdn
> [3] https://lists.apache.org/thread/f47s6bxrtslkxbc8s2gybwrxps8vk63x
Re: [Discussion] HIVE-28211: Restore hive-exec:core jar

2024-04-26 Thread Stamatis Zampetakis
Hey Simhadri, thanks for starting this discussion.

Maven has many limitations when it comes to publishing multiple
artifacts from the same module. In most cases, the end result is
broken and hard to use. The pom file that is published for a given
module cannot correctly describe all artifacts of the module, and
that's why there is one main artifact for every module; dependency
declarations are usually correct for the main artifact but are not
representative of the rest.

For example, end users who consume the hive-exec-core artifact tend to
assume that Maven will automatically resolve all transitive
dependencies and that things will work as usual, which is not the case.
In the past, this kind of assumption created a lot of confusion among
consumers of hive-exec-core.jar, with tickets and open debates that
spanned multiple months. The discussions even reached a point
where people requested certain features of Hive to be reverted in
order to rectify some things around transitive dependencies and the
core jar.

I think we should stick to the usual Maven convention and just publish
one artifact for each module. Adding the "core" jar back and claiming to
support it is a step backwards that just postpones the real problems
that we need to tackle.

Furthermore, I don't think that the hive-exec module was ever meant to
be used as a dependency. This is mainly an application module and not
a library module and that's why shading takes place. Clearly some
parts from hive-exec could be considered to become a library and that
would be a promising direction going forward (splitting hive-exec into
other modules) but a bit outside the scope of the current discussion.

From the issues outlined above, the only actionable item that I see
concerns the joda library, so we could try to simply relocate it if it
is causing issues.
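To see which third-party packages are still bundled under their original names (and could therefore clash with a downstream project's own joda-time, Guava, etc.), one can scan the jar's class entries. A rough sketch, assuming Hive's own and already relocated classes live under the prefixes passed in; org/apache/hadoop/hive/ and org/apache/hive/ are illustrative guesses, so check the shade plugin configuration for the real relocation prefixes. UnrelocatedPackages and find are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

/**
 * Sketch: list the packages of .class files bundled in a jar that are NOT under
 * any of the given prefixes, i.e. third-party classes left at their original names.
 */
final class UnrelocatedPackages {
    static Set<String> find(String jarPath, List<String> ownedPrefixes, int depth) {
        Set<String> packages = new TreeSet<>();
        try (JarFile jar = new JarFile(jarPath)) {
            for (JarEntry entry : Collections.list(jar.entries())) {
                String name = entry.getName();
                if (!name.endsWith(".class") || name.startsWith("META-INF/")) {
                    continue; // resources and multi-release copies are ignored
                }
                if (ownedPrefixes.stream().anyMatch(name::startsWith)) {
                    continue; // Hive's own (or already relocated) classes
                }
                String[] parts = name.split("/");
                // Keep at most `depth` package segments: "org/joda/time/X.class" -> "org.joda.time".
                int n = Math.min(depth, parts.length - 1);
                if (n > 0) { // skips default-package entries such as module-info.class
                    packages.add(String.join(".", Arrays.copyOf(parts, n)));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return packages;
    }

    public static void main(String[] args) {
        // Usage: java UnrelocatedPackages path/to/hive-exec-<version>.jar
        List<String> owned = List.of("org/apache/hadoop/hive/", "org/apache/hive/");
        find(args[0], owned, 3).forEach(System.out::println);
    }
}
```

An empty result would mean everything is either Hive's own code or relocated under one of the given prefixes; anything printed is a candidate for relocation.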

Finally, if someone wants to create a jar with specific contents from
the hive-exec module, it is rather easy to do so. I created a small POC
project [1] showing how someone can create something similar to the
hive-exec-core.jar and incorporate it in their build. Each project has
different needs, so I feel that the burden of such customization
shouldn't fall on the Hive community.

Best,
Stamatis

[1] https://github.com/zabetak/hive-core-poc

On Thu, Apr 25, 2024 at 11:12 AM Simhadri G  wrote:
>
> Hi Everyone,
>
> The hive-exec:core jar is used by spark, oozie, hudi and many other projects. 
> Removal of the hive-exec:core jar has caused the following issues.
>
> Spark: https://lists.apache.org/list?dev@hive.apache.org:lte=1M:joda
> Oozie: https://lists.apache.org/thread/yld75ltf9y8d9q3cow3xqlg0fqyj6mkg
> Hudi: apache/hudi#8147
> Apache IotDB: https://lists.apache.org/thread/wdqsyj89w9cvyk1pyxr83hlxpg6zp1go
> Guava: https://github.com/google/guava/issues/
> joda-time: https://lists.apache.org/thread/sphgcvod3qx9wtc51ltpfyr8dpx9p294
>
> I understand that there is prior discussion about why the hive-exec:core jar 
> was removed here:
> https://lists.apache.org/thread/cwtxnffoqpwgmdtlc9hyor2cm22djpkg
>
> We agreed that ultimately the hive-exec jar should be used instead of
> hive-exec:core, but there are quite a few dependencies that need to be
> shaded and relocated for this (https://issues.apache.org/jira/browse/HIVE-26220).
>
> Until we shade & relocate dependencies in hive-exec, we should restore the
> hive-exec:core jar. The intention is to provide a smoother transition from
> hive-exec:core to the hive-exec jar for projects that depend on Hive.
>
> Seeking input from the community and a way to move forward on this topic.
>
> I apologize in advance if I have missed anything.
>
> Thanks!
>
> Simhadri G


Re: [VOTE] Apache Hive 2.3.10 Release Candidate 0

2024-04-24 Thread Stamatis Zampetakis
Hello,

I am not sure how to interpret the comments about HIVE-28121. Are we
waiting for HIVE-28121 to be merged for 2.3.10, or can this release
proceed without it?

Best,
Stamatis

On Mon, Apr 22, 2024 at 10:37 AM Cheng Pan  wrote:
>
> I ran integration tests with Apache Spark[1] and Apache Kyuubi[2], and 
> everything looks good so far.
>
> We may need to wait for HIVE-28121(affects HMS)[3] and prepare for the next 
> RC.
>
> Thanks to Chao for your efforts in making this release.
>
> [1] https://github.com/apache/spark/pull/45372
> [2] https://github.com/apache/kyuubi/pull/6328
> [3] https://github.com/apache/hive/pull/5204
>
> Thanks,
> Cheng Pan
>
>


Re: [DISCUS] Plan the next Hive release

2024-04-18 Thread Stamatis Zampetakis
There are also many projects that never create minor version releases;
it's up to each project to decide what fits best on each occasion.

I am not against minor releases, nor am I suggesting that this should be
the way to go for every release from now on. I am just saying that at
this point in time I don't see a big benefit in releasing from side
branches.

Again, the motivation for releasing early and often from master is that
it has less maintenance overhead for the community, and end users
can benefit from all improvements as soon as possible. Certainly, if we
introduce breaking changes and big risky features, this approach cannot
work.

Anyway, I am glad that we are having this discussion, and it's also
very positive that we are talking about a new release less than a
month after 4.0.0 came out. No matter if it is 4.0.1 or 4.1.0, I am
fully on board and happy to help as much as I can :)

Best,
Stamatis

On Thu, Apr 18, 2024 at 11:53 AM Denys Kuzmenko  wrote:
>
> Hi Stamatis,
>
> It is standard practice to create a minor version release for bug fixes. 
> Many upstream projects follow the same strategy; check Iceberg, for example.
>
> Regards,
> Denys
>
> On 2024/04/18 07:49:59 Stamatis Zampetakis wrote:
> > The 4.0.0 release was quite recent so I assume we don't have major
> > breaking changes in there at the moment so we could cut the release
> > directly from master as soon as we want. HIVE-28166 is already merged
> > so we could aim to cut 4.1.0 as soon as HIVE-28190 goes in.
> >
> > Experience shows that we are not very good at maintaining multiple
> > release branches, so in general I would prefer to focus on releasing
> > only from master for the time being. Hive is quite a mature project, so
> > in principle breaking changes should be rather rare, which gives us a
> > bit of margin. I think a scheme where we backport less and release
> > more is preferable.
> >
> > Best,
> > Stamatis
> >
> > On Wed, Apr 17, 2024 at 9:56 AM Ayush Saxena  wrote:
> > >
> > > Hi Stamatis,
> > > The plan is to have a release line cut from branch-4.0, so we plan
> > > to pull some critical bug fixes & improvements into the 4.0.1 release
> > > and have a quicker release.
> > > As of now we are just putting the label "hive-4.0.1-must" on the tickets,
> > > and we plan to make sure those get cherry-picked to the release line.
> > > AFAIK we haven't started committing to any branch yet; we were waiting to
> > > see if anyone feels differently, so we can hold back if you have concerns,
> > > or take a different approach.
> > >
> > > By CI, do you mean the daily builds? Otherwise, if you create a PR
> > > targeting branch-4.0, I believe it will run the entire test suite. In
> > > the meantime I will update the instructions regarding the target branch &
> > > the label, in case anyone wants a particular ticket to be part of the
> > > 4.0.1 release.
> > >
> > > -Ayush
> > >
> > > On Wed, 17 Apr 2024 at 12:42, Stamatis Zampetakis  
> > > wrote:
> > >>
> > >> Thanks for starting the discussion Ayush.
> > >>
> > >> Having frequent releases is definitely needed so we should keep the
> > >> momentum going.
> > >>
> > >> I had the impression from other threads that the next Hive release
> > >> would be 4.1.0 and that it would be cut from master. I would like to
> > >> understand how 4.0.1 is different and if it is, what is the
> > >> contribution pattern that contributors and committers should follow?
> > >> If the idea is to maintain and commit in two (or more) branches the
> > >> steps should be documented and CI should be running on those branches.
> > >>
> > >> Best,
> > >> Stamatis
> > >>
> > >> On Wed, Apr 10, 2024 at 1:18 PM Denys Kuzmenko  
> > >> wrote:
> > >> >
> > >> > We might need it sooner as we identified some critical issues in the 
> > >> > recent code:
> > >> > 1. HIVE-28166: Truncate on Iceberg table disregards the branch name 
> > >> > and operates on a main;
> > >> > 2. HIVE-28190: Materialized view rebuild lock heart-beating is broken;
> >


Re: [DISCUS] Plan the next Hive release

2024-04-18 Thread Stamatis Zampetakis
The 4.0.0 release was quite recent so I assume we don't have major
breaking changes in there at the moment so we could cut the release
directly from master as soon as we want. HIVE-28166 is already merged
so we could aim to cut 4.1.0 as soon as HIVE-28190 goes in.

Experience shows that we are not very good at maintaining multiple
release branches, so in general I would prefer to focus on releasing
only from master for the time being. Hive is quite a mature project, so
in principle breaking changes should be rather rare, which gives us a
bit of margin. I think a scheme where we backport less and release
more is preferable.

Best,
Stamatis

On Wed, Apr 17, 2024 at 9:56 AM Ayush Saxena  wrote:
>
> Hi Stamatis,
> The plan is to have a release line cut from branch-4.0, so we plan to
> pull some critical bug fixes & improvements into the 4.0.1 release and
> have a quicker release.
> As of now we are just putting the label "hive-4.0.1-must" on the tickets, and
> we plan to make sure those get cherry-picked to the release line. AFAIK we
> haven't started committing to any branch yet; we were waiting to see if anyone
> feels differently, so we can hold back if you have concerns, or take a
> different approach.
>
> By CI, do you mean the daily builds? Otherwise, if you create a PR targeting
> branch-4.0, I believe it will run the entire test suite. In the meantime I
> will update the instructions regarding the target branch & the label, in case
> anyone wants a particular ticket to be part of the 4.0.1 release.
>
> -Ayush
>
> On Wed, 17 Apr 2024 at 12:42, Stamatis Zampetakis  wrote:
>>
>> Thanks for starting the discussion Ayush.
>>
>> Having frequent releases is definitely needed so we should keep the
>> momentum going.
>>
>> I had the impression from other threads that the next Hive release
>> would be 4.1.0 and that it would be cut from master. I would like to
>> understand how 4.0.1 is different and if it is, what is the
>> contribution pattern that contributors and committers should follow?
>> If the idea is to maintain and commit in two (or more) branches the
>> steps should be documented and CI should be running on those branches.
>>
>> Best,
>> Stamatis
>>
>> On Wed, Apr 10, 2024 at 1:18 PM Denys Kuzmenko  wrote:
>> >
>> > We might need it sooner as we identified some critical issues in the recent 
>> > code:
>> > 1. HIVE-28166: Truncate on Iceberg table disregards the branch name and 
>> > operates on a main;
>> > 2. HIVE-28190: Materialized view rebuild lock heart-beating is broken;


Archive old Hive releases

2024-04-17 Thread Stamatis Zampetakis
Hi all,

Following the INFRA policy [1] about handling current and older
releases, I just removed the following releases from the main download
site [2].

Apache Hive 1.2.2
Apache Hive 3.1.2
Apache Hive 4.0.0-alpha-1
Apache Hive 4.0.0-alpha-2
Apache Hive 4.0.0-beta-1

The aforementioned releases can now be found in the archive [3]. When
a new release comes out, we should remember to perform the
necessary cleanup, so I added a new section to the wiki [4].

Best,
Stamatis

[1] 
https://infra.apache.org/release-download-pages.html#current-and-older-releases
[2] https://downloads.apache.org/hive/
[3] https://archive.apache.org/dist/hive/
[4] 
https://cwiki.apache.org/confluence/display/Hive/HowToRelease#HowToRelease-Archiveoldreleases


Re: [DISCUS] Plan the next Hive release

2024-04-17 Thread Stamatis Zampetakis
Thanks for starting the discussion Ayush.

Having frequent releases is definitely needed so we should keep the
momentum going.

I had the impression from other threads that the next Hive release
would be 4.1.0 and that it would be cut from master. I would like to
understand how 4.0.1 is different and if it is, what is the
contribution pattern that contributors and committers should follow?
If the idea is to maintain and commit in two (or more) branches the
steps should be documented and CI should be running on those branches.

Best,
Stamatis

On Wed, Apr 10, 2024 at 1:18 PM Denys Kuzmenko  wrote:
>
> We might need it sooner as we identified some critical issues in the recent code:
> 1. HIVE-28166: Truncate on Iceberg table disregards the branch name and 
> operates on a main;
> 2. HIVE-28190: Materialized view rebuild lock heart-beating is broken;


Re: [Blog] Apache Hive 4.0 Release blog for ASF M & P

2024-04-05 Thread Stamatis Zampetakis
Great initiative and nice content. Overall, it looks great!

I have some minor comments. Is it possible to change the permissions to allow
comments from anyone, or does it have to be done on a per-user basis?

Best,
Stamatis

On Fri, Apr 5, 2024 at 1:54 PM Ayush Saxena  wrote:
>
> Hi All,
>
> I have been talking to the ASF M & P team, and they recognise the 4.0 release 
> is a big milestone for our project.
>
> They are happy to have an entry for us in their news column, e.g.:
> https://news.apache.org/foundation/entry/apache-software-foundation-announces-apache-wicket-v10
>
> So, Denys, Simhadri and I, with tons of help from ChatGPT, have prepared a 
> draft to share with them.
> The draft is here:
>
> https://docs.google.com/document/d/10Zu8pHvWNDRTqn7yvYqvU4-kw3Q1TXo7mGo5m5fUP2Y/edit
>
> If you have some feedback or concerns, please share with us.
>
> If you want some improvements or removals, let us know here & we will do 
> that, or if you need write access to this page, just let me know.
>
> If nobody objects, I plan to send this to the team by Tuesday next week.
>
>
> -Ayush


Re: [ANNOUNCE] Apache Hive 4.0.0 Released

2024-04-02 Thread Stamatis Zampetakis
The new Apache Hive 4.0.0 release brings roughly 5K new commits (since
Apache Hive 3.1.3) and it's probably the biggest release so far in the
history of the project. The numbers clearly show that this is a
collective effort that wouldn't be possible without a strong community
and many volunteers along the years. Many thanks to everyone involved!

A special mention to Denys, who went above and beyond his role as
release manager, triaging release blockers and reviewing and fixing many
of the tickets that were blocking us for the past few months.

Best,
Stamatis

On Sun, Mar 31, 2024 at 2:54 PM Battula, Brahma Reddy
 wrote:
>
> Thank you for your hard work and dedication in releasing Apache Hive version 
> 4.0.0.
>
> Congratulations to the entire team on this achievement. Keep up the great 
> work!
>
> Is this considered GA?
>
> And it looks like we also need to update the following location?
> https://hive.apache.org/general/downloads/
>
>
> From: Denys Kuzmenko 
> Date: Saturday, March 30, 2024 at 00:07
> To: u...@hive.apache.org , dev@hive.apache.org 
> 
> Subject: [ANNOUNCE] Apache Hive 4.0.0 Released
>
> The Apache Hive team is proud to announce the release of Apache Hive
> version 4.0.0.
>
> The Apache Hive (TM) data warehouse software facilitates querying and
> managing large datasets residing in distributed storage. Built on top
> of Apache Hadoop (TM), it provides, among others:
>
> * Tools to enable easy data extract/transform/load (ETL)
>
> * A mechanism to impose structure on a variety of data formats
>
> * Access to files stored either directly in Apache HDFS (TM) or in other
>   data storage systems such as Apache HBase (TM)
>
> * Query execution via Apache Hadoop MapReduce, Apache Tez and Apache Spark
> frameworks. (MapReduce is deprecated, and Spark has been removed so the text
> needs to be modified depending on the release version)
>
> For Hive release details and downloads, please visit:
> https://hive.apache.org/downloads.html
>
> Hive 4.0.0 Release Notes are available here:
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343343=Text=12310843
>
> We would like to thank the many contributors who made this release
> possible.
>
> Regards,
>
> The Apache Hive Team


Re: [VOTE] Release Apache Hive 4.0.0 (Release Candidate 0)

2024-03-28 Thread Stamatis Zampetakis
-0 (binding)

Ubuntu 20.04.6 LTS, java version "1.8.0_261", Apache Maven 3.6.3

* Verified signatures and checksums OK
* Checked diff between git repo and release sources (diff -qr hive-git
hive-src) KO (4 empty directories in apache-hive-4.0.0-src)
Only in apache-hive-4.0.0-src/hcatalog: streaming
Only in apache-hive-4.0.0-src/itests: qtest-spark
Only in apache-hive-4.0.0-src/packaging/src/docker: ${project.basedir}
Only in apache-hive-4.0.0-src/shims: scheduler
* Checked LICENSE, source headers, and copyright notices KO (Logged
HIVE-28155, HIVE-28156, HIVE-28157, HIVE-28158, HIVE-28159,
HIVE-28160, for minor issues and improvements)
* Checked NOTICE, and README.md file OK
* Built from release sources and created binaries (mvn clean install
-DskipTests -Pitests,iceberg,dist) OK
* Spot check maven artifacts for general structure, LICENSE, NOTICE,
META-INF content KO (HIVE-28161)
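Git does not track empty directories, so a fresh clone never contains them and every empty directory shipped in the source tarball shows up as an "Only in apache-hive-4.0.0-src" line in diff -qr. A small sketch for listing them directly (EmptyDirs is a hypothetical helper, Java 11+):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Sketch: list directories that have no entries at all, relative to the given root. */
final class EmptyDirs {
    static List<Path> find(Path root) {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isDirectory)
                    .filter(EmptyDirs::isEmpty)
                    .map(root::relativize)
                    .sorted()
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static boolean isEmpty(Path dir) {
        // Files.list must be closed, hence try-with-resources.
        try (Stream<Path> children = Files.list(dir)) {
            return children.findAny().isEmpty();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Usage: java EmptyDirs apache-hive-4.0.0-src
        find(Path.of(args[0])).forEach(System.out::println);
    }
}
```

Run against the extracted source release, it should report the same kind of directories that diff -qr flags as present only in the tarball.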

There are lots of small issues that would be nice to fix before releasing. I
didn't perform more release checks/tests because I exceeded my quota
trying to address the logged issues.

Best,
Stamatis

On Thu, Mar 28, 2024 at 3:15 PM Krisztian Kasa
 wrote:
>
> +1 (binding)
>
> * Verified the checksum and signature [OK]
>
> * Built Hive 4.0.0 from source [OK]
>
> * Started Hiveserver2 with Hadoop 3.3.6 and Tez 0.10.3 and Postgres [OK]
>
> * Ran some simple Hive statements: create acid/iceberg tables, create
> materialized views having join of two tables and aggregates in the
> definition [OK]
>
> Thanks Denys for driving the release!
>
> Regards,
> Krisztian
>
>
> On Thu, Mar 28, 2024 at 12:04 PM Kirti Ruge  wrote:
>
> > +1 (non-binding)
> >
> >
> > I have done below steps on Mac m1
> >
> >  Built HIVE 4.0.0 from source successfully.
> >  Verified checksums and signatures.
> >  Initialized metastore with postgresql.
> >  Started metastore and hiveserver.
> >  Ran some simple Hive queries via beeline and checked the same on the web UI
> >  (http://localhost:10002/).
> >  Built docker image and started hive services with docker.
> >
> > Regards,
> > Kirti
> >
> > > On 28-Mar-2024, at 3:41 PM, Zoltán Rátkai 
> > wrote:
> > >
> > > +1 (non-binding)
> > >
> > > Performed on Mac M1:
> > >
> > > - Verified checksums
> > > - Verified signature
> > > - Built from source
> > > - Built docker image (HADOOP_VERSION=3.3.6, TEZ_VERSION=0.10.3)
> > > - Started docker image
> > > - Checked web GUI is working (http://localhost:10002/)
> > > - Created a table and ran CRUD operations on Hive ACID table successfully
> > > on the Docker environment
> > > - Checked the executed queries via web GUI
> > >
> > > Regards,
> > >
> > > Zoltan Ratkai
> > >
> > > On Thu, Mar 28, 2024 at 8:41 AM kokila narayanan <
> > > kokilanarayana...@gmail.com> wrote:
> > >
> > >> +1 (non-binding)
> > >>
> > >> 1. Verified checksums
> > >> 2. Verified signatures
> > >> 3. Built from source successfully
> > >> 4. Deployed and started binary tar with Hadoop 3.3.6 and Tez 0.10.3.
> > >> 5. Executed basic operations on ACID and external tables.
> > >>
> > >> Regards,
> > >> Kokila
> > >>
> > >> On Thu, Mar 28, 2024 at 12:30 PM Sourabh Badhya
> > >>  wrote:
> > >>
> > >>> +1 (non-binding)
> > >>>
> > >>> [1] Built from source successfully.
> > >>> [2] Verified checksums and signatures.
> > >>> [3] Built docker image with Apache Hadoop 3.3.6 and Apache Tez 0.10.3 and
> > >>> metastore using Postgres successfully.
> > >>> [4] Ran CRUD operations on Hive ACID, Iceberg tables and basic operations
> > >>> on Hive external tables successfully on the Docker environment.
> > >>> [5] Browsed the same executed queries via Hiveserver2 UI.
> > >>>
> > >>> Thanks Denys for driving the release.
> > >>>
> > >>> Regards,
> > >>> Sourabh Badhya
> > >>>
> > >>> On Wed, Mar 27, 2024 at 10:38 PM Ayush Saxena wrote:
> > >>>
> >  +1 (Binding)
> > 
> >  * Built from source
> >  * Verified checksums
> >  * Verified signature
> >  * Verified all code files have ASF Header
> >  * Validated the Notice & License files
> >  * No code diff b/w git tag & src tar
> >  * Ran some basic operations on Iceberg, ACID & External Tables (Hive on Tez)
> >  * Browsed through HS2 UI
> >  * Built Docker image from source & tried some basic commands on the docker environment.
> >  * Skimmed over the contents of maven repo.
> > 
> >  Thanx Denys for driving the release. Good Luck!!!
> > 
> >  -Ayush
> > 
> >  On Wed, 27 Mar 2024 at 21:05, Marta Kuczora wrote:
> > 
> > > +1 (binding)
> > >
> > > Thanks a lot Denys for driving the release!
> > >
> > > * Verified the checksum and signature [OK]
> > >
> > > * Built Hive 4.0.0 from source [OK]
> > >
> > > * Initialized metastore with MySQL [OK]
> > >
> > > * Built package and ran metastore and hiveserver [OK]
> > >
> > > * Deployed and started the binary tar with Hadoop 3.3.6 and Tez 0.10.3 [OK]
> > 

Re: Retire https://apache.github.io sites

2024-03-22 Thread Stamatis Zampetakis
The work on HIVE-27953 is now complete, so the github.io sites are
officially down and the obsolete content has been removed from the various
repos.

Many thanks Simhadri for leading all these efforts in modernising the
Hive website and dealing with the legacy sites.

Best,
Stamatis

On Wed, Mar 13, 2024 at 4:22 PM Simhadri G  wrote:
>
> Hi Everyone,
>
> The revamped Hive website has been hosted at https://hive.apache.org/ for
> more than a year now.
>
> As a result, we would like to retire and disable the old Apache Hive websites
> hosted via GitHub Pages at the following sites:
>
>- https://apache.github.io/hive/
>- https://apache.github.io/hive-site/
>
> The work for the same is tracked in
> https://issues.apache.org/jira/browse/HIVE-27953 .
>
> Kindly let us know if there are any questions regarding this.
>
> Thanks!
> Simhadri G


Re: "\n" in HiveConf

2024-02-20 Thread Stamatis Zampetakis
Hello,

Indeed, they have an impact on the readability and maintenance of the
class, so I would be in favor of dropping them.
Checking the calls to HiveConf.ConfVars#getDescription [1], it seems
that line breaks are somewhat relevant for generating the
hive-default.xml.template and for the SHOW CONF command, but that is
definitely not a reason for keeping them. In fact, I get the impression
that by removing the line breaks we could also remove the respective
normalization code.
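One way to drop the line breaks from the source without losing readable output is to keep each description as a single logical line and decide where to break only when rendering (e.g. for the template file). A minimal sketch, assuming nothing about how HiveConf currently normalizes descriptions; ConfDescriptions, normalize and wrap are hypothetical helpers, not existing HiveConf methods:

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helpers (not existing HiveConf methods): keep descriptions as one
 * logical line in source and choose line breaks only when rendering.
 */
final class ConfDescriptions {
    /** Collapses every whitespace run, including embedded "\n", into a single space. */
    static String normalize(String description) {
        return description.trim().replaceAll("\\s+", " ");
    }

    /** Greedy word wrap: lines of at most `width` chars, unless a single word is longer. */
    static List<String> wrap(String description, int width) {
        List<String> lines = new ArrayList<>();
        StringBuilder line = new StringBuilder();
        for (String word : normalize(description).split(" ")) {
            if (line.length() > 0 && line.length() + 1 + word.length() > width) {
                lines.add(line.toString()); // current line is full, start a new one
                line.setLength(0);
            }
            if (line.length() > 0) {
                line.append(' ');
            }
            line.append(word);
        }
        if (line.length() > 0) {
            lines.add(line.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        String legacy = "Whether to enable the feature.\nWhen disabled, the old\ncode path is used.";
        wrap(legacy, 30).forEach(System.out::println);
    }
}
```

With this split, a description that still contains legacy "\n" characters and one that does not render identically, so existing entries could be cleaned up gradually.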

Best,
Stamatis

[1] https://github.com/apache/hive/blob/5b76949da6fe65364a4e3766680871167131157f/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L5899

On Sun, Feb 11, 2024 at 9:10 AM László Bodor  wrote:
>
> Hey All!
>
> Maybe I'm missing some history here: does anyone know why and for what we
> use line break characters in the descriptions in HiveConf?
> (1046 occurrences as we speak)
> As far as I can tell according to its current state, we *don't* generate
> this page from the class:
> https://cwiki.apache.org/confluence/display/hive/configuration+properties
>
> Motivation: the pain when I rephrase a config description: not only do I
> have to break long lines (which is fine for code style), but I also have to
> change (add/remove) line breaks.
>
> Regards,
> Laszlo Bodor


Re: Enhance PerfLogger with annotations using AOP

2024-02-05 Thread Stamatis Zampetakis
Hey Soumyakanti,

Thanks for starting this discussion.

I like the idea of reducing boilerplate code, and one way, although not
the only one, is using AOP. AOP libraries rely on code injection, and
there are various pros/cons [1] to using such tools. If AspectJ is
the best option for this use case, I have no objections to using it.
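To illustrate the mechanism without AspectJ, here is a sketch that swaps in JDK dynamic proxies (java.lang.reflect.Proxy): a runtime-retained annotation marks the methods to time, and an invocation handler does the begin/end bookkeeping around each call. Unlike AspectJ weaving, a proxy only intercepts calls made through an interface and adds a reflective call per invocation, which is exactly the kind of overhead a benchmark should quantify. LogPerf, Planner, SimplePlanner, PerfProxy and the LOG list are hypothetical stand-ins, not Hive's PerfLogger API.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical annotation modelled on the POC's idea; not Hive's actual class. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface LogPerf {
    String callerName();
}

interface Planner {
    String plan(String sql);
}

final class SimplePlanner implements Planner {
    @LogPerf(callerName = "SimplePlanner")
    @Override
    public String plan(String sql) {
        return "PLAN(" + sql + ")";
    }
}

/** Wraps an object so that @LogPerf methods are timed; stands in for begin/end logging calls. */
final class PerfProxy {
    static final List<String> LOG = new ArrayList<>(); // stand-in for PerfLogger output

    @SuppressWarnings("unchecked")
    static <T> T wrap(T target, Class<T> iface) {
        return (T) Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[] {iface},
                (proxy, method, args) -> {
                    // Look the annotation up on the implementation, not on the interface.
                    Method impl = target.getClass().getMethod(method.getName(), method.getParameterTypes());
                    LogPerf perf = impl.getAnnotation(LogPerf.class);
                    long start = System.nanoTime(); // "begin" hook
                    try {
                        return method.invoke(target, args);
                    } catch (InvocationTargetException e) {
                        throw e.getCause(); // surface the real exception, not the wrapper
                    } finally {
                        if (perf != null) { // "end" hook, only for annotated methods
                            LOG.add(perf.callerName() + "." + method.getName()
                                    + " took " + (System.nanoTime() - start) + " ns");
                        }
                    }
                });
    }

    public static void main(String[] args) {
        Planner planner = wrap(new SimplePlanner(), Planner.class);
        System.out.println(planner.plan("SELECT 1"));
        LOG.forEach(System.out::println);
    }
}
```

The method body of SimplePlanner.plan contains no logging calls at all; the timing lives entirely in the handler, which is the boilerplate reduction the proposal is after.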

The PerfLogger component is called quite often during the lifecycle of
a query so when introducing changes we should be mindful of the
performance overhead that they may introduce. Before merging the
change we should at least have some basic benchmark to ensure that the
instrumentation performed by AspectJ (or another code injection tool)
is not too expensive.

If there are no significant perf differences and the code becomes more
readable after this change then it definitely makes sense to move this
forward.

Best,
Stamatis

[1] https://dzone.com/articles/practical-introduction-code

On Mon, Feb 5, 2024 at 1:19 AM Soumyakanti Das
 wrote:
>
> Hi all,
>
> Do you guys think it's a good idea to implement annotations for PerfLogger?
> Currently, we have to surround the code with a PerfLogBegin and a
> PerfLogEnd to log execution time. There are many methods where the first
> and last line are these. Instead, we could use AOP to create an annotation
> and annotate a method.
>
> I have done a small POC with AspectJ:
> https://github.com/soumyakanti3578/hive/tree/annotated-perf-logger
> 
> I had to just add 2 dependencies to hive-exec, create an annotation
> (@LogPerf), and an Aspect (PerfLoggerAspect). I tested it on my local
> machine and it works well.
>
> PerfLogBegin takes 2 arguments - callerName, and method. And PerfLogEnd
> takes an additional argument - additionalInfo. These three arguments can be
> passed through the annotation.
>
> I think this will help clean up our source code a little bit. Please let me
> know what you think about this.
>
> Thanks,
> Soumyakanti Das
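To make the annotation idea concrete without pulling in AspectJ, here is a minimal sketch. It swaps in a different technique: a JDK dynamic proxy (`java.lang.reflect.Proxy`) instead of AspectJ weaving. `LogPerf`, `QueryCompiler` and `PerfLogProxy` are hypothetical names, and entries go to an in-memory list rather than the real PerfLogger. It only shows how `callerName` and `method` can travel through an annotation and produce begin/end entries around a call.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical counterpart of the proposed @LogPerf annotation. */
@Retention(RetentionPolicy.RUNTIME) // must be visible at runtime for reflection
@Target(ElementType.METHOD)
@interface LogPerf {
  String callerName();
  String method();
}

/** Hypothetical interface standing in for a Hive component. */
interface QueryCompiler {
  @LogPerf(callerName = "Driver", method = "compile")
  String compile(String query);
}

final class PerfLogProxy {
  // Demo-only sink (not thread-safe); a real version would call PerfLogger.
  static final List<String> LOG = new ArrayList<>();

  private PerfLogProxy() {}

  /** Wraps target so that @LogPerf methods get begin/end entries with timing. */
  @SuppressWarnings("unchecked")
  static <T> T wrap(Class<T> iface, T target) {
    InvocationHandler handler = (proxy, method, args) -> {
      LogPerf perf = method.getAnnotation(LogPerf.class);
      if (perf == null) {
        return call(method, target, args); // un-annotated: no logging
      }
      String key = perf.callerName() + "." + perf.method();
      LOG.add("BEGIN " + key);
      long start = System.nanoTime();
      try {
        return call(method, target, args);
      } finally {
        // Emitted even if the call throws, like PerfLogEnd in a finally block.
        LOG.add("END " + key + " took " + (System.nanoTime() - start) / 1_000 + " us");
      }
    };
    return (T) Proxy.newProxyInstance(
        iface.getClassLoader(), new Class<?>[] {iface}, handler);
  }

  private static Object call(Method method, Object target, Object[] args) throws Throwable {
    try {
      return method.invoke(target, args);
    } catch (InvocationTargetException e) {
      throw e.getCause(); // surface the original exception, not the reflective wrapper
    }
  }
}
```

Usage would look like `QueryCompiler c = PerfLogProxy.wrap(QueryCompiler.class, realCompiler);`. Proxies only intercept calls made through an interface, and they add a reflective call on every invocation. That per-call overhead is what a benchmark should measure for any approach. AspectJ weaving can advise ordinary class methods and avoids the interface restriction.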


Re: Subscribe to security ML (Hive Committers)

2024-02-02 Thread Stamatis Zampetakis
Hey Sankar,

I don't think we can add you explicitly. You have to submit the
subscription request from your @apache domain following the
traditional procedure.

Best,
Stamatis

On Fri, Feb 2, 2024 at 9:31 AM Sankar Hariappan  wrote:
>
> Hi Stamatis/Ayush,
> Could you please add me to the security mailing list of Hive?
> sank...@apache.org
>
> Thanks,
> Sankar
>
> On 2024/01/22 13:53:53 Stamatis Zampetakis wrote:
> > For traceability purposes, please subscribe to the mailing list with
> > your @apache.org address. Any other requests will be denied from now
> > onwards. The moderators (us) have no way to tell if the request comes
> > from a valid Hive community member or a malicious attacker.
> >
> > On Mon, Jan 22, 2024 at 11:16 AM Stamatis Zampetakis  
> > wrote:
> > >
> > > The security issues are of utmost importance to ASF projects and
> > > should be treated in a timely manner.
> > >
> > > Thanks Ayush for the reminder!
> > >
> > > Best,
> > > Stamatis
> > >
> > > On Mon, Jan 22, 2024 at 11:05 AM Ayush Saxena  wrote:
> > > >
> > > > Hi Folks,
> > > > In case any of the committers or PMC members are not subscribed to the
> > > > security mailing list, please subscribe & help address or share pointers if
> > > > you can.
> > > >
> > > > if you are a committer, please send a mail to:
> > > > security-subscr...@hive.apache.org
> > > >
> > > > This mail list is moderated, so the request needs to be approved, so please
> > > > send a request from an email id which the moderator (like me) can identify.
> > > >
> > > > If you have any issues with the process, do reach out to me or any other
> > > > moderator.
> > > >
> > > > -Ayush
> >


Re: [DISCUSS] Migrate precommit git repos from kgyrtkirk to apache

2024-01-23 Thread Stamatis Zampetakis
Hey team,

Initially we had zero people willing to help and advance this topic.
Now we have more than three and I am sure everyone has good
intentions.

Let's not forget that in Apache the community is more important than
code. If we have a healthy community we can fix any kind of problem in
the code.

I like contributing to open source projects to interact with more
people, learn from them, and of course make new friends. Indeed I
consider Zoltan a friend and the same goes for Ayush, although I have
never met either of them in person. Let's not fight over this anymore and
come back to the discussion in 2-3 days when everyone will be in a better
mood.

Best,
Stamatis

On Tue, Jan 23, 2024 at 10:12 AM Ayush Saxena  wrote:
>
> Ok I will get the repo deleted. I am not taking any sarcastic comments from
> Zoltan at this stage. Believe me I am not getting anything for having my
> name there.
>
> Why I did this?
>
> Someone was so obsessed with getting his name checked into the "Apache
> Code" that he developed something on his fork & checked in that code to the
> Apache Hive code, so, professional.
>
> Many Hive Commiters have rights is a wrong phrase to quote: Many Hive
> Committer who are your friends have rights. To push an image we need to
> catch Zoltan, but ok do whatever you want.
>
> I just want to say, Zoltan, you might be a very good developer, but please
> change your "whatever you want to do" tone,
>
> Not following this further
>
> -Ayush
>
> On Tue, 23 Jan 2024 at 14:30, Zoltan Haindrich  wrote:
>
> >
> >  > I just copied the repo: cp -R and Put Zoltan's name & reference to his
> >  > repo. I didn't knew any better way than that, you can definitely force push
> >  > with another fancy approach
> >
> > lol...what a sophisticated approach - I wonder if you don't know the
> > `fancy approach` then why you've done it?
> >
> > I wonder what you've copied - because you missed the addition of the
> > github action which builds the image for every PR
> >
> > Now you are the sole contributor of all existing stuff (congrats)...but do
> > whatever you want...
> > It was always there and available to use - many hive commiters had push
> > and approve rights on those repos.
> >
> > I think you might also want to do the same with
> > https://github.com/kgyrtkirk/hive-toolbox
> > because your contribution references it here:
> > https://github.com/apache/hive-dev-box/blob/663625bc74e799f35c6bab1c1485530367287c61/tools/install_toolbox#L21C1-L21C115
> > and probably also cp -R
> > https://github.com/kgyrtkirk/hive-test-kube/
> >
> > cheers,
> > Zoltan
> >
> >
> > On 1/23/24 09:29, Ayush Saxena wrote:
> > > I just copied the repo: cp -R and Put Zoltan's name & reference to his
> > > repo. I didn't knew any better way than that, you can definitely force push
> > > with another fancy approach, just c-pick the other commits for NOTICE & all
> > > on top of it. The old code & commits had some cloudera references, which I
> > > personally wanted to avoid, but yep we can take another approach as well.
> > > Good with me.
> > >
> > > For the Jira, yep we should, we aren't going to release this, so for fix
> > > version, maybe I will create a dev-box-1.0.0 which we can use to resolve
> > > the tickets, shouldn't put main repo versions, else that will pop up in our
> > > release notes, or let me know if you want a separate Jira project under
> > > Hive for these repos as well, We can explore that route if folks feel that
> > > way.
> > >
> > > -Ayush
> > >
> > >
> > > On Tue, 23 Jan 2024 at 13:35, Stamatis Zampetakis  wrote:
> > >
> > >> Thanks for helping advance this Ayush!
> > >>
> > >> I saw that the commit history was not retained. Is there any reason
> > >> for dropping it? Keeping the history and the people who contributed
> > >> thus far would be nice to have.
> > >>
> > >> For the contribution model to this repository, I would recommend the
> > >> usual process. Raise a JIRA ticket, file a PR, wait for review, and
> > >> then merge.
> > >>
> > >> Best,
> > >> Stamatis
> > >>
> > >> On Tue, Jan 23, 2024 at 8:45 AM Ayush Saxena  wrote:
> > >>>
> > >>> This is the new repo:
> > >>> https://github.com/apache/hive-dev-box
> > >>>
> > >>> It has the in

Re: [DISCUSS] Migrate precommit git repos from kgyrtkirk to apache

2024-01-23 Thread Stamatis Zampetakis
I was thinking of pushing the current state of Zoltan's branch to the
new apache remote. I did this just now in
https://github.com/apache/hive-dev-box/tree/main branch.
I can cherry-pick the remaining commits from master and drop that
branch unless we have other things to consider as well. Please share
your thoughts.

For the moment a placeholder JIRA version will do. For the rest we can
decide as we go.

On Tue, Jan 23, 2024 at 10:01 AM Zoltan Haindrich  wrote:
>
>
>  > I just copied the repo: cp -R and Put Zoltan's name & reference to his
>  > repo. I didn't knew any better way than that, you can definitely force push
>  > with another fancy approach
>
> lol...what a sophisticated approach - I wonder if you don't know the `fancy 
> approach` then why you've done it?
>
> I wonder what you've copied - because you missed the addition of the github 
> action which builds the image for every PR
>
> Now you are the sole contributor of all existing stuff (congrats)...but do 
> whatever you want...
> It was always there and available to use - many hive commiters had push and 
> approve rights on those repos.
>
> I think you might also want to do the same with 
> https://github.com/kgyrtkirk/hive-toolbox
> because your contribution references it here: 
> https://github.com/apache/hive-dev-box/blob/663625bc74e799f35c6bab1c1485530367287c61/tools/install_toolbox#L21C1-L21C115
> and probably also cp -R
> https://github.com/kgyrtkirk/hive-test-kube/
>
> cheers,
> Zoltan
>
>
> On 1/23/24 09:29, Ayush Saxena wrote:
> > I just copied the repo: cp -R and Put Zoltan's name & reference to his
> > repo. I didn't knew any better way than that, you can definitely force push
> > with another fancy approach, just c-pick the other commits for NOTICE & all
> > on top of it. The old code & commits had some cloudera references, which I
> > personally wanted to avoid, but yep we can take another approach as well.
> > Good with me.
> >
> > For the Jira, yep we should, we aren't going to release this, so for fix
> > version, maybe I will create a dev-box-1.0.0 which we can use to resolve
> > the tickets, shouldn't put main repo versions, else that will pop up in our
> > release notes, or let me know if you want a separate Jira project under
> > Hive for these repos as well, We can explore that route if folks feel that
> > way.
> >
> > -Ayush
> >
> >
> > On Tue, 23 Jan 2024 at 13:35, Stamatis Zampetakis  wrote:
> >
> >> Thanks for helping advance this Ayush!
> >>
> >> I saw that the commit history was not retained. Is there any reason
> >> for dropping it? Keeping the history and the people who contributed
> >> thus far would be nice to have.
> >>
> >> For the contribution model to this repository, I would recommend the
> >> usual process. Raise a JIRA ticket, file a PR, wait for review, and
> >> then merge.
> >>
> >> Best,
> >> Stamatis
> >>
> >> On Tue, Jan 23, 2024 at 8:45 AM Ayush Saxena  wrote:
> >>>
> >>> This is the new repo:
> >>> https://github.com/apache/hive-dev-box
> >>>
> >>> It has the initial code from Zoltan, LICENSE, NOTICE & Disclaimer-WIP
> > >>> file + I added Apache Header to all the files wherever possible. We need a
> > >>> docker space to push these built images, have requested INFRA for the same.
> >>>
> > >>> The repo is in WIP stage, If you find something problematic, please push a
> > >>> fix to the repo or let me know.
> >>>
> >>> Some observations:
> >>> * The build command works on x86 box only not on aarch64,
> > >>> * The github action to push the images doesn't work, that needs to be fixed
> >>>
> >>> -Ayush
> >>>
> >>> On Mon, 22 Jan 2024 at 21:55, Ayush Saxena  wrote:
> >>>
> >>>> I think we are now not using Zoltan's repo. We are using a fork from a
> > >>>> contributor in the hive code.[1], I will go ahead and create a repo under
> > >>>> Apache Hive for hive-dev-box tomorrow & put the LICENSE, NOTICE &
> > >>>> DISCLAIMER-WIP files in it, Then will take things from there, atleast it
> > >>>> would be a starting point and all of us can take care of things from there
> > >>>> slowly-slowly :-)
> >>>>
> >>>> Shout out, if anyone has objections around it.
> >>>>
> >>>> -Ayush
> > >>>

Re: [Discuss] Enable Attachments for Hive mailing lists

2024-01-23 Thread Stamatis Zampetakis
+0

I rarely open attachments from public mailing lists for security
reasons (unless we are talking about known safe extensions).

Moreover, I find it easier to glance through code if people share a
link to a PR or code in GitHub than if I have to download and apply a
patch locally.

I understand that for some people this may be helpful so I am not
opposing the change.

Best,
Stamatis

On Mon, Jan 22, 2024 at 2:39 PM Attila Turoczy
 wrote:
>
> +1 for me as well. We need it.
>
> -Attila
>
> On Mon, Jan 22, 2024 at 1:25 PM Ayush Saxena  wrote:
>
> > Hi All,
> > As of now we don't allow having attachments on the hive mailing lists
> > (apart from security ML), This prevents us from attaching patches/design
> > doc or even screenshots of issues being reported on our mailing lists.
> >
> > A lot of projects allow that, I feel we should enable this for our Hive
> > mailing lists as well for better dev experience.
> >
> > Let me know your thoughts!!!
> >
> > Obviously a +1 from me
> >
> > -Ayush
> >


Re: [DISCUSS] Migrate precommit git repos from kgyrtkirk to apache

2024-01-23 Thread Stamatis Zampetakis
Thanks for helping advance this Ayush!

I saw that the commit history was not retained. Is there any reason
for dropping it? Keeping the history and the people who contributed
thus far would be nice to have.

For the contribution model to this repository, I would recommend the
usual process. Raise a JIRA ticket, file a PR, wait for review, and
then merge.

Best,
Stamatis

On Tue, Jan 23, 2024 at 8:45 AM Ayush Saxena  wrote:
>
> This is the new repo:
> https://github.com/apache/hive-dev-box
>
> It has the initial code from Zoltan, LICENSE, NOTICE & Disclaimer-WIP
> file + I added Apache Header to all the files wherever possible. We need a
> docker space to push these built images, have requested INFRA for the same.
>
> The repo is in WIP stage, If you find something problematic, please push a
> fix to the repo or let me know.
>
> Some observations:
> * The build command works on x86 box only not on aarch64,
> * The github action to push the images doesn't work, that needs to be fixed
>
> -Ayush
>
> On Mon, 22 Jan 2024 at 21:55, Ayush Saxena  wrote:
>
> > I think we are now not using Zoltan's repo. We are using a fork from a
> > contributor in the hive code.[1], I will go ahead and create a repo under
> > Apache Hive for hive-dev-box tomorrow & put the LICENSE, NOTICE &
> > DISCLAIMER-WIP files in it, Then will take things from there, atleast it
> > would be a starting point and all of us can take care of things from there
> > slowly-slowly :-)
> >
> > Shout out, if anyone has objections around it.
> >
> > -Ayush
> >
> >
> > [1]
> > https://github.com/apache/hive/blob/1aeaff2057a2f4c241f8bcc53a2a529e6e7f45d4/Jenkinsfile#L124C44-L124C65
> >
> > On Wed, 6 Sept 2023 at 20:11, Stamatis Zampetakis 
> > wrote:
> >
> >> Based on the discussion under LEGAL-653, it seems that the only
> >> requirement to migrate the repos under the Apache namespace is to
> >> apply the AL2 license in the majority of the files in there.
> >>
> >> I am looking for volunteers so that we can review the existing code in
> > >> those repos and apply the AL2 license where possible. Depending on how
> > >> many people step up we can divide the work accordingly.
> > >>
> > >> It would be interesting to see if we can use RAT [1] to
> > >> automate/assist in this process.
> >>
> >> Best,
> >> Stamatis
> >>
> >> [1] https://creadur.apache.org/rat/apache-rat-plugin/usage.html
> >>
> >> On Thu, Aug 24, 2023 at 11:05 AM Stamatis Zampetakis 
> >> wrote:
> >> >
> >> > For the licensing question, I just created LEGAL-653 [1].
> >> >
> >> > [1] https://issues.apache.org/jira/browse/LEGAL-653
> >> >
> >> > On Thu, Aug 24, 2023 at 11:55 AM Stamatis Zampetakis 
> >> wrote:
> >> > >
> >> > > Creating the new repos should be kind of trivial. I think I will be
> >> > > able to do it using https://selfserve.apache.org/.
> >> > >
> >> > > Since this will bring quite a bit of code under the ASF I will wait a
> >> > > couple of days till I create the new repos.
> >> > >
> >> > > Once this is done, I think we can simply push the content from the old
> >> > > repos to the new ones. To avoid any kind of IP problems it would be
> >> > > best if Zoltan does this step.
> >> > >
> >> > > One thing that we may need to be careful about is the licensing of
> >> > > these repos. We are not going to make source releases from there but
> >> > > since the code will be under the ASF namespace people will assume that
> >> > > it is ASF licensed so they may start copy-pasting stuff from there.
> >> > >
> > >> > > Is there anything preventing us from putting the code under the AL2 license?
> >> > >
> >> > > Best,
> >> > > Stamatis
> >> > >
> >> > > On Wed, Aug 23, 2023 at 6:14 PM Attila Turoczy
> >> > >  wrote:
> >> > > >
> >> > > > Thank you, Stamatis! Also, Zoltan for the "donation" :)
> >> > > >
> >> > > > -Attila
> >> > > >
> >> > > > On Wed, Aug 23, 2023 at 4:53 PM Ayush Saxena 
> >> wrote:
> >> > > >
> >> > > > > +1,
> > >> > > > > Thanx Stamatis for initiating this. This was something which was in my
> >>

Re: Subscribe to security ML (Hive Committers)

2024-01-22 Thread Stamatis Zampetakis
For traceability purposes, please subscribe to the mailing list with
your @apache.org address. Any other requests will be denied from now
onwards. The moderators (us) have no way to tell if the request comes
from a valid Hive community member or a malicious attacker.

On Mon, Jan 22, 2024 at 11:16 AM Stamatis Zampetakis  wrote:
>
> The security issues are of utmost importance to ASF projects and
> should be treated in a timely manner.
>
> Thanks Ayush for the reminder!
>
> Best,
> Stamatis
>
> On Mon, Jan 22, 2024 at 11:05 AM Ayush Saxena  wrote:
> >
> > Hi Folks,
> > In case any of the committers or PMC members are not subscribed to the
> > security mailing list, please subscribe & help address or share pointers if
> > you can.
> >
> > if you are a committer, please send a mail to:
> > security-subscr...@hive.apache.org
> >
> > This mail list is moderated, so the request needs to be approved, so please
> > send a request from an email id which the moderator (like me) can identify.
> >
> > If you have any issues with the process, do reach out to me or any other
> > moderator.
> >
> > -Ayush


Re: Re: Cleanup remote feature/wip branches

2024-01-22 Thread Stamatis Zampetakis
Cleanup is complete. Thanks to Krisztian and Zhihua for helping out!

I only kept https://github.com/apache/hive/tree/release-1.1 because I
was unsure of its purpose.

Best,
Stamatis

On Fri, Jan 19, 2024 at 6:46 PM Chao Sun  wrote:
>
> +1
>
> On Fri, Jan 19, 2024 at 4:33 AM Attila Turoczy
>  wrote:
> >
> > +1
> >
> > On Fri, 19 Jan 2024 at 04:30, dengzhhu653  wrote:
> >
> > > +1
> > > At 2024-01-19 19:58:49, "Krisztian Kasa" 
> > > wrote:
> > > >+1
> > > >
> > > >On Fri, Jan 19, 2024 at 11:28 AM Alessandro Solimando <
> > > >alessandro.solima...@gmail.com> wrote:
> > > >
> > > >> +1, thanks Stamatis
> > > >>
> > > >> On Fri, Jan 19, 2024, 11:14 Ayush Saxena  wrote:
> > > >>
> > > >> > +1
> > > >> >
> > > >> > -Ayush
> > > >> >
> > > >> > > On 19-Jan-2024, at 3:41 PM, Stamatis Zampetakis 
> > > >> > wrote:
> > > >> > >
> > > >> > > Hey everyone,
> > > >> > >
> > > >> > > I noticed that in our official git repo [1] we have some kind of
> > > >> > > feature/WIP branches (see list below). Most of them (if not all) are
> > > >> > > stale, add noise, and some of them eat CI resources (storage and CPU)
> > > >> > > since Jenkins picks them up for builds/precommits.
> > > >> > >
> > > >> > > I would like to drop those at the end of this email. Please +1 if you
> > > >> > > agree.
> > > >> > >
> > > >> > > Best,
> > > >> > > Stamatis
> > > >> > >
> > > >> > > [1] https://github.com/apache/hive/branches/all
> > > >> > >
> > > >> > > git branch -r | grep origin | grep -v "branch-" | grep -v "master"
> > > >> > >  origin/HIVE-23274_280_rb
> > > >> > >  origin/HIVE-23337_280_rb
> > > >> > >  origin/HIVE-23403_280_rb
> > > >> > >  origin/HIVE-23440_280_rb
> > > >> > >  origin/HIVE-23470_rb
> > > >> > >  origin/HIVE-4115
> > > >> > >  origin/branc-2.3
> > > >> > >  origin/cbo
> > > >> > >  origin/dependabot/maven/com.google.protobuf-protobuf-java-3.21.7
> > > >> > >  origin/dependabot/maven/itests/qtest-druid/org.eclipse.jetty-jetty-server-9.4.51.v20230217
> > > >> > >  origin/dependabot/maven/org.apache.commons-commons-text-1.10.0
> > > >> > >  origin/dependabot/maven/org.eclipse.jetty-jetty-server-9.4.51.v20230217
> > > >> > >  origin/dependabot/maven/org.postgresql-postgresql-42.4.3
> > > >> > >  origin/dependabot/maven/standalone-metastore/com.google.protobuf-protobuf-java-3.21.7
> > > >> > >  origin/dependabot/maven/standalone-metastore/org.eclipse.jetty-jetty-server-9.4.51.v20230217
> > > >> > >  origin/dependabot/maven/standalone-metastore/org.postgresql-postgresql-42.4.3
> > > >> > >  origin/ptf-windowing
> > > >> > >  origin/release-1.1
> > > >> > >  origin/revert-1365-upgrade-guava
> > > >> > >  origin/revert-1855-HIVE-24624
> > > >> > >  origin/revert-2694-HIVE-25355
> > > >> > >  origin/revert-3624-HIVE-26567
> > > >> > >  origin/revert-4247-hive-23256
> > > >> > >  origin/revert-4306-HIVE-27330
> > > >> > >  origin/revert-4452-HIVE-57988-BetweenBugFix
> > > >> > >  origin/revert-4501-OptimizeGetPartitionAPI
> > > >> > >  origin/vectorization
> > > >> >
> > > >>
> > >


Re: Subscribe to security ML (Hive Committers)

2024-01-22 Thread Stamatis Zampetakis
The security issues are of utmost importance to ASF projects and
should be treated in a timely manner.

Thanks Ayush for the reminder!

Best,
Stamatis

On Mon, Jan 22, 2024 at 11:05 AM Ayush Saxena  wrote:
>
> Hi Folks,
> In case any of the committers or PMC members are not subscribed to the
> security mailing list, please subscribe & help address or share pointers if
> you can.
>
> if you are a committer, please send a mail to:
> security-subscr...@hive.apache.org
>
> This mail list is moderated, so the request needs to be approved, so please
> send a request from an email id which the moderator (like me) can identify.
>
> If you have any issues with the process, do reach out to me or any other
> moderator.
>
> -Ayush


Cleanup remote feature/wip branches

2024-01-19 Thread Stamatis Zampetakis
Hey everyone,

I noticed that in our official git repo [1] we have some kind of
feature/WIP branches (see list below). Most of them (if not all) are
stale, add noise, and some of them eat CI resources (storage and CPU)
since Jenkins picks them up for builds/precommits.

I would like to drop those at the end of this email. Please +1 if you agree.

Best,
Stamatis

[1] https://github.com/apache/hive/branches/all

git branch -r | grep origin | grep -v "branch-" | grep -v "master"
  origin/HIVE-23274_280_rb
  origin/HIVE-23337_280_rb
  origin/HIVE-23403_280_rb
  origin/HIVE-23440_280_rb
  origin/HIVE-23470_rb
  origin/HIVE-4115
  origin/branc-2.3
  origin/cbo
  origin/dependabot/maven/com.google.protobuf-protobuf-java-3.21.7
  origin/dependabot/maven/itests/qtest-druid/org.eclipse.jetty-jetty-server-9.4.51.v20230217
  origin/dependabot/maven/org.apache.commons-commons-text-1.10.0
  origin/dependabot/maven/org.eclipse.jetty-jetty-server-9.4.51.v20230217
  origin/dependabot/maven/org.postgresql-postgresql-42.4.3
  origin/dependabot/maven/standalone-metastore/com.google.protobuf-protobuf-java-3.21.7
  origin/dependabot/maven/standalone-metastore/org.eclipse.jetty-jetty-server-9.4.51.v20230217
  origin/dependabot/maven/standalone-metastore/org.postgresql-postgresql-42.4.3
  origin/ptf-windowing
  origin/release-1.1
  origin/revert-1365-upgrade-guava
  origin/revert-1855-HIVE-24624
  origin/revert-2694-HIVE-25355
  origin/revert-3624-HIVE-26567
  origin/revert-4247-hive-23256
  origin/revert-4306-HIVE-27330
  origin/revert-4452-HIVE-57988-BetweenBugFix
  origin/revert-4501-OptimizeGetPartitionAPI
  origin/vectorization
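As a hedged illustration of the cleanup, here is a hypothetical dry-run helper. `StaleBranches` and its sample listing are made up for this sketch. It mirrors the `grep` pipeline above with plain substring checks. It then prints one `git push origin --delete <branch>` command per candidate for a human to review, and it never runs git itself. A keep-set lets branches such as `release-1.1` be excluded. Because the filters are plain substring matches, exactly like `grep -v`, a branch such as `origin/branc-2.3` slips through the "branch-" exclusion. That is why it appears in the list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical dry-run helper; it only prints commands, it never runs git. */
final class StaleBranches {
  private static final String REMOTE_PREFIX = "origin/";

  private StaleBranches() {}

  /** Same filter as: grep origin | grep -v "branch-" | grep -v "master". */
  static List<String> candidates(List<String> branchListing) {
    List<String> result = new ArrayList<>();
    for (String line : branchListing) {
      String branch = line.trim();
      if (branch.contains("origin")
          && !branch.contains("branch-")
          && !branch.contains("master")) {
        result.add(branch);
      }
    }
    return result;
  }

  /** Builds one delete command per candidate, skipping branches in keep. */
  static List<String> deleteCommands(List<String> candidates, Set<String> keep) {
    List<String> commands = new ArrayList<>();
    for (String branch : candidates) {
      if (keep.contains(branch)) {
        continue;
      }
      String name = branch.startsWith(REMOTE_PREFIX)
          ? branch.substring(REMOTE_PREFIX.length())
          : branch;
      commands.add("git push origin --delete " + name);
    }
    return commands;
  }

  public static void main(String[] args) {
    // Sample `git branch -r` output (hypothetical subset).
    List<String> listing = List.of(
        "  origin/HEAD -> origin/master", "  origin/branch-3", "  origin/branc-2.3",
        "  origin/cbo", "  origin/release-1.1");
    deleteCommands(candidates(listing), Set.of("origin/release-1.1"))
        .forEach(System.out::println);
  }
}
```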


CI is down: No space left on device

2024-01-19 Thread Stamatis Zampetakis
FYI: https://issues.apache.org/jira/browse/HIVE-28013

Best,
Stamatis


Re: [VOTE] Mark Hive 1.x EOL

2024-01-17 Thread Stamatis Zampetakis
+1 (binding)

Best,
Stamatis

On Wed, Jan 17, 2024 at 8:21 AM Attila Turoczy
 wrote:
>
> +1
>
> -Attila
>
> On Tue, 16 Jan 2024 at 22:18, Butao Zhang  wrote:
>
> > +1
> >
> >
> >
> > Thanks,
> > Butao Zhang
> >  Replied Message 
> > | From | Ayush Saxena |
> > | Date | 1/17/2024 14:15 |
> > | To | dev |
> > | Subject | [VOTE] Mark Hive 1.x EOL |
> > Hi All,
> > Following the discussion in [1], Starting an official thread to mark Hive
> > 1.x EOL.
> >
> > Marking a release line EOL, means there won't be any further releases for
> > that release line.
> >
> > I will start with my +1
> >
> > -Ayush
> >
> > [1] https://lists.apache.org/thread/sxcrcf4v9j630tl9domp0bn4m33bdq0s
> >


Re: SonarCloud deprecates Java11?

2024-01-16 Thread Stamatis Zampetakis
Hello,

There is a PR for addressing the problem [1] and it will be merged in
the next few days. In the meantime feel free to ignore the Sonar
errors; nothing too bad will happen by doing this.

Many thanks to Wechar Yu for jumping on this and pushing it forward.

Best,
Stamatis

[1] https://github.com/apache/hive/pull/5004

On Tue, Jan 16, 2024 at 10:01 PM Zsolt Miskolczi
 wrote:
>
> Hi,
>
> I saw that in one of the splits at my pr. Should we start to worry?
>
> jenkins / hive-precommit / PR-4740 / #22 (apache.org)
>
> [2024-01-16T19:33:27.462Z] [INFO] BUILD FAILURE
> [2024-01-16T19:33:27.462Z] [INFO]
> [2024-01-16T19:33:27.462Z] [INFO] Total time:  55.968 s
> [2024-01-16T19:33:27.462Z] [INFO] Finished at: 2024-01-16T19:33:26Z
> [2024-01-16T19:33:27.462Z] [INFO]
> [2024-01-16T19:33:27.462Z] [ERROR] Failed to execute goal
> org.sonarsource.scanner.maven:sonar-maven-plugin:3.9.1.2184:sonar
> (default-cli) on project hive:
> [2024-01-16T19:33:27.462Z] [ERROR]
> [2024-01-16T19:33:27.462Z] [ERROR] The version of Java (11.0.8) used to run
> this analysis is deprecated, and SonarCloud no longer supports it. Please
> upgrade to Java 17 or later.
> [2024-01-16T19:33:27.462Z] [ERROR] As a temporary measure, you can set the
> property 'sonar.scanner.force-deprecated-java-version' to 'true' to continue
> using Java 11.0.8
> [2024-01-16T19:33:27.462Z] [ERROR] This workaround will only be effective
> until January 28, 2024. After this date, all scans using the deprecated
> Java 11 will fail.
> [2024-01-16T19:33:27.462Z] [ERROR] -> [Help 1]
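The error above boils down to a simple rule: the scanner must run on Java 17 or later. The hypothetical `SonarJavaCheck` pre-flight check below sketches that rule with `Runtime.Version` from the JDK. It is not part of the Sonar plugin or of Hive's CI. It only shows how a "feature release >= 17" test could be written, for example to fail fast before the analysis step.

```java
/** Hypothetical pre-flight check mirroring the requirement in the Sonar log. */
final class SonarJavaCheck {
  // The log asks for "Java 17 or later".
  static final int MIN_FEATURE_RELEASE = 17;

  private SonarJavaCheck() {}

  /** True if the version's feature release (e.g. 11 for "11.0.8") is new enough. */
  static boolean isSupported(Runtime.Version version) {
    return version.feature() >= MIN_FEATURE_RELEASE;
  }

  /** Convenience overload for version strings such as "11.0.8". */
  static boolean isSupported(String javaVersion) {
    return isSupported(Runtime.Version.parse(javaVersion));
  }

  public static void main(String[] args) {
    Runtime.Version running = Runtime.version();
    System.out.println("Running " + running + ", supported by the scanner: "
        + isSupported(running));
  }
}
```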


Re: [DISCUSS] End of life for Hive 1.x, 2.x, 3.x

2024-01-10 Thread Stamatis Zampetakis
+1 for marking Hive 1.x EOL.

I will also take the opportunity to ask again for Hive 2.x. Are there
people willing to take on the maintenance of the branch-2 line? The
people who take on the maintenance of the release line should (as a
bare minimum) ensure that all CVEs (existing and new ones) are
backported in a timely manner and cut new releases once this happens.
If there are no volunteers to take on this task then the PMC should
also vote for closing this branch.

Best,
Stamatis

On Wed, Jan 10, 2024 at 12:48 AM Ayush Saxena  wrote:
>
> I will start a vote to mark Hive 1.x EOL next week. Let me know if
> anyone has concerns around it.
>
> The main reason to mark a release line EOL is: if we have a CVE & if
> we don't release all the active lines with the fix we can't announce
> that & the PMC would be flagged every quarter for delaying the
> process, So, sooner or later we need to find a way to reduce the
> number of active release lines.
>
> -Ayush
>
> On Fri, 29 Jul 2022 at 01:35, Chao Sun  wrote:
> >
> > Hive 2.x is still being used by other projects like Spark and Iceberg,
> > and periodically there are bug fixes & CVE fixes coming into the
> > branch. So I would suggest keeping it alive for a bit longer (maybe
> > after 2.3.10/11 release) until the other projects are ready to move
> > away from it (which could take some significant efforts).
> >
> > Chao
> >
> > On Thu, Jul 28, 2022 at 5:51 AM Ayush Saxena  wrote:
> > >
> > > +1, to start EOL vote for 1.x, and we can keep a doc or a reference in the
> > > Hive Wiki/Website to mark the lines EOL
> > >
> > > Sharing thoughts about the other release lines.
> > > Though there were assertions that we have a lot of users on 2.x & 3.x
> > > lines, I don't think marking these lines as  EOL will impact them that
> > > badly.
> > > Marking a release line seems to be a Dev agreement that we as the
> > > developers aren't putting enough efforts now maintaining these branches and
> > > they aren't very up to date.
> > >
> > > Quoting the example from Hadoop. Hadoop 3.1.x line is marked as EOL and
> > > still almost every second person on Hadoop 3.x line is on a heavily patched
> > > version of 3.1.x, and from the other half still a bunch of them are on 2.x
> > > family, out of which only 2.10.x isn't EOL. Side note: As of today Hive in
> > > master branch also depends on an unstable EOL version of hadoop, that is
> > > 3.1.0(Upgrade in progress)
> > >
> > > From the stability point of view, I agree with Stamatis that 4.x in alpha
> > > stage is still better than a bunch of previous releases in many aspects,
> > > and supporting older releases will just slow down the chances of
> > > adaptability of the new 4.x.
> > > If we see the git history even of these old branches, the frequency of
> > > commits are even too low, so I don't think most of the
> > > developers/committers aren't putting efforts maintaining these
> > > branches.(Subjective Opinion)
> > >
> > > IMO, We should consider marking 1.x & 2.x as EOL, Resolve upgrade issues
> > > mentioned for 3.x->4.x and once resolved, if that doesn't require any
> > > changes on 3.x line and everyone is happy then mark that even as EOL or
> > > else have a last bridge release for this branch to move to 4.x
> > >
> > > Just my 2 cents.
> > >
> > > -Ayush
> > >
> > >
> > >
> > > On Mon, 25 Jul 2022 at 19:38, Stamatis Zampetakis  wrote:
> > >
> > > > Hi all,
> > > >
> > > > In the last exchanges there was a general consensus to EOL Hive 1.X but no
> > > > additional action.
> > > > I believe the next step would be to start a VOTE and move forward with an
> > > > official announcement.
> > > >
> > > > I think it would be helpful for the end-users to know which releases are
> > > > supported and which are strongly discouraged.
> > > > The Hadoop community keeps this information in their wiki [1].
> > > >
> > > > Although I am still not convinced that we should encourage users to use
> > > > the older release lines (2.X, 3.X) we can postpone the decision for the
> > > > time being and proceed just for 1.X.
> > > >
> > > > Best,
> > > > Stamatis
> > > >
> > > > [1]
> > > >
> > > > https://cwiki.

Re: 4.0 documentation - Confluence limitations?

2024-01-08 Thread Stamatis Zampetakis
Hey Zsolt,

There have been a few discussions in the past about moving the
documentation from the wiki to the website and from what I recall
people were more or less in favor of moving towards this direction.
The main thing missing is volunteers that are willing to take on this
migration step.

Personally, I am very much in favor of going in this direction, not
only for solving namespacing issues but also for traceability purposes
and for facilitating doc contributions and reviews.

Big +1 from me.

Best,
Stamatis

On Mon, Jan 8, 2024 at 10:15 AM Zsolt Miskolczi
 wrote:
>
> In confluence, page names should be unique in a given space. As I see,
> Apache Hive has its own space.
> And now comes the tricky part: with 4.0 documentation, we didn't create a
> new space, just a 4.0 parent page. We create a copy of existing pages under
> the umbrella of this page:
> https://cwiki.apache.org/confluence/display/Hive/Apache+Hive+4.0.0
>
> The problem is the unique naming of pages: it would make sense to keep the
> page names the same as in the older documents but unfortunately, we cannot.
> So we try to create names that are almost the same, or just delay the
> decisions.
> Two examples:
> - AdminManual Installation became Manual Installation
> - Hive Schema Tool became Copy of Hive Schema Tool - [TODO: move it under a
> 4.0 admin manual page, find a proper name]
>
> I see multiple issues with that: consistency is gone. Also, I'm not
> sure how it can support search engines. Also, it can be confusing for
> people who want to use the wiki pages.
>
> I was thinking about different solutions. Creating a Hive 4.0 space in
> Confluence can solve the problem of page uniqueness, but it doesn't address
> the issues of searchability and ease of use.
>
> We can also keep the current one, but in that case, we would need to
> agree on a good naming convention for the pages.
>
> At this point, my best idea is to move to an engine that is better suited
> to documenting a software product. For example, Iceberg uses Hugo. It is a
> markup-based engine, the docs can be kept in source control, and it is pretty fast.
> Example page: https://iceberg.apache.org/docs/1.4.1/.
>
>
> What do you think of that?
>
> Thank you,
> Zsolt


Re: Force coding style in hive precommit

2024-01-08 Thread Stamatis Zampetakis
+1 for enforcing style on new code. It will definitely save us from
additional review cycles.

Although I like checkstyle, I tend to prefer tools that can
automatically apply and fix style violations, such as spotless [1].

It seems that the spotless plugin can be configured to enforce
formatting gradually [2] so I think it is an ideal choice for this
discussion.

To avoid wasting CI resources for nothing, we can employ spotless (or
other plugins) during the regular build so that style violations are
detected and fixed early, before raising the PR.

Finally, spotless can easily be configured to apply Eclipse styles, so
making it use our recommended formatting [3] would be trivial.
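For illustration only, a minimal Python sketch of the ratchet idea from [2]: the style check covers just the files changed relative to a baseline ref, so untouched legacy code is never flagged. This is not how spotless implements it (spotless exposes the idea through its `ratchetFrom` setting); the `.java` filter and the example paths are assumptions.

```python
# Sketch of the "ratchet" idea: only files changed relative to a baseline
# ref are style-checked. Not spotless' implementation; the extension filter
# and the example paths are hypothetical.
from pathlib import PurePosixPath


def files_to_check(changed_paths, extensions=(".java",)):
    """Sorted, de-duplicated subset of changed paths with a checked extension."""
    return sorted({p for p in changed_paths
                   if PurePosixPath(p).suffix in extensions})


if __name__ == "__main__":
    # The input could come from: git diff --name-only origin/master...HEAD
    changed = ["ql/src/java/Foo.java", "pom.xml", "ql/src/java/Foo.java",
               "common/src/java/Bar.java"]
    print(files_to_check(changed))
    # ['common/src/java/Bar.java', 'ql/src/java/Foo.java']
```

The point of the design is that existing violations never block a PR; only lines a contributor already touches have to conform.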

Best,
Stamatis

[1] https://github.com/diffplug/spotless
[2] https://github.com/diffplug/spotless/tree/main/plugin-maven#how-can-i-enforce-formatting-gradually-aka-ratchet
[3] https://github.com/apache/hive/blob/master/dev-support/eclipse-styles.xml

On Mon, Jan 8, 2024 at 11:06 AM Zsolt Miskolczi
 wrote:
>
> I think giving a warning is something that nobody will check. It could only
> make sense if it is formatted in a way that it cannot be overlooked. In every
> other case, it is just ignored. Also, we are already full of warnings,
> so I'm afraid it would just get lost in the noise.
> Sorry, I don't know how it works in Hadoop/Tez; maybe it is easy to use.
>
> On Mon, Jan 8, 2024 at 9:53 AM Ayush Saxena wrote:
>
> > +1 to having a checkstyle build. I am strongly against doing that big
> > refactor just to make checkstyle happy; such a refactor will make
> > backports to Hive lower branches tough and the life of folks
> > maintaining downstream forks quite painful.
> >
> > We should do the same as in Tez/Hadoop, where
> > checkstyle violations are highlighted and the committer, before
> > committing, can check them and decide whether they are unavoidable or not.
> >
> > -Ayush
> >
> > On Mon, 8 Jan 2024 at 14:05, László Bodor 
> > wrote:
> > >
> > > Thanks for the responses so far!
> > > I'm a bit against the one-time huge refactor commit, as we don't need it
> > > (but I can be convinced, of course): checkstyle can be set up to
> > > warn only on style issues in the new/touched bits in the PR (or at least
> > > that's how it works in Tez). That's what we need, so we don't have to
> > > make that huge commit simply to introduce this enforcement.
> > >
> > > On Mon, Jan 8, 2024 at 9:28 AM Butao Zhang wrote:
> > >
> > > > +1
> > > >
> > > >
> > > >
> > > > BTW, we have an independent checkstyle file under the iceberg module:
> > > > https://github.com/apache/hive/tree/master/iceberg/checkstyle . I think
> > > > we need to consider unifying the checkstyle configuration across all
> > > > sub-modules.
> > > >
> > > >
> > > > Thanks,
> > > > Butao Zhang
> > > >  Replied Message 
> > > > | From | Zsolt Miskolczi |
> > > > | Date | 1/8/2024 16:19 |
> > > > | To |  |
> > > > | Subject | Re: Force coding style in hive precommit |
> > > > +1
> > > >
> > > > In case there is an agreement about the coding style, we can prepare a
> > > > tool that enforces that style at compile time: run the tool once to
> > > > re-format all the existing code, and turn on a compile-time check.
> > > > Iceberg took the same approach: they had one huge commit with almost 4k
> > > > files changed, and from that point on it worked well, with no issues
> > > > about formatting.
> > > > I don't think putting a warning message helps at all. It should be
> > > > enforced at compile time.
> > > >
> > > > Zsolt
> > > >
> > > > On Mon, Jan 8, 2024 at 7:20 AM Kirti Ruge wrote:
> > > >
> > > > +1
> > > > It would improve maintainability and code reviews. Sometimes small
> > > > indentation/styling issues kill review cycle time, and we can easily
> > > > avoid them before requesting a review.
> > > > Enforcing more rules around it would definitely help guarantee quality.
> > > > We can integrate it with git hooks. If we are going for this, I can work
> > > > on getting it in place.
> > > >
> > > > Thanks,
> > > > Kirti
> > > >
> > > > On 08-Jan-2024, at 11:36 AM, Akshat m  wrote:
> > > >
> > > > +1, we already have documentation around it:
> > > > https://cwiki.apache.org/confluence/display/Hive/HowToContribute#HowToContribute-CodingConventions
> > > > so it makes sense to enforce it as well.
> > > >
> > > > Right now we have a small section about this in the documentation. We can
> > > > also expand it into a new page and add more of the Java practices that
> > > > are followed in the project while we are at it. It will be a great
> > > > addition to the Hive 4 documentation; I can pick it up.
> > > >
> > > > I suggest we add this style check as a pre-commit git hook as well, so it
> > > > is also enforced when the author commits locally; this can save
> > > > the wait time for pre-commit failure in the PR for the author

Potential bugs in Vectorized map joins with dates

2023-12-07 Thread Stamatis Zampetakis
Hi all,

From a quick code review, I get the impression that we may have
various runtime bugs around vectorized map joins on dates.

One problem is reported under HIVE-27943, but I suspect there are more
waiting to be found. For those looking to contribute to the project,
issues in this area shouldn't be too difficult to reproduce, log, and fix.

Even without a fix, just reporting a problem and logging a bug is a
very useful contribution.

I will be happy to help with and review your work on fixing those issues (if any are found).

Best,
Stamatis


Re: [ANNOUNCE] New committer: Butao Zhang (zhangbutao)

2023-11-21 Thread Stamatis Zampetakis
Congratulations Butao, well deserved! Very glad to see another Iceberg
expert joining the team.

Best,
Stamatis


On Tue, Nov 21, 2023, 4:47 PM Butao Zhang  wrote:

> Thank you to the Hive community for this honor. I will continue to
> contribute to the community with my efforts.
> Thanks all!
>
>
> Thanks,
> Butao Zhang
>  Replied Message 
> | From | Ayush Saxena |
> | Date | 11/21/2023 15:02 |
> | To | dev ,
>  ,
> Butao Zhang |
> | Subject | [ANNOUNCE] New committer: Butao Zhang (zhangbutao) |
> Hi All,
> Apache Hive's Project Management Committee (PMC) has invited Butao
> Zhang  to become a committer, and we are pleased to announce that he
> has accepted.
>
> Butao Zhang welcome, thank you for your contributions, and we look
> forward to your further interactions with the community!
>
> Ayush Saxena
> (On behalf of Apache Hive PMC)
>


Re: Discussion about HIVE-12679 to make IMetaStoreClient pluggable

2023-10-19 Thread Stamatis Zampetakis
Hey Okumin,

Thanks for picking up this ticket and driving it forward.

I don't have a strong opinion between the two options.

On the surface the factory option seems simpler and possibly more
efficient but I am not sure if the changes under the PR are sufficient
to cover all usages in Hive.

On the other hand, the proxy option looks more cumbersome to configure
but maybe it is easier to integrate with the existing plumbing of
RetryingMetaStoreClient in various places.

Best,
Stamatis


On Mon, Oct 16, 2023 at 11:00 AM Attila Turoczy
 wrote:
>
> Hi Okumin,
>
> I love this initiative. Especially every good platform should be pluggable.
> In my mind the HMS should be just one option that the user can choose from.
> Yes, that will be the default, but the world is way more open now, and we
> need to provide freedom of choice. If you or others want to choose a
> different metastore, it should be easy.
>
> Both option1 and option2 are acceptable. (Maybe the first one is easier,
> just need another factory, which are so boring :) )
>
> Thank you for your PR and work. I will also check it soon.
>
> -Attila
>
> On Fri, Oct 13, 2023 at 5:04 PM Okumin  wrote:
>
> > Hi,
> >
> > I'm working on introducing a feature to make IMetaStoreClient pluggable.
> > I'm sending this e-mail to gather opinions in a visible manner because it
> > has controversial points.
> >
> > Some Hive users need the feature in order to integrate Hive with a data
> > catalog other than HMS. Although the original patch was submitted more than
> > 7 years ago and many users have wanted it, it has not been merged yet.
> > I revived the ticket and PR so that we can maintain or improve it within
> > the community.
> >
> > - JIRA: https://issues.apache.org/jira/browse/HIVE-12679
> > - PR: https://github.com/apache/hive/pull/
> >
> > I initially created the above PR based on the original design. That's
> > because I think it is reasonable enough and I can see some users have
> > already ported the patch over the past 7 years. But there are also other
> > opinions suggesting other designs. This is a summary for easy catch-up.
> >
> > https://gist.github.com/okumin/30b058b14db1b099ba37ba7dc257fe8e
> >
> > If you are interested in this problem and you have any opinions, please put
> > a comment on the Pull Request.
> >
> > Regards,
> > Okumin
> >


Re: Hive build environment is unstable

2023-10-04 Thread Stamatis Zampetakis
For HIVE-27695, I am in the middle of the investigation since the
problem is reproducible locally.

For other problems, I agree with whatever Laszlo said ;)

Best,
Stamatis

On Wed, Oct 4, 2023 at 10:58 AM László Bodor  wrote:
>
> Hey!
>
> We need to investigate with http://ci.hive.apache.org/job/hive-flaky-check/
> and fix it, there is no other way.
> Recently we created HIVE-27717
> <https://issues.apache.org/jira/browse/HIVE-27717> about improving
> precommit tests, but Zoltan is right that we can improve the flaky
> check itself first to give better logs, so any tickets under HIVE-27717
> <https://issues.apache.org/jira/browse/HIVE-27717> should also take the
> flaky check job into account.
>
> I can see that the pipeline script of flaky check
> <http://ci.hive.apache.org/job/hive-flaky-check/configure> is in Jenkins
> and I cannot see/recall any seed job that generates this test job, so we
> should consider adding this to the hive repo - similarly to the main
> precommit job's Jenkinsfile
> <https://github.com/apache/hive/blob/master/Jenkinsfile> - if possible.
>
> Regards,
> Laszlo Bodor
>
> On Wed, Oct 4, 2023 at 10:53 AM Stamatis Zampetakis wrote:
>
> > Hello,
> >
> > I am looking into this as part of HIVE-27695 [1]. Until we commit the fix,
> > feel free to ignore the tests that are failing with OOM-related errors
> > in TestMiniTezCliDriver.
> >
> > Best,
> > Stamatis
> >
> > [1] https://issues.apache.org/jira/browse/HIVE-27695
> >
> > On Wed, Oct 4, 2023 at 9:44 AM Zoltán Rátkai
> >  wrote:
> > >
> > > Hi Everyone,
> > >
> > > The Hive build environment has been really unstable lately. Many times I get
> > > exceptions like this, or tests just fail randomly, and the next time a different one fails:
> > > *java.lang.OutOfMemoryError: GC overhead limit exceeded*
> > >
> > > see here:
> > >
> > http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-4690/16/tests/
> > >
> > > What can we do to avoid this? It takes a lot of effort to always
> > > retrigger the build and hope that it succeeds the next time.
> > >
> > > Regards,
> > >
> > > Zoltán Rátkai
> >


Re: Hive build environment is unstable

2023-10-04 Thread Stamatis Zampetakis
Hello,

I am looking into this as part of HIVE-27695 [1]. Until we commit the fix,
feel free to ignore the tests that are failing with OOM-related errors
in TestMiniTezCliDriver.

Best,
Stamatis

[1] https://issues.apache.org/jira/browse/HIVE-27695

On Wed, Oct 4, 2023 at 9:44 AM Zoltán Rátkai
 wrote:
>
> Hi Everyone,
>
> The Hive build environment has been really unstable lately. Many times I get
> exceptions like this, or tests just fail randomly, and the next time a different one fails:
> *java.lang.OutOfMemoryError: GC overhead limit exceeded*
>
> see here:
> http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-4690/16/tests/
>
> What can we do to avoid this? It takes a lot of effort to always retrigger
> the build and hope that it succeeds the next time.
>
> Regards,
>
> Zoltán Rátkai


[ANNOUNCE] New committer: Sourabh Badhya

2023-10-03 Thread Stamatis Zampetakis
Apache Hive's Project Management Committee (PMC) has invited Sourabh
Badhya to become a committer, and we are pleased to announce that he
has accepted.

Sourabh has been doing some great work for the project. He has landed
important fixes in critical parts of Hive and made significant
contributions to the stabilization of ACID compactions, Direct Write
functionality, and Iceberg support. Apart from code contributions,
Sourabh has been regularly reviewing others' work and providing
valuable feedback as well as testing and validating releases.

Sourabh, welcome, thank you for your contributions, and we look
forward to your further interactions with the community! If you wish,
please feel free to tell us more about yourself and what you are
working on.

Stamatis (on behalf of the Apache Hive PMC)


Re: CVE reports and process to completion

2023-09-19 Thread Stamatis Zampetakis
Many thanks to Ayush for volunteering! Anyone else?

Note that handling vulnerabilities is of utmost importance to an
Apache project. It is one of the four technical requirements
established by ASF [1]. If there are not enough PMC members to handle
CVEs the project can be taken down.

Best,
Stamatis

[1] https://www.apache.org/dev/project-requirements#technical

On Wed, Sep 13, 2023 at 11:11 AM Ayush Saxena  wrote:
>
> Hi Stamatis,
> Thanx for starting the thread, I can volunteer as well.
>
> -Ayush
>
> On Tue, 12 Sept 2023 at 13:43, Stamatis Zampetakis  wrote:
> >
> > Hey everyone,
> >
> > When someone discovers a potential security vulnerability for Hive (or
> > any other Apache project) they can opt to inform the PMC of the
> > project by following the ASF guidelines [1]. For Hive, the report
> > should be sent to secur...@hive.apache.org.
> >
> > Next, the PMC follows the steps outlined in [2] to process the report
> > and if it is deemed necessary release a fix for the vulnerability.
> >
> > In order to make the CVE process as smooth as possible and ensure that
> > CVE reports are addressed in a timely manner I would like to introduce
> > the notion of a "CVE mentor".
> >
> > The "CVE mentor" is the one responsible for bringing the reported CVE
> > to completion ensuring that the steps in [2] are followed. They are
> > the principal contact person between the reporter of the vulnerability
> > and the PMC and the one who leads the discussions. The triage and fix
> > can be done by the mentor or entrusted to a committer (ensuring of
> > course that everything remains private till a fix is officially
> > released). Given that we need to release a fix very soon after a
> > vulnerability is fixed, the mentor may also need to act as the release
> > manager. Since the reports arrive in the private list, the CVE mentor
> > should be someone who has access to the security list (all PMC members
> > and a few other individuals).
> >
> > However, for the idea to work we need a few people (preferably PMC) to
> > volunteer for the role of the "CVE mentor". Then the volunteers can
> > pick incoming CVE reports in a round robin fashion. Needless to say
> > that since I am the one proposing it, I would like to be part of the
> > list.
> >
> > Any additional thoughts or suggestions on how to improve this process
> > are very welcome. Also, if you like the idea and want to volunteer,
> > please reply to this email to add yourself to the list.
> >
> > Best,
> > Stamatis Zampetakis
> >
> > [1] https://www.apache.org/security/
> > [2] https://www.apache.org/security/committers.html#possible


Re: Alternatives to dependency on hive-exec?

2023-09-19 Thread Stamatis Zampetakis
Hey Chris,

Keep in mind that the core jar was removed some time ago [1, 2] so in
new releases (4.0.0 onwards) it will not be there.
I am not sure what integration you are trying to establish, but it
would definitely be easier if you opt for something lighter like the
JDBC API and the Hive JDBC driver.

Shading in hive-exec is a real pain point [3] but not an easy one to get rid of.

Best,
Stamatis

[1] https://lists.apache.org/thread/yld75ltf9y8d9q3cow3xqlg0fqyj6mkg
[2] https://issues.apache.org/jira/browse/HIVE-25531
[3] https://issues.apache.org/jira/browse/HIVE-26220

On Tue, Sep 19, 2023 at 10:17 AM Christofer Dutz
 wrote:
>
> Hi all,
>
> ok … so it seems StackOverflow’s my friend.
> Seems adding a classifier of “core” to the dependency gets me an unshaded version.
>
> Chris
>
> From: Christofer Dutz
> Date: Tuesday, 19 September 2023 at 09:19
> To: dev@hive.apache.org
> Subject: Alternatives to dependency on hive-exec?
> Hi all,
>
> I’m currently trying to manage all dependencies in the Apache IoTDB project.
> For the Hive integration, there is a dependency on hive-exec.
> Unfortunately, this simply seems to be a big fat jar of all sorts of
> dependencies that we also use separately.
> This results in all sorts of dependencies being available twice, and I would
> love to eliminate this.
>
> Do you have any suggestions on how I could work without the hive-exec
> dependency? I have no problem with adding 10 dependencies instead.
> However, I spotted some shaded classes in org.apache.hive … are these
> available outside the hive-exec dependency?
>
> Chris


Re: Release managers

2023-09-14 Thread Stamatis Zampetakis
For tracking purposes, I don't think we need something very elaborate.
A simple JIRA filter that includes priority (Blocker), target version
(4.0.0), and status (open) should be enough:

project =Hive and priority = Blocker and cf[12310320] = 4.0.0 and status = open
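The same filter can also be queried programmatically. A small Python sketch, assuming the standard JIRA REST API v2 search endpoint (`/rest/api/2/search`) on issues.apache.org; `search_url` and the `maxResults` default are illustrative choices, and the JQL string is copied verbatim from above.

```python
# Build a JIRA REST search URL for the release-blocker filter above.
# Assumes the standard JIRA REST API v2 search endpoint on issues.apache.org.
from urllib.parse import urlencode

JIRA_BASE = "https://issues.apache.org/jira"
BLOCKERS_JQL = ("project =Hive and priority = Blocker and cf[12310320] = 4.0.0 "
                "and status = open")


def search_url(jql, max_results=50):
    """URL whose JSON response lists at most `max_results` matching issues."""
    # urlencode percent-encodes '=', '[' and ']' and turns spaces into '+'.
    query = urlencode({"jql": jql, "maxResults": max_results})
    return f"{JIRA_BASE}/rest/api/2/search?{query}"


if __name__ == "__main__":
    print(search_url(BLOCKERS_JQL))
```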

Best,
Stamatis

On Wed, Sep 13, 2023 at 6:27 PM Naveen Gangam
 wrote:
>
> Thank you guys. How do we plan to track the items for these releases? JIRA
> boards?
>
> On Tue, Sep 12, 2023 at 1:49 AM Ayush Saxena  wrote:
>
> > I can volunteer for one as well. I plan to create a wiki page about
> > release management, specifically one where we can record the
> > release managers, tentative release dates, planned features, and
> > blockers for those releases.
> >
> > I will add a page about basic release validation as well, so that we
> > can have more volunteers to validate the RC during the release time
> > :-)
> >
> > -Ayush
> >
> > On Thu, 23 Mar 2023 at 21:09, Sai Hemanth Gantasala
> >  wrote:
> > >
> > > Hi all,
> > >
> > > I would like to volunteer for the 4.2.0 release.
> > >
> > > Thanks,
> > > Sai.
> > >
> > > On Thu, Mar 23, 2023 at 2:47 PM Denys Kuzmenko 
> > wrote:
> > >
> > > > Hi, I can take the following one: 4.1.0
> > > >
> >


CVE reports and process to completion

2023-09-12 Thread Stamatis Zampetakis
Hey everyone,

When someone discovers a potential security vulnerability for Hive (or
any other Apache project) they can opt to inform the PMC of the
project by following the ASF guidelines [1]. For Hive, the report
should be sent to secur...@hive.apache.org.

Next, the PMC follows the steps outlined in [2] to process the report
and if it is deemed necessary release a fix for the vulnerability.

In order to make the CVE process as smooth as possible and ensure that
CVE reports are addressed in a timely manner I would like to introduce
the notion of a "CVE mentor".

The "CVE mentor" is the one responsible for bringing the reported CVE
to completion ensuring that the steps in [2] are followed. They are
the principal contact person between the reporter of the vulnerability
and the PMC and the one who leads the discussions. The triage and fix
can be done by the mentor or entrusted to a committer (ensuring of
course that everything remains private till a fix is officially
released). Given that we need to release a fix very soon after a
vulnerability is fixed, the mentor may also need to act as the release
manager. Since the reports arrive in the private list, the CVE mentor
should be someone who has access to the security list (all PMC members
and a few other individuals).

However, for the idea to work we need a few people (preferably PMC) to
volunteer for the role of the "CVE mentor". Then the volunteers can
pick incoming CVE reports in a round robin fashion. Needless to say
that since I am the one proposing it, I would like to be part of the
list.
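For illustration only, a minimal Python sketch of the round-robin rotation. The mentor names and report IDs are hypothetical placeholders; a real rotation would also need to remember where it stopped between reports and skip volunteers who are unavailable.

```python
# Round-robin assignment of incoming CVE reports to volunteer mentors.
# Mentor names and report IDs are hypothetical placeholders.
from itertools import cycle


def assign_round_robin(reports, mentors):
    """Pair each report, in arrival order, with the next mentor in rotation."""
    if not mentors:
        raise ValueError("at least one volunteer mentor is required")
    rotation = cycle(mentors)
    return [(report, next(rotation)) for report in reports]


if __name__ == "__main__":
    pairs = assign_round_robin(["report-1", "report-2", "report-3"],
                               ["mentor-a", "mentor-b"])
    for report, mentor in pairs:
        print(f"{report} -> {mentor}")
    # report-1 -> mentor-a
    # report-2 -> mentor-b
    # report-3 -> mentor-a
```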

Any additional thoughts or suggestions on how to improve this process
are very welcome. Also, if you like the idea and want to volunteer,
please reply to this email to add yourself to the list.

Best,
Stamatis Zampetakis

[1] https://www.apache.org/security/
[2] https://www.apache.org/security/committers.html#possible


Re: [DISCUSS] Migrate precommit git repos from kgyrtkirk to apache

2023-09-06 Thread Stamatis Zampetakis
Based on the discussion under LEGAL-653, it seems that the only
requirement to migrate the repos under the Apache namespace is to
apply the AL2 license in the majority of the files in there.

I am looking for volunteers so that we can review the existing code in
those repos and apply the AL2 license where possible. Depending on how
many people step up, we can divide the work accordingly.

It would be interesting to see if we can use RAT [1] to
automate/assist  in this process.
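As a rough first pass before setting up RAT, a Python sketch that flags source files whose first lines lack an Apache License 2.0 notice. It is a crude stand-in, not RAT: the marker string, the extensions, and the 30-line window are assumptions, and RAT [1] remains the proper tool since it also recognizes other licenses and supports exclusions.

```python
# Crude stand-in for a license-header audit (NOT Apache RAT): flag files whose
# first lines lack an Apache License 2.0 notice. The marker, the extensions
# and the 30-line window are assumptions chosen for illustration.
import sys
from pathlib import Path

AL2_MARKER = "Apache License, Version 2.0"
EXTENSIONS = (".java", ".py", ".sh", ".groovy")


def lacks_header(text, head_lines=30):
    """True if the first `head_lines` lines of `text` omit the AL2 marker."""
    head = "\n".join(text.splitlines()[:head_lines])
    return AL2_MARKER not in head


def files_missing_header(root):
    """Sorted relative paths (POSIX style) of candidate files without a header."""
    root = Path(root)
    missing = []
    for path in root.rglob("*"):
        if path.is_file() and path.suffix in EXTENSIONS:
            text = path.read_text(encoding="utf-8", errors="replace")
            if lacks_header(text):
                missing.append(path.relative_to(root).as_posix())
    return sorted(missing)


if __name__ == "__main__":
    for name in files_missing_header(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(name)
```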

Best,
Stamatis

[1] https://creadur.apache.org/rat/apache-rat-plugin/usage.html

On Thu, Aug 24, 2023 at 11:05 AM Stamatis Zampetakis  wrote:
>
> For the licensing question, I just created LEGAL-653 [1].
>
> [1] https://issues.apache.org/jira/browse/LEGAL-653
>
> On Thu, Aug 24, 2023 at 11:55 AM Stamatis Zampetakis  
> wrote:
> >
> > Creating the new repos should be kind of trivial. I think I will be
> > able to do it using https://selfserve.apache.org/.
> >
> > Since this will bring quite a bit of code under the ASF I will wait a
> > couple of days till I create the new repos.
> >
> > Once this is done, I think we can simply push the content from the old
> > repos to the new ones. To avoid any kind of IP problems it would be
> > best if Zoltan does this step.
> >
> > One thing that we may need to be careful about is the licensing of
> > these repos. We are not going to make source releases from there but
> > since the code will be under the ASF namespace people will assume that
> > it is ASF licensed so they may start copy-pasting stuff from there.
> >
> > Is there anything preventing us from putting the code under the AL2 license?
> >
> > Best,
> > Stamatis
> >
> > On Wed, Aug 23, 2023 at 6:14 PM Attila Turoczy
> >  wrote:
> > >
> > > Thank you, Stamatis! Also, Zoltan for the "donation" :)
> > >
> > > -Attila
> > >
> > > On Wed, Aug 23, 2023 at 4:53 PM Ayush Saxena  wrote:
> > >
> > > > +1,
> > > > Thanx Stamatis for initiating this. This was something which had been on my
> > > > mind for a long time as well, but I couldn’t find the time.
> > > >
> > > > -Ayush
> > > >
> > > > > On 23-Aug-2023, at 6:19 PM, Zoltan Haindrich  wrote:
> > > > >
> > > > > Hey Stamatis!
> > > > >
> > > > > I'm happy to donate these repos / help with the migration!
> > > > > I should have done it earlier - but it was never top priority... thank
> > > > > you for initiating it!
> > > > >
> > > > > cheers,
> > > > > Zoltan
> > > > >
> > > > >> On 8/23/23 14:00, Stamatis Zampetakis wrote:
> > > > >> Hi all,
> > > > >> Our precommit infrastructure uses code that resides in the following
> > > > >> repos.
> > > > >> * https://github.com/kgyrtkirk/hive-test-kube
> > > > >> * https://github.com/kgyrtkirk/hive-toolbox
> > > > >> * https://github.com/kgyrtkirk/hive-dev-box
> > > > >> These are mainly maintained by Zoltán Haindrich who is always helpful
> > > > >> and kind to investigate and resolve issues.
> > > > >> For facilitating contributions from the apache community and also
> > > > >> removing some burden from Zoltan's shoulders it may be a good time to
> > > > >> migrate those and put them under the apache namespace.
> > > > >> For the initial migration, we could have a straightforward 1 to 1
> > > > >> mapping as shown below:
> > > > >> * https://github.com/apache/hive-test-kube
> > > > >> * https://github.com/apache/hive-toolbox
> > > > >> * https://github.com/apache/hive-dev-box
> > > > >> How do you feel about this?
> > > > >> Best,
> > > > >> Stamatis
> > > >


Re: Include ARM binaries with next release

2023-08-25 Thread Stamatis Zampetakis
Hey Ayush,

I just wanted to highlight that the vote applies to all released
artifacts, not only the source packages. The source package is of
course the primary and most important deliverable but the PMC is
responsible for everything under downloads.apache.org and similar
places. Any additional binaries will need to be verified by the PMC to
ensure that there are no violations of the ASF policy.

While I was preparing the 4.0.0-beta-1 release, it took me quite a
bit of time to ensure that our convenience binaries comply with the
ASF guidelines, and I am not yet 100% sure that I covered everything. I
would be more eager to drop the existing convenience binaries than
to introduce more.

The additional binaries would also put extra strain on the ASF servers.

I see the benefits of ARM binaries, but I would prefer to keep
releases simple and let those who are interested build them
themselves. We can do what we can to facilitate the build process
of such binaries, but not necessarily deliver and host them ourselves.

I am around -0 on this. I am not gonna vote against the idea, but
I am not supporting it either.

Best,
Stamatis


On Fri, Aug 25, 2023 at 4:48 PM Attila Turoczy
 wrote:
>
> Love it! In 2023, ARM has become an industry standard. Also, ARM performs
> very well, plus cloud ARM VMs are so much cheaper.
>
> -Attila
>
> On Fri, Aug 25, 2023 at 12:48 Ayush Saxena wrote:
>
> > Hi All,
> > Considering that we now support building Hive on both x86 & ARM, can we
> > explore having additional binaries built for the ARM architecture?
> >
> > A lot of projects release both x86 & ARM binaries, for example Hadoop
> > [1] (check the Binary Download column in the 3.3.6 row).
> >
> > As for the process, the release vote is on the source code, which
> > stays the same for both x86 & ARM. It is just an additional
> > convenience binary built, signed & released. We can consider making
> > this step optional as well.
> >
> > Let me know what people think!!!
> >
> > -Ayush
> >
> > [1] https://hadoop.apache.org/releases.html
> >


Re: [DISCUSS] Migrate precommit git repos from kgyrtkirk to apache

2023-08-24 Thread Stamatis Zampetakis
For the licensing question, I just created LEGAL-653 [1].

[1] https://issues.apache.org/jira/browse/LEGAL-653

On Thu, Aug 24, 2023 at 11:55 AM Stamatis Zampetakis  wrote:
>
> Creating the new repos should be kind of trivial. I think I will be
> able to do it using https://selfserve.apache.org/.
>
> Since this will bring quite a bit of code under the ASF I will wait a
> couple of days till I create the new repos.
>
> Once this is done, I think we can simply push the content from the old
> repos to the new ones. To avoid any kind of IP problems it would be
> best if Zoltan does this step.
>
> One thing that we may need to be careful about is the licensing of
> these repos. We are not going to make source releases from there but
> since the code will be under the ASF namespace people will assume that
> it is ASF licensed so they may start copy-pasting stuff from there.
>
> Is there anything preventing us from putting the code under the AL2 license?
>
> Best,
> Stamatis
>
> On Wed, Aug 23, 2023 at 6:14 PM Attila Turoczy
>  wrote:
> >
> > Thank you, Stamatis! Also, Zoltan for the "donation" :)
> >
> > -Attila
> >
> > On Wed, Aug 23, 2023 at 4:53 PM Ayush Saxena  wrote:
> >
> > > +1,
> > > Thanx Stamatis for initiating this. This was something which had been on my
> > > mind for a long time as well, but I couldn’t find the time.
> > >
> > > -Ayush
> > >
> > > > On 23-Aug-2023, at 6:19 PM, Zoltan Haindrich  wrote:
> > > >
> > > > Hey Stamatis!
> > > >
> > > > I'm happy to donate these repos / help with the migration!
> > > > I should have done it earlier - but it was never top priority... thank
> > > > you for initiating it!
> > > >
> > > > cheers,
> > > > Zoltan
> > > >
> > > >> On 8/23/23 14:00, Stamatis Zampetakis wrote:
> > > >> Hi all,
> > > >> Our precommit infrastructure uses code that resides in the following
> > > >> repos.
> > > >> * https://github.com/kgyrtkirk/hive-test-kube
> > > >> * https://github.com/kgyrtkirk/hive-toolbox
> > > >> * https://github.com/kgyrtkirk/hive-dev-box
> > > >> These are mainly maintained by Zoltán Haindrich who is always helpful
> > > >> and kind to investigate and resolve issues.
> > > >> For facilitating contributions from the apache community and also
> > > >> removing some burden from Zoltan's shoulders it may be a good time to
> > > >> migrate those and put them under the apache namespace.
> > > >> For the initial migration, we could have a straightforward 1 to 1
> > > >> mapping as shown below:
> > > >> * https://github.com/apache/hive-test-kube
> > > >> * https://github.com/apache/hive-toolbox
> > > >> * https://github.com/apache/hive-dev-box
> > > >> How do you feel about this?
> > > >> Best,
> > > >> Stamatis
> > >


Re: [DISCUSS] Migrate precommit git repos from kgyrtkirk to apache

2023-08-24 Thread Stamatis Zampetakis
Creating the new repos should be kind of trivial. I think I will be
able to do it using https://selfserve.apache.org/.

Since this will bring quite a bit of code under the ASF I will wait a
couple of days till I create the new repos.

Once this is done, I think we can simply push the content from the old
repos to the new ones. To avoid any kind of IP problems it would be
best if Zoltan does this step.

One thing that we may need to be careful about is the licensing of
these repos. We are not going to make source releases from there but
since the code will be under the ASF namespace people will assume that
it is ASF licensed so they may start copy-pasting stuff from there.

Is there anything preventing us from putting the code under the AL2 license?

Best,
Stamatis

On Wed, Aug 23, 2023 at 6:14 PM Attila Turoczy
 wrote:
>
> Thank you, Stamatis! Also, Zoltan for the "donation" :)
>
> -Attila
>
> On Wed, Aug 23, 2023 at 4:53 PM Ayush Saxena  wrote:
>
> > +1,
> > Thanx Stamatis for initiating this. This was something which had been on my
> > mind for a long time as well, but I couldn’t find the time.
> >
> > -Ayush
> >
> > > On 23-Aug-2023, at 6:19 PM, Zoltan Haindrich  wrote:
> > >
> > > Hey Stamatis!
> > >
> > > I'm happy to donate these repos / help with the migration!
> > > I should have done it earlier - but it was never top priority... thank
> > > you for initiating it!
> > >
> > > cheers,
> > > Zoltan
> > >
> > >> On 8/23/23 14:00, Stamatis Zampetakis wrote:
> > >> Hi all,
> > >> Our precommit infrastructure uses code that resides in the following
> > >> repos.
> > >> * https://github.com/kgyrtkirk/hive-test-kube
> > >> * https://github.com/kgyrtkirk/hive-toolbox
> > >> * https://github.com/kgyrtkirk/hive-dev-box
> > >> These are mainly maintained by Zoltán Haindrich who is always helpful
> > >> and kind to investigate and resolve issues.
> > >> For facilitating contributions from the apache community and also
> > >> removing some burden from Zoltan's shoulders it may be a good time to
> > >> migrate those and put them under the apache namespace.
> > >> For the initial migration, we could have a straightforward 1 to 1
> > >> mapping as shown below:
> > >> * https://github.com/apache/hive-test-kube
> > >> * https://github.com/apache/hive-toolbox
> > >> * https://github.com/apache/hive-dev-box
> > >> How do you feel about this?
> > >> Best,
> > >> Stamatis
> >


[DISCUSS] Migrate precommit git repos from kgyrtkirk to apache

2023-08-23 Thread Stamatis Zampetakis
Hi all,

Our precommit infrastructure uses code that resides in the following repos.

* https://github.com/kgyrtkirk/hive-test-kube
* https://github.com/kgyrtkirk/hive-toolbox
* https://github.com/kgyrtkirk/hive-dev-box

These are mainly maintained by Zoltán Haindrich, who is always helpful
and kind enough to investigate and resolve issues.

To facilitate contributions from the Apache community and also to take
some burden off Zoltan's shoulders, it may be a good time to
migrate those and put them under the apache namespace.

For the initial migration, we could have a straightforward 1 to 1
mapping as shown below:

* https://github.com/apache/hive-test-kube
* https://github.com/apache/hive-toolbox
* https://github.com/apache/hive-dev-box

How do you feel about this?

Best,
Stamatis


Re: MiniHS2 and postgresql jars

2023-08-23 Thread Stamatis Zampetakis
Hello,

For those interested in the full background behind the changes that
partially broke StartMiniHS2Cluster, please have a look at [1, 2, 3].
If something is not clear or we need to revisit those decisions I am
happy to discuss further.

Summarizing in one sentence, the bin.tar.gz and src.tar.gz release
artifacts should never contain jars or code whose licenses are in
Category-X [4]. If we merely want to use or rely on such code (without
bundling it), things are a bit more flexible, as described in [4].
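One way to catch such problems before a vote is to list the jars bundled in a release tarball and flag any whose file name matches a deny-list. The Python below is a minimal sketch, not part of Hive's build; the deny-list and the prefix used in the example are hypothetical and would have to come from a manual licensing review, since a jar's file name does not reliably identify its license.

```python
import tarfile
from typing import Iterable, List


def find_flagged_jars(tarball_path: str, denied_prefixes: Iterable[str]) -> List[str]:
    """Return the paths of .jar entries whose file name starts with a denied prefix.

    The deny-list is supplied by the caller (hypothetical here); a jar's file
    name alone does not prove its license, so the list must come from a
    manual licensing review.
    """
    prefixes = tuple(denied_prefixes)
    flagged = []
    # "r:*" lets tarfile detect the compression (e.g. .tar.gz) automatically.
    with tarfile.open(tarball_path, "r:*") as tar:
        for member in tar.getmembers():
            if not member.isfile() or not member.name.endswith(".jar"):
                continue
            file_name = member.name.rsplit("/", 1)[-1]
            # str.startswith accepts a tuple; an empty tuple never matches.
            if file_name.startswith(prefixes):
                flagged.append(member.name)
    return flagged
```

For example, find_flagged_jars("apache-hive-4.0.0-beta-1-bin.tar.gz", ["example-gpl-driver"]) would list any bundled jar whose name starts with that made-up prefix. An empty result only means nothing on the list was found, not that the tarball is compliant.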

Best,
Stamatis

[1] https://lists.apache.org/thread/xd25nhox103t2zj52lnzbjkm6d41ls94
[2] https://issues.apache.org/jira/browse/HIVE-25701
[3] https://issues.apache.org/jira/browse/HIVE-27338
[4] https://www.apache.org/legal/resolved.html#category-x

On Wed, Aug 23, 2023 at 11:44 AM Zsolt Miskolczi
 wrote:
>
> As I understand it, they have been removed due to licensing issues. Before we
> re-add the jar or add a way to include jars in the classpath, I
> want to understand the reason. What was the exact licensing issue?
>
> Do we have to make sure that we don't provide those jars in production? Or
> is it necessary for tests as well?
>
> For me, the solution would be something that I, as a developer, have to set
> up only once. Based on my current knowledge, the ability to
> add external jars to the classpath seems more convenient if we cannot
> ship the jars in production.
> The reason is that we need the JDBC driver in two places:
> - for MiniHS2
> - and in the conf directory, next to bin (to be able to easily use it for
> schema tool, beeline and standalone metastore service).
>
>
> On the other hand, I'm not completely sure that MiniHS2 as a test is the most
> convenient approach. In the long term, I would prefer to replace it with a run
> script/configuration that runs Hive Server in a 'Mini' mode, for local
> development purposes.
>
> Thanks,
> Zsolt
>
>
> Stamatis Zampetakis wrote on Tue, Aug 22, 2023 at 18:06:
>
> > I am not against restoring itest.jdbc.jars property but for this case
> > I prefer the explicit declaration of the dependency.
> >
> > Adding an optional or test scope dependency is much simpler and works
> > out of the box. We don't need to download jars manually and we don't
> > need to remember how the system property is called in order to run the
> > test.
> >
> > Anyways, we all agree that we want StartMiniHS2Cluster to run on
> > different metastore backends so we can create a JIRA/PR and move this
> > forward. How we are going to do it is an implementation detail, so we can
> > continue the discussion under the respective ticket.
> >
> > Best,
> > Stamatis
> >
> > On Tue, Aug 22, 2023 at 6:46 PM László Bodor 
> > wrote:
> > >
> > > Yeah, I think we should simply re-add the possibility to add jars to
> > > classpath, call it "itest.jdbc.jars" to preserve the old behavior and
> > > parameter name.
> > >
> > > Denys Kuzmenko wrote on Tue, Aug 22, 2023 at 12:43:
> > >
> > > > Instead of adding the dependencies, can't we add the possibility to
> > > > include jdbc jars in the classpath?
> > > > something like this:
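> > > > <plugin>
> > > >   <groupId>org.apache.maven.plugins</groupId>
> > > >   <artifactId>maven-failsafe-plugin</artifactId>
> > > >   <executions>
> > > >     <execution>
> > > >       <goals>
> > > >         <goal>integration-test</goal>
> > > >         <goal>verify</goal>
> > > >       </goals>
> > > >     </execution>
> > > >   </executions>
> > > >   <configuration>
> > > >     <additionalClasspathElements>
> > > >       <additionalClasspathElement>${itest.jdbc.jars}</additionalClasspathElement>
> > > >     </additionalClasspathElements>
> > > >     ...
> > > >   </configuration>
> > > > </plugin>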
> > > >
> >


Re: MiniHS2 and postgresql jars

2023-08-22 Thread Stamatis Zampetakis
I am not against restoring itest.jdbc.jars property but for this case
I prefer the explicit declaration of the dependency.

Adding an optional or test scope dependency is much simpler and works
out of the box. We don't need to download jars manually and we don't
need to remember how the system property is called in order to run the
test.

Anyways, we all agree that we want StartMiniHS2Cluster to run on
different metastore backends so we can create a JIRA/PR and move this
forward. How we are going to do it is an implementation detail, so we can
continue the discussion under the respective ticket.

Best,
Stamatis

On Tue, Aug 22, 2023 at 6:46 PM László Bodor  wrote:
>
> Yeah, I think we should simply re-add the possibility to add jars to
> classpath, call it "itest.jdbc.jars" to preserve the old behavior and
> parameter name.
>
> Denys Kuzmenko wrote on Tue, Aug 22, 2023 at 12:43:
>
> > Instead of adding the dependencies, can't we add the possibility to
> > include jdbc jars in the classpath?
> > something like this:
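> > <plugin>
> >   <groupId>org.apache.maven.plugins</groupId>
> >   <artifactId>maven-failsafe-plugin</artifactId>
> >   <executions>
> >     <execution>
> >       <goals>
> >         <goal>integration-test</goal>
> >         <goal>verify</goal>
> >       </goals>
> >     </execution>
> >   </executions>
> >   <configuration>
> >     <additionalClasspathElements>
> >       <additionalClasspathElement>${itest.jdbc.jars}</additionalClasspathElement>
> >     </additionalClasspathElements>
> >     ...
> >   </configuration>
> > </plugin>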
> >


Re: Liquibase introduction for HMS schema

2023-08-22 Thread Stamatis Zampetakis
Hey Laszlo,

Thanks a lot for pushing this forward. I really love the diff in the
PR that shows 120K fewer lines of code than before :)

Apache software is not very different from enterprise software, so the
same considerations apply to releases as well.

Cutting a new release/branch would mean that we have to support and
maintain it as we do for other release lines. It would also cost us
infra resources, since we would have to enable some kind of CI if we
want to accept PRs and keep things reasonably stable.

If there is a team of people that are willing to drive this effort
then nothing prevents us from having such a release. I would suggest
finding at least 2 PMC members who are willing to drive this effort
(prepare releases, drive CVE discussions, etc.) and a few committers
for reviewing and merging PRs. If there are enough volunteers then I
am fine with moving forward with such a plan.

Personally, I would prefer to see this land in the main master
branch/release rather than dividing our efforts into maintaining
separate branches. Upgrading to the latest release of any software is a
challenge in itself, so having multiple "latest" releases would make the
challenge even greater and might hurt adoption a bit.

Best,
Stamatis

On Fri, Aug 18, 2023 at 12:09 PM Laszlo Vegh  wrote:
>
>
> Laszlo Vegh
> lv...@cloudera.com
>
>
> Hi all,
>
> I have a PR (https://github.com/apache/hive/pull/4060) which replaces the 
> current custom HMS schema evolution tool with Liquibase. The PR contains two 
> documents about the possible benefits, and about how to contribute new schema 
> changes in the new ecosystem.
>
> Unfortunately it’s too big, and also contains key infrastructure-related
> changes, so it’s not the best idea to merge it directly to master. I’m
> thinking about creating a separate branch, or a dedicated release. It would be
> based on master and kept in sync with it, but it would also contain the
> Liquibase introduction PR. With this approach it would be easier to test it,
> play with it, and merge it back to master only when it is considered stable.
>
> I’m not sure what the “Apache way” of doing this is. Is there any
> recommendation or established approach for creating such versions/releases?
>
> Best regards,
> Laszlo Vegh


Re: MiniHS2 and postgresql jars

2023-08-22 Thread Stamatis Zampetakis
If people are using it then it makes sense to keep it functional.

Feel free to open a PR and add the missing dependency(ies) in the
appropriate place; just ensure that they are not widely propagated if
not necessary.

Best,
Stamatis

On Mon, Aug 21, 2023 at 12:04 PM Zoltán Rátkai
 wrote:
>
> Hi Team,
>
> I agree with Laszlo Bodor; it is a good tool for developing Hive. It worked
> properly before this PR.
> My question is: how do you provide the missing database jar (e.g. the Postgres
> driver) for StartMiniHS2Cluster, which is run by Maven? If this change means
> we have to modify the pom file, then it is no different from just always
> reverting the PR mentioned earlier, and there is a risk that the modification
> gets committed together with the actual work a developer does. I think
> modifying a pom or Java file just to be able to run it is the wrong approach.
> We must be able to run it without always modifying something.
>
> Regards,
>
> Zoltan Ratkai
>
> On Mon, Aug 21, 2023 at 10:44 AM László Bodor 
> wrote:
>
> > Hey!
> >
> > I tend to return to StartMiniHS2Cluster from time to time. It's very good
> > for the "change code - compile - run - debug - repeat" way of doing things.
> > From this point of view, docker image is not an alternative to that. Also,
> > StartMiniHS2Cluster just works, always, moreover, it uses the same
> > minicluster architecture as our qtests (I mean the way it achieves a mini
> > mr/tez/llap/whatever cluster from the hadoop shim). Qtests are also not an
> > alternative: I often need the ability to run queries in random order or in an
> > ad hoc way without editing a qfile.
> >
> > I feel that HIVE-27338 <https://issues.apache.org/jira/browse/HIVE-27338>
> > was meant to solve licensing problems, and it achieved that, but maybe went too
> > far: I believe we should be able to provide jars to the test classpath,
> > e.g. jars that were downloaded beforehand, even manually.
> >
> > Regards,
> > Laszlo Bodor
> >
> >
> > Stamatis Zampetakis wrote on Fri, Aug 18, 2023 at 13:08:
> >
> > > Hey Zsolt,
> > >
> > > I would divide this discussion into three topics:
> > >
> > > 1. What are the benefits of using the StartMiniHS2Cluster?
> > > 2. What other alternatives are there for testing HS2 with different
> > > metastore DBMS?
> > > 3. How can we make StartMiniHS2Cluster work as before?
> > >
> > > Regarding point 1, I don't have an answer because I have never used
> > > StartMiniHS2Cluster myself. Obviously, other people here are using it, so I
> > > would be curious to know in which cases it is useful.
> > >
> > > There are various alternatives for testing HS2 with different metastore
> > > DBMS.
> > >
> > > The first and in my opinion the easiest to use is the classic qtest
> > > framework with the various CLI drivers. Basically, with the work that
> > > Laszlo started in HIVE-21954 it is pretty easy to run any kind of test over
> > > any metastore just by setting the respective system property.
> > >
> > > mvn test -Dtest=TestMiniLlapLocalCliDriver -Dqfile=partition_params_postgres.q -Dtest.metastore.db=postgres
> > >
> > > The second is to use our brand new and shiny docker images contributed by
> > > Zhihua in HIVE-26400 and get the real feel of HS2 and HMS in a prod-like
> > > setup. I haven't played much with this in apache/master but I did use some
> > > similar images in our internal forks and it's pretty easy to get it up and
> > > running.
> > >
> > > The third is the old and classic hive-dev-box [1] started by Zoltan. With
> > > two/three commands you have a Hive-cluster-like environment and of course
> > > you can choose which DBMS you want for the metastore.
> > >
> > > Regarding point 3, I assume that it is pretty easy to fix by adding the
> > > postgresql (or other JDBC driver) dependency inside the hive-it-unit
> > > module.
> > >
> > > <dependency>
> > >   <groupId>org.postgresql</groupId>
> > >   <artifactId>postgresql</artifactId>
> > >   <optional>true</optional>
> > > </dependency>
> > >
> > > Given that we have other alternatives do we really need to go into this
> > > direction? In fact, do we really need the StartMiniHS2Cluster class?
> > >
> > > Best,
> > > Stamatis
> > >
> > > [1] https://github.com/kgyrtkirk/hive-dev-box
> > >
> > > On Tue, Aug 15, 2023, 6:08 PM Zsolt Miskolczi
> > > wrote:
> > >

Re: Admin privileges on http://ci.hive.apache.org/

2023-08-18 Thread Stamatis Zampetakis
Many thanks Zoltan!

I checked and now I have permissions to manage Jenkins. As long as we
have a few people with admin privileges who are active in the community,
we should be fine.

Best,
Stamatis

On Wed, Aug 16, 2023 at 11:49 AM Zoltan Haindrich  wrote:
>
> Hey,
>
> I think ideally all team members 
> (https://github.com/orgs/apache/teams/hive-committers) should be admins; but 
> I was facing some issue when setting it up initially - or made
> some mistakes...don't remember.
> but there are quite a few people with admin rights who can add you as well 
> (I've already added you).
>
> cheers,
> Zoltan
>
>
> On 8/15/23 17:18, Stamatis Zampetakis wrote:
> > Hey all,
> >
> > I was wondering who has admin privileges for http://ci.hive.apache.org/ ?
> >
> > I would like to check and potentially upgrade some plugins that are
> > currently installed. Is it possible to get permissions to manage the
> > Jenkins instance?
> >
> > As a side note we may want to document the current administrators and the
> > process to get permissions when necessary.
> >
> >
> > Best,
> > Stamatis
> >


Re: MiniHS2 and postgresql jars

2023-08-18 Thread Stamatis Zampetakis
Hey Zsolt,

I would divide this discussion into three topics:

1. What are the benefits of using the StartMiniHS2Cluster?
2. What other alternatives are there for testing HS2 with different
metastore DBMS?
3. How can we make StartMiniHS2Cluster work as before?

Regarding point 1, I don't have an answer because I have never used
StartMiniHS2Cluster myself. Obviously, other people here are using it, so I
would be curious to know in which cases it is useful.

There are various alternatives for testing HS2 with different metastore
DBMS.

The first and in my opinion the easiest to use is the classic qtest
framework with the various CLI drivers. Basically, with the work that
Laszlo started in HIVE-21954 it is pretty easy to run any kind of test over
any metastore just by setting the respective system property.

mvn test -Dtest=TestMiniLlapLocalCliDriver -Dqfile=partition_params_postgres.q -Dtest.metastore.db=postgres

The second is to use our brand new and shiny docker images contributed by
Zhihua in HIVE-26400 and get the real feel of HS2 and HMS in a prod-like
setup. I haven't played much with this in apache/master but I did use some
similar images in our internal forks and it's pretty easy to get it up and
running.

The third is the old and classic hive-dev-box [1] started by Zoltan. With
two/three commands you have a Hive-cluster-like environment and of course
you can choose which DBMS you want for the metastore.

Regarding point 3, I assume that it is pretty easy to fix by adding the
postgresql (or other JDBC driver) dependency inside the hive-it-unit module.

<dependency>
  <groupId>org.postgresql</groupId>
  <artifactId>postgresql</artifactId>
  <optional>true</optional>
</dependency>

Given that we have other alternatives do we really need to go into this
direction? In fact, do we really need the StartMiniHS2Cluster class?

Best,
Stamatis

[1] https://github.com/kgyrtkirk/hive-dev-box

On Tue, Aug 15, 2023, 6:08 PM Zsolt Miskolczi 
wrote:

> Hey there! Do you know how to use MiniHS2 with an embedded
> metastore service but PostgreSQL as the metastore database?
>
> I'm pretty sure it broke with this change:
> https://github.com/apache/hive/pull/4317, which changed how drivers are handled.
>
> We don't bundle PostgreSQL drivers with the Hive build. When I copy
> the PostgreSQL driver jar into the
>
> packaging/target/apache-hive-4.0.0-beta-1-SNAPSHOT-bin/apache-hive-4.0.0-beta-1-SNAPSHOT-bin/lib
> directory, I can run the schematool because it sees the driver.
> The real issue is with miniHS2Cluster: it is basically a Maven test and I
> have no idea how to pass the driver to the test. If it were a
> simple java command, I would try to pass the -cp argument, but I see no
> other way to add an extra jar file to the test execution. I want to run MiniHS2
> with this command:
>
> mvn test -Dtest=StartMiniHS2Cluster -DminiHS2.run=true \
> -DminiHS2.usePortsFromConf=true -DminiHS2.clusterType=LLAP \
> -DminiHS2.conf="../../data/conf/llap/hive-site.xml" \
> -Dpackaging.minimizeJar=false -DskipShade -Dremoteresources.skip=true \
> -Dmaven.javadoc.skip=true -Denforcer.skip=true
>
> and I get this error (it seems the metastore client cannot find the proper driver):
>
> java.lang.RuntimeException: Error applying authorization policy on hive
> configuration: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to
> instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
> at org.apache.hive.service.cli.CLIService.init(CLIService.java:122)
> at org.apache.hive.service.CompositeService.init(CompositeService.java:59)
> at org.apache.hive.service.server.HiveServer2.init(HiveServer2.java:243)
> at org.apache.hive.jdbc.miniHS2.MiniHS2.start(MiniHS2.java:394)
> at org.apache.hive.jdbc.miniHS2.StartMiniHS2Cluster.testRunCluster(StartMiniHS2Cluster.java:78)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
> at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
> at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
> at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
> at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
> at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
> at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
> at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
> at 

Admin privileges on http://ci.hive.apache.org/

2023-08-15 Thread Stamatis Zampetakis
Hey all,

I was wondering who has admin privileges for http://ci.hive.apache.org/ ?

I would like to check and potentially upgrade some plugins that are
currently installed. Is it possible to get permissions to manage the
Jenkins instance?

As a side note we may want to document the current administrators and the
process to get permissions when necessary.


Best,
Stamatis


[ANNOUNCE] Apache Hive 4.0.0-beta-1 Released

2023-08-15 Thread Stamatis Zampetakis
The Apache Hive team is proud to announce the release of Apache Hive
version 4.0.0-beta-1.

The Apache Hive (TM) data warehouse software facilitates querying and
managing large datasets residing in distributed storage. Built on top
of Apache Hadoop (TM), it provides, among others:

* Tools to enable easy data extract/transform/load (ETL)

* A mechanism to impose structure on a variety of data formats

* Access to files stored either directly in Apache HDFS (TM) or in other
  data storage systems such as Apache HBase (TM)

* Query execution via Apache Tez Frameworks.

For Hive release details and downloads, please visit:
https://hive.apache.org/downloads.html

Hive 4.0.0-beta-1 Release Notes are available here:
https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12353351=Text=12310843

We would like to thank the many contributors who made this release
possible.

Regards,

The Apache Hive Team


Re: Pull requests - Only committers can be filtered

2023-08-14 Thread Stamatis Zampetakis
Hey Zsolt,

Filtering works [1] but possibly the autofill feature works somewhat
differently. It may be necessary to establish a list of collaborators [2]
to achieve what you want. I don't know if there is another, more transparent
way to do this. Probably the best way to move this forward is to file an INFRA
ticket.

Best,
Stamatis

[1] https://github.com/apache/hive/pulls/rkirtir
[2] https://lists.apache.org/thread/93lb6jhkjkmb9op9629xt6c6olwym28c

On Mon, Aug 14, 2023 at 2:44 PM Zsolt Miskolczi 
wrote:

> On GitHub at the pull requests, it is possible to filter the PRs based on
> the Author.
>
> I just noticed that only committers and the current GitHub user are in the
> list, so it is not possible to filter for pull requests made by newcomers.
>
> Is it possible to add all people with opened PR to the 'Filter by author'
> field?
>
> As an example, we have a colleague, Kirti Ruge. She has an open PR,
> https://github.com/apache/hive/pull/4497.
> And I cannot filter on her GitHub account, rkirtir.
>
> Thank you,
> Zsolt
>


[RESULT][VOTE] Release Apache Hive 4.0.0-beta-1 (Release Candidate 0)

2023-08-10 Thread Stamatis Zampetakis
Thanks to everyone who has tested the release candidate and given
their comments and votes.

The tally is as follows.

3 binding +1s:
Stamatis Zampetakis
Denys Kuzmenko
Ayush Saxena

3 non-binding +1s:
Zhihua Deng
Sourabh Badhya
Simhadri Govindappa

No 0s or -1s.
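The passing rule stated in the vote thread (a majority of at least three +1 PMC votes) matches the ASF majority-approval rule for releases: at least three binding +1 votes and more binding +1 than binding -1 votes. A minimal Python sketch of that check, for illustration only (it is not part of any Hive or ASF tooling):

```python
def release_vote_passes(binding_plus_ones: int, binding_minus_ones: int) -> bool:
    """Majority approval for an ASF release vote: at least three binding +1
    votes and more binding +1 than binding -1 votes. Non-binding votes and
    0 votes do not affect the outcome."""
    return binding_plus_ones >= 3 and binding_plus_ones > binding_minus_ones


# The tally above: 3 binding +1s, no binding -1s.
print(release_vote_passes(3, 0))  # True
```

Because release votes cannot be vetoed, a single binding -1 does not block the release on its own, as long as both conditions still hold.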

Therefore, I am delighted to announce that the proposal to release
Apache Hive 4.0.0-beta-1 has passed.

I will proceed with the next steps of the release and I will send an
announcement once the release becomes publicly available (sometime
next week).

Best,
Stamatis


Re: Possibility of supporting TIME or other standard types

2023-08-09 Thread Stamatis Zampetakis
Hello Okumin,

As you mentioned the TIME datatype is part of the SQL standard and it
is also supported by many popular DBMS so it definitely makes sense to
add it to Hive.
I guess it was not implemented earlier because users were able to store
times using other existing types, so it never became a must-have.

+1 on adding TIME data type support in Hive.

Best,
Stamatis

On Fri, Aug 4, 2023 at 12:19 PM Okumin  wrote:
>
> Hi everyone,
>
> I happened to find some people struggling to store values of the TIME type
> in a Hive table from another query engine. A viable option is to use
> Iceberg or another format instead of Hive native tables since Hive doesn't
> directly support the type. I agree that it could be one of the right
> options in this era.
>
> On the other hand, I also think it is a valid request to support TIME since
> it is one of the types defined in the SQL standard. I expect it would also
> benefit Hive users if they could process Iceberg's TIME as Hive's
> TIME, not as STRING or other alternatives. I'd like to hear whether there are
> any reasons why it could not be supported easily.
>
> I see there is a related ticket, but it looks like TIME has not been
> implemented yet.
> https://issues.apache.org/jira/browse/HIVE-1269
>
> Regards,
> Okumin


[VOTE] Release Apache Hive 4.0.0-beta-1 (Release Candidate 0)

2023-08-07 Thread Stamatis Zampetakis
Hi all,

I have created a build for Apache Hive 4.0.0-beta-1 Release Candidate 0.

Thanks to everyone who has contributed to this release.

You can read the release notes here:
https://github.com/apache/hive/blob/branch-4.0.0-beta-1/RELEASE_NOTES.txt

The commit to be voted upon:
https://github.com/apache/hive/commit/d2310944e412b577a39687c7968b2e93eede8433

Its hash is
d2310944e412b577a39687c7968b2e93eede8433

Tag:
https://github.com/apache/hive/tree/release-4.0.0-beta-1-rc0

The artifacts to be voted on are located here:
https://people.apache.org/~zabetak/apache-hive-4.0.0-beta-1-rc0/

The hashes of the artifacts are as follows:
- 4114d8e9a523562c77237a8751dec9ed1bcbf6ccbe2e178d72f356ca4e65d466
apache-hive-4.0.0-beta-1-bin.tar.gz
- 8d157f4dcb9af5e48e51206a4046d1c11414fbc39583c84be31d609606136209
apache-hive-4.0.0-beta-1-src.tar.gz
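A minimal sketch for checking a downloaded artifact against the hashes above, assuming they are SHA-256 digests (each is 64 hex characters) and that the tarball was downloaded to the current directory. A matching hash only checks integrity; authenticity still requires verifying the signature against the KEYS file.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 digest of a file, read in chunks so large tarballs are not loaded at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, expected_hex: str) -> bool:
    """Compare against a published digest, ignoring case and surrounding whitespace."""
    return sha256_of(path) == expected_hex.strip().lower()


# Example (assumes the bin tarball is in the current directory):
# matches_published_hash(
#     "apache-hive-4.0.0-beta-1-bin.tar.gz",
#     "4114d8e9a523562c77237a8751dec9ed1bcbf6ccbe2e178d72f356ca4e65d466",
# )
```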

A staged Maven repository is available for review at:
https://repository.apache.org/content/repositories/orgapachehive-1119

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/zabetak.asc
https://downloads.apache.org/hive/KEYS

Please vote on releasing this package as Apache Hive 4.0.0-beta-1.

The vote is open for the next 72 hours and passes if a majority of at
least three +1 PMC votes are cast.

[ ] +1 Release this package as Apache Hive 4.0.0-beta-1
[ ]  0 I don't feel strongly about it, but I'm okay with the release
[ ] -1 Do not release this package because...

Here is my vote:
+1 (binding)

Best,
Stamatis


Re: remove list bucking from hive 4.0

2023-08-03 Thread Stamatis Zampetakis
Hello,

As usual, I would advise against a sudden drop of functionality,
especially SQL syntax. It would be better if we could disable/issue a
warning in one release and drop in the next to give users at least
some time to digest the changes.

How about the following:
In 4.0, disable the functionality by default and throw an error when
used but still give the user the means to enable it back via a
property.
In 4.1, everything is gone; syntax + property + whatever else we want.

Best,
Stamatis

On Wed, Aug 2, 2023 at 3:35 PM László Bodor  wrote:
>
> LOL, this title, I meant *list bucketing*, oh god, this is history now,
> mail archives don't forget
>
> https://cwiki.apache.org/confluence/display/Hive/ListBucketing
>
>
> László Bodor wrote on Wed, Aug 2, 2023 at 11:01:
>
> > Hey Hive devs!
> >
> > What about removing list bucketing from hive?
> > https://issues.apache.org/jira/browse/HIVE-17852
> >
> > I remember I tried this, but the patch became huge, so I'm thinking about
> > simply disabling the syntax first for 4.0 (making the user unable to create
> > a table like that), and then we can gradually remove it from the codebase.
> >
> > Why: see details in jira.
> >
> > Regards,
> > Laszlo Bodor
> >
> >


Re: [DISCUSS] HIVE 4.0.0 GA Release Proposal

2023-08-01 Thread Stamatis Zampetakis
Hello,

HIVE-27504 is now merged to master. Thanks everyone for the reviews!

I am going to prepare the release candidate for 4.0.0-beta-1 sometime this week.

Best,
Stamatis

On Thu, Jul 27, 2023 at 1:54 PM Battula, Brahma Reddy
 wrote:
>
> It looks like the following PR has been reviewed. Any chance to get it merged and cut the
> release?
>
> On 18/07/23, 2:39 PM, "Stamatis Zampetakis" wrote:
>
>
> HIVE-27504 still lacks reviews from committers.
>
>
> Note that I will not be able to work on the release from 22/07 to
> 30/07. If HIVE-27504 does not land in the next day or two the beta-1
> release might get delayed unless someone else picks up the RM role and
> cuts the RC.
>
>
> Best,
> Stamatis
>
>
> On Thu, Jul 13, 2023 at 6:33 PM Attila Turoczy
> wrote:
> >
> > Thanks for the update! Can't wait for the beta :)
> >
> > -Attila
> >
> > On Thu, Jul 13, 2023 at 5:19 PM Stamatis Zampetakis
> > wrote:
> >
> > > Hey everyone,
> > >
> > > As you may have noticed there have been various tickets around LICENSE
> > > and NOTICE files popping up recently. I just logged HIVE-27504 [1]
> > > which hopefully addresses all remaining issues that were found while I
> > > was working with the RC. After this gets resolved we should be good to
> > > go for putting up the RC for vote.
> > >
> > > The structure and content of the LICENSE and NOTICE file are very
> > > important for Apache releases so I would encourage other members of
> > > the community (especially PMC) to review the latest changes and
> > > current status and raise new JIRA tickets if they discover some
> > > problems. I would like to avoid having last minute -1 votes due to
> > > that.
> > >
> > > Best,
> > > Stamatis
> > >
> > > [1] https://issues.apache.org/jira/browse/HIVE-27504
> > >
> > > On Tue, Jun 20, 2023 at 11:09 PM Stamatis Zampetakis
> > > wrote:
> > > >
> > > > Hey team,
> > > >
> > > > Small heads up regarding the progress of the 4.0.0-beta-1 release.
> > > >
> > > > Most of the release steps went smoothly and I was able to get an
> > > > RC0 ready [1].
> > > >
> > > > However, I am afraid that our binary distribution does not comply
> > > > fully with the ASF Policy [2]. We bundle a lot of dependencies (jars)
> > > > within and I am not sure if we are fully covered in terms of licenses
> > > > and notice files. Thanks Ayush for reminding me to check the
> > > > binary-package-licenses directory [5].
> > > >
> > > > I am checking various resources such as [3, 4] to see what additional
> > > > steps we can take to be on the safe side and also looking for ways to
> > > > automate this so that we don't have to manually inspect the jars on
> > > > every release. I was playing a bit with license-maven-plugin [6] but I
> > > > am not yet completely happy with its output.
> > > >
> > > > The next few days will be a bit busy so most likely I will get back on
> > > > this during the weekend. If people have feedback or other ideas to
> > > > share please let me know.
> > > >
> > > > Best,
> > > > Stamatis
> > > >
> > > > [1] https://people.apache.org/~zabetak/apache-hive-4.0.0-beta-1-rc0/
> > > > [2] https://www.apache.org/legal/src-headers.html#asf-source-header-and-copyright-notice-policy
> > > > [3] https://infra.apache.org/licensing-howto.html
> > > > [4] https://www.apache.org/legal/resolved.html
> > > > [5] https://github.com/apache/hive/tree/master/binary-package-licenses
> > > > [6] https://www.mojohaus.org/license-maven-plugin/
> > > >
> > > >
> > > > On Fri, Jun 2, 2023 at 10:03 PM Stamatis Zampetakis
> > > > wrote:
> > > > >
> > > > > I can start preparing the RC towards the end of next week. If somebody
> > > > > has more time and wants to start earlier I am fine to switch.
> > > > >
> > > > > Best,
> > > > > Stamatis
> > > > >
> > > > > On Fri, Jun 2, 2023 at 5:36 PM Denys Kuzmenko
> > > > > wrote:
> > > > > >
> > > > > > great, this is the current list of release managers:
> > > > > >
> > > > > > 4.0.0 Stamatis Zampetakis
> > > > > > 4.1.0 Denys Kuzmenko
> > > > > > 4.2.0 Sai Hemanth Gantasala
> > > > > >
> > > > > > Should we keep the same RM order and just shift the releases or find
> > > a volunteer for the 4.0.0-beta release, WDYT?
> > > > > >
> > > > > >
> > >
>
>
>


Re: JDK 11 Support in Hive 3.x

2023-07-31 Thread Stamatis Zampetakis
Thanks for pushing this forward Aman. The main focus at this point is
4.0.0-beta-1 so personally I don't have much free time to allocate to
Hive 3.x releases.

Best,
Stamatis

On Sat, Jul 29, 2023 at 12:21 PM Aman Raj  wrote:
>
> Hi team,
>
> I have raised PRs in branch-3 https://github.com/apache/hive/pull/4495 and
> branch-3.1 https://github.com/apache/hive/pull/4522 to provide JDK 11 support
> for Hive 3.x releases. This was already committed in Hive 2.3.9, which
> now supports JDK 11 through this ticket:
> https://issues.apache.org/jira/browse/HIVE-22096. So it makes sense to
> backport the same to branch-3.x in Hive. Could someone in the community
> please review these PRs?
>
> Thanks,
> Aman.
>


Re: [DISCUSS] HIVE 4.0.0 GA Release Proposal

2023-07-18 Thread Stamatis Zampetakis
HIVE-27504 still lacks reviews from committers.

Note that I will not be able to work on the release from 22/07 to
30/07. If HIVE-27504 does not land in the next day or two, the beta-1
release might get delayed unless someone else picks up the RM role and
cuts the RC.

Best,
Stamatis

On Thu, Jul 13, 2023 at 6:33 PM Attila Turoczy
 wrote:
>
> Thanks for the update! Can't wait for the beta :)
>
> -Attila
>
> On Thu, Jul 13, 2023 at 5:19 PM Stamatis Zampetakis wrote:
>
> > Hey everyone,
> >
> > As you may have noticed there have been various tickets around LICENSE
> > and NOTICE files popping up recently. I just logged HIVE-27504 [1]
> > which hopefully addresses all remaining issues that were found while I
> > was working with the RC. After this gets resolved we should be good to
> > go for putting up the RC for vote.
> >
> > The structure and content of the LICENSE and NOTICE files are very
> > important for Apache releases, so I would encourage other members of
> > the community (especially PMC) to review the latest changes and
> > current status and raise new JIRA tickets if they discover some
> > problems. I would like to avoid having last minute -1 votes due to
> > that.
> >
> > Best,
> > Stamatis
> >
> > [1] https://issues.apache.org/jira/browse/HIVE-27504
> >
> > On Tue, Jun 20, 2023 at 11:09 PM Stamatis Zampetakis wrote:
> > >
> > > Hey team,
> > >
> > > Small heads up regarding the progress of the 4.0.0-beta-1 release.
> > >
> > > Most of the release steps went smoothly and I was able to get an
> > > RC0 ready [1].
> > >
> > > However, I am afraid that our binary distribution does not comply
> > > fully with the ASF Policy [2]. We bundle a lot of dependencies (jars)
> > > within and I am not sure if we are fully covered in terms of licenses
> > > and notice files. Thanks Ayush for reminding me to check the
> > > binary-package-licenses directory [5].
> > >
> > > I am checking various resources such as [3, 4] to see what additional
> > > steps we can take to be on the safe side and also looking for ways to
> > > automate this so that we don't have to manually inspect the jars on
> > > every release. I was playing a bit with license-maven-plugin [6] but I
> > > am not yet completely happy with its output.
> > >
> > > The next few days will be a bit busy so most likely I will get back on
> > > this during the weekend. If people have feedback or other ideas to
> > > share please let me know.
> > >
> > > Best,
> > > Stamatis
> > >
> > > [1] https://people.apache.org/~zabetak/apache-hive-4.0.0-beta-1-rc0/
> > > [2] https://www.apache.org/legal/src-headers.html#asf-source-header-and-copyright-notice-policy
> > > [3] https://infra.apache.org/licensing-howto.html
> > > [4] https://www.apache.org/legal/resolved.html
> > > [5] https://github.com/apache/hive/tree/master/binary-package-licenses
> > > [6] https://www.mojohaus.org/license-maven-plugin/
> > >
> > >
> > > On Fri, Jun 2, 2023 at 10:03 PM Stamatis Zampetakis wrote:
> > > >
> > > > I can start preparing the RC towards the end of next week. If somebody
> > > > has more time and wants to start earlier I am fine to switch.
> > > >
> > > > Best,
> > > > Stamatis
> > > >
> > > > On Fri, Jun 2, 2023 at 5:36 PM Denys Kuzmenko wrote:
> > > > >
> > > > > great, this is the current list of release managers:
> > > > >
> > > > > 4.0.0 Stamatis Zampetakis
> > > > > 4.1.0 Denys Kuzmenko
> > > > > 4.2.0 Sai Hemanth Gantasala
> > > > >
> > > > > Should we keep the same RM order and just shift the releases or find
> > > > > a volunteer for the 4.0.0-beta release, WDYT?
> > > > >
> > > > >
> >


Re: [Twitter] Blog on Hive's Data Federation Capabilities

2023-07-16 Thread Stamatis Zampetakis
Interesting read, thanks Akshat for putting this together. For those
who didn't know that Hive is a powerful data federation engine, the
article is a great start. Looking forward to more great content!

Best,
Stamatis

On Wed, Jul 12, 2023 at 7:54 AM Ayush Saxena  wrote:
>
> Thanx Akshat, it is up on twitter: https://twitter.com/ApacheHive
>
> -Ayush
>
> On Wed, 12 Jul 2023 at 10:08, Akshat m  wrote:
> >
> > Delve into the world of data federation with Apache Hive in this blog.
> > Explore the essence of data federation, uncover Hive's capabilities, and
> > learn about its supported integrations. Don't miss out on the insights:
> > https://akshatmat.medium.com/data-federation-with-apache-hive-74b3bc5fb72
> > #DataFederation #ApacheHive #DataAnalytics
> >
> > Regards,
> > Akshat Mathur


Re: Hive 2.3.10 release?

2023-07-16 Thread Stamatis Zampetakis
Hello,

I don't know what state branch-2.3 is in right now, but if someone is
willing to cut a release from it, then sure, why not. Just ensure
that the LICENSE/NOTICE files conform to the ASF policy and that the
respective JIRAs are backported. The following JIRAs are probably
relevant (hoping that I didn't miss anything else):
* https://issues.apache.org/jira/browse/HIVE-25665
* https://issues.apache.org/jira/browse/HIVE-27466
* https://issues.apache.org/jira/browse/HIVE-27467
* https://issues.apache.org/jira/browse/HIVE-27468
* https://issues.apache.org/jira/browse/HIVE-27478
* https://issues.apache.org/jira/browse/HIVE-27504

Best,
Stamatis

On Fri, Jul 14, 2023 at 8:49 AM Cheng Pan  wrote:
>
> +1
>
> Please consider including the Thrift upgrade for security purposes.
>
> On 2023/07/12 04:09:19 Chao Sun wrote:
> > Hi all,
> >
> > It's been quite a while since the last 2.3.9 release, and there are
> > several commits accumulated in the branch-2.3, including a few
> > critical bug fixes. Since Hive 2.3.x is still actively being used by
> > projects such as Apache Spark, I'm thinking about initiating a new
> > release process, if there are no objections. Please let me know your
> > thoughts. Thanks
> >
> > Best,
> > Chao
> >


Re: [DISCUSS] HIVE 4.0.0 GA Release Proposal

2023-07-13 Thread Stamatis Zampetakis
Hey everyone,

As you may have noticed there have been various tickets around LICENSE
and NOTICE files popping up recently. I just logged HIVE-27504 [1]
which hopefully addresses all remaining issues that were found while I
was working with the RC. After this gets resolved we should be good to
go for putting up the RC for vote.

The structure and content of the LICENSE and NOTICE files are very
important for Apache releases, so I would encourage other members of
the community (especially PMC) to review the latest changes and
current status and raise new JIRA tickets if they discover some
problems. I would like to avoid having last minute -1 votes due to
that.

Best,
Stamatis

[1] https://issues.apache.org/jira/browse/HIVE-27504

On Tue, Jun 20, 2023 at 11:09 PM Stamatis Zampetakis  wrote:
>
> Hey team,
>
> Small heads up regarding the progress of the 4.0.0-beta-1 release.
>
> Most of the release steps went smoothly and I was able to get an
> RC0 ready [1].
>
> However, I am afraid that our binary distribution does not comply
> fully with the ASF Policy [2]. We bundle a lot of dependencies (jars)
> within and I am not sure if we are fully covered in terms of licenses
> and notice files. Thanks Ayush for reminding me to check the
> binary-package-licenses directory [5].
>
> I am checking various resources such as [3, 4] to see what additional
> steps we can take to be on the safe side and also looking for ways to
> automate this so that we don't have to manually inspect the jars on
> every release. I was playing a bit with license-maven-plugin [6] but I
> am not yet completely happy with its output.
>
> The next few days will be a bit busy so most likely I will get back on
> this during the weekend. If people have feedback or other ideas to
> share please let me know.
>
> Best,
> Stamatis
>
> [1] https://people.apache.org/~zabetak/apache-hive-4.0.0-beta-1-rc0/
> [2] 
> https://www.apache.org/legal/src-headers.html#asf-source-header-and-copyright-notice-policy
> [3] https://infra.apache.org/licensing-howto.html
> [4] https://www.apache.org/legal/resolved.html
> [5] https://github.com/apache/hive/tree/master/binary-package-licenses
> [6] https://www.mojohaus.org/license-maven-plugin/
>
>
> On Fri, Jun 2, 2023 at 10:03 PM Stamatis Zampetakis  wrote:
> >
> > I can start preparing the RC towards the end of next week. If somebody
> > has more time and wants to start earlier I am fine to switch.
> >
> > Best,
> > Stamatis
> >
> > On Fri, Jun 2, 2023 at 5:36 PM Denys Kuzmenko  wrote:
> > >
> > > great, this is the current list of release managers:
> > >
> > > 4.0.0 Stamatis Zampetakis
> > > 4.1.0 Denys Kuzmenko
> > > 4.2.0 Sai Hemanth Gantasala
> > >
> > > Should we keep the same RM order and just shift the releases or find a 
> > > volunteer for the 4.0.0-beta release, WDYT?
> > >
> > >


[DISCUSS] HPL/SQL website migration

2023-07-12 Thread Stamatis Zampetakis
Hey everyone,

HPL/SQL has been part of Hive for a long time now. Surprisingly (at
least for me), the main website and documentation for HPL/SQL [1] are
not under the Hive domain, and I don't know how many people have access
to it. Looking at the website [1] at the moment, some things appear
broken, but I have no idea where the sources are to fix them.

Since HPL/SQL is part of Hive, I think we should migrate the content
of the website [1] to our official Hive web page [2]. That way it
can benefit from the new look and feel of the Hive website [2], and it
will facilitate contributions from the Hive community.

I assume that the current website [1] is maintained by Dmitry Tolpeko,
so big thanks for keeping it up and running for all these years.
Hopefully, the migration will remove some maintenance burden from
their shoulders.

What do you think?

Best,
Stamatis

[1] http://www.hplsql.org/
[2] https://hive.apache.org/


Re: Is there a way to test schema evolution scripts?

2023-07-10 Thread Stamatis Zampetakis
Hello,

The tests under DbInstallBase [1] seem to do more or less what you
want. Check them out and if something is missing don't hesitate to
enrich them.

Best,
Stamatis

[1] 
https://github.com/apache/hive/blob/5e46e80bc7d059093aece81e3886ba5ee425ee95/standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/dbinstall/DbInstallBase.java

On Mon, Jul 10, 2023 at 5:46 PM Ayush Saxena  wrote:
>
> Hi Zsolt,
> You can start by exploring this:
> https://github.com/apache/hive/blob/master/standalone-metastore/DEV-README
>
> It has steps to run those metastore scripts against different database
> types locally.
>
> -Ayush
>
> On Mon, 10 Jul 2023 at 20:30, Zsolt Miskolczi wrote:
> >
> > Hi,
> >
> > I just stared at a pull request about using Liquibase for schema evolution
> > and I was thinking about how such a change can be validated.
> >
> > I think the main issue is that we support multiple types of relational
> > databases but precommit tests don't test them at all.
> >
> > I think if we want to check if the idea works, it is enough to:
> > - introduce a proper Docker-based instance for each type of database
> > (Postgres and Derby first).
> > - run the upgrade scripts on them
> > - check if any error occurs
> >
> > And the end goal should be something like having validation scripts and
> > checking whether we are able to run selects after upgrades and fresh installs.
> >
> >
> > What do you think about that?
> >
> > Thank you,
> > Zsolt Miskolczi


Virtual key signing party

2023-06-28 Thread Stamatis Zampetakis
Hi all,

The Calcite community is organising a key signing party tomorrow [1].
Such events are extremely useful for growing the web of trust and are
particularly relevant for ASF committers, who usually have to sign
artifacts and releases.

The event will not take more than a couple of minutes. If you want to
participate please reply to the original thread and share your
fingerprint.

Best,
Stamatis

[1] https://lists.apache.org/thread/tbhklo0j4q4t7sox94bmn48y90v7nrkk


Re: [DISCUSS] HIVE 4.0.0 GA Release Proposal

2023-06-20 Thread Stamatis Zampetakis
Hey team,

Small heads up regarding the progress of the 4.0.0-beta-1 release.

Most of the release steps went smoothly and I was able to get an
RC0 ready [1].

However, I am afraid that our binary distribution does not comply
fully with the ASF Policy [2]. We bundle a lot of dependencies (jars)
within and I am not sure if we are fully covered in terms of licenses
and notice files. Thanks Ayush for reminding me to check the
binary-package-licenses directory [5].

I am checking various resources such as [3, 4] to see what additional
steps we can take to be on the safe side and also looking for ways to
automate this so that we don't have to manually inspect the jars on
every release. I was playing a bit with license-maven-plugin [6] but I
am not yet completely happy with its output.

The next few days will be a bit busy so most likely I will get back on
this during the weekend. If people have feedback or other ideas to
share please let me know.

Best,
Stamatis

[1] https://people.apache.org/~zabetak/apache-hive-4.0.0-beta-1-rc0/
[2] 
https://www.apache.org/legal/src-headers.html#asf-source-header-and-copyright-notice-policy
[3] https://infra.apache.org/licensing-howto.html
[4] https://www.apache.org/legal/resolved.html
[5] https://github.com/apache/hive/tree/master/binary-package-licenses
[6] https://www.mojohaus.org/license-maven-plugin/


On Fri, Jun 2, 2023 at 10:03 PM Stamatis Zampetakis  wrote:
>
> I can start preparing the RC towards the end of next week. If somebody
> has more time and wants to start earlier I am fine to switch.
>
> Best,
> Stamatis
>
> On Fri, Jun 2, 2023 at 5:36 PM Denys Kuzmenko  wrote:
> >
> > great, this is the current list of release managers:
> >
> > 4.0.0 Stamatis Zampetakis
> > 4.1.0 Denys Kuzmenko
> > 4.2.0 Sai Hemanth Gantasala
> >
> > Should we keep the same RM order and just shift the releases or find a 
> > volunteer for the 4.0.0-beta release, WDYT?
> >
> >


Re: [DISCUSS] Automatic rerunning of failed tests in Hive Pre-commit

2023-06-12 Thread Stamatis Zampetakis
Hello,

I tend to agree with Sai; if we can rerun the failed tests on demand,
that would be a cool feature. If we just rerun everything that fails,
no questions asked, we may do more harm than good. Given that nobody
runs all tests locally before submitting a PR (because it is not
feasible), there is a very high chance that things go really bad in CI,
and rerunning will just keep wasting resources.

Best,
Stamatis

On Thu, Jun 8, 2023 at 7:56 PM Sai Hemanth Gantasala
 wrote:
>
> Hello everyone,
>
> My personal preference is that the option to rerun *only* the failed test
> suites should be manual. The reason is that there might be frequent genuine
> test failures (failures caused by our own patch), while flaky tests are
> less likely. So giving the user the option to rerun the failed test
> suites wastes fewer resources when the failures are not flaky.
> If we were to automatically retry failed test suites, then I would prefer
> "rerunFailingTestsCount" to be set to 1, since there is very little
> probability that a test will be flaky in consecutive runs.
>
> Thanks,
> Sai.
>
> On Thu, Jun 8, 2023 at 12:07 AM Ayush Saxena  wrote:
>
> > +1 from me as well to rerun the failing tests.
> > The Oracle docker is also a pain; it is one of the main reasons for
> > retriggers. These retriggers waste a lot of resources and increase the
> > time to get build results for genuine runs.
> >
> > -Ayush
> >
> > On Thu, 8 Jun 2023 at 12:31, Butao Zhang  wrote:
> >
> > > +1. I often have to rerun the whole pre-commit job due to an individual
> > > unstable test, and it is too time-consuming. It would be much better if we
> > > could set the Maven config to retry automatically.
> > >
> > >
> > >
> > > Thanks,
> > >
> > > Butao Zhang
> > >
> > >  Replied Message 
> > > | From | r12 t45 |
> > > | Date | 6/8/2023 14:52 |
> > > | To |  |
> > > | Subject | [DISCUSS] Automatic rerunning of failed tests in Hive
> > > Pre-commit |
> > > Hi All,
> > >
> > > It often happens that Hive unit tests fail during pre-commit, which requires
> > > rerunning the whole pre-commit job and creates hours of delays.
> > > What if we set the Maven config to retry failed tests automatically X times?
> > > There is a "rerunFailingTestsCount" property in maven-surefire-plugin which
> > > can be used for that.
> > > I would like to hear the feedback, and if it is positive I could open a JIRA
> > > ticket and work on it.
> > >
> > > Thanks,
> > > Dmitriy
> > >
> >
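For readers weighing the options in this thread, here is a rough model of the semantics of Surefire's "rerunFailingTestsCount" as described in the maven-surefire-plugin documentation (which shows it set from the command line as -Dsurefire.rerunFailingTestsCount=2): a failing test is re-executed up to N more times; if a rerun passes, the test is reported as flaky and does not fail the build, and if every attempt fails, it is reported as a failure. The Java sketch below only models that classification. RerunModel, Outcome, and the BooleanSupplier stand-in for a test are hypothetical and are not Surefire's API. It also makes visible the cost Sai and Stamatis point out: a genuinely broken test is executed 1 + N times.

```java
import java.util.function.BooleanSupplier;

// Hypothetical model of Surefire-style reruns; it does not use the plugin's API.
class RerunModel {

    enum Outcome { PASSED, FLAKY, FAILED }

    // Runs a test once, then re-runs it up to rerunCount more times while it fails.
    static Outcome run(BooleanSupplier test, int rerunCount) {
        if (test.getAsBoolean()) {
            return Outcome.PASSED;      // passed on the first attempt
        }
        for (int i = 0; i < rerunCount; i++) {
            if (test.getAsBoolean()) {
                return Outcome.FLAKY;   // failed at first, passed on a rerun
            }
        }
        return Outcome.FAILED;          // failed on every attempt
    }

    public static void main(String[] args) {
        // A genuinely broken test: executed 1 + rerunCount times, then FAILED.
        int[] brokenCalls = {0};
        Outcome broken = run(() -> { brokenCalls[0]++; return false; }, 2);
        System.out.println(broken + " after " + brokenCalls[0] + " executions"); // FAILED after 3 executions

        // A test that fails once and then passes is classified as FLAKY.
        int[] flakyCalls = {0};
        Outcome flaky = run(() -> ++flakyCalls[0] > 1, 2);
        System.out.println(flaky); // FLAKY
    }
}
```

With the count set to 1, as Sai suggests, a deterministic failure costs one extra execution of each failing test, while a test that passes on its single rerun is surfaced as flaky instead of breaking the run.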


Re: [DISCUSS] HIVE 4.0.0 GA Release Proposal

2023-06-02 Thread Stamatis Zampetakis
I can start preparing the RC towards the end of next week. If somebody
has more time and wants to start earlier I am fine to switch.

Best,
Stamatis

On Fri, Jun 2, 2023 at 5:36 PM Denys Kuzmenko  wrote:
>
> great, this is the current list of release managers:
>
> 4.0.0 Stamatis Zampetakis
> 4.1.0 Denys Kuzmenko
> 4.2.0 Sai Hemanth Gantasala
>
> Should we keep the same RM order and just shift the releases or find a 
> volunteer for the 4.0.0-beta release, WDYT?
>
>


Re: Re: Reg: Discussion on removal of deprecated APIs in the HMS thrift interface

2023-06-01 Thread Stamatis Zampetakis
Zhihua brought up a good point. Yes, if it was introduced in
4.0.0-alpha and then deprecated, we can remove it.

On Thu, Jun 1, 2023 at 1:00 PM Attila Turoczy
 wrote:
>
> +1 from me as well. Let's clean it up. Still, because we have struggled
> with the data correctness issue, we have time to introduce these changes.
> If they don't fit, that won't be a problem either, as the next release will
> contain them. As I wrote earlier, once 4.0 goes out I want to help to have
> regular releases. Even majors. I have started a proposal document about a
> public hive roadmap, and release roadmap that I want to share and discuss
> with the community.
>
> -Attila
>
> On Thu, Jun 1, 2023 at 12:37 PM dengzhhu653  wrote:
>
> > Hi
> >
> >
> > Thanks Sai for driving this; the request-based API makes sense to me.
> > For the removal of deprecated APIs:
> >  a) +1 if it is marked as deprecated in 3.x;
> >  b) If the API was introduced after 4.0.0-alpha but becomes
> > obsolete in 4.x GA, I think we can remove it as well.
> >
> >
> > Thanks,
> > Zhihua.
> > At 2023-06-01 17:56:03, "Ayush Saxena"  wrote:
> > >+1 to what Stamatis said: if it is there in 3.X we can explore its
> > >removal; otherwise let it go into the 4.x GA release and we can remove it
> > >in the subsequent release
> > >
> > >-Ayush
> > >
> > >> On 01-Jun-2023, at 3:08 PM, Stamatis Zampetakis wrote:
> > >>
> > >> Hello,
> > >>
> > >> Ideally we should deprecate APIs in one release and remove them in a
> > >> subsequent major release. If the HMS deprecations were added in Hive
> > >> 3.X then I am ok removing them now. Otherwise it is not really that we
> > >> will remove deprecated APIs but we will remove regular APIs without
> > >> any notice.
> > >>
> > >> Best,
> > >> Stamatis
> > >>
> > >>> On Thu, Jun 1, 2023 at 2:57 AM Sai Hemanth Gantasala
> > >>>  wrote:
> > >>>
> > >>> Hi everyone,
> > >>>
> > >>> This thread is to initiate a discussion on the removal of deprecated APIs
> > >>> in the HMS Thrift class. Any client, including HiveMetaStoreClient, talks to
> > >>> the HiveMetaStore server via the Thrift layer. Over the past few years, the
> > >>> Thrift class has become bloated with duplicated APIs with varying parameters
> > >>> (function overloading) in the API definition. APIs get deprecated because
> > >>> an API might need an additional argument, so a new API is added with the
> > >>> additional argument and the old API is marked as deprecated.
> > >>>
> > >>> I'm working on HIVE-26537 <https://issues.apache.org/jira/browse/HIVE-26537>
> > >>> to clean up the code around the interaction between HiveMetaStoreClient and
> > >>> HMS so that it does not use the deprecated APIs (the HMS client will now use
> > >>> request-based APIs instead of APIs with individual arguments). Going
> > >>> forward, using these request-based APIs is ideal, as we can just add an
> > >>> additional field to the request object definition in the Thrift class while
> > >>> the API remains unchanged. This would hopefully require minimal changes to
> > >>> client and server interaction in the future.
> > >>>
> > >>> I would like to hear community members' opinions regarding the
> > >>> deprecated APIs:
> > >>> 1) Keep the deprecated APIs for the 4.x release; HMSClient will use the
> > >>> request-based APIs, so that older clients stay compatible with
> > >>> the new HMS server.
> > >>> 2) Remove the deprecated APIs for the 4.x release. This would break
> > >>> backward compatibility with older clients, but we have the opportunity
> > >>> to clean up a lot of deprecated code. Since we are making a major release
> > >>> after 5 years, I hope this incompatibility is acceptable.
> > >>>
> > >>> Please let me know your thoughts.
> > >>>
> > >>> Thanks,
> > >>> Sai.
> >


Re: Reg: Discussion on removal of deprecated APIs in the HMS thrift interface

2023-06-01 Thread Stamatis Zampetakis
Hello,

Ideally we should deprecate APIs in one release and remove them in a
subsequent major release. If the HMS deprecations were added in Hive
3.X then I am ok removing them now. Otherwise it is not really that we
will remove deprecated APIs but we will remove regular APIs without
any notice.

Best,
Stamatis

On Thu, Jun 1, 2023 at 2:57 AM Sai Hemanth Gantasala
 wrote:
>
> Hi everyone,
>
> This thread is to initiate a discussion on the removal of deprecated APIs
> in the HMS Thrift class. Any client, including HiveMetaStoreClient, talks to
> the HiveMetaStore server via the Thrift layer. Over the past few years, the
> Thrift class has become bloated with duplicated APIs with varying parameters
> (function overloading) in the API definition. APIs get deprecated because
> an API might need an additional argument, so a new API is added with the
> additional argument and the old API is marked as deprecated.
>
> I'm working on HIVE-26537 to clean up the code around the interaction
> between HiveMetaStoreClient and HMS so that it does not use the deprecated
> APIs (the HMS client will now use request-based APIs instead of APIs with
> individual arguments). Going forward, using these request-based APIs is
> ideal, as we can just add an additional field to the request object
> definition in the Thrift class while the API remains unchanged. This would
> hopefully require minimal changes to client and server interaction in the
> future.
>
> I would like to hear community members' opinions regarding the
> deprecated APIs:
> 1) Keep the deprecated APIs for the 4.x release; HMSClient will use the
> request-based APIs, so that older clients stay compatible with
> the new HMS server.
> 2) Remove the deprecated APIs for the 4.x release. This would break
> backward compatibility with older clients, but we have the opportunity
> to clean up a lot of deprecated code. Since we are making a major release
> after 5 years, I hope this incompatibility is acceptable.
>
> Please let me know your thoughts.
>
> Thanks,
> Sai.
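To make the contrast Sai describes concrete, here is a minimal Java sketch of positional overloads versus a request-based call. All names (TableService, TableRequest, getTable, the catalog field, and "hive" as a default catalog) are hypothetical and illustrative only; they are not the actual HMS Thrift definitions. The point is that adding a field to the request object leaves the request-based signature untouched, whereas the positional style needs a new overload, with the old one deprecated, for every new argument.

```java
import java.util.Optional;

// Hypothetical request object; not the real HMS Thrift definition.
class TableRequest {
    final String db;
    final String table;
    // Imagine this field was added in a later release: callers of
    // getTable(TableRequest) do not need a new method signature.
    private String catalog;

    TableRequest(String db, String table) {
        this.db = db;
        this.table = table;
    }

    TableRequest withCatalog(String catalog) {
        this.catalog = catalog;
        return this;
    }

    Optional<String> catalog() {
        return Optional.ofNullable(catalog);
    }
}

// Hypothetical service contrasting positional overloads with a request-based call.
class TableService {

    // Positional style: every new argument required a new overload,
    // and the older overload was then deprecated.
    @Deprecated
    static String getTable(String db, String table) {
        return getTable(new TableRequest(db, table));
    }

    @Deprecated
    static String getTable(String db, String table, String catalog) {
        return getTable(new TableRequest(db, table).withCatalog(catalog));
    }

    // Request style: one stable signature; new optional inputs become request fields.
    static String getTable(TableRequest request) {
        // "hive" is used as the default catalog name purely for illustration.
        return request.catalog().orElse("hive") + "." + request.db + "." + request.table;
    }
}
```

For example, TableService.getTable(new TableRequest("sales", "orders")) returns "hive.sales.orders". Under option 1 in the thread, the deprecated overloads would stay for old clients; under option 2 they would be deleted, leaving only the request-based method.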


Re: [DISCUSS] HIVE 4.0.0 GA Release Proposal

2023-06-01 Thread Stamatis Zampetakis
+1 from me as well. Any alpha or beta name should be fine; I have no
strong preferences.

On Wed, May 31, 2023, 2:30 PM László Bodor 
wrote:

> Hi!
>
> +1 for creating a new release before GA in the presence of possible
> correctness problems. I'm not 100% sure about alpha or beta, I'm fine with
> alpha-3.
>
> Regards,
> Laszlo Bodor
>
> Denys Kuzmenko wrote (on Wed, May 31, 2023, 14:22):
>
> > Hi folks,
> >
> > The master branch has many new features, bug fixes, and performance
> > improvements since alpha-2. However, we still have several correctness bugs
> > [HIVE-26654] and performance issues that should be eliminated before the
> > GA.
> >
> > Could we consider doing a beta release to keep at least a 6-month release
> > cadence and also show the community that 4.0.0 GA is the next stop?
> >
> > Thanks,
> > Denys
> >
>


Re: Move to JDK-11

2023-06-01 Thread Stamatis Zampetakis
Hey everyone,

If we claim that Hive supports a certain JDK, then we should compile and run
tests with it.

The more JDKs we can support the better for everyone, but this comes at a
cost (resources mostly). We should have a precommit run for every supported
JDK (frequency to be determined: once per day or week) that compiles and runs
all tests.

From my perspective, I would be pretty happy if we could cover the two edge
LTS releases (the oldest and the newest supported) at every point in time.

Then we also have to decide which JDK we should use for the pull requests
and the local dev environment. I think it makes sense to use the latest. People
like working on modern stuff, and it also makes sense that newer releases
use newer versions. It would be pretty awkward if someone wanted
to use the latest Hive version and it turned out that it could only run on
JDK8.

Best,
Stamatis

On Thu, Jun 1, 2023, 3:42 AM Sungwoo Park  wrote:

> Hi, everyone.
>
> I have not tested the master branch with Java 11/17 yet, but I would like
> to share my experience with testing a fork of branch-3.1 with Java 11/17
> (as part of developing Hive-MR3), in case it is useful for the
> discussion. I merged the patches listed in [1] HIVE-22415 and updated the
> Maven configuration for Java 11.
>
> 1. Building Hive was fine and I was able to run it with Java 11 as well as
> Java 17. So, it seems that the work reported in [1] is indeed complete for
> upgrading to Java 11 (and Java 17) and getting Hive to work.
>
> 2. However, there was a problem with running tests, so this can be
> additional work for upgrading to Java 11.
>
> 3. For performance, Java 17 gives about an 8 percent (free) performance
> improvement. When tested with 10TB TPC-DS, Java 8 takes 8074 seconds,
> whereas Java 17 takes 7415 seconds. Considering the maturity of Hive, I
> think this is not a small improvement because almost every query gets some
> speedup.
>
> Thanks,
>
> --- Sungwoo
>
> [1] https://issues.apache.org/jira/browse/HIVE-22415
>
>
> On Thu, Jun 1, 2023 at 3:53 AM Sai Hemanth Gantasala
>  wrote:
>
> > Hi All,
> >
> > I would strongly advocate keeping support for JDK8.
> > Between JDK11 and JDK17, depending on the amount of effort for the upgrade,
> > I'm inclined towards JDK17 (JDK21 LTS will be released in Sep 2023).
> >
> > Thanks,
> > Sai.
> >
> > On Wed, May 31, 2023 at 5:39 AM László Bodor 
> > wrote:
> >
> > > *Hi!*
> > >
> > >
> > > *Should we support both JDK-11 & JDK-8?*
> > > IMO absolutely yes, let's not break up with JDK-8: according to its
> > > lifecycle, it's going to stay with us for a long time.
> > >
> > > I believe
> > > a) we should be able to compile on JDK8, JDK11, and JDK17 (GitHub Actions
> > > can cover this conveniently at precommit time, as Tez does)
> > > b) the release artifacts should be compatible with JDK8 as long as it is
> > > with us.
> > >
> > > Regards,
> > > Laszlo Bodor
> > >
> > >
> > > Butao Zhang wrote (on Wed, May 31, 2023, 14:33):
> > >
> > > > Thanks Ayush for driving this! Good to know that Hive is getting ready
> > > > for newer JDKs.
> > > > In my opinion, if we have more community energy to put into it, we can
> > > > support both JDK-11 and JDK-17 like Spark [1]. If we have to make a
> > > > choice between JDK-11 and JDK-17, I would choose the relatively new
> > > > version JDK-17; meanwhile, we should maintain compatibility with JDK-8,
> > > > as JDK-8 is still widely used in most big data platforms.
> > > >
> > > >
> > > > Thanks,
> > > > Butao Zhang
> > > >
> > > >
> > > > [1]https://issues.apache.org/jira/browse/SPARK-33772
> > > >  Replied Message 
> > > > | From | Ayush Saxena |
> > > > | Date | 5/31/2023 18:39 |
> > > > | To | dev |
> > > > | Subject | Move to JDK-11 |
> > > > Hi Everyone,
> > > > I want to draw folks' attention towards moving to JDK-11 compile
> > > > time support in Hive. There was a ticket in the past [1] which talks about
> > > > it, and if I decode it right, it was blocked because the Hadoop
> > > > version used by Hive didn't have JDK-11 runtime support. But with [2] in, we
> > > > have upgraded the Hadoop version, so that problem is sorted out. I couldn't
> > > > see any unresolved tickets in the blocked state either.
> > > >
> > > > I quickly tried* a mvn clean install -DskipTests -Piceberg -Pitests
> > > > -Dmaven.javadoc.skip=true
> > > >
> > > > And, no surprise, it failed with some weird exceptions towards the end. But
> > > > I think that should be solvable.
> > > >
> > > > So, questions:
> > > >
> > > > - What do folks think about this? Should we put in some effort towards
> > > > JDK-11?
> > > > - Should we support both JDK-11 & JDK-8?
> > > > - Ditch JDK-11 and directly shoot for JDK-17?
> > > >
> > > > Let me know your thoughts. In case anyone has some experience in this area
> > > > and has tried

Re: Apache Hive on Twitter

2023-05-24 Thread Stamatis Zampetakis
Thanks for driving this, Ayush! It's great to see Hive alive again on Twitter.

Best,
Stamatis

On Tue, May 23, 2023 at 3:58 AM Ayush Saxena  wrote:
>
> Hi All,
> I am happy to announce that we have got the Apache Hive Twitter account
> active again, or in other words, we now have the credentials to use it.
>
> The twitter account stays here:
>
> https://twitter.com/ApacheHive
>
> The account belongs to all of us at Hive. As we decided, if anyone wants to
> get anything related to Apache Hive posted on the Twitter account, they
> can send an email to the Hive dev mailing list with the request, with the
> label [Twitter] in the subject.
>
> For the record as of today, following people have access to post:
>
> Alan Gates, Ayush Saxena, Carl Steinbach, Joydeep Sen Sharma, Owen
> O'Malley, Sushanth Sowmyan, Szehon Ho, Thejas Nair & Vikram Dixit
>
> A note of thanks to Joydeep Sen Sharma, Carl Steinbach, Stamatis Zampetakis
> & Naveen Gangam for helping with the process. Attila Turoczy for the
> initial thoughts/idea.
>
> -Ayush


Re: [DISCUSS] Nightly snaphot builds

2023-05-24 Thread Stamatis Zampetakis
Hey all,

We already have nightly builds for Hive [1].

Do we need something more than that?

Best,
Stamatis

[1] http://ci.hive.apache.org/job/hive-nightly/
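Sungwoo's point in the quoted reply below, that periodic builds would make it quicker to find the commit that introduced a correctness bug, boils down to a binary search over ordered builds (the same idea as git bisect). Here is a minimal Java sketch under the following assumptions: the builds are ordered oldest to newest, the oldest is known to be good, the newest is known to be bad, and a hypothetical deterministic isGood check is available (for example, rerunning an affected query against that build). BuildBisect and the build numbers are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical helper for locating a regression across ordered builds.
class BuildBisect {

    // Returns the index of the first bad build. Assumes builds are ordered
    // oldest to newest, builds.get(0) is good, the last build is bad, and
    // the check flips from good to bad exactly once.
    static <B> int firstBad(List<B> builds, Predicate<B> isGood) {
        int lo = 0;                    // invariant: builds.get(lo) is good
        int hi = builds.size() - 1;    // invariant: builds.get(hi) is bad
        while (hi - lo > 1) {
            int mid = lo + (hi - lo) / 2;
            if (isGood.test(builds.get(mid))) {
                lo = mid;
            } else {
                hi = mid;
            }
        }
        return hi;
    }

    public static void main(String[] args) {
        // Hypothetical build numbers; pretend the regression first appears in build 104.
        List<Integer> builds = Arrays.asList(101, 102, 103, 104, 105, 106);
        int idx = firstBad(builds, b -> b < 104);
        System.out.println("first bad build: " + builds.get(idx)); // first bad build: 104
    }
}
```

Each check halves the remaining range, so a year of weekly builds (about 52) needs roughly 6 checks rather than testing each build in turn.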


On Tue, May 23, 2023 at 9:03 AM vihang karajgaonkar  wrote:
>
> I think there are many benefits, as others in this thread suggested, which
> can be built on top of nightly builds. Having docker images is great but
> for now I think we can start simple and publish the jars. Many users still
> just deploy using jars and it would be useful to them. Once we have a
> docker environment we can add a docker image too to the nightly builds so
> that users can choose their preferred way.
>
> On Mon, May 22, 2023 at 11:07 PM Sungwoo Park  wrote:
>
> > I think such nightly builds will be useful for testing and debugging in the
> > future.
> >
> > I also wonder if we can somehow create builds even from previous commits
> > (e.g., for the past few years). Such builds from previous commits don't
> > have to be daily builds, and I think weekly builds (or even monthly builds)
> > would also be very useful.
> >
> > The reason I wish such builds were available is to facilitate debugging and
> > testing. When tested against the TPC-DS benchmark, the current master
> > branch has several correctness problems that were introduced after the
> > release of Hive 3.1.2. We have reported all problems known to us in [1] and
> > also submitted several patches. If such nightly builds had been available,
> > we would have saved quite a bit of time for implementing the patches by
> > quickly finding offending commits that introduced new correctness bugs.
> >
> > In addition, you can find quite a few commits in the master branch that
> > report bugs which are not reproduced in Hive 3.1.2. Examples: HIVE-19990,
> > HIVE-14557, HIVE-21132, HIVE-21188, HIVE-21544, HIVE-22114,
> > HIVE-7, HIVE-22236, HIVE-23911, HIVE-24198, HIVE-22777,
> > HIVE-25170, HIVE-25864, HIVE-26671.
> > (There may be some errors in this list because we compared against Hive
> > 3.1.2 with many patches backported.) Such nightly builds can be useful for
> > finding root causes of such bugs.
> >
> > Ideally I wish there was an automated procedure to create nightly builds,
> > run TPC-DS benchmark, and report correctness/performance results, although
> > this would be quite hard to implement. (I remember Spark implemented this
> > procedure in the era of Spark 2, but my memory could be wrong.)
> >
> > [1] https://issues.apache.org/jira/browse/HIVE-26654
> >
> >
> > On Tue, May 23, 2023 at 10:44 AM Ayush Saxena  wrote:
> >
> > > Hi Vihang,
> > > +1. We were even exploring publishing the docker images of the snapshot
> > > version as well, per commit or maybe weekly, so you just shoot 2 docker
> > > commands and get a Hive cluster running with master code.
> > >
> > > Sai, I think spinning up an env via Docker with all these things should be
> > > doable for sure, but it would require someone with real expertise in
> > > Docker as well as in setting up these services with Hive. Obviously, I am
> > > not that guy :-)
> > >
> > > @Simhadri has a PR which publishes docker images once a release tag is
> > > pushed; you could explore having something similar for the snapshot
> > > version, if that sounds cool.
> > >
> > > -Ayush
> > >
> > > On Tue, 23 May 2023 at 04:26, Sai Hemanth Gantasala
> > >  wrote:
> > >
> > > > Hi Vihang,
> > > >
> > > > +1 on the idea.
> > > >
> > > > This is a great idea to quickly test if a certain feature is working as
> > > > expected on a certain branch.
> > > > This way we can test for data loss, correctness, or any other unexpected
> > > > scenarios that are Hive-specific. However, I'm wondering if it is
> > > > possible to deploy/test in a kerberized environment, or to test issues
> > > > involving authorization services like Sentry/Ranger.
> > > >
> > > > Thanks,
> > > > Sai.
> > > >
> > > > On Mon, May 22, 2023 at 11:15 AM vihang karajgaonkar <
> > > vihan...@apache.org>
> > > > wrote:
> > > >
> > > > > Hello Team,
> > > > >
> > > > > I have observed that it is a common use-case where users would like to
> > > > > test out unreleased features/bug fixes, either to unblock themselves or
> > > > > to test whether the bug fixes really work as intended in their
> > > > > environments. Today, in the case of Apache Hive, this is not very user
> > > > > friendly because it requires the end user to build the binaries
> > > > > directly from the Hive source code.
> > > > >
> > > > > I found that Apache Spark has a very useful infrastructure [1] which
> > > > > deploys nightly snapshots [2] [3] from the branch using GitHub Actions.
> > > > > This is super useful for any user who wants to try out the latest and
> > > > > greatest using the nightly builds.
> > > > >
> > > > > I was wondering if we should also adopt this. We can use GitHub Actions
> > > > > to upload the snapshot jars to the public repository (e.g. GitHub
> > > > > Packages) and

Re: Updating the Hive Committer Guide Wiki

2023-05-19 Thread Stamatis Zampetakis
Thanks for updating the wiki Ayush! Definitely very helpful and
hopefully we can do it for other pages as well.

Slack is a very useful tool but personally I don't have much time to
monitor yet another channel of communication. I don't know if we
should encourage people to start discussions there especially since
access is moderated and search archives are not openly available. I
would prefer to direct people to dev@ or user@ and not Slack, but this
is just my personal opinion.

Best,
Stamatis

On Fri, May 19, 2023 at 6:27 AM Ayush Saxena  wrote:
>
> Hi All,
> I recently observed that our Hive Committer guide is pretty outdated
> and mentions legacy ways of committing, but it still has a lot of
> relevant information.
>
> After discussing with some friends offline, I have updated the doc.
> Feel free to share feedback or improvements.
>
> Committers to the project already have access to the wiki, so they
> can update it directly. If anyone else has any feedback, feel free to
> share it and someone from the committer group would be happy to get
> things updated.
>
> The Wiki page lies here:
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=27362108
>
> -Ayush


Re: [DISCUSS] HIVE 4.0 GA Release Proposal

2023-05-16 Thread Stamatis Zampetakis
The umbrella ticket is HIVE-26654 [1]. I am currently looking into
HIVE-26968 and probably gonna merge this in the following days.

Help in reviewing or fixing the remaining tickets would be much appreciated.

Best,
Stamatis

[1] https://issues.apache.org/jira/browse/HIVE-26654

On Tue, May 16, 2023 at 11:52 AM Attila Turoczy
 wrote:
>
> +2. Who is working now on the TPCDS regression? Can I / We help him/ her?
>
> -Attila
>
>
> On Tue, May 16, 2023 at 11:04 AM Stamatis Zampetakis 
> wrote:
>
> > I agree with Attila that we should do our best to come out with the next GA
> > soon. In order to do that we should address the TPCDS regressions that are
> > already reported. It doesn't make much sense to give out a GA that cannot
> > run the whole TPCDS suite without crashing or returning wrong results.
> >
> > If solving all the problems in a reasonable timeframe is not possible then
> > I would suggest cutting another alpha or beta release.
> >
> > Best,
> > Stamatis
> >
> > On Fri, May 12, 2023, 6:36 PM Attila Turoczy
> > wrote:
> >
> > > Could we please give some attention to this topic? I strongly believe that
> > > we should put in every effort to release Hive 4. The Hive community needs
> > > to demonstrate that we are active and accomplishing exciting developments.
> > > It is quite disheartening to note that our last major GA release was a
> > > staggering 5 years ago on 18th May 2018! The significance of version 4.0
> > > cannot be overstated, and we should definitely prioritize its promotion.
> > >
> > > -Attila
> > >
> > > On Tue, May 9, 2023 at 8:23 PM Kirti Ruge  wrote:
> > >
> > >> I see that a few tickets, like HIVE-26400, which is a major milestone,
> > >> are resolved.
> > >> Can we reevaluate the priorities of other JIRAs so that it gives us
> > >> clarity on GO/NO-GO for the 4.0.0 GA release and its timeline?
> > >>
> > >>
> > >>
> > >> Thanks,
> > >> Kirti
> > >>
> > >> On Sat, Mar 25, 2023 at 3:27 PM Stamatis Zampetakis 
> > >> wrote:
> > >>
> > >> > Regarding correctness, I think it makes sense to change default values
> > >> > and possibly add a warning note when there's a known risk of wrong
> > >> > results.
> > >> > Needless to say, we should try to fix as many issues as possible; we
> > >> > still need volunteers to review open PRs.
> > >> >
> > >> > Performance regressions are trickier, but if we have the query plans
> > >> > (CBO + full) along with logs (including task counters) for fast and
> > >> > slow executions we may be able to understand what happens. Don't
> > >> > hesitate to create Jira tickets with this information if available.
> > >> >
> > >> > Lastly, regarding 4.0.0 blockers, I don't think we need a special
> > >> > label. The built-in and widely used priority "blocker" seems enough to
> > >> > capture the importance and urgency of a ticket.
> > >> > Since I am the release manager for the next release I will go over
> > >> > tickets marked as blockers and reevaluate priorities if necessary.
> > >> >
> > >> > Best,
> > >> > Stamatis
> > >> >
> > >> > On Thu, Mar 23, 2023, 10:27 AM Denys Kuzmenko 
> > >> > wrote:
> > >> >
> > >> > > Thanks, Sungwoo, for running the TPC-DS benchmark. Do we know if the
> > >> > > same level of performance degradation was present in 4.0.0-alpha1?
> > >> > >
> > >> > > All: please use the `hive-4.0.0-must` label in a ticket if you think
> > >> > > it's a show-stopper for the release.
> > >> > >
> > >> >
> > >>
> > >
> >


Re: [DISCUSS] HIVE 4.0 GA Release Proposal

2023-05-16 Thread Stamatis Zampetakis
I agree with Attila that we should do our best to come out with the next GA
soon. In order to do that we should address the TPCDS regressions that are
already reported. It doesn't make much sense to give out a GA that cannot
run the whole TPCDS suite without crashing or returning wrong results.

If solving all the problems in a reasonable timeframe is not possible then
I would suggest cutting another alpha or beta release.

Best,
Stamatis

On Fri, May 12, 2023, 6:36 PM Attila Turoczy 
wrote:

> Could we please give some attention to this topic? I strongly believe that
> we should put in every effort to release Hive 4. The Hive community needs
> to demonstrate that we are active and accomplishing exciting developments.
> It is quite disheartening to note that our last major GA release was a
> staggering 5 years ago on 18th May 2018! The significance of version 4.0
> cannot be overstated, and we should definitely prioritize its promotion.
>
> -Attila
>
> On Tue, May 9, 2023 at 8:23 PM Kirti Ruge  wrote:
>
>> I see that a few tickets, like HIVE-26400, which is a major milestone,
>> are resolved.
>> Can we reevaluate the priorities of other JIRAs so that it gives us
>> clarity on GO/NO-GO for the 4.0.0 GA release and its timeline?
>>
>>
>>
>> Thanks,
>> Kirti
>>
>> On Sat, Mar 25, 2023 at 3:27 PM Stamatis Zampetakis 
>> wrote:
>>
>> > Regarding correctness, I think it makes sense to change default values
>> > and possibly add a warning note when there's a known risk of wrong results.
>> > Needless to say, we should try to fix as many issues as possible; we
>> > still need volunteers to review open PRs.
>> >
>> > Performance regressions are trickier, but if we have the query plans
>> > (CBO + full) along with logs (including task counters) for fast and slow
>> > executions we may be able to understand what happens. Don't hesitate to
>> > create Jira tickets with this information if available.
>> >
>> > Lastly, regarding 4.0.0 blockers, I don't think we need a special label.
>> > The built-in and widely used priority "blocker" seems enough to capture
>> > the importance and urgency of a ticket.
>> > Since I am the release manager for the next release I will go over
>> > tickets marked as blockers and reevaluate priorities if necessary.
>> >
>> > Best,
>> > Stamatis
>> >
>> > On Thu, Mar 23, 2023, 10:27 AM Denys Kuzmenko 
>> > wrote:
>> >
>> > > Thanks, Sungwoo, for running the TPC-DS benchmark. Do we know if the
>> > > same level of performance degradation was present in 4.0.0-alpha1?
>> > >
>> > > All: please use the `hive-4.0.0-must` label in a ticket if you think
>> > > it's a show-stopper for the release.
>> > >
>> >
>>
>


Re: [DISCUSS] Disable JIRA worklog for GitHub PRs

2023-05-15 Thread Stamatis Zampetakis
Thanks everyone for the feedback!

Merged under https://issues.apache.org/jira/browse/HIVE-27341

Best,
Stamatis

On Fri, May 12, 2023 at 9:43 PM Ayush Saxena  wrote:
>
> +1
>
> Thanx Stamatis, makes sense
>
> -Ayush
>
> > On 12-May-2023, at 10:14 PM, Attila Turoczy  
> > wrote:
> >
> > +1
> >
> >> On Fri, May 12, 2023 at 4:01 PM Alessandro Solimando <
> >> alessandro.solima...@gmail.com> wrote:
> >> Hi Stamatis,
> >> I am experiencing the same too, so +1 from me.
> >> Best regards,
> >> Alessandro
> >> On Fri, 12 May 2023 at 15:58, Stamatis Zampetakis 
> >> wrote:
> >>> Hello,
> >>> Everything that happens in a GitHub PR creates a worklog entry under
> >>> the respective JIRA ticket.
> >>> For every worklog entry we receive a notification from j...@apache.org
> >>> when we are watching an issue. The worklog entry and email
> >>> notification usually appear messy.
> >>> Moreover, if we are watching the GitHub PR we are going to get a
> >>> notification from notificati...@github.com which has the same content
> >>> as the JIRA worklog entry and is much more readable.
> >>> Finally, the PR notification is also going to
> >>> iss...@hive.apache.org and git...@hive.apache.org, so those who are
> >>> subscribed to these lists will get the same notification multiple times.
> >>> Personally, I never read the JIRA worklog notifications and I largely
> >>> prefer those from notificati...@github.com.
> >>> How do you feel about disabling the worklog entries in JIRA coming
> >>> from GitHub PRs?
> >>> For archiving purposes, the notifications already go to gitbox@ so we
> >>> don't lose anything from disabling the worklog entries. On the
> >>> contrary, I find that this would reduce the noise and redundancy in
> >>> our inboxes.
> >>> Concretely this is what I have in mind in terms of change:
> >>> https://github.com/apache/hive/pull/4318
> >>> Best,
> >>> Stamatis


[DISCUSS] Disable JIRA worklog for GitHub PRs

2023-05-12 Thread Stamatis Zampetakis
Hello,

Everything that happens in a GitHub PR creates a worklog entry under
the respective JIRA ticket.
For every worklog entry we receive a notification from j...@apache.org
when we are watching an issue. The worklog entry and email
notification usually appear messy.

Moreover, if we are watching the GitHub PR we are going to get a
notification from notificati...@github.com which has the same content
as the JIRA worklog entry and is much more readable.

Finally, the PR notification is also going to
iss...@hive.apache.org and git...@hive.apache.org, so those who are
subscribed to these lists will get the same notification multiple times.

Personally, I never read the JIRA worklog notifications and I largely
prefer those from notificati...@github.com.

How do you feel about disabling the worklog entries in JIRA coming
from GitHub PRs?

For archiving purposes, the notifications already go to gitbox@ so we
don't lose anything from disabling the worklog entries. On the
contrary, I find that this would reduce the noise and redundancy in
our inboxes.

Concretely this is what I have in mind in terms of change:
https://github.com/apache/hive/pull/4318

Best,
Stamatis


Re: Kill the Pig 

2023-04-28 Thread Stamatis Zampetakis
I checked the Pig repo and I see some recent activity. Rohini is actively
leading the effort towards a new Pig release. Given that there is proven
interest to maintain and contribute to this module I would prefer to keep
it for the time being unless there are major issues that I am not aware of.

Best,
Stamatis


On Thu, Apr 20, 2023, 9:09 PM Rohini Palaniswamy  wrote:

> Hi Attila,
> We still use HCatLoader and HCatStorer heavily and would like to retain
> the support. We are also fixing it to work with Iceberg tables and will be
> contributing patches for both Hive 3 and Hive 4. So would like the support
> to be continued with Hive 4.
>
> Regards,
> Rohini
>
> On Thu, Apr 20, 2023 at 1:59 AM Alessandro Solimando <
> alessandro.solima...@gmail.com> wrote:
>
> > +1 from me, let's just make sure we make a good salame out of it :)
> >
> > Best regards,
> > Alessandro
> >
> > On Thu, 20 Apr 2023 at 10:50, Attila Turoczy 
> > wrote:
> >
> >> Hi All,
> >>
> >> In Hive we have a pretty old component from 1972 and this is the Pig.
> >> Pig was cool somewhere around 2008, but nowadays it does not have any
> >> value in the big data world. Even the last small release of Pig was 6
> >> years ago in 2017, and the Pig community has pretty much died. Because
> >> this component is obsolete I would suggest removing it from Hive 4.0.
> >> Hive 3 will still contain it, but I think this is the right time to
> >> remove those components that are not valuable for the community.
> >>
> >> What do you think about it?
> >>
> >> Ps: If nobody writes back, it would mean I could kill the pig (rof rof)
> >> :)
> >>
> >> -Attila
> >>
> >
>


Re: Introducing a DI framework in Hive?

2023-04-19 Thread Stamatis Zampetakis
I think we all agree that DI can be beneficial in general.

However, it's hard to say yes or no on something before having a
concrete case to discuss; it doesn't have to be a PR but we need to
work on a specific Hive use-case and list advantages/disadvantages of
the proposal.

Best,
Stamatis

On Mon, Apr 17, 2023 at 7:33 PM Laszlo Vegh  wrote:
>
> Hi all,
>
> Sorry for not answering so far; for some reason I did not receive your
> answers in my Gmail account. I’m happy to see that there’s a conversation
> around the topic, so let me add my opinion on your points.
>
> First of all, introducing a DI framework does not mean a large-scale
> refactoring. A suitable module, or a well-bounded set of components, can be
> chosen as the first candidate. It’s also important that nobody will be forced
> to use the DI container when writing features, or to redesign existing code
> when it is being touched.
> As for the aim: I’ve worked quite a lot with Java and .Net DI frameworks, and
> my experience was that having a DI framework greatly reduces the effort to
> write well-organised and maintainable code. While well-organised code can be
> written without DI frameworks too, the lack of such a framework makes it much
> easier to write poorly designed code (bad scoping, lifecycle issues,
> visibility issues, etc.). By well-organised I mean:
> Design patterns: DI containers make it easier to write code using the well
> known design patterns. For example, you can implement factory, wrapper,
> adapter, etc. patterns simply by using the offered features as intended.
> Streamlined component initialisation: no more spaghetti/boilerplate component
> init methods.
> Well-defined component scopes (lifecycle): DI frameworks support various
> component scopes, which offer fine-grained control over component
> lifecycle: singleton, one component per thread, one component per request
> from the DI container, etc.
> Organised and visible component/class dependencies: through constructor
> injection all the dependencies of a class are visible (unlike static method
> calls). Using this approach it is impossible to create circular dependencies,
> which lead to object initialisation issues and hacks. By requiring all deps
> during object creation it’s way easier to detect or avoid unwanted
> dependencies. It also makes it easier to better organise the code into
> packages and modules.
> Enhanced testability: I have explained this earlier.
> Well-defined component visibility: no need for “union-all” context objects.
> Instead of having context objects with references to all of the components
> which may be required during execution, each execution step can obtain the
> necessary dependencies from the DI container. Also, no more public static
> methods or class instances. In order to make a component accessible from
> everywhere, there’s no need to make it public and static. DI frameworks also
> offer nested/sub contexts to limit/control visibility.
> My original mail was supposed to be a kickoff, to start talking about DI.
> Before creating a PR with an example in Hive, I would like to have a common
> agreement that we want to do this, and that there is no blocker which prevents
> us from doing it. Once we have this agreement I can create a working example
> and demonstrate how it will help us in the future.
> Regarding the stability and performance issues: of course those must be
> addressed as well, but as Stamatis pointed out, Hive is an open source
> project and everybody can pursue their own initiatives in parallel to others’.
>
> In Java I have the most experience with Spring, so I would prefer choosing
> it. It has become huge by now, but it’s modular. We are not forced to use all
> of the offered features; if we want a pure DI container with some basic
> extensions, we would only need spring-core, spring-beans, and spring-context.
> It has several extensions and supports tons of other well-known frameworks
> and/or technologies.
>
> Best regards,
> Laszlo Vegh
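
Laszlo's points about constructor injection making dependencies visible and improving testability can be illustrated with a small, framework-free Java sketch. All type names below (`TableCatalog`, `TableCounter`) are hypothetical and are not Hive classes; a container such as Spring or Guice would perform the wiring that `main` does by hand here.

```java
import java.util.List;

// Minimal sketch of constructor injection without any DI framework.
// All names are hypothetical and do not correspond to Hive classes.
public class ConstructorInjectionSketch {

    // The collaborator is an interface so that it can be swapped out.
    interface TableCatalog {
        List<String> tableNames(String database);
    }

    // The service declares its only dependency in its constructor, so the
    // dependency is visible to callers instead of hidden in a static call.
    static final class TableCounter {
        private final TableCatalog catalog;

        TableCounter(TableCatalog catalog) {
            this.catalog = catalog;
        }

        int countTables(String database) {
            return catalog.tableNames(database).size();
        }
    }

    public static void main(String[] args) {
        // In a unit test, a lambda stands in for the real catalog;
        // no static-method mocking is required.
        TableCatalog fake = database -> List.of("orders", "customers", "items");
        TableCounter counter = new TableCounter(fake);
        System.out.println(counter.countTables("default")); // prints 3
    }
}
```

In this sketch, replacing the lambda with a real implementation is the only change production wiring would need, which is the property the thread contrasts with static utility methods.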


Re: Introducing a DI framework in Hive?

2023-04-13 Thread Stamatis Zampetakis
Just to be clear, I am in favor of introducing DI frameworks in Hive
where it makes sense. As Attila said, we don't want to get stuck with
legacy code forever. When a concrete proposal comes up we can discuss
benefits vs drawbacks.

Regarding stability I agree it is a pressing issue but Hive is an open
source project and we certainly don't want to force volunteers to work
on specific things or forbid them to work on others. Contributing to
open source is supposed to be a fun and rewarding experience. I am
sure many of the people in this list have stability as a primary goal
so eventually we will get there.

Best,
Stamatis


Re: [DISCUSS] Move Jira notification emails out of dev@hive

2023-04-12 Thread Stamatis Zampetakis
INFRA-24440 is resolved so all JIRA traffic now goes to issues@hive.
Don't forget to subscribe to that list if you wish to follow the
creation of new tickets etc.

Best,
Stamatis

On Fri, Apr 7, 2023 at 9:55 AM Stamatis Zampetakis  wrote:
>
> Just logged https://issues.apache.org/jira/browse/INFRA-24440 to move
> this forward.
>
> Best,
> Stamatis
>
> On Thu, Mar 30, 2023 at 11:12 AM Stamatis Zampetakis  
> wrote:
> >
> > I will proceed with the changes needed to move the Jira traffic out of the 
> > dev list sometime next week.
> >
> > If there are reasons to delay or abandon the proposal please let me know.
> >
> > Best,
> > Stamatis
> >
> > On Mon, Mar 27, 2023, 5:39 AM Sungwoo Park  wrote:
> >>
> >> I like the proposal very much. (Then, hopefully this mailing list will
> >> be useful to outside contributors as well.)
> >>
> >> --- Sungwoo Park
> >>
> >> On Sat, 25 Mar 2023, Stamatis Zampetakis wrote:
> >>
> >> > Hi everyone,
> >> >
> > >> > In the last Hive board report someone mentioned that the volume of Jira
> > >> > notification emails to the dev list is huge, especially when compared
> > >> > to emails sent by actual humans, making it hard for someone to follow
> > >> > what's happening in the project.
> > >> >
> > >> > I personally share their viewpoint. For a long time I have been relying
> > >> > on client-side (Gmail) filters to separate Jira notifications from
> > >> > other emails to the dev list.
> >> >
> >> > I think it would be better to direct the traffic from jira to a separate
> >> > list namely jira@hive to keep the dev@hive list clean and dedicated to
> >> > human interaction.
> >> >
> >> > What do you think?
> >> >
> >> > Best,
> >> > Stamatis
> >> >


Re: Introducing a DI framework in Hive?

2023-04-12 Thread Stamatis Zampetakis
Hey Laszlo,

Dependency injection is a very powerful and useful tool/design pattern.

I don't think there is a particular reason why Hive does not use a
DI framework, apart maybe from the fact that we have lots of legacy
code that existed before DI became that popular.

I am open to ideas and suggestions about parts of the code that we
could improve via DI. I would probably avoid big refactorings to core
components of Hive for the sake of introducing a DI framework but I
see no big issue using such frameworks in new code. As usual when we
are about to introduce a new dependency to the project we should be
mindful of all the implications that this might have.

It's hard to make a generally applicable claim that we should use this
or that framework since I guess it has a lot to do with personal
preferences; we tend to prefer things that we have already used. I
haven't used DI frameworks that much, so I don't have a strong opinion on
which framework is the best, and I am willing to follow the majority.

Best,
Stamatis

On Tue, Apr 4, 2023 at 1:19 PM Laszlo Vegh  wrote:
>
>
> Hi all,
>
> I would like to start a conversation about introducing some Dependency 
> Injection framework (like Spring, Guice, Weld, etc.) in Hive.
>
> IMHO the lack of such a framework makes the codebase way less organised, and
> harder to maintain. Moreover, I think it has also led to introducing a huge
> amount of static/utility methods and classes (which is highly discouraged 
> when using DI frameworks). When there is no DI framework, utility classes 
> with static methods often seem to be the simplest and best way to share code 
> across different Hive components/classes, but these constructs are really 
> killing testability. For example it is much harder to mock static method 
> calls, than mocking service/component instances. Poor testability is a major 
> issue on its own, but having a DI framework could have much more benefit, 
> like greater flexibility (modularity), better organised services, etc.
>
>
> I’m interested in whether there’s any reason why there is no DI in Hive so
> far. I know there’s no way to introduce it everywhere in a single step, but we
> could start using it where it is easy to start, and continuously expand its
> usage from class to class. If there is no strong reason not to do it, I would
> like to start an open conversation around this topic. (Possible benefits,
> drawbacks, which framework to use, where to introduce it first, etc.)
>
> If anybody is interested in this initiative, please join the conversation, 
> and add your thoughts, ideas, doubts, anything.
>
> Thanks,
>
> Laszlo Vegh
> veghlac...@gmail.com 


Re: Lateral views and CBO

2023-04-09 Thread Stamatis Zampetakis
Hi Steve,

The way that we currently represent lateral views in the physical plan is
not great. I'm sure there are good reasons why people went forward with
the approach of introducing specialized operators for that purpose, but as
history has shown, the current representation is causing us lots of
problems in many parts of the compiler where we need to traverse the plan
DAG (rules + hooks + explain), leading to an exponential number of visits
of the plan nodes.

If we decorrelate the plan as usual maybe we could get rid of the problems
mentioned above and the need for specialized lateral view physical
operators which I think would be a good thing in the long run.
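
To see why traversing a plan DAG can blow up, here is a hedged Java sketch with a hypothetical `Node` type (not Hive's operator classes or walkers). It assumes plan shapes where a node forks into two branches that later rejoin, which is roughly what the specialized lateral-view operators produce; a walker that treats the DAG as a tree revisits every shared node once per path reaching it, while a walker with a visited set processes each node once.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch only: a hypothetical Node type, not Hive's Operator.
public class DagVisitSketch {

    static final class Node {
        final List<Node> children = new ArrayList<>();
    }

    // Builds 'depth' stacked diamonds: each top node forks into two branches
    // that both point to the same node below. Returns the topmost node.
    static Node diamondChain(int depth) {
        Node current = new Node();
        for (int i = 0; i < depth; i++) {
            Node left = new Node();
            Node right = new Node();
            left.children.add(current);
            right.children.add(current);
            Node top = new Node();
            top.children.add(left);
            top.children.add(right);
            current = top;
        }
        return current;
    }

    // Tree-style walk: a shared node is visited once per path that reaches it.
    static long naiveVisits(Node node) {
        long visits = 1;
        for (Node child : node.children) {
            visits += naiveVisits(child);
        }
        return visits;
    }

    // DAG-aware walk: the visited set (identity-based, since Node does not
    // override equals/hashCode) ensures each node is processed only once.
    static long memoizedVisits(Node node, Set<Node> seen) {
        if (!seen.add(node)) {
            return 0;
        }
        long visits = 1;
        for (Node child : node.children) {
            visits += memoizedVisits(child, seen);
        }
        return visits;
    }

    public static void main(String[] args) {
        for (int depth : new int[] {5, 10, 15}) {
            Node root = diamondChain(depth);
            System.out.printf("depth=%d naive=%d memoized=%d%n",
                    depth, naiveVisits(root), memoizedVisits(root, new HashSet<>()));
        }
    }
}
```

With 10 stacked diamonds the naive walk performs 4093 visits (2^(d+2) - 3 in general) while the memoized walk performs 31, one per node; each extra diamond roughly doubles the naive count.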


Best,
Stamatis

On Thu, Apr 6, 2023, 6:31 PM Stephen Carlin 
wrote:

> Hi,
>
> I noticed recently that for most lateral views, we do not convert to a CBO
> plan.  I was hoping to make some changes to make this possible.
>
> I was wondering if anyone out there had any thoughts on how to do this.
>
> If not, I did have one in mind and wanted to bring it up just to start the
> conversation before I did any work on it.
>
> I was thinking maybe we could have our own RelNode derived from
> LogicalCorrelate.  I know we already handle correlated queries and
> decorrelate them.  But in this special case, I think we should not
> decorrelate the RelNode.  We should leave this new LogicalCorrelate node in
> through the whole optimization process and then put in code to translate
> the RelNode into the physical plan.
>
> Thoughts?
>
> Thanks!
>


Re: [DISCUSS] Move Jira notification emails out of dev@hive

2023-04-07 Thread Stamatis Zampetakis
Just logged https://issues.apache.org/jira/browse/INFRA-24440 to move
this forward.

Best,
Stamatis

On Thu, Mar 30, 2023 at 11:12 AM Stamatis Zampetakis  wrote:
>
> I will proceed with the changes needed to move the Jira traffic out of the 
> dev list sometime next week.
>
> If there are reasons to delay or abandon the proposal please let me know.
>
> Best,
> Stamatis
>
> On Mon, Mar 27, 2023, 5:39 AM Sungwoo Park  wrote:
>>
>> I like the proposal very much. (Then, hopefully this mailing list will
>> be useful to outside contributors as well.)
>>
>> --- Sungwoo Park
>>
>> On Sat, 25 Mar 2023, Stamatis Zampetakis wrote:
>>
>> > Hi everyone,
>> >
>> > In the last Hive board report someone mentioned that the volume of Jira
>> > notification emails to the dev list is huge, especially when compared to
>> > emails sent by actual humans, making it hard for someone to follow what's
>> > happening in the project.
>> >
>> > I personally share their viewpoint. For a long time I have been relying on
>> > client side (Gmail) filters to separate Jira notifications from other
>> > emails to the dev list.
>> >
>> > I think it would be better to direct the traffic from jira to a separate
>> > list namely jira@hive to keep the dev@hive list clean and dedicated to
>> > human interaction.
>> >
>> > What do you think?
>> >
>> > Best,
>> > Stamatis
>> >


[jira] [Created] (HIVE-27225) Speedup build by skipping SBOM generation by default

2023-04-06 Thread Stamatis Zampetakis (Jira)
Stamatis Zampetakis created HIVE-27225:
--

 Summary: Speedup build by skipping SBOM generation by default
 Key: HIVE-27225
 URL: https://issues.apache.org/jira/browse/HIVE-27225
 Project: Hive
  Issue Type: Improvement
  Components: Build Infrastructure
Reporter: Stamatis Zampetakis
Assignee: Stamatis Zampetakis


A full build of Hive locally in my environment takes ~15 minutes.
{noformat}
mvn clean install -DskipTests -Pitests
[INFO] BUILD SUCCESS
[INFO] 
[INFO] Total time:  14:15 min
{noformat}

Profiling the build shows that we are spending roughly 30% of CPU in the
org.cyclonedx.maven plugin, which is used to generate SBOM artifacts
(HIVE-26912).

The SBOM generation does not need to run in every single build and probably
only needs to be active during the release build. To speed up everyday builds I
propose activating the cyclonedx plugin only in the dist (release) profile.

After this change, the default build drops from 14 minutes to 8.
{noformat}
mvn clean install -DskipTests -Pitests
[INFO] 
[INFO] BUILD SUCCESS
[INFO] 
[INFO] Total time:  08:19 min
{noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27199) Read TIMESTAMP WITH LOCAL TIME ZONE columns from text files using custom formats

2023-03-30 Thread Stamatis Zampetakis (Jira)
Stamatis Zampetakis created HIVE-27199:
--

 Summary: Read TIMESTAMP WITH LOCAL TIME ZONE columns from text files using custom formats
 Key: HIVE-27199
 URL: https://issues.apache.org/jira/browse/HIVE-27199
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Affects Versions: 4.0.0-alpha-2
Reporter: Stamatis Zampetakis
Assignee: Stamatis Zampetakis


Timestamp values come in many flavors and formats, and there is no single
representation that can satisfy everyone, especially when such values are
stored in plain text/csv files.

HIVE-9298 added a special SERDE property, {{timestamp.formats}}, that
allows users to provide custom timestamp patterns to correctly parse TIMESTAMP
values coming from files.

However, when the column type is TIMESTAMP WITH LOCAL TIME ZONE (LTZ) it is not
possible to use a custom pattern; thus, when a value does not match the format
expected by the built-in Hive parser, a NULL value is returned.

Consider a text file, F1, with the following values:
{noformat}
2016-05-03 12:26:34
2016-05-03T12:26:34
{noformat}
and a table with a column declared as LTZ.
{code:sql}
CREATE TABLE ts_table (ts TIMESTAMP WITH LOCAL TIME ZONE);
LOAD DATA LOCAL INPATH './F1' INTO TABLE ts_table;

SELECT * FROM ts_table;
2016-05-03 12:26:34.0 US/Pacific
NULL
{code}
In order to give more flexibility to the users relying on the TIMESTAMP WITH 
LOCAL TIME ZONE datatype and also align the behavior with the TIMESTAMP type 
this JIRA aims to reuse the {{timestamp.formats}} property for both TIMESTAMP 
types.
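
The behaviour the ticket asks for (try user-supplied patterns in order, fall back to NULL) can be sketched in plain Java. This is not Hive's SerDe code: the class name is hypothetical, the patterns use `java.time` syntax rather than whatever syntax {{timestamp.formats}} actually accepts, and interpreting the parsed wall-clock value in the session time zone is an assumption based on the US/Pacific output shown above.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;

// Sketch only: tries a list of user-supplied patterns in order and returns
// null when none matches. Patterns use java.time syntax (an assumption).
public class MultiPatternTimestampParser {
    private final List<DateTimeFormatter> formatters = new ArrayList<>();

    public MultiPatternTimestampParser(List<String> patterns) {
        for (String p : patterns) {
            formatters.add(DateTimeFormatter.ofPattern(p));
        }
    }

    // Returns the first successful parse, or null (the analogue of SQL NULL).
    public LocalDateTime parse(String text) {
        for (DateTimeFormatter f : formatters) {
            try {
                return LocalDateTime.parse(text, f);
            } catch (DateTimeParseException e) {
                // Not this pattern; try the next one.
            }
        }
        return null;
    }

    // Assumption: for an LTZ column the wall-clock value is interpreted
    // in the session time zone.
    public ZonedDateTime parseInZone(String text, ZoneId sessionZone) {
        LocalDateTime local = parse(text);
        return local == null ? null : local.atZone(sessionZone);
    }

    public static void main(String[] args) {
        MultiPatternTimestampParser parser = new MultiPatternTimestampParser(
                List.of("yyyy-MM-dd HH:mm:ss", "yyyy-MM-dd'T'HH:mm:ss"));
        ZoneId zone = ZoneId.of("US/Pacific");
        for (String line : List.of("2016-05-03 12:26:34", "2016-05-03T12:26:34")) {
            System.out.println(parser.parseInZone(line, zone));
        }
    }
}
```

With both patterns configured, both lines of F1 parse; a parser that only knew the first pattern would return null for the second line, mirroring the NULL in the query output above.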

The work here focuses exclusively on simple text files, but the same could be
done for other SerDes, such as JSON.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [EXTERNAL] Re: Branch-3 backports and build stability

2023-03-30 Thread Stamatis Zampetakis
Huge thanks to everyone involved; it is great to see branch-3 in a stable
state. As other people mentioned, let's keep it that way!

As far as backports are concerned, please be particularly cautious with
anything that touches the metastore schema and Thrift APIs.

Best,
Stamatis

On Wed, Mar 29, 2023, 4:36 AM vihang karajgaonkar 
wrote:

> Thanks a lot, Aman, for all your efforts on this. I really appreciate the
> initiative and all your hard work on this.
>
> I would like to request that all committers follow the merge
> process of the master branch when merging PRs into branch-3. If there are any test
> failures that seem unrelated, please do not ignore them. One can run the
> flaky test runner <http://ci.hive.apache.org/job/hive-flaky-check/> to make sure
> that the test is indeed flaky. If the test is found to be flaky, a
> ticket should be created to disable it. A separate ticket should be created
> to deflake it, and you can mention the original author, or the author of the previous
> commit that changed the test, on that ticket to get help, since they likely
> have the most context around that test. Once the flaky test is disabled and
> we have a green CI job run, we should merge the PR. If others have any
> suggestions to improve this process, please chime in.
>
> Thanks,
> Vihang
>
> On Tue, Mar 28, 2023 at 10:55 PM Aman Raj 
> wrote:
>
> > Hi community,
> >
> > This is to notify everyone that we have a green branch-3 now. The entire effort of
> > fixing branch-3 test cases took around 4 months, and as a team we managed to
> > fix 2900+ test failures on branch-3. The entire effort can be tracked here:
> > HIVE-26836<https://issues.apache.org/jira/browse/HIVE-26836>. We are
> > ready to push new features and improvements on branch-3 now.
> >
> > I really want to thank Vihang Karajgaonkar, Chris Nauroth, Laszlo Bodor,
> > Stamatis Zampetakis and Sankar Hariappan, without whom this would not have
> > been possible at all. As a team, we stuck together, participated in reviews,
> > and actively suggested improvements, which really helped in fixing some
> > major test failures.
> >
> > I would sincerely request that, going forward, we make a point of merging
> > changes into branch-3 only when the Jenkins pipeline is green.
> >
> > The next step would be to backport changes from branch-3.1 (from where the
> > Hive 3.1.3 release was made) to branch-3. This would ensure that we do not
> > miss any specific ticket that went into Hive 3.1.3. I will take care of
> > this. In parallel, we can start pushing additional changes on branch-3.
> > There are approximately 25 tickets that need to be backported from
> > branch-3.1 as part of this effort. I have made a note here:
> > <https://docs.google.com/spreadsheets/d/1K0U-vxLRZEs13oBzYBlVyK8dMMNthgXL5VEgzLRbeKs/edit?usp=sharing>
> >
> > Again, thanks a lot to everyone who supported and participated in this
> > effort. Let's make this Hive 3.2.0 release happen!!
> >
> > Thanks,
> > Aman.
> >
> > 
> > From: Aman Raj 
> > Sent: Monday, March 20, 2023 9:21 AM
> > To: dev@hive.apache.org 
> > Subject: Re: [EXTERNAL] Re: Branch-3 backports and build stability
> >
> > Hi Vihang/community,
> >
> > Found the ticket which broke mm_all.q: the issue is caused by
> > HIVE-20182. Reverting it works locally and on the Jenkins pipeline as well.
> > Link: <https://github.com/apache/hive/pull/4127>. Reverting this commit for now.
> >
> > Thanks,
> > Aman.
> > 
> > From: Aman Raj 
> > Sent: Monday, March 20, 2023 8:28 AM
> > To: dev@hive.apache.org 
> > Subject: Re: [EXTERNAL] Re: Branch-3 backports and build stability
> >
> > Sure Vihang, will look at the other ones. You can pick this up.
> >
> > Thanks,
> > Aman.
> >

Re: [DISCUSS] Move Jira notification emails out of dev@hive

2023-03-30 Thread Stamatis Zampetakis
I will proceed with the changes needed to move the Jira traffic out of the
dev list sometime next week.

If there are reasons to delay or abandon the proposal, please let me know.

Best,
Stamatis

On Mon, Mar 27, 2023, 5:39 AM Sungwoo Park  wrote:

> I like the proposal very much. (Then, hopefully this mailing list will
> be useful to outside contributors as well.)
>
> --- Sungwoo Park


[DISCUSS] Move Jira notification emails out of dev@hive

2023-03-25 Thread Stamatis Zampetakis
Hi everyone,

In the last Hive board report, someone mentioned that the volume of Jira
notification emails to the dev list is huge, especially when compared to
emails sent by actual humans, making it hard for someone to follow what's
happening in the project.

I personally share their viewpoint. For a long time I have been relying on
client-side (Gmail) filters to separate Jira notifications from other
emails to the dev list.

I think it would be better to direct the traffic from Jira to a separate
list, namely jira@hive, to keep the dev@hive list clean and dedicated to
human interaction.

What do you think?

Best,
Stamatis


Re: [DISCUSS] HIVE 4.0 GA Release Proposal

2023-03-25 Thread Stamatis Zampetakis
Regarding correctness, I think it makes sense to change default values and
possibly add a warning note when there's a known risk of wrong results.
Needless to say, we should try to fix as many issues as possible; we
still need volunteers to review open PRs.

Performance regressions are trickier, but if we have the query plans (CBO +
full) along with logs (including task counters) for fast and slow executions,
we may be able to understand what happens. Don't hesitate to create Jira
tickets with this information if available.

Lastly, regarding 4.0.0 blockers, I don't think we need a special label. The
built-in and widely used priority "blocker" seems enough to capture the
importance and urgency of a ticket.
Since I am the release manager for the next release, I will go over tickets
marked as blockers and reevaluate priorities if necessary.

Best,
Stamatis

On Thu, Mar 23, 2023, 10:27 AM Denys Kuzmenko  wrote:

> Thanks, Sungwoo, for running the TPC-DS benchmark. Do we know if the same
> level of performance degradation was present in 4.0.0-alpha-1?
>
> All: please use the `hive-4.0.0-must` label in a ticket if you think it's
> a show-stopper for the release.
>


[jira] [Created] (HIVE-27162) Unify HiveUnixTimestampSqlOperator and HiveToUnixTimestampSqlOperator

2023-03-21 Thread Stamatis Zampetakis (Jira)
Stamatis Zampetakis created HIVE-27162:
--

 Summary: Unify HiveUnixTimestampSqlOperator and 
HiveToUnixTimestampSqlOperator
 Key: HIVE-27162
 URL: https://issues.apache.org/jira/browse/HIVE-27162
 Project: Hive
  Issue Type: Task
  Components: CBO
Reporter: Stamatis Zampetakis


The two classes below both represent the {{unix_timestamp}} operator and have 
identical implementations.
* 
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveUnixTimestampSqlOperator.java
* 
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveToUnixTimestampSqlOperator.java

There is probably a way to use only one of them; having two ways 
of representing the same thing can cause various problems in query planning, and 
it also leads to code duplication.
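A toy Python model of one problem the duplication can cause, assuming (as a simplification) that the planner compares operators by concrete class and name. `ToyOperator`, `UnixTimestampOp` and `ToUnixTimestampOp` are hypothetical stand-ins, not the Calcite or Hive classes: two operators that mean the same function compare as different, so anything keyed on the operator (rule matching, expression de-duplication) sees two functions where there is only one.

```python
class ToyOperator:
    """Hypothetical planner operator; equality uses concrete class + name (assumption)."""

    def __init__(self, name: str):
        self.name = name

    def __eq__(self, other):
        return type(other) is type(self) and other.name == self.name

    def __hash__(self):
        return hash((type(self), self.name))


# Two hypothetical classes that both model unix_timestamp with identical behaviour.
class UnixTimestampOp(ToyOperator):
    pass


class ToUnixTimestampOp(ToyOperator):
    pass


a = UnixTimestampOp("UNIX_TIMESTAMP")
b = ToUnixTimestampOp("UNIX_TIMESTAMP")

# Same function, different classes: a set or rule keyed on operators sees two.
print(a == b)       # False
print(len({a, b}))  # 2

# Unifying on a single class removes the ambiguity.
UNIX_TIMESTAMP = UnixTimestampOp("UNIX_TIMESTAMP")
print(len({UNIX_TIMESTAMP, UnixTimestampOp("UNIX_TIMESTAMP")}))  # 1
```

Keeping a single operator class (ideally a single shared instance) makes equality checks and hash-based lookups agree with the intended semantics.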





[jira] [Created] (HIVE-27161) MetaException when executing CTAS query in Druid storage handler

2023-03-21 Thread Stamatis Zampetakis (Jira)
Stamatis Zampetakis created HIVE-27161:
--

 Summary: MetaException when executing CTAS query in Druid storage 
handler
 Key: HIVE-27161
 URL: https://issues.apache.org/jira/browse/HIVE-27161
 Project: Hive
  Issue Type: Bug
  Components: Druid integration
Affects Versions: 4.0.0-alpha-2
Reporter: Stamatis Zampetakis


Any kind of CTAS query targeting the Druid storage handler fails with the 
following exception:
{noformat}
org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:LOCATION may not be specified for Druid)
	at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:1347) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
	at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:1352) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
	at org.apache.hadoop.hive.ql.ddl.table.create.CreateTableOperation.createTableNonReplaceMode(CreateTableOperation.java:158) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
	at org.apache.hadoop.hive.ql.ddl.table.create.CreateTableOperation.execute(CreateTableOperation.java:116) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
	at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:84) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:214) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
	at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:354) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
	at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:327) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
	at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:244) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
	at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:105) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
	at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:367) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:205) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:154) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:149) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
	at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:185) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
	at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:228) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:257) ~[hive-cli-4.0.0-SNAPSHOT.jar:?]
	at org.apache.hadoop.hive.cli.CliDriver.processCmd1(CliDriver.java:201) ~[hive-cli-4.0.0-SNAPSHOT.jar:?]
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:127) ~[hive-cli-4.0.0-SNAPSHOT.jar:?]
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:425) ~[hive-cli-4.0.0-SNAPSHOT.jar:?]
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:356) ~[hive-cli-4.0.0-SNAPSHOT.jar:?]
	at org.apache.hadoop.hive.ql.dataset.QTestDatasetHandler.initDataset(QTestDatasetHandler.java:86) ~[hive-it-util-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
	at org.apache.hadoop.hive.ql.dataset.QTestDatasetHandler.beforeTest(QTestDatasetHandler.java:190) ~[hive-it-util-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
	at org.apache.hadoop.hive.ql.qoption.QTestOptionDispatcher.beforeTest(QTestOptionDispatcher.java:79) ~[hive-it-util-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
	at org.apache.hadoop.hive.ql.QTestUtil.cliInit(QTestUtil.java:607) ~[hive-it-util-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
	at org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:112) ~[hive-it-util-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
	at org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:157) ~[hive-it-util-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
	at org.apache.hadoop.hive.cli.TestMiniDruidCliDriver.testCliDriver(TestMiniDruidCliDriver.java:60) ~[test-classes/:?]
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_261]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_261]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_261]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_261]
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) ~[junit-4.13.2.jar:4.13.2

Release managers

2023-03-21 Thread Stamatis Zampetakis
Hi all,

As discussed in another thread [1], it might be a good idea to agree
on the next 4 release managers (RM) beforehand to keep the release
cadence as stable as possible.

I can volunteer to be the RM for the next Hive release. Are there any
other volunteers for the rest?

4.0.0 Stamatis Zampetakis
4.1.0
4.2.0
4.3.0

The versions are just placeholders for now, so we don't need to agree
on the names at this stage.
The important thing is to get a release out from master every 3-4 months.

Note that you don't need to be a Hive PMC member to prepare a release
candidate. Committers should be able to complete most of the steps
involved in the process [2].

Best,
Stamatis

[1] https://lists.apache.org/thread/bg4g1w75ks11jh273bh3pct81x9brv0c
[2] https://cwiki.apache.org/confluence/display/Hive/HowToRelease#HowToRelease-HiveRelease


Re: [DISCUSS] HIVE 4.0 GA Release Proposal

2023-03-21 Thread Stamatis Zampetakis
Many thanks for running tests with 4.0.0, Sungwoo; it is invaluable
help for getting a stable Hive 4 out.

I will review https://issues.apache.org/jira/browse/HIVE-26968 in the
coming weeks; I have assigned myself as a reviewer on the PR.

Can some other people (committers or not) help review the
remaining TPC-DS blockers for which we have a PR?

Reminder: good non-binding reviews are important and much appreciated
by the community. They are also among the important criteria for
becoming a Hive committer/PMC member [1].

Best,
Stamatis

[1] https://cwiki.apache.org/confluence/display/Hive/BecomingACommitter

On Tue, Mar 14, 2023 at 12:07 PM Sungwoo Park  wrote:
>
> Hello,
>
> I would like to expand the list of blockers with HIVE-27138 [1], which fixes
> an NPE on mapjoin_filter_on_outerjoin.q.
>
> Currently, mapjoin_filter_on_outerjoin.q is tested with the MapReduce execution
> engine and shows no problem. However, it shows a few problems when tested with
> the Tez execution engine. HIVE-27138 is the first fix found after analyzing
> mapjoin_filter_on_outerjoin.q, and Seonggon will create a couple more tickets
> later.
>
> In the meantime, it would be great if someone could review pull requests for
> subtasks in HIVE-26654. (I moved to HIVE-26654 three tickets for which I had
> previously requested code review.)
>
> Best,
>
> --- Sungwoo
>   [1] https://issues.apache.org/jira/browse/HIVE-27138
>
> On Fri, 10 Mar 2023, Stamatis Zampetakis wrote:
>
> > Hi Kirti,
> >
> > Thanks for bringing up this topic.
> >
> > The master branch already has many new features; we don't need to wait for
> > more to cut a GA.
> >
> > The main criterion for going GA is stability, thus I would consider
> > regressions as the only blockers for the release.
> >
> > If I recall correctly, the only regressions discovered so far are some problems
> > with TPC-DS queries, so basically HIVE-26654 [1].
> >
> > I will let others chime in to include more tickets if necessary.
> >
> > Best,
> > Stamatis
> >
> > [1] https://issues.apache.org/jira/browse/HIVE-26654
> >
> >
> > On Wed, Mar 8, 2023 at 10:02 AM Kirti Ruge  wrote:
> >
> >> Hello Hive Dev,
> >>
> >> It has been about 6 months since Hive-4.0-alpha-2 was released in Nov 2022.
> >> Would it be a good time to discuss the Hive 4.0 GA release with the
> >> community? Can we discuss the new features/JDK versions we want to
> >> support as part of 4.0 GA, and the timeframe of the release?
> >>
> >>
> >> Thanks,
> >> Kirti
> >


[jira] [Created] (HIVE-27157) AssertionError when inferring return type for unix_timestamp function

2023-03-20 Thread Stamatis Zampetakis (Jira)
Stamatis Zampetakis created HIVE-27157:
--

 Summary: AssertionError when inferring return type for 
unix_timestamp function
 Key: HIVE-27157
 URL: https://issues.apache.org/jira/browse/HIVE-27157
 Project: Hive
  Issue Type: Bug
  Components: CBO
Affects Versions: 4.0.0-alpha-2
Reporter: Stamatis Zampetakis
Assignee: Stamatis Zampetakis


Any attempt to derive the return data type for the {{unix_timestamp}} function 
results in the following assertion error:
{noformat}
java.lang.AssertionError: typeName.allowsPrecScale(true, false): BIGINT
	at org.apache.calcite.sql.type.BasicSqlType.checkPrecScale(BasicSqlType.java:65)
	at org.apache.calcite.sql.type.BasicSqlType.<init>(BasicSqlType.java:81)
	at org.apache.calcite.sql.type.SqlTypeFactoryImpl.createSqlType(SqlTypeFactoryImpl.java:67)
	at org.apache.calcite.sql.fun.SqlAbstractTimeFunction.inferReturnType(SqlAbstractTimeFunction.java:78)
	at org.apache.calcite.rex.RexBuilder.deriveReturnType(RexBuilder.java:278)
{noformat}
due to a faulty implementation of type inference for the respective operators:
 * 
[https://github.com/apache/hive/blob/52360151dc43904217e812efde1069d6225e9570/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveUnixTimestampSqlOperator.java]
 * 
[https://github.com/apache/hive/blob/52360151dc43904217e812efde1069d6225e9570/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveToUnixTimestampSqlOperator.java]

Although it is currently not possible to reproduce the problem in master 
with an actual SQL query, the buggy implementation must be fixed, since slight 
changes in the code/CBO rules may lead to code paths relying on 
{{SqlOperator.inferReturnType}}.

Note that in older versions of Hive, it is possible to hit the AssertionError in 
various ways. For example, in Hive 3.1.3 (and older), the error may come from 
[HiveRelDecorrelator|https://github.com/apache/hive/blob/4df4d75bf1e16fe0af75aad0b4179c34c07fc975/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveRelDecorrelator.java#L1933]
 in the presence of sub-queries.
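A simplified Python model of the failure, assuming only what the trace and ticket show: the generic time-function inference attaches a precision to the return type, while BIGINT, the return type of {{unix_timestamp}}, cannot carry one. `ALLOWS_PRECISION`, `create_sql_type` and the two `infer_return_type_*` functions are hypothetical names, not Calcite or Hive APIs, and the actual fix in Hive may look different.

```python
# Toy table of which SQL type names accept a precision (assumption, loosely
# following SQL: TIMESTAMP(3) is valid, BIGINT(3) is not).
ALLOWS_PRECISION = {"TIMESTAMP": True, "TIME": True, "BIGINT": False}


def create_sql_type(type_name, precision=None):
    """Hypothetical type factory that rejects a precision the type cannot carry."""
    if precision is not None and not ALLOWS_PRECISION[type_name]:
        # Same message shape as the AssertionError in the stack trace of this ticket.
        raise AssertionError(f"typeName.allowsPrecScale(true, false): {type_name}")
    return (type_name, precision)


def infer_return_type_buggy(return_type_name, operand_precision):
    """Generic time-function inference: always forwards the operand precision."""
    return create_sql_type(return_type_name, operand_precision)


def infer_return_type_fixed(return_type_name, operand_precision):
    """Precision-aware inference: drop the precision when the type cannot hold one."""
    if ALLOWS_PRECISION[return_type_name]:
        return create_sql_type(return_type_name, operand_precision)
    return create_sql_type(return_type_name)


try:
    infer_return_type_buggy("BIGINT", 0)  # unix_timestamp returns BIGINT
except AssertionError as err:
    print("AssertionError:", err)

print(infer_return_type_fixed("BIGINT", 0))
print(infer_return_type_fixed("TIMESTAMP", 3))
```

The design point is that return-type inference for {{unix_timestamp}} should build a plain BIGINT instead of reusing precision-carrying logic meant for temporal types.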




