Re: Re: [ANNOUNCE] New Committer: Simhadri Govindappa

2024-04-18 Thread Alessandro Solimando
Great news, Simhadri, very well deserved!

On Thu, 18 Apr 2024 at 15:07, Simhadri G  wrote:

> Thanks everyone!
> I really appreciate it, it means a lot to me :)
> The Apache Hive project and its community have truly inspired me. I'm
> grateful for the chance to contribute to such a remarkable project.
>
> Thanks!
> Simhadri Govindappa
>
> On Thu, Apr 18, 2024 at 6:18 PM Sankar Hariappan
>  wrote:
>
>> Congrats Simhadri!
>>
>>
>>
>> -Sankar
>>
>>
>>
>> *From:* Butao Zhang 
>> *Sent:* Thursday, April 18, 2024 5:39 PM
>> *To:* u...@hive.apache.org; dev 
>> *Subject:* [EXTERNAL] Re: [ANNOUNCE] New Committer: Simhadri Govindappa
>>
>>
>>
>> Congratulations Simhadri !!!
>>
>>
>>
>> Thanks.
>>
>>
>> --
>>
>> *From:* user-return-28075-butaozhang1=163@hive.apache.org <
>> user-return-28075-butaozhang1=163@hive.apache.org> on behalf of Ayush Saxena <
>> ayush...@gmail.com>
>> *Sent:* Thursday, April 18, 2024 7:50 PM
>> *To:* dev ; u...@hive.apache.org <
>> u...@hive.apache.org>
>> *Subject:* [ANNOUNCE] New Committer: Simhadri Govindappa
>>
>>
>>
>> Hi All,
>>
>> Apache Hive's Project Management Committee (PMC) has invited Simhadri
>> Govindappa to become a committer, and we are pleased to announce that he
>> has accepted.
>>
>>
>>
>> Please join me in congratulating him, Congratulations Simhadri, Welcome
>> aboard!!!
>>
>>
>>
>> -Ayush Saxena
>>
>> (On behalf of Apache Hive PMC)
>>
>


Re: [DISCUSS] Migrate precommit git repos from kgyrtkirk to apache

2024-01-23 Thread Alessandro Solimando
Hey guys,
I advise you to take Stamatis' suggestion and cool off for a bit; this
is getting personal, and you know it's only counterproductive.

Having worked with both of you, I know you have always done everything in
the best interest of the project and always acted in good faith, no reason
to look for further reasons on past and current choices.

You might not agree on what's best now, but I am pretty sure you will
accept whatever the community thinks is best, so an alternative is to
cast a vote and act based on the outcome, the Apache way! :)

Best regards,
Alessandro

On Tue, 23 Jan 2024 at 11:15, Ayush Saxena  wrote:

> Ok Zoltan, you are always right, but listening to other people sometimes
> doesn't hurt, maybe even if they aren't as smart as you (like me & everyone
> else you consider not as smart as you).
>
> Let me ask something:
> * If something breaks like code or something like that? What do you do?
> Humiliate or throw sarcastic comments?
> -> In general no, We go and help fix it, or ask the guy to revert it &
> share what you feel is right? if there isn't an agreement, we discuss not
> like "Whatever you want to do", for you that might be fancy, but not for
> others
>
> -> You don't like an approach? The way I did it? or "I did it"
> It is all Ok, that apache repo actually gives rights to all the hive
> committers, whether I know them, you know them or nobody knows them. Anyone
> can do what they like & consider good for Apache Hive, that is opensource.
> The amount of time we are spending on this thread, that much time would
> have sorted things out
>
> -> Your response mechanism?
> How tough is it to spread a sense of inclusive community? How tough is it
> to write in a humble way? I am not convinced with this approach? I know a
> better approach? That is X->Y->Z, We should do that? or I will go ahead and
> do that? Did anyone stop anyone from doing anything here? What is the point
> being proven, that someone is superior & can go freely & yell or scold
> people, or it is demonstrating "I deserve respect not the others"
>
> -> Is there a discussion about the topic?
> How long do you want the discussion? If you were particular about an
> approach, you could have shared it, I went wrong, you can still share it,
> you don't want me to do it, you can still do it yourself, and believe me
> even if you screw up things. None of us will come up & throw any tantrums
> or sarcastic comments. That is not most of us.
>
> I may be wrong, anyone can be wrong; the only person who never did anything
> wrong or broke anything is someone who never did anything ambitious...
>
> I will still try to answer your questions Zoltan
>
> >  I was replacing the CI I've used it to get a good base ground for
> running the tests - as it could prepare a lot of things already.
> I had to do a lot of things - and the move of the repo was never at the top
> of the list...
>
> It is all "I" my friend, you thought, till you were only using it, it was
> good. But Hive started using it, It wasn't. How many Apache projects rely
> on even making a JDK change for their test to run on external folks, using
> your personal fork itself in the first place was a violation of the Apache
> way. Migration should have been done first, or other alternatives should
> have been explored rather than relying on some non apache fork
>
> > That's not true either as you can see in a week-or-so old comment from me
>
> And again, You want all "Apache" contributors to follow your repo & follow
> these comments?
>
> -> Reading the comment?
>
> We push it our own docker hub space & push a commit to my own docker hub
> space or rely on "you" to build or add "me" as a collaborator? The project
> & its components aren't owned by any member, good news not me, bad news
> not by you either. The project & its infra and CI resources are owned by the
> Project PMC (not any member) & the PMC is in no way superior to any
> contributor. That is basic Apache stuff, I know you know them better than
> me.
>
> I still don't want to drag it further, Zoltan you are the most important
> person in this thread, the code is in your repo, whatever way you want, we
> will get that done the same way. Does that make you happy? But please do
> away with this attitude; none of us, not me, nor any contributor is at
> mercy of anyone else, everyone is equal, equal right, have all rights to
> make mistakes, discuss, get them corrected & still come back to this place
> without any humiliation with his head held high. Community over Code
>
> I think you will reflect on this; you are a respected member of the community &
> will work in a way it is good & peaceful for all, rest I can't help it.
>
> I will let Stamatis & you or anyone you like go forward with this. Stamatis
> has all rights around the repo, if you want any deletion or so, rest INFRA
> ticket can get anything sorted, if it requires me to write something, I can
> accept it was my bad, fortunately I don't have ego issues :-)
>

Re: Cleanup remote feature/wip branches

2024-01-19 Thread Alessandro Solimando
+1, thanks Stamatis

On Fri, Jan 19, 2024, 11:14 Ayush Saxena  wrote:

> +1
>
> -Ayush
>
> > On 19-Jan-2024, at 3:41 PM, Stamatis Zampetakis 
> wrote:
> >
> > Hey everyone,
> >
> > I noticed that in our official git repo [1] we have some kind of
> > feature/WIP branches (see list below). Most of them (if not all) are
> > stale, add noise, and some of them eat CI resources (storage and CPU)
> > since Jenkins picks them up for builds/precommits.
> >
> > I would like to drop those at the end of this email. Please +1 if you
> agree.
> >
> > Best,
> > Stamatis
> >
> > [1] https://github.com/apache/hive/branches/all
> >
> > git branch -r | grep origin | grep -v "branch-" | grep -v "master"
> >  origin/HIVE-23274_280_rb
> >  origin/HIVE-23337_280_rb
> >  origin/HIVE-23403_280_rb
> >  origin/HIVE-23440_280_rb
> >  origin/HIVE-23470_rb
> >  origin/HIVE-4115
> >  origin/branc-2.3
> >  origin/cbo
> >  origin/dependabot/maven/com.google.protobuf-protobuf-java-3.21.7
> >
> origin/dependabot/maven/itests/qtest-druid/org.eclipse.jetty-jetty-server-9.4.51.v20230217
> >  origin/dependabot/maven/org.apache.commons-commons-text-1.10.0
> >  origin/dependabot/maven/org.eclipse.jetty-jetty-server-9.4.51.v20230217
> >  origin/dependabot/maven/org.postgresql-postgresql-42.4.3
> >
> origin/dependabot/maven/standalone-metastore/com.google.protobuf-protobuf-java-3.21.7
> >
> origin/dependabot/maven/standalone-metastore/org.eclipse.jetty-jetty-server-9.4.51.v20230217
> >
> origin/dependabot/maven/standalone-metastore/org.postgresql-postgresql-42.4.3
> >  origin/ptf-windowing
> >  origin/release-1.1
> >  origin/revert-1365-upgrade-guava
> >  origin/revert-1855-HIVE-24624
> >  origin/revert-2694-HIVE-25355
> >  origin/revert-3624-HIVE-26567
> >  origin/revert-4247-hive-23256
> >  origin/revert-4306-HIVE-27330
> >  origin/revert-4452-HIVE-57988-BetweenBugFix
> >  origin/revert-4501-OptimizeGetPartitionAPI
> >  origin/vectorization
>
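Once the vote passes, the cleanup itself is mechanical. A minimal sketch of how the listed branches could be removed (an assumption on my part, not from the thread: it presumes deletion happens via `git push origin --delete` with push rights on the apache remote; the branch list here is just an illustrative subset, and the commands are echoed rather than executed so the sketch is safe to run as-is):

```shell
# Feed the `git branch -r` style listing through the same filter idea as
# above: strip the leading "origin/" prefix, since a remote delete takes
# the bare branch name, then emit one delete command per branch.
branches='  origin/HIVE-4115
  origin/branc-2.3
  origin/cbo
  origin/ptf-windowing
  origin/vectorization'

printf '%s\n' "$branches" \
  | sed 's|^ *origin/||' \
  | while read -r b; do
      # Echoed, not executed; pipe the output to `sh` only after the vote.
      echo "git push origin --delete $b"
    done
```

Note that `branc-2.3` (sic) is the branch's actual name in the listing, so the typo must be preserved when deleting it.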


Re: [EXTERNAL] Re: [VOTE] Mark Hive 1.x EOL

2024-01-17 Thread Alessandro Solimando
+1 (non binding)

On Wed, 17 Jan 2024 at 10:23, Denys Kuzmenko  wrote:

> +1 (binding)
>


Re: [DISCUSS] Disable JIRA worklog for GitHub PRs

2023-05-12 Thread Alessandro Solimando
Hi Stamatis,
I am experiencing the same too, so +1 from me.

Best regards,
Alessandro

On Fri, 12 May 2023 at 15:58, Stamatis Zampetakis  wrote:

> Hello,
>
> Everything that happens in a GitHub PR creates a worklog entry under
> the respective JIRA ticket.
> For every worklog entry we receive a notification from j...@apache.org
> when we are watching an issue. The worklog entry and email
> notification usually appear messy.
>
> Moreover, if we are watching the GitHub PR we are going to get a
> notification from notificati...@github.com which has the same content
> with the JIRA worklog entry and is much more readable.
>
> Finally, the PR notification is also going to
> iss...@hive.apache.org and git...@hive.apache.org so those who are
> subscribed to these lists
> will get the same notification multiple times.
>
> Personally, I never read the JIRA worklog notifications and I largely
> prefer those from notificati...@github.com.
>
> How do you feel about disabling the worklog entries in JIRA coming
> from GitHub PRs?
>
> For archiving purposes, the notifications already go to gitbox@ so we
> don't lose anything from disabling the worklog entries. On the
> contrary, I find that this would reduce the noise and redundancy in
> our inboxes.
>
> Concretely this is what I have in mind in terms of change:
> https://github.com/apache/hive/pull/4318
>
> Best,
> Stamatis
>
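For reference, ASF projects configure this JIRA/GitHub integration in the repository's `.asf.yaml`. A hedged sketch of the kind of change involved (the exact contents of the Hive PR may differ; the `jira_options` key and its `link label worklog` values are taken from the ASF `.asf.yaml` feature documentation, not from this thread):

```yaml
# .asf.yaml sketch: keep the JIRA link and label integration for GitHub
# PRs, but omit "worklog" so PR activity no longer creates JIRA worklog
# entries (and the duplicate notifications that come with them).
notifications:
  jira_options: link label
```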


Re: Kill the Pig 

2023-04-20 Thread Alessandro Solimando
+1 from me, let's just make sure we make a good salame out of it :)

Best regards,
Alessandro

On Thu, 20 Apr 2023 at 10:50, Attila Turoczy  wrote:

> Hi All,
>
> In Hive we have a pretty old component from 1972 and this is the Pig. Pig
> was cool somewhere in 2008, but nowadays it does not have any value in the
> big data world. Even the last small release of Pig was 6 years ago, in 2017,
> and the Pig community has pretty much died. Because this component is
> obsolete I would suggest removing it from Hive 4.0. Hive 3 will still
> contain it, but I think this is the right time to remove those components
> that are not valuable for the community.
>
> What do you think about it?
>
> Ps: If nobody writes back, it would mean I can kill the pig (rof rof)
> :)
>
> -Attila
>


Re: [EXTERNAL] Re: Proposal to deprecate Hive on Spark from branch-3

2023-02-23 Thread Alessandro Solimando
+1 from me too

On Thu, 23 Feb 2023 at 06:09, Ayush Saxena  wrote:

> +1 on removing Hive on Spark from branch-3
>
> -Ayush
>
> > On 23-Feb-2023, at 6:40 AM, Wang, Yuming 
> wrote:
> >
> > +1.
> >
> > From: Naresh P R 
> > Date: Thursday, February 23, 2023 at 02:49
> > To: dev@hive.apache.org 
> > Subject: Re: [EXTERNAL] Re: Proposal to deprecate Hive on Spark from
> branch-3
> > External Email
> >
> > +1 to remove Hive on Spark in branch-3
> > ---
> > Regards,
> > Naresh P R
> >
> >> On Wed, Feb 22, 2023 at 5:37 AM Sankar Hariappan
> >>  wrote:
> >>
> >> +1, to remove Hive on Spark in branch-3.
> >>
> >> Thanks,
> >> Sankar
> >>
> >> -Original Message-
> >> From: Rajesh Balamohan 
> >> Sent: Wednesday, February 22, 2023 6:58 PM
> >> To: dev@hive.apache.org
> >> Subject: [EXTERNAL] Re: Proposal to deprecate Hive on Spark from
> branch-3
> >>
> >> +1 on removing Hive on Spark in branch-3.
> >>
> >> It was not done earlier since it was removing a feature in the branch.
> But
> >> if there is enough consensus, we should consider removing it.
> >>
> >> ~Rajesh.B
> >>
> >> On Wed, Feb 22, 2023 at 12:48 PM Aman Raj  >
> >> wrote:
> >>
> >>> Hi team,
> >>>
> >>> We have been trying to fix Hive on Spark test failures for a long
> >>> time. As of now, branch-3 has less than 12 test failures (whose fix
> >>> have not been identified). 8 of them are related to Hive on Spark. I
> >>> had mailed about the failures in my previous mail threads. Thanks to
> >>> Vihang for working on them as well. But we have not been able to
> >> identify the root cause till now.
> >>> These fixes can be tracked in the following tickets: [HIVE-27087] Fix
> >>> TestMiniSparkOnYarnCliDriver test failures on branch-3
> >>> (https://issues.apache.org/jira/browse/HIVE-27087) and [HIVE-26940]
> >>> Backport of HIVE-19882: Fix QTestUtil session lifecycle
> >>> (https://issues.apache.org/jira/browse/HIVE-26940)
> >>>
> >>> Until we have a green branch-3, we cannot go ahead to push new
> >>> features for the Hive-3.2.0 release. This is kind of a blocker for this
> >> release.
> >>> Already bringing the test fixes to the current state took more than 2
> >>> months.
> >>>
> >>> I wanted to bring up a proposal to deprecate Hive on Spark from
> >>> branch-3 altogether. This would ensure that branch-3 is aligned with
> >>> the master as done in
> >>> https://issues.apache.org/jira/browse/HIVE-26134.
> >>> Just wanted to have a vote on this in parallel working on the test
> >>> fixes.
> >> If we have the approval from the community, we can deprecate it
> altogether.
> >>>
> >>> Please feel free to suggest any concerns or suggestions you have.
> >>> Also, I welcome any possible fix suggestion for the test failures.
> >>>
> >>> Thanks,
> >>> Aman.
> >>>
> >>
>


Re: Asking for code review: HIVE-26968, HIVE-26986, HIVE-27006

2023-02-14 Thread Alessandro Solimando
Hi Sungwoo,
thanks for bringing this up. IMO correctness issues should be set to
"Blocker" priority in Jira, so no 4.0.0 should be released before
fixing the aforementioned tickets.

The patches seem well thought out and solid from a cursory look, but they
fall outside my area of expertise, and I don't have time right now to review
them because I would first need to understand the Shared Work Optimizer,
which is non-trivial.

I have nonetheless approved the blocked workflows (for first-time
contributors, some need a committer to run them). I have also noticed
that HIVE-27006 has failing tests, so in the meantime those failures could
be addressed.

Another action that will probably get you closer to having the PRs merged is
to address (some of) the code smells/issues that Sonar has identified (from a
cursory look there were some unused imports, etc.): the neater the PR, the
less time a reviewer needs, and the higher the chances it gets reviewed.

Best regards,
Alessandro

On Tue, 14 Feb 2023 at 15:06, Sungwoo Park  wrote:

> Seonggon created three JIRAs a while ago which affect the result of TPC-DS
> queries,
> and I wonder if anyone would have time for reviewing the pull requests.
>
> HIVE-26968: SharedWorkOptimizer merges TableScan operators that have
> different DPP parents
> HIVE-26986: A DAG created by OperatorGraph is not equal to the Tez DAG.
> HIVE-27006: ParallelEdgeFixer inserts misconfigured operator and does not
> connect it in Tez DAG
>
> In the current build, TPC-DS query 64 returns wrong results (no rows) on
> Iceberg tables.
> This is fixed in HIVE-26968.
>
> TPC-DS query 71 fails with an error ("cannot find _col0 from []").
> This is fixed in HIVE-26986.
>
> HIVE-27006 fixes a bug which we found while testing with TPC-DS queries.
> (It depends on HIVE-26986.)
>
> I hope these JIRAs are merged to the master branch before the release of
> Hive 4.0.0.
> Considering the maturity of Hive and the impending release of Hive 4.0.0,
> it does not seem like a good plan to release a Hive 4.0.0 that fails on some
> TPC-DS queries.
>
> Thanks!
>
> Sungwoo Park
>


Re: [ANNOUNCE] New committer for Apache Hive: Alessandro Solimando

2023-02-10 Thread Alessandro Solimando
Thanks everyone for your kind messages, I am truly honored to be part of
this community and I am extremely happy to have the chance to work with you
all.

Best regards,
Alessandro

On Thu, 9 Feb 2023 at 08:17, Krisztian Kasa 
wrote:

> Congratulations Alessandro!
>
> On Thu, Feb 9, 2023 at 7:51 AM Akshat m  wrote:
>
> > Congratulations Alessandro :)
> >
> > Regards,
> > Akshat Mathur
> >
> > On Thu, Feb 9, 2023 at 12:11 PM Mahesh Raju Somalaraju <
> > maheshra...@cloudera.com> wrote:
> >
> > > Congratulations Alessandro !!
> > >
> > > -Mahesh Raju S
> > >
> > > On Thu, Feb 9, 2023 at 1:31 AM Naveen Gangam 
> > wrote:
> > >
> > >> The Project Management Committee (PMC) for Apache Hive has invited
> > >> Alessandro Solimando (asolimando) to become a committer and is pleased
> > >> to announce that he has accepted.
> > >>
> > >> Contributions from Alessandro:
> > >> He has authored 30 patches for Hive, 18 for Apache Calcite and has
> > >> done many code reviews for other contributors. Vast experience and
> > >> knowledge in SQL Compiler and Optimization. His most recent work
> > >> added support for histogram-based column stats in Hive.
> > >>
> > >> https://issues.apache.org/jira/issues/?filter=12352498
> > >>
> > >> Being a committer enables easier contribution to the project since
> > >> there is no need to go via the patch submission process. This should
> > >> enable better productivity. A PMC member helps manage and guide the
> > >> direction of the project.
> > >>
> > >> Congratulations
> > >> Hive PMC
> > >>
> > >
> >
>


Re: [ANNOUNCE] New committer for Apache Hive: Laszlo Vegh

2023-02-07 Thread Alessandro Solimando
Congrats, Laszlo!

Best regards,
Alessandro

On Tue, 7 Feb 2023 at 13:24, Naveen Gangam  wrote:

> The Project Management Committee (PMC) for Apache Hive has invited Laszlo
> Vegh (veghlaci05) to become a committer and we are pleased
> to announce that he has accepted.
>
> Contributions from Laszlo:
>
> He has authored 25 patches. Significant contributions to stabilization of
> ACID compaction. Helped review other patches as well.
>
>
> https://github.com/apache/hive/pulls?q=is%3Amerged+is%3Apr+author%3Aveghlaci05
>
> Being a committer enables easier contribution to the project since there
> is no need to go via the patch submission process. This should enable
> better productivity. A PMC member helps manage and guide the direction of
> the project.
>
> Congratulations
> Hive PMC
>


[jira] [Created] (HIVE-27000) Improve the modularity of the *ColumnStatsMerger classes

2023-01-30 Thread Alessandro Solimando (Jira)
Alessandro Solimando created HIVE-27000:
---

 Summary: Improve the modularity of the *ColumnStatsMerger classes
 Key: HIVE-27000
 URL: https://issues.apache.org/jira/browse/HIVE-27000
 Project: Hive
  Issue Type: Improvement
  Components: Statistics
Affects Versions: 4.0.0-alpha-2
Reporter: Alessandro Solimando
Assignee: Alessandro Solimando


*ColumnStatsMerger classes contain a lot of duplicate code that is not
specific to the data type and could therefore be lifted to a common
parent class.

This phenomenon is bound to become even worse if we keep further enriching
our supported set of statistics, as we did in the context of HIVE-26221.

The current ticket aims at improving the modularity and code reuse of the
*ColumnStatsMerger classes, while increasing unit-test coverage to cover all
classes and support more use cases.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [ANNOUNCE] New PMC Member: Krisztian Kasa

2023-01-30 Thread Alessandro Solimando
Congratulations Krisztian, very well deserved! :)

On Mon, 30 Jan 2023 at 17:34, László Bodor 
wrote:

> Yay! Very well deserved. Krisztian has a broad knowledge of Hive and an
> extremely deep level of experience with the compiler itself (which is a
> huge beast we all know), looking forward to seeing further contributions!
>
> Naveen Gangam  ezt írta (időpont: 2023.
> jan. 30., H, 17:23):
>
>> Hello Hive Community,
>> Apache Hive PMC is pleased to announce that Krisztian Kasa (username:
>> krisztiankasa) has accepted the Apache Hive PMC's invitation to become PMC
>> Member, and is now our newest PMC member. Please join me in congratulating
>> Krisztian !!!
>>
>> He has been an active member in the hive community across many aspects of
>> the project. Many thanks to Krisztian for all the contributions he has
>> made
>> and looking forward to many more future contributions in the expanded
>> role.
>>
>> https://github.com/apache/hive/commits?author=kasakrisz
>>
>> * 162 commits in master
>> * 124 reviews in master
>> * Reported 159 JIRAS
>>
>> Cheers,
>> Naveen (on behalf of Hive PMC)
>>
>


Re: [ANNOUNCE] New PMC Member: Laszlo Bodor

2023-01-27 Thread Alessandro Solimando
Congratulations Laszlo, very well deserved, thanks for all your hard work.

Best regards,
Alessandro

On Fri, 27 Jan 2023 at 22:33, Naveen Gangam 
wrote:

> Hello Hive Community,
> Apache Hive PMC is pleased to announce that Laszlo Bodor
> (username:abstractdog) has accepted the Apache Hive PMC's invitation to
> become PMC Member, and is now our newest PMC member. Please join me in
> congratulating Laszlo !!!
>
> He has been an active member in the hive community across many aspects of
> the project. Many thanks to Laszlo for all the contributions he has made
> and looking forward to many more future contributions in the expanded role.
>
> https://github.com/apache/hive/commits?author=abstractdog
>
> * 96 commits in master [2]
> * 66 reviews in master [3]
> * Reported 163 JIRAS [6]
>
> Cheers,
> Naveen (on behalf of Hive PMC)
>


Re: several JavaCC warnings "Choice conflict involving two expansions"

2023-01-27 Thread Alessandro Solimando
My apologies, the Hive ML obviously wasn't the right target!

Best regards,
Alessandro

On Fri, 27 Jan 2023 at 18:35, Alessandro Solimando <
alessandro.solima...@gmail.com> wrote:

> Hello everyone,
> while checking CI logs I have noticed that we have lots of JavaCC warnings
> related to ambiguous prefixes in the productions of one of our grammars.
>
> They also seem related to time functions, for which I have seen several
> related developments for BigQuery lately.
>
> Have we verified that our grammar is still behaving properly under this
> situation? Have we considered increasing the lookahead value as suggested?
> Shall we open a Jira ticket to have a closer look?
>
> Here is an example of CI logs showing the problem (although it is
> reproducible locally):
> https://ci-builds.apache.org/job/Calcite/job/Calcite-sonar/job/main/18/consoleFull
>
>
> In what follows the extract that is relevant to the discussion at hand:
>
>> > Task :core:javaCCMain
>> Java Compiler Compiler Version 4.0 (Parser Generator)
>> (type "javacc" with no arguments for help)
>> Reading from file
>> /home/jenkins/jenkins-agent/workspace/Calcite_Calcite-sonar_main/core/build/fmpp/fmppMain/javacc/Parser.jj
>> . . .
>> Warning: Output directory
>> "/home/jenkins/jenkins-agent/workspace/Calcite_Calcite-sonar_main/core/build/javacc/javaCCMain/org/apache/calcite/sql/parser/impl"
>> does not exist. Creating the directory.
>> Note: UNICODE_INPUT option is specified. Please make sure you create the
>> parser/lexer using a Reader with the correct character encoding.
>> Warning: Choice conflict involving two expansions at
>>  line 4930, column 5 and line 4956, column 5 respectively.
>>  A common prefix is: "MICROSECOND"
>>  Consider using a lookahead of 2 for earlier expansion.
>> Warning: Choice conflict involving two expansions at
>>  line 4931, column 5 and line 4956, column 5 respectively.
>>  A common prefix is: "MILLISECOND"
>>  Consider using a lookahead of 2 for earlier expansion.
>> Warning: Choice conflict involving two expansions at
>>  line 4936, column 5 and line 4956, column 5 respectively.
>>  A common prefix is: "DOW"
>>  Consider using a lookahead of 2 for earlier expansion.
>> Warning: Choice conflict involving two expansions at
>>  line 4937, column 5 and line 4956, column 5 respectively.
>>  A common prefix is: "DOY"
>>  Consider using a lookahead of 2 for earlier expansion.
>> Warning: Choice conflict involving two expansions at
>>  line 4938, column 5 and line 4956, column 5 respectively.
>>  A common prefix is: "ISODOW"
>>  Consider using a lookahead of 2 for earlier expansion.
>> Warning: Choice conflict involving two expansions at
>>  line 4939, column 5 and line 4956, column 5 respectively.
>>  A common prefix is: "ISOYEAR"
>>  Consider using a lookahead of 2 for earlier expansion.
>> Warning: Choice conflict involving two expansions at
>>  line 4940, column 5 and line 4956, column 5 respectively.
>>  A common prefix is: "WEEK"
>>  Consider using a lookahead of 2 for earlier expansion.
>> Warning: Choice conflict involving two expansions at
>>  line 4950, column 5 and line 4956, column 5 respectively.
>>  A common prefix is: "QUARTER"
>>  Consider using a lookahead of 2 for earlier expansion.
>> Warning: Choice conflict involving two expansions at
>>  line 4952, column 5 and line 4956, column 5 respectively.
>>  A common prefix is: "EPOCH"
>>  Consider using a lookahead of 2 for earlier expansion.
>> Warning: Choice conflict involving two expansions at
>>  line 4953, column 5 and line 4956, column 5 respectively.
>>  A common prefix is: "DECADE"
>>  Consider using a lookahead of 2 for earlier expansion.
>> Warning: Choice conflict involving two expansions at
>>  line 4954, column 5 and line 4956, column 5 respectively.
>>  A common prefix is: "CENTURY"
>>  Consider using a lookahead of 2 for earlier expansion.
>> Warning: Choice conflict involving two expansions at
>>  line 4955, column 5 and line 4956, column 5 respectively.
>>  A common prefix is: "MILLENNIUM"
>>  Consider using a lookahead of 2 for earlier expansion.
>> Warning: Choice conflict involving two expansions at
>>  line 6549, column 9 and line 6551, column 9 respectively.
>>  A common prefix is: "WEEK" "("
>>  Consider using a lookahead of 3 or more for earlier expansion.
>> File "TokenMgrError.java" does not exist.  Will create one.
>> File "ParseException.java" does not exist.  Will create one.
>> File "Token.java" does not exist.  Will create one.
>> File "SimpleCharStream.java" does not exist.  Will create one.
>> Parser generated with 0 errors and 14 warnings.
>
>
> Best regards,
> Alessandro
>


several JavaCC warnings "Choice conflict involving two expansions"

2023-01-27 Thread Alessandro Solimando
Hello everyone,
while checking CI logs I have noticed that we have lots of JavaCC warnings
related to ambiguous prefixes in the productions of one of our grammars.

They also seem related to time functions, for which I have seen several
related developments for BigQuery lately.

Have we verified that our grammar is still behaving properly under this
situation? Have we considered increasing the lookahead value as suggested?
Shall we open a Jira ticket to have a closer look?

Here is an example of CI logs showing the problem (although it is
reproducible locally):
https://ci-builds.apache.org/job/Calcite/job/Calcite-sonar/job/main/18/consoleFull


In what follows the extract that is relevant to the discussion at hand:

> > Task :core:javaCCMain
> Java Compiler Compiler Version 4.0 (Parser Generator)
> (type "javacc" with no arguments for help)
> Reading from file
> /home/jenkins/jenkins-agent/workspace/Calcite_Calcite-sonar_main/core/build/fmpp/fmppMain/javacc/Parser.jj
> . . .
> Warning: Output directory
> "/home/jenkins/jenkins-agent/workspace/Calcite_Calcite-sonar_main/core/build/javacc/javaCCMain/org/apache/calcite/sql/parser/impl"
> does not exist. Creating the directory.
> Note: UNICODE_INPUT option is specified. Please make sure you create the
> parser/lexer using a Reader with the correct character encoding.
> Warning: Choice conflict involving two expansions at
>  line 4930, column 5 and line 4956, column 5 respectively.
>  A common prefix is: "MICROSECOND"
>  Consider using a lookahead of 2 for earlier expansion.
> Warning: Choice conflict involving two expansions at
>  line 4931, column 5 and line 4956, column 5 respectively.
>  A common prefix is: "MILLISECOND"
>  Consider using a lookahead of 2 for earlier expansion.
> Warning: Choice conflict involving two expansions at
>  line 4936, column 5 and line 4956, column 5 respectively.
>  A common prefix is: "DOW"
>  Consider using a lookahead of 2 for earlier expansion.
> Warning: Choice conflict involving two expansions at
>  line 4937, column 5 and line 4956, column 5 respectively.
>  A common prefix is: "DOY"
>  Consider using a lookahead of 2 for earlier expansion.
> Warning: Choice conflict involving two expansions at
>  line 4938, column 5 and line 4956, column 5 respectively.
>  A common prefix is: "ISODOW"
>  Consider using a lookahead of 2 for earlier expansion.
> Warning: Choice conflict involving two expansions at
>  line 4939, column 5 and line 4956, column 5 respectively.
>  A common prefix is: "ISOYEAR"
>  Consider using a lookahead of 2 for earlier expansion.
> Warning: Choice conflict involving two expansions at
>  line 4940, column 5 and line 4956, column 5 respectively.
>  A common prefix is: "WEEK"
>  Consider using a lookahead of 2 for earlier expansion.
> Warning: Choice conflict involving two expansions at
>  line 4950, column 5 and line 4956, column 5 respectively.
>  A common prefix is: "QUARTER"
>  Consider using a lookahead of 2 for earlier expansion.
> Warning: Choice conflict involving two expansions at
>  line 4952, column 5 and line 4956, column 5 respectively.
>  A common prefix is: "EPOCH"
>  Consider using a lookahead of 2 for earlier expansion.
> Warning: Choice conflict involving two expansions at
>  line 4953, column 5 and line 4956, column 5 respectively.
>  A common prefix is: "DECADE"
>  Consider using a lookahead of 2 for earlier expansion.
> Warning: Choice conflict involving two expansions at
>  line 4954, column 5 and line 4956, column 5 respectively.
>  A common prefix is: "CENTURY"
>  Consider using a lookahead of 2 for earlier expansion.
> Warning: Choice conflict involving two expansions at
>  line 4955, column 5 and line 4956, column 5 respectively.
>  A common prefix is: "MILLENNIUM"
>  Consider using a lookahead of 2 for earlier expansion.
> Warning: Choice conflict involving two expansions at
>  line 6549, column 9 and line 6551, column 9 respectively.
>  A common prefix is: "WEEK" "("
>  Consider using a lookahead of 3 or more for earlier expansion.
> File "TokenMgrError.java" does not exist.  Will create one.
> File "ParseException.java" does not exist.  Will create one.
> File "Token.java" does not exist.  Will create one.
> File "SimpleCharStream.java" does not exist.  Will create one.
> Parser generated with 0 errors and 14 warnings.


Best regards,
Alessandro


Re: [EXTERNAL] [ANNOUNCE] New PMC Member: Stamatis Zampetakis

2023-01-13 Thread Alessandro Solimando
Congratulations Stamatis,
very well deserved!

Best regards,
Alessandro

On Fri 13 Jan 2023, 19:57 Chris Nauroth,  wrote:

> Congratulations, Stamatis!
>
> Chris Nauroth
>
>
> On Fri, Jan 13, 2023 at 10:46 AM Simhadri G  wrote:
>
> > Congratulations Stamatis!
> >
> > On Sat, 14 Jan 2023, 00:12 Sankar Hariappan via user, <
> > u...@hive.apache.org>
> > wrote:
> >
> > > Congrats Stamatis! Well deserved one 
> > >
> > >
> > >
> > > Thanks,
> > >
> > > Sankar
> > >
> > >
> > >
> > > *From:* Naveen Gangam 
> > > *Sent:* Saturday, January 14, 2023 12:03 AM
> > > *To:* dev ; u...@hive.apache.org
> > > *Cc:* zabe...@apache.org
> > > *Subject:* [EXTERNAL] [ANNOUNCE] New PMC Member: Stamatis Zampetakis
> > >
> > >
> > >
> > > Hello Hive Community,
> > >
> > > Apache Hive PMC is pleased to announce that Stamatis Zampetakis has
> > > accepted the Apache Hive PMC's invitation to become PMC Member, and is
> > now
> > > our newest PMC member. Please join me in congratulating Stamatis !!!
> > >
> > >
> > >
> > > He has been an active member in the hive community across many aspects
> of
> > > the project. Many thanks to Stamatis for all the contributions he has
> > made
> > > and looking forward to many more future contributions in the expanded
> > role.
> > >
> > >
> > >
> > > Cheers,
> > >
> > > Naveen (on behalf of Hive PMC)
> > >
> >
>


[jira] [Created] (HIVE-26830) Update TPCDS30TB metastore dump with histograms

2022-12-09 Thread Alessandro Solimando (Jira)
Alessandro Solimando created HIVE-26830:
---

 Summary: Update TPCDS30TB metastore dump with histograms
 Key: HIVE-26830
 URL: https://issues.apache.org/jira/browse/HIVE-26830
 Project: Hive
  Issue Type: Improvement
  Components: Test
Affects Versions: 4.0.0-alpha-2
Reporter: Alessandro Solimando


Once histogram statistics are added, we should re-create the 30TB TPCDS setup, 
compute statistics, and update the metastore dump.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-26828) Fix OOM for hybridgrace_hashjoin_2.q

2022-12-09 Thread Alessandro Solimando (Jira)
Alessandro Solimando created HIVE-26828:
---

 Summary: Fix OOM for hybridgrace_hashjoin_2.q
 Key: HIVE-26828
 URL: https://issues.apache.org/jira/browse/HIVE-26828
 Project: Hive
  Issue Type: Bug
  Components: Test, Tez
Affects Versions: 4.0.0-alpha-2
Reporter: Alessandro Solimando


_hybridgrace_hashjoin_2.q_ test was disabled because it was failing with OOM 
transiently (from [flaky_test 
output|http://ci.hive.apache.org/blue/organizations/jenkins/hive-flaky-check/detail/hive-flaky-check/597/tests/],
 in case it disappears):
{noformat}
property: qfile used as override with val: hybridgrace_hashjoin_2.q
property: run_disabled used as override with val: false
Setting hive-site: file:/home/jenkins/agent/workspace/hive-flaky-check/data/conf/tez//hive-site.xml
Initializing the schema to: 4.0.0
Metastore connection URL:  jdbc:derby:memory:junit_metastore_db;create=true
Metastore connection Driver :   org.apache.derby.jdbc.EmbeddedDriver
Metastore connection User:  APP
Metastore connection Password:   mine
Starting metastore schema initialization to 4.0.0
Initialization script hive-schema-4.0.0.derby.sql
Initialization script completed
Running: diff -a /home/jenkins/agent/workspace/hive-flaky-check/itests/qtest/target/qfile-results/clientpositive/hybridgrace_hashjoin_2.q.out /home/jenkins/agent/workspace/hive-flaky-check/ql/src/test/results/clientpositive/tez/hybridgrace_hashjoin_2.q.out
1954,1999d1953
< Status: Failed
< Vertex failed, vertexName=Map 2, vertexId=vertex_#ID#, diagnostics=[Vertex vertex_#ID# [Map 2] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: z1 initializer failed, vertex=vertex_#ID# [Map 2], java.lang.RuntimeException: Failed to load plan: hdfs://localhost:45033/home/jenkins/agent/workspace/hive-flaky-check/itests/qtest/target/tmp/scratchdir/jenkins/88f705a8-2d67-4d0a-92fd-d9617faf4e46/hive_2022-12-08_02-25-15_569_4666093830564098399-1/jenkins/_tez_scratch_dir/5b786380-b362-45e0-ac10-0f835ef1d8d7/map.xml
<  A masked pattern was here 
< Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: java.lang.OutOfMemoryError: GC overhead limit exceeded
< Serialization trace:
< childOperators (org.apache.hadoop.hive.ql.exec.vector.VectorFilterOperator)
< childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
< aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
<  A masked pattern was here 
< Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
<  A masked pattern was here 
< ]
< [Masked Vertex killed due to OTHER_VERTEX_FAILURE]
< [Masked Vertex killed due to OTHER_VERTEX_FAILURE]
< [Masked Vertex killed due to OTHER_VERTEX_FAILURE]
< [Masked Vertex killed due to OTHER_VERTEX_FAILURE]
< [Masked Vertex killed due to OTHER_VERTEX_FAILURE]
< DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:5
< FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 2, vertexId=vertex_#ID#, diagnostics=[Vertex vertex_#ID# [Map 2] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: z1 initializer failed, vertex=vertex_#ID# [Map 2], java.lang.RuntimeException: Failed to load plan: hdfs://localhost:45033/home/jenkins/agent/workspace/hive-flaky-check/itests/qtest/target/tmp/scratchdir/jenkins/88f705a8-2d67-4d0a-92fd-d9617faf4e46/hive_2022-12-08_02-25-15_569_4666093830564098399-1/jenkins/_tez_scratch_dir/5b786380-b362-45e0-ac10-0f835ef1d8d7/map.xml
<  A masked pattern was here 
< Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: java.lang.OutOfMemoryError: GC overhead limit exceeded
< Serialization trace:
< childOperators (org.apache.hadoop.hive.ql.exec.vector.VectorFilterOperator)
< childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
< aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
<  A masked pattern was here 
< Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
<  A masked pattern was here 
< ][Masked Vertex killed due to OTHER_VERTEX_FAILURE][Masked Vertex killed due to OTHER_VERTEX_FAILURE][Masked Vertex killed due to OTHER_VERTEX_FAILURE][Masked Vertex killed due to OTHER_VERTEX_FAILURE][Masked Vertex killed due to OTHER_VERTEX_FAILURE]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:5
< PREHOOK: query: SELECT COUNT(*)
< FROM src1 x
< JOIN srcpart z1 ON (x.key = z1.key)
< JOIN src y1 ON (x.key = y1.key)
< JOIN srcpart z2 ON (x.value = z2.value)
< JOIN src y2 ON (x.value = y2.value)
< WHERE z1.key < '' AND z2.key < 'zz'
<  AND y1.value < '' AND y2.value < 'zz'
< PREHOOK: type: QUERY
< PREHOOK: Input: default@src
< PREHOOK: Input: default@src1
< PREH

[jira] [Created] (HIVE-26820) Disable hybridgrace_hashjoin_2 flaky qtest

2022-12-08 Thread Alessandro Solimando (Jira)
Alessandro Solimando created HIVE-26820:
---

 Summary: Disable hybridgrace_hashjoin_2 flaky qtest
 Key: HIVE-26820
 URL: https://issues.apache.org/jira/browse/HIVE-26820
 Project: Hive
  Issue Type: Test
  Components: Test
Affects Versions: 4.0.0-alpha-2
Reporter: Alessandro Solimando
Assignee: Alessandro Solimando


Had this test failing many times in the last months, let's disable it for the 
moment:

[http://ci.hive.apache.org/blue/organizations/jenkins/hive-flaky-check/detail/hive-flaky-check/597/tests]





[jira] [Created] (HIVE-26812) hive-it-util module misses a dependency on hive-jdbc

2022-12-06 Thread Alessandro Solimando (Jira)
Alessandro Solimando created HIVE-26812:
---

 Summary: hive-it-util module misses a dependency on hive-jdbc
 Key: HIVE-26812
 URL: https://issues.apache.org/jira/browse/HIVE-26812
 Project: Hive
  Issue Type: Bug
  Components: Tests
Affects Versions: 4.0.0-alpha-2
Reporter: Alessandro Solimando
Assignee: Alessandro Solimando


Building from $hive/itests fails as follows:
{noformat}
[INFO] Hive Integration - Testing Utilities ... FAILURE [  6.492 s]
...
[INFO] 
[INFO] BUILD FAILURE
[INFO] 
[INFO] Total time:  56.499 s
[INFO] Finished at: 2022-12-06T19:24:16+01:00
[INFO] 
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-compiler-plugin:3.8.1:compile (default-compile) 
on project hive-it-util: Compilation failure
[ERROR] 
/Users/asolimando/git/hive/itests/util/src/main/java/org/apache/hive/jdbc/miniHS2/MiniHS2.java:[51,28]
 cannot find symbol
[ERROR]   symbol:   class Utils
[ERROR]   location: package org.apache.hive.jdbc
[ERROR]
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please 
read the following articles:
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn  -rf :hive-it-util{noformat}
Surprisingly, building from the top directory with -Pitests does not fail.

There is a missing dependency on the hive-jdbc module; adding it fixes the 
error.
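
A minimal sketch of the fix described above, assuming the dependency is declared in itests/util/pom.xml with the version inherited from the parent (the exact placement is illustrative):

```xml
<!-- Hypothetical sketch: declare the missing hive-jdbc dependency so that
     org.apache.hive.jdbc.Utils resolves when building from $hive/itests. -->
<dependency>
  <groupId>org.apache.hive</groupId>
  <artifactId>hive-jdbc</artifactId>
  <version>${project.version}</version>
</dependency>
```

This also explains why the top-level build with -Pitests works: there, hive-jdbc happens to be on the reactor classpath already.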





[jira] [Created] (HIVE-26810) Replace HiveFilterSetOpTransposeRule onMatch method with Calcite's built-in implementation

2022-12-05 Thread Alessandro Solimando (Jira)
Alessandro Solimando created HIVE-26810:
---

 Summary: Replace HiveFilterSetOpTransposeRule onMatch method with 
Calcite's built-in implementation
 Key: HIVE-26810
 URL: https://issues.apache.org/jira/browse/HIVE-26810
 Project: Hive
  Issue Type: Task
  Components: CBO
Affects Versions: 4.0.0-alpha-2
Reporter: Alessandro Solimando
Assignee: Alessandro Solimando


After HIVE-26762, the _onMatch_ method is now identical to the Calcite 
implementation, so we can drop Hive's override to avoid the risk of the two 
drifting apart again.





Re: [Help] How to create a new table when automatically generating schema?

2022-11-30 Thread Alessandro Solimando
Hi Jiajun,
how are you running the schematool?

"schematool -verbose -dbType derby -initSchema" <-- are you maybe missing
the "-initSchema" bit?

Can you provide the exact list of commands you are using?

I am not very familiar with Hive 1.x, but maybe we can get it working.

Best regards,
Alessandro

On Wed, 30 Nov 2022 at 09:23, Jiajun Xie  wrote:

> Hello~
>   I need to create a new table in the metastore.
>
>   I tried to update `metastore/src/model/package.jdo` and
> `metastore/scripts/upgrade/derby/hive-schema-1.2.0.derby.sql`. (My feature
> is based on branch-1.2)
>   Then I set
> `datanucleus.schema.autoCreateTables`,
> `datanucleus.schema.generateDatabase.createScript`.
> None of them work.
>
> How to create a new table when automatically generating schema? Thank you
> very much.
>


[jira] [Created] (HIVE-26772) Add support for specific column statistics to ANALYZE TABLE command

2022-11-23 Thread Alessandro Solimando (Jira)
Alessandro Solimando created HIVE-26772:
---

 Summary: Add support for specific column statistics to ANALYZE 
TABLE command
 Key: HIVE-26772
 URL: https://issues.apache.org/jira/browse/HIVE-26772
 Project: Hive
  Issue Type: Improvement
  Components: Parser, Statistics
Affects Versions: 4.0.0-alpha-2
Reporter: Alessandro Solimando


Currently column statistics for table/partitions can be computed in an 
all-or-nothing fashion by the "ANALYZE TABLE $tableName COMPUTE STATISTICS FOR 
COLUMNS" command.

We propose to improve granularity and support the request to compute a single 
kind of statistics, BIT_VECTOR for instance.

The new syntax could be "ANALYZE TABLE $tableName COMPUTE $statsKind STATISTICS 
FOR COLUMNS".

In the case of BIT_VECTOR this would translate into "ANALYZE TABLE $tableName 
COMPUTE BIT_VECTOR STATISTICS FOR COLUMNS".

The change would require extending the parser and the statistics computation 
logic.





[jira] [Created] (HIVE-26762) Remove operand pruning in HiveFilterSetOpTransposeRule

2022-11-18 Thread Alessandro Solimando (Jira)
Alessandro Solimando created HIVE-26762:
---

 Summary: Remove operand pruning in HiveFilterSetOpTransposeRule
 Key: HIVE-26762
 URL: https://issues.apache.org/jira/browse/HIVE-26762
 Project: Hive
  Issue Type: Bug
  Components: CBO, Query Planning
Affects Versions: 4.0.0-alpha-2
Reporter: Alessandro Solimando


HiveFilterSetOpTransposeRule, when applied to UNION ALL operands, checks if the 
newly pushed filter simplifies to FALSE (possibly due to the predicates holding 
on the input).

If it does, and there is more than one UNION ALL operand, the operand gets pruned.

After HIVE-26524 ("Use Calcite to remove sections of a query plan known never 
produces rows"), this is possibly redundant and we could drop this feature and 
let the other rules take care of the pruning.

In such a case, it is even possible to drop the Hive-specific rule and rely on 
the Calcite one (the operand pruning is the only difference at the time of 
writing), similarly to what HIVE-26642 did for HiveReduceExpressionRule.





Re: [DISCUSS] Jira Public Signup Disabled

2022-11-15 Thread Alessandro Solimando
+1 from me too, thanks Stamatis

On Tue, 15 Nov 2022 at 13:57, Denys Kuzmenko  wrote:

> Hi Stamatis,
>
> Thanks for bringing it up! +1 to implement the same process
>


Re: hive 3.1.3 debug problem

2022-11-14 Thread Alessandro Solimando
Hi Jim,
you need to enhance the metastore classes with DataNucleus as follows:

$ cd standalone-metastore/metastore-server && mvn datanucleus:enhance

Best regards,
Alessandro

On Sun, 13 Nov 2022 at 23:19, Jim Hopper  wrote:

> Hi,
>
> I try to debug one of the test classes in Intellij under linux.
> I have my metastore setup with mysql and I can build hive 3.1.3
> I build hive using: mvn clean install -DskipTests -P thriftif
> -Dthrift.home=/use/local
> but when I run debug in Intellij I get this error:
>
> javax.jdo.JDOException: Exception thrown when executing query : SELECT
> 'org.apache.hadoop.hive.metastore.model.MDatabase' AS
>
> `NUCLEUS_TYPE`,`A0`.`CTLG_NAME`,`A0`.`DESC`,`A0`.`DB_LOCATION_URI`,`A0`.`NAME`,`A0`.`OWNER_NAME`,`A0`.`OWNER_TYPE`,`A0`.`DB_ID`
> FROM `DBS` `A0` WHERE `A0`.`NAME` = ''
> at
>
> org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:677)
> ~[datanucleus-api-jdo-4.2.4.jar:?]
> at org.datanucleus.api.jdo.JDOQuery.executeInternal(JDOQuery.java:391)
> ~[datanucleus-api-jdo-4.2.4.jar:?]
> at org.datanucleus.api.jdo.JDOQuery.execute(JDOQuery.java:216)
> ~[datanucleus-api-jdo-4.2.4.jar:?]
>
> is there any other step I miss?
>


[jira] [Created] (HIVE-26733) HiveRelMdPredicates::getPredicate(Project) should return IS_NULL for CAST(NULL)

2022-11-13 Thread Alessandro Solimando (Jira)
Alessandro Solimando created HIVE-26733:
---

 Summary: HiveRelMdPredicates::getPredicate(Project) should return 
IS_NULL for CAST(NULL)
 Key: HIVE-26733
 URL: https://issues.apache.org/jira/browse/HIVE-26733
 Project: Hive
  Issue Type: Bug
  Components: CBO
Affects Versions: 4.0.0-alpha-1
Reporter: Alessandro Solimando
Assignee: Alessandro Solimando


Given a _CAST(NULL as $type)_ as i-th project expression, the method returns 
_(=($i, CAST(null:NULL):$type)_ instead of _IS_NULL($i)_ as in the case of a 
_NULL_ literal project expression.

This is because _RexLiteral::isNullLiteral_ is used 
[here|https://github.com/apache/hive/blob/a6c0229f910972e84ba558e728532ffc245cc10d/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/HiveRelMdPredicates.java#L153],
 while in similar places it is often convenient to use 
_RexUtil::isNullLiteral(RexNode, boolean allowCast)_.





Re: [VOTE] Apache Hive 4.0.0-alpha-2 Release Candidate 1

2022-11-11 Thread Alessandro Solimando
+1 (non-binding)

- verified gpg signature: OK
$ gpg --verify apache-hive-4.0.0-alpha-2-bin.tar.gz.asc
apache-hive-4.0.0-alpha-2-bin.tar.gz
gpg: Signature made Mon  7 Nov 19:04:05 2022 CET
gpg:using RSA key 50606DE1BDBD5CF862A595A907C5682DAFC73125
gpg:issuer "dkuzme...@apache.org"
gpg: Good signature from "Denys Kuzmenko (CODE SIGNING KEY) <
dkuzme...@apache.org>" [unknown]
gpg: WARNING: The key's User ID is not certified with a trusted signature!
gpg:  There is no indication that the signature belongs to the
owner.
Primary key fingerprint: 5060 6DE1 BDBD 5CF8 62A5  95A9 07C5 682D AFC7 3125

$ gpg --verify apache-hive-4.0.0-alpha-2-src.tar.gz.asc
apache-hive-4.0.0-alpha-2-src.tar.gz
gpg: Signature made Mon  7 Nov 19:04:25 2022 CET
gpg:using RSA key 50606DE1BDBD5CF862A595A907C5682DAFC73125
gpg:issuer "dkuzme...@apache.org"
gpg: Good signature from "Denys Kuzmenko (CODE SIGNING KEY) <
dkuzme...@apache.org>" [unknown]
gpg: WARNING: The key's User ID is not certified with a trusted signature!
gpg:  There is no indication that the signature belongs to the
owner.
Primary key fingerprint: 5060 6DE1 BDBD 5CF8 62A5  95A9 07C5 682D AFC7 3125

- verified package checksum: OK
$ diff <(cat apache-hive-4.0.0-alpha-2-src.tar.gz.sha256) <(shasum -a 256
apache-hive-4.0.0-alpha-2-src.tar.gz)
$ diff <(cat apache-hive-4.0.0-alpha-2-bin.tar.gz.sha256) <(shasum -a 256
apache-hive-4.0.0-alpha-2-bin.tar.gz)

- build with “mvn clean install -Piceberg -DskipTests” (from both the
branch and the src folder): OK
- checked release notes: OK
- checked few modules in Nexus: OK
- checking difference in folder: OK
$ diff -qr . ~/git/hive
(nothing worth mentioning)

- environment used:
$ sw_vers
ProductName: macOS
ProductVersion: 11.6.8
BuildVersion: 20G730

$ mvn --version
Apache Maven 3.8.1 (05c21c65bdfed0f71a2f2ada8b84da59348c4c5d)
Maven home: .../.sdkman/candidates/maven/current
Java version: 1.8.0_292, vendor: AdoptOpenJDK, runtime:
.../.sdkman/candidates/java/8.0.292.hs-adpt/jre
Default locale: en_IE, platform encoding: UTF-8
OS name: "mac os x", version: "10.16", arch: "x86_64", family: "mac"

$ java -version
openjdk version "1.8.0_292"
OpenJDK Runtime Environment (AdoptOpenJDK)(build 1.8.0_292-b10)
OpenJDK 64-Bit Server VM (AdoptOpenJDK)(build 25.292-b10, mixed mode)


Testing in hive-dev-box (https://github.com/kgyrtkirk/hive-dev-box): OK

This is the setup I have used:
$ sw hadoop 3.3.1
$ sw tez 0.10.2
$ sw hive
https://people.apache.org/~dkuzmenko/release-4.0.0-alpha-2-rc1/apache-hive-4.0.0-alpha-2-bin.tar.gz

Tests including select, join, groupby, orderby, explain (ast, cbo, cbo
cost, vectorization) are working correctly against data in ORC and parquet
file formats.

Thanks Denys for this effort!

Best regards,
Alessandro
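
The checksum verification pattern used in the vote above can be reproduced end-to-end on a locally generated stand-in artifact (the file name is illustrative, not a real release artifact; on macOS `shasum -a 256` plays the role of `sha256sum`):

```shell
# Create a stand-in artifact and record its digest, mimicking the
# published apache-hive-*.tar.gz + .sha256 pair from the dist area.
printf 'dummy release artifact\n' > artifact.tar.gz
sha256sum artifact.tar.gz > artifact.tar.gz.sha256

# Recompute and compare against the recorded digest; `sha256sum -c`
# is a portable equivalent of the diff-based check in the email.
if sha256sum -c artifact.tar.gz.sha256; then
  echo "checksum OK"
else
  echo "checksum MISMATCH" >&2
fi
```

Any tampering with artifact.tar.gz after the digest is recorded makes the check fail, which is exactly what the release vote relies on.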

On Fri, 11 Nov 2022 at 21:18, Chris Nauroth  wrote:

> +1 (non-binding)
>
> * Verified all checksums.
> * Verified all signatures.
> * Built from source.
> * mvn clean install -Piceberg -DskipTests
> * Tests passed.
> * mvn --fail-never clean verify -Piceberg -Pitests
> -Dmaven.test.jvm.args='-Xmx2048m -DJETTY_AVAILABLE_PROCESSORS=4'
>
> BTW, gentle reminder: if someone would consider reviewing HIVE-26677, then
> it would simplify verification on high-core machines, so that we wouldn't
> need to remember to set -DJETTY_AVAILABLE_PROCESSORS.
>
> https://github.com/apache/hive/pull/3713
>
> Denys, thank you for driving the release.
>
> Chris Nauroth
>
>
> On Fri, Nov 11, 2022 at 9:33 AM Stamatis Zampetakis 
> wrote:
>
> > +1 (non-binding)
> >
> > Ubuntu 20.04.5 LTS, java version "1.8.0_261", Apache Maven 3.6.3
> >
> > * Verified signatures and checksums OK
> > * Checked diff between git repo and release sources (diff -qr hive-git
> > hive-src) OK
> > * Checked LICENSE, NOTICE, and README.md file OK
> > * Built from release sources (mvn clean install -DskipTests -Pitests) OK
> > * Package binaries from release sources (mvn clean package -DskipTests)
> OK
> > * Built from git tag (mvn clean install -DskipTests -Pitests) OK
> > * Run smoke tests on pseudo cluster using hive-dev-box [1] OK
> > * Spot check maven artifacts for general structure, LICENSE, NOTICE,
> > META-INF content OK
> >
> > Smoke tests included: * Derby metastore initialization * simple CREATE
> > TABLE statements (TPCH Orders, Lineitem tables); * basic LOAD FROM LOCAL
> > statements; * basic SELECT statements with simple INNER JOIN, WHERE, and
> > GROUP BY variations; * EXPLAIN statement variations; * ANALYZE TABLE
> > variations;
> >
> > While checking some of the maven artifacts I noticed
> > that hive-exec-4.0.0-alpha-2.jar had two NOTICE files under META-INF
> > (NOTICE.txt for Apache Commons Lang). Not blocking but maybe we should
> > check/fix this in the next release.
> >
> > Best,
> > Stamatis
> >
> > [1] https://lists.apache.org/thread/7yqs7o6ncpottqx8txt0dtt9858ypsbb
> >
> > On Fri, Nov 11, 2022 at 5:51 PM Naveen Gangam
>  > >
> > wrote:
> >
> > > 

[jira] [Created] (HIVE-26722) HiveFilterSetOpTransposeRule incorrectly prunes UNION ALL operands

2022-11-10 Thread Alessandro Solimando (Jira)
Alessandro Solimando created HIVE-26722:
---

 Summary: HiveFilterSetOpTransposeRule incorrectly prunes UNION ALL 
operands
 Key: HIVE-26722
 URL: https://issues.apache.org/jira/browse/HIVE-26722
 Project: Hive
  Issue Type: Bug
  Components: CBO
Affects Versions: 4.0.0-alpha-1
Reporter: Alessandro Solimando
Assignee: Alessandro Solimando


Consider the following query:

 
{code:java}
set hive.cbo.rule.exclusion.regex=ReduceExpressionsRule\(Project\);

CREATE EXTERNAL TABLE t (a string, b string);
INSERT INTO t VALUES ('1000', 'b1');
INSERT INTO t VALUES ('2000', 'b2');

SELECT * FROM (
  SELECT
   a,
   b
  FROM t
   UNION ALL
  SELECT
   a,
   CAST(NULL AS string)
   FROM t) AS t2
WHERE a = 1000;

EXPLAIN CBO
SELECT * FROM (
  SELECT
   a,
   b
  FROM t
   UNION ALL
  SELECT
   a,
   CAST(NULL AS string)
   FROM t) AS t2
WHERE a = 1000; {code}
 

 

The expected result is:

 
{code:java}
1000    b1
1000    NULL{code}
 

An example of correct plan is as follows:

 
{noformat}
CBO PLAN:
HiveUnion(all=[true])
  HiveProject(a=[$0], b=[$1])
    HiveFilter(condition=[=(CAST($0):DOUBLE, 1000)])
      HiveTableScan(table=[[default, t]], table:alias=[t])
  HiveProject(a=[$0], _o__c1=[null:VARCHAR(2147483647) CHARACTER SET 
"UTF-16LE"])
    HiveFilter(condition=[=(CAST($0):DOUBLE, 1000)])
      HiveTableScan(table=[[default, t]], table:alias=[t]){noformat}
 

 

Consider now a scenario where expression reduction in projections is disabled 
by setting the following property:
{noformat}
set hive.cbo.rule.exclusion.regex=ReduceExpressionsRule\(Project\);
{noformat}
In this case, the simplification of _CAST(NULL)_ into _NULL_ does not happen, 
and we get the following (invalid) result:
1000    b1





[jira] [Created] (HIVE-26692) Check for the expected thrift version before compiling

2022-11-02 Thread Alessandro Solimando (Jira)
Alessandro Solimando created HIVE-26692:
---

 Summary: Check for the expected thrift version before compiling
 Key: HIVE-26692
 URL: https://issues.apache.org/jira/browse/HIVE-26692
 Project: Hive
  Issue Type: Task
  Components: Thrift API
Affects Versions: 4.0.0-alpha-2
Reporter: Alessandro Solimando


At the moment we don't check the thrift version before launching thrift, so 
the error messages are often cryptic upon mismatches.

An explicit check with a clear error message would be nice, like what parquet 
does: 
[https://github.com/apache/parquet-mr/blob/master/parquet-thrift/pom.xml#L247-L268]
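
A minimal sketch of such a guard, assuming the expected version is pinned in a single variable; the version value and function name are illustrative, not Hive's actual pin:

```shell
#!/usr/bin/env sh
# Illustrative pre-build guard: fail fast with a clear message when the
# locally installed thrift does not match the version the build expects.
EXPECTED_THRIFT_VERSION="0.16.0"   # assumed value for the example

check_thrift_version() {
  # $1 is the output of `thrift --version`, e.g. "Thrift version 0.16.0"
  actual=$(printf '%s\n' "$1" | awk '{print $NF}')
  if [ "$actual" != "$EXPECTED_THRIFT_VERSION" ]; then
    echo "ERROR: expected thrift $EXPECTED_THRIFT_VERSION but found $actual" >&2
    return 1
  fi
  echo "thrift version OK: $actual"
}

# In a real build this would be: check_thrift_version "$(thrift --version)"
check_thrift_version "Thrift version 0.16.0"
check_thrift_version "Thrift version 0.13.0" || true
```

Wired into the Maven build (as parquet does with an antrun check), this turns an opaque generated-code compile failure into a one-line diagnosis.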
 





[jira] [Created] (HIVE-26691) Generate thrift files by default at compilation time

2022-11-02 Thread Alessandro Solimando (Jira)
Alessandro Solimando created HIVE-26691:
---

 Summary: Generate thrift files by default at compilation time
 Key: HIVE-26691
 URL: https://issues.apache.org/jira/browse/HIVE-26691
 Project: Hive
  Issue Type: Task
  Components: Thrift API
Affects Versions: 4.0.0-alpha-2
Reporter: Alessandro Solimando


Currently Hive does not generate thrift files within the main compilation task 
(_mvn clean install -DskipTests_), but uses a separate profile (_mvn clean 
install -Pthriftif -DskipTests -Dthrift.home=$thrift_path_), and 
thrift-generated files are generally committed in VCS.

Other Apache projects like Parquet 
([https://github.com/apache/parquet-mr/blob/master/parquet-thrift/pom.xml]) use 
a different approach, building all thrift files by default in the main 
compilation task.

In general, generated files should not be part of our VCS; only the source 
files (the .thrift files here) should be.

Including generated files in VCS is problematic not only because they are 
verbose and clog PR diffs, but also because they generate a lot of conflicts 
(even when the changes to the thrift files could be merged automatically).

The ticket proposes to move the thrift files generation at compile time, remove 
the thrift-generated files from VCS, and add them to the "ignore" list.





Re: [VOTE] Apache Hive 4.0.0-alpha-2 Release Candidate 0

2022-10-27 Thread Alessandro Solimando
You are right Ayush, I got sidetracked by the release notes (* [HIVE-19217]
- Upgrade to Hadoop 3.1.0) and did not check the versions in the pom
file; apologies for the false alarm, but better safe than sorry.

With the right versions in place (Hadoop 3.3.1 and Tez 0.10.2), tests
including select, join, groupby, orderby, explain (ast, cbo, cbo cost,
vectorization) are working correctly against data in ORC and parquet
file formats.

No problem for me either when running TestBeelinePasswordOption locally.

So my vote turns into a +1 (non-binding).

Thanks a lot Denys for pushing the release process forward, sorry again you
all for the oversight!

Best regards,
Alessandro

On Thu, 27 Oct 2022 at 20:03, Ayush Saxena  wrote:

> Hi Alessandro,
> From this:
>
> > $ sw hadoop 3.1.0
> > $ sw tez 0.10.0 (tried also 0.10.1)
>
>
> I guess you are using the wrong versions, The Hadoop version to be used
> should be 3.3.1[1] and the Tez version should be 0.10.2[2]
>
> The error also seems to be coming from Hadoop code
>
> > vertex=vertex_1666888075798_0001_1_00 [Map 1],
> > java.lang.NoSuchMethodError:
> > > org.apache.hadoop.fs.Path.compareTo(Lorg/apache/hadoop/fs/Path;)I
>
>
> The compareTo method in Hadoop was changed in HADOOP-16196, which isn't
> there in Hadoop-3.1.0, it is there post 3.2.1 [3]
>
> Another stuff, TestBeelinePasswordOptions passes for me inside the source
> directory.
>
> [*INFO*] ---
>
> [*INFO*]  T E S T S
>
> [*INFO*] ---
>
> [*INFO*] Running org.apache.hive.beeline.*TestBeelinePasswordOption*
>
> [*INFO*] *Tests run: 10*, Failures: 0, Errors: 0, Skipped: 0, Time elapsed:
> 18.264 s - in org.apache.hive.beeline.*TestBeelinePasswordOption*
>
> -Ayush
>
> [1]
> https://github.com/apache/hive/blob/release-4.0.0-alpha-2-rc0/pom.xml#L136
> [2]
> https://github.com/apache/hive/blob/release-4.0.0-alpha-2-rc0/pom.xml#L197
> [3] https://issues.apache.org/jira/browse/HADOOP-16196
>
> On Thu, 27 Oct 2022 at 23:15, Alessandro Solimando <
> alessandro.solima...@gmail.com> wrote:
>
> > Hi everyone,
> >
> > unfortunately my vote is -1 (although non-binding) due to a classpath
> error
> > which prevents queries involving Tez to complete (all the details at the
> > end of the email, apologies for the lengthy text but I wanted to provide
> > all the context).
> >
> > - verified gpg signature: OK
> >
> > $ wget https://www.apache.org/dist/hive/KEYS
> >
> > $ gpg --import KEYS
> >
> > ...
> >
> > $ gpg --verify apache-hive-4.0.0-alpha-2-bin.tar.gz.asc
> > apache-hive-4.0.0-alpha-2-bin.tar.gz
> >
> > gpg: Signature made Thu 27 Oct 15:11:48 2022 CEST
> >
> > gpg:using RSA key
> 50606DE1BDBD5CF862A595A907C5682DAFC73125
> >
> > gpg:issuer "dkuzme...@apache.org"
> >
> > gpg: Good signature from "Denys Kuzmenko (CODE SIGNING KEY) <
> > dkuzme...@apache.org>" [unknown]
> >
> > gpg: WARNING: The key's User ID is not certified with a trusted
> signature!
> >
> > gpg:  There is no indication that the signature belongs to the
> > owner.
> >
> > Primary key fingerprint: 5060 6DE1 BDBD 5CF8 62A5  95A9 07C5 682D AFC7
> 3125
> >
> > $ gpg --verify apache-hive-4.0.0-alpha-2-src.tar.gz.asc
> > apache-hive-4.0.0-alpha-2-src.tar.gz
> >
> > gpg: Signature made Thu 27 Oct 15:12:08 2022 CEST
> >
> > gpg:using RSA key
> 50606DE1BDBD5CF862A595A907C5682DAFC73125
> >
> > gpg:issuer "dkuzme...@apache.org"
> >
> > gpg: Good signature from "Denys Kuzmenko (CODE SIGNING KEY) <
> > dkuzme...@apache.org>" [unknown]
> >
> > gpg: WARNING: The key's User ID is not certified with a trusted
> signature!
> >
> > gpg:  There is no indication that the signature belongs to the
> > owner.
> >
> > Primary key fingerprint: 5060 6DE1 BDBD 5CF8 62A5  95A9 07C5 682D AFC7
> 3125
> >
> > (AFAIK, this warning is OK)
> >
> > - verified package checksum: OK
> >
> > $ diff <(cat apache-hive-4.0.0-alpha-2-src.tar.gz.sha256) <(shasum -a 256
> > apache-hive-4.0.0-alpha-2-src.tar.gz)
> >
> > $ diff <(cat apache-hive-4.0.0-alpha-2-bin.tar.gz.sha256) <(shasum -a 256
> > apache-hive-4.0.0-alpha-2-bin.tar.gz)
> >
> > - verified maven build (no tests): OK
> >
> > $ mvn clean install -DskipTests
> >
> > ...
> >
> > [INFO]
> > 

Re: [VOTE] Apache Hive 4.0.0-alpha-2 Release Candidate 0

2022-10-27 Thread Alessandro Solimando
t; java.lang.IllegalStateException:
> > Insufficient configured threads: required=4 < max=4 for
> >
> >
> QueuedThreadPool[hiveserver2-web]@628bd77e{STARTED,4<=4<=4,i=4,r=-1,q=0}[ReservedThreadExecutor@cfacf0
> > {s=0/1,p=0}]
> > at org.apache.hive.service.server.HiveServer2.start(HiveServer2.java:733)
> > at org.apache.hive.jdbc.miniHS2.MiniHS2.start(MiniHS2.java:395)
> > at
> >
> >
> org.apache.hive.beeline.TestBeelinePasswordOption.preTests(TestBeelinePasswordOption.java:60)
> > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > at
> >
> >
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> > at
> >
> >
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > at java.lang.reflect.Method.invoke(Method.java:498)
> > at
> >
> >
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
> > at
> >
> >
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> > at
> >
> >
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
> > at
> >
> >
> org.junit.internal.runners.statements.RunBefores.invokeMethod(RunBefores.java:33)
> > at
> >
> >
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
> > at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
> > at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
> > at
> >
> >
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
> > at
> >
> >
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
> > at
> >
> >
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
> > at
> >
> >
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
> > at
> >
> >
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:377)
> > at
> >
> >
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:138)
> > at
> org.apache.maven.surefire.booter.ForkedBooter.run(ForkedBooter.java:465)
> > at
> > org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:451)
> > Caused by: java.lang.IllegalStateException: Insufficient configured
> > threads: required=4 < max=4 for
> >
> >
> QueuedThreadPool[hiveserver2-web]@628bd77e{STARTED,4<=4<=4,i=4,r=-1,q=0}[ReservedThreadExecutor@cfacf0
> > {s=0/1,p=0}]
> > at
> >
> >
> org.eclipse.jetty.util.thread.ThreadPoolBudget.check(ThreadPoolBudget.java:165)
> > at
> >
> >
> org.eclipse.jetty.util.thread.ThreadPoolBudget.leaseTo(ThreadPoolBudget.java:141)
> > at
> >
> >
> org.eclipse.jetty.util.thread.ThreadPoolBudget.leaseFrom(ThreadPoolBudget.java:191)
> > at org.eclipse.jetty.io.SelectorManager.doStart(SelectorManager.java:255)
> > at
> >
> >
> org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:73)
> > at
> >
> >
> org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:169)
> > at
> >
> >
> org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:110)
> > at
> >
> >
> org.eclipse.jetty.server.AbstractConnector.doStart(AbstractConnector.java:321)
> > at
> >
> >
> org.eclipse.jetty.server.AbstractNetworkConnector.doStart(AbstractNetworkConnector.java:81)
> > at
> >
> org.eclipse.jetty.server.ServerConnector.doStart(ServerConnector.java:234)
> > at
> >
> >
> org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:73)
> > at org.eclipse.jetty.server.Server.doStart(Server.java:401)
> > at
> >
> >
> org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:73)
> > at org.apache.hive.http.HttpServer.start(HttpServer.java:335)
> > at org.apache.hive.service.server.HiveServer2.start(HiveServer2.java:729)
> > ... 21 more
> >
> > Chris Nauroth
> >
> >
> > On Thu, Oct 27, 2022 at 7:48 AM Ádám Szita  wrote:
> >
> > > Hi,
> > >
> > > Thanks for rebuilding this RC, Denys.
> > >
> > > Alessandro: IMHO since there was no vote cast yet and we're talking
> about
> > > a build option change only, I guess it just isn't worth rebuilding the
> > > whole thing from scratch to create a new RC.
> > 

Re: [VOTE] Apache Hive 4.0.0-alpha-2 Release Candidate 0

2022-10-27 Thread Alessandro Solimando
Sorry, I have misread the comment.

If the code hasn't changed and the tag for the RC is still pointing to the
same code, I don't think there is a need for a new RC.

On Thu, 27 Oct 2022 at 15:48, Denys Kuzmenko 
wrote:

> Hi Alessandro,
>
> There were no code changes, just missing artifacts due to an outdated
> release guide (iceberg bits are generated only under iceberg profile).
> Not sure that we should create new RC in that case. Naveen, what
> do you think?
>
>
> On Thu, Oct 27, 2022 at 3:30 PM Alessandro Solimando <
> alessandro.solima...@gmail.com> wrote:
>
> > Hi Denys,
> > in other Apache communities I generally see that votes are cancelled and
> a
> > new RC is prepared when there are changes or blocking issues like in this
> > case, not sure how things are done in Hive though.
> >
> > Best regards,
> > Alessandro
> >
> > On Thu, 27 Oct 2022 at 15:22, Denys Kuzmenko  > .invalid>
> > wrote:
> >
> > > Hi Adam,
> > >
> > > Thanks for pointing that out! Upstream release guide is outdated. Once
> I
> > > receive the edit rights, I'll amend the instructions.
> > > Updated the release artifacts and checksums:
> > >
> > > Apache Hive 4.0.0-alpha-2 Release Candidate 0 is available
> > > here:https://people.apache.org/~dkuzmenko/release-4.0.0-alpha-2-rc0/
> > >
> > >
> > > The checksums are these:
> > > - b4dbaac5530694f631af13677ffe5443addc148bd94176b27a109a6da67f5e0f
> > > apache-hive-4.0.0-alpha-2-bin.tar.gz
> > > - 8c4639915e9bf649f4a55cd9adb9d266aa15d8fa48ddfadb28ebead2c0aee4d0
> > > apache-hive-4.0.0-alpha-2-src.tar.gz
> > >
> > > Maven artifacts are available
> > > here:
> > > https://repository.apache.org/content/repositories/orgapachehive-1117/
> > >
> > > The tag release-4.0.0-alpha-2-rc0 has been applied to the source for
> > > this release in github, you can see it at
> > > https://github.com/apache/hive/tree/release-4.0.0-alpha-2-rc0
> > >
> > > The git commit hash
> > > is:
> > >
> >
> https://github.com/apache/hive/commit/da146200e003712e324496bf560a1702485d231c
> > >
> > >
> > > Please check again.
> > >
> > >
> > > Thanks,
> > > Denys
> > >
> > > On Thu, Oct 27, 2022 at 2:53 PM Ádám Szita  wrote:
> > >
> > > > Hi Denys,
> > > >
> > > > Unfortunately I can't give a plus 1 on this yet, as the Iceberg artifacts
> > > > are missing from the binary tar.gz. Perhaps the -Piceberg flag was
> > > > missing during build, can you please rebuild?
> > > >
> > > > Thanks,
> > > > Adam
> > > >
> > > > On 2022/10/25 11:20:23 Denys Kuzmenko wrote:
> > > > > Hi team,
> > > > >
> > > > >
> > > > > Apache Hive 4.0.0-alpha-2 Release Candidate 0 is available
> > > > > here:
> https://people.apache.org/~dkuzmenko/release-4.0.0-alpha-2-rc0/
> > > > >
> > > > >
> > > > > The checksums are these:
> > > > > - 7d4c54ecfe2b04cabc283a84defcc1e8a02eed0e13baba2a2c91ae882b6bfaf7
> > > > > apache-hive-4.0.0-alpha-2-bin.tar.gz
> > > > > - 8c4639915e9bf649f4a55cd9adb9d266aa15d8fa48ddfadb28ebead2c0aee4d0
> > > > > apache-hive-4.0.0-alpha-2-src.tar.gz
> > > > >
> > > > > Maven artifacts are available
> > > > > here:
> > > >
> https://repository.apache.org/content/repositories/orgapachehive-1117/
> > > > >
> > > > > The tag release-4.0.0-alpha-2-rc0 has been applied to the source
> for
> > > > > this release in github, you can see it at
> > > > > https://github.com/apache/hive/tree/release-4.0.0-alpha-2-rc0
> > > > >
> > > > > The git commit hash
> > > > > is:
> > > >
> > >
> >
> https://github.com/apache/hive/commit/da146200e003712e324496bf560a1702485d231c
> > > > >
> > > > > Voting will conclude in 72 hours.
> > > > >
> > > > > Hive PMC Members: Please test and vote.
> > > > >
> > > > > Thanks
> > > > >
> > > >
> > >
> >
>


Re: [VOTE] Apache Hive 4.0.0-alpha-2 Release Candidate 0

2022-10-27 Thread Alessandro Solimando
Hi Denys,
in other Apache communities I generally see that votes are cancelled and a
new RC is prepared when there are changes or blocking issues like in this
case, not sure how things are done in Hive though.

Best regards,
Alessandro

On Thu, 27 Oct 2022 at 15:22, Denys Kuzmenko 
wrote:

> Hi Adam,
>
> Thanks for pointing that out! Upstream release guide is outdated. Once I
> receive the edit rights, I'll amend the instructions.
> Updated the release artifacts and checksums:
>
> Apache Hive 4.0.0-alpha-2 Release Candidate 0 is available
> here:https://people.apache.org/~dkuzmenko/release-4.0.0-alpha-2-rc0/
>
>
> The checksums are these:
> - b4dbaac5530694f631af13677ffe5443addc148bd94176b27a109a6da67f5e0f
> apache-hive-4.0.0-alpha-2-bin.tar.gz
> - 8c4639915e9bf649f4a55cd9adb9d266aa15d8fa48ddfadb28ebead2c0aee4d0
> apache-hive-4.0.0-alpha-2-src.tar.gz
>
> Maven artifacts are available
> here:
> https://repository.apache.org/content/repositories/orgapachehive-1117/
>
> The tag release-4.0.0-alpha-2-rc0 has been applied to the source for
> this release in github, you can see it at
> https://github.com/apache/hive/tree/release-4.0.0-alpha-2-rc0
>
> The git commit hash
> is:
> https://github.com/apache/hive/commit/da146200e003712e324496bf560a1702485d231c
>
>
> Please check again.
>
>
> Thanks,
> Denys
>
> On Thu, Oct 27, 2022 at 2:53 PM Ádám Szita  wrote:
>
> > Hi Denys,
> >
> > Unfortunately I can't give a plus 1 on this yet, as the Iceberg artifacts
> > are missing from the binary tar.gz. Perhaps the -Piceberg flag was missing
> > during build, can you please rebuild?
> >
> > Thanks,
> > Adam
> >
> > On 2022/10/25 11:20:23 Denys Kuzmenko wrote:
> > > Hi team,
> > >
> > >
> > > Apache Hive 4.0.0-alpha-2 Release Candidate 0 is available
> > > here:https://people.apache.org/~dkuzmenko/release-4.0.0-alpha-2-rc0/
> > >
> > >
> > > The checksums are these:
> > > - 7d4c54ecfe2b04cabc283a84defcc1e8a02eed0e13baba2a2c91ae882b6bfaf7
> > > apache-hive-4.0.0-alpha-2-bin.tar.gz
> > > - 8c4639915e9bf649f4a55cd9adb9d266aa15d8fa48ddfadb28ebead2c0aee4d0
> > > apache-hive-4.0.0-alpha-2-src.tar.gz
> > >
> > > Maven artifacts are available
> > > here:
> > https://repository.apache.org/content/repositories/orgapachehive-1117/
> > >
> > > The tag release-4.0.0-alpha-2-rc0 has been applied to the source for
> > > this release in github, you can see it at
> > > https://github.com/apache/hive/tree/release-4.0.0-alpha-2-rc0
> > >
> > > The git commit hash
> > > is:
> >
> https://github.com/apache/hive/commit/da146200e003712e324496bf560a1702485d231c
> > >
> > > Voting will conclude in 72 hours.
> > >
> > > Hive PMC Members: Please test and vote.
> > >
> > > Thanks
> > >
> >
>


[jira] [Created] (HIVE-26652) HiveSortPullUpConstantsRule produces an invalid plan when pulling up constants for nullable fields

2022-10-19 Thread Alessandro Solimando (Jira)
Alessandro Solimando created HIVE-26652:
---

 Summary: HiveSortPullUpConstantsRule produces an invalid plan when 
pulling up constants for nullable fields
 Key: HIVE-26652
 URL: https://issues.apache.org/jira/browse/HIVE-26652
 Project: Hive
  Issue Type: Bug
  Components: CBO
Affects Versions: 4.0.0-alpha-2
Reporter: Alessandro Solimando
Assignee: Alessandro Solimando
 Fix For: 4.0.0-alpha-2


The rule pulls up constants without checking/adjusting nullability to match 
that of the field type.

Here is the stack-trace when a nullable type is involved:
{code:java}
java.lang.AssertionError: type mismatch:
ref:
JavaType(class java.lang.Integer)
input:
JavaType(int) NOT NULL
    at org.apache.calcite.util.Litmus$1.fail(Litmus.java:31)
    at org.apache.calcite.plan.RelOptUtil.eq(RelOptUtil.java:2167)
    at org.apache.calcite.rex.RexChecker.visitInputRef(RexChecker.java:125)
    at org.apache.calcite.rex.RexChecker.visitInputRef(RexChecker.java:57)
    at org.apache.calcite.rex.RexInputRef.accept(RexInputRef.java:112)
    at org.apache.calcite.rel.core.Project.isValid(Project.java:215)
    at org.apache.calcite.rel.core.Project.<init>(Project.java:94)
    at org.apache.calcite.rel.core.Project.<init>(Project.java:100)
    at 
org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveProject.<init>(HiveProject.java:58)
    at 
org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveProject.copy(HiveProject.java:106)
    at org.apache.calcite.rel.core.Project.copy(Project.java:126)
    at 
org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveSortPullUpConstantsRule$HiveSortPullUpConstantsRuleBase.onMatch(HiveSortPullUpConstantsRule.java:195)
    at 
org.apache.calcite.plan.AbstractRelOptPlanner.fireRule(AbstractRelOptPlanner.java:333)
    at org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:542)
    at org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:407)
    at 
org.apache.calcite.plan.hep.HepPlanner.executeInstruction(HepPlanner.java:243)
    at 
org.apache.calcite.plan.hep.HepInstruction$RuleInstance.execute(HepInstruction.java:127)
    at 
org.apache.calcite.plan.hep.HepPlanner.executeProgram(HepPlanner.java:202)
    at org.apache.calcite.plan.hep.HepPlanner.findBestExp(HepPlanner.java:189)
    at 
org.apache.hadoop.hive.ql.optimizer.calcite.rules.TestHiveSortExchangePullUpConstantsRule.test(TestHiveSortExchangePullUpConstantsRule.java:104)
    at 
org.apache.hadoop.hive.ql.optimizer.calcite.rules.TestHiveSortExchangePullUpConstantsRule.testNullableFields(TestHiveSortExchangePullUpConstantsRule.java:156)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
    at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
    at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at 
org.mockito.internal.runners.DefaultInternalRunner$1$1.evaluate(DefaultInternalRunner.java:54)
    at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
    at 
org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
    at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
    at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
    at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
    at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
    at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
    at 
org.mockito.internal.runners.DefaultInternalRunner$1.run(DefaultInternalRunner.java:99)
    at 
org.mockito.internal.runners.DefaultInternalRunner.run(DefaultInternalRunner.java:105)
    at org.mockito.internal.runners.StrictRunner.run(StrictRunner.java:40)
    at org.mockito.junit.MockitoJUnitRunner.run(MockitoJUnitRunner.java:163)
    at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
    at 
com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:69)
    at 
com.intellij.rt.junit.IdeaTestRunner$Repeater$1.execute(IdeaTestRunner.java:38

[jira] [Created] (HIVE-26643) HiveUnionPullUpConstantsRule fails when pulling up constants over nullable fields

2022-10-17 Thread Alessandro Solimando (Jira)
Alessandro Solimando created HIVE-26643:
---

 Summary: HiveUnionPullUpConstantsRule fails when pulling up 
constants over nullable fields
 Key: HIVE-26643
 URL: https://issues.apache.org/jira/browse/HIVE-26643
 Project: Hive
  Issue Type: Bug
  Components: CBO
Affects Versions: 4.0.0-alpha-2
Reporter: Alessandro Solimando
Assignee: Alessandro Solimando


The rule does pull up constants without checking/adjusting nullability to match 
that of the field type. 

Here is the stack-trace when a nullable type is involved:
{code:java}
java.lang.AssertionError: Cannot add expression of different type to set:
set type is RecordType(JavaType(class java.lang.Integer) f1, JavaType(int) NOT 
NULL f2) NOT NULL
expression type is RecordType(JavaType(int) NOT NULL f1, JavaType(int) NOT NULL 
f2) NOT NULL
set is 
rel#38:HiveUnion.(input#0=HepRelVertex#35,input#1=HepRelVertex#35,all=true)
expression is HiveProject(f1=[1], f2=[$0])
  HiveUnion(all=[true])
HiveProject(f2=[$1])
  HiveProject(f1=[$0], f2=[$1])
HiveFilter(condition=[=($0, 1)])
  LogicalTableScan(table=[[]])
HiveProject(f2=[$1])
  HiveProject(f1=[$0], f2=[$1])
HiveFilter(condition=[=($0, 1)])
  LogicalTableScan(table=[[]])
{code}

The solution is to check nullability and add a cast when the field is nullable, 
since the constant's type is not.
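The described fix can be sketched outside of Calcite as follows. This is illustrative only, not the actual HIVE-26643 patch: the string-based `castIfNeeded` helper stands in for the real logic, which would check the union field's `RelDataType` nullability and wrap the literal with `RexBuilder.makeCast`.

```java
// Illustrative sketch: when pulling a constant up through a Union, the
// projected expression's type must match the union field's type, including
// nullability. A literal is NOT NULL, so placing it in a nullable slot
// breaks type equality; wrapping it in a cast to the field type fixes that.
public class PullUpConstantsSketch {

    /** Returns the expression to emit for a pulled-up constant. */
    static String castIfNeeded(String fieldType, boolean fieldNullable, String constant) {
        // Only a nullable target slot needs the cast; a NOT NULL slot
        // already matches the literal's type.
        if (fieldNullable) {
            return "CAST(" + constant + " AS " + fieldType + ")";
        }
        return constant;
    }

    public static void main(String[] args) {
        // f1 is nullable in the union row type, so the constant 1 gets cast.
        System.out.println(castIfNeeded("INTEGER", true, "1"));   // CAST(1 AS INTEGER)
        System.out.println(castIfNeeded("INTEGER", false, "1"));  // 1
    }
}
```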



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-26619) Sonar analysis not run on the master branch

2022-10-11 Thread Alessandro Solimando (Jira)
Alessandro Solimando created HIVE-26619:
---

 Summary: Sonar analysis not run on the master branch
 Key: HIVE-26619
 URL: https://issues.apache.org/jira/browse/HIVE-26619
 Project: Hive
  Issue Type: Test
  Components: Testing Infrastructure
Affects Versions: 4.0.0-alpha-2
Reporter: Alessandro Solimando
Assignee: Alessandro Solimando


The analysis for the master branch was using the wrong variable name 
(_CHANGE_BRANCH_) instead of the branch name (_BRANCH_NAME_).

For an overview of git-related environment variables available in Jenkins, you 
can refer to [https://ci.eclipse.org/webtools/env-vars.html/].

With [~zabetak] we have noticed some spurious files in Sonar analysis for PRs, 
as per this sonar support thread it might be linked to the stale analysis of 
the target branch (master for us): 
[https://community.sonarsource.com/t/unrelated-files-scanned-in-sonarcloud-pr-check/47138/14]





[jira] [Created] (HIVE-26572) Support constant expressions in vectorized expressions

2022-09-28 Thread Alessandro Solimando (Jira)
Alessandro Solimando created HIVE-26572:
---

 Summary: Support constant expressions in vectorized expressions
 Key: HIVE-26572
 URL: https://issues.apache.org/jira/browse/HIVE-26572
 Project: Hive
  Issue Type: Improvement
  Components: Vectorization
Affects Versions: 4.0.0-alpha-2
Reporter: Alessandro Solimando


At the moment, we cannot vectorize aggregate expressions having constant 
parameters in addition to the aggregation column (it's forbidden 
[here|https://github.com/apache/hive/blob/c19d56ec7429bfcfad92b62ac335dbf8177dab24/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java#L4531]).

One compelling example of how this could help is [PR 
1824|https://github.com/apache/hive/pull/1824], linked to HIVE-24510, where 
_compute_bit_vector_ had to be split into _compute_bit_vector_hll_ + 
_compute_bit_vector_fm_ when HLL implementation has been added, while 
_compute_bit_vector($col, ['HLL'|'FM'])_ could have been used.





Re: Proposal: Revamp Apache Hive website.

2022-09-19 Thread Alessandro Solimando
Hi everyone,
thanks Simhadri for pushing this forward.

I like the look and feel of the new website, and I agree with Stamatis that
having the website sources in the Hive repo, and automatically publishing
the site upon commits would be very beneficial.

Best regards,
Alessandro

On Thu, 15 Sept 2022 at 23:11, Stamatis Zampetakis 
wrote:

> Hi all,
>
> It's great to see some effort in improving the website. The POC from
> Simhadri looks really cool; I didn't check the content but I love the look
> and feel.
>
> Now regarding the current process for modifying and updating the website
> there is some info in this relatively recent thread [1].
>
> Moving forward, I would really like to have the source code of the website
> (markdown etc) in the main repo of the project [2], and use GitHub actions
> to automatically build and push the content to the site repo [3] per commit
> basis.
> This workflow is used in Apache Calcite and I find it extremely convenient.
>
> Best,
> Stamatis
>
> [1] https://lists.apache.org/thread/4b6x4d6z4tgnv4mo0ycg30y4dlt0msbd
> [2] https://github.com/apache/hive
> [3] https://github.com/apache/hive-site
>
> On Thu, Sep 15, 2022 at 10:50 PM Ayush Saxena  wrote:
>
>> Owen,
>> I am not sure if I am catching you right, But now the repository for the
>> website has changed, we no longer use our main *hive.git* repository for
>> the website, We are using the* hive-site *repository for the website,
>> The migration happened this year January I suppose.
>>
>> Can give a check to the set of commit here from: gmcdonald
>>  and
>> Humbedooh 
>> https://github.com/apache/hive-site/commits/main
>>
>> Now whatever you push to main branch of hive-site(
>> https://github.com/apache/hive-site) it gets published on the *asf-site*
>> branch by the buildbot(
>> https://github.com/apache/hive-site/commits/asf-site)
>>
>> Simhadri's changes will be directed to the main branch of the hive-site
>> repo and they will get auto published on the asf-site branch, I tried this
>> a couple of months back and it indeed worked that way. Let me know if we
>> are missing anything on this, I tried to find threads around this but not
>> sure if it is in private@ or so, couldn't find, I will try again and if
>> there is something around that what needs to be done, I will have a word
>> with the Infra folks and get that sorted, if it isn't already.
>>
>> -Ayush
>>
>> On Fri, 16 Sept 2022 at 01:49, Owen O'Malley 
>> wrote:
>>
>>> Look at the threads and talk to Apache Infra. They couldn't make it work
>>> before. We would have needed to manually publish to the asf-site branch.
>>>
>>> On Thu, Sep 15, 2022 at 7:54 PM Simhadri G 
>>> wrote:
>>>
 Thanks Ayush, Pau Tallada and Owen O'Malley for the feedback!

 @Owen , This website revamp indeed replaces the website with markdown
 as you have mentioned. I have referred to your PR for some of the content
 for the site.
 The actual code for the website is here:
 https://github.com/simhadri-g/hive-site/tree/new-site

 Once we add markdown files to the source code under /content/ , hugo
 will rebuild the files and generate the static html files in ./public/
 directory.
 I have copied over these static files to a separate repo and
 temporarily hosted it with gh-pages to start the mail chain.

  For the final site, I am already trying to automate this with github
 actions. So, as soon as any new changes are made to the site branch, the
 github actions will automatically trigger and update the site.

 Thanks!

 On Fri, Sep 16, 2022 at 12:17 AM Owen O'Malley 
 wrote:

> I found it - https://github.com/apache/hive/pull/1410
>
> On Thu, Sep 15, 2022 at 6:42 PM Owen O'Malley 
> wrote:
>
>> I had a PR to replace the website with markdown. Apache Infra was
>> supposed to make it autopublish. *sigh*
>>
>> .. Owen
>>
>> On Thu, Sep 15, 2022 at 4:23 PM Pau Tallada  wrote:
>>
>>> Hi,
>>>
>>> Great work!
>>> +1 on updating it as well
>>>
>>> Missatge de Ayush Saxena  del dia dj., 15 de
>>> set. 2022 a les 17:40:
>>>
 Hi Simhadri,
 Thanx for the initiative, +1 on updating our current website.
 The new website looks way better than the existing one.
 Can create a Jira and link this to that after a couple of days if
 there aren’t any objections to the move, so as people can drop further
 suggestions over there.

 -Ayush

 > On 15-Sep-2022, at 8:33 PM, SG  wrote:
 >
 > Hi Everyone,
 >
 > The existing apache hive website https://hive.apache.org/ hasn't
 been
 > updated for a very long time. Additionally, I was not able to
 build the
 > docker image associated with the 

Re: [DISCUSS] SonarCloud integration for Apache Hive

2022-08-09 Thread Alessandro Solimando
Hi Stamatis,
glad to hear you find Sonar helpful, thanks for providing your feedback.

The master branch analysis already provides what I think you are looking
for, you have:

   - all code analysis (to see the full status of the code):
   https://sonarcloud.io/summary/overall?id=apache_hive
   - new code analysis (basically what changed in the last commit):
   https://sonarcloud.io/summary/new_code?id=apache_hive

For PRs, similarly, the analysis covers the changes w.r.t. the target
branch, it's a good and quick way to ascertain the code quality of the PR.

Regarding "Is it possible to somehow save the current analysis on master
and make the
PR quality gates fail when things become worse?", it is definitely
possible, we can define a success/failure threshold for each of the
metrics, and make it fail if the quality gate criteria are not met.

I was suggesting to postpone this to allow people to get first familiar
with it, I would not want to disrupt existing work, Sonar is a rich tool
and people might need a bit of time to adjust to it.

Good news is that quality gates can be changed directly from SonarCloud and
won't require code changes, we might kick in a feedback discussion after a
month or so from when we introduce Sonar analysis and see what people think.

Best regards,
Alessandro

On Tue, 9 Aug 2022 at 16:38, Stamatis Zampetakis  wrote:

> Hi Alessandro,
>
> Sonar integration will definitely help in improving cope quality and
> preventing bugs so many thanks for pushing this forward.
>
> I went over the PR and it is in good shape. I plan to merge it in the
> following days unless someone objects.
> We can tackle further improvements in follow up JIRAs.
>
> Is it possible to somehow save the current analysis on master and make the
> PR quality gates fail when things become worse?
> If not then what may help in reviewing PRs is to have a diff view (between
> a PR and current master) so we can quickly tell if the PR we are about to
> merge makes things better or worse; as far as I understand the idea is to
> do this manually at the moment by checking the results on master and on the
> PR under review.
>
> Enabling code coverage would be very helpful as well. Looking forward to
> this.
>
> Best,
> Stamatis
>
> On Mon, Aug 8, 2022 at 1:22 PM Alessandro Solimando <
> alessandro.solima...@gmail.com> wrote:
>
> > Correction: the right PR link is the following
> > https://github.com/apache/hive/pull/3254
> >
> > Best regards,
> > Alessandro
> >
> > On Mon, 8 Aug 2022 at 10:04, Alessandro Solimando <
> > alessandro.solima...@gmail.com> wrote:
> >
> > > Hi community,
> > > in the context of HIVE-26196
> > > <https://issues.apache.org/jira/browse/HIVE-26196> we started
> > considering
> > > the adoption of SonarCloud <https://sonarcloud.io/features> analysis
> for
> > > Apache Hive to promote data-driven code quality improvements and to
> allow
> > > reviewers to focus on the conceptual part of the changes by helping
> them
> > > spot trivial code smells, security issues and bugs.
> > >
> > > SonarCloud has already been adopted and integrated into a few top
> Apache
> > > projects like DolphinScheduler <https://dolphinscheduler.apache.org/>
> > and Apache
> > > Jackrabbit FileVault <https://jackrabbit.apache.org/filevault/>.
> > >
> > > For those who don't know, Sonar is a code analysis tool, the initial
> > > adoption would aim at tracking code quality for the master branch, and
> > > making the PRs' review process easier, by allowing to compare which
> > > code/security issues a PR solved/introduced with respect to the main
> > branch.
> > >
> > > We already have a Hive-dedicated project under the Apache foundation's
> > > SonarCloud account:
> > https://sonarcloud.io/project/overview?id=apache_hive.
> > >
> > > In what follows I will highlight the main points of interest:
> > >
> > > 1) sonar adoption scope:
> > > For the time being a descriptive approach (just show the analysis and
> > > associated metrics) could be adopted, delaying a prescriptive one
> (i.e.,
> > > quality gates based on the metrics for PRs' mergeability) to a later
> time
> > > where we have tested SonarCloud for long enough to judge that it could
> > be a
> > > sensible move.
> > >
> > > 2) false positives:
> > > Sonar suffers from false positives, but they can be marked as such from
> > > the web UI: (source https://docs.sonarqube.org/latest/faq/#header-1)
> > >
> > > How do I get rid of issues that are False

Re: [DISCUSS] SonarCloud integration for Apache Hive

2022-08-08 Thread Alessandro Solimando
Correction: the right PR link is the following
https://github.com/apache/hive/pull/3254

Best regards,
Alessandro

On Mon, 8 Aug 2022 at 10:04, Alessandro Solimando <
alessandro.solima...@gmail.com> wrote:

> Hi community,
> in the context of HIVE-26196
> <https://issues.apache.org/jira/browse/HIVE-26196> we started considering
> the adoption of SonarCloud <https://sonarcloud.io/features> analysis for
> Apache Hive to promote data-driven code quality improvements and to allow
> reviewers to focus on the conceptual part of the changes by helping them
> spot trivial code smells, security issues and bugs.
>
> SonarCloud has already been adopted and integrated into a few top Apache
> projects like DolphinScheduler <https://dolphinscheduler.apache.org/> and 
> Apache
> Jackrabbit FileVault <https://jackrabbit.apache.org/filevault/>.
>
> For those who don't know, Sonar is a code analysis tool, the initial
> adoption would aim at tracking code quality for the master branch, and
> making the PRs' review process easier, by allowing to compare which
> code/security issues a PR solved/introduced with respect to the main branch.
>
> We already have a Hive-dedicated project under the Apache foundation's
> SonarCloud account: https://sonarcloud.io/project/overview?id=apache_hive.
>
> In what follows I will highlight the main points of interest:
>
> 1) sonar adoption scope:
> For the time being a descriptive approach (just show the analysis and
> associated metrics) could be adopted, delaying a prescriptive one (i.e.,
> quality gates based on the metrics for PRs' mergeability) to a later time
> where we have tested SonarCloud for long enough to judge that it could be a
> sensible move.
>
> 2) false positives:
> Sonar suffers from false positives, but they can be marked as such from
> the web UI: (source https://docs.sonarqube.org/latest/faq/#header-1)
>
> How do I get rid of issues that are False-Positives?
>> False-Positive and Won't Fix
>> You can mark individual issues False Positive or Won't Fix through the
>> issues interface. If you're using PR analysis provided by the Developer
>> Edition, issues marked False Positive or Won't Fix will retain that status
>> after merge. This is the preferred approach.
>
>
>> //NOSONAR
>> For most languages, SonarQube supports the use of the generic mechanism:
>> //NOSONAR at the end of the line of the issue. This will suppress all
>> issues - now and in the future - that might be raised on the line.
>
>
> For the time being, I think that marking false positives via the UI is
> more convenient than using "//NOSONAR", but this can be discussed further.
>
> 3) test code coverage:
>
> Due to the specific structure of the ptest infra (split execution and
> other peculiarities), we are not yet supporting test code coverage, this
> can be added at a later stage, in the meantime all the code quality and
> security metrics are available.
>
> 4) what will be analyzed:
>
> the master branch and each open PR
>
> 5) integration with github:
>
> SonarCloud integrates with GitHub in two ways, the first one is an
> additional item in the list of checks (where you have the spell checking,
> CI result etc.) that will just say Passed/Not Passed and provide a link for
> all the details, the second is a "summary" comment under the PR
> highlighting the main info (you can see an example here
> <https://github.com/apache/hive/pull/3254#issuecomment-1206641629>).
>
> The second integration can be disabled if we consider that the first one
> is enough, and that if we want to dig more we can open the associated link
> for the full analysis in SonarCloud.
>
> 6) analysis runtime:
>
> In CI the full analysis takes around 30 minutes, but this step is executed
> in parallel with the test split tasks and won't add to the total runtime.
> For PRs SonarCloud detects unchanged files and avoids analysing them, so
> the runtime there is expected to be lower.
>
> I'd like to hear your thoughts on this, and I am looking for reviewers for
> the PR https://github.com/apache/hive/pull/3339.
>
> Best regards,
> Alessandro
>


[DISCUSS] SonarCloud integration for Apache Hive

2022-08-08 Thread Alessandro Solimando
Hi community,
in the context of HIVE-26196
 we started considering
the adoption of SonarCloud  analysis for
Apache Hive to promote data-driven code quality improvements and to allow
reviewers to focus on the conceptual part of the changes by helping them
spot trivial code smells, security issues and bugs.

SonarCloud has already been adopted and integrated into a few top Apache
projects like DolphinScheduler 
and Apache
Jackrabbit FileVault .

For those who don't know, Sonar is a code analysis tool. The initial
adoption would aim at tracking code quality for the master branch, and at
making the PR review process easier by letting reviewers compare which
code/security issues a PR solves or introduces with respect to the main branch.

We already have a Hive-dedicated project under the Apache foundation's
SonarCloud account: https://sonarcloud.io/project/overview?id=apache_hive.

In what follows I will highlight the main points of interest:

1) sonar adoption scope:
For the time being, a descriptive approach (just show the analysis and
associated metrics) could be adopted, delaying a prescriptive one (i.e.,
quality gates based on the metrics for PR mergeability) until we have
tested SonarCloud long enough to judge whether it would be a sensible move.

2) false positives:
Sonar suffers from false positives, but they can be marked as such from the
web UI: (source https://docs.sonarqube.org/latest/faq/#header-1)

How do I get rid of issues that are False-Positives?
> False-Positive and Won't Fix
> You can mark individual issues False Positive or Won't Fix through the
> issues interface. If you're using PR analysis provided by the Developer
> Edition, issues marked False Positive or Won't Fix will retain that status
> after merge. This is the preferred approach.


> //NOSONAR
> For most languages, SonarQube supports the use of the generic mechanism:
> //NOSONAR at the end of the line of the issue. This will suppress all
> issues - now and in the future - that might be raised on the line.


For the time being, I think that marking false positives via the UI is more
convenient than using "//NOSONAR", but this can be discussed further.
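To make the "//NOSONAR" mechanism concrete, here is a minimal Java sketch; the class, method, and the rule it would trip are made up for illustration, and only the NOSONAR marker itself is SonarQube's documented syntax:

```java
public class NoSonarExample {

    // Sonar would normally flag a catch block like the one below; the trailing
    // NOSONAR comment suppresses all current and future issues on that line.
    static int parseOrDefault(String s, int fallback) {
        try {
            return Integer.parseInt(s);
        } catch (NumberFormatException e) { // NOSONAR - fallback is intentional
            return fallback;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseOrDefault("42", 0));   // 42
        System.out.println(parseOrDefault("oops", 7)); // 7
    }
}
```

The downside, as noted above, is that the suppression is permanent and invisible in the UI, which is why marking false positives via the web interface seems preferable.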

3) test code coverage:

Due to the specific structure of the ptest infra (split execution and other
peculiarities), we are not yet supporting test code coverage; this can be
added at a later stage. In the meantime, all the code quality and security
metrics are available.

4) what will be analyzed:

the master branch and each open PR

5) integration with github:

SonarCloud integrates with GitHub in two ways. The first is an additional
item in the list of checks (alongside the spell checking, CI result, etc.)
that just says Passed/Not Passed and provides a link to all the details;
the second is a "summary" comment under the PR
highlighting the main info (you can see an example here
).

The second integration can be disabled if we consider that the first one is
enough, and that if we want to dig more we can open the associated link for
the full analysis in SonarCloud.

6) analysis runtime:

In CI the full analysis takes around 30 minutes, but this step is executed
in parallel with the test split tasks and won't add to the total runtime.
For PRs SonarCloud detects unchanged files and avoids analysing them, so
the runtime there is expected to be lower.

I'd like to hear your thoughts on this, and I am looking for reviewers for
the PR https://github.com/apache/hive/pull/3339.

Best regards,
Alessandro


[jira] [Created] (HIVE-26378) Improve error message for masking over complex data types

2022-07-07 Thread Alessandro Solimando (Jira)
Alessandro Solimando created HIVE-26378:
---

 Summary: Improve error message for masking over complex data types
 Key: HIVE-26378
 URL: https://issues.apache.org/jira/browse/HIVE-26378
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2, Security
Affects Versions: 4.0.0-alpha-2
Reporter: Alessandro Solimando
Assignee: Alessandro Solimando






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-26377) Support complex types for masking

2022-07-07 Thread Alessandro Solimando (Jira)
Alessandro Solimando created HIVE-26377:
---

 Summary: Support complex types for masking 
 Key: HIVE-26377
 URL: https://issues.apache.org/jira/browse/HIVE-26377
 Project: Hive
  Issue Type: New Feature
  Components: HiveServer2, Security
Affects Versions: 4.0.0-alpha-2
Reporter: Alessandro Solimando


All the provided masking UDFs work for primitive types only, this ticket tracks 
work for adding support for complex data types too (e.g., array, struct, etc.).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-26313) Aggregate all column statistics into a single field

2022-06-10 Thread Alessandro Solimando (Jira)
Alessandro Solimando created HIVE-26313:
---

 Summary: Aggregate all column statistics into a single field
 Key: HIVE-26313
 URL: https://issues.apache.org/jira/browse/HIVE-26313
 Project: Hive
  Issue Type: Improvement
  Components: Standalone Metastore, Statistics
Affects Versions: 4.0.0-alpha-2
Reporter: Alessandro Solimando


At the moment, column statistics tables in the metastore schema look like this 
(it's similar for _PART_COL_STATS_):

{noformat}
CREATE TABLE "APP"."TAB_COL_STATS"(
"CAT_NAME" VARCHAR(256) NOT NULL,
"DB_NAME" VARCHAR(128) NOT NULL,
"TABLE_NAME" VARCHAR(256) NOT NULL,
"COLUMN_NAME" VARCHAR(767) NOT NULL,
"COLUMN_TYPE" VARCHAR(128) NOT NULL,
"LONG_LOW_VALUE" BIGINT,
"LONG_HIGH_VALUE" BIGINT,
"DOUBLE_LOW_VALUE" DOUBLE,
"DOUBLE_HIGH_VALUE" DOUBLE,
"BIG_DECIMAL_LOW_VALUE" VARCHAR(4000),
"BIG_DECIMAL_HIGH_VALUE" VARCHAR(4000),
"NUM_DISTINCTS" BIGINT,
"NUM_NULLS" BIGINT NOT NULL,
"AVG_COL_LEN" DOUBLE,
"MAX_COL_LEN" BIGINT,
"NUM_TRUES" BIGINT,
"NUM_FALSES" BIGINT,
"LAST_ANALYZED" BIGINT,
"CS_ID" BIGINT NOT NULL,
"TBL_ID" BIGINT NOT NULL,
"BIT_VECTOR" BLOB,
"ENGINE" VARCHAR(128) NOT NULL
);
{noformat}

The idea is to have a single blob named _STATISTICS_ to replace them, as 
follows:

{noformat}
CREATE TABLE "APP"."TAB_COL_STATS"(
"CAT_NAME" VARCHAR(256) NOT NULL,
"DB_NAME" VARCHAR(128) NOT NULL,
"TABLE_NAME" VARCHAR(256) NOT NULL,
"COLUMN_NAME" VARCHAR(767) NOT NULL,
"COLUMN_TYPE" VARCHAR(128) NOT NULL,
"STATISTICS" BLOB,
"LAST_ANALYZED" BIGINT,
"CS_ID" BIGINT NOT NULL,
"TBL_ID" BIGINT NOT NULL,
"ENGINE" VARCHAR(128) NOT NULL
);
{noformat}

The _STATISTICS_ column could store a JSON-encoded string, which will be 
consumed in a "schema-on-read" fashion.

At first, at least the removed column statistics will be encoded in the 
_STATISTICS_ column; since each "consumer" reads only the portion of the 
schema it is interested in, multiple engines (see the _ENGINE_ column) can 
read and write statistics as they deem fit.
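As an illustration of the "schema-on-read" idea, the following is a minimal sketch of how a writer could serialize a few of the removed columns into a JSON STATISTICS blob. The field names and the hand-rolled encoder are hypothetical; a real implementation would presumably use a proper JSON library:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class StatsBlobSketch {

    // Minimal JSON encoding of per-column statistics (numbers and strings only).
    // Field names are illustrative; the real STATISTICS schema would be agreed
    // upon by the engines writing to the ENGINE-qualified rows.
    static String encode(Map<String, Object> stats) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, Object> e : stats.entrySet()) {
            if (!first) {
                sb.append(",");
            }
            first = false;
            sb.append("\"").append(e.getKey()).append("\":");
            Object v = e.getValue();
            if (v instanceof Number) {
                sb.append(v); // numbers are emitted unquoted
            } else {
                sb.append("\"").append(v).append("\"");
            }
        }
        return sb.append("}").toString();
    }

    public static void main(String[] args) {
        Map<String, Object> stats = new LinkedHashMap<>();
        stats.put("numNulls", 12L);
        stats.put("numDistincts", 340L);
        stats.put("longLowValue", -7L);
        stats.put("longHighValue", 991L);
        System.out.println(encode(stats));
        // {"numNulls":12,"numDistincts":340,"longLowValue":-7,"longHighValue":991}
    }
}
```

A reader interested only in, say, numNulls would just pick that key out of the decoded map and ignore the rest, which is what makes the blob extensible without further thrift changes.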

Another advantage is that, if we plan to add more statistics in the future, we 
won't need to change the thrift interface for the metastore again.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (HIVE-26297) Refactoring ColumnStatsAggregator classes to reduce warnings

2022-06-07 Thread Alessandro Solimando (Jira)
Alessandro Solimando created HIVE-26297:
---

 Summary: Refactoring ColumnStatsAggregator classes to reduce 
warnings
 Key: HIVE-26297
 URL: https://issues.apache.org/jira/browse/HIVE-26297
 Project: Hive
  Issue Type: Sub-task
  Components: Standalone Metastore, Statistics
Affects Versions: 4.0.0-alpha-2
Reporter: Alessandro Solimando
Assignee: Alessandro Solimando


The point of reducing warnings is to make it easier to focus on the important ones.

Some of the bugs fixed while writing unit-tests were highlighted as warnings 
(potential NPEs and rounding issues), but it was hard to see them among the 
many other (less severe) warnings.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (HIVE-26277) Add unit tests for ColumnStatsAggregator classes

2022-06-01 Thread Alessandro Solimando (Jira)
Alessandro Solimando created HIVE-26277:
---

 Summary: Add unit tests for ColumnStatsAggregator classes
 Key: HIVE-26277
 URL: https://issues.apache.org/jira/browse/HIVE-26277
 Project: Hive
  Issue Type: Test
  Components: Statistics, Tests
Affects Versions: 4.0.0-alpha-2
Reporter: Alessandro Solimando
Assignee: Alessandro Solimando


We have no unit tests covering these classes, which also happen to contain some 
complicated logic, making the absence of tests even more risky.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (HIVE-26263) Mysql metastore init tests are flaky

2022-05-24 Thread Alessandro Solimando (Jira)
Alessandro Solimando created HIVE-26263:
---

 Summary: Mysql metastore init tests are flaky
 Key: HIVE-26263
 URL: https://issues.apache.org/jira/browse/HIVE-26263
 Project: Hive
  Issue Type: Test
  Components: Testing Infrastructure
Affects Versions: 4.0.0-alpha-2
Reporter: Alessandro Solimando


Similarly to HIVE-26084 (Oracle tests), MySQL tests are failing in the same way.

In both cases we use _:latest_ as docker image version, which is probably not 
ideal.

Reporting the error for future reference:
{noformat}
[2022-05-24T14:07:52.127Z] + sudo tee -a /etc/hosts
[2022-05-24T14:07:52.127Z] + echo 127.0.0.1 dev_mysql
[2022-05-24T14:07:52.127Z] 127.0.0.1 dev_mysql
[2022-05-24T14:07:52.127Z] + . /etc/profile.d/confs.sh
[2022-05-24T14:07:52.127Z] ++ export MAVEN_OPTS=-Xmx2g
[2022-05-24T14:07:52.127Z] ++ MAVEN_OPTS=-Xmx2g
[2022-05-24T14:07:52.127Z] ++ export HADOOP_CONF_DIR=/etc/hadoop
[2022-05-24T14:07:52.127Z] ++ HADOOP_CONF_DIR=/etc/hadoop
[2022-05-24T14:07:52.127Z] ++ export HADOOP_LOG_DIR=/data/log
[2022-05-24T14:07:52.127Z] ++ HADOOP_LOG_DIR=/data/log
[2022-05-24T14:07:52.127Z] ++ export 
'HADOOP_CLASSPATH=/etc/tez/:/active/tez/lib/*:/active/tez/*:/apps/lib/*'
[2022-05-24T14:07:52.127Z] ++ 
HADOOP_CLASSPATH='/etc/tez/:/active/tez/lib/*:/active/tez/*:/apps/lib/*'
[2022-05-24T14:07:52.127Z] ++ export HIVE_CONF_DIR=/etc/hive/
[2022-05-24T14:07:52.127Z] ++ HIVE_CONF_DIR=/etc/hive/
[2022-05-24T14:07:52.127Z] ++ export 
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/active/hive/bin:/active/hadoop/bin:/active/eclipse/:/active/maven/bin/:/active/protobuf/bin:/active/visualvm/bin:/active/kubebuilder/bin:/active/idea/bin
[2022-05-24T14:07:52.127Z] ++ 
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/active/hive/bin:/active/hadoop/bin:/active/eclipse/:/active/maven/bin/:/active/protobuf/bin:/active/visualvm/bin:/active/kubebuilder/bin:/active/idea/bin
[2022-05-24T14:07:52.127Z] ++ . /etc/profile.d/java.sh
[2022-05-24T14:07:52.127Z] +++ export JAVA_HOME=/usr/lib/jvm/zulu-8-amd64/
[2022-05-24T14:07:52.127Z] +++ JAVA_HOME=/usr/lib/jvm/zulu-8-amd64/
[2022-05-24T14:07:52.127Z] + sw hive-dev 
/home/jenkins/agent/workspace/hive-precommit_PR-3317
[2022-05-24T14:07:52.127Z] @ activating: 
/home/jenkins/agent/workspace/hive-precommit_PR-3317/packaging/target/apache-hive-4.0.0-alpha-2-SNAPSHOT-bin/apache-hive-4.0.0-alpha-2-SNAPSHOT-bin/
 for hive
[2022-05-24T14:07:52.127Z] + ping -c2 dev_mysql
[2022-05-24T14:07:52.127Z] PING dev_mysql (127.0.0.1) 56(84) bytes of data.
[2022-05-24T14:07:52.127Z] 64 bytes from localhost (127.0.0.1): icmp_seq=1 
ttl=64 time=0.114 ms
[2022-05-24T14:07:53.107Z] 64 bytes from localhost (127.0.0.1): icmp_seq=2 
ttl=64 time=0.123 ms
[2022-05-24T14:07:53.107Z] 
[2022-05-24T14:07:53.107Z] --- dev_mysql ping statistics ---
[2022-05-24T14:07:53.107Z] 2 packets transmitted, 2 received, 0% packet loss, 
time 49ms
[2022-05-24T14:07:53.107Z] rtt min/avg/max/mdev = 0.114/0.118/0.123/0.011 ms
[2022-05-24T14:07:53.107Z] + export DOCKER_NETWORK=host
[2022-05-24T14:07:53.107Z] + DOCKER_NETWORK=host
[2022-05-24T14:07:53.107Z] + export DBNAME=metastore
[2022-05-24T14:07:53.107Z] + DBNAME=metastore
[2022-05-24T14:07:53.107Z] + reinit_metastore mysql
[2022-05-24T14:07:53.107Z] @ initializing: mysql
[2022-05-24T14:07:53.107Z] metastore database name: metastore
[2022-05-24T14:07:53.381Z] @ starting dev_mysql...
[2022-05-24T14:07:53.382Z] Unable to find image 'mariadb:latest' locally
[2022-05-24T14:07:54.354Z] latest: Pulling from library/mariadb
[2022-05-24T14:07:54.354Z] 125a6e411906: Pulling fs layer
[2022-05-24T14:07:54.354Z] a28b55cc656d: Pulling fs layer
[2022-05-24T14:07:54.354Z] f2325f4e25a1: Pulling fs layer
[2022-05-24T14:07:54.354Z] c6c2d09f748d: Pulling fs layer
[2022-05-24T14:07:54.354Z] af2b4ed853d2: Pulling fs layer
[2022-05-24T14:07:54.354Z] 8394ac6b401e: Pulling fs layer
[2022-05-24T14:07:54.354Z] 5b150cf0c5a7: Pulling fs layer
[2022-05-24T14:07:54.354Z] 1b11b2e20899: Pulling fs layer
[2022-05-24T14:07:54.354Z] 3d35790a91d9: Pulling fs layer
[2022-05-24T14:07:54.354Z] 5e73c7793365: Pulling fs layer
[2022-05-24T14:07:54.354Z] 3d34b9f14ede: Pulling fs layer
[2022-05-24T14:07:54.354Z] c6c2d09f748d: Waiting
[2022-05-24T14:07:54.354Z] 8394ac6b401e: Waiting
[2022-05-24T14:07:54.354Z] 5b150cf0c5a7: Waiting
[2022-05-24T14:07:54.354Z] 3d35790a91d9: Waiting
[2022-05-24T14:07:54.354Z] 5e73c7793365: Waiting
[2022-05-24T14:07:54.354Z] 3d34b9f14ede: Waiting
[2022-05-24T14:07:54.354Z] af2b4ed853d2: Waiting
[2022-05-24T14:07:54.624Z] a28b55cc656d: Verifying Checksum
[2022-05-24T14:07:54.624Z] a28b55cc656d: Download complete
[2022-05-24T14:07:54.624Z] f2325f4e25a1: Verifying Checksum
[2022-05-24T14:07:54.624Z] f2325f4e25a1: Download complete
[2022-05-24T14:07:54.918Z] af2b4ed853d2: Download complete
[2022-05-24T14:07:54.918Z] c6c2d09f748d: Verifying

[jira] [Created] (HIVE-26243) Add vectorized implementation of the 'ds_kll_sketch' UDAF

2022-05-20 Thread Alessandro Solimando (Jira)
Alessandro Solimando created HIVE-26243:
---

 Summary: Add vectorized implementation of the 'ds_kll_sketch' UDAF
 Key: HIVE-26243
 URL: https://issues.apache.org/jira/browse/HIVE-26243
 Project: Hive
  Issue Type: Improvement
  Components: UDF, Vectorization
Affects Versions: 4.0.0-alpha-2
Reporter: Alessandro Solimando
Assignee: Alessandro Solimando


The _ds_kll_sketch_ UDAF does not have a vectorized implementation at the 
moment; the present ticket aims at bridging this gap.

This is particularly important because vectorization has an "all or nothing" 
approach: if this function is used alongside vectorized functions, they won't 
be able to benefit from vectorized execution.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (HIVE-26221) Add histogram-based column statistics

2022-05-11 Thread Alessandro Solimando (Jira)
Alessandro Solimando created HIVE-26221:
---

 Summary: Add histogram-based column statistics
 Key: HIVE-26221
 URL: https://issues.apache.org/jira/browse/HIVE-26221
 Project: Hive
  Issue Type: Improvement
  Components: Statistics
Affects Versions: 4.0.0-alpha-2
Reporter: Alessandro Solimando
Assignee: Alessandro Solimando


Hive does not support histogram statistics, which are particularly useful for 
skewed data (which is very common in practice) and range predicates.

Hive's current selectivity estimation for range predicates is based on a 
hard-coded value of 1/3 (see 
[FilterSelectivityEstimator.java#L138-L144|https://github.com/apache/hive/blob/4622860b8c7dbddaf4c556e65c5039c60da15e82/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/FilterSelectivityEstimator.java#L138-L144]).

The current proposal aims at integrating histogram as an additional column 
statistics, stored into the Hive metastore at the table (or partition) level.

The main requirements for histogram integration are the following:
 * efficiency: the approach must scale and support billions of rows
 * merge-ability: partition-level histograms have to be merged to form 
table-level histograms
 * explicit and configurable trade-off between memory footprint and accuracy

Hive already integrates [KLL data 
sketches|https://datasketches.apache.org/docs/KLL/KLLSketch.html] UDAF. 
Datasketches are small, stateful programs that process massive data-streams and 
can provide approximate answers, with mathematical guarantees, to 
computationally difficult queries orders-of-magnitude faster than traditional, 
exact methods.

We propose to use KLL, and more specifically the cumulative distribution 
function (CDF) as underlying data structure for our histogram statistics.

The current proposal only targets numeric data types (float, integer and 
numeric families), excluding string and temporal data types for the moment.
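To illustrate the merge-ability requirement above, here is a toy Java sketch: partition-level inputs are combined and equi-depth bucket boundaries are derived from the merged data. This computes the histogram exactly and fully in memory purely for illustration; the actual proposal relies on KLL sketches, which support the analogous merge and CDF operations approximately and with a bounded memory footprint:

```java
import java.util.Arrays;

public class HistogramMergeSketch {

    // Exact equi-depth bucket boundaries over a small array (illustration only;
    // KLL would answer the same question approximately via its CDF).
    static double[] equiDepthBounds(double[] values, int buckets) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        double[] bounds = new double[buckets - 1];
        for (int i = 1; i < buckets; i++) {
            bounds[i - 1] = sorted[i * sorted.length / buckets];
        }
        return bounds;
    }

    // Merge-ability: here partition-level inputs are simply concatenated before
    // bucketing; KLL instead merges the per-partition sketches directly.
    static double[] merge(double[] partA, double[] partB, int buckets) {
        double[] all = new double[partA.length + partB.length];
        System.arraycopy(partA, 0, all, 0, partA.length);
        System.arraycopy(partB, 0, all, partA.length, partB.length);
        return equiDepthBounds(all, buckets);
    }

    public static void main(String[] args) {
        double[] p1 = {1, 2, 3, 4};
        double[] p2 = {5, 6, 7, 8};
        System.out.println(Arrays.toString(merge(p1, p2, 4))); // [3.0, 5.0, 7.0]
    }
}
```

With such boundaries, a range predicate's selectivity can be estimated from the fraction of buckets it overlaps, instead of the hard-coded 1/3.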



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


Re: [DISCUSS] End of life for Hive 1.x, 2.x, 3.x

2022-05-09 Thread Alessandro Solimando
Hi Stamatis,
thanks for bringing up this topic, I basically agree on everything you
wrote.

I just wanted to add that this kind of proposal might sound harsh, because
in many contexts upgrading is a complex process, but it's in nobody's
interest to keep release branches that are missing important
fixes/improvements and that might not meet the quality standards that
people expect, as mentioned.

Since we don't yet have a stable 4.x release (only alphas for now), we might
want to keep supporting the 3.x branch until the first stable 4.x release
and EOL the < 3.x branches, WDYT?

Best regards,
Alessandro

On Fri, 6 May 2022 at 23:14, Stamatis Zampetakis  wrote:

> Hi all,
>
> The current master has many critical bug fixes as well as important
> performance improvements that are not backported (and most likely never
> will) to the maintenance branches.
>
> Backporting changes from master usually requires adapting the code and
> tests in questions making it a non-trivial and time consuming task.
>
> The ASF bylaws require PMCs to deliver high quality software which satisfy
> certain criteria. Cutting new releases from maintenance branches with known
> critical bugs is not compliant with the ASF.
>
> CI is unstable in all maintenance branches making the quality of a release
> questionable and merging new PRs rather difficult. Enabling and running it
> frequently in all maintenance branches would require a big amount of
> resources on top of what we already need for master.
>
> History has shown that it is very difficult or impossible to properly
> maintain multiple release branches for Hive.
>
> I think it would be to the best interest of the project if the PMC decided
> to drop support for maintenance branches and focused on releasing
> exclusively from master.
>
> This mail is related to the discussion about the release cadence [1] since
> it would certainly help making Hive releases more regular. I decided to
> start a separate thread to avoid mixing multiple topics together.
>
> Looking forward to your thoughts.
>
> Best,
> Stamatis
>
> [1] https://lists.apache.org/thread/n245dd23kb2v3qrrfp280w3pto89khxj
>
>


Re: excluding jdk.tools for java 11 compatibility

2022-05-05 Thread Alessandro Solimando
Hi again,
actually I managed to exclude the project by using the FQN (I was missing
the "upgrade-acid/" part):

mvn org.sonarsource.scanner.maven:sonar-maven-plugin:3.9.0.2155:sonar \
 -DskipTests -Dit.skipTests -Dmaven.javadoc.skip -pl
'!upgrade-acid,!upgrade-acid/pre-upgrade'

I would still like to hear your opinion about the exclusion, since it will
be a problem when moving to JDK 11 anyway, which I have seen is a blocker
for the 4.0.0 release.

Best regards,
Alessandro

On Thu, 5 May 2022 at 16:38, Alessandro Solimando <
alessandro.solima...@gmail.com> wrote:

> Hi everyone,
> I am working on https://issues.apache.org/jira/browse/HIVE-26196.
>
> As you might know, Sonar analysis must now run with at least JDK 11, and
> when I tried it failed as follows:
>
> [ERROR] Failed to execute goal on project hive-pre-upgrade: Could not
> resolve dependencies for project
> org.apache.hive:hive-pre-upgrade:jar:4.0.0-alpha-2-SNAPSHOT: Could not find
> artifact jdk.tools:jdk.tools:jar:1.7 at specified path
> /Users/asolimando/.sdkman/candidates/java/11.0.11.hs-adpt/../lib/tools.jar
> -> [Help 1]
>
> The issue is located here:
>
> https://github.com/apache/hive/blob/master/upgrade-acid/pre-upgrade/pom.xml#L52-L75
>
> Adding an exclusion on jdk.tools as follows fixes the problem:
> 
> <exclusion>
>   <groupId>jdk.tools</groupId>
>   <artifactId>jdk.tools</artifactId>
> </exclusion>
> 
>
> I guess it's safe to add this exclusion, since the dependency's scope is
> "provided" (meaning that the dependency is expected to be on the classpath
> already at runtime, so the exclusion won't interfere with that, and nothing
> is packaged differently by Hive due to the exclusion), and both compilation
> under JDK 8 and the run of the full test suite in CI were OK.
>
> Do you guys see any problem with this approach?
>
> Before this solution, I have tried to add the "skip.sonar" maven property
> (as per
> https://docs.sonarqube.org/latest/analysis/scan/sonarscanner-for-maven/)
> but it is ignored.
>
> Another approach would have been to exclude the submodule from the Sonar
> analysis using the Maven reactor, but I can't seem to find the right name
> for the module: "upgrade-acid" is excluded (yet the submodule mentioned
> here still gets processed and fails), while "pre-upgrade" is not found and
> fails as follows:
>
> $ mvn org.sonarsource.scanner.maven:sonar-maven-plugin:3.9.0.2155:sonar \
>  -DskipTests -Dit.skipTests -Dmaven.javadoc.skip -pl '!pre-upgrade'
> [INFO] Scanning for projects...
> [ERROR] [ERROR] Could not find the selected project in the reactor:
> pre-upgrade @
> [ERROR] Could not find the selected project in the reactor: pre-upgrade ->
> [Help 1]
> [ERROR]
> [ERROR] To see the full stack trace of the errors, re-run Maven with the
> -e switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> [ERROR]
> [ERROR] For more information about the errors and possible solutions,
> please read the following articles:
> [ERROR] [Help 1]
> http://cwiki.apache.org/confluence/display/MAVEN/MavenExecutionException
>
> Best regards,
> Alessandro
>


excluding jdk.tools for java 11 compatibility

2022-05-05 Thread Alessandro Solimando
Hi everyone,
I am working on https://issues.apache.org/jira/browse/HIVE-26196.

As you might know, Sonar analysis must now run with at least JDK 11, and
when I tried it failed as follows:

[ERROR] Failed to execute goal on project hive-pre-upgrade: Could not
resolve dependencies for project
org.apache.hive:hive-pre-upgrade:jar:4.0.0-alpha-2-SNAPSHOT: Could not find
artifact jdk.tools:jdk.tools:jar:1.7 at specified path
/Users/asolimando/.sdkman/candidates/java/11.0.11.hs-adpt/../lib/tools.jar
-> [Help 1]

The issue is located here:
https://github.com/apache/hive/blob/master/upgrade-acid/pre-upgrade/pom.xml#L52-L75

Adding an exclusion on jdk.tools as follows fixes the problem:

<exclusion>
  <groupId>jdk.tools</groupId>
  <artifactId>jdk.tools</artifactId>
</exclusion>


I guess it's safe to add this exclusion, since the dependency's scope is
"provided" (meaning that the dependency is expected to be on the classpath
already at runtime, so the exclusion won't interfere with that, and nothing
is packaged differently by Hive due to the exclusion), and both compilation
under JDK 8 and the run of the full test suite in CI were OK.

Do you guys see any problem with this approach?

Before this solution, I have tried to add the "skip.sonar" maven property
(as per
https://docs.sonarqube.org/latest/analysis/scan/sonarscanner-for-maven/)
but it is ignored.

Another approach would have been to exclude the submodule from the Sonar
analysis using the Maven reactor, but I can't seem to find the right name
for the module: "upgrade-acid" is excluded (yet the submodule mentioned
here still gets processed and fails), while "pre-upgrade" is not found and
fails as follows:

$ mvn org.sonarsource.scanner.maven:sonar-maven-plugin:3.9.0.2155:sonar \
 -DskipTests -Dit.skipTests -Dmaven.javadoc.skip -pl '!pre-upgrade'
[INFO] Scanning for projects...
[ERROR] [ERROR] Could not find the selected project in the reactor:
pre-upgrade @
[ERROR] Could not find the selected project in the reactor: pre-upgrade ->
[Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions,
please read the following articles:
[ERROR] [Help 1]
http://cwiki.apache.org/confluence/display/MAVEN/MavenExecutionException

Best regards,
Alessandro


[jira] [Created] (HIVE-26196) Integrate Sonarcloud analysis for master branch

2022-05-02 Thread Alessandro Solimando (Jira)
Alessandro Solimando created HIVE-26196:
---

 Summary: Integrate Sonarcloud analysis for master branch
 Key: HIVE-26196
 URL: https://issues.apache.org/jira/browse/HIVE-26196
 Project: Hive
  Issue Type: Improvement
  Components: Build Infrastructure
Affects Versions: 4.0.0-alpha-2
Reporter: Alessandro Solimando
Assignee: Alessandro Solimando


The aim of the ticket is to integrate SonarCloud analysis for the master branch.

The ticket does not cover:
 * test coverage
 * analysis on PRs and other branches

Those aspects can be added in follow-up tickets, if there is enough interest.

From preliminary tests, the analysis step requires 30 additional minutes for 
the pipeline.

The idea for this first integration is to track code quality metrics over new 
commits in the master branch, without any quality gate rules (i.e., the 
analysis will never fail, independently of the values of the quality metrics).

An example of analysis is available in my personal Sonar account: 
[https://sonarcloud.io/summary/new_code?id=asolimando_hive]

ASF offers SonarCloud accounts for Apache projects, and Hive already has one. 
To complete the present ticket, somebody with admin permissions in that repo 
should generate an authentication token, which should replace the 
_SONAR_TOKEN_ secret.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (HIVE-26150) OrcRawRecordMerger reads each row twice

2022-04-19 Thread Alessandro Solimando (Jira)
Alessandro Solimando created HIVE-26150:
---

 Summary: OrcRawRecordMerger reads each row twice
 Key: HIVE-26150
 URL: https://issues.apache.org/jira/browse/HIVE-26150
 Project: Hive
  Issue Type: Bug
  Components: ORC, Transactions
Affects Versions: 4.0.0-alpha-2
Reporter: Alessandro Solimando


OrcRawRecordMerger reads each row twice; the issue does not surface because 
the merger is only used with the "collapseEvents" parameter set to true, which 
filters out one of the two rows.

collapseEvents=true and collapseEvents=false should produce the same result: 
in the current ACID implementation each event has a distinct rowId, so two 
identical rows cannot legitimately occur; they appear only because of this bug.

In order to reproduce the issue, it is sufficient to set the second parameter 
to false 
[here|https://github.com/apache/hive/blob/61d4ff2be48b20df9fd24692c372ee9c2606babe/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java#L2103-L2106],
 and run tests in TestOrcRawRecordMerger and observe two tests failing.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HIVE-26147) OrcRawRecordMerger throws NPE when hive.acid.key.index is missing for an acid file

2022-04-16 Thread Alessandro Solimando (Jira)
Alessandro Solimando created HIVE-26147:
---

 Summary: OrcRawRecordMerger throws NPE when hive.acid.key.index is 
missing for an acid file
 Key: HIVE-26147
 URL: https://issues.apache.org/jira/browse/HIVE-26147
 Project: Hive
  Issue Type: Bug
  Components: ORC, Transactions
Affects Versions: 4.0.0-alpha-2
Reporter: Alessandro Solimando
Assignee: Alessandro Solimando


When _hive.acid.key.index_ is missing for an acid ORC file _OrcRawRecordMerger_ 
throws as follows:

{noformat}
Caused by: java.lang.NullPointerException
at 
org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.discoverKeyBounds(OrcRawRecordMerger.java:795)
 ~[hive-exec-4.0.0-alpha-2-SNAPS
HOT.jar:4.0.0-alpha-2-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.(OrcRawRecordMerger.java:1053)
 ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.
0.0-alpha-2-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getReader(OrcInputFormat.java:2096)
 ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-a
lpha-2-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1991)
 ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4
.0.0-alpha-2-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit.getRecordReader(FetchOperator.java:769)
 ~[hive-exec-4.0.0-alpha
-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:335)
 ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-
alpha-2-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:560) 
~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha
-2-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:529) 
~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-
SNAPSHOT]
at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:150) 
~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.Driver.getFetchingTableResults(Driver.java:719) 
~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNA
PSHOT]
at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:671) 
~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.getResults(ReExecDriver.java:233) 
~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha
-2-SNAPSHOT]
at 
org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:489)
 ~[hive-service-4.0.0-alpha-2-SNAPSHOT.jar:
4.0.0-alpha-2-SNAPSHOT]
... 24 more
{noformat}

For this situation to happen, the ORC file must have more than one stripe, and 
the offset of the element to seek should locate it beyond the first stripe 
but before the tail one, as the code clearly suggests:

{code:java}
if (firstStripe != 0) {
  minKey = keyIndex[firstStripe - 1];
}
if (!isTail) {
  maxKey = keyIndex[firstStripe + stripeCount - 1];
}
{code}

However, in the context where the original issue was detected, the NPE was 
triggered even by a simple "select *" over a table whose ORC files were 
missing the _hive.acid.key.index_ metadata, but it never failed for ORC files 
with a single stripe. The file was generated after a major compaction of ACID 
and non-ACID data.

In order to force an offset located in a stripe in the middle, one can use the 
following query, knowing in what stripe a particular value exists:

{code:sql}
select * from $table where c = $value
{code}

_OrcRawRecordMerger_ should simply leave the min and max keys as null when 
the _hive.acid.key.index_ metadata is missing.
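A hedged sketch of that proposed behavior follows; the names are simplified stand-ins, not the actual OrcRawRecordMerger code. When the key index is absent, the bounds simply stay null, so the reader falls back to an unbounded scan instead of throwing an NPE:

```java
public class KeyBoundsSketch {

    // Simplified stand-in for the RecordIdentifier keys used by the merger.
    static final class Key {
        final long rowId;
        Key(long rowId) { this.rowId = rowId; }
    }

    // Sketch of the fix: when the hive.acid.key.index metadata is absent
    // (keyIndex == null), keep minKey/maxKey as null rather than dereferencing
    // the missing index; null bounds mean "scan the whole stripe range".
    static Key[] discoverKeyBounds(Key[] keyIndex, int firstStripe,
                                   int stripeCount, boolean isTail) {
        Key minKey = null;
        Key maxKey = null;
        if (keyIndex != null) {
            if (firstStripe != 0) {
                minKey = keyIndex[firstStripe - 1];
            }
            if (!isTail) {
                maxKey = keyIndex[firstStripe + stripeCount - 1];
            }
        }
        return new Key[] { minKey, maxKey };
    }

    public static void main(String[] args) {
        // Missing index: both bounds stay null, no NPE.
        Key[] bounds = discoverKeyBounds(null, 1, 2, false);
        System.out.println(bounds[0] == null && bounds[1] == null); // true
    }
}
```

With an index present, the bounds are computed exactly as in the snippet quoted above; only the null-index guard is new.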



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HIVE-26146) Handle missing hive.acid.key.index in the fixacidkeyindex utility

2022-04-16 Thread Alessandro Solimando (Jira)
Alessandro Solimando created HIVE-26146:
---

 Summary: Handle missing hive.acid.key.index in the fixacidkeyindex 
utility
 Key: HIVE-26146
 URL: https://issues.apache.org/jira/browse/HIVE-26146
 Project: Hive
  Issue Type: Improvement
  Components: ORC, Transactions
Affects Versions: 4.0.0-alpha-2
Reporter: Alessandro Solimando
Assignee: Alessandro Solimando


There is a utility in Hive which can validate/fix a corrupted 
_hive.acid.key.index_: 

{code:bash}
hive --service fixacidkeyindex
{code}

At the moment the utility throws a NPE if the _hive.acid.key.index_ metadata 
entry is missing:

{noformat}
ERROR checking /hive-dev-box/multistripe_ko_acid.orc
java.lang.NullPointerException
at 
org.apache.hadoop.hive.ql.io.orc.FixAcidKeyIndex.validate(FixAcidKeyIndex.java:183)
at 
org.apache.hadoop.hive.ql.io.orc.FixAcidKeyIndex.checkFile(FixAcidKeyIndex.java:147)
at 
org.apache.hadoop.hive.ql.io.orc.FixAcidKeyIndex.checkFiles(FixAcidKeyIndex.java:130)
at 
org.apache.hadoop.hive.ql.io.orc.FixAcidKeyIndex.main(FixAcidKeyIndex.java:106)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:308)
at org.apache.hadoop.util.RunJar.main(RunJar.java:222)
{noformat}

The aim of this ticket is to handle this case so that the metadata entry can be 
re-generated even when it is missing.
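The proposed behaviour amounts to treating a missing index entry as "needs 
fixing" rather than crashing. A toy sketch (Python, purely illustrative; the 
well-formedness check here is a stand-in for the utility's real validation):

```python
def needs_fixing(key_index):
    """Report whether the hive.acid.key.index entry must be regenerated.

    A missing (None) entry is reported as fixable instead of raising,
    and a present index is checked for well-formedness (here, a toy
    check: entries must be non-decreasing).
    """
    if key_index is None:
        return True  # missing entry: regenerate instead of NPE
    return any(a > b for a, b in zip(key_index, key_index[1:]))
```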





[jira] [Created] (HIVE-26125) sysdb fails with mysql as metastore db

2022-04-07 Thread Alessandro Solimando (Jira)
Alessandro Solimando created HIVE-26125:
---

 Summary: sysdb fails with mysql as metastore db
 Key: HIVE-26125
 URL: https://issues.apache.org/jira/browse/HIVE-26125
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 4.0.0-alpha-2
Reporter: Alessandro Solimando


---
Test set: org.apache.hadoop.hive.cli.TestMysqlMetastoreCliDriver
---
Tests run: 3, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 282.638 s <<< 
FAILURE! - in org.apache.hadoop.hive.cli.TestMysqlMetastoreCliDriver
org.apache.hadoop.hive.cli.TestMysqlMetastoreCliDriver.testCliDriver[strict_managed_tables_sysdb]
  Time elapsed: 41.104 s  <<< FAILURE!
java.lang.AssertionError: 
Client execution failed with error code = 2 
running 

select tbl_name, tbl_type from tbls where tbl_name like 'smt_sysdb%' order by 
tbl_name 
fname=strict_managed_tables_sysdb.q

See ./ql/target/tmp/log/hive.log or ./itests/qtest/target/tmp/log/hive.log, or 
check ./ql/target/surefire-reports or ./itests/qtest/target/surefire-reports/ 
for specific test cases logs.
 org.apache.hadoop.hive.ql.metadata.HiveException: Vertex failed, 
vertexName=Map 1, vertexId=vertex_1649344918728_0001_33_00, diagnostics=[Task 
failed, taskId=task_1649344918728_0001_33_00_00, diagnostics=[TaskAttempt 0 
failed, info=[Error: Error while running task ( failure ) : 
attempt_1649344918728_0001_33_00_00_0:java.lang.RuntimeException: 
org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: 
java.io.IOException: 
org.apache.hive.storage.jdbc.exception.HiveJdbcDatabaseAccessException: Caught 
exception while trying to execute query:You have an error in your SQL syntax; 
check the manual that corresponds to your MySQL server version for the right 
syntax to use near '"TBLS"' at line 14
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:348)
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:276)
at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:381)
at 
org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:82)
at 
org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:69)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
at 
org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:69)
at 
org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:39)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at 
org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:118)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
java.io.IOException: java.io.IOException: 
org.apache.hive.storage.jdbc.exception.HiveJdbcDatabaseAccessException: Caught 
exception while trying to execute query:You have an error in your SQL syntax; 
check the manual that corresponds to your MySQL server version for the right 
syntax to use near '"TBLS"' at line 14
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:89)
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:414)
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:293)
... 15 more
Caused by: java.io.IOException: java.io.IOException: 
org.apache.hive.storage.jdbc.exception.HiveJdbcDatabaseAccessException: Caught 
exception while trying to execute query:You have an error in your SQL syntax; 
check the manual that corresponds to your MySQL server version for the right 
syntax to use near '"TBLS"' at line 14
at 
org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
at 
org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
at 
org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:380)
at 
org.
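The failure comes from sending a double-quoted identifier ("TBLS") to MySQL, 
which by default only accepts backtick-quoted identifiers (double quotes 
require the ANSI_QUOTES sql_mode). A hypothetical per-dialect quoting helper, 
not Hive's actual fix, illustrates the difference:

```python
def quote_identifier(name, db_product):
    """Quote a SQL identifier for the target database product."""
    if db_product == "mysql":
        # MySQL default: backticks; '"' only works with sql_mode=ANSI_QUOTES
        return "`" + name.replace("`", "``") + "`"
    # ANSI SQL double-quoted identifier (Derby, Postgres, ...)
    return '"' + name.replace('"', '""') + '"'
```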

[jira] [Created] (HIVE-26123) Introduce test coverage for sysdb for the different metastores

2022-04-06 Thread Alessandro Solimando (Jira)
Alessandro Solimando created HIVE-26123:
---

 Summary: Introduce test coverage for sysdb for the different 
metastores
 Key: HIVE-26123
 URL: https://issues.apache.org/jira/browse/HIVE-26123
 Project: Hive
  Issue Type: Test
  Components: Testing Infrastructure
Affects Versions: 4.0.0-alpha-1
Reporter: Alessandro Solimando
Assignee: Alessandro Solimando
 Fix For: 4.0.0-alpha-2


_sysdb_ provides a view over (some) metastore tables from Hive via JDBC 
queries. Existing tests run only against Derby, meaning that changes to the 
sysdb query mappings are not covered by CI.

The present ticket aims at bridging this gap by introducing test coverage for 
the different supported metastores for sysdb.





[jira] [Created] (HIVE-26122) Factorize out common docker code between DatabaseRule and AbstractExternalDB

2022-04-06 Thread Alessandro Solimando (Jira)
Alessandro Solimando created HIVE-26122:
---

 Summary: Factorize out common docker code between DatabaseRule and 
AbstractExternalDB
 Key: HIVE-26122
 URL: https://issues.apache.org/jira/browse/HIVE-26122
 Project: Hive
  Issue Type: Improvement
  Components: Testing Infrastructure
Affects Versions: 4.0.0-alpha-1
Reporter: Alessandro Solimando
Assignee: Alessandro Solimando
 Fix For: 4.0.0-alpha-2


Currently there is a lot of shared code between the two classes, which could be 
extracted into a utility class called DockerUtils, since all this code pertains 
to Docker.





[jira] [Created] (HIVE-26113) Synchronize HMS tables with metastore tables

2022-04-04 Thread Alessandro Solimando (Jira)
Alessandro Solimando created HIVE-26113:
---

 Summary: Synchronize HMS tables with metastore tables
 Key: HIVE-26113
 URL: https://issues.apache.org/jira/browse/HIVE-26113
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 4.0.0-alpha-1
Reporter: Alessandro Solimando
Assignee: Alessandro Solimando
 Fix For: 4.0.0-alpha-2


HMS tables should be in sync with those exposed by Hive metastore via _sysdb_.

At the moment there are some discrepancies for the existing tables; the present 
ticket aims at bridging this gap.





[jira] [Created] (HIVE-26112) Missing scripts for metastore

2022-04-04 Thread Alessandro Solimando (Jira)
Alessandro Solimando created HIVE-26112:
---

 Summary: Missing scripts for metastore
 Key: HIVE-26112
 URL: https://issues.apache.org/jira/browse/HIVE-26112
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 4.0.0-alpha-2
Reporter: Alessandro Solimando


The versions of the scripts for _metastore_ and _standalone-metastore_ should 
be in sync, but at the moment the metastore side is missing the 3.2.0 scripts 
(in _metastore/scripts/upgrade/hive_), while they are present in the 
_standalone-metastore_ counterpart(s):

* upgrade-3.1.0-to-3.2.0.derby.sql
* upgrade-3.2.0-to-4.0.0-alpha-1.derby.sql
* upgrade-4.0.0-alpha-1-to-4.0.0-alpha-2.hive.sql





[jira] [Created] (HIVE-26066) Remove deprecated GenericUDAFComputeStats

2022-03-24 Thread Alessandro Solimando (Jira)
Alessandro Solimando created HIVE-26066:
---

 Summary: Remove deprecated GenericUDAFComputeStats
 Key: HIVE-26066
 URL: https://issues.apache.org/jira/browse/HIVE-26066
 Project: Hive
  Issue Type: Task
  Components: Statistics
Affects Versions: 4.0.0
Reporter: Alessandro Solimando


The function has been deprecated and is currently not used (it is still 
registered in the function registry and covered by some qtests, though).

As soon as we move to the next release cycle, the function can be removed.





Re: Failing tests

2022-03-16 Thread Alessandro Solimando
Thanks Peter for taking care of this, I can confirm that "master" is good
now!

On Wed, 16 Mar 2022 at 12:30, Peter Vary  wrote:

> The tests should be green now...
>
> > On 2022. Mar 15., at 14:45, Stamatis Zampetakis 
> wrote:
> >
> > Hello,
> >
> > +1 to everything Peter said.
> >
> > Moreover a few other things/reminders which could make our life easier.
> >
> > No commits/merges over broken master.
> >
> > If there is a non-flaky failure in master then whoever notices it first,
> > please create a JIRA and add any relevant info. This will notify everyone
> > that there is a problem and it will also avoid having multiple people
> > looking at it.
> >
> > If there is failure in master or during precommit tests that seems to be
> > intermittent please run the flaky checker job [1]. If the result shows
> it's
> > flaky, log a JIRA and raise a PR disabling the test if there is no quick
> > fix available.
> >
> > Rerun precommit tests before merging a pull request if the latest
> precommit
> > run is old (e.g., greater than 72h).
> >
> > Best,
> > Stamatis
> >
> > [1] http://ci.hive.apache.org/job/hive-flaky-check/
> >
> >
> > On Mon, Mar 14, 2022 at 9:31 PM Peter Vary 
> > wrote:
> >
> >> If I remember correctly the decision was to not to merge changes with
> >> failing PreCommit tests.
> >>
> >> Lately, because of a mistake where the change was only partially merged,
> >> we had a failing test.
> >> I have tried to fix this issue and confirm it by rerunning the tests,
> but
> >> the check failed again. Now it failed with some different tests,
> because in
> >> the meantime there were some more failing tests were committed to
> master in
> >> the meantime.
> >>
> >> I think it would be good to stick to the previous decision and we should
> >> only commit changes if all of the tests are green. Also if there are
> some
> >> issues then it would be good to take the time to fix the failures or
> revert
> >> the changes causing the issues.
> >>
> >> Thanks,
> >> Peter
> >>
> >>
> >>
>
>


Re: Start releasing the master branch

2022-03-01 Thread Alessandro Solimando
;>> * standalone-metastore
> >>>> ** 4.0.0-SNAPSHOT in the repo
> >>>> ** last release is 3.1.2
> >>>> * hive
> >>>> ** 4.0.0-SNAPSHOT in the repo
> >>>> ** last release is 3.1.2
> >>>>
> >>>> Regarding the actual version number I'm not entirely sure where we
> should
> >>>> start the numbering - that's why I was referring to it as Hive-X in my
> >>>> first letter.
> >>>>
> >>>> I think the key point here would be to start shipping releases
> regularily
> >>>> and not the actual version number we will use - I'll kinda open to any
> >>>> versioning scheme which
> >>>> reflects that this is a newer release than 3.1.2.
> >>>>
> >>>> I could imagine the following ones:
> >>>> (A) start with something less expected; but keep 3 in the prefix to
> >>>> reflect that this is not yet 4.0
> >>>>  I can imagine the following numbers:
> >>>>  3.900.0, 3.901.0, ...
> >>>>  3.9.0, 3.9.1, ...
> >>>> (B) start 4.0.0
> >>>>  4.0.0, 4.1.0, ...
> >>>> (C) jump to some calendar based version number like 2022.2.9
> >>>>  trunk based development has pros and cons...making a move like
> this
> >>>> irreversibly pledges trunk based development; and makes release
> branches
> >>>> hard to introduce
> >>>> (X) somewhat orthogonal is to (also) use some suffixes
> >>>>  4.0.0-alpha1, 4.0.0-alpha2, 4.0.0-beta1
> >>>>  this is probably the most tempting to use - but this versioning
> >>>> schema with a non-changing MINOR and PATCH number will
> >>>>  also suggest that the actual software is fully compatible - and
> only
> >>>> bugs are being fixed - which will not be true...
> >>>>
> >>>> I really like the idea to suffix these releases with alpha or beta -
> >>>> which
> >>>> will communicate our level commitment that these are not 100%
> production
> >>>> ready artifacts.
> >>>>
> >>>> I think we could fix HIVE-25665; and probably experiment with
> >>>> 4.0.0-alpha1
> >>>> for start...
> >>>>
> >>>>> This also means there should *not* be a branch-4 after releasing Hive
> >>>> 4.0
> >>>>> and let that diverge (and becomes the next, super-ignored branch-3),
> >>>> correct; no need to keep a branch we don't maintain...but in any case
> I
> >>>> think we can postpone this decision until there will be something to
> >>>> release... :)
> >>>>
> >>>> cheers,
> >>>> Zoltan
> >>>>
> >>>>
> >>>>
> >>>> On 2/9/22 10:23 AM, László Bodor wrote:
> >>>>> Hi All!
> >>>>>
> >>>>> A purely technical question: what will the SNAPSHOT version become
> after
> >>>>> releasing Hive 4.0.0? I think this is important, as it defines and
> >>>> reflects
> >>>>> the future release plans.
> >>>>>
> >>>>> Currently, it's 4.0.0-SNAPSHOT, I guess it's since Hive 3.0 +
> branch-3.
> >>>>> Hive is an evolving and super-active project: if we want to make
> regular
> >>>>> releases, we should simply release Hive 4.0 and bump pom to
> >>>> 4.1.0-SNAPSHOT,
> >>>>> which clearly says that we can release Hive 4.1 anytime we want,
> without
> >>>>> being frustrated about "whether we included enough cool stuff to
> release
> >>>>> 5.0".
> >>>>>
> >>>>> This also means there should *not* be a branch-4 after releasing
> Hive
> >>>>> 4.0
> >>>>> and let that diverge (and becomes the next, super-ignored branch-3),
> >>>>> only
> >>>>> when we end up bringing a minor backward-incompatible thing that
> needs a
> >>>>> 4.0.x, and when it happens, we'll create *branch-4.0 *on demand. For
> me,
> >>>> a
> >>>>> branch called *branch-4.0* doesn't imply either I can expect cool
> >>>> releases
> >>>>> in the future from there or the branch is maintained and tries to be
> in
> >>>>> sync with the *master*.
>

[jira] [Created] (HIVE-25974) Drop HiveFilterMergeRule and use FilterMergeRule from Calcite

2022-02-23 Thread Alessandro Solimando (Jira)
Alessandro Solimando created HIVE-25974:
---

 Summary: Drop HiveFilterMergeRule and use FilterMergeRule from 
Calcite
 Key: HIVE-25974
 URL: https://issues.apache.org/jira/browse/HIVE-25974
 Project: Hive
  Issue Type: Bug
  Components: CBO
Affects Versions: 4.0.0
Reporter: Alessandro Solimando


HiveFilterMergeRule is a copy of FilterMergeRule which was needed since the 
latter did not simplify/flatten before creating the merged filter.

This behaviour has been fixed in CALCITE-3982 (released in 1.23), so it seems 
that the Hive rule can be removed now.
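The behaviour the ticket relies on can be shown with a toy model (Python, 
purely illustrative; Calcite's actual FilterMergeRule works on RexNode trees): 
merging two stacked filters ANDs their conditions and then flattens the 
result, which is the simplification step CALCITE-3982 added.

```python
def flatten_and(pred):
    """Flatten nested ("AND", ...) tuples into a single n-ary AND."""
    if isinstance(pred, tuple) and pred and pred[0] == "AND":
        children = []
        for child in pred[1:]:
            flat = flatten_and(child)
            if isinstance(flat, tuple) and flat and flat[0] == "AND":
                children.extend(flat[1:])  # splice nested AND children
            else:
                children.append(flat)
        return ("AND", *children)
    return pred

def merge_filters(outer_condition, inner_condition):
    """Merge two stacked filters into one, flattening the AND tree
    (the step the pre-CALCITE-3982 FilterMergeRule lacked)."""
    return flatten_and(("AND", outer_condition, inner_condition))
```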





[jira] [Created] (HIVE-25966) Align HiveRelMdPredicates getPredicates() for Project with RelMdPredicates

2022-02-17 Thread Alessandro Solimando (Jira)
Alessandro Solimando created HIVE-25966:
---

 Summary: Align HiveRelMdPredicates getPredicates() for Project 
with RelMdPredicates
 Key: HIVE-25966
 URL: https://issues.apache.org/jira/browse/HIVE-25966
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Affects Versions: 4.0.0
Reporter: Alessandro Solimando
Assignee: Alessandro Solimando


The forked version of _HiveRelMdPredicates::getPredicates(Projection ...)_ 
should be aligned with the current version of _RelMdPredicates_ in order to 
facilitate dropping it in a later step (the difference at the moment lies in 
the visitor used to identify constants, which behaves slightly differently 
between the two).

The ticket aims at refactoring the method to bring it close to the current 
version in Calcite.





[jira] [Created] (HIVE-25953) Unify getPredicates() for Join between HiveRelMdPredicates and RelMdPredicates

2022-02-11 Thread Alessandro Solimando (Jira)
Alessandro Solimando created HIVE-25953:
---

 Summary: Unify getPredicates() for Join between 
HiveRelMdPredicates and RelMdPredicates
 Key: HIVE-25953
 URL: https://issues.apache.org/jira/browse/HIVE-25953
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Affects Versions: 4.0.0
Reporter: Alessandro Solimando
Assignee: Alessandro Solimando


The goal of the ticket is to unify the two implementations and remove the 
override in HiveRelMdPredicates. 

At the moment, the main blocker is that the Hive variant still relies on 
comparing RexNodes via their String digests, while Calcite does not need that 
anymore.





[jira] [Created] (HIVE-25952) Unify getPredicates() for Projection between HiveRelMdPredicates and RelMdPredicates

2022-02-11 Thread Alessandro Solimando (Jira)
Alessandro Solimando created HIVE-25952:
---

 Summary: Unify getPredicates() for Projection between 
HiveRelMdPredicates and RelMdPredicates
 Key: HIVE-25952
 URL: https://issues.apache.org/jira/browse/HIVE-25952
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Affects Versions: 4.0.0
Reporter: Alessandro Solimando
Assignee: Alessandro Solimando


There are some differences in this method between Hive and Calcite; the idea of 
this ticket is to unify the two methods, and then drop the override in 
HiveRelMdPredicates in favour of the method in RelMdPredicates.





[jira] [Created] (HIVE-25951) Re-use methods from RelMdPredicates in HiveRelMdPredicates

2022-02-11 Thread Alessandro Solimando (Jira)
Alessandro Solimando created HIVE-25951:
---

 Summary: Re-use methods from RelMdPredicates in HiveRelMdPredicates
 Key: HIVE-25951
 URL: https://issues.apache.org/jira/browse/HIVE-25951
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Affects Versions: 4.0.0
Reporter: Alessandro Solimando
Assignee: Alessandro Solimando


This ticket makes HiveRelMdPredicates extend RelMdPredicates, removes the 
duplicate methods that share the same behaviour, and overrides only those for 
which there is a difference.





Re: Start releasing the master branch

2022-02-08 Thread Alessandro Solimando
Hello everyone,
thank you for starting this discussion.

I agree that releasing the master branch regularly and sufficiently often
is welcome and vital for the health of the community.

It would be great to hear from others too, especially PMC members and
committers, but even simple contributors/followers as myself.

Best regards,
Alessandro

On Wed, 2 Feb 2022 at 12:22, Stamatis Zampetakis  wrote:

> Hello,
>
> Thanks for starting the discussion Zoltan.
>
> I strongly believe that it is important to have regular and often releases
> otherwise people will create and maintain separate Hive forks.
> The latter is not good for the project and the community may lose valuable
> members because of it.
>
> Going forward I fully agree that there is no point bringing up strong
> blockers for the next release. For sure there are many backward
> incompatible changes and possibly unstable features but unless we get a
> release out it will be difficult to determine what is broken and what needs
> to be fixed.
>
> Due to the big number of changes that are going to appear in the next
> version I would suggest using the terms Hive X-alpha, Hive X-beta for the
> first few releases. This will make it clear to the end users that they need
> to be careful when upgrading from an older version and it will give us a
> bit more time and freedom to treat issues that the users will likely
> discover.
>
> The only real blocker that we may want to treat is HIVE-25665 [1] but we
> can continue the discussion under that ticket and re-evaluate if necessary,
>
> Best,
> Stamatis
>
> [1] https://issues.apache.org/jira/browse/HIVE-25665
>
>
> On Tue, Feb 1, 2022 at 5:03 PM Zoltan Haindrich  wrote:
>
> > Hey All,
> >
> > We didn't made a release for a long time now; (3.1.2 was released on 26
> > August 2019) - and I think because we didn't made that many branch-3
> > releases; not too many fixes
> > were ported there - which made that release branch kinda erode away.
> >
> > We have a lot of new features/changes in the current master.
> > I think instead of aiming for big feature-packed releases we should aim
> > for making a regular release every few months - we should make regular
> > releases which people could
> > install and use.
> > After all releasing Hive after more than 2 years would be big step
> forward
> > in itself alone - we have so many improvements that I can't even count...
> >
> > But I may know not every aspects of the project / states of some internal
> > features - so I would like to ask you:
> > What would be the bare minimum requirements before we could release the
> > current master as Hive X?
> >
> > There are many nice-to-have-s like:
> > * hadoop upgrade
> > * jdk11
> > * remove HoS or MR
> > * ?
> > but I don't think these are blockers...we can make any of these in the
> > next release if we start making them...
> >
> > cheers,
> > Zoltan
> >
>


[jira] [Created] (HIVE-25940) Add 'set' variant to print only properties with a non-default value

2022-02-08 Thread Alessandro Solimando (Jira)
Alessandro Solimando created HIVE-25940:
---

 Summary: Add 'set' variant to print only properties with a 
non-default value
 Key: HIVE-25940
 URL: https://issues.apache.org/jira/browse/HIVE-25940
 Project: Hive
  Issue Type: Improvement
  Components: CLI
Affects Versions: 4.0.0
Reporter: Alessandro Solimando


The goal of the ticket is to add a variant of the "set" command that prints 
only the properties for which a non-default value is assigned.
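The idea can be sketched as a dictionary diff (illustrative Python; Hive's 
actual HiveConf API differs):

```python
def non_default_settings(current, defaults):
    """Return only the properties whose current value differs from
    the shipped default (or that have no default at all)."""
    return {key: value
            for key, value in sorted(current.items())
            if defaults.get(key) != value}
```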





[jira] [Created] (HIVE-25938) Print in EXPLAIN CBO which rules are excluded from planning

2022-02-08 Thread Alessandro Solimando (Jira)
Alessandro Solimando created HIVE-25938:
---

 Summary: Print in EXPLAIN CBO which rules are excluded from 
planning
 Key: HIVE-25938
 URL: https://issues.apache.org/jira/browse/HIVE-25938
 Project: Hive
  Issue Type: Bug
  Components: CBO
Affects Versions: 4.0.0
Reporter: Alessandro Solimando


HIVE-25880 introduced a configuration parameter for excluding CBO rules based 
on a regex on their description.

Calcite logs when a rule is excluded (see 
[AbstractRelOptPlanner.java#L316|https://github.com/apache/calcite/blob/e42b85a45bd16dd58db1546736e653deda5463fe/core/src/main/java/org/apache/calcite/plan/AbstractRelOptPlanner.java#L316]
 and 
[VolcanoRuleCall.java#L169|https://github.com/apache/calcite/blob/e42b85a45bd16dd58db1546736e653deda5463fe/core/src/main/java/org/apache/calcite/plan/volcano/VolcanoRuleCall.java#L169]).

To ease investigations, this should be complemented by printing the regex used 
(if not blank) in Hive DEBUG logs, and the same in CBO information (i.e., 
EXPLAIN CBO's output).
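The exclusion mechanism from HIVE-25880 boils down to matching each rule's 
description against the configured regex; a small sketch (Python; the exact 
match semantics of the actual Calcite check may differ):

```python
import re

def excluded_rules(rule_descriptions, exclusion_regex):
    """Return the rule descriptions matched (and hence excluded) by
    the configured pattern; a blank pattern excludes nothing."""
    if not exclusion_regex or not exclusion_regex.strip():
        return []
    pattern = re.compile(exclusion_regex)
    return [desc for desc in rule_descriptions if pattern.search(desc)]
```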





Re: [ANNOUNCE] Denys Kuzmenko joins Hive PMC

2022-02-07 Thread Alessandro Solimando
Congratulations Denys! :)

On Mon, 7 Feb 2022 at 18:37, Pravin Sinha  wrote:

> Congrats, Denys !
>
> On Mon, Feb 7, 2022 at 11:02 PM aasha medhi 
> wrote:
>
> > Congratulations Denys !
> >
> > On Mon, Feb 7, 2022 at 10:36 PM Laszlo Pinter
>  > >
> > wrote:
> >
> > > Congrats Denys!
> > >
> > > On Mon, Feb 7, 2022, 6:00 PM László Bodor 
> > > wrote:
> > >
> > > > Congrats Denys!!
> > > >
> > > > Naresh P R  ezt írta (időpont: 2022.
> febr.
> > > 7.,
> > > > H, 17:43):
> > > >
> > > > > Congrats Denys, well deserved !!!
> > > > > ---
> > > > > Regards,
> > > > > Naresh P R
> > > > >
> > > > > On Mon, Feb 7, 2022 at 8:40 AM Ashutosh Chauhan <
> > hashut...@apache.org>
> > > > > wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I'm pleased to announce that Denys has accepted an invitation to
> > > > > > join the Hive PMC. Denys has been a consistent and helpful
> > > > > > figure in the Hive community for which we are very grateful. We
> > > > > > look forward to the continued contributions and support.
> > > > > >
> > > > > > Please join me in congratulating Denys!
> > > > > >
> > > > > > Ashutosh (On behalf of Hive PMC)
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: [ANNOUNCE] New committer: Ayush Saxena

2022-02-07 Thread Alessandro Solimando
Congrats Ayush!

On Mon, 7 Feb 2022 at 16:50, László Bodor  wrote:

> Welcome Ayush, well deserved!
>
> Ashutosh Chauhan  ezt írta (időpont: 2022. febr. 7.,
> H, 16:35):
>
> > Hi all,
> > Apache Hive's Project Management Committee (PMC) has invited Ayush
> > to become a committer, and we are pleased to announce that he has
> accepted!
> >
> > Ayush welcome, thank you for your contributions, and we look forward to
> > your
> > further interactions with the community!
> > Ashutosh (on behalf of Hive PMC)
> >
>


[jira] [Created] (HIVE-25917) Use default value for 'hive.default.nulls.last' when no config is available instead of false

2022-01-31 Thread Alessandro Solimando (Jira)
Alessandro Solimando created HIVE-25917:
---

 Summary: Use default value for 'hive.default.nulls.last' when no 
config is available instead of false
 Key: HIVE-25917
 URL: https://issues.apache.org/jira/browse/HIVE-25917
 Project: Hive
  Issue Type: Bug
  Components: Parser
Affects Versions: 4.0.0
Reporter: Alessandro Solimando
Assignee: Alessandro Solimando








Re: [DISCUSS] Properties for scheduling compactions on specific queues

2022-01-31 Thread Alessandro Solimando
Hi Stamatis,
the proposal seems reasonable to me.

I think that setting the two properties you mention, independently from the
underlying execution engine in use, should lead to the same result.

In addition, I also agree that we should deprecate the per-execution engine
properties.

Best regards,
Alessandro

On Mon, 31 Jan 2022 at 10:51, Stamatis Zampetakis  wrote:

> Hi all,
>
> This email is an attempt to converge on which Hive/Tez/MR properties
> someone should use in order to schedule a compaction on specific queues.
> For those who are not familiar with how queues are used the YARN capacity
> scheduler documentation [1] gives the general idea.
>
> Using specific queues for compaction jobs is necessary to be able to
> efficiently allocate resources for maintenance tasks (compaction) and
> production workloads. Hive provides various ways to control the queues used
> by the compactor and there have been various tickets with improvements and
> fixes in this area (see list below).
>
> The granularity we can select queues for compactions (all tables vs. per
> table) currently depends on which compactor is in use (MR vs Query based)
> and boils down to the following properties:
>
> Global configuration:
> * hive.compactor.job.queue
> * mapred.job.queue.name
> * tez.queue.name
>
> Per table/statement configuration (table properties):
> * compactor.mapred.job.queue.name (before HIVE-20723)
> * compactor.hive.compactor.job.queue (after HIVE-20723)
>
> Things are a bit blurred with respect to what properties someone should
> use to achieve the desired result. Some changes, such as HIVE-20723, raise
> backward compatibility concerns and other changes seem to have a larger
> impact than the one specifically designed for. For example, after
> HIVE-25595, map reduce queue properties can have an impact on the compactor
> queues even when Tez is in use.
>
> In order to avoid confusion and ensure long term support of these queue
> selection features we should clarify which of the above properties should
> be used.
>
> Given the current situation, I would propose to officially support only
> the following:
> * hive.compactor.job.queue
> * compactor.hive.compactor.job.queue
> and align the implementation based on these (if necessary). In other
> words, Hive users should not use mapred.job.queue.name and tez.queue.name
> explicitly at least when it comes to the compactor. Hive should set them
> transparently (as it happens now in various places) based on
> [compactor.]hive.compactor.job.queue.
>
> What do people think? Are there other ideas?
>
> Best,
> Stamatis
>
> [1]
> https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html
>
> HIVE-11997: Add ability to send Compaction Jobs to specific queue
> HIVE-13354: Add ability to specify Compaction options per table and per
> request
> HIVE-20723: Allow per table specification of compaction yarn queue
> HIVE-24781: Allow to use custom queue for query based compaction
> HIVE-25801: Custom queue settings is not honoured by Query based
> compaction StatsUpdater
> HIVE-25595: Custom queue settings is not honoured by compaction
> StatsUpdater
>
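Concretely, the two properties from the proposal would be used roughly like 
this (the queue name "compaction" is a placeholder; the per-table form relies 
on the table-property mechanism from HIVE-13354/HIVE-20723):

```sql
-- global: send all compaction jobs to a dedicated queue
SET hive.compactor.job.queue=compaction;

-- per table: override the queue for a single table's compactions
ALTER TABLE acid_tbl
SET TBLPROPERTIES ('compactor.hive.compactor.job.queue'='compaction');
```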


[jira] [Created] (HIVE-25909) Add test for 'hive.default.nulls.last' property for windows with ordering

2022-01-28 Thread Alessandro Solimando (Jira)
Alessandro Solimando created HIVE-25909:
---

 Summary: Add test for 'hive.default.nulls.last' property for 
windows with ordering
 Key: HIVE-25909
 URL: https://issues.apache.org/jira/browse/HIVE-25909
 Project: Hive
  Issue Type: Test
  Components: CBO
Affects Versions: 4.0.0
Reporter: Alessandro Solimando
Assignee: Alessandro Solimando


Add a test around "hive.default.nulls.last" configuration property and its 
interaction with order by clauses within windows.

The property is known to behave as follows:

 
||hive.default.nulls.last||ASC||DESC||
|true|NULL LAST|NULL FIRST|
|false|NULL FIRST|NULL LAST|

 

 

The test can be based along the line of the following examples:
{noformat}
-- hive.default.nulls.last is true by default, it sets NULLS_FIRST for DESC
set hive.default.nulls.last;

OUT:
hive.default.nulls.last=true

SELECT a, b, c, row_number() OVER (PARTITION BY a, b ORDER BY b DESC, c DESC)
FROM test1;

OUT:
John Doe        1990-05-10 00:00:00     2022-01-10 00:00:00     1
John Doe        1990-05-10 00:00:00     2021-12-10 00:00:00     2
John Doe        1990-05-10 00:00:00     2021-11-10 00:00:00     3
John Doe        1990-05-10 00:00:00     2021-10-10 00:00:00     4
John Doe        1990-05-10 00:00:00     2021-09-10 00:00:00     5
John Doe        1987-05-10 00:00:00     NULL    1
John Doe        1987-05-10 00:00:00     2022-01-10 00:00:00     2
John Doe        1987-05-10 00:00:00     2021-12-10 00:00:00     3
John Doe        1987-05-10 00:00:00     2021-11-10 00:00:00     4
John Doe        1987-05-10 00:00:00     2021-10-10 00:00:00     5

-- we set hive.default.nulls.last=false, it sets NULLS_LAST for DESC
set hive.default.nulls.last=false;

SELECT a, b, c, row_number() OVER (PARTITION BY a, b ORDER BY b DESC, c DESC)
FROM test1;

OUT:
John Doe        1990-05-10 00:00:00     2022-01-10 00:00:00     1
John Doe        1990-05-10 00:00:00     2021-12-10 00:00:00     2
John Doe        1990-05-10 00:00:00     2021-11-10 00:00:00     3
John Doe        1990-05-10 00:00:00     2021-10-10 00:00:00     4
John Doe        1990-05-10 00:00:00     2021-09-10 00:00:00     5
John Doe        1987-05-10 00:00:00     2022-01-10 00:00:00     1
John Doe        1987-05-10 00:00:00     2021-12-10 00:00:00     2
John Doe        1987-05-10 00:00:00     2021-11-10 00:00:00     3
John Doe        1987-05-10 00:00:00     2021-10-10 00:00:00     4
John Doe        1987-05-10 00:00:00     NULL    5

-- we set hive.default.nulls.last=false but we have explicit NULLS_LAST, we 
expect NULLS_LAST
set hive.default.nulls.last=false;

SELECT a, b, c, row_number() OVER (PARTITION BY a, b ORDER BY b DESC, c DESC 
NULLS LAST)
FROM test1;

OUT:
John Doe        1990-05-10 00:00:00     2022-01-10 00:00:00     1
John Doe        1990-05-10 00:00:00     2021-12-10 00:00:00     2
John Doe        1990-05-10 00:00:00     2021-11-10 00:00:00     3
John Doe        1990-05-10 00:00:00     2021-10-10 00:00:00     4
John Doe        1990-05-10 00:00:00     2021-09-10 00:00:00     5
John Doe        1987-05-10 00:00:00     2022-01-10 00:00:00     1
John Doe        1987-05-10 00:00:00     2021-12-10 00:00:00     2
John Doe        1987-05-10 00:00:00     2021-11-10 00:00:00     3
John Doe        1987-05-10 00:00:00     2021-10-10 00:00:00     4
John Doe        1987-05-10 00:00:00     NULL    5

-- we have explicit NULLS_FIRST, we expect NULLS_FIRST
SELECT a, b, c, row_number() OVER (PARTITION BY a, b ORDER BY b DESC, c DESC 
NULLS FIRST)
FROM test1;

--OUT:
John Doe        1990-05-10 00:00:00     2022-01-10 00:00:00     1
John Doe        1990-05-10 00:00:00     2021-12-10 00:00:00     2
John Doe        1990-05-10 00:00:00     2021-11-10 00:00:00     3
John Doe        1990-05-10 00:00:00     2021-10-10 00:00:00     4
John Doe        1990-05-10 00:00:00     2021-09-10 00:00:00     5
John Doe        1987-05-10 00:00:00     NULL    1
John Doe        1987-05-10 00:00:00     2022-01-10 00:00:00     2
John Doe        1987-05-10 00:00:00     2021-12-10 00:00:00     3
John Doe        1987-05-10 00:00:00     2021-11-10 00:00:00     4
John Doe        1987-05-10 00:00:00     2021-10-10 00:00:00     5{noformat}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HIVE-25905) ORDER BY colName DESC does not honour 'hive.default.nulls.last' property

2022-01-27 Thread Alessandro Solimando (Jira)
Alessandro Solimando created HIVE-25905:
---

 Summary: ORDER BY colName DESC does not honour 
'hive.default.nulls.last' property
 Key: HIVE-25905
 URL: https://issues.apache.org/jira/browse/HIVE-25905
 Project: Hive
  Issue Type: Bug
  Components: Parser
Affects Versions: 4.0.0
Reporter: Alessandro Solimando
Assignee: Alessandro Solimando


Consider the following table and data:
{noformat}
create table test1
(
  a string,
  b timestamp,
  c timestamp
);

INSERT INTO TABLE test1 VALUES
('John Doe', '1990-05-10 00:00:00.0', '2022-01-10 00:00:00.0'),
('John Doe', '1990-05-10 00:00:00.0', '2021-12-10 00:00:00.0'),
('John Doe', '1990-05-10 00:00:00.0', '2021-11-10 00:00:00.0'),
('John Doe', '1990-05-10 00:00:00.0', '2021-10-10 00:00:00.0'),
('John Doe', '1990-05-10 00:00:00.0', '2021-09-10 00:00:00.0'),
('John Doe', '1987-05-10 00:00:00.0', '2022-01-10 00:00:00.0'),
('John Doe', '1987-05-10 00:00:00.0', '2021-12-10 00:00:00.0'),
('John Doe', '1987-05-10 00:00:00.0', '2021-11-10 00:00:00.0'),
('John Doe', '1987-05-10 00:00:00.0', '2021-10-10 00:00:00.0'),
('John Doe', '1987-05-10 00:00:00.0', null);{noformat}
Consider also the following query:
{noformat}
SELECT a, b, c, row_number() OVER (PARTITION BY a, b ORDER BY b DESC, c DESC) 
FROM test1; 
{noformat}
The output is:
{noformat}
John Doe    10/05/1990 00:00    10/01/2022 00:00    1
John Doe    10/05/1990 00:00    10/12/2021 00:00    2
John Doe    10/05/1990 00:00    10/11/2021 00:00    3
John Doe    10/05/1990 00:00    10/10/2021 00:00    4
John Doe    10/05/1990 00:00    10/09/2021 00:00    5
John Doe    10/05/1987 00:00    NULL                1
John Doe    10/05/1987 00:00    10/01/2022 00:00    2
John Doe    10/05/1987 00:00    10/12/2021 00:00    3
John Doe    10/05/1987 00:00    10/11/2021 00:00    4
John Doe    10/05/1987 00:00    10/10/2021 00:00    5{noformat}
While the expected output should be:
{noformat}
John Doe    10/05/1990 00:00    10/01/2022 00:00    1
John Doe    10/05/1990 00:00    10/12/2021 00:00    2
John Doe    10/05/1990 00:00    10/11/2021 00:00    3
John Doe    10/05/1990 00:00    10/10/2021 00:00    4
John Doe    10/05/1990 00:00    10/09/2021 00:00    5
John Doe    10/05/1987 00:00    10/01/2022 00:00    1
John Doe    10/05/1987 00:00    10/12/2021 00:00    2
John Doe    10/05/1987 00:00    10/11/2021 00:00    3
John Doe    10/05/1987 00:00    10/10/2021 00:00    4
John Doe    10/05/1987 00:00    NULL                5{noformat}
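
The expected behaviour can be cross-checked with a small simulation of 
ROW_NUMBER() over a DESC ordering (a Python sketch, not Hive code; 'cs' holds 
the c values of the b='1987-05-10' partition from the report above):

```python
def row_number_desc(values, nulls_last):
    """Number 'values' as ROW_NUMBER() would over ORDER BY c DESC,
    with None placed according to the nulls_last flag."""
    non_null = sorted((v for v in values if v is not None), reverse=True)
    nulls = [None] * (len(values) - len(non_null))
    ordered = non_null + nulls if nulls_last else nulls + non_null
    return [(v, i + 1) for i, v in enumerate(ordered)]

# c values of the partition a='John Doe', b='1987-05-10':
cs = ["2022-01-10", "2021-12-10", "2021-11-10", "2021-10-10", None]
```

With nulls_last=True the NULL row is numbered 5, matching the expected output; 
nulls_last=False reproduces the reported output where NULL is numbered 1.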





[jira] [Created] (HIVE-25888) Improve RuleEventLogger to also print input rels in FULL_PLAN mode

2022-01-21 Thread Alessandro Solimando (Jira)
Alessandro Solimando created HIVE-25888:
---

 Summary: Improve RuleEventLogger to also print input rels in 
FULL_PLAN mode
 Key: HIVE-25888
 URL: https://issues.apache.org/jira/browse/HIVE-25888
 Project: Hive
  Issue Type: Improvement
  Components: CBO
Affects Versions: 4.0.0
Reporter: Alessandro Solimando
Assignee: Alessandro Solimando


Hive porting of CALCITE-4991, refer to that ticket for more details.





[jira] [Created] (HIVE-25884) Improve rule description for subclasses of rules

2022-01-20 Thread Alessandro Solimando (Jira)
Alessandro Solimando created HIVE-25884:
---

 Summary: Improve rule description for subclasses of rules
 Key: HIVE-25884
 URL: https://issues.apache.org/jira/browse/HIVE-25884
 Project: Hive
  Issue Type: Improvement
  Components: CBO
Affects Versions: 4.0.0
Reporter: Alessandro Solimando
Assignee: Alessandro Solimando


Consider the instances of _HivePointLookupOptimizerRule_ (for joins, filters 
and projects). 

They use the [default 
constructor|https://github.com/apache/calcite/blob/0065d7c179b98698f018f83b0af0845a6698fc54/core/src/main/java/org/apache/calcite/plan/RelOptRule.java#L79]
 for _RelOptRule_, which builds the rule description from the class name, and 
in case of nested classes, it takes only the inner class name.

In this case, the names do not refer to _HivePointLookupOptimizerRule_ and are 
too generic (e.g., _FilterCondition_); it is hard to link them back to the rule 
they belong to without looking at the source code.

This is particularly problematic now that we have more detailed logging for CBO 
(see [HIVE-25816|https://issues.apache.org/jira/browse/HIVE-25816]), where rule 
descriptions are printed.

The aim of the PR is to improve the rule description by passing an explicit 
string whenever the rule (class) name alone is not enough.
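
A Python analogy of the naming problem (Hive and Calcite are Java, where 
getSimpleName() behaves like __name__ below):

```python
class HivePointLookupOptimizerRule:
    # stand-in for a nested rule class such as FilterCondition
    class FilterCondition:
        pass

# A description built from the simple class name alone is too generic:
short_desc = HivePointLookupOptimizerRule.FilterCondition.__name__
# A qualified name (or an explicit description string passed to the rule
# constructor, as this ticket proposes) keeps the link to the enclosing rule:
qualified_desc = HivePointLookupOptimizerRule.FilterCondition.__qualname__
```
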





[jira] [Created] (HIVE-25880) Configuration option to exclude rule by a regex on their description

2022-01-19 Thread Alessandro Solimando (Jira)
Alessandro Solimando created HIVE-25880:
---

 Summary: Configuration option to exclude rule by a regex on their 
description
 Key: HIVE-25880
 URL: https://issues.apache.org/jira/browse/HIVE-25880
 Project: Hive
  Issue Type: New Feature
  Components: CBO
Affects Versions: 4.0.0
Reporter: Alessandro Solimando
Assignee: Alessandro Solimando


Introduce a configuration option to exclude rules via a regex on the rule 
description, based on Calcite's 
[org.apache.calcite.plan.AbstractRelOptPlanner#setRuleDescExclusionFilter|https://github.com/apache/calcite/blob/0065d7c179b98698f018f83b0af0845a6698fc54/core/src/main/java/org/apache/calcite/plan/AbstractRelOptPlanner.java#L186].

The motivation is to provide a quick workaround when one or more rules are 
causing issues at planning time, without code changes.

Another use would be to quickly experiment on the impact of disabling one or 
more rules on the compute plan.
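
In the spirit of Calcite's setRuleDescExclusionFilter, which applies 
full-string matching on the rule description, the intended behaviour can be 
sketched as follows (Python sketch; the function and rule names are 
illustrative):

```python
import re

def exclude_rules_by_regex(rule_descriptions, exclusion_regex):
    """Drop every rule whose description fully matches the exclusion
    pattern, keeping the remaining rules in order."""
    pattern = re.compile(exclusion_regex)
    return [d for d in rule_descriptions if not pattern.fullmatch(d)]

rules = ["HiveFilterProjectTransposeRule",
         "HiveJoinPushTransitivePredicatesRule",
         "HiveSortMergeRule"]
```
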





[jira] [Created] (HIVE-25870) Make HivePointLookupOptimizerRule to just convert OR to IN, no simplification

2022-01-17 Thread Alessandro Solimando (Jira)
Alessandro Solimando created HIVE-25870:
---

 Summary: Make HivePointLookupOptimizerRule to just convert OR to 
IN, no simplification
 Key: HIVE-25870
 URL: https://issues.apache.org/jira/browse/HIVE-25870
 Project: Hive
  Issue Type: Improvement
  Components: CBO
Affects Versions: 4.0.0
Reporter: Alessandro Solimando


_HivePointLookupOptimizerRule_ has been introduced to improve simplifications 
and improve statistics/estimations (see 
https://issues.apache.org/jira/browse/HIVE-11424?focusedCommentId=15197407&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15197407).

At the time, _RexSimplify_ could not simplify such OR/AND expressions (as 
reported in the JIRA above):

{noformat}
simplify(unknown as unknown): AND(true, OR(=(?0.int0, 1), =(?0.int0, 2), 
=(?0.int0, 3)), OR(AND(true, =(?0.int0, 1)), AND(true, =(?0.int0, 2
{noformat}

For Calcite <= 1.25, simplification is still missed:
{noformat}
Expected: "OR(?0.int0=1, ?0.int0=2)"
 but: was "AND(OR(=(?0.int0, 1), =(?0.int0, 2), =(?0.int0, 3)), 
OR(=(?0.int0, 1), =(?0.int0, 2)))"
{noformat}

From Calcite >= 1.26, the simplification happens:
{noformat}
Expected: "OR(?0.int0=1, ?0.int0=2)"
 but: was "SEARCH(?0.int0, Sarg[1, 2])"
{noformat}

For this reason, we could drop all the simplifications within the rule, just 
keep the OR -> IN conversion, and move the rule to the very last planning stage.
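
The OR -> IN conversion that the rule would keep can be sketched as follows 
(Python sketch over a toy expression encoding; (column, constant) pairs stand 
in for =(col, const) RexNodes):

```python
def or_to_in(equalities):
    """Collapse OR(=(c, v1), =(c, v2), ...) into IN(c, v1, v2, ...)
    when every disjunct tests the same column; otherwise leave the
    OR untouched."""
    columns = {col for col, _ in equalities}
    if len(columns) != 1:
        return ("OR", equalities)
    (col,) = columns
    return ("IN", col, [v for _, v in equalities])
```
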





[jira] [Created] (HIVE-25852) Introduce IN clauses at the very end of query planning

2022-01-07 Thread Alessandro Solimando (Jira)
Alessandro Solimando created HIVE-25852:
---

 Summary: Introduce IN clauses at the very end of query planning
 Key: HIVE-25852
 URL: https://issues.apache.org/jira/browse/HIVE-25852
 Project: Hive
  Issue Type: Bug
  Components: CBO
Affects Versions: 4.0.0
Reporter: Alessandro Solimando


Calcite "explodes" IN clauses into the equivalent OR form, and therefore it 
does not handle such clauses in most of the codebase (notably in _RexSimplify_).

In Hive, the same happens, but _HivePointLookupOptimizerRule_ re-introduces IN 
clauses, and it happens in _applyPreJoinOrderingTransforms_ phase, which is 
pretty early and which mixes several other rules which might not fully support 
IN (notably, _HiveReduceExpressionsRule_ which is based on _RexSimplify_).

The problem will become even harder in later versions of Calcite (current is 
1.25) based on SARG, which does not support IN clauses.

IN clauses can be converted into efficient runtime operators, so we want to 
keep them in the final plan; intuitively, we just want this translation to 
happen in a later step, in order to leave the rest of the codebase (Hive and 
Calcite) unaware of IN clauses.

The goal of the ticket is as follows:
# re-convert the output expression of _HivePointLookupOptimizerRule_ into the 
OR form (keep the logic as-is to benefit from the rule)
# add a rule, in the last step of the planning process, that only converts 
eligible OR expressions into IN clauses
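
Step 2 hinges on recognising "eligible" OR expressions; a minimal sketch of 
such a check follows (Python; the min_size threshold is an assumption, loosely 
mirroring what hive.optimize.point.lookup.min does):

```python
def eligible_or_to_in(equalities, min_size=2):
    """An OR qualifies for the IN form when it has at least min_size
    disjuncts and all of them are equalities on one and the same
    column (encoded here as (column, constant) pairs)."""
    if len(equalities) < min_size:
        return False
    return len({col for col, _ in equalities}) == 1
```
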





[jira] [Created] (HIVE-25851) Replace HiveRelMdPredicate with RelMdPredicate from Calcite

2022-01-07 Thread Alessandro Solimando (Jira)
Alessandro Solimando created HIVE-25851:
---

 Summary: Replace HiveRelMdPredicate with RelMdPredicate from 
Calcite
 Key: HIVE-25851
 URL: https://issues.apache.org/jira/browse/HIVE-25851
 Project: Hive
  Issue Type: Improvement
  Components: CBO
Affects Versions: 4.0.0
Reporter: Alessandro Solimando


`HiveRelMdPredicates` was copied from `RelMdPredicates` in Calcite long ago; it 
has a few differences which could be ported to the Calcite version, if needed.

The goal of the ticket is to:
# ascertain which are the additional features in `HiveRelMdPredicates`, port 
them to Calcite if needed
# drop `HiveRelMdPredicates` in favour of `RelMdPredicates` in order to benefit 
from all the advances in such class






[jira] [Created] (HIVE-25766) java.util.NoSuchElementException in HiveFilterProjectTransposeRule if predicate has no InputRef

2021-12-02 Thread Alessandro Solimando (Jira)
Alessandro Solimando created HIVE-25766:
---

 Summary: java.util.NoSuchElementException in 
HiveFilterProjectTransposeRule if predicate has no InputRef
 Key: HIVE-25766
 URL: https://issues.apache.org/jira/browse/HIVE-25766
 Project: Hive
  Issue Type: Bug
  Components: CBO, Query Planning
Affects Versions: 4.0.0
Reporter: Alessandro Solimando
Assignee: Alessandro Solimando


The issue can be reproduced with the following query:
{code:java}
create table test1 (s string);
create table test2 (m int);

EXPLAIN
SELECT c.m
FROM (
  SELECT cast(substr(from_unixtime(unix_timestamp(), 'yyyy-MM-dd'), 1, 1) AS 
int) as m
  FROM test1
  WHERE cast(substr(from_unixtime(unix_timestamp(), 'yyyy-MM-dd'), 1, 1) AS 
int) = 2) c
JOIN test2 d ON c.m = d.m; {code}
It fails with the following exception:
{noformat}
 java.util.NoSuchElementException
    at java.util.HashMap$HashIterator.nextNode(HashMap.java:1447)
    at java.util.HashMap$KeyIterator.next(HashMap.java:1469)
    at 
org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveFilterProjectTransposeRule$RedundancyChecker.check(HiveFilterProjectTransposeRule.java:348)
    at 
org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveFilterProjectTransposeRule$RedundancyChecker.visit(HiveFilterProjectTransposeRule.java:306)
    at 
org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveFilterProjectTransposeRule$RedundancyChecker.visit(HiveFilterProjectTransposeRule.java:303)
    at org.apache.calcite.rel.SingleRel.childrenAccept(SingleRel.java:72)
    at org.apache.calcite.rel.RelVisitor.visit(RelVisitor.java:44)
    at 
org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveFilterProjectTransposeRule$RedundancyChecker.visit(HiveFilterProjectTransposeRule.java:316)
    at org.apache.calcite.rel.RelVisitor.go(RelVisitor.java:61)
    at 
org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveFilterProjectTransposeRule.isRedundantIsNotNull(HiveFilterProjectTransposeRule.java:276)
    at 
org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveFilterProjectTransposeRule.onMatch(HiveFilterProjectTransposeRule.java:191){noformat}
The current implementation, while checking whether the predicate to be 
transposed is redundant or not, expects at least one InputRef, but the 
predicate can have none, as in this case.
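
The failure mode and the fix reduce to guarding an iterator over a 
possibly-empty set of InputRefs (Python sketch of the pattern, not the actual 
Hive code):

```python
def first_input_ref(input_refs):
    """The failing HashMap$KeyIterator.next() above is effectively
    next(iter(input_refs)) on an empty collection; returning a
    sentinel for the empty case avoids the exception."""
    if not input_refs:
        return None  # predicate references no input column: nothing to check
    return next(iter(input_refs))
```
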





[jira] [Created] (HIVE-25758) OOM due to recursive application CBO rules

2021-12-01 Thread Alessandro Solimando (Jira)
Alessandro Solimando created HIVE-25758:
---

 Summary: OOM due to recursive application CBO rules
 Key: HIVE-25758
 URL: https://issues.apache.org/jira/browse/HIVE-25758
 Project: Hive
  Issue Type: Bug
  Components: CBO, Query Planning
Affects Versions: 4.0.0
Reporter: Alessandro Solimando



Reproducing query is as follows:
{code:java}
create table test1 (act_nbr string);
create table test2 (month int);
create table test3 (mth int, con_usd double);

EXPLAIN
   SELECT c.month,
  d.con_usd
   FROM
 (SELECT 
cast(regexp_replace(substr(add_months(from_unixtime(unix_timestamp(), 
'yyyy-MM-dd'), -1), 1, 7), '-', '') AS int) AS month
  FROM test1
  UNION ALL
  SELECT month
  FROM test2
  WHERE month = 202110) c
   JOIN test3 d ON c.month = d.mth; {code}

Different plans are generated during the first CBO steps, last being:
{noformat}
2021-12-01T08:28:08,598 DEBUG [a18191bb-3a2b-4193-9abf-4e37dd1996bb main] 
parse.CalcitePlanner: Plan after decorre
lation:
HiveProject(month=[$0], con_usd=[$2])
  HiveJoin(condition=[=($0, $1)], joinType=[inner], algorithm=[none], cost=[not 
available])
    HiveProject(month=[$0])
      HiveUnion(all=[true])
        HiveProject(month=[CAST(regexp_replace(substr(add_months(FROM_UNIXTIME(UNIX_TIMESTAMP, _UTF-16LE'yyyy-MM-dd':VARCHAR(2147483647) CHARACTER SET "UTF-16LE"), -1), 1, 7), _UTF-16LE'-':VARCHAR(2147483647) CHARACTER SET "UTF-16LE", _UTF-16LE'':VARCHAR(2147483647) CHARACTER SET "UTF-16LE")):INTEGER])
          HiveTableScan(table=[[default, test1]], table:alias=[test1])
        HiveProject(month=[$0])
          HiveFilter(condition=[=($0, CAST(202110):INTEGER)])
            HiveTableScan(table=[[default, test2]], table:alias=[test2])
    HiveTableScan(table=[[default, test3]], table:alias=[d]){noformat}

Then, the HEP planner will keep expanding the filter expression with redundant 
expressions, such as the following, where the identical CAST expression is 
present multiple times:

{noformat}
rel#118:HiveFilter.HIVE.[].any(input=HepRelVertex#39,condition=IN(CAST(regexp_replace(substr(add_months(FROM_UNIXTIME(UNIX_TIMESTAMP, _UTF-16LE'yyyy-MM-dd':VARCHAR(2147483647) CHARACTER SET "UTF-16LE"), -1), 1, 7), _UTF-16LE'-':VARCHAR(2147483647) CHARACTER SET "UTF-16LE", _UTF-16LE'':VARCHAR(2147483647) CHARACTER SET "UTF-16LE")):INTEGER, CAST(regexp_replace(substr(add_months(FROM_UNIXTIME(UNIX_TIMESTAMP, _UTF-16LE'yyyy-MM-dd':VARCHAR(2147483647) CHARACTER SET "UTF-16LE"), -1), 1, 7), _UTF-16LE'-':VARCHAR(2147483647) CHARACTER SET "UTF-16LE", _UTF-16LE'':VARCHAR(2147483647) CHARACTER SET "UTF-16LE")):INTEGER, 202110)){noformat}

The problem seems to come from a bad interaction of at least 
_HiveFilterProjectTransposeRule_ and 
{_}HiveJoinPushTransitivePredicatesRule{_}, possibly more.

Most probably the UNION part can be removed and the reproducer simplified even 
further.
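
The growth pattern can be illustrated with a rewrite loop that lacks, and then 
gains, a convergence check (Python sketch; tuple operands stand in for the 
duplicated CAST expressions in the filter condition above):

```python
def apply_until_fixpoint(expr, rewrite, max_iterations=100):
    """Re-apply a rewrite only while it still changes the expression;
    without such a check, rules that keep re-adding operands they
    already produced grow the plan until memory runs out."""
    for _ in range(max_iterations):
        new_expr = rewrite(expr)
        if new_expr == expr:
            return expr
        expr = new_expr
    raise RuntimeError("rewrite did not converge")

# Deduplicating operands makes the duplicated-CAST rewrite converge:
dedup_operands = lambda ops: tuple(dict.fromkeys(ops))
```
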






[jira] [Created] (HIVE-25749) Check if RelMetadataQuery.collations() return null to avoid NPE

2021-11-29 Thread Alessandro Solimando (Jira)
Alessandro Solimando created HIVE-25749:
---

 Summary: Check if RelMetadataQuery.collations() return null to 
avoid NPE
 Key: HIVE-25749
 URL: https://issues.apache.org/jira/browse/HIVE-25749
 Project: Hive
  Issue Type: Bug
  Components: CBO, Query Planning
Affects Versions: 4.0.0
Reporter: Alessandro Solimando
Assignee: Alessandro Solimando


According to the "RelMetadataQuery.collations()" 
[javadoc|https://github.com/apache/calcite/blob/calcite-1.25.0/core/src/main/java/org/apache/calcite/rel/metadata/RelMetadataQuery.java#L537],
 the method can return "null" if collation information is not available.

Hive invokes the method in two places 
([RelFieldTrimmer|https://github.com/apache/hive/blob/1046f41ea36ab3c8b036481128ba9b76dda2882a/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/RelFieldTrimmer.java#L192]
 and 
[HiveJoin|https://github.com/apache/hive/blob/1046f41ea36ab3c8b036481128ba9b76dda2882a/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveJoin.java#L206]),
 but it does not check for "null" return values, which can cause NPE.
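
The fix amounts to a null guard at the call sites; sketched in Python 
(collations_of is a stand-in for the RelMetadataQuery.collations() call):

```python
def safe_collations(collations_of, rel):
    """Treat a null return from the metadata query as 'no collation
    information' instead of passing it on and risking an NPE."""
    collations = collations_of(rel)
    return [] if collations is None else collations
```
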





[jira] [Created] (HIVE-25734) Wrongly-typed constant in case expression leads to incorrect empty result

2021-11-23 Thread Alessandro Solimando (Jira)
Alessandro Solimando created HIVE-25734:
---

 Summary: Wrongly-typed constant in case expression leads to 
incorrect empty result
 Key: HIVE-25734
 URL: https://issues.apache.org/jira/browse/HIVE-25734
 Project: Hive
  Issue Type: Bug
  Components: CBO
Affects Versions: 4.0.0
Reporter: Alessandro Solimando



The type of constants in case expressions should be inferred, if possible, from 
the "surrounding" input reference columns, if any.

Consider the following table and query: 
{code:java}
create external table test_case (row_seq smallint, row_desc string) stored as 
parquet;
insert into test_case values (1, 'a');
insert into test_case values (2, 'aa');
insert into test_case values (6, 'aa');

with base_t as (select row_seq, row_desc,
  case row_seq
when 1 then '34'
when 6 then '35'
when 2 then '36'
  end as zb from test_case where row_seq in (1,2,6))
select row_seq, row_desc, zb from base_t where zb <> '34';{code}
The aforementioned query fails by returning an empty result, while "1 a 34" is 
expected.

To understand the root cause, let's consider the debug input and output of some 
related CBO rules which are triggered during the evaluation of the query: 

{noformat}
--$0 is the column 'row_seq'
1. HiveReduceExpressionsRule
Input: AND(IN($0, 1:SMALLINT, 2:SMALLINT, 6:SMALLINT), <>(CASE(=($0, 
1:INTEGER), '34':VARCHAR, =($0, 6:INTEGER), '35':VARCHAR, =($0, 2:INTEGER), 
'36':VARCHAR, null:VARCHAR), '34':CHAR(2)))
Output: AND(IN($0, 1:SMALLINT, 2:SMALLINT, 6:SMALLINT), OR(=($0, 6:INTEGER), 
=($0, 2:INTEGER)), IS NOT TRUE(=($0, 1:INTEGER)))
2. HivePointLookupOptimizerRule.RexTransformIntoInClause
Input: AND(IN($0, 1:SMALLINT, 2:SMALLINT, 6:SMALLINT), OR(=($0, 6:INTEGER), 
=($0, 2:INTEGER)), IS NOT TRUE(=($0, 1:INTEGER)))
Output: AND(IN($0, 1:SMALLINT, 2:SMALLINT, 6:SMALLINT), IN($0, 6:INTEGER, 
2:INTEGER), IS NOT TRUE(=($0, 1:INTEGER)))
3. HivePointLookupOptimizerRule.RexMergeInClause
Input: AND(IN($0, 1:SMALLINT, 2:SMALLINT, 6:SMALLINT), IN($0, 6:INTEGER, 
2:INTEGER), IS NOT TRUE(=($0, 1:INTEGER)))
Output: false{noformat}
In the first part, we can see that the constants are correctly typed as 
"SMALLINT" in the first part of the "AND" operand, while they are typed as 
"INTEGER" for the "CASE" expression, despite the input reference "$0" being 
available for inferring a more precise type.

This type difference makes "HivePointLookupOptimizerRule.RexMergeInClause" 
missing the commonality between the two "IN" expressions, whose intersection is 
considered empty, hence the empty result.

Providing a more refined type inference for "case" expressions should fix the 
issue.
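
The effect of the type mismatch on RexMergeInClause can be sketched as an 
intersection of typed IN value sets (Python sketch; the (sql_type, value) 
pairs are an illustrative encoding, not Hive's internal representation):

```python
def merge_in_clauses(first, second, coerce=None):
    """Intersect two IN value sets, as RexMergeInClause conceptually
    does. Without coercion, the SMALLINT/INTEGER mismatch below makes
    the intersection empty, which is how the query ends up returning
    no rows."""
    if coerce is not None:
        first = {coerce(tv) for tv in first}
        second = {coerce(tv) for tv in second}
    return set(first) & set(second)

smallint_in = {("SMALLINT", 1), ("SMALLINT", 2), ("SMALLINT", 6)}
integer_in = {("INTEGER", 6), ("INTEGER", 2)}
widen = lambda tv: ("INTEGER", int(tv[1]))  # coerce to the common type
```
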





Re: Commit message guidelines

2021-09-28 Thread Alessandro Solimando
Hi Stamatis,
thanks for the suggestions, I think they are reasonable, the project would
benefit from their adoption.

Regarding the removal of contributor/reviewer names from the commit
message in favor of git metadata, there has been a similar discussion
on the Calcite ML which led to consensus.

Best regards,
Alessandro

On Fri, 24 Sept 2021 at 16:40, Stamatis Zampetakis 
wrote:

> Hi all,
>
> I think we all more or less follow some standard pattern when committing in
> Hive but with some small effort we could make things more uniform and
> hopefully better.
> I would like to start a discussion about creating some guidelines, which we
> could put to the wiki or in contributing.md, to improve the quality of our
> history (git log).
> I outline some suggestions below to kick off the discussion. Many things in
> the list are minor (and maybe even personal preferences) but one thing
> which is really missing from the project is B3 especially the *why* part.
> Why is the commit necessary? Why has the change been made?.
> In some cases the why part is also missing from the JIRA making the code
> harder to maintain.
>
> Subject line:
> S1. Start with the Jira id capitalized and followed immediately (no space)
> by double colon (:)
> S2. Leave one space after the Jira id, and start the summary with a capital
> letter
> S3. Keep it concise (ideally less than 72 characters) and provide a useful
> description of the change
> S4. Do not include or end the line with period
> S5. Do not include the pull request id in the summary
> S6. Use imperative mood (“Add a handler …”) rather than past tense (“Added
> a handler …”) or present tense (“Adds a handler …”)
> S7. Avoid using "Fix"; If you are fixing a bug, it is sufficient to
> describe the bug (“NullPointerException if user is unknown”) and people
> will correctly surmise that the purpose of your change is to fix the bug.
> S8. Do not add a contributor's name; the author tag is made exactly for
> this and can be explored/parsed much more efficiently by tools/people for
> stats or other purposes
> S9. Do not add reviewers name; information is present in multiple places
> (e.g., committer tag, PR, JIRA)
>
> Message body: (Trivial changes may not require a body)
> B1. Separate subject from body with a blank line
> B2. Wrap the body at 72 characters
> B3. Use the body to explain what and why vs. how
> Example
> "Add handler methods in HiveRelMdDistictRowCount for JdbcHiveTableScan and
> Converter to avoid executing the fallback method which in many cases
> returns null and can cause NPE when this value propagates up the call
> stack."
> vs.
> "Added handler methods in HiveRelMdDistictRowCount for JdbcHiveTableScan
> and Converter"
> B4. If multiple authors include them using the standard GitHub marker,
> "Co-authored-by:", followed by the name and email of the author (e.g.,
> Co-authored-by: Marton Bod )
> B5. If the reviewer is different from committer (or merge via GitHub UI)
> use "Reviewed-by:" followed by the name and email of the reviewer (e.g.,
> Reviewed-by: Stamatis Zampetakis )
> B6. Use "Co-authored-by"/"Reviewed-by" on each own line and repeat as many
> times as authors/reviewers.
> B7. Include the PR id at the end of the message (e.g., Closes #2514);
> someone can easily navigate back to the PR to check comments, reviewers,
> etc.
>
> A sample commit message following these guidelines is shown below:
>
> commit de7781f29f82083fe01274b4d436b52920a89173
> Author: Soumyakanti Das 
> Commit: Stamatis Zampetakis 
>
> HIVE-25354: NPE when estimating row count in external JDBC tables
>
> Add handler methods in HiveRelMdDistictRowCount for JdbcHiveTableScan
> and Converter to avoid executing the fallback method which in many
> cases returns null and can cause NPE when this value propagates up the
> call stack.
>
> Co-authored-by: Krisztian Kasa 
>
> Reviewed-by: Peter Vary 
> Reviewed-by: Zoltan Haindrich 
>
> Closes #2514
>
> Let me know your thoughts.
>
> Best,
> Stamatis
>