Re: [Discussion] HIVE-28211: Restore hive-exec:core jar

2024-05-03 Thread Zoltan Haindrich

I think the shading should be fixed instead of restoring this core jar.
Providing a core jar means that we support it, and I think that would be a bad 
move:
I believe it's an unreasonable expectation for any project to have to use the 
same or compatible deps as the ones hive-exec was compiled against!
For example hive-exec uses an ancient guava which was released back in 2017 
https://mvnrepository.com/artifact/com.google.guava/guava/22.0
and has 3 CVEs listed... and that's just one of the many deps the core jar will 
pull into a build.
Also note that guava tends to break its api quite frequently - so I guess anyone 
using a somewhat more recent guava will have a hard time consuming the artifact.
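(For illustration only - a minimal sketch of the clash a properly shaded hive-exec avoids; the class
and the relocated package name below are hypothetical, the only grounded part is that hive-exec
already relocates some deps under `org.apache.hive.`, e.g. kryo:)

// Sketch, not Hive code: what happens when hive-exec ships unshaded guava.
// Both hive-exec (compiled against guava 22) and the downstream project resolve
// com.google.common.* from whichever guava jar wins on the classpath, so one of
// the two sides can hit NoSuchMethodError/NoClassDefFoundError at runtime.
import com.google.common.collect.ImmutableList;

public class DownstreamApp {                      // hypothetical consumer of hive-exec
    public static void main(String[] args) {
        // The consumer's own (newer) guava is only safe to use if hive-exec keeps
        // its guava usage in a relocated, private namespace - e.g. something like
        // org.apache.hive.com.google.common.* (hypothetical relocation target,
        // mirroring the pattern hive-exec already uses for kryo).
        System.out.println(ImmutableList.of("works", "with", "any", "guava"));
    }
}

With relocation in place, the consumer's guava version stops mattering to hive-exec and vice
versa - which is exactly why fixing the shading looks preferable to resurrecting the unshaded
core jar.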

Downstream projects had the opportunity to try the alpha releases and report 
issues before 4.0 came out, didn't they?
If they were not doing that - I think that's not our fault!

A middle ground could be to suggest that they try the shaded hive-exec jar (we still have nightly builds [1]): notify these projects to try it and report back issues, give 
them some time to fix up any further shading issues, and we're done.


[1] http://ci.hive.apache.org/job/hive-nightly/

cheers,
Zoltan

On 4/29/24 09:16, Stamatis Zampetakis wrote:
I shared the reasons behind the removal of the jar and my concerns around bringing it back. I'm still not convinced that it's needed but if the rest of the community feels 
that it's the right path forward then I am ok with this.


Best,
Stamatis

On Fri, Apr 26, 2024, 2:42 PM Ayush Saxena <ayush...@gmail.com> wrote:

Stamatis,
Isn't the removal itself an incompatible change? There are a lot of projects 
using it & we suddenly removed a jar because there were some people not sure 
how to
properly use it and were complaining about it.

What about the projects which are now stuck? Reading the thread at [1], 
there were promises made that everything would be relocated and sorted out before 
the release, but we couldn't do it; AFAIK it isn't a trivial task to just 
relocate all the dependencies.

As I see it, @Chao Sun even raised concerns [2] that the removal just closes off 
the upgrade path for downstream projects, and it was countered with the suggestion 
that folks chasing the removal would help chase getting all the dependencies 
relocated or solving the issues for downstream. I think none volunteered.

I would recommend one of two options:
* Best case: we relocate all the dependencies present in hive-exec, not just one or 
two. Somebody volunteers to raise one PR relocating "all", we can commit that, and 
we should be sorted.
* Restore the core jar, because a lot of projects depend on it and the removal 
itself was incompatible. I don't think the removal had clear community agreement - 
it was a conditional agreement which I don't think got sorted out - so we should 
roll back.

On a lighter note: we might release with some 5000+ commits and the best 
performance yet, but if nobody is able to consume those release bits, those 
efforts are just going to waste. Eventually people will just stick to their older 
versions and not even try to upgrade & we will be releasing for nobody, or maybe 
for the few folks who have only Hive in their stack (I don't know if there are 
folks like that). No matter how good a product is, if people don't use it, it is 
gonna die :-(


I think we have a ticket which talks about relocating all dependencies. I agree 
we should drop the core jar for sure - it leads to all the problems Stamatis 
mentioned - but let's restore the core jar for now & drop it again when that 
relocation ticket is resolved. Does that sound convincing, or even worth a 
thought?

btw. having jars with one set of dependencies shaded and others unshaded is done 
in Hadoop as well (hadoop-minicluster vs hadoop-client-minicluster), & such 
problems keep coming up from users, e.g. [3]

Anyone else, any thoughts?

-Ayush

[1] https://lists.apache.org/thread/cwtxnffoqpwgmdtlc9hyor2cm22djpkg 

[2] https://lists.apache.org/thread/23sshgolmbpcc01npqgt03woljdy6hdn 

[3] https://lists.apache.org/thread/f47s6bxrtslkxbc8s2gybwrxps8vk63x 




On Fri, 26 Apr 2024 at 16:37, Stamatis Zampetakis <zabe...@gmail.com> wrote:

Hey Simhadri, thanks for starting this discussion.

Maven has many limitations when it comes to publishing multiple
artifacts from the same module. In most cases, the end result is
broken and hard to use. The pom file that is published for a given
module is not able to describe correctly all artifacts of the module
and that's why there is one main artifact for every module; dependency
declarations are usually correct for the main artifact but are not
representative for the rest.

For exam

Re: [DISCUSS] Migrate precommit git repos from kgyrtkirk to apache

2024-01-23 Thread Zoltan Haindrich
me, nor any contributor is at
mercy of anyone else, everyone is equal, equal right, have all rights to
make mistakes, discuss, get them corrected & still come back to this place
without any humiliation with his head held high. Community over Code

I think you will reflect on this; you are a respected member of the community &
will work in a way that is good & peaceful for all, rest I can't help it.

I will let Stamatis & you or anyone you like go forward with this. Stamatis
has all rights around the repo, if you want any deletion or so, rest INFRA
ticket can get anything sorted, if it requires me to write something, I can
accept it was my bad, fortunately I don't have ego issues :-)

With a heavy heart!!!
-Ayush


On Tue, 23 Jan 2024 at 15:05, Zoltan Haindrich  wrote:



On 1/23/24 10:10, Ayush Saxena wrote:

Ok I will get the repo deleted. I am not taking any sarcastic comments from
Zoltan at this stage. Believe me I am not getting anything for having my name
there.

Why I did this?

Someone was so obsessed with getting his name checked into the "Apache Code"
that he developed something on his fork & checked in that code to the Apache
Hive code, so, professional.


I don't agree - as this is just not true! This project was created way back
to aid my own efforts - it didn't even look like it would be used like this
later.
So I've created a separate repo - as there would have been not much reason to
wait for approvals on changes to tools only I use!
I was using it for years - and when I was replacing the CI I used it to get a
good base ground for running the tests - as it could prepare a lot of things
already.
I had to do a lot of things - and the move of the repo was never at the top of
the list...
...and it never proved to be a bottleneck; as you can just fork & upload a new
image and reference that instead...

Many Hive Committers have rights is a wrong phrase to quote: Many Hive
Committers who are your friends have rights. To push an image we need to catch
Zoltan, but ok do whatever you want.


That's not true either as you can see in a week-or-so old comment from me
(https://github.com/kgyrtkirk/hive-dev-box/pull/15#issuecomment-1893414474)
describing how you could push a new image and use that instead; bypassing me
entirely...
The PR with that changed docker image to `wecharyu/hive-dev-box:executor` got
merged a few days back...so as of now the hive ci doesn't have much connection
to the original repo.


I just want to say, Zoltan, you might be a very good developer, but please
change your "whatever you want to do" tone,


I think you went ahead and did something before even writing a single line -
or did I miss it?


Not following this further

have it your way!


cheers,
Zoltan




-Ayush

On Tue, 23 Jan 2024 at 14:30, Zoltan Haindrich <k...@rxd.hu> wrote:


  > I just copied the repo: cp -R and Put Zoltan's name & reference to his
  > repo. I didn't knew any better way than that, you can definitely force push
  > with another fancy approach

lol...what a sophisticated approach - I wonder, if you don't know the `fancy
approach`, then why did you do it?

I wonder what you've copied - because you missed the addition of the github
action which builds the image for every PR

Now you are the sole contributor of all existing stuff (congrats)...but do
whatever you want...
It was always there and available to use - many hive committers had push and
approve rights on those repos.

I think you might also want to do the same with
https://github.com/kgyrtkirk/hive-toolbox
because your contribution references it here:
https://github.com/apache/hive-dev-box/blob/663625bc74e799f35c6bab1c1485530367287c61/tools/install_toolbox#L21C1-L21C115

and probably also cp -R
https://github.com/kgyrtkirk/hive-test-kube/

cheers,
Zoltan


On 1/23/24 09:29, Ayush Saxena wrote:
  > I just copied the repo: cp -R and Put Zoltan's name & reference to his
  > repo. I didn't knew any better way than that, you can definitely force push
  > with another fancy approach, just c-pick the other commits for NOTICE & all
  > on top of it. The old code & commits had some cloudera references, which I
  > personally wanted to avoid, but yep we can take another approach as well.
  > Good with me.
  >
  > For the Jira, yep we should, we aren't going to release this, so for fix
  > version, maybe I will create a dev-box-1.0.0 which we can use to resolve
  > the tickets, sho

Re: [DISCUSS] Migrate precommit git repos from kgyrtkirk to apache

2024-01-23 Thread Zoltan Haindrich


On 1/23/24 10:10, Ayush Saxena wrote:

Ok I will get the repo deleted. I am not taking any sarcastic comments from 
Zoltan at this stage. Believe me I am not getting anything for having my name 
there.

Why I did this?

Someone was so obsessed with getting his name checked into the "Apache Code" that he developed something on his fork & checked in that code to the Apache Hive code, so, 
professional.


I don't agree - as this is just not true! This project was created way back 
to aid my own efforts - it didn't even look like it would be used like this 
later.
So I've created a separate repo - as there would have been not much reason to 
wait for approvals on changes to tools only I use!
I was using it for years - and when I was replacing the CI I've used it to get 
a good base ground for running the tests - as it could prepare a lot of things 
already.
I had to do a lot of things - and the move of the repo was never at the top of 
the list...
...and it never proved to be a bottleneck; as you can just fork&upload a new 
image and reference that instead...

Many Hive Committers have rights is a wrong phrase to quote: Many Hive Committers who are your friends have rights. To push an image we need to catch Zoltan, but ok do 
whatever you want.


That's not true either as you can see in a week-or-so old comment from me (https://github.com/kgyrtkirk/hive-dev-box/pull/15#issuecomment-1893414474) describing how you 
could push a new image and use that instead ; bypassing me entirely...
The PR with that changed docker image to `wecharyu/hive-dev-box:executor` got merged a few days back...so as of now the hive ci doesn't have much connection to the original 
repo.




I just want to say, Zoltan, you might be a very good developer, but please change your 
"whatever you want to do" tone,


I think you went ahead and did something before even writing a single line - 
or did I miss it?


Not following this further

have it your way!


cheers,
Zoltan





-Ayush

On Tue, 23 Jan 2024 at 14:30, Zoltan Haindrich <k...@rxd.hu> wrote:


  > I just copied the repo: cp -R and Put Zoltan's name & reference to his
  > repo. I didn't knew any better way than that, you can definitely force 
push
  > with another fancy approach

lol...what a sophisticated approach - I wonder, if you don't know the `fancy 
approach`, then why did you do it?

I wonder what you've copied - because you missed the addition of the github 
action which builds the image for every PR

Now you are the sole contributor of all existing stuff (congrats)...but do 
whatever you want...
It was always there and available to use - many hive committers had push and 
approve rights on those repos.

I think you might also want to do the same with 
https://github.com/kgyrtkirk/hive-toolbox 
<https://github.com/kgyrtkirk/hive-toolbox>
because your contribution references it here: 
https://github.com/apache/hive-dev-box/blob/663625bc74e799f35c6bab1c1485530367287c61/tools/install_toolbox#L21C1-L21C115

<https://github.com/apache/hive-dev-box/blob/663625bc74e799f35c6bab1c1485530367287c61/tools/install_toolbox#L21C1-L21C115>
and probably also cp -R
https://github.com/kgyrtkirk/hive-test-kube/ 
<https://github.com/kgyrtkirk/hive-test-kube/>

cheers,
Zoltan


On 1/23/24 09:29, Ayush Saxena wrote:
 > I just copied the repo: cp -R and Put Zoltan's name & reference to his
 > repo. I didn't knew any better way than that, you can definitely force 
push
 > with another fancy approach, just c-pick the other commits for NOTICE & 
all
 > on top of it. The old code & commits had some cloudera references, which 
I
 > personally wanted to avoid, but yep we can take another approach as well.
 > Good with me.
 >
 > For the Jira, yep we should, we aren't going to release this, so for fix
 > version, maybe I will create a dev-box-1.0.0 which we can use to resolve
 > the tickets, shouldn't put main repo versions, else that will pop up in 
our
 > release notes, or let me know if you want a separate Jira project under
 > Hive for these repos as well, We can explore that route if folks feel 
that
 > way.
 >
 > -Ayush
 >
 >
 > On Tue, 23 Jan 2024 at 13:35, Stamatis Zampetakis <zabe...@gmail.com> wrote:
 >
 >> Thanks for helping advance this Ayush!
 >>
 >> I saw that the commit history was not retained. Is there any reason
 >> for dropping it? Keeping the history and the people who contributed
 >> thus far would be nice to have.
 >>
 >> For the contribution model to this repository, I would recommend the
 >> usual process. Raise a JIRA ticket, file a PR, 

Re: [DISCUSS] Migrate precommit git repos from kgyrtkirk to apache

2024-01-23 Thread Zoltan Haindrich
since the code will be under the ASF namespace people will assume that it is
ASF licensed so they may start copy-pasting stuff from there.

Is there anything preventing us from putting the code under the AL2 license?

Best,
Stamatis

On Wed, Aug 23, 2023 at 6:14 PM Attila Turoczy
 wrote:

Thank you, Stamatis! Also, Zoltan for the "donation" :)

-Attila

On Wed, Aug 23, 2023 at 4:53 PM Ayush Saxena <ayush...@gmail.com> wrote:

+1,
Thanx Stamatis for initiating this. This was something which was in my mind
as well since long but couldn't find time.

-Ayush

On 23-Aug-2023, at 6:19 PM, Zoltan Haindrich  wrote:

Hey Stamatis!

I'm happy to donate these repos / help with the migration!
I should have done it earlier - but it was never top priority...thank you for
initiating it!

cheers,
Zoltan

On 8/23/23 14:00, Stamatis Zampetakis wrote:

Hi all,

Our precommit infrastructure uses code that resides in the following repos.

* https://github.com/kgyrtkirk/hive-test-kube
* https://github.com/kgyrtkirk/hive-toolbox
* https://github.com/kgyrtkirk/hive-dev-box

These are mainly maintained by Zoltán Haindrich who is always helpful
and kind to investigate and resolve issues.

For facilitating contributions from the apache community and also removing
some burden from Zoltan's shoulders it may be a good time to migrate those
and put them under the apache namespace.

For the initial migration, we could have a straightforward 1 to 1 mapping as
shown below:

* https://github.com/apache/hive-test-kube
* https://github.com/apache/hive-toolbox
* https://github.com/apache/hive-dev-box

How do you feel about this?

Best,
Stamatis














Re: [DISCUSS] Migrate precommit git repos from kgyrtkirk to apache

2023-08-23 Thread Zoltan Haindrich

Hey Stamatis!

I'm happy to donate these repos / help with the migration!
I should have done it earlier - but it was never top priority...thank you for 
initiating it!

cheers,
Zoltan

On 8/23/23 14:00, Stamatis Zampetakis wrote:

Hi all,

Our precommit infrastructure uses code that resides in the following repos.

* https://github.com/kgyrtkirk/hive-test-kube
* https://github.com/kgyrtkirk/hive-toolbox
* https://github.com/kgyrtkirk/hive-dev-box

These are mainly maintained by Zoltán Haindrich who is always helpful
and kind to investigate and resolve issues.

For facilitating contributions from the apache community and also
removing some burden from Zoltan's shoulders it may be a good time to
migrate those and put them under the apache namespace.

For the initial migration, we could have a straightforward 1 to 1
mapping as shown below:

* https://github.com/apache/hive-test-kube
* https://github.com/apache/hive-toolbox
* https://github.com/apache/hive-dev-box

How do you feel about this?

Best,
Stamatis




Re: Admin privileges on http://ci.hive.apache.org/

2023-08-16 Thread Zoltan Haindrich

Hey,

I think ideally all team members (https://github.com/orgs/apache/teams/hive-committers) should be admins; but I was facing some issue when setting it up initially - or made 
some mistakes...don't remember.

but there are quite a few people with admin rights who can add you as well 
(I've already added you).

cheers,
Zoltan


On 8/15/23 17:18, Stamatis Zampetakis wrote:

Hey all,

I was wondering who has admin privileges for http://ci.hive.apache.org/ ?

I would like to check and potentially upgrade some plugins that are
currently installed. Is it possible to get permissions to manage the
Jenkins instance?

As a side note we may want to document the current administrators and the
process to get permissions when necessary.


Best,
Stamatis





Hive CI / github user change

2023-08-10 Thread Zoltan Haindrich

Hey,

For a long time hive-ci was adding comments/etc under my username on every PR; 
I've worked with infra (INFRA-24854) to get a PAT for one of their own users.
I don't see any issues arising from this change - but wanted to let you know :)

cheers,
Zoltan




Re: Idea: Remove PowerMock

2023-07-11 Thread Zoltan Haindrich

Hey,

#3798 looks promising; I've reopened it - it seems like it has fallen through the 
cracks...
test runs have become outdated and it seems like it needs a rebase.

I think that the usage of PowerMock signals that something is wrong and a 
refactoring step should be taken instead of introducing it.
IIRC it could even interfere with JVM reuse for test executions - and thus 
could cause some confusion.

cheers,
Zoltan


On 7/10/23 18:56, Attila Turoczy wrote:

+1 Kill it! :)
mockito is a more modern approach. I think it is cool that we modernize our
platform, and remove old and unsupported tools and components.


On Mon, Jul 10, 2023 at 5:36 PM Ayush Saxena  wrote:


+1, PowerMock as far as I remember has issues with JDK-11+ as well,
one such ref :
https://stackoverflow.com/questions/52966897/powermock-java-11

-Ayush

On Mon, 10 Jul 2023 at 20:18, Zsolt Miskolczi 
wrote:


Hi,

Hive heavily uses PowerMock. The main
purpose of it is static mocking.

The sad thing is it seems PowerMock is dead:
- The main branch got its last commit in 2022 and most of the
contributions last year were simple dependency upgrades:
https://github.com/powermock/powermock/commits/release/2.x
- The last release was in 2020
- And their mailing list looks dead as well. That is the last email on

that

list: https://groups.google.com/g/powermock/c/JdYY3naZlbU. It asked if

it

was discontinued and didn't get an answer at all.

So officially, it is not dead but it seems it is.

Back then, when PowerMock development started, there was no static mocking in
mockito. But since then, it has become possible using mockito-inline.
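To give a rough idea, here is a self-contained sketch of static mocking with
mockito-inline (the Greeter/Env classes below are made up for illustration - they
are not Hive code):

import static org.junit.Assert.assertEquals;

import org.junit.Test;
import org.mockito.MockedStatic;
import org.mockito.Mockito;

public class GreeterTest {

    // Hypothetical static dependency that PowerMock would normally be used to stub.
    static class Env {
        static String currentUser() { return System.getProperty("user.name"); }
    }

    // Hypothetical class under test.
    static class Greeter {
        static String greet() { return "hello " + Env.currentUser(); }
    }

    @Test
    public void greetUsesTheStubbedStaticCall() {
        // mockito-inline scopes the static stub to this try-with-resources block.
        try (MockedStatic<Env> env = Mockito.mockStatic(Env.class)) {
            env.when(Env::currentUser).thenReturn("test-user");
            assertEquals("hello test-user", Greeter.greet());
        }
    }
}

Compared to PowerMock there is no special test runner involved; the stub only
applies inside the try-with-resources scope.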

I won't lie, it is hard to switch from PowerMock: it enables some coding
patterns that are considered bad patterns and it leads to code that is
harder to test. Last year I played with it and removed it from the
hive-exec module: https://github.com/apache/hive/pull/3798.

The hard part in removing it is that PowerMock and mockito-inline don't
work together. So when we want to remove it, we have to do it in one pull
request for a given module. It cannot be separated into smaller steps.
The good news is as it relates to testing, pre commit tests can validate
the refactor.

What do you think? Should we move away from PowerMock or keep it as it

is?


Thank you,
Zsolt Miskolczi








Re: [DISCUSS] Automatic rerunning of failed tests in Hive Pre-commit

2023-06-14 Thread Zoltan Haindrich

Hive has >24hours of tests - in case of automated reruns... I wonder how a 
patch which breaks almost all tests should be handled?

I believe we already have a process to deal with these things: if you 
encounter a flaky test - it should be checked & disabled by using:
http://ci.hive.apache.org/job/hive-flaky-check/
..and/or fix the underlying issue...
we had a discussion about it on the mailing list a long time ago.

I see that quite a few flaky tests have crept in...
http://ci.hive.apache.org/job/hive-precommit/job/PR-4372/
most of these should be fixed...or cleared off the radar


cheers,
Zoltan




Re: [DISCUSS] Nightly snaphot builds

2023-05-26 Thread Zoltan Haindrich

On 5/25/23 19:58, vihang karajgaonkar wrote:

I just tried the job and it worked as expected. Thanks! If I understand
correctly, the job retains builds for 180 days. Does it mean if there were
no commits to a branch for more than 180 days, we will lose the build
artifacts eventually?


not entirely - the removal of old builds is a post-build action, which means 
that if there are no builds, the removal logic will never run
https://plugins.jenkins.io/discard-old-build/

on the other hand I wonder how much value a nightly build can still provide 
after 180 days :)
preferably - a real release should be done after some time :)

cheers,
Zoltan



On Thu, May 25, 2023 at 1:50 AM Zoltan Haindrich  wrote:


Hey Vihang,

I've added you as an admin; and I've copied the job as
http://ci.hive.apache.org/job/hive-nightly-branch-3/
Another option could be to trigger the original job or use the
parameterized-scheduler, but that would configure a real unconditional
nightly build - which would just build the same version over and over
again if there are no changes...
...the current nightly is SCM triggered; but only once a day it makes a
check which creates the desired results.

the least painful was to copy the job; I guess no-one has touched the
pipeline script ever since it was introduced :D

cheers,
Zoltan

On 5/25/23 01:26, vihang karajgaonkar wrote:

I created https://issues.apache.org/jira/browse/HIVE-27371 to have

nightly

builds for branch-3. Once that is merged, I think we can have scheduled
builds for branch-3 as well. Although, I don't have permissions to

create a

new job for branch-3. Does anyone know how to do it?

Thanks,
Vihang

On Wed, May 24, 2023 at 10:07 AM vihang karajgaonkar <

vihan...@apache.org>

wrote:


The nightly job http://ci.hive.apache.org/job/hive-nightly/ is great.

Can

we have this for branch-3 as well since we have been backporting a lot

of

PRs to branch-3 lately.

Thanks,
Vihang





On Wed, May 24, 2023 at 6:56 AM Zoltan Haindrich  wrote:


Hey,

   > We already have nightly builds for Hive [1].
   > [1] http://ci.hive.apache.org/job/hive-nightly/

...and hive-dev-box can launch such archives; either by using it like
this:
https://www.mail-archive.com/dev@hive.apache.org/msg142420.html

or with a somewhat longer command you could launch hdb in bazaar mode;
and have an HS2 running with a nightly version:

docker run --rm -d -p 1:1 -v hive-dev-box_work:/work -e
HIVE_VERSION=


http://ci.hive.apache.org/job/hive-nightly/lastSuccessfulBuild/artifact/archive/apache-hive-4.0.0-nightly-b0b3fde70c-20230524_014711-bin.tar.gz

--name hive
kgyrtkirk/hive-dev-box:bazaar

cheers,
Zoltan

On 5/24/23 09:15, Stamatis Zampetakis wrote:

Hey all,

We already have nightly builds for Hive [1].

Do we need something more than that?

Best,
Stamatis

[1] http://ci.hive.apache.org/job/hive-nightly/


On Tue, May 23, 2023 at 9:03 AM vihang karajgaonkar <vihan...@apache.org> wrote:

I think there are many benefits like others in this thread suggested which
can be built on top of nightly builds. Having docker images is great but
for now I think we can start simple and publish the jars. Many users still
just deploy using jars and it would be useful to them. Once we have a
docker environment we can add a docker image too to the nightly builds so
that users can choose their preferred way.

On Mon, May 22, 2023 at 11:07 PM Sungwoo Park  wrote:

I think such nightly builds will be useful for testing and debugging in the
future.

I also wonder if we can somehow create builds even from previous commits
(e.g., for the past few years). Such builds from previous commits don't
have to be daily builds, and I think weekly builds (or even monthly builds)
would also be very useful.

The reason I wish such builds were available is to facilitate debugging and
testing. When tested against the TPC-DS benchmark, the current master
branch has several correctness problems that were introduced after the
release of Hive 3.1.2. We have reported all problems known to us in [1] and
also submitted several patches. If such nightly builds had been available,
we would have saved quite a bit of time for implementing the patches by
quickly finding offending commits that introduced new correctness bugs.

In addition, you can find quite a few commits in the master branch that
report bugs which are not reproduced in Hive 3.1.2. Examples: HIVE-19990,
HIVE-14557, HIVE-21132, HIVE-21188, HIVE-21544, HIVE-22114,
HIVE-7, HIVE-22236, HIVE-23911, HIVE-24198, HIVE-22777,
HIVE-25170, HIVE-25864, HIVE-26671.
(There may be some errors in this list because we compared against Hive
3.1.2 with many patches backported.) Such nightly builds can be useful for
finding root causes of such bugs.

Ideally I wish there was an automated procedure to create nightly builds,
run TPC-DS benchmark, and report correctness/performance results, although
this would be quite hard to im

Re: [DISCUSS] Nightly snaphot builds

2023-05-25 Thread Zoltan Haindrich

Hey Vihang,

I've added you as an admin; and I've copied the job as 
http://ci.hive.apache.org/job/hive-nightly-branch-3/
Another option could be to trigger the original job or use the parameterized-scheduler, but that would configure a real unconditional nightly build - which would just build the 
same version over and over again if there are no changes...

...the current nightly is SCM triggered; but only once a day it makes a check 
which creates the desired results.

the least painful was to copy the job; I guess no-one has touched the pipeline 
script ever since it was introduced :D

cheers,
Zoltan

On 5/25/23 01:26, vihang karajgaonkar wrote:

I created https://issues.apache.org/jira/browse/HIVE-27371 to have nightly
builds for branch-3. Once that is merged, I think we can have scheduled
builds for branch-3 as well. Although, I don't have permissions to create a
new job for branch-3. Does anyone know how to do it?

Thanks,
Vihang

On Wed, May 24, 2023 at 10:07 AM vihang karajgaonkar 
wrote:


The nightly job http://ci.hive.apache.org/job/hive-nightly/ is great. Can
we have this for branch-3 as well since we have been backporting a lot of
PRs to branch-3 lately.

Thanks,
Vihang





On Wed, May 24, 2023 at 6:56 AM Zoltan Haindrich  wrote:


Hey,

  > We already have nightly builds for Hive [1].
  > [1] http://ci.hive.apache.org/job/hive-nightly/

...and hive-dev-box can launch such archives; either by using it like
this:
https://www.mail-archive.com/dev@hive.apache.org/msg142420.html

or with a somewhat longer command you could launch hdb in bazaar mode;
and have an HS2 running with a nightly version:

docker run --rm -d -p 1:1 -v hive-dev-box_work:/work -e
HIVE_VERSION=
http://ci.hive.apache.org/job/hive-nightly/lastSuccessfulBuild/artifact/archive/apache-hive-4.0.0-nightly-b0b3fde70c-20230524_014711-bin.tar.gz
--name hive
kgyrtkirk/hive-dev-box:bazaar

cheers,
Zoltan

On 5/24/23 09:15, Stamatis Zampetakis wrote:

Hey all,

We already have nightly builds for Hive [1].

Do we need something more than that?

Best,
Stamatis

[1] http://ci.hive.apache.org/job/hive-nightly/


On Tue, May 23, 2023 at 9:03 AM vihang karajgaonkar <vihan...@apache.org> wrote:

I think there are many benefits like others in this thread suggested which
can be built on top of nightly builds. Having docker images is great but
for now I think we can start simple and publish the jars. Many users still
just deploy using jars and it would be useful to them. Once we have a
docker environment we can add a docker image too to the nightly builds so
that users can choose their preferred way.

On Mon, May 22, 2023 at 11:07 PM Sungwoo Park  wrote:

I think such nightly builds will be useful for testing and debugging in the
future.

I also wonder if we can somehow create builds even from previous commits
(e.g., for the past few years). Such builds from previous commits don't
have to be daily builds, and I think weekly builds (or even monthly builds)
would also be very useful.

The reason I wish such builds were available is to facilitate debugging and
testing. When tested against the TPC-DS benchmark, the current master
branch has several correctness problems that were introduced after the
release of Hive 3.1.2. We have reported all problems known to us in [1] and
also submitted several patches. If such nightly builds had been available,
we would have saved quite a bit of time for implementing the patches by
quickly finding offending commits that introduced new correctness bugs.

In addition, you can find quite a few commits in the master branch that
report bugs which are not reproduced in Hive 3.1.2. Examples: HIVE-19990,
HIVE-14557, HIVE-21132, HIVE-21188, HIVE-21544, HIVE-22114,
HIVE-7, HIVE-22236, HIVE-23911, HIVE-24198, HIVE-22777,
HIVE-25170, HIVE-25864, HIVE-26671.
(There may be some errors in this list because we compared against Hive
3.1.2 with many patches backported.) Such nightly builds can be useful for
finding root causes of such bugs.

Ideally I wish there was an automated procedure to create nightly builds,
run TPC-DS benchmark, and report correctness/performance results, although
this would be quite hard to implement. (I remember Spark implemented this
procedure in the era of Spark 2, but my memory could be wrong.)

[1] https://issues.apache.org/jira/browse/HIVE-26654

On Tue, May 23, 2023 at 10:44 AM Ayush Saxena  wrote:

Hi Vihang,
+1, We were even exploring publishing the docker images of the snapshot
version as well per commit or maybe weekly, so just shoot 2 docker commands
and you get a Hive cluster running with master code.

Sai, I think to spin up an env via Docker with all these things should be
doable for sure, but would require someone with real good expertise with
docker as well as setting up these services with Hive. Obviously, I am not
that guy :-)

@Simhadri has a PR which publishes docker images once a release tag

Re: [DISCUSS] Nightly snaphot builds

2023-05-24 Thread Zoltan Haindrich

Hey,

> We already have nightly builds for Hive [1].
> [1] http://ci.hive.apache.org/job/hive-nightly/

...and hive-dev-box can launch such archives; either by using it like this:
https://www.mail-archive.com/dev@hive.apache.org/msg142420.html

or with a somewhat longer command you could launch hdb in bazaar mode; and have 
an HS2 running with a nightly version:

docker run --rm -d -p 1:1 -v hive-dev-box_work:/work -e 
HIVE_VERSION=http://ci.hive.apache.org/job/hive-nightly/lastSuccessfulBuild/artifact/archive/apache-hive-4.0.0-nightly-b0b3fde70c-20230524_014711-bin.tar.gz --name hive 
kgyrtkirk/hive-dev-box:bazaar


cheers,
Zoltan

On 5/24/23 09:15, Stamatis Zampetakis wrote:

Hey all,

We already have nightly builds for Hive [1].

Do we need something more than that?

Best,
Stamatis

[1] http://ci.hive.apache.org/job/hive-nightly/


On Tue, May 23, 2023 at 9:03 AM vihang karajgaonkar  wrote:


I think there are many benefits like others in this thread suggested which
can be built on top of nightly builds. Having docker images is great but
for now I think we can start simple and publish the jars. Many users still
just deploy using jars and it would be useful to them. Once we have a
docker environment we can add a docker image too to the nightly builds so
that users can choose their preferred way.

On Mon, May 22, 2023 at 11:07 PM Sungwoo Park  wrote:


I think such nightly builds will be useful for testing and debugging in the
future.

I also wonder if we can somehow create builds even from previous commits
(e.g., for the past few years). Such builds from previous commits don't
have to be daily builds, and I think weekly builds (or even monthly builds)
would also be very useful.

The reason I wish such builds were available is to facilitate debugging and
testing. When tested against the TPC-DS benchmark, the current master
branch has several correctness problems that were introduced after the
release of Hive 3.1.2. We have reported all problems known to us in [1] and
also submitted several patches. If such nightly builds had been available,
we would have saved quite a bit of time for implementing the patches by
quickly finding offending commits that introduced new correctness bugs.

In addition, you can find quite a few commits in the master branch that
report bugs which are not reproduced in Hive 3.1.2. Examples: HIVE-19990,
HIVE-14557, HIVE-21132, HIVE-21188, HIVE-21544, HIVE-22114,
HIVE-7, HIVE-22236, HIVE-23911, HIVE-24198, HIVE-22777,
HIVE-25170, HIVE-25864, HIVE-26671.
(There may be some errors in this list because we compared against Hive
3.1.2 with many patches backported.) Such nightly builds can be useful for
finding root causes of such bugs.

Ideally I wish there was an automated procedure to create nightly builds,
run TPC-DS benchmark, and report correctness/performance results, although
this would be quite hard to implement. (I remember Spark implemented this
procedure in the era of Spark 2, but my memory could be wrong.)

[1] https://issues.apache.org/jira/browse/HIVE-26654


On Tue, May 23, 2023 at 10:44 AM Ayush Saxena  wrote:


Hi Vihang,
+1, We were even exploring publishing the docker images of the snapshot
version as well per commit or maybe weekly, so just shoot 2 docker

commands

and you get a Hive cluster running with master code.

Sai, I think to spin up an env via Docker with all these things should be
doable for sure, but would require someone with real good expertise with
docker as well as setting up these services with Hive. Obviously, I am

not

that guy :-)

@Simhadri has a PR which publishes docker images once a release tag is
pushed, you can explore to have similar stuff for the Snapshot version,
maybe if that sounds cool

-Ayush

On Tue, 23 May 2023 at 04:26, Sai Hemanth Gantasala
 wrote:


Hi Vihang,

+1 on the idea.

This is a great idea to quickly test if a certain feature is working as
expected on a certain branch.
This way we test data loss, correctness, or any other unexpected

scenarios

that are Hive specific only. However, I'm wondering if it is possible

to

deploy/test in a kerberized environment or issues involving

authorization

services like sentry/ranger.

Thanks,
Sai.

On Mon, May 22, 2023 at 11:15 AM vihang karajgaonkar <

vihan...@apache.org>

wrote:


Hello Team,

I have observed that it is a common use-case where users would like

to

test

out unreleased features/bug fixes either to unblock them or test out

if

the

bug fixes really work as intended in their environments. Today in the

case

of Apache Hive, this is not very user friendly because it requires

the

end

user to build the binaries directly from the hive source code.

I found that Apache Spark has a very useful infrastructure [1] which
deploys nightly snapshots [2] [3] from the branch using github

actions.

This is super useful for any user who wants to try out the latest and
greatest using the nightly builds.

I was wondering if we should also adopt this. We can use githu

[jira] [Created] (HIVE-26605) Remove reviewer pattern

2022-10-07 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-26605:
---

 Summary: Remove reviewer pattern
 Key: HIVE-26605
 URL: https://issues.apache.org/jira/browse/HIVE-26605
 Project: Hive
  Issue Type: Sub-task
Reporter: Zoltan Haindrich
Assignee: Zoltan Haindrich






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Release candence

2022-05-11 Thread Zoltan Haindrich

Hey,


>> In another email thread 
(https://lists.apache.org/thread/sxcrcf4v9j630tl9domp0bn4m33bdq0s) Sun Chao mentioned 
that  other projects (Spark,
>> Iceberg and Trino/Presto) are still depending on old Hive, because the exec-core jar has been removed, and the exec jar contains unshaded versions of various 
dependencies. Until this is fixed, they can not upgrade to a newer version of Hive, so I would like to add this as a blocker for Hive 4.0.0 release.


>> @Chao Sun: Could you help us find the jira for this issue, or file a new one?

I was thinking about this and I think this is a bit unfair...say project X is using Hive 2.3's core jar; should "we" the Hive community do all the work to run their project 
with Hive 4? I don't think so.

What if some project is not interested in upgrading? Should we really put 
effort into this even in that case?

The best middle-ground idea I've been able to come up with so far was to ask for a (possibly broken) development branch set up to run against some 4.0.0-alpha-X release, where we 
can start fixing the shading issues they might face together.
In this case they will already be ready to upgrade their Hive; and if they are also able to run tests/etc.: as a bonus we will get early pre-integration feedback - which 
will be valuable for both them and us.


What do you guys think?
Are there any other options?

cheers,
Zoltan

On 5/11/22 7:33 AM, Chao Sun wrote:

Thanks for reminding me, Peter. There is
https://issues.apache.org/jira/browse/HIVE-25317 but that's for Hive
2.3 and is mostly for the Spark use case. I just created
https://issues.apache.org/jira/browse/HIVE-26220 and marked it as a
blocker.

On Tue, May 10, 2022 at 10:01 PM Peter Vary  wrote:


In another email thread 
(https://lists.apache.org/thread/sxcrcf4v9j630tl9domp0bn4m33bdq0s) Sun Chao 
mentioned that  other projects (Spark,
Iceberg and Trino/Presto) are still depending on old Hive, because the 
exec-core jar has been removed, and the exec jar contains unshaded versions of 
various dependencies. Until this is fixed, they can not upgrade to a newer 
version of Hive, so I would like to add this as a blocker for Hive 4.0.0 
release.

@Chao Sun: Could you help us find the jira for this issue, or file a new one?

Any more blockers?

Thanks,
Peter

On Fri, Apr 29, 2022, 13:46 Peter Vary  wrote:


Hi Team,

With Zoltan Haindrich, we have been brainstorming about the next steps after 
the 4.0.0-alpha-1 release.

We came up with the following plan:
- Define a desired scope for the 4.0.0 release
- Release minimally quarterly - create alpha release(s) until the scope is 
reached
- If the scope is reached - create a beta release
- For fixes - create a beta release
- If we are satisfied with the quality of the release then we can release the 
Hive 4.0.0
- Keep up with the quarterly release cadence

Until now we collected the following items which could be part of the scope:
- Java 11 upgrade (minimally)
- Hadoop 3.3 (needed to the Java 11 upgrade)
- Full Iceberg integration (Read, Write, Delete, Update, Merge)
- Clean up the HMS API interface (deprecate old methods which are already 
released, remove unreleased methods which have not been released yet, 
use/create methods with Request objects as parameters instead of Context 
objects)

We might want to collect information about the usage of specific modules, and 
might deprecate some based on the feedback (remove them from the release or at 
least mark them deprecated), so we can reduce the project complexity based on 
the info. Some features which popped up:
- HCatalog
- WebHCat
- Pig integration
- ??

We would be interested in any feedback on this plan / scope / deprecation. 
Feel free to suggest any additions or removals from these lists, or even 
propose an entirely different plan.
Also if you would like to take over specific tasks, feel free to grab it, and 
start working on it or start discussing it.

Thanks,
Peter


[jira] [Created] (HIVE-26138) Fix mapjoin_memcheck

2022-04-12 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-26138:
---

 Summary: Fix mapjoin_memcheck
 Key: HIVE-26138
 URL: https://issues.apache.org/jira/browse/HIVE-26138
 Project: Hive
  Issue Type: Bug
Reporter: Zoltan Haindrich


this test fails very frequently

http://ci.hive.apache.org/job/hive-precommit/job/master/1169/testReport/junit/org.apache.hadoop.hive.cli.split7/TestCliDriver/Testing___split_01___PostProcess___testCliDriver_mapjoin_memcheck_/



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HIVE-26135) Invalid Anti join conversion may cause missing results

2022-04-12 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-26135:
---

 Summary: Invalid Anti join conversion may cause missing results
 Key: HIVE-26135
 URL: https://issues.apache.org/jira/browse/HIVE-26135
 Project: Hive
  Issue Type: Bug
Reporter: Zoltan Haindrich
Assignee: Zoltan Haindrich


right now I think the following is needed to trigger the issue:
* left outer join
* only select left hand side columns
* conditional which is using some udf
* the nullness of the udf is checked

repro sql; in case the conversion happens the row with 'a' will be missing
{code}
drop table if exists t;
drop table if exists n;

create table t(a string) stored as orc;
create table n(a string) stored as orc;

insert into t values ('a'),('1'),('2'),(null);
insert into n values ('a'),('b'),('1'),('3'),(null);


explain select n.* from n left outer join t on (n.a=t.a) where assert_true(t.a 
is null) is null;
explain select n.* from n left outer join t on (n.a=t.a) where cast(t.a as 
float) is null;


select n.* from n left outer join t on (n.a=t.a) where cast(t.a as float) is 
null;
set hive.auto.convert.anti.join=false;
select n.* from n left outer join t on (n.a=t.a) where cast(t.a as float) is 
null;

{code}



workaround could be to disable the feature:
{code}
set hive.auto.convert.anti.join=false;
{code}




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Re: Start releasing the master branch

2022-03-02 Thread Zoltan Haindrich
der(FetchOperator.java:306)
  at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:560)
  ... 7 more

On Tue, 1 Mar 2022, Alessandro Solimando wrote:


Hi Sungwoo,
last time I tried to run TPCDS-based benchmark I stumbled upon a similar
situation, finally I found that statistics were not computed, so CBO was
not kicking in, and the automatic retry goes with CBO off which was failing
for something like 10 queries (subqueries cannot be decorrelated, but also
some runtime errors).

Making sure that (column) statistics were correctly computed fixed the
problem.

Can you check if this is the case for you?

HTH,
Alessandro

On Tue, 1 Mar 2022 at 15:28, POSTECH CT  wrote:


Hello Hive team,

I wonder if anyone in the Hive team has tried the TPC-DS benchmark on
the master branch recently.  We occasionally run TPC-DS system tests
using the master branch, and the tests don't succeed completely. Here
is how our TPC-DS tests proceed.

1. Compile and run Hive on Tez (not Hive-LLAP)
2. Load ORC tables from 1TB TPC-DS raw text data, and compute statistics

3. Run 99 TPC-DS queries which were slightly modified to return
varying number of rows (rather than 100 rows)
4. Compare the results against the previous results

The previous results were obtained and cross-checked by running Hive
3.1.2 and SparkSQL 2.3/3.2, so we are faily confident about their
correctness.

For the latest commit in the master branch, step 2 fails. For earlier
commits (for example, commits in February 2021), step 3 fails where
several queries either fail or return wrong results.

We can compile and report the test results in this mailing list, but
would like to know if similar results have been reproduced by the Hive
team, in order to make sure that we did not make errors in our tests.

If it is okay to open a JIRA ticket that only reports failures in the
TPC-DS test, we could also perform a git bisect to locate the commit
that began to generate wrong results.

--- Sungwoo Park

On Tue, 1 Mar 2022, Zoltan Haindrich wrote:


Hey,

Great to hear that we are on the same side regarding these things :)

For around a week now - we have nightly builds for the master branch:
http://ci.hive.apache.org/job/hive-nightly/12/

I think we have 1 blocker issue:
https://issues.apache.org/jira/browse/HIVE-25665

I know about one more thing I would rather get fixed before we release it:
https://issues.apache.org/jira/browse/HIVE-25994
The best would be to introduce smoke tests (HIVE-22302) to ensure that
something like this will not happen in the future - but we should probably
start moving forward.

I think we could call the first iteration of this as "4.0.0-alpha-1" :)

I've added 4.0.0-alpha-1 as a version - and added the above two tickets to it.

https://issues.apache.org/jira/issues/?jql=project%20%3D%20HIVE%20AND%20fixVersion%20%3D%204.0.0-alpha-1

Are there any more things you guys know which would be needed?

cheers,
Zoltan


On 2/22/22 12:18 PM, Peter Vary wrote:

I would vote for 4.0.0-alpha-1 or similar for all of the components.

When we have more stable releases I would keep the 4.x.x schema, since
everyone is familiar with it, and I do not see a really good reason to
change it.

Thanks,
Peter


On 2022. Feb 10., at 3:34, Szehon Ho  wrote:

+1 that would be awesome to see Hive master released after so long.

Either 4.0 or 4.0.0-alpha-1 makes sense to me, not sure how we would pick
any 3.x or calendar date (which could tend to slip and be more confusing?).

Thanks in any case to get the ball rolling.
Szehon

On Wed, Feb 9, 2022 at 4:55 AM Zoltan Haindrich  wrote:

Hey,

Thank you guys for chiming in; versioning is for sure something we should
get to some common ground on.
It's a triple problem right now; I think we have the following things:
* storage-api
** we have "2.7.3-SNAPSHOT" in the repo
*** https://github.com/apache/hive/blob/0d1cc7c5005fe47759298fb35a1c67edc93f/storage-api/pom.xml#L27
** meanwhile we already have 2.8.1 released to maven central
*** https://mvnrepository.com/artifact/org.apache.hive/hive-storage-api
* standalone-metastore
** 4.0.0-SNAPSHOT in the repo
** last release is 3.1.2
* hive
** 4.0.0-SNAPSHOT in the repo
** last release is 3.1.2

Regarding the actual version number I'm not entirely sure where we should
start the numbering - that's why I was referring to it as Hive-X in my
first letter.

I think the key point here would be to start shipping releases regularly
and not the actual version number we will use - I'm kinda open to any
versioning scheme which reflects that this is a newer release than 3.1.2.

I could imagine the following ones:
(A) start with something less expected; but keep 3 in the prefix to
reflect that this is not yet 4.0
 I can imagine the following numbers:
 3.900.0, 3.901.0, ...
 3.9.0, 3.9.1, ...
(B) start 4.0.0
 4.0.0, 4.1.0, ...
(C) jump to some calendar based version 

Re: Start releasing the master branch

2022-03-01 Thread Zoltan Haindrich

Hey,

Great to hear that we are on the same side regarding these things :)

For around a week now - we have nightly builds for the master branch:
http://ci.hive.apache.org/job/hive-nightly/12/

I think we have 1 blocker issue:
https://issues.apache.org/jira/browse/HIVE-25665

I know about one more thing I would rather get fixed before we release it:
https://issues.apache.org/jira/browse/HIVE-25994
The best would be to introduce smoke tests (HIVE-22302) to ensure that 
something like this will not happen in the future - but we should probably 
start moving forward.

I think we could call the first iteration of this as "4.0.0-alpha-1" :)

I've added 4.0.0-alpha-1 as a version - and added the above two tickets to it.
https://issues.apache.org/jira/issues/?jql=project%20%3D%20HIVE%20AND%20fixVersion%20%3D%204.0.0-alpha-1

Are there any more things you guys know which would be needed?

cheers,
Zoltan


On 2/22/22 12:18 PM, Peter Vary wrote:

I would vote for 4.0.0-alpha-1 or similar for all of the components.

When we have more stable releases I would keep the 4.x.x schema, since everyone 
is familiar with it, and I do not see a really good reason to change it.

Thanks,
Peter



On 2022. Feb 10., at 3:34, Szehon Ho  wrote:

+1 that would be awesome to see Hive master released after so long.

Either 4.0 or 4.0.0-alpha-1 makes sense to me, not sure how we would pick
any 3.x or calendar date (which could tend to slip and be more confusing?).

Thanks in any case to get the ball rolling.
Szehon

On Wed, Feb 9, 2022 at 4:55 AM Zoltan Haindrich  wrote:


Hey,

Thank you guys for chiming in; versioning is for sure something we should
get to some common ground on.
It's a triple problem right now; I think we have the following things:
* storage-api
** we have "2.7.3-SNAPSHOT" in the repo
***
https://github.com/apache/hive/blob/0d1cc7c5005fe47759298fb35a1c67edc93f/storage-api/pom.xml#L27
** meanwhile we already have 2.8.1 released to maven central
*** https://mvnrepository.com/artifact/org.apache.hive/hive-storage-api
* standalone-metastore
** 4.0.0-SNAPSHOT in the repo
** last release is 3.1.2
* hive
** 4.0.0-SNAPSHOT in the repo
** last release is 3.1.2

Regarding the actual version number I'm not entirely sure where we should
start the numbering - that's why I was referring to it as Hive-X in my
first letter.

I think the key point here would be to start shipping releases regularly
and not the actual version number we will use - I'm kinda open to any
versioning scheme which
reflects that this is a newer release than 3.1.2.

I could imagine the following ones:
(A) start with something less expected; but keep 3 in the prefix to
reflect that this is not yet 4.0
 I can imagine the following numbers:
 3.900.0, 3.901.0, ...
 3.9.0, 3.9.1, ...
(B) start 4.0.0
 4.0.0, 4.1.0, ...
(C) jump to some calendar based version number like 2022.2.9
 trunk based development has pros and cons...making a move like this
irreversibly pledges trunk based development; and makes release branches
hard to introduce
(X) somewhat orthogonal is to (also) use some suffixes
 4.0.0-alpha1, 4.0.0-alpha2, 4.0.0-beta1
 this is probably the most tempting to use - but this versioning
schema with a non-changing MINOR and PATCH number will
 also suggest that the actual software is fully compatible - and only
bugs are being fixed - which will not be true...

I really like the idea to suffix these releases with alpha or beta - which
will communicate our level commitment that these are not 100% production
ready artifacts.

I think we could fix HIVE-25665; and probably experiment with 4.0.0-alpha1
for a start...


This also means there should *not* be a branch-4 after releasing Hive

4.0

and let that diverge (and becomes the next, super-ignored branch-3),

correct; no need to keep a branch we don't maintain...but in any case I
think we can postpone this decision until there will be something to
release... :)

cheers,
Zoltan



On 2/9/22 10:23 AM, László Bodor wrote:

Hi All!

A purely technical question: what will the SNAPSHOT version become after
releasing Hive 4.0.0? I think this is important, as it defines and

reflects

the future release plans.

Currently, it's 4.0.0-SNAPSHOT, I guess it's since Hive 3.0 + branch-3.
Hive is an evolving and super-active project: if we want to make regular
releases, we should simply release Hive 4.0 and bump pom to

4.1.0-SNAPSHOT,

which clearly says that we can release Hive 4.1 anytime we want, without
being frustrated about "whether we included enough cool stuff to release
5.0".

This also means there should *not* be a branch-4 after releasing Hive 4.0
and let that diverge (and becomes the next, super-ignored branch-3), only
when we end up bringing a minor backward-incompatible thing that needs a
4.0.x, and when it happens, we'll create *branch-4.0 *on demand. For me,

a

branch called *branch-4.0* doesn't imply ei

[jira] [Created] (HIVE-25994) Analyze table runs into ClassNotFoundException-s in case binary distribution is used

2022-03-01 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-25994:
---

 Summary: Analyze table runs into ClassNotFoundException-s in case 
binary distribution is used
 Key: HIVE-25994
 URL: https://issues.apache.org/jira/browse/HIVE-25994
 Project: Hive
  Issue Type: Bug
Reporter: Zoltan Haindrich


any nightly release can be used to reproduce this:

{code}
create table t (a integer); insert into t values (1) ; analyze table t compute 
statistics for columns;
{code}

results in
{code}
Caused by: java.lang.NoClassDefFoundError: org/antlr/runtime/tree/CommonTree
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:757)
at 
java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
at java.lang.ClassLoader.loadClass(ClassLoader.java:419)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
at java.lang.ClassLoader.loadClass(ClassLoader.java:352)
at java.lang.Class.getDeclaredConstructors0(Native Method)
at java.lang.Class.privateGetDeclaredConstructors(Class.java:2671)
at java.lang.Class.getConstructor0(Class.java:3075)
at java.lang.Class.getDeclaredConstructor(Class.java:2178)
at 
org.apache.hive.com.esotericsoftware.reflectasm.ConstructorAccess.get(ConstructorAccess.java:65)
at 
org.apache.hive.com.esotericsoftware.kryo.util.DefaultInstantiatorStrategy.newInstantiatorOf(DefaultInstantiatorStrategy.java:60)
at 
org.apache.hive.com.esotericsoftware.kryo.Kryo.newInstantiator(Kryo.java:1119)
at 
org.apache.hive.com.esotericsoftware.kryo.Kryo.newInstance(Kryo.java:1128)
at 
org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.create(FieldSerializer.java:153)
at 
org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:118)
at 
org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:729)
at 
org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:216)
at 
org.apache.hive.com.esotericsoftware.kryo.serializers.ReflectField.read(ReflectField.java:125)
... 38 more
Caused by: java.lang.ClassNotFoundException: org.antlr.runtime.tree.CommonTree
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:419)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
at java.lang.ClassLoader.loadClass(ClassLoader.java:352)
{code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HIVE-25977) Enhance Compaction Cleaner to skip when there is nothing to do #2

2022-02-23 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-25977:
---

 Summary: Enhance Compaction Cleaner to skip when there is nothing 
to do #2
 Key: HIVE-25977
 URL: https://issues.apache.org/jira/browse/HIVE-25977
 Project: Hive
  Issue Type: Bug
Reporter: Zoltan Haindrich
Assignee: Zoltan Haindrich


initially this was just an addendum to the original patch ; but got delayed and 
altered - so it should have its own ticket



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HIVE-25976) Cleaner may remove files being accessed from a fetch-task-converted reader

2022-02-23 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-25976:
---

 Summary: Cleaner may remove files being accessed from a 
fetch-task-converted reader
 Key: HIVE-25976
 URL: https://issues.apache.org/jira/browse/HIVE-25976
 Project: Hive
  Issue Type: Bug
Reporter: Zoltan Haindrich


in a nutshell the following happens:
* query is compiled in fetch-task-converted mode
* no real execution happens... but the locks are released
* the HS2 is communicating with the client and uses the fetch-task to get the 
rows - which in this case will directly read files from the table's 
directory
* client sleeps between reads - so there is ample time for other events...
* cleaner wakes up and removes some files
* in the next read the fetch-task encounters a read error...



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Re: Why the Hive CI always shows unstable status when executing tests.

2022-02-13 Thread Zoltan Haindrich

Hey Fred!

Could you provide some links to the problematic runs?
One measure of "quality" is the builds of the master branch [1]; which do 
show a rate of around 75% - so yeah, we have plenty of room for improvements in this area.

AFAIK there is some derby related issue in some cases - which is pretty weird.
Going through the last couple of builds:
I see that a simple test timeout happened in [2]
not sure if it's the derby issue; but [3] is suspicious
there was some 500 internal server error in [4],[5]; not sure about that - most likely the pod was disconnected during execution; I think this happens when GCP is upgrading 
kubernetes beneath us...

docker startup failed in [6]

reducing the above issues in any way could get us more stable master builds -> 
which will also mean more stable PR builds!

cheers,
Zoltan


[1] http://ci.hive.apache.org/job/hive-precommit/job/master/
[2] 
http://ci.hive.apache.org/job/hive-precommit/job/master/1070/testReport/junit/org.apache.hadoop.hive.ql.parse/TestParseDriver/Testing___split_18___PostProcess___testExoticSJSSubQuery/
[3] 
http://ci.hive.apache.org/job/hive-precommit/job/master/1068/testReport/org.apache.hive.streaming/TestStreamingDynamicPartitioning/Testing___split_05___PostProcess___testWriteBeforeBegin/

[4] 
http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/master/1073/pipeline
[5] 
http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/master/1067/pipeline
[6] 
http://ci.hive.apache.org/job/hive-precommit/job/master/1066/testReport/junit/org.apache.hadoop.hive.cli.split11/TestMiniLlapLocalCliDriver/Testing___split_07___PostProcess___testCliDriver_jdbc_table_with_schema_mssql_/


On 2/13/22 11:54 AM, Fred Bai wrote:

Hi every one:

Why is the Hive CI always unstable? I found that the CI error stack is not the
same every time.

Sometimes it's a 500 error, sometimes connection refused.

The CI result is always unstable; it doesn't seem to have anything to do with
my PR code.

Thanks.



Nightlies

2022-02-10 Thread Zoltan Haindrich

Hey,

I've built a preview of a nightly build; it could be tried out using the 
following:

git clone https://github.com/kgyrtkirk/hive-dev-box
cd hive-dev-box
./hdb run nightlytest
sw hive 
http://ci.hive.apache.org/job/hive-nightly/4/artifact/archive/apache-hive-4.0.0-nightly-dd23fa9147-20220210_160351-bin.tar.gz
reinit_metastore
hive_launch

the patch which fixes pom.xml issues/etc is not yet merged; but its here: 
https://github.com/apache/hive/pull/3013
job is at: http://ci.hive.apache.org/job/hive-nightly/

cheers,
Zoltan




Re: Time to Remove Hive-on-Spark

2022-02-10 Thread Zoltan Haindrich

Hey,

I think there is no real interest in this feature; we don't have users/contributors backing it - the last development was around October 2018; there were ~2 bugfix commits ever 
since then... we should stop carrying dead weight... another 2 weeks went by since Stamatis reminded us that after 1.5 years(!) nothing has changed.


+1 on removing it

cheers,
Zoltan

you may inspect some of the recent changes with:
git log -c `find . -type f -path '**/spark/**'|grep -v xml|grep -v 
properties|grep -v q.out`


On 1/28/22 2:32 PM, Stamatis Zampetakis wrote:

Hi team,

Almost one year has passed since the last exchange in this discussion and
if I am not wrong there has been no effort to revive Hive-on-Spark. To be
more precise, I don't think I have seen any Spark related JIRA for quite
some time now and although I don't want to rush into conclusions, there
does not seem to be any community member involved in maintaining or adding
new features in this part of the code.

Keeping dead code in the repository does not do any good to the project and
puts a non-negligible burden to future maintainers.

Clearly, we cannot make a new Hive release where a major feature is
completely untested so either someone commits to re-enable/fix the
respective tests soon or we move forward the work started by David and drop
support for Hive-on-Spark.

I would like to ask the community if there is anyone who can take up this
maintenance task and enable/fix Spark related tests in the next month or so?

Best,
Stamatis

On Sat, Feb 27, 2021 at 4:17 AM Edward Capriolo 
wrote:


I do not know how it works for most of the world. But in cloudera where the
TEZ options were never popular, hive-on-spark represents a solid way to get
things done for small datasets at lower latency.

As for the spark adoption. You know a while ago I came up with some ways to
make hive more spark-like. One of them was that I found a way to make "compile"
a hive keyword so folks could build UDFs on the fly. It was such an
uphill climb. Folks found a way to make it disabled by default for security.
Then later when things moved from CLI to beeline it was like the ONLY thing
that I found not ported. Like it was extremely frustrating.






On Mon, Jul 27, 2020 at 3:19 PM David  wrote:


Hello  Xuefu,

I am not part of the Cloudera Hive product team,  though I volunteer to
work on small projects from time to time.  Perhaps someone from that team
can chime in with some of their thoughts, but personally, I think that in
the long run, there will be more of a merge between Hive-on-Spark and other
Spark-native offerings.  I'm not sure what the differentiation will be
going forward.  With that said, are there any developers on this mailing
list who are willing to take on the maintenance effort of keeping HoS
moving forward?

http://www.russellspitzer.com/2017/05/19/Spark-Sql-Thriftserver/



https://docs.cloudera.com/HDPDocuments/HDP2/HDP-2.6.4/bk_spark-component-guide/content/config-sts.html



Thanks.

On Thu, Jul 23, 2020 at 12:35 PM Xuefu Zhang  wrote:


Previous reasoning seemed to suggest a lack of user adoption. Now we are
concerned about ongoing maintenance effort. Both are valid considerations.

However, I think we should have ways to find out the answers. Therefore, I
suggest the following be carried out:

1. Send out the proposal (removing Hive on Spark) to users including
u...@hive.apache.org and get their feedback.
2. Ask if any developers on this mailing list are willing to take on the
maintenance effort.

I'm concerned about user impact because I can still see issues being
reported on HoS from time to time. I'm more concerned about the future of
Hive if we narrow Hive neutrality on execution engines, which will possibly
force more Hive users to migrate to other alternatives such as Spark SQL,
which is already eroding Hive's user base.

Being open and neutral used to be Hive's most admired strengths.

Thanks,
Xuefu


On Wed, Jul 22, 2020 at 8:46 AM Alan Gates 

wrote:



An important point here is I don't believe David is proposing to remove
Hive on Spark from the 2 or 3 lines, but only from trunk.  Continuing to
support it in existing 2 and 3 lines makes sense, but since no one has
maintained it on trunk for some time and it does not work with many of the
newer features it should be removed from trunk.

Alan.

On Tue, Jul 21, 2020 at 4:10 PM Chao Sun  wrote:


Thanks David. FWIW Uber is still running Hive on Spark (2.3.4) on a very
large scale in production right now and I don't think we have any plan to
change it soon.



On Tue, Jul 21, 2020 at 11:28 AM David  wrote:


Hello,

Thanks for the feedback.

Just a quick recap: I did propose this @dev and I received unanimous +1's
from the community.  After a couple months, I created the PR.

Certainly open to discussion, but there hasn't been any discussion thus far
because there have been no objections until this point.

HoS has low adoption, heavy technical debt, and the manner i

[jira] [Created] (HIVE-25944) Format pom.xml-s

2022-02-09 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-25944:
---

 Summary: Format pom.xml-s
 Key: HIVE-25944
 URL: https://issues.apache.org/jira/browse/HIVE-25944
 Project: Hive
  Issue Type: Improvement
Reporter: Zoltan Haindrich
Assignee: Zoltan Haindrich


the moment I touch pom.xml-s with xmlstarlet it starts fixing the indentation, 
which makes seeing the real diffs harder.

fix and enforce that the pom.xml-s are indented correctly



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Re: Start releasing the master branch

2022-02-09 Thread Zoltan Haindrich
kely
discover.

The only real blocker that we may want to treat is HIVE-25665 [1] but we
can continue the discussion under that ticket and re-evaluate if

necessary,


Best,
Stamatis

[1] https://issues.apache.org/jira/browse/HIVE-25665


On Tue, Feb 1, 2022 at 5:03 PM Zoltan Haindrich  wrote:


Hey All,

We haven't made a release for a long time now (3.1.2 was released on 26
August 2019) - and I think because we didn't make that many branch-3
releases, not too many fixes were ported there - which made that release
branch kinda erode away.

We have a lot of new features/changes in the current master.
I think instead of aiming for big feature-packed releases we should aim
for making a regular release every few months - we should make regular
releases which people can install and use.
After all, releasing Hive after more than 2 years would be a big step
forward in itself alone - we have so many improvements that I can't even
count...

But I may not know every aspect of the project / the state of some internal
features - so I would like to ask you:
What would be the bare minimum requirements before we could release the
current master as Hive X?

There are many nice-to-have-s like:
* hadoop upgrade
* jdk11
* remove HoS or MR
* ?
but I don't think these are blockers...we can make any of these in the
next release if we start making them...

cheers,
Zoltan









Start releasing the master branch

2022-02-01 Thread Zoltan Haindrich

Hey All,

We haven't made a release for a long time now (3.1.2 was released on 26 August 2019) - and I think because we didn't make that many branch-3 releases, not too many fixes 
were ported there - which made that release branch kinda erode away.


We have a lot of new features/changes in the current master.
I think instead of aiming for big feature-packed releases we should aim for making a regular release every few months - we should make regular releases which people can 
install and use.

After all, releasing Hive after more than 2 years would be a big step forward in 
itself alone - we have so many improvements that I can't even count...

But I may not know every aspect of the project / the state of some internal 
features - so I would like to ask you:
What would be the bare minimum requirements before we could release the current 
master as Hive X?

There are many nice-to-have-s like:
* hadoop upgrade
* jdk11
* remove HoS or MR
* ?
but I don't think these are blockers...we can make any of these in the next 
release if we start making them...

cheers,
Zoltan


[jira] [Created] (HIVE-25883) Enhance Compaction Cleaner to skip when there is nothing to do

2022-01-20 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-25883:
---

 Summary: Enhance Compaction Cleaner to skip when there is nothing 
to do
 Key: HIVE-25883
 URL: https://issues.apache.org/jira/browse/HIVE-25883
 Project: Hive
  Issue Type: Bug
Reporter: Zoltan Haindrich
Assignee: Zoltan Haindrich


the cleaner works the following way:
* it identifies obsolete directories (delta dirs which don't have open txns)
* removes them and is done

if there are no obsolete directories, that is attributed to the possibility that there are still 
open txns, so the request should be retried later.

however, if for some reason the directory was already cleaned - it similarly 
has no obsolete directories; and thus the request is retried forever 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[ANNOUNCE] New committer: Zhihua Deng

2022-01-19 Thread Zoltan Haindrich

Hey all,

Apache Hive's Project Management Committee (PMC) has invited Zhihua Deng
to become a committer, and we are pleased to announce that he has accepted!

Zhihua welcome, thank you for your contributions, and we look forward to your
further interactions with the community!

Zoltan Haindrich (on behalf of the Apache Hive PMC)


[jira] [Created] (HIVE-25874) Slow filter evaluation of nested struct fields in vectorized execution

2022-01-18 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-25874:
---

 Summary: Slow filter evaluation of nested struct fields in 
vectorized execution
 Key: HIVE-25874
 URL: https://issues.apache.org/jira/browse/HIVE-25874
 Project: Hive
  Issue Type: Improvement
Reporter: Zoltan Haindrich


{code:java}

create table t as
select
named_struct('id',13,'str','string','nest',named_struct('id',12,'str','string','arr',array('value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value','value')))
s;

-- go up to 1M rows
insert into table t select * from t union all select * from t union all select 
* from t union all select * from t union all select * from t union all select * 
from t union all select * from t union all select * from t union all select * 
from t;
insert into table t select * from t union all select * from t union all select 
* from t union all select * from t union all select * from t union all select * 
from t union all select * from t union all select * from t union all select * 
from t;
insert into table t select * from t union all select * from t union all select 
* from t union all select * from t union all select * from t union all select * 
from t union all select * from t union all select * from t union all select * 
from t;
insert into table t select * from t union all select * from t union all select 
* from t union all select * from t union all select * from t union all select * 
from t union all select * from t union all select * from t union all select * 
from t;
insert into table t select * from t union all select * from t union all select 
* from t union all select * from t union all select * from t union all select * 
from t union all select * from t union all select * from t union all select * 
from t;
-- insert into table t select * from t union all select * from t union all 
select * from t union all select * from t union all select * from t union all 
select * from t union all select * from t union all select * from t union all 
select * from t;


set hive.fetch.task.conversion=none;

select count(1) from t;
--explain
select s
.id from t
where 
s
.nest
.id  > 0;

 {code}


interestingly; the issue is not present:
* for a query not looking into the nested struct
* and in case the struct with the array is at the top level

{code}
select count(1) from t;
--explain
select s
.id from t
where 
s
-- .nest
.id  > 0;
{code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HIVE-25844) Exception deserialization error-s may cause beeline to terminate immediately

2022-01-04 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-25844:
---

 Summary: Exception deserialization error-s may cause beeline to 
terminate immediately
 Key: HIVE-25844
 URL: https://issues.apache.org/jira/browse/HIVE-25844
 Project: Hive
  Issue Type: Bug
  Components: Beeline
Affects Versions: 3.1.2
Reporter: Zoltan Haindrich
Assignee: Zoltan Haindrich


the exception on the server side happens:
 * fetch task conversion is on
 * there is an exception during reading the table; the error bubbles up
 * => it transmits a message to beeline saying the error class name is: 
"org.apache.phoenix.schema.ColumnNotFoundException" + the message
 * beeline tries to reconstruct the exception around HiveSQLException
 * but during the constructor call 
org.apache.phoenix.exception.SQLExceptionCode is needed, which fails to load 
org/apache/hadoop/hbase/shaded/com/google/protobuf/Service
 * a
java.lang.NoClassDefFoundError: 
org/apache/hadoop/hbase/shaded/com/google/protobuf/Service is thrown - which is 
not handled in that method - so it becomes a real error; and shuts down the 
client

{code:java}
java.lang.NoClassDefFoundError: 
org/apache/hadoop/hbase/shaded/com/google/protobuf/Service
[...]
at java.lang.Class.forName(Class.java:264)
at 
org.apache.hive.service.cli.HiveSQLException.newInstance(HiveSQLException.java:245)
at 
org.apache.hive.service.cli.HiveSQLException.toStackTrace(HiveSQLException.java:211)
[...]
Caused by: java.lang.ClassNotFoundException: 
org.apache.hadoop.hbase.shaded.com.google.protobuf.Service
[...]
{code}
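
For context, a minimal, self-contained illustration of the failure mode 
(hypothetical class names; this is not the HiveSQLException code): the client 
rebuilds the server-reported exception via Class.forName, and while a 
ClassNotFoundException is an Exception and can be handled, a NoClassDefFoundError 
raised while linking the class is an Error, so it escapes a catch (Exception e) 
block and terminates the client.

{code:java}
// Toy sketch of rebuilding a server-reported exception by class name.
public class RebuildExceptionSketch {
  static Throwable rebuild(String className, String message) {
    try {
      Class<?> c = Class.forName(className);
      return (Throwable) c.getConstructor(String.class).newInstance(message);
    } catch (Exception e) {
      // ClassNotFoundException and reflection failures land here and can be
      // turned into a generic fallback exception...
      return new RuntimeException(className + ": " + message);
    }
    // ...but a NoClassDefFoundError thrown while the class is being linked is
    // an Error, not an Exception, so it would fly past the catch block above.
  }

  public static void main(String[] args) {
    System.out.println(rebuild("org.example.DoesNotExist", "boom"));
  }
}
{code}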



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HIVE-25823) Incorrect false positive results for outer join using non-satisfiable residual filters

2021-12-20 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-25823:
---

 Summary: Incorrect false positive results for outer join using 
non-satisfiable residual filters
 Key: HIVE-25823
 URL: https://issues.apache.org/jira/browse/HIVE-25823
 Project: Hive
  Issue Type: Bug
Reporter: Zoltan Haindrich


similar to HIVE-25822 
{code}
create table t_y (id integer,s string);
create table t_xy (id integer,s string);

insert into t_y values(0,'a'),(1,'y'),(1,'x');
insert into t_xy values(1,'x'),(1,'y');
select * from t_xy l full outer join t_y r on (l.id=r.id and l.s='y' and 
l.id+2*r.id=1);
{code}

the rows full of NULLs are incorrect
{code}
+---+---+---+---+
| l.id  |  l.s  | r.id  |  r.s  |
+---+---+---+---+
| NULL  | NULL  | 0 | a |
| NULL  | NULL  | NULL  | NULL  |
| 1 | y | NULL  | NULL  |
| NULL  | NULL  | NULL  | NULL  |
| NULL  | NULL  | 1 | y |
| NULL  | NULL  | 1 | x |
+---+---+---+---+
{code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HIVE-25822) Unexpected result rows in case an outer join contains conditions only affecting one side

2021-12-18 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-25822:
---

 Summary: Unexpected result rows in case an outer join contains 
conditions only affecting one side
 Key: HIVE-25822
 URL: https://issues.apache.org/jira/browse/HIVE-25822
 Project: Hive
  Issue Type: Improvement
Reporter: Zoltan Haindrich
Assignee: Zoltan Haindrich


needed:
* outer join
* the on condition has at least one condition for only one side of the join
* in a single reducer:
** a right-hand-side-only row outputted right before
** >=2 rows on the LHS and 1 on the RHS matching in the join keys, but the first LHS 
row doesn't satisfy the filter condition
** the second LHS row satisfies the filter condition

{code}
with
t_y as (select col1 as id,col2 as s from (VALUES(0,'a'),(1,'y')) as c),
t_xy as (select col1 as id,col2 as s from (VALUES(1,'x'),(1,'y')) as c) 
select * from t_xy l full outer join t_y r on (l.id=r.id and l.s='y');
{code}

null,null,1,y is an unexpected result
{code}
+---+---+---+---+
| l.id  |  l.s  | r.id  |  r.s  |
+---+---+---+---+
| NULL  | NULL  | 0 | a |
| 1 | x | NULL  | NULL  |
| NULL  | NULL  | 1 | y |
| 1 | y | 1 | y |
+---+---+---+---+
{code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HIVE-25820) Provide a way to disable join filters

2021-12-17 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-25820:
---

 Summary: Provide a way to disable join filters
 Key: HIVE-25820
 URL: https://issues.apache.org/jira/browse/HIVE-25820
 Project: Hive
  Issue Type: Improvement
Reporter: Zoltan Haindrich
Assignee: Zoltan Haindrich






--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Recent log4j vulnerabilities

2021-12-15 Thread Zoltan Haindrich

Hello all!

In the past week there were 2 new log4j vulnerabilities discovered (CVE-2021-45046, CVE-2021-44228) - and since we use log4j in Hive, existing installations might be 
affected as well.


Doing a new Hive release on any existing line would probably need a longer timeframe - and doing an upgrade would probably cause further problems for existing installations; 
for now I'll try to give some help with patching existing clusters.


My understanding is that both CVE can be fixed by following one of these 
options:
* remove the JndiLookup.class from the affected jars
* replace the jar with the 2.16.0 version

To identify the affected jars; you could run this script - which will ignore 
2.16.0 if there is any:

pat=org/apache/logging/log4j/core/lookup/JndiLookup.class mc=org/apache/logging/log4j/core/pattern/MessagePatternConverter.class && find . -name '*.jar' |
  xargs -n1 -IJAR unzip -t JAR |
  fgrep -f <(echo "$pat";echo 'Archive:') | grep -B1 "$pat" | grep '^Archive:' | cut -d '/' -f2- |
  xargs -n1 -IJAR bash -c 'unzip -p JAR $mc|md5sum|paste - <(echo JAR)' |
  fgrep -vf <(echo 374fa1c796465d8f542bb85243240555 )


You could remove the JndiLookup.class from the identified jars with something 
similar to this:
zip -q -d log4j-core-*.jar org/apache/logging/log4j/core/lookup/JndiLookup.class

To validate if you are still affected or not:
* generate a token on https://canarytokens.org/
* try with queries like (replace your token):
set hive.fetch.task.conversion=none;
create table aa (a string) location 
'file:///dfs${jndi:ldap:canarytokens.com/a}';
select '${jndi:ldap://canarytokens.com/a}';

cheers,
Zoltan


[jira] [Created] (HIVE-25792) Multi Insert query fails on CBO path

2021-12-09 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-25792:
---

 Summary: Multi Insert query fails on CBO path 
 Key: HIVE-25792
 URL: https://issues.apache.org/jira/browse/HIVE-25792
 Project: Hive
  Issue Type: Bug
Reporter: Zoltan Haindrich


{code}
set hive.cbo.enable=true;

drop table if exists aa1;
drop table if exists bb1;
drop table if exists cc1;
drop table if exists dd1;
drop table if exists ee1;
drop table if exists ff1;

create table aa1 ( stf_id string);
create table bb1 ( stf_id string);
create table cc1 ( stf_id string);
create table ff1 ( x string);

explain
from ff1 as a join cc1 as b 
insert overwrite table aa1 select   stf_id GROUP BY b.stf_id
insert overwrite table bb1 select b.stf_id GROUP BY b.stf_id
;

{code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HIVE-25791) Improve SFS exception messages

2021-12-09 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-25791:
---

 Summary: Improve SFS exception messages
 Key: HIVE-25791
 URL: https://issues.apache.org/jira/browse/HIVE-25791
 Project: Hive
  Issue Type: Bug
Reporter: Zoltan Haindrich
Assignee: Zoltan Haindrich


Especially for cases when the path is already known to be invalid; like: 
`sfs+file:///nonexistent/nonexistent.txt/#SINGLEFILE#`



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HIVE-25780) DistinctExpansion creates more than 64 grouping sets II

2021-12-06 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-25780:
---

 Summary: DistinctExpansion creates more than 64 grouping sets II
 Key: HIVE-25780
 URL: https://issues.apache.org/jira/browse/HIVE-25780
 Project: Hive
  Issue Type: Bug
Reporter: Zoltan Haindrich
Assignee: Zoltan Haindrich


HIVE-25498 fixed this for queries with only count(distinct x).

however, after the rewrite happens, grouping sets are used to handle group by 
columns as well



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HIVE-25770) AST is corrupted after CBO fallback for CTAS queries

2021-12-03 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-25770:
---

 Summary: AST is corrupted after CBO fallback for CTAS queries
 Key: HIVE-25770
 URL: https://issues.apache.org/jira/browse/HIVE-25770
 Project: Hive
  Issue Type: Bug
Reporter: Zoltan Haindrich
 Attachments: repro.q

reproduce:
* revert ec44c6081c88b81245185fa6a552d8c3631e47fa to force cbo fallbacks for 
>64 grouping sets
* use repro.q test

* the query would run with cbo turned off
* but with cbo enabled it would fail in conservative mode as well



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HIVE-25752) Fix incremental compilation of parser module

2021-11-30 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-25752:
---

 Summary: Fix incremental compilation of parser module
 Key: HIVE-25752
 URL: https://issues.apache.org/jira/browse/HIVE-25752
 Project: Hive
  Issue Type: Bug
Reporter: Zoltan Haindrich
Assignee: Zoltan Haindrich


this issue doesn't happen all the time - but when it does it's really annoying

the problem is that the antlr files are not regenerated; however the 
"HiveParser.java Fix" is run regardless... which corrupts the java files after a 
second run and causes compilation errors
{code}
[INFO] --- antlr3-maven-plugin:3.5.2:antlr (default) @ hive-parser ---
[INFO] ANTLR: Processing source directory /home/dev/hive/parser/src/java
ANTLR Parser Generator  Version 3.5.2
Grammar 
/home/dev/hive/parser/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g is 
up to date - build skipped
Grammar 
/home/dev/hive/parser/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g is 
up to date - build skipped
Grammar 
/home/dev/hive/parser/src/java/org/apache/hadoop/hive/ql/parse/HiveLexerStandard.g
 is up to date - build skipped
Grammar 
/home/dev/hive/parser/src/java/org/apache/hadoop/hive/ql/parse/HintParser.g is 
up to date - build skipped
[INFO] 
[INFO] --- exec-maven-plugin:3.0.0:exec (HiveParser.java fix) @ hive-parser ---
[INFO] 
{code}

errors like:
{code}
[ERROR] 
/home/dev/hive/parser/target/generated-sources/antlr3/org/apache/hadoop/hive/ql/parse/HiveParser.java:[50,16]
 class, interface, or enum expected
{code}

but I've also seen
{code}
[ERROR] 
/home/dev/hive/parser/target/generated-sources/antlr3/org/apache/hadoop/hive/ql/parse/HiveParser.java:[49,32]
 cannot find symbol
[ERROR]   symbol:   class statement_return
[ERROR]   location: class org.apache.hadoop.hive.ql.parse.HiveParser
[ERROR] 
/home/dev/hive/parser/target/generated-sources/antlr3/org/apache/hadoop/hive/ql/parse/HiveParserTokens.java:[13,19]
 cannot find symbol
{code}




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HIVE-25748) Investigate Union comparison

2021-11-29 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-25748:
---

 Summary: Investigate Union comparison
 Key: HIVE-25748
 URL: https://issues.apache.org/jira/browse/HIVE-25748
 Project: Hive
  Issue Type: Improvement
Reporter: Zoltan Haindrich


both of the following cases change the "non-used" part of the union (note: 
`create_union(idx,o0,o1)` creates a union which uses the `idx`-th object)

{code}
SELECT (NULLIF(create_union(0,1,2),create_union(0,1,3)) is not null);
false
SELECT (NULLIF(create_union(0,1,2),create_union(1,2,1)) is not null);
true
{code}




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HIVE-25738) NullIf doesn't support complex types

2021-11-24 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-25738:
---

 Summary: NullIf doesn't support complex types
 Key: HIVE-25738
 URL: https://issues.apache.org/jira/browse/HIVE-25738
 Project: Hive
  Issue Type: Improvement
Reporter: Zoltan Haindrich


{code}
SELECT NULLIF(array(1,2,3),array(1,2,3))
{code}

results in:
{code}
 java.lang.ClassCastException: 
org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector 
cannot be cast to 
org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector
at 
org.apache.hadoop.hive.ql.udf.generic.GenericUDFNullif.evaluate(GenericUDFNullif.java:96)
at 
org.apache.hadoop.hive.ql.udf.generic.GenericUDF.initializeAndFoldConstants(GenericUDF.java:177)
at 
org.apache.hadoop.hive.ql.parse.type.HiveFunctionHelper.getReturnType(HiveFunctionHelper.java:135)
at 
org.apache.hadoop.hive.ql.parse.type.RexNodeExprFactory.createFuncCallExpr(RexNodeExprFactory.java:647)
[...]
{code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HIVE-25735) Improve stat estimator in UDFWhen/UDFCase

2021-11-24 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-25735:
---

 Summary: Improve stat estimator in UDFWhen/UDFCase
 Key: HIVE-25735
 URL: https://issues.apache.org/jira/browse/HIVE-25735
 Project: Hive
  Issue Type: Improvement
Reporter: Zoltan Haindrich
Assignee: Zoltan Haindrich






--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HIVE-25732) Improve HLL insert performance

2021-11-23 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-25732:
---

 Summary: Improve HLL insert performance
 Key: HIVE-25732
 URL: https://issues.apache.org/jira/browse/HIVE-25732
 Project: Hive
  Issue Type: Improvement
Reporter: Zoltan Haindrich


HIVE-23095 fixed a correctness issue and removed a temporary list which was 
supposed to speed up the algorithm, and thus it suffered some performance 
degradation.

There are ways to put back some of that stuff; or consider other options to 
gain back the lost performance - now that the bug is fixed it should be a 
performance-only improvement ticket.

It would be interesting to know how much time we spend on updating this DS 
during a large insert, to know the weight of such an improvement.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HIVE-25725) Upgrade used docker-in-docker container version

2021-11-19 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-25725:
---

 Summary: Upgrade used docker-in-docker container version
 Key: HIVE-25725
 URL: https://issues.apache.org/jira/browse/HIVE-25725
 Project: Hive
  Issue Type: Improvement
  Components: Testing Infrastructure
Reporter: Zoltan Haindrich
Assignee: Zoltan Haindrich


in HIVE-25714 I came to the conclusion that there might be something wrong with 
dind - upgrading it would be the first step... and while doing so, the storage 
driver should be checked to see if it's appropriate/etc



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Re: hive-exec vs. hive-exec:core

2021-11-17 Thread Zoltan Haindrich




On 11/17/21 7:46 PM, Chao Sun wrote:

We have a working hive-exec jar


I'm not sure about this. The issue comes when the fat hive-exec jar shades
some jars but doesn't relocate them. In this case there is no way for the
downstream projects to resolve the conflict.


Exactly - I think those should be hammered out for good; fix the 
shading/relocation!



On the Spark side IIUC we had issues with Apache Commons as well as ORC
(see HIVE-25317 for an effort on this), and there could be more. Spark is
using Hive 2.3 though but the same applies for master/4.0 if dependency
versions differ between Hive and the downstream projects.


This change is only about master - it won't change Hive 2.3. HIVE-25317 was for 
branch-2 as well.
I've seen weird stuff in a few places because they were not able to use the 
hive-exec jar as-is.
Folks in the Impala project for example went in the direction of re-shading/re-filtering the hive-exec jar and relocating some stuff in it - most likely because it conflicted with 
their stuff.

https://github.com/apache/impala/blob/master/java/shaded-deps/hive-exec/pom.xml
Taking a quick look at https://github.com/apache/spark/pull/33989/files it seems like you've also done something similar... but instead of using the base artifact, you have 
created a new shader.

I don't think this is better than having an artifact which simply works 
out-of-the-box.


cheers,
Zoltan



On Wed, Nov 17, 2021 at 10:35 AM Zoltan Haindrich  wrote:


On 11/17/21 7:07 PM, Daniel Fritsi wrote:

For Oozie we've decided to use fat Jar downstream (Cloudera) as there we

have processes to ensure 3rd-party library versions are kept in sync.


Since we don't have such a process in Apache, there we'll continue to

use the core Jar.

It might be possible to evade some problems by using a 3rd party lib
syncer - but if we've done a good job shading this stuff, it should not
cause any trouble even in case other 3rd party stuff is present... but in
any case, to check things out you will need a Hive release in some form

cheers,
Zoltan



Dan

On 2021. 11. 17. 18:50, Chao Sun wrote:

the idea is to fix the issues they bump into - because people who load
the jdbc driver may also see those issues.

I don’t get what you mean here, could you elaborate a bit more?

IMO it's a bit premature to do this without a working hive-exec jar for
downstream projects like Spark/Trino/Presto. At the current state there is
no way to upgrade these projects to use the fat hive-exec jar.



On Wed, Nov 17, 2021 at 5:47 AM Zoltan Haindrich  wrote:


Hey all,

I wanted to get back to this - but had other things going on.

Chao> it is still being used today by some other popular projects
the idea is to fix the issues they bump into - because people who load
the jdbc driver may also see those issues.

Edward> [...] You all must like enjoy shading jars.
I totally agree that they may use a shell action as well.
I wonder how do you propose to solve issues related to clients using a
different version of the guava library?

The changes which will remove the core artifact stuff is ready:
https://github.com/apache/hive/pull/2648

cheers,
Zoltan

On 9/21/21 8:23 PM, Edward Capriolo wrote:

recommendation from the Hive team is to use the hive-exec.jar artifact.

You know about 10 years ago. I mentioned that oozie should just use
hive-service or hive jdbc. After a big fight where folks kept bringing up
concurrency bugs in hive-server-1 my prs were rejected (even though hive
server2 would not have these bugs). I still cannot fathom why someone using
oozie would want a fat jar of hive (as opposed to hive server or hivejdbc).
If I had to do that, i would just use shell action. You all must like
enjoy shading jars.

Edward

On Thu, Sep 16, 2021 at 2:30 PM Chao Sun  wrote:


I'm not sure whether it is a good idea to remove `hive-exec-core`
completely - it is still being used today by some other popular projects
including Spark and Trino/Presto. By sticking to `hive-exec-core` it gives
more flexibility to the other projects to shade & relocate those classes
according to their need, without waiting for new Hive releases. Hive also
needs to make sure it relocate everything properly. Otherwise, if some
classes are shaded & included in `hive-exec` but not relocated, there is no
way for the other projects to exclude them and avoid potential conflicts.

Chao

On Thu, Sep 16, 2021 at 8:03 AM Zoltan Haindrich

wrote:



Hey

On 9/6/21 12:48 PM, Stamatis Zampetakis wrote:

Indeed this may lead to binary incompatibility problems as the one you
mentioned. If I understood correctly the problem you cite comes up if
library B in this case is not relocated. If Hive systematically relocates
shaded deps do you think there will still be binary incompatibility issues?

If the relocating solution works, I would personally prefer going down this
path instead of introducing 

Re: hive-exec vs. hive-exec:core

2021-11-17 Thread Zoltan Haindrich

On 11/17/21 7:07 PM, Daniel Fritsi wrote:

For Oozie we've decided to use fat Jar downstream (Cloudera) as there we have 
processes to ensure 3rd-party library versions are kept in sync.

Since we don't have such a process in Apache, there we'll continue to use the 
core Jar.


It might be possible to evade some problems by using a 3rd party lib syncer - but if we've done a good job shading this stuff, it should not cause any trouble even in case 
other 3rd party stuff is present... but in any case, to check things out you will need a Hive release in some form


cheers,
Zoltan



Dan

On 2021. 11. 17. 18:50, Chao Sun wrote:

the idea is to fix the issues they bump into - because people who load

the jdbc driver may also see those issues.

I don’t get what you mean here, could you elaborate a bit more?

IMO it's a bit premature to do this without a working hive-exec jar for
downstream projects like Spark/Trino/Presto. At the current state there is
no way to upgrade these projects to use the fat hive-exec jar.



On Wed, Nov 17, 2021 at 5:47 AM Zoltan Haindrich  wrote:


Hey all,

I wanted to get back to this - but had other things going on.

Chao> it is still being used today by some other popular projects
the idea is to fix the issues they bump into - because people who load the
jdbc driver may also see those issues.

Edward> [...] You all must like enjoy shading jars.
I totally agree that they may use a shell action as well.
I wonder how do you propose to solve issues related to clients using a
different version of the guava library?

The changes which will remove the core artifact stuff is ready:
https://github.com/apache/hive/pull/2648

cheers,
Zoltan

On 9/21/21 8:23 PM, Edward Capriolo wrote:

recommendation from the Hive team is to use the hive-exec.jar artifact.

You know about 10 years ago. I mentioned that oozie should just use
hive-service or hive jdbc. After a big fight where folks kept bringing up
concurrency bugs in hive-server-1 my prs were rejected (even though hive
server2 would not have these bugs). I still cannot fathom why someone using
oozie would want a fat jar of hive (as opposed to hive server or hivejdbc).
If I had to do that, i would just use shell action. You all must like
enjoy shading jars.

Edward

On Thu, Sep 16, 2021 at 2:30 PM Chao Sun  wrote:


I'm not sure whether it is a good idea to remove `hive-exec-core`
completely - it is still being used today by some other popular projects
including Spark and Trino/Presto. By sticking to `hive-exec-core` it gives
more flexibility to the other projects to shade & relocate those classes
according to their need, without waiting for new Hive releases. Hive also
needs to make sure it relocate everything properly. Otherwise, if some
classes are shaded & included in `hive-exec` but not relocated, there is no
way for the other projects to exclude them and avoid potential conflicts.

Chao

On Thu, Sep 16, 2021 at 8:03 AM Zoltan Haindrich  wrote:


Hey

On 9/6/21 12:48 PM, Stamatis Zampetakis wrote:

Indeed this may lead to binary incompatibility problems as the one you
mentioned. If I understood correctly the problem you cite comes up if
library B in this case is not relocated. If Hive systematically relocates
shaded deps do you think there will still be binary incompatibility issues?

If the relocating solution works, I would personally prefer going down this
path instead of introducing an entirely new module just for the sake of
dependency management. Most of the time when there are problems with
shading the answer comes from relocating the problematic dependencies and
people are more or less accustomed with this route.

I totally agree with you Stamatis - with the addition that we should work
together with the owners of other projects to help them use the correct
artifact to gain access to Hive's internal parts.
I've opened HIVE-25531 to remove the core classified artifact - and ensure
that we will be uncovering and fixing future issues with the hive-exec
artifact.

cheers,
Zoltan



Best,
Stamatis

On Mon, Aug 30, 2021 at 9:49 PM Daniel Fritsi



wrote:


Dear Hive developers,

I am Dan from the Oozie team and I would like to bring up the
hive-exec.jar vs. hive-exec-core.jar topic.
The reason for that is because as far as we understand the official
recommendation from the Hive team is to use the hive-exec.jar artifact.

However in Oozie that can end-up in a binary incompatibility.

The reason for that is:

 * Let's say library A is included in the fat Jar.

 * And library B which is using library A is also included in the fat Jar.

 * Let's also say that library A's com.library.alib package is
   relocated to org.apache.hive.com.library.alib,
   meaning the com.library.alib.SomeClass becomes
   org.apache.hive.com.library.alib.SomeClass

 * So if B has a method like public void
   someMethod(com.libra

Re: hive-exec vs. hive-exec:core

2021-11-17 Thread Zoltan Haindrich




On 11/17/21 6:50 PM, Chao Sun wrote:

the idea is to fix the issues they bump into - because people who load

the jdbc driver may also see those issues.

I don’t get what you mean here, could you elaborate a bit more?


I suggest working with the downstream projects' people and hammering out issues - if 
there are any.
I'll be here and open to help with that.


IMO it's a bit premature to do this without a working hive-exec jar for
downstream projects like Spark/Trino/Presto. At the current state there is
no way to upgrade these projects to use the fat hive-exec jar.


We have a working hive-exec jar - most of the problems were caused by:
* the incorrectly shaded guava lib we had in hive-exec with invalid relocation 
instructions
* the similarly incorrectly shaded jackson 1.x
these issues are fixed on master - but since it was never released, downstream 
projects have not yet been able to migrate to it.

I don't think we should keep something which could easily cause problems during 
usage - so we should remove the core artifact for good.

cheers,
Zoltan





On Wed, Nov 17, 2021 at 5:47 AM Zoltan Haindrich  wrote:


Hey all,

I wanted to get back to this - but had other things going on.

Chao> it is still being used today by some other popular projects
the idea is to fix the issues they bump into - because people who load the
jdbc driver may also see those issues.

Edward> [...] You all must like enjoy shading jars.
I totally agree that they may use a shell action as well.
I wonder how do you propose to solve issues related to clients using a
different version of the guava library?

The changes which will remove the core artifact stuff is ready:
https://github.com/apache/hive/pull/2648

cheers,
Zoltan

On 9/21/21 8:23 PM, Edward Capriolo wrote:

recommendation from the Hive team is to use the hive-exec.jar artifact.

You know about 10 years ago. I mentioned that oozie should just use
hive-service or hive jdbc. After a big fight where folks kept bringing up
concurrency bugs in hive-server-1 my prs were rejected (even though hive
server2 would not have these bugs). I still cannot fathom why someone using
oozie would want a fat jar of hive (as opposed to hive server or hivejdbc).
If I had to do that, i would just use shell action. You all must like
enjoy shading jars.

Edward

On Thu, Sep 16, 2021 at 2:30 PM Chao Sun  wrote:


I'm not sure whether it is a good idea to remove `hive-exec-core`
completely - it is still being used today by some other popular projects
including Spark and Trino/Presto. By sticking to `hive-exec-core` it gives
more flexibility to the other projects to shade & relocate those classes
according to their need, without waiting for new Hive releases. Hive also
needs to make sure it relocate everything properly. Otherwise, if some
classes are shaded & included in `hive-exec` but not relocated, there is no
way for the other projects to exclude them and avoid potential conflicts.


Chao

On Thu, Sep 16, 2021 at 8:03 AM Zoltan Haindrich  wrote:


Hey

On 9/6/21 12:48 PM, Stamatis Zampetakis wrote:

Indeed this may lead to binary incompatibility problems as the one you
mentioned. If I understood correctly the problem you cite comes up if
library B in this case is not relocated. If Hive systematically relocates
shaded deps do you think there will still be binary incompatibility issues?

If the relocating solution works, I would personally prefer going down this
path instead of introducing an entirely new module just for the sake of
dependency management. Most of the time when there are problems with
shading the answer comes from relocating the problematic dependencies and
people are more or less accustomed with this route.


I totally agree with you Stamatis - with the addition that we should work
together with the owners of other projects to help them use the correct
artifact to gain access to Hive's internal parts.
I've opened HIVE-25531 to remove the core classified artifact - and ensure
that we will be uncovering and fixing future issues with the hive-exec
artifact.

cheers,
Zoltan




Best,
Stamatis

On Mon, Aug 30, 2021 at 9:49 PM Daniel Fritsi



wrote:


Dear Hive developers,

I am Dan from the Oozie team and I would like to bring up the
hive-exec.jar vs. hive-exec-core.jar topic.
The reason for that is because as far as we understand the official
recommendation from the Hive team is to use the hive-exec.jar artifact.

However in Oozie that can end-up in a binary incompatibility.

The reason for that is:

 * Let's say library A is included in the fat Jar.

 * And library B which is using library A is also included in the fat Jar.


 * Let's also say that library A's com.library.alib package is
   relocated to org.apache.hive.com.library.alib,
   meaning the com.library.alib.SomeClass becomes
   org.apache.hive.com.library.alib.SomeClass

 * So if B has a method like pub

[jira] [Created] (HIVE-25720) Fix flaky test TestScheduledReplicationScenarios

2021-11-17 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-25720:
---

 Summary: Fix flaky test TestScheduledReplicationScenarios
 Key: HIVE-25720
 URL: https://issues.apache.org/jira/browse/HIVE-25720
 Project: Hive
  Issue Type: Improvement
Reporter: Zoltan Haindrich


failed at the first attempt; the issue happened during
{code}
drop scheduled query repl_load_p2
{code}
which is in a finally block; so this exception may be shadowing another 
exception

http://ci.hive.apache.org/job/hive-flaky-check/463/





--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HIVE-25719) Fix flaky test TestMiniLlapLocalCliDri​ver#testCliDriver[replication_​metrics_ingest]

2021-11-17 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-25719:
---

 Summary: Fix flaky test 
TestMiniLlapLocalCliDri​ver#testCliDriver[replication_​metrics_ingest]
 Key: HIVE-25719
 URL: https://issues.apache.org/jira/browse/HIVE-25719
 Project: Hive
  Issue Type: Improvement
Reporter: Zoltan Haindrich


flaky checker failed after 3 attempts with a q.out difference

there seems to be some ID difference - maybe 2 events happened in a different 
order?

http://ci.hive.apache.org/job/hive-flaky-check/465/testReport/junit/org.apache.hadoop.hive.cli/TestMiniLlapLocalCliDriver/testCliDriver_replication_metrics_ingest_/



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Re: hive-exec vs. hive-exec:core

2021-11-17 Thread Zoltan Haindrich

Hey all,

I wanted to get back to this - but had other things going on.

Chao> it is still being used today by some other popular projects
the idea is to fix the issues they bump into - because people who load the jdbc 
driver may also see those issues.

Edward> [...] You all must like enjoy shading jars.
I totally agree that they may use a shell action as well.
I wonder how do you propose to solve issues related to clients using a 
different version of the guava library?

The changes which will remove the core artifact stuff is ready: 
https://github.com/apache/hive/pull/2648

cheers,
Zoltan

On 9/21/21 8:23 PM, Edward Capriolo wrote:

recommendation from the Hive team is to use the hive-exec.jar artifact.

You know about 10 years ago. I mentioned that oozie should just use
hive-service or hive jdbc. After a big fight where folks kept bringing up
concurrency bugs in hive-server-1 my prs were rejected (even though hive
server2 would not have these bugs). I still cannot fathom why someone using
oozie would want a fat jar of hive (as opposed to hive server or hivejdbc)
. If I had to do that, i would just use shell action. You all must like
enjoy shading jars.

Edward

On Thu, Sep 16, 2021 at 2:30 PM Chao Sun  wrote:


I'm not sure whether it is a good idea to remove `hive-exec-core`
completely - it is still being used today by some other popular projects
including Spark and Trino/Presto. By sticking to `hive-exec-core` it gives
more flexibility to the other projects to shade & relocate those classes
according to their need, without waiting for new Hive releases. Hive also
needs to make sure it relocate everything properly. Otherwise, if some
classes are shaded & included in `hive-exec` but not relocated, there is no
way for the other projects to exclude them and avoid potential conflicts.

Chao

On Thu, Sep 16, 2021 at 8:03 AM Zoltan Haindrich  wrote:


Hey

On 9/6/21 12:48 PM, Stamatis Zampetakis wrote:

Indeed this may lead to binary incompatibility problems as the one you
mentioned. If I understood correctly the problem you cite comes up if
library B in this case is not relocated. If Hive systematically relocates
shaded deps do you think there will still be binary incompatibility issues?

If the relocating solution works, I would personally prefer going down this
path instead of introducing an entirely new module just for the sake of
dependency management. Most of the time when there are problems with
shading the answer comes from relocating the problematic dependencies and
people are more or less accustomed with this route.


I totally agree with you Stamatis - with the addition that we should work
together with the owners of other projects to help them use the correct
artifact to gain access to Hive's internal parts.
I've opened HIVE-25531 to remove the core classified artifact - and ensure
that we will be uncovering and fixing future issues with the hive-exec
artifact.

cheers,
Zoltan




Best,
Stamatis

On Mon, Aug 30, 2021 at 9:49 PM Daniel Fritsi



wrote:


Dear Hive developers,

I am Dan from the Oozie team and I would like to bring up the
hive-exec.jar vs. hive-exec-core.jar topic.
The reason for that is because as far as we understand the official
recommendation from the Hive team is to use the hive-exec.jar artifact.

However in Oozie that can end-up in a binary incompatibility.

The reason for that is:

* Let's say library A is included in the fat Jar.

* And library B which is using library A is also included in the fat Jar.

* Let's also say that library A's com.library.alib package is
  relocated to org.apache.hive.com.library.alib,
  meaning the com.library.alib.SomeClass becomes
  org.apache.hive.com.library.alib.SomeClass

* So if B has a method like public void
  someMethod(com.library.alib.SomeClass) then the signature of this
  method will be changed to:
  public void someMethod(org.apache.hive.com.library.alib.SomeClass)

* If Oozie is also using B directly meaning we'll have b.jar on our
  classpath, but with the unchanged signature,
  so when hive-exec tries to invoke someMethod then depending on
  whether b.jar coming from us will be loaded first or hive-exec will,
  we can end-up with a NoSuchMethodError if hive-exec tries to pass an
  org.apache.hive.com.library.alib.SomeClass instance to the
  someMethod which was loaded from the original b.jar.

Hence in Oozie a long time ago (OOZIE-2621
<https://issues.apache.org/jira/browse/OOZIE-2621>) the decision was
made to use the hive-exec-core Jar.

Now since the shading process actually removes those dependencies from
the hive-exec pom which are included in the fat Jar, we manually had to
add some dependencies to Oozie to compensate this.
However these dependencies are not used by Oozie directly and with the
growing features of hive-exec we had to repeat the same process
over-and-ov

[jira] [Created] (HIVE-25715) Provide nightly builds

2021-11-17 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-25715:
---

 Summary: Provide nightly builds
 Key: HIVE-25715
 URL: https://issues.apache.org/jira/browse/HIVE-25715
 Project: Hive
  Issue Type: Improvement
Reporter: Zoltan Haindrich
Assignee: Zoltan Haindrich


provide nightly builds for the master branch



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HIVE-25714) Some tests are flaky because docker is not able to start in 5 seconds

2021-11-17 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-25714:
---

 Summary: Some tests are flaky because docker is not able to start 
in 5 seconds
 Key: HIVE-25714
 URL: https://issues.apache.org/jira/browse/HIVE-25714
 Project: Hive
  Issue Type: Improvement
Reporter: Zoltan Haindrich
Assignee: Zoltan Haindrich


there are some test runs failing with the error below; and on the test site multiple pods are 
running in parallel - it's not an ideal environment for tight deadlines
{code}
Unexpected exception java.lang.RuntimeException: Process docker failed to run 
in 5 seconds
 at 
org.apache.hadoop.hive.ql.externalDB.AbstractExternalDB.runCmd(AbstractExternalDB.java:92)
 at 
org.apache.hadoop.hive.ql.externalDB.AbstractExternalDB.launchDockerContainer(AbstractExternalDB.java:123)
 at 
org.apache.hadoop.hive.ql.qoption.QTestDatabaseHandler.beforeTest(QTestDatabaseHandler.java:111)
 at 
org.apache.hadoop.hive.ql.qoption.QTestOptionDispatcher.beforeTest(QTestOptionDispatcher.java:79)
{code}

http://ci.hive.apache.org/job/hive-precommit/job/PR-1674/4/testReport/junit/org.apache.hadoop.hive.cli.split19/TestMiniLlapLocalCliDriver/Testing___split_14___PostProcess___testCliDriver_qt_database_all_/



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HIVE-25713) Fix test TestLlapTaskSchedulerService#testPreemption

2021-11-17 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-25713:
---

 Summary: Fix test TestLlapTaskSchedulerService#testPreemption
 Key: HIVE-25713
 URL: https://issues.apache.org/jira/browse/HIVE-25713
 Project: Hive
  Issue Type: Improvement
Reporter: Zoltan Haindrich


when this test passes it passes in under 100ms - but when it fails it keeps 
waiting for more than 10 seconds - the test seems to be using signal/await 

http://ci.hive.apache.org/job/hive-flaky-check/462/



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HIVE-25712) Fix test TestContribCliDriver#testCliDriver[url_hook]

2021-11-16 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-25712:
---

 Summary: Fix test TestContribCliDriver#testCliDriver[url_hook]
 Key: HIVE-25712
 URL: https://issues.apache.org/jira/browse/HIVE-25712
 Project: Hive
  Issue Type: Improvement
Reporter: Zoltan Haindrich


The test makes use of SampleURLHook - which could change the JDO url
http://ci.hive.apache.org/job/hive-flaky-check/460/



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HIVE-25711) Make Table#isEmpty more efficient

2021-11-16 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-25711:
---

 Summary: Make Table#isEmpty more efficient
 Key: HIVE-25711
 URL: https://issues.apache.org/jira/browse/HIVE-25711
 Project: Hive
  Issue Type: Improvement
Reporter: Zoltan Haindrich


[~stevel] suggested in another ticket that we could make our isEmpty method 
faster:

https://issues.apache.org/jira/browse/HIVE-24849?focusedCommentId=17372145&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17372145
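
As a minimal sketch of the idea (assumed method and class names, Hadoop 
FileSystem API on the classpath; this is not the actual Hive patch): instead of 
materialising a full listing, an iterator lets us stop at the first entry.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

public class IsEmptySketch {
  // Returns true if the directory holds no files; the remote iterator stops
  // after the first entry instead of listing everything up front.
  static boolean isEmpty(FileSystem fs, Path dir) throws java.io.IOException {
    RemoteIterator<LocatedFileStatus> it = fs.listFiles(dir, true);
    return !it.hasNext();
  }

  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.getLocal(new Configuration());
    System.out.println(isEmpty(fs, new Path(args[0])));
  }
}
{code}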




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HIVE-25707) SchemaTool may leave the metastore in-between upgrade steps

2021-11-16 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-25707:
---

 Summary: SchemaTool may leave the metastore in-between upgrade 
steps
 Key: HIVE-25707
 URL: https://issues.apache.org/jira/browse/HIVE-25707
 Project: Hive
  Issue Type: Bug
Reporter: Zoltan Haindrich
Assignee: Zoltan Haindrich


it seems like:
* schematool runs the sql files via beeline
* autocommit is turned on
* pressing ctrl+c or killing the process will result in an invalid schema

https://github.com/apache/hive/blob/6e02f6164385a370ee8014c795bee1fa423d7937/beeline/src/java/org/apache/hive/beeline/schematool/HiveSchemaTool.java#L79
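
The core of the problem is the per-statement autocommit; below is a rough, 
hypothetical sketch of running such a script all-or-nothing over plain JDBC 
(assumed URL/script arguments, naive statement splitting; not the HiveSchemaTool 
code, and it only helps where the backing database supports transactional DDL):

{code:java}
import java.nio.file.Files;
import java.nio.file.Paths;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class UpgradeStepSketch {
  public static void main(String[] args) throws Exception {
    // args[0]: metastore JDBC URL, args[1]: upgrade SQL script
    try (Connection conn = DriverManager.getConnection(args[0])) {
      conn.setAutoCommit(false);            // key difference: no per-statement commit
      try (Statement st = conn.createStatement()) {
        for (String sql : Files.readString(Paths.get(args[1])).split(";")) {
          if (!sql.trim().isEmpty()) {
            st.execute(sql);
          }
        }
        conn.commit();                      // killing the process before this point rolls back
      } catch (Exception e) {
        conn.rollback();
        throw e;
      }
    }
  }
}
{code}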



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HIVE-25703) Postgres metastore test failures

2021-11-16 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-25703:
---

 Summary: Postgres metastore test failures
 Key: HIVE-25703
 URL: https://issues.apache.org/jira/browse/HIVE-25703
 Project: Hive
  Issue Type: Bug
Reporter: Zoltan Haindrich
Assignee: Zoltan Haindrich


all recent builds are failing because the postgres metastore doesn't start

underlying issue is that the docker container can't start because of:
```
ls: cannot access '/docker-entrypoint-initdb.d/': Operation not permitted
```



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HIVE-25692) ExceptionHandler may mask checked exceptions

2021-11-12 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-25692:
---

 Summary: ExceptionHandler may mask checked exceptions
 Key: HIVE-25692
 URL: https://issues.apache.org/jira/browse/HIVE-25692
 Project: Hive
  Issue Type: Bug
Reporter: Zoltan Haindrich


HIVE-25055 changed the way exceptions are rethrown - but one of the 
methods may let checked exceptions escape without them being declared on the 
method (avoiding the compile-time error for them)

testcase for:
org.apache.hadoop.hive.metastore.TestExceptionHandler

{code}
  @Test
  public void testInvalid() throws MetaException {
    try {
      throw new IOException("IOException test");
    } catch (Exception e) {
      throw handleException(e).throwIfInstance(AccessControlException.class,
          IOException.class).defaultMetaException();
    }
  }
{code}

this testcase should not compile - as it may throw IOException or 
AccessControlException as well
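
For illustration, a generic rethrow helper along these lines (a sketch of the 
general mechanism, not necessarily Hive's actual ExceptionHandler code) shows 
how a checked exception can escape a method that does not declare it: the 
compiler infers an unchecked type for T at the call site, so no throws clause 
is required.

{code}
import java.io.IOException;

public class SneakyRethrow {
  @SuppressWarnings("unchecked")
  static <T extends Exception> RuntimeException rethrow(Exception e) throws T {
    // type erasure: no runtime check happens, and T is inferred as RuntimeException
    throw (T) e;
  }

  // compiles without "throws IOException", yet an IOException escapes at runtime
  public static void demo() {
    try {
      throw new IOException("IOException test");
    } catch (Exception e) {
      throw rethrow(e);
    }
  }
}
{code}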



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Re: Category-X JDBC drivers in Hive modules

2021-11-12 Thread Zoltan Haindrich

Hey Stamatis!

Makes sense to me; I think we already have all of the jdbc drivers in the test 
scope - but adding runtime is a great idea!

I have some memory of a letter saying that we are using Cat-X stuff in Hive and 
we should remove it - I think HIVE-23284 was opened in response to that.
However, if that comes back after these changes we may ask to update the 
scanner, because we only use it at test runtime.

cheers,
Zoltan

On 11/10/21 11:59 AM, Stamatis Zampetakis wrote:

Hi all,

Currently, we have some (MariaDB, MySQL, Oracle) Category-X [1] JDBC
drivers in some parts of the project. Sometimes they are included using the
dependency section with test scope and some others by relying on the
download-maven-plugin [2].

Using test scope is kind of OK but it comes with the risk that we may write
code which needs JDBC driver classes in order to compile and this could be
seen as a violation of the AL2 when the Hive source code is released. From
my understanding, the use of download-maven-plugin, first introduced in
HIVE-23284 [3], was an attempt to remedy this problem. Now it comes back
since we started using the test scope again.

We have a few other drivers, namely Postgres and MSSQL, in test scope but they
are less important since they have BSD-2 and MIT licenses which are not
problematic.

I would expect that in the context of Hive *all* the JDBC drivers should be
declared using the runtime scope. This would remove the need to
use the download-maven-plugin and would simplify the inclusion of drivers
in the build. We are not risking creating derivatives of GPL work since
the dependency is not present at compilation so we cannot really use the
respective classes in our code.

Moreover, driver dependencies could be marked optional, which is actually
true, and that would solve any potential licensing issues [4].

I would like to propose to use the following declaration for all JDBC
drivers no matter the license.


<dependency>
  <groupId>org.mariadb.jdbc</groupId>
  <artifactId>mariadb-java-client</artifactId>
  <version>${mariadb.version}</version>
  <scope>runtime</scope>
  <optional>true</optional>
</dependency>

This will make things more uniform, solve any potential licensing issues,
and when in the future someone copy-pastes dependencies to include new
drivers there will be no violation of AL2.

What do you think?

Best,
Stamatis

[1] https://www.apache.org/legal/resolved.html#category-x
[2]
https://search.maven.org/artifact/com.googlecode.maven-download-plugin/download-maven-plugin/1.6.1/jar
[3] https://issues.apache.org/jira/browse/HIVE-23284
[4] https://www.apache.org/legal/resolved.html#optional



[jira] [Created] (HIVE-25634) Eclipse compiler bumps into AIOBE during ObjectStore compilation

2021-10-21 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-25634:
---

 Summary: Eclipse compiler bumps into AIOBE during ObjectStore 
compilation
 Key: HIVE-25634
 URL: https://issues.apache.org/jira/browse/HIVE-25634
 Project: Hive
  Issue Type: Bug
Reporter: Zoltan Haindrich
Assignee: Zoltan Haindrich


this issue seems to have started appearing after HIVE-23633



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25633) Prevent shutdown of MetaStore scheduled worker ThreadPool

2021-10-21 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-25633:
---

 Summary: Prevent shutdown of MetaStore scheduled worker ThreadPool
 Key: HIVE-25633
 URL: https://issues.apache.org/jira/browse/HIVE-25633
 Project: Hive
  Issue Type: Bug
Reporter: Zoltan Haindrich
Assignee: Zoltan Haindrich


[~lpinter] has noticed that this patch has some side effects:

in HIVE-23164 the patch added a {{ThreadPool#shutdown}} call to 
{{HMSHandler#shutdown}} - which could cause trouble in case a {{HMSHandler}} is 
shut down and a new one is created

I was looking for cases in which a HMSHandler is created inside the metastore 
(beyond the one HiveMetaStore is using) - and I think tasks like Msck use it to 
access the metastore - and they close the client, which closes the HMSHandler, 
which in turn shuts down the threadpool
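
A stripped-down sketch of the hazard (illustrative classes only, not the actual 
metastore code): a process-wide scheduled pool that individual handler 
instances shut down when they are closed.

{code}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;

class SharedScheduledPool {
  // one pool shared by every handler in the process
  static final ScheduledExecutorService POOL = Executors.newScheduledThreadPool(2);
}

class Handler implements AutoCloseable {
  @Override
  public void close() {
    // a short-lived handler (e.g. one created for an Msck task) tearing this
    // down stops the scheduled workers for every other handler as well
    SharedScheduledPool.POOL.shutdown();
  }
}
{code}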




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25630) Translator fixes

2021-10-21 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-25630:
---

 Summary: Translator fixes
 Key: HIVE-25630
 URL: https://issues.apache.org/jira/browse/HIVE-25630
 Project: Hive
  Issue Type: Improvement
Reporter: Zoltan Haindrich
Assignee: Zoltan Haindrich


there are some issues:
* AlreadyExistsException might be suppressed by the translator
* uppercase letter usage may cause problems for some clients
* add a way to suppress location checks for legacy clients




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25569) Enable table definition over a single file

2021-09-28 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-25569:
---

 Summary: Enable table definition over a single file
 Key: HIVE-25569
 URL: https://issues.apache.org/jira/browse/HIVE-25569
 Project: Hive
  Issue Type: Improvement
Reporter: Zoltan Haindrich
Assignee: Zoltan Haindrich


Suppose there is a directory where multiple files are present - and for a 3rd 
party database system this is perfectly normal - because it treats a single 
file as the contents of the table.

Tables defined in the metastore follow a different principle - tables are 
considered to be under a directory - and all files under that directory are the 
contents of that table.

To enable seamless migration/evaluation of Hive and other databases using HMS 
as a metadata backend, the ability to define a table over a single file would 
be useful.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Delays in precommit runs

2021-09-21 Thread Zoltan Haindrich
Hey All,

I've merged a change to enable branch indexing on the ci job - this will enable 
it to auto-clean up old builds and it will also make sure it starts runs even 
in case the github event is lost.
As a side effect of this it rediscovered almost all PRs - I've aborted all of 
the runs which already had a green run...but left the others running.
So, right now it is busy running those tests...if I count it correctly, it 
still has around 25 to go...
It would have been better to wait until the weekend with this...sorry for the 
holdup; I think it will get better by tomorrow.

cheers,
Zoltan

Re: hive-exec vs. hive-exec:core

2021-09-16 Thread Zoltan Haindrich

Hey

On 9/6/21 12:48 PM, Stamatis Zampetakis wrote:

Indeed this may lead to binary incompatibility problems such as the one you
mentioned. If I understood correctly the problem you cite comes up if
library B in this case is not relocated. If Hive systematically relocates
shaded deps do you think there will still be binary incompatibility issues?

If the relocating solution works, I would personally prefer going down this
path instead of introducing an entirely new module just for the sake of
dependency management. Most of the time when there are problems with
shading the answer comes from relocating the problematic dependencies and
people are more or less accustomed with this route.


I totally agree with you Stamatis - with the addition that we should work together with the owners of other projects to help them use the correct artifact to gain access to 
Hive's internal parts.

I've opened HIVE-25531 to remove the core classified artifact - and ensure that 
we will be uncovering and fixing future issues with the hive-exec artifact.

cheers,
Zoltan




Best,
Stamatis

On Mon, Aug 30, 2021 at 9:49 PM Daniel Fritsi 
wrote:


Dear Hive developers,

I am Dan from the Oozie team and I would like to bring up the
hive-exec.jar vs. hive-exec-core.jar topic.
The reason for that is because as far as we understand the official
recommendation from the Hive team is to use the hive-exec.jar artifact.

However in Oozie that can end up in a binary incompatibility.

The reason for that is:

   * Let's say library A is included in the fat Jar.

   * And library B which is using library A is also included in the fat Jar.

   * Let's also say that library A's com.library.alib package is
 relocated to org.apache.hive.com.library.alib,
 meaning the com.library.alib.SomeClass becomes
 org.apache.hive.com.library.alib.SomeClass

   * So if B has a method like public void
 someMethod(com.library.alib.SomeClass) then the signature of this
 method will be changed to:
 public void someMethod(org.apache.hive.com.library.alib.SomeClass)

   * If Oozie is also using B directly meaning we'll have b.jar on our
 classpath, but with the unchanged signature,
 so when hive-exec tries to invoke someMethod then depending on
 whether b.jar coming from us will be loaded first or hive-exec will,
 we can end up with a NoSuchMethodError if hive-exec tries to pass an
 org.apache.hive.com.library.alib.SomeClass instance to the
 someMethod which was loaded from the original b.jar.

Hence in Oozie a long time ago (OOZIE-2621) the decision was
made to use the hive-exec-core Jar.
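
To make the mismatch concrete, here is a purely illustrative, self-contained 
sketch (the classes are stand-ins, not the real libraries): code compiled 
against the original b.jar looks up the un-relocated signature, which the 
shaded copy no longer has.

{code}
import java.lang.reflect.Method;

class SomeClass {}            // stand-in for com.library.alib.SomeClass
class RelocatedSomeClass {}   // stand-in for org.apache.hive.com.library.alib.SomeClass

class ShadedB {               // stand-in for library B as bundled in the fat jar
  public void someMethod(RelocatedSomeClass arg) {}
}

public class RelocationMismatch {
  public static void main(String[] args) throws Exception {
    // hive-exec was compiled against the relocated signature, which exists:
    Method relocated = ShadedB.class.getMethod("someMethod", RelocatedSomeClass.class);
    System.out.println("found: " + relocated);
    // code compiled against the original b.jar expects this one, which is gone:
    ShadedB.class.getMethod("someMethod", SomeClass.class); // throws NoSuchMethodException
  }
}
{code}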

Now since the shading process actually removes those dependencies from
the hive-exec pom which are included in the fat Jar, we manually had to
add some dependencies to Oozie to compensate for this.
However these dependencies are not used by Oozie directly, and with the
growing features of hive-exec we had to repeat the same process
over and over, which is a bit unmaintainable.

Today I'm writing to you to propose a long-term solution where basically
nothing would change in the generated hive artifacts and poms, and at the same
time we wouldn't have to manually declare dependencies in Oozie which
are not explicitly used by us.

The solution:

  1. We would create a new module named hive-exec-dependencies which
 would be a pom-packaging module without any Java source files.
  2. All the dependencies declared in hive-exec would be moved to
 hive-exec-dependencies.
  3. We would make the hive-exec-dependencies module the parent of
 hive-exec and with this hive-exec would still have access to the
 same dependencies as before.
  4. The maven shade plugin would still strip the dependencies from the
 generated hive-exec pom which are included in the fat Jar.
  5. And with a small maven plugin we'd change hive-exec's parent back
 from hive-exec-dependencies to the root hive project in the
 generated hive-exec pom file.

I have a change ready locally and it works as described above.

With this on the Oozie side we could add a dependency on
hive-exec-dependencies and hence all the required libraries which are
included in the fat Jar would be pulled into Oozie.
The next time a new dependency would be added to hive-exec-dependencies,
the Oozie build would pull it in automatically without us having to
explicitly declare it.

Please let me know what you think.

Best,
Dan





[jira] [Created] (HIVE-25531) Remove the core classified hive-exec artifact

2021-09-16 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-25531:
---

 Summary: Remove the core classified hive-exec artifact
 Key: HIVE-25531
 URL: https://issues.apache.org/jira/browse/HIVE-25531
 Project: Hive
  Issue Type: Improvement
Reporter: Zoltan Haindrich
Assignee: Zoltan Haindrich


* this artifact was introduced in HIVE-7423 
* loading this artifact and the shaded hive-exec (along with the jdbc driver) 
could create interesting classpath problems
* if other projects have issues with the shaded hive-exec artifact we must 
start fixing those problems



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25508) Partitioned tables created with CTAS queries don't have lineage information

2021-09-09 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-25508:
---

 Summary: Partitioned tables created with CTAS queries don't have 
lineage information
 Key: HIVE-25508
 URL: https://issues.apache.org/jira/browse/HIVE-25508
 Project: Hive
  Issue Type: Improvement
Reporter: Zoltan Haindrich
Assignee: Zoltan Haindrich






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25485) Transform selects of literals under a UNION ALL to inline table scan

2021-08-26 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-25485:
---

 Summary: Transform selects of literals under a UNION ALL to inline 
table scan
 Key: HIVE-25485
 URL: https://issues.apache.org/jira/browse/HIVE-25485
 Project: Hive
  Issue Type: Improvement
Reporter: Zoltan Haindrich
Assignee: Zoltan Haindrich



{code}
select 1
union all
select 1
union all
[...]
union all
select 1
{code}

results in a very big plan, which will have a number of vertexes proportional 
to the number of union all branches - hence it could be slow to execute



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25404) Inserts inside merge statements are rewritten incorrectly for partitioned tables

2021-07-29 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-25404:
---

 Summary: Inserts inside merge statements are rewritten incorrectly 
for partitioned tables
 Key: HIVE-25404
 URL: https://issues.apache.org/jira/browse/HIVE-25404
 Project: Hive
  Issue Type: Bug
Reporter: Zoltan Haindrich


{code}
drop table u;drop table t;

create table t(value string default 'def') partitioned by (id integer);
create table u(id integer);
{code}

#1 id & value specified; the rewritten query:
{code}
FROM
  `default`.`t`
  RIGHT OUTER JOIN
  `default`.`u`
  ON `t`.`id`=`u`.`id`
INSERT INTO `default`.`t` (`id`,`value`) partition (`id`)-- insert clause
  SELECT `u`.`id`,'x'
   WHERE `t`.`id` IS NULL
{code}
it should be
{code}
[...]
INSERT INTO `default`.`t` partition (`id`) (`value`)-- insert clause
[...]
{code}

#2 when value is not specified

{code}
merge into t using u on t.id=u.id when not matched then insert (id) values 
(u.id);
{code}

rewritten query:
{code}
FROM
  `default`.`t`
  RIGHT OUTER JOIN
  `default`.`u`
  ON `t`.`id`=`u`.`id`
INSERT INTO `default`.`t` (`id`) partition (`id`)-- insert clause
  SELECT `u`.`id`
   WHERE `t`.`id` IS NULL
{code}

it should be
{code}
[...]
INSERT INTO `default`.`t` partition (`id`) ()-- insert clause
[...]
{code}

however we don't accept empty column lists



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25395) Update hadoop to a more recent version

2021-07-27 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-25395:
---

 Summary: Update hadoop to a more recent version
 Key: HIVE-25395
 URL: https://issues.apache.org/jira/browse/HIVE-25395
 Project: Hive
  Issue Type: Improvement
Reporter: Zoltan Haindrich
Assignee: Zoltan Haindrich


we are still depending on hadoop 3.1.0

which doesn't have source attachments - and makes development harder



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25378) Enable removal of old builds on hive ci

2021-07-23 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-25378:
---

 Summary: Enable removal of old builds on hive ci
 Key: HIVE-25378
 URL: https://issues.apache.org/jira/browse/HIVE-25378
 Project: Hive
  Issue Type: Sub-task
Reporter: Zoltan Haindrich
Assignee: Zoltan Haindrich


We are using the github plugin to run builds on PRs

However to remove old builds that plugin needs to have periodic branch scanning 
enabled - but since we also use the plugin's merge mechanism, this will 
cause it to rediscover all open PRs after there is a new commit on the target 
branch.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25370) Improve SharedWorkOptimizer performance

2021-07-22 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-25370:
---

 Summary: Improve SharedWorkOptimizer performance
 Key: HIVE-25370
 URL: https://issues.apache.org/jira/browse/HIVE-25370
 Project: Hive
  Issue Type: Improvement
Reporter: Zoltan Haindrich
Assignee: Zoltan Haindrich


for queries which union ~800 constant rows the SWO is doing around n*n/2 
operations trying to find 2 TS-es which could be merged

{code}
select constants
UNION ALL
...
UNION ALL
select constants
{code}




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25313) Upgrade commons-codec to 1.15

2021-07-07 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-25313:
---

 Summary: Upgrade commons-codec to 1.15
 Key: HIVE-25313
 URL: https://issues.apache.org/jira/browse/HIVE-25313
 Project: Hive
  Issue Type: Improvement
Reporter: Zoltan Haindrich
Assignee: Zoltan Haindrich






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25312) Upgrade netty to 4.1.65.Final

2021-07-07 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-25312:
---

 Summary: Upgrade netty to 4.1.65.Final
 Key: HIVE-25312
 URL: https://issues.apache.org/jira/browse/HIVE-25312
 Project: Hive
  Issue Type: Improvement
Reporter: Zoltan Haindrich
Assignee: Zoltan Haindrich






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25311) Slow compilation of union operators with >100 branches

2021-07-07 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-25311:
---

 Summary: Slow compilation of union operators with >100 branches
 Key: HIVE-25311
 URL: https://issues.apache.org/jira/browse/HIVE-25311
 Project: Hive
  Issue Type: Improvement
Reporter: Zoltan Haindrich
Assignee: Zoltan Haindrich


during the processing of an N-way union operator the full plan is cloned N 
times, which might hurt compilation time performance



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25290) Stabilize TestTxnHandler

2021-06-25 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-25290:
---

 Summary: Stabilize TestTxnHandler
 Key: HIVE-25290
 URL: https://issues.apache.org/jira/browse/HIVE-25290
 Project: Hive
  Issue Type: Improvement
Reporter: Zoltan Haindrich


http://ci.hive.apache.org/job/hive-flaky-check/271/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25289) Fix external_jdbc_table3 and external_jdbc_table4

2021-06-25 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-25289:
---

 Summary: Fix external_jdbc_table3 and external_jdbc_table4
 Key: HIVE-25289
 URL: https://issues.apache.org/jira/browse/HIVE-25289
 Project: Hive
  Issue Type: Improvement
Reporter: Zoltan Haindrich


http://ci.hive.apache.org/job/hive-flaky-check/265/
http://ci.hive.apache.org/job/hive-flaky-check/266/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25288) Fix TestMmCompactorOnTez

2021-06-25 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-25288:
---

 Summary: Fix TestMmCompactorOnTez
 Key: HIVE-25288
 URL: https://issues.apache.org/jira/browse/HIVE-25288
 Project: Hive
  Issue Type: Improvement
Reporter: Zoltan Haindrich


http://ci.hive.apache.org/job/hive-flaky-check/240/





--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25285) Retire HiveProjectJoinTransposeRule

2021-06-24 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-25285:
---

 Summary: Retire HiveProjectJoinTransposeRule
 Key: HIVE-25285
 URL: https://issues.apache.org/jira/browse/HIVE-25285
 Project: Hive
  Issue Type: Improvement
Reporter: Zoltan Haindrich


we don't necessarily need our own rule anymore - a plain 
ProjectJoinTransposeRule could probably work





--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25278) HiveProjectJoinTransposeRule may do invalid transformations with windowing expressions

2021-06-23 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-25278:
---

 Summary: HiveProjectJoinTransposeRule may do invalid 
transformations with windowing expressions 
 Key: HIVE-25278
 URL: https://issues.apache.org/jira/browse/HIVE-25278
 Project: Hive
  Issue Type: Bug
Reporter: Zoltan Haindrich
Assignee: Zoltan Haindrich


running
{code}
create table table1 (acct_num string, interest_rate decimal(10,7)) stored as 
orc;
create table table2 (act_id string) stored as orc;
CREATE TABLE temp_output AS
SELECT act_nbr, row_num
FROM (SELECT t2.act_id as act_nbr,
row_number() over (PARTITION BY trim(acct_num) ORDER BY interest_rate DESC) AS 
row_num
FROM table1 t1
INNER JOIN table2 t2
ON trim(acct_num) = t2.act_id) t
WHERE t.row_num = 1;
{code}

may result in error like:

{code}
Error: Error while compiling statement: FAILED: SemanticException Line 0:-1 
Invalid column reference 'interest_rate': (possible column names are: 
interest_rate, trim) (state=42000,code=4)
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25267) Fix TestReplicationScenariosAcidTables

2021-06-18 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-25267:
---

 Summary: Fix TestReplicationScenariosAcidTables
 Key: HIVE-25267
 URL: https://issues.apache.org/jira/browse/HIVE-25267
 Project: Hive
  Issue Type: Bug
Reporter: Zoltan Haindrich


test is unstable
http://ci.hive.apache.org/job/hive-flaky-check/242/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25266) Fix TestWarehouseExternalDir

2021-06-18 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-25266:
---

 Summary: Fix TestWarehouseExternalDir
 Key: HIVE-25266
 URL: https://issues.apache.org/jira/browse/HIVE-25266
 Project: Hive
  Issue Type: Bug
Reporter: Zoltan Haindrich


test is unstable 
http://ci.hive.apache.org/job/hive-flaky-check/244/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25265) Fix TestHiveIcebergStorageHandlerWithEngine

2021-06-18 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-25265:
---

 Summary: Fix TestHiveIcebergStorageHandlerWithEngine
 Key: HIVE-25265
 URL: https://issues.apache.org/jira/browse/HIVE-25265
 Project: Hive
  Issue Type: Bug
Reporter: Zoltan Haindrich


test is unstable:
http://ci.hive.apache.org/job/hive-flaky-check/251/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25250) Fix TestHS2ImpersonationWithRemoteMS.testImpersonation

2021-06-15 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-25250:
---

 Summary: Fix TestHS2ImpersonationWithRemoteMS.testImpersonation
 Key: HIVE-25250
 URL: https://issues.apache.org/jira/browse/HIVE-25250
 Project: Hive
  Issue Type: Bug
Reporter: Zoltan Haindrich


http://ci.hive.apache.org/job/hive-flaky-check/235/testReport/org.apache.hive.service/TestHS2ImpersonationWithRemoteMS/testImpersonation/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25249) Fix TestWorker

2021-06-15 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-25249:
---

 Summary: Fix TestWorker
 Key: HIVE-25249
 URL: https://issues.apache.org/jira/browse/HIVE-25249
 Project: Hive
  Issue Type: Bug
Reporter: Zoltan Haindrich



http://ci.hive.apache.org/job/hive-precommit/job/PR-2381/1/

http://ci.hive.apache.org/job/hive-flaky-check/236/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25248) Fix .TestLlapTaskSchedulerService#testForcedLocalityMultiplePreemptionsSameHost1

2021-06-15 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-25248:
---

 Summary: Fix 
.TestLlapTaskSchedulerService#testForcedLocalityMultiplePreemptionsSameHost1
 Key: HIVE-25248
 URL: https://issues.apache.org/jira/browse/HIVE-25248
 Project: Hive
  Issue Type: Bug
Reporter: Zoltan Haindrich


This test is failing randomly recently

http://ci.hive.apache.org/job/hive-flaky-check/233/testReport/org.apache.hadoop.hive.llap.tezplugins/TestLlapTaskSchedulerService/testForcedLocalityMultiplePreemptionsSameHost1/





--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25247) Fix TestWMMetricsWithTrigger

2021-06-15 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-25247:
---

 Summary: Fix TestWMMetricsWithTrigger
 Key: HIVE-25247
 URL: https://issues.apache.org/jira/browse/HIVE-25247
 Project: Hive
  Issue Type: Bug
Reporter: Zoltan Haindrich


this test seems to be unstable:

http://ci.hive.apache.org/job/hive-flaky-check/226/

it was introduced by HIVE-24803 a few months ago 

cc: [~gupta.nikhil0007]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25224) Multi insert statements involving tables with different bucketing_versions results in error

2021-06-09 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-25224:
---

 Summary: Multi insert statements involving tables with different 
bucketing_versions results in error
 Key: HIVE-25224
 URL: https://issues.apache.org/jira/browse/HIVE-25224
 Project: Hive
  Issue Type: Bug
Reporter: Zoltan Haindrich
Assignee: Zoltan Haindrich



{code}
drop table if exists t;
drop table if exists t2;
drop table if exists t3;
create table t (a integer);
create table t2 (a integer);
create table t3 (a integer);
alter table t set tblproperties ('bucketing_version'='1');
explain from t3 insert into t select a insert into t2 select a;
{code}

results in
{code}
Error: Error while compiling statement: FAILED: RuntimeException Error setting 
bucketingVersion for group: [[op: FS[2], bucketingVersion=1], [op: FS[11], 
bucketingVersion=2]] (state=42000,code=4)
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25180) Update netty to 4.1.60.Final

2021-05-31 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-25180:
---

 Summary: Update netty to 4.1.60.Final
 Key: HIVE-25180
 URL: https://issues.apache.org/jira/browse/HIVE-25180
 Project: Hive
  Issue Type: Improvement
Reporter: Zoltan Haindrich
Assignee: Zoltan Haindrich






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25171) Use ACID_HOUSEKEEPER_SERVICE_START

2021-05-27 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-25171:
---

 Summary: Use ACID_HOUSEKEEPER_SERVICE_START
 Key: HIVE-25171
 URL: https://issues.apache.org/jira/browse/HIVE-25171
 Project: Hive
  Issue Type: Improvement
Reporter: Zoltan Haindrich


seems to be unused right now



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25138) Auto disable scheduled queries after repeated failures

2021-05-19 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-25138:
---

 Summary: Auto disable scheduled queries after repeated failures
 Key: HIVE-25138
 URL: https://issues.apache.org/jira/browse/HIVE-25138
 Project: Hive
  Issue Type: Sub-task
Reporter: Zoltan Haindrich
Assignee: Zoltan Haindrich






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25044) Parallel edge fixer may not be able to process semijoin edges

2021-04-21 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-25044:
---

 Summary: Parallel edge fixer may not be able to process semijoin 
edges
 Key: HIVE-25044
 URL: https://issues.apache.org/jira/browse/HIVE-25044
 Project: Hive
  Issue Type: Sub-task
Reporter: Zoltan Haindrich
Assignee: Zoltan Haindrich


SJ filter edges are removed from the main operator graph - which could cause 
a parallel edge to remain after the remover has executed



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25036) Unstable testcase script_broken_pipe2

2021-04-20 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-25036:
---

 Summary: Unstable testcase script_broken_pipe2
 Key: HIVE-25036
 URL: https://issues.apache.org/jira/browse/HIVE-25036
 Project: Hive
  Issue Type: Bug
Reporter: Zoltan Haindrich


http://ci.hive.apache.org/job/hive-flaky-check/224/

{code}
Client Execution succeeded but contained differences (error code = 1) after 
executing script_broken_pipe2.q 
24c24
< Caused by: java.io.IOException: Broken pipe
---
> Caused by: java.io.IOException: Stream closed
46c46
< Caused by: java.io.IOException: Broken pipe
---
> Caused by: java.io.IOException: Stream closed
49,58d48
< FAILED: AssertionError java.lang.AssertionError: Client Execution succeeded 
but contained differences (error code = 1) after executing 
script_broken_pipe2.q 
< 24c24
< < Caused by: java.io.IOException: Broken pipe
< ---
< > Caused by: java.io.IOException: Stream closed
< 46c46
< < Caused by: java.io.IOException: Broken pipe
< ---
< > Caused by: java.io.IOException: Stream closed
< 
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-25029) Remove travis builds

2021-04-19 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-25029:
---

 Summary: Remove travis builds
 Key: HIVE-25029
 URL: https://issues.apache.org/jira/browse/HIVE-25029
 Project: Hive
  Issue Type: Improvement
Reporter: Zoltan Haindrich
Assignee: Zoltan Haindrich


travis only compiles the project - we already do much more than that during 
precommit testing.
(and it sometimes delays builds because travis can't allocate executors/etc)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-24986) Support aggregates on columns present in rollups

2021-04-07 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-24986:
---

 Summary: Support aggregates on columns present in rollups
 Key: HIVE-24986
 URL: https://issues.apache.org/jira/browse/HIVE-24986
 Project: Hive
  Issue Type: Improvement
Reporter: Zoltan Haindrich
Assignee: Zoltan Haindrich


{code}
SELECT key, value, count(key) FROM src GROUP BY key, value with rollup;
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


merged development branch with a bunch of commits

2021-04-06 Thread Zoltan Haindrich

Hey All!

Seems like a changeset was merged without being squashed first...now we have a 
changeset spread across 38 commits...
I wanted to fix this - but I was not able to force-push the master branch to 
properly merge https://github.com/apache/hive/pull/2037

https://github.com/apache/hive/commits/master?before=2eb0e00d5d614d3144519cf4861ec1759a373c7d+35&branch=master

What do we want to do with this?

cheers,
Zoltan


[jira] [Created] (HIVE-24979) Tests should not load confs from places like /etc/hive/hive-site.xml

2021-04-06 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-24979:
---

 Summary: Tests should not load confs from places like 
/etc/hive/hive-site.xml
 Key: HIVE-24979
 URL: https://issues.apache.org/jira/browse/HIVE-24979
 Project: Hive
  Issue Type: Bug
Reporter: Zoltan Haindrich
Assignee: Zoltan Haindrich


for example: 
TestEmbeddedHiveMetaStore

may load a value for the metastore.metadata.transformer.class key from 
/etc/hive/hive-site.xml





--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-24963) Windowing expression may lose its input in some cases

2021-03-31 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-24963:
---

 Summary: Windowing expression may lose its input in some cases
 Key: HIVE-24963
 URL: https://issues.apache.org/jira/browse/HIVE-24963
 Project: Hive
  Issue Type: Bug
Reporter: Zoltan Haindrich
Assignee: Zoltan Haindrich


{code}
drop table if exists sss;
 CREATE TABLE `sss`(
   `user_id` bigint,
   `user_mid` string
 )
 PARTITIONED BY (
   `dt` string)
STORED AS ORC
   ;

insert into sss partition(dt='part1') VALUES (12345,'user_mid 
v1'),(12345,'user_mid v1'),(12345,'user_mid v1'),(12345,'user_mid 
v1'),(12345,'user_mid v1');


set hive.auto.convert.join.noconditionaltask.size=1;
WITH
 unioned_user AS (
 SELECT
 *,
 row_number() OVER (PARTITION BY user_mid ORDER BY dt ASC) AS r_asc,
 row_number() OVER (PARTITION BY user_mid ORDER BY dt DESC) AS 
r_desc
 FROM (
 SELECT DISTINCT
 dt,
 user_mid
 FROM sss
 WHERE dt = '20210228'
 UNION ALL
 SELECT DISTINCT
dt,
 user_mid
 FROM sss
 ) AS uni
 ),
 merged_user AS (
 SELECT
 a.user_mid
 FROM (SELECT * FROM unioned_user WHERE r_asc = 1) AS a
 INNER JOIN (SELECT * FROM unioned_user WHERE r_desc = 1) AS d
 ON a.user_mid = d.user_mid
 )
 Select count(*) from merged_user;
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-24954) MetastoreTransformer is disabled during testing

2021-03-29 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-24954:
---

 Summary: MetastoreTransformer is disabled during testing
 Key: HIVE-24954
 URL: https://issues.apache.org/jira/browse/HIVE-24954
 Project: Hive
  Issue Type: Bug
Reporter: Zoltan Haindrich
Assignee: Zoltan Haindrich



all calls are fortified with "isInTest" guards to avoid testing those calls 
(!@#$#)

https://github.com/apache/hive/blob/86fa9b30fe347c7fc78a2930f4d20ece2e124f03/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java#L1647

this causes some weird behaviour:
an out-of-the-box hive installation creates TRANSLATED_TO_EXTERNAL external tables 
for plain CREATE TABLE commands,
meanwhile when most testing is executed CREATE TABLE creates regular 
MANAGED tables...



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

