Re: [DISCUSS] Nightly snapshot builds

2023-05-24 Thread vihang karajgaonkar
I created https://issues.apache.org/jira/browse/HIVE-27371 to have nightly
builds for branch-3. Once that is merged, I think we can have scheduled
builds for branch-3 as well. However, I don't have permission to create a
new job for branch-3. Does anyone know how to do it?

Thanks,
Vihang

On Wed, May 24, 2023 at 10:07 AM vihang karajgaonkar wrote:

> The nightly job http://ci.hive.apache.org/job/hive-nightly/ is great. Can
> we have this for branch-3 as well? We have been backporting a lot of
> PRs to branch-3 lately.
>
> Thanks,
> Vihang

Re: [DISCUSS] Nightly snapshot builds

2023-05-24 Thread vihang karajgaonkar
The nightly job http://ci.hive.apache.org/job/hive-nightly/ is great. Can
we have this for branch-3 as well? We have been backporting a lot of
PRs to branch-3 lately.

Thanks,
Vihang

On Wed, May 24, 2023 at 6:56 AM Zoltan Haindrich wrote:

> Hey,
>
>  > We already have nightly builds for Hive [1].
>  > [1] http://ci.hive.apache.org/job/hive-nightly/
>
> ...and hive-dev-box can launch such archives, either by using it like this:
> https://www.mail-archive.com/dev@hive.apache.org/msg142420.html
>
> or, with a somewhat longer command, you could launch hdb (hive-dev-box) in
> bazaar mode and have an HS2 running a nightly version:
>
> docker run --rm -d -p 1:1 -v hive-dev-box_work:/work \
>   -e HIVE_VERSION=http://ci.hive.apache.org/job/hive-nightly/lastSuccessfulBuild/artifact/archive/apache-hive-4.0.0-nightly-b0b3fde70c-20230524_014711-bin.tar.gz \
>   --name hive kgyrtkirk/hive-dev-box:bazaar
>
> cheers,
> Zoltan

Re: [DISCUSS] Nightly snapshot builds

2023-05-24 Thread Zoltan Haindrich

Hey,

> We already have nightly builds for Hive [1].
> [1] http://ci.hive.apache.org/job/hive-nightly/

...and hive-dev-box can launch such archives, either by using it like this:
https://www.mail-archive.com/dev@hive.apache.org/msg142420.html

or, with a somewhat longer command, you could launch hdb (hive-dev-box) in bazaar
mode and have an HS2 running a nightly version:

docker run --rm -d -p 1:1 -v hive-dev-box_work:/work \
  -e HIVE_VERSION=http://ci.hive.apache.org/job/hive-nightly/lastSuccessfulBuild/artifact/archive/apache-hive-4.0.0-nightly-b0b3fde70c-20230524_014711-bin.tar.gz \
  --name hive kgyrtkirk/hive-dev-box:bazaar
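A small illustrative helper, not part of hive-dev-box or the nightly job: judging only from the one archive name in the command above, nightly tarballs appear to be named apache-hive-<version>-nightly-<commit>-<date>_<time>-bin.tar.gz, where b0b3fde70c looks like an abbreviated git commit hash. Assuming that pattern holds, the hypothetical POSIX shell function parse_nightly_name below splits a name (or a full artifact URL) into those three parts, which makes it easy to tell which master commit a given nightly was presumably built from.

```shell
#!/bin/sh
# Hypothetical helper: split a nightly archive name (or URL) into its parts.
# ASSUMPTION: names follow apache-hive-<version>-nightly-<commit>-<stamp>-bin.tar.gz,
# inferred from a single example; the CI job does not document this format.
parse_nightly_name() (
  # Subshell body, so the temporary variables do not leak to the caller.
  name=${1##*/}                  # strip any URL or directory prefix
  name=${name#apache-hive-}      # drop the fixed leading part
  name=${name%-bin.tar.gz}       # drop the fixed trailing part
  version=${name%%-nightly-*}    # text before "-nightly-", e.g. 4.0.0
  rest=${name#*-nightly-}        # text after it, e.g. b0b3fde70c-20230524_014711
  commit=${rest%%-*}             # presumed abbreviated commit hash
  stamp=${rest#*-}               # build timestamp, <date>_<time>
  printf '%s %s %s\n' "$version" "$commit" "$stamp"
)

parse_nightly_name http://ci.hive.apache.org/job/hive-nightly/lastSuccessfulBuild/artifact/archive/apache-hive-4.0.0-nightly-b0b3fde70c-20230524_014711-bin.tar.gz
# prints: 4.0.0 b0b3fde70c 20230524_014711
```

If b0b3fde70c is indeed a commit hash, running git show b0b3fde70c in a Hive checkout would show the commit that nightly was presumably built from.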


cheers,
Zoltan

On 5/24/23 09:15, Stamatis Zampetakis wrote:

> Hey all,
>
> We already have nightly builds for Hive [1].
>
> Do we need something more than that?
>
> Best,
> Stamatis
>
> [1] http://ci.hive.apache.org/job/hive-nightly/



Re: Apache Hive on Twitter

2023-05-24 Thread Stamatis Zampetakis
Thanks for driving this, Ayush! It's great to see Hive alive again on Twitter.

Best,
Stamatis

On Tue, May 23, 2023 at 3:58 AM Ayush Saxena wrote:
>
> Hi All,
> I am happy to announce that the Apache Hive Twitter account is active
> again; in other words, we now have the credentials to use it.
>
> The Twitter account is here:
>
> https://twitter.com/ApacheHive
>
> The account belongs to all of us at Hive. As we decided, anyone who wants
> something related to Apache Hive posted on the Twitter account can send
> the request to the Hive dev mailing list, with the label [Twitter] in
> the subject.
>
> For the record, as of today the following people have access to post:
>
> Alan Gates, Ayush Saxena, Carl Steinbach, Joydeep Sen Sharma, Owen
> O'Malley, Sushanth Sowmyan, Szehon Ho, Thejas Nair & Vikram Dixit
>
> A note of thanks to Joydeep Sen Sharma, Carl Steinbach, Stamatis Zampetakis
> & Naveen Gangam for helping with the process, and to Attila Turoczy for the
> initial thoughts/idea.
>
> -Ayush


Re: [DISCUSS] Nightly snapshot builds

2023-05-24 Thread Stamatis Zampetakis
Hey all,

We already have nightly builds for Hive [1].

Do we need something more than that?

Best,
Stamatis

[1] http://ci.hive.apache.org/job/hive-nightly/


On Tue, May 23, 2023 at 9:03 AM vihang karajgaonkar wrote:
>
> As others in this thread have suggested, there are many benefits which can
> be built on top of nightly builds. Having Docker images is great, but for
> now I think we can start simple and publish the jars. Many users still
> deploy using jars, and it would be useful to them. Once we have a Docker
> environment, we can add a Docker image to the nightly builds too, so that
> users can choose their preferred way.
>
> On Mon, May 22, 2023 at 11:07 PM Sungwoo Park wrote:
>
> > I think such nightly builds will be useful for testing and debugging in the
> > future.
> >
> > I also wonder if we can somehow create builds even from previous commits
> > (e.g., for the past few years). Such builds from previous commits don't
> > have to be daily builds, and I think weekly builds (or even monthly builds)
> > would also be very useful.
> >
> > The reason I wish such builds were available is to facilitate debugging and
> > testing. When tested against the TPC-DS benchmark, the current master
> > branch has several correctness problems that were introduced after the
> > release of Hive 3.1.2. We have reported all problems known to us in [1] and
> > also submitted several patches. If such nightly builds had been available,
> > we would have saved quite a bit of time implementing the patches by
> > quickly finding the offending commits that introduced new correctness bugs.
> >
> > In addition, you can find quite a few commits in the master branch that
> > report bugs which are not reproduced in Hive 3.1.2. Examples: HIVE-19990,
> > HIVE-14557, HIVE-21132, HIVE-21188, HIVE-21544, HIVE-22114,
> > HIVE-7, HIVE-22236, HIVE-23911, HIVE-24198, HIVE-22777,
> > HIVE-25170, HIVE-25864, HIVE-26671.
> > (There may be some errors in this list because we compared against Hive
> > 3.1.2 with many patches backported.) Such nightly builds can be useful for
> > finding root causes of such bugs.
> >
> > Ideally, I wish there were an automated procedure to create nightly builds,
> > run the TPC-DS benchmark, and report correctness/performance results,
> > although this would be quite hard to implement. (I remember Spark
> > implemented this procedure in the era of Spark 2, but my memory could be
> > wrong.)
> >
> > [1] https://issues.apache.org/jira/browse/HIVE-26654
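Finding the offending commit from periodic builds, as described above, is essentially a bisection. As a rough sketch, assume a list of nightly (or weekly) builds ordered oldest first, where the oldest is known good and the newest known bad; a binary search then needs only about log2(N) checks to find the first bad build. The function first_bad_build and the check command are hypothetical; in practice the check might unpack a nightly tarball and run an affected TPC-DS query, exiting 0 when the result is correct. The toy check_date below stands in for that.

```shell
#!/bin/sh
# Hypothetical sketch: binary search over builds ordered oldest -> newest.
# Usage: first_bad_build CHECK BUILD_1 ... BUILD_N
# ASSUMPTIONS: BUILD_1 is good, BUILD_N is bad, and CHECK exits 0 for a good
# build and non-zero for a bad one.
first_bad_build() (
  # Subshell body, so check/lo/hi/mid/build do not leak to the caller.
  check=$1
  shift
  lo=1     # position of a known-good build
  hi=$#    # position of a known-bad build
  while [ $((hi - lo)) -gt 1 ]; do
    mid=$(( (lo + hi) / 2 ))
    eval "build=\${$mid}"        # build = the mid-th remaining argument
    if "$check" "$build"; then
      lo=$mid                    # still good: the regression came later
    else
      hi=$mid                    # already bad: the regression is at or before mid
    fi
  done
  eval "first_bad=\${$hi}"
  printf '%s\n' "$first_bad"
)

# Toy check for illustration: pretend every build from 20230301 on is bad.
check_date() { [ "$1" -lt 20230301 ]; }

first_bad_build check_date 20230101 20230201 20230301 20230401
# prints: 20230301
```

Once the first bad build is known, the offending change lies between the commit of the previous build and that of the first bad one, and git bisect start <bad> <good> can narrow it down to a single commit.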
> >
> >
> > On Tue, May 23, 2023 at 10:44 AM Ayush Saxena wrote:
> >
> > > Hi Vihang,
> > > +1. We were even exploring publishing Docker images of the snapshot
> > > version as well, per commit or maybe weekly, so you just shoot 2 docker
> > > commands and you get a Hive cluster running with master code.
> > >
> > > Sai, I think spinning up an env via Docker with all these things should
> > > be doable for sure, but it would require someone with really good
> > > expertise in Docker as well as in setting up these services with Hive.
> > > Obviously, I am not that guy :-)
> > >
> > > @Simhadri has a PR which publishes Docker images once a release tag is
> > > pushed; you could explore having something similar for the snapshot
> > > version, maybe, if that sounds cool.
> > >
> > > -Ayush
> > >
> > > On Tue, 23 May 2023 at 04:26, Sai Hemanth Gantasala wrote:
> > >
> > > > Hi Vihang,
> > > >
> > > > +1 on the idea.
> > > >
> > > > This is a great way to quickly test whether a certain feature is working
> > > > as expected on a certain branch.
> > > > This way we can test data loss, correctness, or any other unexpected
> > > > scenarios that are specific to Hive. However, I'm wondering whether it
> > > > is possible to deploy/test in a kerberized environment, or to test issues
> > > > involving authorization services like Sentry/Ranger.
> > > >
> > > > Thanks,
> > > > Sai.
> > > >
> > > > On Mon, May 22, 2023 at 11:15 AM vihang karajgaonkar <vihan...@apache.org> wrote:
> > > >
> > > > > Hello Team,
> > > > >
> > > > > I have observed that it is a common use case for users to want to test
> > > > > out unreleased features/bug fixes, either to unblock themselves or to
> > > > > check whether the bug fixes really work as intended in their
> > > > > environments. Today, in the case of Apache Hive, this is not very user
> > > > > friendly because it requires the end user to build the binaries
> > > > > directly from the Hive source code.
> > > > >
> > > > > I found that Apache Spark has a very useful infrastructure [1] which
> > > > > deploys nightly snapshots [2] [3] from the branch using GitHub Actions.
> > > > > This is super useful for any user who wants to try out the latest and
> > > > > greatest using the nightly builds.
> > > > >
> > > > > I was wondering if we should also adopt this. We can use GitHub Actions
> > > > > to upload the snapshot jars to a public repository (e.g. GitHub
> > > > > Packages) and