Commit message guidelines

2021-09-24 Thread Stamatis Zampetakis
Hi all,

I think we all more or less follow some standard pattern when committing in
Hive but with some small effort we could make things more uniform and
hopefully better.
I would like to start a discussion about creating some guidelines, which we
could put on the wiki or in contributing.md, to improve the quality of our
history (git log).
I outline some suggestions below to kick off the discussion. Many things in
the list are minor (and maybe even personal preferences), but one thing
that is really missing from the project is B3, especially the *why* part:
Why is the commit necessary? Why has the change been made?
In some cases, the why part is also missing from the JIRA, making the code
harder to maintain.

Subject line:
S1. Start with the Jira id capitalized and followed immediately (no space)
by a colon (:)
S2. Leave one space after the Jira id, and start the summary with a capital
letter
S3. Keep it concise (ideally less than 72 characters) and provide a useful
description of the change
S4. Do not end the line with a period
S5. Do not include the pull request id in the summary
S6. Use imperative mood (“Add a handler …”) rather than past tense (“Added
a handler …”) or present tense (“Adds a handler …”)
S7. Avoid using "Fix"; if you are fixing a bug, it is sufficient to
describe the bug (“NullPointerException if user is unknown”) and people
will correctly surmise that the purpose of your change is to fix the bug.
S8. Do not add a contributor's name; the author tag exists exactly for
this and can be explored/parsed much more efficiently by tools/people for
stats or other purposes
S9. Do not add reviewers' names; this information is present in multiple
places (e.g., committer tag, PR, JIRA)
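Rules S1 to S5 are mechanical enough to check automatically (S6 to S9 still need a human). A minimal sketch in Java, assuming the subject line is passed in as a string and treating the 72-character limit from S3 as applying to the summary only (excluding the Jira id); the class name and messages are hypothetical and not part of any existing Hive tooling:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CommitSubjectChecker {

  // S1 + S2: "HIVE-<number>: <summary>", colon right after the id, then one space.
  private static final Pattern SUBJECT = Pattern.compile("^(HIVE-\\d+): (\\S.*)$");
  // S5: a GitHub-style PR reference such as "(#2514)" inside the summary.
  private static final Pattern PR_ID = Pattern.compile("\\(#\\d+\\)");
  // S3: soft limit on the summary length, excluding the Jira id.
  private static final int MAX_SUMMARY_LENGTH = 72;

  /** Returns human-readable problems; an empty list means the subject passes. */
  public static List<String> check(String subject) {
    List<String> problems = new ArrayList<>();
    Matcher m = SUBJECT.matcher(subject);
    if (!m.matches()) {
      problems.add("S1/S2: expected 'HIVE-<id>: <Summary>'");
      return problems;
    }
    String summary = m.group(2);
    if (!Character.isUpperCase(summary.charAt(0))) {
      problems.add("S2: summary should start with a capital letter");
    }
    if (summary.length() > MAX_SUMMARY_LENGTH) {
      problems.add("S3: summary longer than " + MAX_SUMMARY_LENGTH + " characters");
    }
    if (summary.endsWith(".")) {
      problems.add("S4: summary should not end with a period");
    }
    if (PR_ID.matcher(summary).find()) {
      problems.add("S5: the pull request id belongs in the body (Closes #N)");
    }
    return problems;
  }

  public static void main(String[] args) {
    System.out.println(check("HIVE-25354: NPE when estimating row count in external JDBC tables"));
    System.out.println(check("HIVE-25354: added handler methods (#2514)"));
  }
}
```

A commit-msg hook or a CI step could call check() on the first line of the message; since S3 is a soft limit, a real setup might report it as a warning rather than a failure.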

Message body: (Trivial changes may not require a body)
B1. Separate subject from body with a blank line
B2. Wrap the body at 72 characters
B3. Use the body to explain what and why vs. how
Example
"Add handler methods in HiveRelMdDistictRowCount for JdbcHiveTableScan and
Converter to avoid executing the fallback method which in many cases
returns null and can cause NPE when this value propagates up the call
stack."
vs.
"Added handler methods in HiveRelMdDistictRowCount for JdbcHiveTableScan
and Converter"
B4. If there are multiple authors, include them using the standard GitHub
marker, "Co-authored-by:", followed by the name and email of the author
(e.g., Co-authored-by: Marton Bod )
B5. If the reviewer is different from the committer (or the PR is merged
via the GitHub UI), use "Reviewed-by:" followed by the name and email of
the reviewer (e.g., Reviewed-by: Stamatis Zampetakis )
B6. Put each "Co-authored-by"/"Reviewed-by" on its own line and repeat it
as many times as there are authors/reviewers.
B7. Include the PR id at the end of the message (e.g., Closes #2514) so
that someone can easily navigate back to the PR to check comments,
reviewers, etc.
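Similarly, B1, B2 and the trailer rules B4 to B7 can be sketched as a body check. This is a hypothetical helper, not existing tooling; it assumes "\n" line endings, that every commit ends with a "Closes #<PR id>" line, and that trailers follow the "Name <email>" form described in B4 and B5:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class CommitBodyChecker {

  private static final int MAX_BODY_WIDTH = 72; // B2
  // B4/B5: "Co-authored-by: Name <email>" or "Reviewed-by: Name <email>".
  private static final Pattern TRAILER =
      Pattern.compile("^(Co-authored-by|Reviewed-by): [^<>]+ <[^<>\\s]+@[^<>\\s]+>$");
  // B7: the PR reference, e.g. "Closes #2514".
  private static final Pattern CLOSES = Pattern.compile("^Closes #\\d+$");

  /** Returns human-readable problems; an empty list means the message passes. */
  public static List<String> check(String message) {
    List<String> problems = new ArrayList<>();
    String[] lines = message.split("\n", -1);
    // B1: the line right after the subject must be blank.
    if (lines.length > 1 && !lines[1].isEmpty()) {
      problems.add("B1: separate subject from body with a blank line");
    }
    String lastNonBlank = "";
    for (int i = 1; i < lines.length; i++) {
      String line = lines[i];
      if (line.length() > MAX_BODY_WIDTH) {
        problems.add("B2: line " + (i + 1) + " is wider than " + MAX_BODY_WIDTH);
      }
      boolean looksLikeTrailer =
          line.startsWith("Co-authored-by") || line.startsWith("Reviewed-by");
      if (looksLikeTrailer && !TRAILER.matcher(line).matches()) {
        problems.add("B4-B6: malformed trailer on line " + (i + 1));
      }
      if (!line.trim().isEmpty()) {
        lastNonBlank = line.trim();
      }
    }
    // B7: the last non-blank line must reference the PR.
    if (!CLOSES.matcher(lastNonBlank).matches()) {
      problems.add("B7: end the message with 'Closes #<PR id>'");
    }
    return problems;
  }
}
```

Wrapping long URLs would trip B2 in this sketch; a practical version might exempt lines that contain a single URL.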

A sample commit message following these guidelines is shown below:

commit de7781f29f82083fe01274b4d436b52920a89173
Author: Soumyakanti Das 
Commit: Stamatis Zampetakis 

HIVE-25354: NPE when estimating row count in external JDBC tables

Add handler methods in HiveRelMdDistictRowCount for JdbcHiveTableScan
and Converter to avoid executing the fallback method which in many
cases returns null and can cause NPE when this value propagates up the
call stack.

Co-authored-by: Krisztian Kasa 

Reviewed-by: Peter Vary 
Reviewed-by: Zoltan Haindrich 

Closes #2514

Let me know your thoughts.

Best,
Stamatis


Re: [EXTERNAL] Raise exception instead of silent change for new DateTimeformatter

2021-10-01 Thread Stamatis Zampetakis
Hi Ashish, Sankar,

I am not sure if you both refer to the same problem.

As far as reading and writing Parquet/Avro files is concerned, the
compatibility issues should be resolved as part of HIVE-25104 [1] and
HIVE-25219 [2].
If I recall correctly, we added some config properties to ease migration.

Regarding the UNIX_TIMESTAMP function, I do remember seeing many JIRA
cases reporting problems. Let's find how they relate to HIVE-25576 [3] and
try to address them.
We could opt for a new property but let's continue the discussion in the
respective JIRA case. People who have an opinion about the topic can jump
in there.
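The gap between the two results in the quoted report (07:00:00 vs. 06:42:04) is consistent with java.time applying the historical tz database rules, in which Asia/Bangkok observed a mean-time offset of +06:42:04 before switching to +07:00 in 1920. A small illustration using only the standard java.time API (this is not Hive code, and the class name is hypothetical):

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class HistoricalOffsetDemo {

  // Offset that the JDK's tz rules assign to the zone at the given UTC instant.
  static ZoneOffset offsetAt(String utcInstant, String zone) {
    return ZoneId.of(zone).getRules().getOffset(Instant.parse(utcInstant));
  }

  // Wall-clock time in the zone, formatted like Hive's default timestamp output.
  static String localTime(String utcInstant, String zone) {
    DateTimeFormatter fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");
    return Instant.parse(utcInstant).atZone(ZoneId.of(zone)).format(fmt);
  }

  public static void main(String[] args) {
    System.out.println(offsetAt("1800-01-01T00:00:00Z", "Asia/Bangkok"));
    System.out.println(localTime("1800-01-01T00:00:00Z", "Asia/Bangkok"));
    System.out.println(offsetAt("2021-09-29T00:00:00Z", "Asia/Bangkok"));
  }
}
```

In other words, for such old dates the java.time result is arguably the historically accurate one, which is exactly why a silent switch between the two libraries surprises users.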

Best,
Stamatis

[1] https://issues.apache.org/jira/browse/HIVE-25104
[2] https://issues.apache.org/jira/browse/HIVE-25219
[3] https://issues.apache.org/jira/browse/HIVE-25576


On Thu, Sep 30, 2021 at 3:29 PM Sankar Hariappan <
sankar.hariap...@microsoft.com> wrote:

> Hi @Stamatis Zampetakis, @David,
>
> Our current implementation using DateTimeFormatter is not backward
> compatible, and it leads to migration issues.
>
> One of our customers has this use case, and we don't have good
> options for migrating.
>
> *Hive 1.2/Spark 2.4 (Shared metastore):*
>
> Set VM time zone to Asia/Bangkok.
>
> INSERT INTO parquet_table VALUES ('1400-01-01 00:00:00'); // Here, the
> Parquet writer converts the value to UTC (-07:00:00) and stores it.
>
> *Migrate to Hive 3.x/Spark 3.x (Shared metastore):*
>
> Set VM time zone to Asia/Bangkok.
>
> SELECT ts FROM parquet_table; // Hive returns a different value, whereas
> Spark (spark.sql.legacy.timeParserPolicy=LEGACY) returns 1400-01-01 00:00:00
>
> It is not easy to change thousands of Hive scripts to handle this
> difference, and it adds to the migration cost.
>
> I think it is necessary to enable backward compatibility for a smooth
> migration. Please share your thoughts.
>
> Thanks,
>
> Sankar
>
> *From:* Ashish Sharma 
> *Sent:* 29 September 2021 19:11
> *To:* dev@hive.apache.org; u...@hive.apache.org
> *Cc:* sank...@apache.org
> *Subject:* [EXTERNAL] Raise exception instead of silent change for new
> DateTimeformatter
>
> *History*
>
> *Hive 1.2* -
>
> VM time zone set to Asia/Bangkok
>
> *Query* - SELECT FROM_UNIXTIME(UNIX_TIMESTAMP('1800-01-01 00:00:00
> UTC','-MM-dd HH:mm:ss z'));
>
> *Result* - 1800-01-01 07:00:00
>
> *Implementation details* -
>
> SimpleDateFormat formatter = new SimpleDateFormat(pattern);
> Long unixtime = formatter.parse(textval).getTime() / 1000;
> Date date = new Date(unixtime * 1000L);
>
> https://docs.oracle.com/javase/8/docs/api/java/util/Date.html
>
> The official documentation states that "Unfortunately, the API for these
> functions was not amenable to internationalization" and that "The
> corresponding methods in Date are deprecated". Because of this, the query
> produces a wrong result.
>
> *latest hive* -
>
> set hive.local.time.zone=Asia/Bangkok;
>
> *Query* - SELECT FROM_UNIXTIME(UNIX_TIMESTAMP('1800-01-01 00:00:00
> UTC','-MM-dd HH:mm:ss z'));
>
> *Result* - 1800-01-01 06:42:04
>
> *Implementation details* -
>
> DateTimeFormatter dtformatter = new DateTimeFormatterBuilder()
>     .parseCaseInsensitive()
>     .appendPattern(pattern)
>     .toFormatter();
>
> ZonedDateTime zonedDateTime = ZonedDateTime.parse(textval, dtformatter)
>     .withZoneSameInstant(ZoneId.of(timezone));
> Long dttime = zonedDateTime.toInstant().getEpochSecond();
>
> *Problem*-
>
> *SimpleDateFormat* has now been replaced with *DateTimeFormatter*, which
> is not backward compatible. This sometimes causes issues when migrating to
> the new version, because older data written using Hive 1.x or 2.x is not
> compatible with *DateTimeFormatter*.
>
> *Solution -*
>
> Introduce a config "hive.legacy.timeParserPolicy" with the following values:
> *1. EXCEPTION* - compare the values produced by both SimpleDateFormat and
> DateTimeFormatter and raise an exception if they don't match
> *2. LEGACY* - use SimpleDateFormat
> *3. CORRECTED* - use DateTimeFormatter
>
> This will help Hive users in the following manner:
> 1. Migrate to new version using *LEGACY*
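The proposed policy could be sketched roughly as follows. This is an illustration only: hive.legacy.timeParserPolicy is a proposal in this thread, not an existing Hive property, and the class and method names are hypothetical. The three modes mirror Spark's spark.sql.legacy.timeParserPolicy mentioned above, and the example routes a FROM_UNIXTIME-style conversion through both the SimpleDateFormat and the DateTimeFormatter code paths:

```java
import java.text.SimpleDateFormat;
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;
import java.util.function.Supplier;

public class TimeParserPolicySketch {

  // Values mirroring the proposed (hypothetical) hive.legacy.timeParserPolicy.
  enum TimeParserPolicy { EXCEPTION, LEGACY, CORRECTED }

  private static final String PATTERN = "yyyy-MM-dd HH:mm:ss";

  // LEGACY path: java.text/java.util based formatting, as in Hive 1.x/2.x.
  static String formatLegacy(long epochSeconds, String zone) {
    SimpleDateFormat formatter = new SimpleDateFormat(PATTERN, Locale.ROOT);
    formatter.setTimeZone(TimeZone.getTimeZone(zone));
    return formatter.format(new Date(epochSeconds * 1000L));
  }

  // CORRECTED path: java.time based formatting, as in newer Hive versions.
  static String formatCorrected(long epochSeconds, String zone) {
    DateTimeFormatter formatter = DateTimeFormatter.ofPattern(PATTERN, Locale.ROOT);
    return Instant.ofEpochSecond(epochSeconds).atZone(ZoneId.of(zone)).format(formatter);
  }

  // Runs one code path, or both when the policy is EXCEPTION and fails on mismatch.
  static String resolve(TimeParserPolicy policy,
                        Supplier<String> legacy, Supplier<String> corrected) {
    switch (policy) {
      case LEGACY:
        return legacy.get();
      case CORRECTED:
        return corrected.get();
      case EXCEPTION:
      default:
        String oldResult = legacy.get();
        String newResult = corrected.get();
        if (!oldResult.equals(newResult)) {
          throw new IllegalStateException("Legacy and new results differ ("
              + oldResult + " vs " + newResult + "); choose LEGACY or CORRECTED");
        }
        return newResult;
    }
  }

  // A FROM_UNIXTIME-like helper routed through the policy.
  static String fromUnixTime(long epochSeconds, String zone, TimeParserPolicy policy) {
    return resolve(policy,
        () -> formatLegacy(epochSeconds, zone),
        () -> formatCorrected(epochSeconds, zone));
  }

  public static void main(String[] args) {
    long ts = Instant.parse("1800-01-01T00:00:00Z").getEpochSecond();
    System.out.println(fromUnixTime(ts, "Asia/Bangkok", TimeParserPolicy.CORRECTED));
    System.out.println(fromUnixTime(ts, "Asia/Bangkok", TimeParserPolicy.LEGACY));
  }
}
```

With EXCEPTION, users learn about a silent change at query time instead of after migrating; the cost is running both formatters on every value, which may matter for large scans.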

Re: Commit message guidelines

2021-10-02 Thread Stamatis Zampetakis
Hi,

Part of these guidelines can be enforced by setting up webhooks [1], GitHub
actions [2], branch protection rules [3], but the most important enforcer
IMO is the reviewer/committer.
I am mostly interested in the quality of the message rather than whether it
has a period at the end (.) or misses a space, although I have to say I
like things to be uniform :)
The first step would be for the community to express some interest in
having some guidelines (not necessarily these ones) and put them somewhere
so we can follow them; thanks to Alessandro and Peter who did that already.

Most people use the commit summary to find and understand a specific
change. The names there are redundant (since they appear elsewhere), add
clutter, and add some small editing overhead when committing.
I also want to extract information from commits from time to time, but this
is not something that is needed very often. Usually, I rely on a custom
format (e.g., git log --format=format:"%h %aN %cN %s") or other means (SQL
via Calcite :)) to get what I want.
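For the kind of extraction mentioned above, a minimal sketch that groups commits per author from `git log` output. It assumes a tab-separated variant of the format above, `git log --format='%h%x09%aN%x09%cN%x09%s'` (the tabs, produced by git's `%x09` placeholder, keep names containing spaces unambiguous); reviewers recorded as Reviewed-by trailers are not handled here, and the class name is hypothetical:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CommitsByAuthor {

  // Groups abbreviated hashes by author name (%aN). Each line must look like
  // "hash<TAB>author<TAB>committer<TAB>subject"; other lines are skipped.
  static Map<String, List<String>> groupByAuthor(List<String> logLines) {
    Map<String, List<String>> byAuthor = new LinkedHashMap<>();
    for (String line : logLines) {
      String[] fields = line.split("\t", 4);
      if (fields.length < 4) {
        continue;
      }
      byAuthor.computeIfAbsent(fields[1], author -> new ArrayList<>()).add(fields[0]);
    }
    return byAuthor;
  }

  // Usage: git log --format='%h%x09%aN%x09%cN%x09%s' | java CommitsByAuthor
  public static void main(String[] args) throws IOException {
    BufferedReader in =
        new BufferedReader(new InputStreamReader(System.in, StandardCharsets.UTF_8));
    List<String> lines = new ArrayList<>();
    for (String line = in.readLine(); line != null; line = in.readLine()) {
      lines.add(line);
    }
    groupByAuthor(lines).forEach(
        (author, hashes) -> System.out.println(author + "\t" + hashes.size() + " commits"));
  }
}
```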

The 72-character line length (excluding the JIRA id) matters more for the
summary, and it is not supposed to be a hard limit. It is something that
could make people think a bit more when editing the message to find better
words.

About the length of the body, I don't really mind. I would be happy if
there was one in the first place. The majority of commits in the repo do
not have a body, which is kind of problematic. Certainly it is not needed
when the change is trivial, but I doubt changes are trivial in 90% of the
cases.
Some people write everything on a single line and rely on the editor's
wrapping, while others always wrap based on their preferences.
Preferences can vary, so again this is an attempt to make things a bit more
uniform.

Best,
Stamatis

[1] https://docs.github.com/en/developers/webhooks-and-events/webhooks/about-webhooks
[2] https://github.com/marketplace/actions/gs-commit-message-checker
[3] https://docs.github.com/en/repositories/configuring-branches-and-merges-in-your-repository/defining-the-mergeability-of-pull-requests/about-protected-branches#require-status-checks-before-merging


On Tue, Sep 28, 2021 at 6:29 PM Peter Vary 
wrote:

> Hi Stamatis,
>
> I am generally positive about every standardization process. My experience
> taught me that standardized code is usually better in the long run than
> any super-optimized solution where we specialize every code path for the
> local optimum.
>
> First question:
> - Is it possible to check/enforce these guidelines somehow?
>
> If we have some tool to enforce this, that would be super helpful.
>
> I agree with most of your points.
>
> There are a few question marks, but I can live with all of the proposed
> requirements if that is the consensus:
> - I think having the contributor / reviewer / PR number in the commit
> message is helpful (at least for me). I use `git log --pretty=oneline | grep`
> quite often, and one usage is when I would like to collect the PRs which
> were authored or reviewed by a contributor
> - I am not sure about the 72 character line length - the PRs are most
> often viewed by tools which do the wrapping based on the available screen
> size. Wouldn't we want to rely on that?
>
> Thanks,
> Peter
>
> > On Sep 28, 2021, at 11:25, Alessandro Solimando <
> alessandro.solima...@gmail.com> wrote:
> >
> > Hi Stamatis,
> > thanks for the suggestions; I think they are reasonable, and the project
> > would benefit from their adoption.
> >
> > Regarding the removal of contributor/reviewer names from the commit
> > message in favor of git metadata, there has been a similar discussion
> > in the Calcite ML
> > <https://mail-archives.apache.org/mod_mbox/calcite-dev/202109.mbox/%3ccafqnwdzys+gemeeevhreggjedpmjn5bc4bvecdusgrtcykv...@mail.gmail.com%3e>
> > which led to consensus.
> >
> > Best regards,
> > Alessandro
> >

Re: checkstyle/checkstyle-noframes-sorted.xsl is licensed under LGPL

2021-10-29 Thread Stamatis Zampetakis
Hi Sebastian,

Thanks for bringing this up.

The ASF is rather explicit about Category X licenses [1], and LGPL is one
of them; they are allowed neither in the source nor in the convenience
binary releases [2].
They can be present in the repo and used during the build, but must not be
part of the release.

Based on this, it seems that Hive is in violation of the ASF policy since
there are releases which include the aforementioned file.
I think the file should be excluded from the release artifacts or removed
completely from the repo.
I am not sure what needs to be done (if anything) for past releases.
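As a quick, hypothetical pre-release sanity check (not a replacement for Apache RAT or a manual license review), a sketch that walks a source tree and lists files whose text mentions the GNU Lesser General Public License. The marker phrase and class name are assumptions:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class LgplMarkerScan {

  // Split in two so that this .java file does not flag itself.
  private static final String MARKER = "gnu lesser " + "general public license";

  // ISO-8859-1 maps every byte to a char, so binary files never fail to decode.
  static boolean containsMarker(Path file) {
    try {
      String text = new String(Files.readAllBytes(file), StandardCharsets.ISO_8859_1);
      return text.toLowerCase(Locale.ROOT).contains(MARKER);
    } catch (IOException e) {
      return false; // unreadable files are skipped in this sketch
    }
  }

  // Lists regular files under root whose text mentions the LGPL, sorted by path.
  static List<Path> scan(Path root) {
    try (Stream<Path> paths = Files.walk(root)) {
      return paths.filter(Files::isRegularFile)
          .filter(LgplMarkerScan::containsMarker)
          .sorted()
          .collect(Collectors.toList());
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    Path root = Paths.get(args.length > 0 ? args[0] : ".");
    scan(root).forEach(System.out::println);
  }
}
```

Text matching only finds files that state their license; a file copied without its header would slip through, which is why a human review of third-party files is still needed.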

Best,
Stamatis

[1] https://www.apache.org/legal/resolved.html#category-x
[2] https://www.apache.org/legal/resolved.html#prohibited

On Thu, Oct 28, 2021 at 2:54 AM Sebastián Mancilla 
wrote:

> The file checkstyle/checkstyle-noframes-sorted.xsl, which was added by
>
> https://github.com/apache/hive/commit/84a96667844810e9925598886201de444d9b60d4
> (https://issues.apache.org/jira/browse/HIVE-990)
>
> was taken from the Checkstyle repository, which is licensed under LGPL.
> It's currently in the "checkstyle/contribution" repository (still under
> LGPL):
>
> https://github.com/checkstyle/contribution/blob/master/xsl/checkstyle-noframes-sorted.xsl
>
> but it was added to the main repository by:
>
> https://github.com/checkstyle/checkstyle/commit/28086be0e207153372f4e499ac0f68afc28776f9
>
> which is a modification of the original file added by:
>
> https://github.com/checkstyle/checkstyle/commit/946f15a105c800ad1ae4d6d1fcf9453255ef4b4d
>
> Can a project using the Apache 2.0 license have a file that is under the
> LGPL? My understanding is that it is not allowed, only linking is possible.
>
> I ask because I took that file from the Gradle repo, which in turn took it
> from you, and I need to figure out whether it is legal to have it as part
> of my Apache 2.0 licensed project.
>
>
> --
> Sebastián Mancilla
>


Category-X JDBC drivers in Hive modules

2021-11-10 Thread Stamatis Zampetakis
Hi all,

Currently, we have some Category-X [1] JDBC drivers (MariaDB, MySQL,
Oracle) in some parts of the project. Sometimes they are included in the
dependency section with test scope, and other times by relying on
download-maven-plugin [2].

Using test scope is kind of OK, but it comes with the risk that we may write
code which needs JDBC driver classes in order to compile, and this could be
seen as a violation of the AL2 when the Hive source code is released. From
my understanding, the use of download-maven-plugin, first introduced in
HIVE-23284 [3], was an attempt to remedy this problem. Now the problem is
back, since we started using test scope again.

We have a few other drivers, namely Postgres and MSSQL, in test scope, but
they are less important since they have BSD-2 and MIT licenses, which are
not problematic.

I would expect that in the context of Hive *all* the JDBC drivers should be
declared using runtime scope. This would remove the need to
use the download-maven-plugin and would simplify the inclusion of drivers
in the build. We do not risk creating derivatives of GPL work, since
the dependency is not present at compile time, so we cannot really use the
respective classes in our code.
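To illustrate why runtime scope is enough: JDBC 4 drivers register themselves through the java.sql.Driver service-loader mechanism, so code can reach a driver with only the java.sql interfaces and a JDBC URL, without referencing any driver class at compile time. A minimal sketch (class name and URL are illustrative):

```java
import java.sql.DriverManager;
import java.sql.SQLException;

public class RuntimeDriverLookup {

  /**
   * Returns true if some JDBC driver on the runtime classpath accepts the URL.
   * No driver class is referenced here, so this compiles without any driver jar.
   */
  static boolean driverAvailable(String jdbcUrl) {
    try {
      DriverManager.getDriver(jdbcUrl); // throws if no registered driver accepts the URL
      return true;
    } catch (SQLException e) {
      return false; // "No suitable driver": nothing on the classpath accepts this URL
    }
  }

  public static void main(String[] args) {
    String url = args.length > 0 ? args[0] : "jdbc:mariadb://localhost:3306/metastore";
    System.out.println(url + " -> driver available: " + driverAvailable(url));
  }
}
```

Swapping MariaDB for another driver then only changes the runtime classpath and the URL, not the code.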

Moreover, driver dependencies could be marked optional, which is actually
true, and that would solve any potential licensing issues [4].

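I would like to propose using the following declaration for all JDBC
drivers, no matter the license.

  <dependency>
    <groupId>org.mariadb.jdbc</groupId>
    <artifactId>mariadb-java-client</artifactId>
    <version>${mariadb.version}</version>
    <scope>runtime</scope>
    <optional>true</optional>
  </dependency>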

This will make things more uniform, solve any potential licensing issues,
and ensure that when someone copy-pastes dependencies in the future to
include new drivers, there will be no violation of AL2.
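A build-time check could keep copy-pasted declarations honest. A rough sketch that parses a POM and reports JDBC driver dependencies that are not both runtime-scoped and optional; the set of driver groupIds is an assumption to adapt, the class is hypothetical, and the DOM lookups are deliberately simple (e.g., they take the first nested groupId, which is the dependency's own in conventionally ordered POMs):

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class JdbcDriverScopeCheck {

  // Assumed list of JDBC driver groupIds to police; extend as needed.
  private static final Set<String> DRIVER_GROUP_IDS = new HashSet<>(Arrays.asList(
      "org.mariadb.jdbc", "mysql", "com.oracle.database.jdbc",
      "org.postgresql", "com.microsoft.sqlserver"));

  // Text of the first descendant element with the given tag, or "" if absent.
  static String text(Element parent, String tag) {
    NodeList nodes = parent.getElementsByTagName(tag);
    return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
  }

  /** Returns artifactIds of driver dependencies that are not runtime + optional. */
  static List<String> violations(String pomXml) {
    Document doc;
    try {
      doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new ByteArrayInputStream(pomXml.getBytes(StandardCharsets.UTF_8)));
    } catch (Exception e) {
      throw new IllegalArgumentException("Cannot parse POM", e);
    }
    NodeList deps = doc.getElementsByTagName("dependency");
    List<String> bad = new ArrayList<>();
    for (int i = 0; i < deps.getLength(); i++) {
      Element dep = (Element) deps.item(i);
      if (!DRIVER_GROUP_IDS.contains(text(dep, "groupId"))) {
        continue;
      }
      boolean runtime = "runtime".equals(text(dep, "scope"));
      boolean optional = "true".equals(text(dep, "optional"));
      if (!runtime || !optional) {
        bad.add(text(dep, "artifactId"));
      }
    }
    return bad;
  }
}
```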

What do you think?

Best,
Stamatis

[1] https://www.apache.org/legal/resolved.html#category-x
[2]
https://search.maven.org/artifact/com.googlecode.maven-download-plugin/download-maven-plugin/1.6.1/jar
[3] https://issues.apache.org/jira/browse/HIVE-23284
[4] https://www.apache.org/legal/resolved.html#optional


Re: Category-X JDBC drivers in Hive modules

2021-11-15 Thread Stamatis Zampetakis
Thanks for the feedback Zoltan!

Unlike Gradle, in Maven we cannot have a dependency declared at the same
time as test & runtime (unfortunately).
Nevertheless, I believe the combination runtime + optional is a better
option for the declaration of JDBC drivers.

I created HIVE-25701 [1] and opened a PR [2] with the proposed changes.
Let's continue the discussion there.

Best,
Stamatis

[1] https://issues.apache.org/jira/browse/HIVE-25701
[2] https://github.com/apache/hive/pull/2790

On Fri, Nov 12, 2021 at 9:47 AM Zoltan Haindrich  wrote:

> Hey Stamatis!
>
> Makes sense to me; I think we already have all of the JDBC drivers in the
> test scope - but adding runtime is a great idea!
>
> I have some memories of a letter saying that we are using Cat-X stuff in
> Hive and we should remove it - I think HIVE-23284 was opened in response to
> that.
> However... if that comes back after these changes, we may ask them to
> update the scanner because we only use it in test runtime.
>
> cheers,
> Zoltan
>


Re: hive-exec vs. hive-exec:core

2021-11-18 Thread Stamatis Zampetakis
ng
> a
> >>>>> different version of the guava library?
> >>>>>
> >>>>> The changes which will remove the core artifact stuff are ready:
> >>>>> https://github.com/apache/hive/pull/2648
> >>>>>
> >>>>> cheers,
> >>>>> Zoltan
> >>>>>
> >>>>> On 9/21/21 8:23 PM, Edward Capriolo wrote:
> >>>>>> recommendation from the Hive team is to use the hive-exec.jar artifact.
> >>>>>>
> >>>>>> You know, about 10 years ago I mentioned that Oozie should just use
> >>>>>> hive-service or Hive JDBC. After a big fight where folks kept
> >>>>>> bringing up concurrency bugs in hive-server-1, my PRs were rejected
> >>>>>> (even though HiveServer2 would not have these bugs). I still cannot
> >>>>>> fathom why someone using Oozie would want a fat jar of Hive (as
> >>>>>> opposed to HiveServer or Hive JDBC). If I had to do that, I would
> >>>>>> just use a shell action. You all must enjoy shading jars.
> >>>>>>
> >>>>>> Edward
> >>>>>>
> >>>>>> On Thu, Sep 16, 2021 at 2:30 PM Chao Sun wrote:
> >>>>>>
> >>>>>>> I'm not sure whether it is a good idea to remove `hive-exec-core`
> >>>>>>> completely - it is still being used today by some other popular
> >>>>>>> projects including Spark and Trino/Presto. Sticking to
> >>>>>>> `hive-exec-core` gives the other projects more flexibility to shade
> >>>>>>> & relocate those classes according to their needs, without waiting
> >>>>>>> for new Hive releases. Hive also needs to make sure it relocates
> >>>>>>> everything properly. Otherwise, if some classes are shaded &
> >>>>>>> included in `hive-exec` but not relocated, there is no way for the
> >>>>>>> other projects to exclude them and avoid potential conflicts.
> >>>>>>> Chao
> >>>>>>>
> >>>>>>> On Thu, Sep 16, 2021 at 8:03 AM Zoltan Haindrich wrote:
> >>>>>>>
> >>>>>>>> Hey
> >>>>>>>>
> >>>>>>>> On 9/6/21 12:48 PM, Stamatis Zampetakis wrote:
> >>>>>>>>> Indeed this may lead to binary incompatibility problems such as
> >>>>>>>>> the one you mentioned. If I understood correctly, the problem you
> >>>>>>>>> cite comes up if library B in this case is not relocated. If Hive
> >>>>>>>>> systematically relocates shaded deps, do you think there will
> >>>>>>>>> still be binary incompatibility issues?
> >>>>>>>>> If the relocating solution works, I would personally prefer going
> >>>>>>>>> down this path instead of introducing an entirely new module just
> >>>>>>>>> for the sake of dependency management. Most of the time when there
> >>>>>>>>> are problems with shading, the answer comes from relocating the
> >>>>>>>>> problematic dependencies, and people are more or less accustomed
> >>>>>>>>> to this route.
> >>>>>>>> I totally agree with you Stamatis - with the addition that we
> >>>>>>>> should work together with the owners of other projects to help
> >>>>>>>> them use the correct artifact to gain access to Hive's internal
> >>>>>>>> parts.
> >>>>>>>> I've opened HIVE-25531 to remove the core classified artifact - and
> >>>>>>>> ensure that we will be uncovering and fixing future issues with the
> >>>>>>>> hive-exec artifac
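Following up on the relocation point above (classes shaded into hive-exec but not relocated cannot be excluded by downstream projects), a hypothetical check that lists class files in a shaded jar that fall outside a set of allowed package prefixes. The prefixes used in the usage line are placeholders; the actual relocation prefixes of the Hive build would need to be filled in:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

public class UnrelocatedClassCheck {

  // Returns .class entries outside every allowed package prefix (jar path syntax,
  // e.g. "org/apache/hadoop/hive/"). META-INF content and module-info are ignored.
  static List<String> unrelocated(List<String> entryNames, List<String> allowedPrefixes) {
    List<String> offenders = new ArrayList<>();
    for (String name : entryNames) {
      if (!name.endsWith(".class") || name.startsWith("META-INF/")
          || name.equals("module-info.class")) {
        continue;
      }
      boolean allowed = false;
      for (String prefix : allowedPrefixes) {
        if (name.startsWith(prefix)) {
          allowed = true;
          break;
        }
      }
      if (!allowed) {
        offenders.add(name);
      }
    }
    return offenders;
  }

  // Reads all entry names from a jar on disk.
  static List<String> entryNames(String jarPath) throws IOException {
    List<String> names = new ArrayList<>();
    try (JarFile jar = new JarFile(jarPath)) {
      for (JarEntry entry : Collections.list(jar.entries())) {
        names.add(entry.getName());
      }
    }
    return names;
  }

  // Usage: java UnrelocatedClassCheck path/to/shaded.jar org/apache/hadoop/hive/ ...
  public static void main(String[] args) throws IOException {
    List<String> allowed = Arrays.asList(args).subList(1, args.length);
    unrelocated(entryNames(args[0]), allowed).forEach(System.out::println);
  }
}
```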

Re: [ANNOUNCE] New committer: Zhihua Deng

2022-01-22 Thread Stamatis Zampetakis
Congrats Zhihua! Excellent list of contributions in many areas, well
deserved.

Best,
Stamatis

On Thu, Jan 20, 2022 at 1:38 PM Peter Vary 
wrote:

> Congratulations!
>
> > On 2022. Jan 20., at 8:58, Zoltan Haindrich  wrote:
> >
> > Hey all,
> >
> > Apache Hive's Project Management Committee (PMC) has invited Zhihua Deng
> > to become a committer, and we are pleased to announce that he has
> > accepted!
> >
> > Zhihua, welcome! Thank you for your contributions, and we look forward
> > to your further interactions with the community!
> >
> > Zoltan Haindrich (on behalf of the Apache Hive PMC)
>
>


Re: Time to Remove Hive-on-Spark

2022-01-28 Thread Stamatis Zampetakis
Hi team,

Almost one year has passed since the last exchange in this discussion, and
if I am not wrong, there has been no effort to revive Hive-on-Spark. To be
more precise, I don't think I have seen any Spark-related JIRA for quite
some time now, and although I don't want to rush to conclusions, there
does not seem to be any community member involved in maintaining or adding
new features to this part of the code.

Keeping dead code in the repository does not do any good to the project and
puts a non-negligible burden on future maintainers.

Clearly, we cannot make a new Hive release where a major feature is
completely untested, so either someone commits to re-enabling/fixing the
respective tests soon, or we move forward with the work started by David
and drop support for Hive-on-Spark.

I would like to ask the community if there is anyone who can take up this
maintenance task and enable/fix Spark related tests in the next month or so?

Best,
Stamatis

On Sat, Feb 27, 2021 at 4:17 AM Edward Capriolo 
wrote:

> I do not know how it works for most of the world. But in Cloudera, where the
> Tez options were never popular, Hive-on-Spark represents a solid way to get
> things done for small datasets with lower latency.
>
> As for Spark adoption: a while ago I came up with some ways to make Hive
> more Spark-like. One of them was a way to make "compile" a Hive keyword
> so folks could build UDFs on the fly. It was such an uphill climb. Folks
> found a way to make it disabled by default for security. Then later, when
> things moved from CLI to Beeline, it was the ONLY thing that I found not
> ported. It was extremely frustrating.
>
> On Mon, Jul 27, 2020 at 3:19 PM David  wrote:
>
> > Hello Xuefu,
> >
> > I am not part of the Cloudera Hive product team, though I volunteer to
> > work on small projects from time to time. Perhaps someone from that team
> > can chime in with some of their thoughts, but personally, I think that in
> > the long run, there will be more of a merge between Hive-on-Spark and
> > other Spark-native offerings. I'm not sure what the differentiation will
> > be going forward. With that said, are there any developers on this
> > mailing list who are willing to take on the maintenance effort of keeping
> > HoS moving forward?
> >
> > http://www.russellspitzer.com/2017/05/19/Spark-Sql-Thriftserver/
> >
> > https://docs.cloudera.com/HDPDocuments/HDP2/HDP-2.6.4/bk_spark-component-guide/content/config-sts.html
> >
> > Thanks.
> >
> > On Thu, Jul 23, 2020 at 12:35 PM Xuefu Zhang  wrote:
> >
> > > Previous reasoning seemed to suggest a lack of user adoption. Now we
> > > are concerned about ongoing maintenance effort. Both are valid
> > > considerations. However, I think we should have ways to find out the
> > > answers. Therefore, I suggest the following be carried out:
> > >
> > > 1. Send out the proposal (removing Hive on Spark) to users including
> > > u...@hive.apache.org and get their feedback.
> > > 2. Ask if any developers on this mailing list are willing to take on
> > > the maintenance effort.
> > >
> > > I'm concerned about user impact because I can still see issues being
> > > reported on HoS from time to time. I'm more concerned about the future
> > > of Hive if we narrow Hive neutrality on execution engines, which will
> > > possibly force more Hive users to migrate to other alternatives such
> > > as Spark SQL, which is already eroding Hive's user base.
> > >
> > > Being open and neutral used to be one of Hive's most admired strengths.
> > >
> > > Thanks,
> > > Xuefu
> > >
> > >
> > > On Wed, Jul 22, 2020 at 8:46 AM Alan Gates wrote:
> > >
> > > > An important point here is that I don't believe David is proposing
> > > > to remove Hive on Spark from the 2 or 3 lines, but only from trunk.
> > > > Continuing to support it in the existing 2 and 3 lines makes sense,
> > > > but since no one has maintained it on trunk for some time and it
> > > > does not work with many of the newer features, it should be removed
> > > > from trunk.
> > > >
> > > > Alan.
> > > >
> > > > On Tue, Jul 21, 2020 at 4:10 PM Chao Sun  wrote:
> > > >
> > > > > Thanks David. FWIW Uber is still running Hive on Spark (2.3.4) on a
> > > > > very large scale in production right now, and I don't think we have
> > > > > any plan to change it soon.
> > > > >
> > > > >
> > > > >
> > > > > On Tue, Jul 21, 2020 at 11:28 AM David  wrote:
> > > > >
> > > > > > Hello,
> > > > > >
> > > > > > Thanks for the feedback.
> > > > > >
> > > > > > Just a quick recap: I did propose this @dev and I received
> > > > > > unanimous +1's from the community. After a couple of months, I
> > > > > > created the PR.
> > > > > >
> > > > > > Certainly open to discussion, but there hasn't been any
> > > > > > discussion thus far because there have been no objections until
> > > > > > this point.
> > > > > >
> > > > > > HoS has low adoption, heavy technical debt, and the manner in
> 

[DISCUSS] Properties for scheduling compactions on specific queues

2022-01-31 Thread Stamatis Zampetakis
Hi all,

This email is an attempt to converge on which Hive/Tez/MR properties
someone should use in order to schedule a compaction on specific queues.
For those who are not familiar with how queues are used, the YARN capacity
scheduler documentation [1] gives the general idea.

Using specific queues for compaction jobs is necessary to be able to
efficiently allocate resources for maintenance tasks (compaction) and
production workloads. Hive provides various ways to control the queues used
by the compactor and there have been various tickets with improvements and
fixes in this area (see list below).

The granularity at which we can select queues for compactions (all tables
vs. per table) currently depends on which compactor is in use (MR vs.
query-based) and boils down to the following properties:

Global configuration:
* hive.compactor.job.queue
* mapred.job.queue.name
* tez.queue.name

Per table/statement configuration (table properties):
* compactor.mapred.job.queue.name (before HIVE-20723)
* compactor.hive.compactor.job.queue (after HIVE-20723)

It is a bit unclear which properties someone should use to achieve the
desired result. Some changes, such as HIVE-20723, raise backward
compatibility concerns, and other changes seem to have a larger impact
than the one they were specifically designed for. For example, after
HIVE-25595, MapReduce queue properties can have an impact on the compactor
queues even when Tez is in use.

In order to avoid confusion and ensure long term support of these queue
selection features we should clarify which of the above properties should
be used.

Given the current situation, I would propose to officially support only the
following:
* hive.compactor.job.queue
* compactor.hive.compactor.job.queue
and align the implementation based on these (if necessary). In other words,
Hive users should not use mapred.job.queue.name and tez.queue.name
explicitly, at least when it comes to the compactor. Hive should set them
transparently (as it happens now in various places) based on
[compactor.]hive.compactor.job.queue.
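The proposed precedence could look roughly like this. A sketch only: it assumes the per-table property overrides the global one and that an unset value means "leave the engine default"; the class is hypothetical and not Hive's actual compactor code:

```java
import java.util.HashMap;
import java.util.Map;

public class CompactorQueueResolver {

  static final String GLOBAL_KEY = "hive.compactor.job.queue";
  static final String TABLE_KEY = "compactor.hive.compactor.job.queue";

  /**
   * Picks the queue for a compaction: the per-table/statement property wins over
   * the global setting; returns null when neither is set (engine default applies).
   */
  static String resolveQueue(Map<String, String> tableProps, Map<String, String> hiveConf) {
    String perTable = tableProps.get(TABLE_KEY);
    if (perTable != null && !perTable.isEmpty()) {
      return perTable;
    }
    String global = hiveConf.get(GLOBAL_KEY);
    return (global != null && !global.isEmpty()) ? global : null;
  }

  /** Engine-specific properties Hive would set on the user's behalf. */
  static Map<String, String> engineQueueProperties(String queue) {
    Map<String, String> props = new HashMap<>();
    if (queue != null) {
      props.put("mapred.job.queue.name", queue); // MR-based compactor / MR engine
      props.put("tez.queue.name", queue);        // query-based compactor on Tez
    }
    return props;
  }
}
```

Keeping the engine properties derived rather than user-set means a future engine only needs one more line here, not another documented property.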

What do people think? Are there other ideas?

Best,
Stamatis

[1] https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html

HIVE-11997: Add ability to send Compaction Jobs to specific queue
HIVE-13354: Add ability to specify Compaction options per table and per request
HIVE-20723: Allow per table specification of compaction yarn queue
HIVE-24781: Allow to use custom queue for query based compaction
HIVE-25801: Custom queue settings is not honoured by Query based compaction StatsUpdater
HIVE-25595: Custom queue settings is not honoured by compaction StatsUpdater


[DISCUSS] Compactor (Query vs MR) roadmap

2022-01-31 Thread Stamatis Zampetakis
Hi all,

In the current master, there are two approaches for performing compactions
of ACID tables [1]:
* using hard-coded MapReduce jobs (aka. CompactorMR [2]);
* using HiveQL queries (aka. QueryCompactor [3]) and delegating the
execution to the underlying engine (MR, Tez, other);

The motivation for introducing the query compactor was to make compaction
tasks engine-independent, and potentially more efficient. In principle, the
query-based compaction should be able to completely replace the respective
MR jobs, but it appears that it is not there yet.

At the moment of writing this email the two compactor modes are
complementary to each other. Compactions on insert-only tables (aka.
micromanaged tables) can only be done using the query compactor.
Moreover, query-based compactions on ACID tables work only when the
underlying engine is Tez (various bugs [4] seem to be blocking the use of
MR as an execution engine). The latter means that if someone is using MR as
the execution engine, they cannot use the query-based compactor. Certain
features (e.g., per-table selection of compaction queues [5]) exist for one
mode (and apparently are important for end users) but are not yet
implemented for the other.
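The constraints described above can be summarized as a small decision function. This is only a sketch of the situation as described in this email, not how Hive actually picks a compactor: the enums and the queryBasedEnabled flag are hypothetical stand-ins, and any engine restrictions for insert-only tables are not modeled.

```java
/** Hypothetical sketch of which compactor applies, as described in this thread; not Hive's code. */
class CompactorChoice {
    enum TableKind { FULL_ACID, INSERT_ONLY }
    enum Engine { MR, TEZ }
    enum Compactor { MR_COMPACTOR, QUERY_COMPACTOR }

    /**
     * @param queryBasedEnabled hypothetical switch standing in for whatever setting
     *                          turns on query-based compaction for full ACID tables
     */
    static Compactor choose(TableKind kind, Engine engine, boolean queryBasedEnabled) {
        if (kind == TableKind.INSERT_ONLY) {
            // Insert-only (MM) tables can only be compacted by the query compactor.
            return Compactor.QUERY_COMPACTOR;
        }
        // Full ACID tables: query-based compaction currently works only on Tez
        // (see HIVE-24015), so every other case falls back to CompactorMR.
        if (queryBasedEnabled && engine == Engine.TEZ) {
            return Compactor.QUERY_COMPACTOR;
        }
        return Compactor.MR_COMPACTOR;
    }

    public static void main(String[] args) {
        System.out.println(choose(TableKind.INSERT_ONLY, Engine.TEZ, false)); // QUERY_COMPACTOR
        System.out.println(choose(TableKind.FULL_ACID, Engine.TEZ, true));    // QUERY_COMPACTOR
        System.out.println(choose(TableKind.FULL_ACID, Engine.MR, true));     // MR_COMPACTOR
    }
}
```

The last case is the gap mentioned above: a user on the MR engine with full ACID tables cannot benefit from the query-based compactor.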

Currently the query-based compactor is not part of any Apache Hive release,
but it would be nice if someone could shed some light on the roadmap around
this feature. I tried to summarize very briefly the state of this work
based on my understanding but I am sure people who have worked on these
areas of the code can provide much better insights. Some quick questions
that come to mind are the following:
Is there going to be support for MR based compactor in the next releases of
Hive?
Is the query based compactor gonna work with an engine other than Tez? Is
someone working on this?
Are there benefits in using the MR based compactor when the query based
compactor is available?
Are there major features that are not yet part of the query based compactor
(and they need to be)?

Finally, I don't see any documentation about the "new" query-based
compaction mode in the wiki [6]. I think it would be good if someone could
update the respective part of the documentation before releasing the next
Hive version.

Best,
Stamatis

[1] HIVE-5317: Implement insert, update, and delete in Hive with full ACID
support
[2] HIVE-6319: Add compactor for ACID tables (Apr, 2014)
[3] HIVE-20699: Query based compactor for full CRUD Acid tables (Feb, 2019)
[4] HIVE-24015: Disable query-based compaction on MR execution engine
(Karen Coppage, reviewed by Laszlo Pinter)
[5] HIVE-20723: Allow per table specification of compaction yarn queue
[6] https://cwiki.apache.org/confluence/display/hive/hive+transactions#HiveTransactions-Compactor


Re: Start releasing the master branch

2022-02-02 Thread Stamatis Zampetakis
Hello,

Thanks for starting the discussion Zoltan.

I strongly believe that it is important to have regular and frequent releases;
otherwise people will create and maintain separate Hive forks.
The latter is not good for the project and the community may lose valuable
members because of it.

Going forward, I fully agree that there is no point in bringing up strong
blockers for the next release. For sure there are many backward-incompatible
changes and possibly unstable features, but unless we get a
release out it will be difficult to determine what is broken and what needs
to be fixed.

Due to the large number of changes that are going to appear in the next
version, I would suggest using the terms Hive X-alpha, Hive X-beta for the
first few releases. This will make it clear to end users that they need
to be careful when upgrading from an older version, and it will give us a
bit more time and freedom to address issues that users will likely
discover.
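One property that makes such suffixes useful is that they sort before the final release of the same number, so X-alpha and X-beta read as "not yet X" while still being newer than 3.1.2. The sketch below illustrates that ordering with simplified, SemVer-inspired rules chosen for illustration; it does not reproduce Maven's or SemVer's exact comparison rules, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Simplified, SemVer-inspired ordering of release versions with optional pre-release suffixes. */
class ReleaseVersion implements Comparable<ReleaseVersion> {
    private final int[] numbers;     // e.g. {4, 0, 0}
    private final String qualifier;  // e.g. "alpha-1"; empty for a final release

    ReleaseVersion(String text) {
        int dash = text.indexOf('-');
        String core = dash < 0 ? text : text.substring(0, dash);
        qualifier = dash < 0 ? "" : text.substring(dash + 1);
        numbers = Arrays.stream(core.split("\\.")).mapToInt(Integer::parseInt).toArray();
    }

    @Override
    public int compareTo(ReleaseVersion other) {
        int cmp = Arrays.compare(numbers, other.numbers);
        if (cmp != 0) {
            return cmp;
        }
        if (qualifier.isEmpty() || other.qualifier.isEmpty()) {
            // A final release sorts after every pre-release with the same numbers.
            return Boolean.compare(qualifier.isEmpty(), other.qualifier.isEmpty());
        }
        // Plain string comparison: alpha < beta and alpha-1 < alpha-2
        // (but alpha-10 would sort before alpha-2).
        return qualifier.compareTo(other.qualifier);
    }

    public static void main(String[] args) {
        List<String> versions = new ArrayList<>(List.of(
                "4.0.0", "4.0.0-beta-1", "3.1.2", "4.0.0-alpha-2", "4.0.0-alpha-1"));
        versions.sort(Comparator.comparing((String s) -> new ReleaseVersion(s)));
        System.out.println(versions);
        // prints [3.1.2, 4.0.0-alpha-1, 4.0.0-alpha-2, 4.0.0-beta-1, 4.0.0]
    }
}
```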

The only real blocker that we may want to address is HIVE-25665 [1], but we
can continue the discussion under that ticket and re-evaluate if necessary.

Best,
Stamatis

[1] https://issues.apache.org/jira/browse/HIVE-25665


On Tue, Feb 1, 2022 at 5:03 PM Zoltan Haindrich  wrote:

> Hey All,
>
> We haven't made a release for a long time now (3.1.2 was released on 26
> August 2019) - and I think because we didn't make that many branch-3
> releases, not too many fixes were ported there - which made that release
> branch kinda erode away.
>
> We have a lot of new features/changes in the current master.
> I think instead of aiming for big feature-packed releases we should aim
> for making a regular release every few months - we should make regular
> releases which people could
> install and use.
> After all, releasing Hive after more than 2 years would be a big step
> forward in itself - we have so many improvements that I can't even count...
>
> But I may not know every aspect of the project / the state of some internal
> features - so I would like to ask you:
> What would be the bare minimum requirements before we could release the
> current master as Hive X?
>
> There are many nice-to-have-s like:
> * hadoop upgrade
> * jdk11
> * remove HoS or MR
> * ?
> but I don't think these are blockers...we can make any of these in the
> next release if we start making them...
>
> cheers,
> Zoltan
>


Re: [DISCUSS] Compactor (Query vs MR) roadmap

2022-02-03 Thread Stamatis Zampetakis
Hi Karen,

Many thanks for joining the discussion.

The fact that there are two components with quite a bit of overlap in their
behavior is not something that can be easily maintained in the long term.
Additionally, I have the impression that some commercial offerings of Hive
are using the QB compactor by default. This, along with all the other things
you mentioned (MR deprecation, missing support for MM tables, etc.), suggests
to me that the MR compactor is reaching EOL.

I haven't worked enough on this part of the code to have a strong opinion
on the way to move forward, but if we are moving towards deprecating the MR
compactor, which I find reasonable, it would be good to make this explicit
both for end users and Hive developers. I leave this decision to you and
the other people in the community who have contributed many fixes &
improvements to the compactor.

Best,
Stamatis


On Wed, Feb 2, 2022 at 11:58 AM Karen Coppage 
wrote:

> Hi Stamatis,
>
> Thanks for your questions. You bring up good points.
>
> A bit about the state of the two compaction implementations:
> MR compaction (uses class CompactorMR) is older and more stable. I have
> only seen a couple bugs in the past few years.
> QB (query-based) compaction is required when YARN is unavailable. And, as
> you mentioned, insert-only/MM tables use QB compaction (MM compaction has
> its own semi-separate implementation from QB compaction of full ACID
> tables). If we ever extend ACID to support a file format outside of ORC, I
> can see QB compaction as the easier way forward. And lastly, QB compaction
> has never been officially released, since it belongs in version 4.0.0… so
> if we really want to get rid of MR compaction, it would probably be best to
> deprecate it first and leave it available as a backup option for a while.
> One big question remains unanswered, which is: which implementation is
> more efficient? If MR compaction is, then we should keep it and it should
> be used when possible. Otherwise it can be deprecated. I don’t think
> anybody’s working on the testing that would be necessary to answer this
> question, mostly because there are many small fires to put out around other
> parts of compaction.
> Another thing – since QB compaction runs insert queries, it involves a few
> extra move steps, which is slow with object storage. The “direct insert”
> feature will mitigate slowness but (a) it still has quite a few rough edges
> and (b) I don’t think it’s enabled for compaction queries at all.
>
> Questions I didn’t answer above:
>
> > Is the query based compactor gonna work with an engine other than Tez?
> Is someone working on this?
>
> The MR execution engine has been deprecated since Hive 2 (2015).
> Worst-case scenario, users can just run MR compaction. But I hope that this
> is not the case!
>
> > Are there major features that are not yet part of the query based
> compactor (and they need to be)?
>
> I’m pretty sure QB compaction does not yet honor the “WITH OVERWRITE
> TBLPROPERTIES” clause in an ALTER TABLE… COMPACT… statement. This could be
> something to add in the future.
>
> I agree that documenting QB compaction is a must, thanks for pointing this
> out!
>
> Cheers,
> Karen
>
>
> > On 2022. Jan 31., at 23:02, Stamatis Zampetakis 
> wrote:

Re: [ANNOUNCE] New committer: Ayush Saxena

2022-02-07 Thread Stamatis Zampetakis
Welcome Ayush!

Excellent contributions in many different components of Hive, very well
deserved!

Best,
Stamatis

On Mon, Feb 7, 2022 at 6:32 PM aasha medhi 
wrote:

> Congratulations Ayush. Well deserved !!
>
> On Mon, Feb 7, 2022 at 10:36 PM Laszlo Pinter
> wrote:
>
> > Congrats Ayush!
> >
> > On Mon, Feb 7, 2022, 5:54 PM Pravin Sinha 
> wrote:
> >
> > > Congrats, Ayush !!
> > > Well deserved. Keep up the good work.
> > >
> > > ~Pravin
> > >
> > > On Mon, Feb 7, 2022 at 10:11 PM Battula, Brahma Reddy
> > >  wrote:
> > >
> > > > Congratulations Ayush Saxena!! Well Deserved!.
> > > >
> > > > From: László Bodor 
> > > > Date: Monday, 7 February 2022 at 9:20 PM
> > > > To: dev@hive.apache.org 
> > > > Subject: Re: [ANNOUNCE] New committer: Ayush Saxena
> > > > Welcome Ayush, well deserved!
> > > >
> > > > On Mon, Feb 7, 2022 at 16:35, Ashutosh Chauhan wrote:
> > > >
> > > > > Hi all,
> > > > > Apache Hive's Project Management Committee (PMC) has invited Ayush
> > > > > to become a committer, and we are pleased to announce that he has
> > > > > accepted!
> > > > >
> > > > > Ayush welcome, thank you for your contributions, and we look forward to
> > > > > your further interactions with the community!
> > > > > Ashutosh (on behalf of Hive PMC)
> > > > >
> > > >
> > >
> >
>


Re: [ANNOUNCE] Denys Kuzmenko joins Hive PMC

2022-02-07 Thread Stamatis Zampetakis
It's great to see another ACID guru joining the PMC :)

Congrats Denys!

Best,
Stamatis

On Mon, Feb 7, 2022 at 6:56 PM Alessandro Solimando <
alessandro.solima...@gmail.com> wrote:

> Congratulations Denys! :)
>
> On Mon, 7 Feb 2022 at 18:37, Pravin Sinha  wrote:
>
> > Congrats, Denys !
> >
> > On Mon, Feb 7, 2022 at 11:02 PM aasha medhi 
> > wrote:
> >
> > > Congratulations Denys !
> > >
> > > On Mon, Feb 7, 2022 at 10:36 PM Laszlo Pinter
> > > wrote:
> > >
> > > > Congrats Denys!
> > > >
> > > > On Mon, Feb 7, 2022, 6:00 PM László Bodor
> > > > wrote:
> > > >
> > > > > Congrats Denys!!
> > > > >
> > > > > On Mon, Feb 7, 2022 at 17:43, Naresh P R wrote:
> > > > >
> > > > > > Congrats Denys, well deserved !!!
> > > > > > ---
> > > > > > Regards,
> > > > > > Naresh P R
> > > > > >
> > > > > > On Mon, Feb 7, 2022 at 8:40 AM Ashutosh Chauhan <
> > > hashut...@apache.org>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > I'm pleased to announce that Denys has accepted an invitation to
> > > > > > > join the Hive PMC. Denys has been a consistent and helpful
> > > > > > > figure in the Hive community for which we are very grateful. We
> > > > > > > look forward to the continued contributions and support.
> > > > > > >
> > > > > > > Please join me in congratulating Denys!
> > > > > > >
> > > > > > > Ashutosh (On behalf of Hive PMC)
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: [DISCUSS] Properties for scheduling compactions on specific queues

2022-02-07 Thread Stamatis Zampetakis
Thanks Janos for the feedback.

If I understand correctly, your suggestion is to support all of the properties
below for table-level compactions and treat them as equivalent:
* compactor.mapred.job.queue.name
* compactor.mapreduce.job.queuename
* compactor.hive.compactor.job.queue

It is something that crossed my mind as well, but I am slightly skeptical
because this way we explicitly state that people are free to use whichever
property they like. It might also result in MR properties affecting Tez
(as happens to some extent with HIVE-25595), which from my perspective is not
that great. I am also thinking that it will lead to more requests for accepting
these MR-specific properties in the query-based compactor, which cannot (and
probably never will) use MR as the underlying engine. We should also keep
in mind that the MR engine was deprecated ~6 years ago and the MR compactor
may follow soon.

I am fine implementing this specific change (accepting all properties
above) as long as one of the people contributing to the compactor
confirms it is the desired path going forward.

Best,
Stamatis

On Mon, Feb 7, 2022 at 11:50 AM Janos Kovacs  wrote:

> Hi Stamatis,
>
> I agree that [compactor.]hive.compactor.job.queue is a better solution, as
> Hive now also supports query-based compaction, not only MR.
> ...although I think this needs to be backward compatible!
>
> What do you think about a logic similar to this:
>
> --- a/ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java  2022-02-07 10:31:28.0 +0100
> +++ b/ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java  2022-02-07 10:33:25.0 +0100
> @@ -145,10 +145,19 @@
>      overrideMRProps(job, t.getParameters()); // override MR properties from tblproperties if applicable
>      if (ci.properties != null) {
>        overrideTblProps(job, t.getParameters(), ci.properties);
>      }
>
> +    // make queue configuration backward compatible:
> +    // at this point overrideMRProps and overrideTblProps have already consolidated
> +    // the final value, so we just need to read it from job.get(TABLE_PROPS)
> +    String queueNameLegacy =
> +        new StringableMap(job.get(TABLE_PROPS)).toProperties().getProperty("compactor.mapred.job.queue.name");
> +    if (queueNameLegacy != null && queueNameLegacy.length() > 0) {
> +      HiveConf.setVar(job, ConfVars.COMPACTOR_JOB_QUEUE, queueNameLegacy);
> +    }
> +
>      String queueName = HiveConf.getVar(job, ConfVars.COMPACTOR_JOB_QUEUE);
>      if (queueName != null && queueName.length() > 0) {
>        job.setQueueName(queueName);
>      }
>
>
>
> Of course this can be wrapped with a new config if needed, like
> hive.compaction.queue.name.use.legacy or whatever...
> FYI: we might also want to check the legacy config not only for
> "compactor.mapred.job.queue.name" but also for
> "compactor.mapreduce.job.queuename", as the first one was already on the
> deprecated list as pointed out by Peter Vary.
>
> Please also note that the change introduced by HIVE-25595 is currently not
> compatible with the new config as it was developed for the old
> compactor.mapred... property:
>
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorUtil.java#L31
> This also needs to be handled - for both the new prop name and backward
> compatibility.
>
> R, Janos
>
>
> On 2022/01/31 09:50:49 Stamatis Zampetakis wrote:
> > Hi all,
> >
> > This email is an attempt to converge on which Hive/Tez/MR properties
> > someone should use in order to schedule a compaction on specific queues.
> > For those who are not familiar with how queues are used the YARN capacity
> > scheduler documentation [1] gives the general idea.
> >
> > Using specific queues for compaction jobs is necessary to be able to
> > efficiently allocate resources for maintenance tasks (compaction) and
> > production workloads. Hive provides various ways to control the queues used
> > by the compactor and there have been various tickets with improvements and
> > fixes in this area (see list below).
> >
> > The granularity we can select queues for compactions (all tables vs. per
> > table) currently depends on which compactor is in use (MR vs Query based)
> > and boils down to the following properties:
> >
> > Global configuration:
> > * hive.compactor.job.queue
> > * mapred.job.queue.name
> > * tez.queue.name
> >
> > Per table/statement configuration (table properties):
> > * compactor.mapred.job.queue.name (before HIVE-20723)
> > * co

Re: [DISCUSS] Properties for scheduling compactions on specific queues

2022-02-10 Thread Stamatis Zampetakis
Hi all,

@Janos: The patch didn't go through, but I get the idea. FYI: most Apache
lists do not allow attachments.

Since people do not have strong feelings about this, I am inclined to move
forward with Janos' suggestion and accept all three properties for
specifying the compaction queue.

I just logged HIVE-25947 about this.
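A minimal sketch of the lookup being agreed on here, combining the three equivalent table-level names with Janos' layering (statement properties over table properties over global configuration, with legacy names consulted only at the statement and table levels). The order in which the three names are tried and the legacyFallback switch (modeled on the optional legacy config floated earlier in the thread) are assumptions, the class and method names are hypothetical, and the actual HIVE-25947 change may differ.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch of layered compaction-queue resolution; not the actual Hive code. */
class CompactionQueueLookup {
    static final String GLOBAL_KEY = "hive.compactor.job.queue";
    static final List<String> NEW_ONLY = List.of("compactor.hive.compactor.job.queue");
    // Trying the new name before the two legacy names is an assumption.
    static final List<String> NEW_AND_LEGACY = List.of(
            "compactor.hive.compactor.job.queue",
            "compactor.mapreduce.job.queuename",
            "compactor.mapred.job.queue.name");

    /** Returns the first non-blank value found under the given keys, in order. */
    static Optional<String> firstNonBlank(Map<String, String> props, List<String> keys) {
        for (String key : keys) {
            String value = props.get(key);
            if (value != null && !value.isBlank()) {
                return Optional.of(value);
            }
        }
        return Optional.empty();
    }

    /**
     * Statement properties override table properties, which override the global setting.
     * Legacy names apply only at the statement and table levels, and only when the
     * (hypothetical) legacyFallback switch is on.
     */
    static Optional<String> resolve(Map<String, String> statementProps,
                                    Map<String, String> tableProps,
                                    Map<String, String> globalConf,
                                    boolean legacyFallback) {
        List<String> keys = legacyFallback ? NEW_AND_LEGACY : NEW_ONLY;
        return firstNonBlank(statementProps, keys)
                .or(() -> firstNonBlank(tableProps, keys))
                .or(() -> firstNonBlank(globalConf, List.of(GLOBAL_KEY)));
    }

    public static void main(String[] args) {
        Map<String, String> statement = Map.of();
        Map<String, String> table = Map.of("compactor.mapred.job.queue.name", "etl");
        Map<String, String> global = Map.of(GLOBAL_KEY, "maintenance");
        System.out.println(resolve(statement, table, global, true).orElse("default"));  // etl
        System.out.println(resolve(statement, table, global, false).orElse("default")); // maintenance
    }
}
```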

Best,
Stamatis

On Tue, Feb 8, 2022 at 2:45 PM Janos Kovacs  wrote:

> Hi Stamatis,
>
> The attached one is a more generic proposal: it basically moves the target
> queue resolution to CompactorUtil.getCompactorJobQueueName, as that is
> already in use for the StatsUpdater.
> There, the old properties are used depending on a new fallback config
> prop. The fallback applies only to statement (ci.props) and table (t.props)
> properties, not to the global configuration.
>
> It's just a mock-up, I didn't even check if it compiles, but shows the
> logic which should be good enough for the discussion.
>
> R, Janos
>
>
> On Mon, Feb 7, 2022 at 23:44, Stamatis Zampetakis wrote:
>

Re: Start releasing the master branch

2022-03-01 Thread Stamatis Zampetakis
>
> > >>> On Wed, Feb 9, 2022 at 4:55 AM Zoltan Haindrich  wrote:
> > >>>
> > >>>> Hey,
> > >>>>
> > >>>> Thank you guys for chiming in; versioning is for sure something we should
> > >>>> get to some common ground on.
> > >>>> It's a triple problem right now; I think we have the following things:
> > >>>> * storage-api
> > >>>> ** we have "2.7.3-SNAPSHOT" in the repo
> > >>>> *** https://github.com/apache/hive/blob/0d1cc7c5005fe47759298fb35a1c67edc93f/storage-api/pom.xml#L27
> > >>>> ** meanwhile we already have 2.8.1 released to maven central
> > >>>> *** https://mvnrepository.com/artifact/org.apache.hive/hive-storage-api
> > >>>> * standalone-metastore
> > >>>> ** 4.0.0-SNAPSHOT in the repo
> > >>>> ** last release is 3.1.2
> > >>>> * hive
> > >>>> ** 4.0.0-SNAPSHOT in the repo
> > >>>> ** last release is 3.1.2
> > >>>>
> > >>>> Regarding the actual version number I'm not entirely sure where we should
> > >>>> start the numbering - that's why I was referring to it as Hive-X in my
> > >>>> first letter.
> > >>>>
> > >>>> I think the key point here would be to start shipping releases regularly
> > >>>> and not the actual version number we will use - I'm kinda open to any
> > >>>> versioning scheme which reflects that this is a newer release than 3.1.2.
> > >>>>
> > >>>> I could imagine the following ones:
> > >>>> (A) start with something less expected; but keep 3 in the prefix to
> > >>>> reflect that this is not yet 4.0
> > >>>>  I can imagine the following numbers:
> > >>>>  3.900.0, 3.901.0, ...
> > >>>>  3.9.0, 3.9.1, ...
> > >>>> (B) start 4.0.0
> > >>>>  4.0.0, 4.1.0, ...
> > >>>> (C) jump to some calendar based version number like 2022.2.9
> > >>>>  trunk based development has pros and cons...making a move like this
> > >>>>  irreversibly pledges trunk based development; and makes release
> > >>>>  branches hard to introduce
> > >>>> (X) somewhat orthogonal is to (also) use some suffixes
> > >>>>  4.0.0-alpha1, 4.0.0-alpha2, 4.0.0-beta1
> > >>>>  this is probably the most tempting to use - but this versioning
> > >>>>  schema with a non-changing MINOR and PATCH number will
> > >>>>  also suggest that the actual software is fully compatible - and only
> > >>>>  bugs are being fixed - which will not be true...
> > >>>>
> > >>>> I really like the idea to suffix these releases with alpha or beta -
> > >>>> which will communicate our level of commitment that these are not 100%
> > >>>> production ready artifacts.
> > >>>>
> > >>>> I think we could fix HIVE-25665; and probably experiment with
> > >>>> 4.0.0-alpha1 for start...
> > >>>>
> > >>>>> This also means there should *not* be a branch-4 after releasing Hive
> > >>>>> 4.0 and let that diverge (and becomes the next, super-ignored branch-3),
> > >>>> correct; no need to keep a branch we don't maintain...but in any case I
> > >>>> think we can postpone this decision until there will be something to
> > >>>> release... :)
> > >>>>
> > >>>> cheers,
> > >>>> Zoltan
> > >>>>
> > >>>>
> > >>>>
> > >>>> On 2/9/22 10:23 AM, László Bodor wrote:
> > >>>>> Hi All!
> > >>>>>
> > >>>>> A purely technical question: what will the SNAPSHOT version become after
> > >>>>> releasing Hive 4.0.0? I think this is important, as it defines and
> > >>>>> reflects the future release plans.
> > >>>>>
> > >>>>> Curren

Re: Start releasing the master branch

2022-03-02 Thread Stamatis Zampetakis

Re: Start releasing the master branch

2022-03-09 Thread Stamatis Zampetakis
I just logged HIVE-26022 [1] which seems to be another potential blocker
for 4.0.0-alpha-1.

Best,
Stamatis

[1] https://issues.apache.org/jira/browse/HIVE-26022

On Thu, Mar 3, 2022 at 3:54 PM Peter Vary  wrote:

> Hi Team,
>
> Here is our status:
> We collected the blocker tickets and marked them with fixVersion
> 4.0.0-alpha-1:
>
> https://issues.apache.org/jira/issues/?filter=-1&jql=project%20%3D%20HIVE%20AND%20resolution%20%3D%20Unresolved%20AND%20fixVersion%20%3D%204.0.0-alpha-1
>
>- HIVE-26002 - Create db scripts for 4.0.0-alpha-1
>- HIVE-25994 - Analyze table runs into ClassNotFoundException-s in
>case binary distribution is used
>- HIVE-25935 - Cleanup IMetaStoreClient#getPartitionsByNames APIs
>
> Please create a jira and mark it with fixVersion 4.0.0-alpha-1, if you
> happen to know of any other blockers.
>
> We plan to fix these jiras, and then release the following artifacts
> together:
>
>- Storage API - 4.0.0-alpha-1
>- Standalone Metastore - 4.0.0-alpha-1
>- Hive - 4.0.0-alpha-1
>
>
> Thanks,
> Peter
>
>
> On 2022. Mar 2., at 11:50, Peter Vary  wrote:
>
> Will continue this discussion on the #hive ASF slack. If you are
> interested, please join.
> We will do updates here time-to-time, so the ones who are not using slack
> can participate that way.
>
> On 2022. Mar 2., at 11:11, Peter Vary  wrote:
>
> Good idea Zoltan, joined the channel.
> I would like to scope reasonably small, so I agree with focusing on
> 4.0.0-alpha-1
>
> On 2022. Mar 2., at 11:01, Zoltan Haindrich  wrote:
>
> Hey,
>
> regarding 4.0.0 / 4.0.0-alpha-1 target/fix versions in the jira:
> * I think we should change all already resolved tickets with fix version
> 4.0.0 to have fix version 4.0.0-alpha-1
> ** this could be postponed until we are actually releasing the thing as I
> think everyone committing to the master is entering 4.0.0 as fix version
> without much afterthought...this could probably change after we get the
> first release out.
> * regarding the existing tickets with fix version/target version 4.0.0
> - I think that would be a bit too much (>200 tickets)
> ** some numbers:
> *** 239 tickets open now
> *** 224 were not updated in the last 90 days
> *** 216 were not changed in the last 180 days
> *** 178 were not updated in the last 360 days
> ** as a matter of fact I think many of these tickets shouldn't even have a
> target or fix version - and most of them should be unassigned...I don't
> want to get lost in this right now...I think for now we should keep the
> scope small and only care about 4.0.0-alpha-1 tickets
>
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20hive%20and%20resolutiondate%20%20is%20empty%20and%20(fixVersion%20%20in%20(%274.0.0%27)%20or%20cf%5B12310320%5D%20%20in%20(%274.0.0%27))
>
> I think for faster communication regarding these things we could also
> utilize the #hive channel on the ASF slack - what do you guys think?
>
> cheers,
> Zoltan
>
> On 3/2/22 9:51 AM, Stamatis Zampetakis wrote:
>
> Agree with Peter, creating JIRAs is the way to go.
> Putting the appropriate priority (e.g., BLOCKER) and version (4.0.0 or
> 4.0.0-alpha-1) when creating the JIRA should be enough to keep us on track.
> I am mentioning both 4.0.0 and 4.0.0-alpha-1 because eventually I think we
> are gonna move everything with target 4.0.0 to 4.0.0-alpha-1.
> On Wed, Mar 2, 2022 at 9:37 AM Peter Vary 
> wrote:
>
> Hi Team,
>
> Could we create tickets for the issues?
> I think it would be good to collect the issues/potential blockers in the
> jira instead of having a complicated mail thread.
>
> If we set the target version to 4.0.0-alpha-1, then we can easily use the
> following filter to see the status of the tasks:
>
>
> https://issues.apache.org/jira/issues/?jql=project%3D%22HIVE%22%20AND%20%22Target%20Version%2Fs%22%3D%224.0.0-alpha-1%22
>
>
> @Stamatis: Sadly I missed your letter/jira and created my own with
> the fix for building from the src package:
> https://issues.apache.org/jira/browse/HIVE-25997
> If you have time, could you please review it?
>
> If anyone knows of any blocker, please create a jira
> for it too.
>
> Thanks,
> Peter
>
>
> On 2022. Mar 2., at 7:04, Sungwoo Park  wrote:
>
> Hello Alessandro,
>
> For t

Re: Supported Hive versions

2022-03-09 Thread Stamatis Zampetakis
Hi Martijn,

Worth mentioning that the community is also actively working to get the
next major release out, namely 4.0.0-alpha-1, in the next few weeks.

After getting the first cut of 4.0.0 out we should discuss if we can
reasonably support older branches.
The history so far shows that all the dev effort is put on master and many
critical patches are never backported to other branches.
I think supporting multiple branches is not sustainable and has not worked
very well so far.

Best,
Stamatis

On Wed, Mar 9, 2022 at 6:43 PM Chao Sun  wrote:

> Correct. I'm not aware of any activity in these branches, but others
> may chime in if otherwise.
>
> On Wed, Mar 9, 2022 at 9:40 AM Martijn Visser 
> wrote:
> >
> > Hi Chao,
> >
> > Thanks for the info. Does that mean that the currently supported Hive
> > versions are 2.3.* and 3.1.* ? I'm guessing there's no more support for
> > Hive 1.*, 2.0.* and 2.1.* ?
> >
> > Best regards,
> >
> > Martijn
> >
> > On Wed, 9 Mar 2022 at 18:23, Chao Sun  wrote:
> >
> > > Hi Martijn,
> > >
> > > The download page should indeed show Hive 2.3.9. Let me check if I
> > > missed anything during the release.
> > >
> > > And yes, Hive 2.3.x is still supported. We will probably start another
> > > release in the coming weeks. We have a Hive 3.1.x release going on and
> > > you can track https://issues.apache.org/jira/browse/HIVE-25855 for the
> > > progress.
> > >
> > > Chao
> > >
> > > On Wed, Mar 9, 2022 at 7:53 AM Martijn Visser 
> > > wrote:
> > > >
> > > > Hi everyone,
> > > >
> > > > We currently support multiple versions for our Hive connectors in Apache
> > > > Flink. When looking at the Hive download page [1] I see multiple
> > > > announcements for new Hive versions, with the latest being 2.3.8. When
> > > > actually visiting the Apache mirror, I see multiple versions listed like
> > > >
> > > > Hive 1.2.2
> > > > Hive 2.3.9
> > > > Hive 3.1.2
> > > > Hive Metastore 2.7.3
> > > > etc
> > > >
> > > > Is it documented somewhere which versions of Hive are (still) supported by
> > > > the Hive community?
> > > >
> > > > Best regards,
> > > >
> > > > Martijn Visser
> > > > https://twitter.com/MartijnVisser82
> > > >
> > > > [1] https://hive.apache.org/downloads.html
> > >
>


Re: Hive 3 and Java 11 issue

2022-03-14 Thread Stamatis Zampetakis
I confirm what Pau said: the only supported JDK is 8. The upgrade to JDK 11
started [1] but was paused due to various problems. I don't know if anyone
is actively working on fixing this at the moment.

Best,
Stamatis

[1] https://issues.apache.org/jira/browse/HIVE-22415

On Thu, Mar 10, 2022 at 10:34 AM Bitfox  wrote:

> That sounds bad. All our apps are running on JDK 11.
>
> On Thu, Mar 10, 2022 at 5:06 PM Pau Tallada  wrote:
>
>> I think only JDK 8 is supported so far
>>
>> Message from Bitfox on Thu., 10 March 2022 at 2:39:
>>
>>> my java version:
>>>
>>> openjdk version "11.0.13" 2021-10-19
>>>
>>>
>>> I can't run Hive 3.1.2.
>>>
>>> The error is:
>>>
>>>
>>> Exception in thread "main" java.lang.ClassCastException: class
>>> jdk.internal.loader.ClassLoaders$AppClassLoader cannot be cast to class
>>> java.net.URLClassLoader (jdk.internal.loader.ClassLoaders$AppClassLoader
>>> and java.net.URLClassLoader are in module java.base of loader 'bootstrap')
>>>
>>>
>>> So I am asking: does Hive 3 not support Java 11 yet?
>>>
>>>
>>> Thanks.
>>>
>>
>>
>> --
>> --
>> Pau Tallada Crespí
>> Departament de Serveis
>> Port d'Informació Científica (PIC)
>> Tel: +34 93 170 2729
>> --
>>
>>


Re: Failing tests

2022-03-15 Thread Stamatis Zampetakis
Hello,

+1 to everything Peter said.

Moreover a few other things/reminders which could make our life easier.

No commits/merges on top of a broken master.

If there is a non-flaky failure in master, whoever notices it first should
create a JIRA and add any relevant info. This will notify everyone
that there is a problem and it will also avoid having multiple people
looking at it.

If there is a failure in master or during precommit tests that seems to be
intermittent, please run the flaky checker job [1]. If the result shows it's
flaky, log a JIRA and raise a PR disabling the test if there is no quick
fix available.

Rerun precommit tests before merging a pull request if the latest precommit
run is old (e.g., older than 72h).

Best,
Stamatis

[1] http://ci.hive.apache.org/job/hive-flaky-check/


On Mon, Mar 14, 2022 at 9:31 PM Peter Vary 
wrote:

> If I remember correctly, the decision was not to merge changes with
> failing PreCommit tests.
>
> Recently, because of a mistake where a change was only partially merged,
> we had a failing test.
> I tried to fix this issue and confirm the fix by rerunning the tests, but
> the check failed again. This time it failed with different tests, because
> in the meantime more failing tests had been committed to master.
>
> I think it would be good to stick to the previous decision and only
> commit changes if all of the tests are green. Also, if there are issues,
> we should take the time to fix the failures or revert the changes causing
> them.
>
> Thanks,
> Peter
>
>
>


Re: Is the web broken?

2022-03-15 Thread Stamatis Zampetakis
Hi,

From the discussion in INFRA-20776, it seems there has been some kind of
communication in the private hive list. If it is possible to share this
information, please let us know.

Apart from that, please stop including the dev@calcite list in this thread.
I accidentally started that by cc'ing the wrong list in my previous email;
I meant to include dev@hive.

Best,
Stamatis

On Tue, Mar 15, 2022 at 8:54 PM Chao Sun  wrote:

> Hi Gavin,
>
> Do you have any idea why hive.apache.org is broken after the hive-site
> change?
>
> Thanks,
> Chao
>
> On Thu, Mar 10, 2022 at 7:16 PM Chao Sun  wrote:
> >
> > Sorry, we are in the process of migrating to hive-site.git from CMS.
> > There are a few issues to be fixed and I’m talking with ASF infra on this.
> >
> > On Thu, Mar 10, 2022 at 6:05 PM 王道远(健身) wrote:
> >>
> >> Hi,
> >>
> >> Could we keep a redirect page at the old place, or a CNAME resolution?
> >>
> >> Best,
> >> Adrian
> >>
> >> --
> >> From: Stamatis Zampetakis
> >> Sent: Thursday, March 10, 2022 18:33
> >> To: user; dev
> >> Subject: Re: Is the web broken?
> >>
> >>
> >> Hi,
> >>
> >> I am not sure if I am missing something, but I get the impression [1]
> >> that from now on the site will be served from here:
> >> https://apache.github.io/hive-site/
> >>
> >> Best,
> >> Stamatis
> >>
> >> [1] https://issues.apache.org/jira/browse/INFRA-20776
> >>
> >> On Thu, Mar 10, 2022 at 10:21 AM Ming  wrote:
> >> I have the same situation
> >>
> >> Ming
> >> shezhimin...@gmail.com
> >>
> >> On 03/10/2022 17:12, Pau Tallada wrote:
> >> Is it only me, or is the https://hive.apache.org/ website showing a
> >> directory listing?!
> >>
> >> --
> >> --
> >> Pau Tallada Crespí
> >> Departament de Serveis
> >> Port d'Informació Científica (PIC)
> >> Tel: +34 93 170 2729
> >> --
> >>
>


Re: Is the web broken?

2022-03-22 Thread Stamatis Zampetakis
Hi all,

There have been various problems around the website lately, and from the
exchanges in [1,2] it is still not clear to me what the process for updating
the website is.

We have more or less the same code and Jekyll builders in two repositories
[3,4], which doesn't seem right.

Moreover, the hive-site repo currently has three branches:
* main
* asf-site
* gh-pages
and the role of each is not completely clear to me.

From my perspective, all the source code (.md files and Jekyll generators)
should be in the main Hive repo [3], and the hive-site repo [4] should only be
used for the deployed site (HTML pages). In the hive-site repo, we
shouldn't need to maintain and use more than one branch.

We should settle on a workflow that we want to follow; otherwise, updating
the website will be a big pain in the near future.

Best,
Stamatis

[1] https://issues.apache.org/jira/browse/INFRA-20776
[2] https://issues.apache.org/jira/browse/INFRA-23007
[3] https://github.com/apache/hive/tree/master/docs
[4] https://github.com/apache/hive-site



Re: [VOTE] Apache Hive 4.0.0-alpha-1 Release Candidate 1

2022-03-22 Thread Stamatis Zampetakis
Hi Peter,

Many thanks for rolling out the RC and for resolving many of the remaining
blocker issues.

In general, it is good practice to include the commit hash (which tags
the release) and the checksum hashes of the release artifacts [1] in the
vote email to minimize the chances of man-in-the-middle attacks and voting
on wrong packages.
Can you please update this thread with those?

Best,
Stamatis

[1] https://people.apache.org/~pvary/apache-hive-4.0.0-alpha-1-rc1/


On Tue, Mar 22, 2022 at 5:00 PM Naveen Gangam 
wrote:

> I have been able to build and run a quick test. I have NOT verified the
> signature. I was trying to run the HMS Checkin tests and got the output
> below. I suspect these messages are not specific to the alpha-1 branch. It
> is not a test failure (although it looks like it should be one):
>
> mvn test -Dtest.groups=org.apache.hadoop.hive.metastore.annotation.MetastoreCheckinTest
>
> [INFO] Running org.apache.hadoop.hive.common.metrics.TestLegacyMetrics
>
> [main] WARN org.apache.hadoop.hive.common.metrics.LegacyMetrics - Could not
> find counter value for foo.n, returning null instead.
>
> javax.management.AttributeNotFoundException: Key [foo.n] not found/tracked
>   at org.apache.hadoop.hive.common.metrics.MetricsMBeanImpl.getAttribute(MetricsMBeanImpl.java:56)
>
> [WARNING] Tests run: 18, Failures: 0, Errors: 0, Skipped: 2, Time elapsed: 4.158 s - in org.apache.hadoop.hive.metastore.client.TestCatalogs
>
> [INFO] Running org.apache.hadoop.hive.metastore.TestMarkPartition
>
> [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 15.021 s - in org.apache.hadoop.hive.metastore.TestMarkPartition
>
> [INFO]
> [INFO] Results:
> [INFO]
> [WARNING] Tests run: 2182, Failures: 0, Errors: 0, Skipped: 5
>
> So overall, no test failures.
>
> +1 pending other votes (& non-binding)
>
> Thank you
> Naveen
>
> On Tue, Mar 22, 2022 at 9:32 AM Marton Bod 
> wrote:
>
> > +1 (non-binding)
> > Tested the checksums, signatures and built it successfully
> >
> > On Tue, Mar 22, 2022 at 2:26 PM Peter Vary 
> > wrote:
> >
> > > Hi Team,
> > >
> > > Apache Hive 4.0.0-alpha-1 Release Candidate 1 is available here:
> > >
> > > https://people.apache.org/~pvary/apache-hive-4.0.0-alpha-1-rc1/
> > >
> > > Maven artifacts are available here:
> > >
> > > https://repository.apache.org/content/repositories/orgapachehive-/
> > >
> > > The tag 4.0.0-alpha-1-rc1 has been applied to the source for this release
> > > in github, you can see it at
> > > https://github.com/apache/hive/tree/release-4.0.0-alpha-1-rc1
> > >
> > > Voting will conclude in 72 hours.
> > >
> > > All interested parties: Please test.
> > > Hive PMC Members: Please test and vote.
> > >
> > > Thanks.
> >
>


Re: [VOTE] Apache Hive 4.0.0-alpha-1 Release Candidate 1

2022-03-23 Thread Stamatis Zampetakis
Ubuntu 20.04.4 LTS, jdk1.8.0_261, Apache Maven 3.6.3

 * Checked signatures and checksums OK
 * Checked diff between repo and release sources (diff -qr hive
apache-hive-4.0.0-alpha-1-src) KO
 * Built from git tag (mvn clean install -DskipTests -Pitests) OK
 * Built from release sources (mvn clean install -DskipTests -Pitests) OK
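For anyone repeating the "Checked signatures and checksums" step, here is a minimal sketch of the checksum half. It assumes GNU coreutils (sha256sum) and that the expected SHA-256 value is copied from the vote email; the function name verify_sha256 is hypothetical, not part of any Hive tooling. Verifying the PGP signature is a separate step (typically gpg --verify on the detached signature file) that this sketch does not cover.

```shell
#!/usr/bin/env bash
# Sketch: compare a release artifact's SHA-256 digest with the value
# published in the vote email. Assumes GNU coreutils (sha256sum).
# verify_sha256 is a hypothetical helper name.
verify_sha256() {
  local file="$1" expected="$2" actual
  # sha256sum prints "<digest>  <file>"; keep only the digest
  actual=$(sha256sum "$file" | cut -d' ' -f1)
  if [ "$actual" = "$expected" ]; then
    echo "OK $file"
  else
    echo "KO $file (expected $expected, got $actual)"
    return 1
  fi
}
```

For example, with the RC1 source checksum posted later in this thread:
verify_sha256 apache-hive-4.0.0-alpha-1-src.tar.gz 07f30371df5f624352fa1d0fa50fd981a4dec6d4311bb340bace5dd7247d3015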

While comparing the content of the git repo with the release sources, I
noticed various differences. The most notable ones, for which I cast a
negative vote, are listed below:

Only in apache-hive-4.0.0-alpha-1-src/common/src: gen
Only in apache-hive-4.0.0-alpha-1-src/conf: hive-default.xml.template
Only in apache-hive-4.0.0-alpha-1-src/itests/hive-unit: cmroot
Only in apache-hive-4.0.0-alpha-1-src/ql: dependency-reduced-pom.xml
Only in apache-hive-4.0.0-alpha-1-src/standalone-metastore/metastore-common/src/gen: version
Only in apache-hive-4.0.0-alpha-1-src/standalone-metastore/metastore-server: derby.log
Only in apache-hive-4.0.0-alpha-1-src/standalone-metastore/metastore-server: metastore_db
Only in apache-hive-4.0.0-alpha-1-src/standalone-metastore/metastore-server/src: gen
Only in apache-hive-4.0.0-alpha-1-src/streaming: ${test.tmp.dir}
Only in hive/: README.md
Only in hive/: core

The presence of derby.log and metastore_db in the released sources is
definitely not normal.
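The repo-versus-release comparison above can be narrowed to entries that exist only in the release sources, which is where build leftovers such as derby.log or metastore_db show up. A rough sketch, assuming GNU diff (whose -q output uses the "Only in DIR: NAME" format); only_in_release is a hypothetical helper name, and it skips the .git directory, which only exists in the clone.

```shell
#!/usr/bin/env bash
# Sketch: list entries present only in the extracted release sources.
# Assumes GNU diff; only_in_release is a hypothetical helper name.
only_in_release() {
  local repo="$1" release="$2"
  # -q: only report which files differ; -r: recurse; -x .git: skip git metadata.
  # grep -F keeps only "Only in <release>..." lines (fixed-string match).
  diff -qr -x .git "$repo" "$release" \
    | grep -F -e "Only in $release:" -e "Only in $release/"
}
```

For example: only_in_release hive apache-hive-4.0.0-alpha-1-src. Entries that exist only in the repo (such as README.md) are filtered out; drop the grep to see both sides.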

Other than that, I was surprised to see that the itests sources are part of
the released sources. I thought that the goal of keeping them separate was to
avoid releasing them along with the main code. I checked previous releases
and the directory is there, so I suppose it is intentional to have them in
apache-hive-4.0.0-alpha-1-src.tar.gz.

For future votes, I think it is useful to include in the email a pointer to
the PGP key that was used to sign the release. I knew where to find it, but
I am not sure everyone does. I should also note that the key used to
sign the release does not seem to be signed by any other member of the PMC;
this is a bit problematic but not a blocker [1].

Lastly, I noticed that the released sources do not contain a README file with
instructions or pointers on how to build the project.

-1 (non-binding)

Best,
Stamatis

[1] https://www.apache.org/info/verification.html


On Wed, Mar 23, 2022 at 11:45 AM Peter Vary 
wrote:

> Hi Stamatis,
>
> Here is the data you suggested:
> Commit hash: 357d4906f5c806d585fd84db57cf296e12e6049b
> Checksums:
> ff60286044d2f3faa8ad1475132cdcecf4ce9ed8faf1ed4e56a6753ebc3ab585
> apache-hive-4.0.0-alpha-1-bin.tar.gz
> 07f30371df5f624352fa1d0fa50fd981a4dec6d4311bb340bace5dd7247d3015
> apache-hive-4.0.0-alpha-1-src.tar.gz
>
> I also added it to the
> https://cwiki.apache.org/confluence/display/Hive/HowToRelease wiki page.
>
> Thanks,
> Peter
>

Re: [VOTE] Apache Hive 3.1.3 Release Candidate 1

2022-03-24 Thread Stamatis Zampetakis
Thanks for pushing this forward, Naveen.

I checked the released sources in apache-hive-3.1.3-src and they contain
modified LGPL files, violating the ASF release policy.
The problem is the same as the one reported under HIVE-25665. I think the fix
should be backported to branch-3 before moving forward with the release.

-1 (non-binding)

Best,
Stamatis

On Wed, Mar 23, 2022 at 9:47 PM Naveen Gangam 
wrote:

> Apache Hive 3.1.3 Release Candidate 1 is available here:
> https://people.apache.org/~ngangam/apache-hive-3.1.3-rc-1
>
> The checksums are these:
> - e0551a6fe328be5ff0fa16d275b65f43f56c35da66ac4e391e47d3e74d466b91
> apache-hive-3.1.3-bin.tar.gz
>
> - ce35a179304055004023bec016518fcb40b2ce2b14238ab77aebec99815fde02
> apache-hive-3.1.3-src.tar.gz
>
>
> Maven artifacts are available here:
> https://repository.apache.org/content/repositories/orgapachehive-1112
>
> The tag release-3.1.3-rc1 has been applied to the source for this
> release in github, you can see it at
> https://github.com/apache/hive/tree/release-3.1.3-rc1
>
> The git commit hash is: cc050e40eb55f6c9f1aa08c00c1689f657747afb
> https://github.com/apache/hive/commit/cc050e40eb55f6c9f1aa08c00c1689f657747afb
>
> Voting will conclude in 72 hours.
>
> Hive PMC Members: Please test and vote.
>
> Thanks.
>


Re: [VOTE] Apache Hive 4.0.0-alpha-1 Release Candidate 2

2022-03-26 Thread Stamatis Zampetakis
Ubuntu 20.04.4 LTS, jdk1.8.0_261, Apache Maven 3.6.3

 * Checked signatures and checksums OK
 * Checked diff between repo and release sources (diff -qr hive
apache-hive-4.0.0-alpha-1-src) OK
 * Built from git tag (mvn clean install -DskipTests -Pitests) OK
 * Built from release sources (mvn clean install -DskipTests -Pitests) OK
 * Ran smoke tests on a pseudo cluster using hive-dev-box OK

All of the issues that were found in the previous RC are either resolved or
tracked under respective JIRAs to be solved for the next release.

Smoke tests included:
* Derby metastore initialization
* simple CREATE TABLE statements;
* basic INSERT INTO VALUES statements;
* basic SELECT * FROM WHERE variations;
* EXPLAIN statement variations;
* ANALYZE TABLE variations;

+1 (non-binding)

Best,
Stamatis

On Thu, Mar 24, 2022 at 12:01 PM Peter Vary 
wrote:

> Hi Team,
>
> Apache Hive 4.0.0-alpha-1 Release Candidate 2 is available here:
> https://people.apache.org/~pvary/apache-hive-4.0.0-alpha-1-rc2/
>
> The checksums are these:
> - 1e450197dbf847696b05042eb68b78b968064f1f1b369a7fb0b77a6329a27809
> apache-hive-4.0.0-alpha-1-bin.tar.gz
> - a21a609ec2e30f8cc656242c545bb3a04de21c2a1eee90808648e3aa4bf3d04e
> apache-hive-4.0.0-alpha-1-src.tar.gz
>
> Maven artifacts are available here:
> https://repository.apache.org/content/repositories/orgapachehive-1113/
>
> The tag 4.0.0-alpha-1-rc1 has been applied to the source for this release
> in github, you can see it at
> https://github.com/apache/hive/tree/release-4.0.0-alpha-1-rc1
>
> The git commit hash is:
> https://github.com/apache/hive/commit/357d4906f5c806d585fd84db57cf296e12e6049b
>
> Voting will conclude in 72 hours.
>
> All interested parties: Please test.
> Hive PMC Members: Please test and vote.
>
> Thanks.


Re: separated authN configuration for binary and http transports

2022-03-28 Thread Stamatis Zampetakis
Hey Janos,

You brought up an interesting subject.

I haven't worked on the authentication code, so I cannot foresee the
impact on the codebase, but at a high level your idea seems reasonable to me.

I would be in favor of such a change, but I would definitely like to see
tests and documentation come along with it from whoever pushes this
forward.

Best,
Stamatis

On Fri, Mar 18, 2022, 6:40 PM Janos Kovacs  wrote:

> Hi,
>
> I just found that while HS2 can do authentication with mixed methods (like
> Kerberos+LDAP), it only works with the binary protocol. With the transport
> set to http, authentication basically works only against what is set by
> hive.server2.authentication. If, e.g., it's set to LDAP, it doesn't try other
> methods, even if the client is sending the Negotiate headers in the
> request.
>
> While this is something that could probably be fixed, I was thinking about
> a quick(er) fix that might sound like just a workaround at first. However,
> given that HS2 can now serve both binary and http transports together
> (HIVE-5312), and that some authentication methods support only one type of
> transport (e.g., SAML works only with the http transport), this might be a
> good enhancement by itself: split hive.server2.authentication between
> binary and http by introducing hive.server2.http.authentication.
>
> If the http transport could be configured independently from the binary
> transport, then HS2 could run in dual-transport mode, e.g. binary offering
> Kerberos+LDAP while http offering SAML (or any other independent method).
>
> Could you please share your thoughts on splitting the authN method between
> the two transport modes?
>
> Thanks, Janos
>


Re: Start releasing the master branch

2022-03-30 Thread Stamatis Zampetakis
Thanks for pushing this forward, Peter! Being the RM for this huge release
was not an easy task.

Let's now aim for smaller and much more frequent releases.

Personally, I would prefer to keep the alpha-X suffix for a while, so I
would opt for 4.0.0-alpha-2-SNAPSHOT for the next iteration.

Best,
Stamatis


On Wed, Mar 30, 2022, 8:02 PM Peter Vary  wrote:

> Thanks to everyone who helped to create the Hive 4.0.0-alpha-1 release!
> I really hope this helps our users to try out our previously unreleased new
> features.
>
> As a last step of the release process, I will update the versions for the
> next release.
> I would like to ask your opinion about the next version.
>
> Which version should we use for the development:
> - 4.0.0-SNAPSHOT
> - 4.0.0-alpha-2-SNAPSHOT
>
> Thanks,
> Peter
>
> On Mon, 21 Mar 2022 at 15:59, Peter Vary  wrote:
>
> > Hi Team,
> >
> > If everyone agrees, tomorrow I would like to start the release process
> > for 4.0.0-alpha-1.
> >
> > Is there any outstanding blocker jira that you know of?
> >
> > Thanks,
> > Peter
> >
> >
> > > On 2022. Mar 9., at 17:01, Stamatis Zampetakis 
> > wrote:
> > >
> > > I just logged HIVE-26022 [1], which seems to be another potential blocker
> > > for 4.0.0-alpha-1.
> > >
> > > Best,
> > > Stamatis
> > >
> > > [1] https://issues.apache.org/jira/browse/HIVE-26022
> > >
> > > On Thu, Mar 3, 2022 at 3:54 PM Peter Vary  wrote:
> > >
> > >> Hi Team,
> > >>
> > >> Here is our status:
> > >> We collected the blocker tickets and marked them with fixVersion
> > >> 4.0.0-alpha-1:
> > >>
> > >>
> > >> https://issues.apache.org/jira/issues/?filter=-1&jql=project%20%3D%20HIVE%20AND%20resolution%20%3D%20Unresolved%20AND%20fixVersion%20%3D%204.0.0-alpha-1
> > >>
> > >>   - HIVE-26002 - Create db scripts for 4.0.0-alpha-1
> > >>   - HIVE-25994 - Analyze table runs into ClassNotFoundException-s in
> > >>   case binary distribution is used
> > >>   - HIVE-25935 - Cleanup IMetaStoreClient#getPartitionsByNames APIs
> > >>
> > >> Please create a jira and mark it with fixVersion 4.0.0-alpha-1, if you
> > >> happen to know of any other blockers.
> > >>
> > >> We plan to fix these jiras, and then release the following artifacts
> > >> together:
> > >>
> > >>   - Storage API - 4.0.0-alpha-1
> > >>   - Standalone Metastore - 4.0.0-alpha-1
> > >>   - Hive - 4.0.0-alpha-1
> > >>
> > >>
> > >> Thanks,
> > >> Peter
> > >>
> > >>

Re: [VOTE] Apache Hive 3.1.3 Release Candidate 2

2022-03-31 Thread Stamatis Zampetakis
Ubuntu 20.04.4 LTS, jdk1.8.0_261, Apache Maven 3.6.3

 * Checked signatures and checksums OK
 * Checked for checkstyle modified LGPL files OK
 * Checked for illegal licenses in release binaries (jars) using [1] OK
 * Built from git tag (mvn clean install -DskipTests -Pitests) OK
 * Built from release sources (mvn clean install -DskipTests -Pitests) OK
 * Checked diff between repo and release sources (diff -qr hive
apache-hive-3.1.3-src) KO

While comparing the content of the git repo with the release sources, I
noticed various differences. The most notable ones, for which I cast a
negative vote, are listed below:

Only in apache-hive-3.1.3-src/common/src: gen
Only in apache-hive-3.1.3-src/conf: hive-default.xml.template
Only in apache-hive-3.1.3-src/hcatalog/core: mapred
Only in apache-hive-3.1.3-src/itests: ${project.basedir}
Only in apache-hive-3.1.3-src/itests/hive-unit: metastore_db2
Only in apache-hive-3.1.3-src/itests/qtest: ${project.basedir}
Only in apache-hive-3.1.3-src/itests: qtest-kudu
Only in apache-hive-3.1.3-src/ql: dependency-reduced-pom.xml
Only in hive: README.md
Only in apache-hive-3.1.3-src/standalone-metastore: metastore-common
Only in apache-hive-3.1.3-src/standalone-metastore: metastore-server
Only in apache-hive-3.1.3-src/standalone-metastore: metastore-tools
Only in apache-hive-3.1.3-src/standalone-metastore/src/gen: version
Only in apache-hive-3.1.3-src/upgrade-acid: pre-upgrade

Some of the problems above (e.g., the existing metastore_db2) most likely
come from the fact that your local git repo was not completely clean.
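One way to catch such leftovers before packaging is to check that the release checkout has no local modifications, untracked files or ignored files. A minimal sketch, assuming git is on the PATH; assert_clean_tree is a hypothetical helper name. Running git clean -xdn in the checkout additionally shows what git clean would remove, without deleting anything.

```shell
#!/usr/bin/env bash
# Sketch: fail if a git checkout has local modifications, untracked ("??")
# or ignored ("!!") files, e.g. derby.log or metastore_db left over from
# earlier test runs. Assumes git is installed; the helper name is hypothetical.
assert_clean_tree() {
  local dir="${1:-.}" leftovers
  leftovers=$(git -C "$dir" status --porcelain --ignored)
  if [ -n "$leftovers" ]; then
    echo "Working tree is not clean:"
    echo "$leftovers"
    return 1
  fi
  echo "Working tree is clean"
}
```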

-1 (non-binding)

Best,
Stamatis

[1] for f in `find . -name "*.jar"`; do echo $f; jar xf $f META-INF/LICENSE; head -5 META-INF/*; done >> ALL_LICENSES


On Thu, Mar 31, 2022 at 8:59 AM Wang, Yuming 
wrote:

> +1 (non-binding) Tested through:
> https://github.com/apache/spark/pull/36018
>
>
> From: Naveen Gangam 
> Date: Wednesday, March 30, 2022 at 21:14
> To: dev@hive.apache.org 
> Subject: Re: [VOTE] Apache Hive 3.1.3 Release Candidate 2
>
> Still seeking votes. Voting ends tomorrow. Any help would be appreciated.
>
> Thank you
> Naveen
>
> On Tue, Mar 29, 2022 at 5:51 AM Peter Vary 
> wrote:
>
> > Downloaded the 3.1.3 artifacts, and checked the signatures. They are OK.
> > Used the binary to run some basic tests, and it seems OK.
> >
> > +1 (binding)
> >
> > > On 2022. Mar 28., at 23:19, Naveen Gangam wrote:
> > >
> > > Apache Hive 3.1.3 Release Candidate 2 is available here:
> > >
> > > https://people.apache.org/~ngangam/apache-hive-3.1.3-rc-2
> > >
> > > The checksums are these:
> > >
> > >
> > > - 55c58e0111bd32de3d02f5f25d9eb054ba65ab02aaf669637760eaf56ef1fbb1
> > > apache-hive-3.1.3-bin.tar.gz
> > >
> > >
> > > - 22862e6bf76a4783a3d8d298634728cc9d6561563af2413a687fe63e35bcc527
> > > apache-hive-3.1.3-src.tar.gz
> > >
> > >
> > > Maven artifacts are available here:
> > >
> > > https://repository.apache.org/content/repositories/orgapachehive-1114
> > >
> > > The tag release-3.1.3-rc2 has been applied to the source for this
> > > release in github, you can see it at
> > > https://github.com/apache/hive/tree/release-3.1.3-rc2
> > >
> > > The git commit hash is: 4df4d75bf1e16fe0af75aad0b4179c34c07fc975
> > > <https://github.com/apache/hive/commit/4df4d75bf1e16fe0af75aad0b4179c34c07fc975>
> > >
> > > Voting will conclude in 72 hours.
> > >
> > > Hive PMC Members: Please test and vote.
> > >
> > > Thanks.
> > >
> > > Naveen
> >
> >
>


Re: [VOTE] Apache Hive 3.1.3 Release Candidate 3

2022-04-08 Thread Stamatis Zampetakis
Ubuntu 20.04.4 LTS, jdk1.8.0_261, Apache Maven 3.6.3

 * Checked signatures and checksums OK
 * Checked for checkstyle modified LGPL files OK
 * Checked for illegal licenses in release binaries (jars) using [1] OK
 * Checked diff between repo and release sources (diff -qr hive
apache-hive-3.1.3-src) OK
 * Built from git tag (mvn clean install -DskipTests -Pitests -Pjavadoc) OK
 * Built from release sources (mvn clean install -DskipTests -Pitests
-Pjavadoc) OK
 * Ran smoke tests in hive-dev-box using hadoop 3.1.0 and tez 0.9.1 OK

- Initialized derby metastore
- Simple CREATE, INSERT, ANALYZE queries
- Simple SPJA queries
- EXPLAIN variations
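Checking the checksums comes down to recomputing each artifact's digest and comparing it with the value published in the vote email. The published values are 64 hex characters, which is consistent with SHA-256, so this sketch assumes SHA-256; the function names are illustrative, and signature verification (usually done with GPG) is a separate step not covered here.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in 1 MiB chunks
    so that large release tarballs do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(path, expected_hex):
    """Return True if the file's SHA-256 digest matches the published one
    (surrounding whitespace and letter case are ignored)."""
    return sha256_of(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    # Digest announced for apache-hive-3.1.3-bin.tar.gz in the RC3 vote.
    published = "0c9b6a6359a7341b6029cc9347435ee7b379f93846f779d710b13f795b54bb16"
    tarball = Path("apache-hive-3.1.3-bin.tar.gz")
    if tarball.exists():
        print("OK" if verify_checksum(tarball, published) else "MISMATCH")
    else:
        print(f"{tarball} not found in the current directory")
```

Streaming in chunks keeps memory use flat regardless of artifact size; the same function works for the -src tarball with its own published digest.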

+1 (non-binding)

Best,
Stamatis

[1] for f in `find . -name "*.jar"`; do echo $f; jar xf $f META-INF/LICENSE; head -5 META-INF/*; done >> ALL_LICENSES
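The one-liner in [1] extracts `META-INF/LICENSE` from every jar into the working directory and appends the first lines to `ALL_LICENSES`. A rough Python equivalent, sketched with the standard `zipfile` module, reads the entry straight from each archive instead, so a jar without a license file cannot pick up the one extracted from a previous jar. This is illustrative, not the script used for the vote; the function names are hypothetical, and like the original it only looks for `META-INF/LICENSE`, so jars that ship their license under another name (for example `LICENSE.txt`) show up as missing and need a manual look.

```python
import zipfile
from pathlib import Path


def license_head(jar_path, lines=5):
    """Return the first `lines` lines of META-INF/LICENSE inside a jar,
    or None if the jar does not contain that entry."""
    with zipfile.ZipFile(jar_path) as jar:
        try:
            data = jar.read("META-INF/LICENSE")
        except KeyError:  # entry not present in this archive
            return None
    text = data.decode("utf-8", errors="replace")
    return "\n".join(text.splitlines()[:lines])


def scan_jars(root="."):
    """Map every *.jar under root to the head of its license (or None)."""
    report = {}
    for jar_path in sorted(Path(root).rglob("*.jar")):
        try:
            report[str(jar_path)] = license_head(jar_path)
        except zipfile.BadZipFile:  # corrupt file or not really a jar
            report[str(jar_path)] = None
    return report
```

Running `scan_jars()` from the unpacked binary distribution and reviewing the entries that are `None` or name an unexpected license gives the same information as `ALL_LICENSES`, grouped per jar.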

On Thu, Apr 7, 2022 at 6:43 PM Chao Sun  wrote:

> +1 (binding)
>
> - verified the signatures and checksums
> - tried the binary and tested a few queries.
> - built from source
>
> Thanks Naveen!
>
> Best,
> Chao
>
>
> On Thu, Apr 7, 2022 at 1:28 AM Peter Vary 
> wrote:
> >
> > Downloaded the 3.1.3 artifacts, and checked the signatures. They are OK.
> > Used the binary to run some basic tests, and it seems OK.
> >
> > +1 (binding)
> >
> > > On 2022. Apr 6., at 20:32, Szehon Ho  wrote:
> > >
> > > +1 (binding)
> > >
> > > Downloaded and ran CREATE, INSERT, and simple queries on Postgres.
> > > Verified checksums.
> > > Built from source.
> > >
> > > Thanks,
> > > Szehon
> > >
> > > On Mon, Apr 4, 2022 at 7:56 AM Naveen Gangam wrote:
> > >
> > >> *[No new commits from RC2]. Just cleaned up the apache-hive-3.1.3-src.tar.gz archive.*
> > >>
> > >>
> > >> Apache Hive 3.1.3 Release Candidate 3 is available here:
> > >> https://people.apache.org/~ngangam/apache-hive-3.1.3-rc-3
> > >>
> > >> The checksums are these:
> > >>
> > >>
> > >> - 0c9b6a6359a7341b6029cc9347435ee7b379f93846f779d710b13f795b54bb16
> > >> apache-hive-3.1.3-bin.tar.gz
> > >>
> > >>
> > >> - b5e17f664afbb5ac702f0de0a31363caf58e067b19229df63da01c38430f6fcc
> > >> apache-hive-3.1.3-src.tar.gz
> > >>
> > >>
> > >> Maven artifacts are available here:
> > >> https://repository.apache.org/content/repositories/orgapachehive-1116
> > >>
> > >>
> > >> The tag release-3.1.3-rc3 has been applied to the source for this
> > >> release in github, you can see it at
> > >>
> > >> https://github.com/apache/hive/tree/release-3.1.3-rc2
> > >>
> > >> The git commit hash is: 4df4d75bf1e16fe0af75aad0b4179c34c07fc975
> > >> <https://github.com/apache/hive/commit/4df4d75bf1e16fe0af75aad0b4179c34c07fc975>
> > >> Voting will conclude in 72 hours.
> > >>
> > >> Hive PMC Members: Please test and vote.
> > >>
> > >> Thanks.
> > >>
> >
>


[DISCUSS] End of life for Hive 1.x, 2.x, 3.x

2022-05-06 Thread Stamatis Zampetakis
Hi all,

The current master has many critical bug fixes as well as important
performance improvements that have not been backported (and most likely
never will be) to the maintenance branches.

Backporting changes from master usually requires adapting the code and
tests in question, making it a non-trivial and time-consuming task.

The ASF bylaws require PMCs to deliver high-quality software that satisfies
certain criteria. Cutting new releases from maintenance branches with known
critical bugs does not comply with these requirements.

CI is unstable in all maintenance branches, making the quality of a release
questionable and merging new PRs rather difficult. Enabling CI and running
it frequently in all maintenance branches would require a large amount of
resources on top of what we already need for master.

History has shown that it is very difficult or impossible to properly
maintain multiple release branches for Hive.

I think it would be in the best interest of the project if the PMC decided
to drop support for maintenance branches and focus on releasing
exclusively from master.

This mail is related to the discussion about the release cadence [1] since
it would certainly help make Hive releases more regular. I decided to
start a separate thread to avoid mixing multiple topics together.

Looking forward to your thoughts.

Best,
Stamatis

[1] https://lists.apache.org/thread/n245dd23kb2v3qrrfp280w3pto89khxj


Re: [DISCUSS] End of life for Hive 1.x, 2.x, 3.x

2022-05-10 Thread Stamatis Zampetakis
e:
>>
>>
>> Hi Team,
>>
>> My experience with the Iceberg community shows that there is a sizeable
>> user base around Hive 2.x. I have seen patches and contributions to the
>> Hive 2.3.x branches, and the tests are in much better shape there.
>>
>> I would definitely vote for EOL Hive 1.x, but until we have a stable 4.x,
>> I would be cautious about slashing 2.x, 3.x branches.
>>
>> Just my 2 cents.
>>
>> Peter
>>
>> On 2022. May 9., at 10:51, Alessandro Solimando <
>> alessandro.solima...@gmail.com> wrote:
>>
>> Hi Stamatis,
>> thanks for bringing up this topic, I basically agree on everything you
>> wrote.
>>
>> I just wanted to add that this kind of proposal might sound harsh,
>> because in many contexts upgrading is a complex process, but it's in
>> nobody's interest to keep release branches that are missing important
>> fixes/improvements and that might not meet the quality standards that
>> people expect, as mentioned.
>>
>> Since we don't yet have a stable 4.x release (only alpha for now), we
>> might want to keep supporting the 3.x branch until the first stable 4.x
>> release and EOL the branches older than 3.x. WDYT?
>>
>> Best regards,
>> Alessandro
>>


Re: excluding jdk.tools for java 11 compatibility

2022-05-12 Thread Stamatis Zampetakis
Apparently there is now a JIRA, HIVE-26226 [1], about removing jdk.tools so
let's continue the discussion there.

Best,
Stamatis

[1] https://issues.apache.org/jira/browse/HIVE-26226


On Thu, May 5, 2022 at 6:27 PM Alessandro Solimando <
alessandro.solima...@gmail.com> wrote:

> Hi again,
> actually I managed to exclude the project by using the FQN (I was missing
> the "upgrade-acid/" part):
>
> mvn org.sonarsource.scanner.maven:sonar-maven-plugin:3.9.0.2155:sonar \
>  -DskipTests -Dit.skipTests -Dmaven.javadoc.skip \
>  -pl '!upgrade-acid,!upgrade-acid/pre-upgrade'
>
> I would still like to hear your opinion about the exclusion, since it will
> be a problem when moving to JDK 11 anyway, which, as far as I have seen, is
> a blocker for the 4.0.0 release.
>
> Best regards,
> Alessandro
>
> On Thu, 5 May 2022 at 16:38, Alessandro Solimando <
> alessandro.solima...@gmail.com> wrote:
>
> > Hi everyone,
> > I am working on https://issues.apache.org/jira/browse/HIVE-26196.
> >
> > As you might know, Sonar analysis must now run with at least JDK 11, and
> > when I tried it failed as follows:
> >
> > [ERROR] Failed to execute goal on project hive-pre-upgrade: Could not
> > resolve dependencies for project
> > org.apache.hive:hive-pre-upgrade:jar:4.0.0-alpha-2-SNAPSHOT: Could not
> > find artifact jdk.tools:jdk.tools:jar:1.7 at specified path
> > /Users/asolimando/.sdkman/candidates/java/11.0.11.hs-adpt/../lib/tools.jar
> > -> [Help 1]
> >
> > The issue is located here:
> > https://github.com/apache/hive/blob/master/upgrade-acid/pre-upgrade/pom.xml#L52-L75
> >
> > Adding an exclusion on jdk.tools as follows fixes the problem:
> > <exclusion>
> >   <groupId>jdk.tools</groupId>
> >   <artifactId>jdk.tools</artifactId>
> > </exclusion>
> >
> > I guess it's safe to add this exclusion, since the scope of the dependency
> > is "provided" (meaning that the dependency is expected to be on the
> > classpath already at runtime, so the exclusion won't interfere with that,
> > and nothing in Hive is packaged differently due to the exclusion); in
> > addition, both compilation under JDK 8 and the full test suite run in CI
> > were OK.
> >
> > Do you guys see any problem with this approach?
> >
> > Before this solution, I tried adding the "skip.sonar" Maven property
> > (as per
> > https://docs.sonarqube.org/latest/analysis/scan/sonarscanner-for-maven/),
> > but it was ignored.
> >
> > Another approach would have been to exclude the submodule from Sonar
> > analysis using the Maven reactor, but I can't seem to find the right name
> > for the module: "upgrade-acid" is excluded (but the submodule mentioned
> > here still gets processed and fails), while "pre-upgrade" is not found and
> > fails as follows:
> >
> > $ mvn org.sonarsource.scanner.maven:sonar-maven-plugin:3.9.0.2155:sonar \
> >  -DskipTests -Dit.skipTests -Dmaven.javadoc.skip -pl '!pre-upgrade'
> > [INFO] Scanning for projects...
> > [ERROR] [ERROR] Could not find the selected project in the reactor:
> > pre-upgrade @
> > [ERROR] Could not find the selected project in the reactor: pre-upgrade
> > -> [Help 1]
> > [ERROR]
> > [ERROR] To see the full stack trace of the errors, re-run Maven with the
> > -e switch.
> > [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> > [ERROR]
> > [ERROR] For more information about the errors and possible solutions,
> > please read the following articles:
> > [ERROR] [Help 1]
> > http://cwiki.apache.org/confluence/display/MAVEN/MavenExecutionException
> >
> > Best regards,
> > Alessandro
> >
>


Re: [VOTE] Apache Hive 3.1.3 Release Candidate 3

2022-05-16 Thread Stamatis Zampetakis
Hi all,

In case you missed it, the release notes for Hive 3.1.3 are broken [1].

To avoid similar problems in the future, please remember to:
* associate commits with JIRA tickets;
* fill in the appropriate version in the "Fix version" field when
committing;
* mark the JIRA ticket as resolved.
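The first point can be checked mechanically before pushing. A minimal sketch, assuming commit subjects follow the `HIVE-<number>: <Summary>` convention from the commit message guidelines discussed on this list; the helper names and the sample subjects are hypothetical. It cannot check the "Fix version" or resolution fields, which live in JIRA itself.

```python
import re

# Assumed subject convention: "HIVE-12345: Summary starting with a capital letter".
SUBJECT_RE = re.compile(r"^(HIVE-\d+): [A-Z]")


def jira_id(subject):
    """Return the JIRA id a commit subject starts with, or None if it has none."""
    match = SUBJECT_RE.match(subject)
    return match.group(1) if match else None


def subjects_without_jira(subjects):
    """Return the commit subjects that are not associated with a HIVE ticket."""
    return [s for s in subjects if jira_id(s) is None]


if __name__ == "__main__":
    sample = [
        "HIVE-12345: Add a hypothetical handler",  # made-up example subject
        "Update README",
    ]
    for subject in subjects_without_jira(sample):
        print("missing JIRA id:", subject)  # prints: missing JIRA id: Update README
```

In practice the subjects could come from `git log --format=%s` over the commits about to be pushed.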

Best,
Stamatis

[1] https://issues.apache.org/jira/browse/HIVE-26214

On Sat, Apr 9, 2022 at 9:09 AM Naveen Gangam 
wrote:

> Thank you all for voting. Appreciate it.
>
> I have 4 binding +1 votes and no negative votes. Have just completed the
> remaining release work. Will announce shortly.
>
>
> Naveen
>


Re: Release cadence

2022-06-09 Thread Stamatis Zampetakis
Thanks Peter and Zoltan for starting this discussion. Apologies for the
delay; I was under the impression that I had already sent this email.

I definitely agree with the idea of having quarterly releases no matter
what they are called (alpha, beta, or other).

I wouldn't base the decision to move from alpha to beta so much on features
like Iceberg or JDK11, but rather on items that show the stability of the
release. Generally, I would be confident moving from alpha to beta if we:
* deploy Hive in a realistic setting and go over a few use cases
* don't identify serious regressions in the last X months

Setting up Hive on a cluster of 5-10 machines and successfully running all
TPC-DS queries over 10GB-100GB of data would be enough for me to say that
we have a realistic deployment.
Others would probably have different expectations/use cases, so we should
agree on the bare minimum that we would like to have, since there is no way
to cover everything.

We released the alpha-1 version so that people can try it out and give
feedback about it. I really hope users take the time to test the recent
releases.
If for a certain period of time we see that there are no
serious regressions, then we can move gradually from alpha to beta.
We can discuss the exact amount of time that we want to wait, but I would
say that if we get another alpha release out (alpha-2) and people do not
report any serious problems, we could move to beta-1.

For moving from beta to stable, I would follow the same scheme; stable
deployment and no important regressions for a reasonable amount of time.

It would be nice if in the next few releases we clarified how the packaging
bundles are supposed to look for Hive, metastore, storage-api, etc. to avoid
confusion for end-users [1].
I was also going to mention that it would be good to have javadoc for the
next release [2] but just realized that this is already fixed :) (thanks
Peter and Zoltan!)

Regarding the discussion about the exec jar, I more or less share Zoltan's
opinion, but we can continue the discussion under the respective
JIRA [3].

Best,
Stamatis

[1] https://issues.apache.org/jira/browse/HIVE-26218
[2] https://issues.apache.org/jira/browse/HIVE-26092
[3] https://issues.apache.org/jira/browse/HIVE-26220

On Wed, May 11, 2022 at 9:05 AM Zoltan Haindrich  wrote:

> Hey,
>
>
>  >> In another email thread
>  >> (https://lists.apache.org/thread/sxcrcf4v9j630tl9domp0bn4m33bdq0s) Sun
>  >> Chao mentioned that other projects (Spark, Iceberg and Trino/Presto)
>  >> are still depending on old Hive, because the exec-core jar has been
>  >> removed, and the exec jar contains unshaded versions of various
>  >> dependencies. Until this is fixed, they cannot upgrade to a newer
>  >> version of Hive, so I would like to add this as a blocker for Hive
>  >> 4.0.0 release.
>
>  >> @Chao Sun: Could you help us find the jira for this issue, or file a
>  >> new one?
>
> I was thinking about this and I think this is a bit unfair... say project X
> is using Hive 2.3's core jar; should "we", the Hive community, do all the
> work to run their project with Hive 4? I don't think so.
> What if some project is not interested in upgrading? Should we really put
> effort into it even in that case?
>
> The best middle-ground idea I have been able to come up with so far is to
> ask for a broken development branch that is set up to run with some
> 4.0.0-alpha-X release, where we can start fixing together the shading
> issues they might face.
> That way they will already be ready to upgrade their Hive; and if they are
> also able to run tests etc., as a bonus we will get early pre-integration
> feedback, which will be valuable for both them and us.
>
> What do you guys think?
> Are there any other options?
>
> cheers,
> Zoltan
>
> On 5/11/22 7:33 AM, Chao Sun wrote:
> > Thanks for reminding me, Peter. There is
> > https://issues.apache.org/jira/browse/HIVE-25317 but that's for Hive
> > 2.3 and is mostly for the Spark use case. I just created
> > https://issues.apache.org/jira/browse/HIVE-26220 and marked it as a
> > blocker.
> >
> > On Tue, May 10, 2022 at 10:01 PM Peter Vary  wrote:
> >>
> >> In another email thread
> >> (https://lists.apache.org/thread/sxcrcf4v9j630tl9domp0bn4m33bdq0s) Sun
> >> Chao mentioned that other projects (Spark, Iceberg and Trino/Presto)
> >> are still depending on old Hive, because the exec-core jar has been
> >> removed, and the exec jar contains unshaded versions of various
> >> dependencies. Until this is fixed, they cannot upgrade to a newer
> >> version of Hive, so I would like to add this as a blocker for Hive
> >> 4.0.0 release.
> >>
> >> @Chao Sun: Could you help us find the jira for this issue, or file a
> >> new one?
> >>
> >> Any more blockers?
> >>
> >> Thanks,
> >> Peter
> >>
> >> On Fri, Apr 29, 2022, 13:46 Peter Vary  wrote:
> >>>
> >>> Hi Team,
> >>>
> >>> With Zoltan Haindrich, we have been brainstorming about the next steps
> >>> after the 4.0.0-alpha-1 release.
> >>>
> >>> We came up with the following plan:
> >>> - Define a desired scope for the 4.0.

Re: [DISCUSS] Remove Druid dependency from Hive

2022-06-17 Thread Stamatis Zampetakis
Hi Simhadri,

Thanks for starting this discussion.

I am cc'ing the user list as well so that we get a better idea of whether
there are any active users.

Personally I am not that familiar with the Druid module.
* Is it currently broken?
* Do we have active tests?
* Does it need significant effort to update the Druid version?

Best,
Stamatis

On Thu, Jun 16, 2022 at 10:33 AM SG  wrote:

> Hello Everyone,
>
> The last commits related to Druid were around early 2020 [1]. Since then
> the version of Druid used by Hive has remained the same, 0.17.1 [2].
> Druid version 0.17.1 has a significant number of CVEs associated with it,
> some of which allow remote code execution. If no one is maintaining it or
> plans to do so in the near future, can we remove it from our code?
>
> Thoughts?
>
> -Simhadri
>
> [1]
> https://github.com/apache/hive/search?o=desc&q=druid&s=committer-date&type=commits
> [2]
> https://github.com/apache/hive/blob/0033675057a60d0a05a252854455e2b8835e89cc/pom.xml#L127
>


Re: [DISCUSS] Spellcheck on CI

2022-06-17 Thread Stamatis Zampetakis
This is the first time I have seen a spellchecker in action in the projects
that I contribute to, so I can't form an opinion before using it for
some time.

Activating it in every module or removing it altogether may be a bit
premature at this stage.
I would suggest keeping things as they are right now and revisiting the
situation in the near future.
If others want to act now, that is fine with me as well.

Best,
Stamatis

On Wed, Jun 15, 2022 at 11:32 AM Ayush Saxena  wrote:

> Doesn’t make sense to me unless that typo can create a major impact, e.g.
> it is in the docs, in the name of metrics/UDFs or system tables, or at
> places where correcting the name might become an incompatible change.
> Checking expect.txt, it has entries like dfs, Hdfs and so on as well.
> If we try to cover other FileSystems too, it would need ofs, o3fs, s3fs,
> abfs and many more cases; in that case expect.txt will become huge and
> very tough to manage.
> Secondly, it has some strange entries as well, whose meaning isn’t easy
> for everyone to decode.
>
> Not blocking, but in general we shouldn’t consider extending this to the
> whole project; maybe just pick the relevant places/packages if we plan to
> increase the coverage.
>
> -Ayush
>
> > On 15-Jun-2022, at 2:35 PM, Peter Vary 
> wrote:
> >
> > Hi Team,
> >
> > I have seen that the CI tests are continuously failing with the following
> > errors:
> >
> > Check warning on line 18 in
> > serde/src/java/org/apache/hadoop/hive/serde2/esriJson/deserializer/SpatialReferenceJsonDeserializer.java
> > GitHub Actions / Spell checking
> > serde/src/java/org/apache/hadoop/hive/serde2/esriJson/deserializer/SpatialReferenceJsonDeserializer.java#L18
> > `esri` is not a recognized word.
> >
> > Sourabh Badhya kindly helped me identify the commit which caused this
> > issue:
> > https://github.com/apache/hive/commit/0099b14aa6a50d4470b057e93a95a7391b74add7
> >
> > The jira for the change is:
> > https://issues.apache.org/jira/browse/HIVE-25733
> >
> > To fix the issues I pushed an addendum:
> > https://github.com/apache/hive/commit/0b4e466866fe07a160b0e4b0c27d2b3fb7613c45
> >
> > Hopefully after a rebase all of the tests should be green now (the first
> > tests are running now).
> >
> >
> > The new tests add a spellcheck to the CI. Currently it runs only for the
> > `serde` package. The check parses the code files, and if it finds any
> > unrecognised / non-dictionary words, it throws an error.
> >
> > This could be useful for catching spelling errors missed by the
> > reviewers, but maintaining the exception list (in
> > .github/actions/spelling/expect.txt) might become cumbersome. Currently we
> > have 452 entries in the file for the `serde` package alone.
> >
> > I would like to hear the community’s opinion on whether we would like to
> > keep it and use it for other modules as well.
> >
> > Thanks,
> > Peter
>
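Peter's description of the check (parse the code files, flag words that are in neither a dictionary nor `.github/actions/spelling/expect.txt`) can be illustrated with a small sketch. It is not the GitHub Action's actual implementation: the tokenization rule (splitting camelCase identifiers, treating digits as separators) and the tiny in-memory word lists are assumptions chosen to reproduce the `esri` warning quoted above.

```python
import re


def split_words(text):
    """Split source text into lowercase words, breaking camelCase and
    PascalCase identifiers apart (digits and punctuation act as separators)."""
    words = []
    for token in re.findall(r"[A-Za-z]+", text):
        # "esriJson" -> ["esri", "Json"], "HDFSReader" -> ["HDFS", "Reader"]
        words.extend(re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", token))
    return [w.lower() for w in words]


def unknown_words(text, dictionary, expect):
    """Return words that appear in neither the dictionary nor the expect list,
    in order of first appearance."""
    allowed = {w.lower() for w in dictionary} | {w.lower() for w in expect}
    unknown = []
    for word in split_words(text):
        if word not in allowed and word not in unknown:
            unknown.append(word)
    return unknown


if __name__ == "__main__":
    line = "package org.apache.hadoop.hive.serde2.esriJson.deserializer;"
    dictionary = {"package", "org", "apache", "json", "deserializer"}
    expect = {"hadoop", "hive", "serde"}  # project-specific terms
    print(unknown_words(line, dictionary, expect))  # ['esri']
```

The sketch also shows where the maintenance cost comes from: every legitimate project-specific term has to be listed in the expect file before the check passes, which is the growth Ayush is worried about.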


Re: [DISCUSS] Hive EOL question

2022-06-20 Thread Stamatis Zampetakis
Hi Guangming,

There was a recent discussion about EOL Hive releases [1] but it was not
conclusive.

Feel free to reopen that thread if you have some thoughts on the subject.

Best,
Stamatis

[1] https://lists.apache.org/thread/sxcrcf4v9j630tl9domp0bn4m33bdq0s

On Sun, Jun 19, 2022 at 11:20 AM Guangming Lu  wrote:

> Hi, does anyone know the EOL schedule for each Hive release? For example,
> when will 3.1.0 reach EOL?
>
> Best,
> Guangming
>


Re: [DISCUSS] End of life for Hive 1.x, 2.x, 3.x

2022-07-25 Thread Stamatis Zampetakis
Hi all,

In the last exchanges there was a general consensus to EOL Hive 1.X, but no
further action was taken.
I believe the next step would be to start a VOTE and move forward with an
official announcement.

I think it would be helpful for end-users to know which releases are
supported and which are strongly discouraged.
The Hadoop community keeps this information in their wiki [1].

Although I am still not convinced that we should encourage users to use
the older release lines (2.X, 3.X), we can postpone the decision for the
time being and proceed just with 1.X.

Best,
Stamatis

[1]
https://cwiki.apache.org/confluence/display/HADOOP/EOL+%28End-of-life%29+Release+Branches

On Tue, May 10, 2022 at 2:51 PM Stamatis Zampetakis 
wrote:

> Thanks everyone for sharing your thoughts. I am happy to see so many
> people involved in the discussion.
>
> I would say that the current 4.0.0-alpha-1 is better in many aspects than
> previous stable releases, although this might be a bit subjective.
>
> I am afraid that if we keep supporting older releases, it will take too
> long for people to start using 4.x.
> Having real deployments of Hive 4 is the only way to go from alpha to
> stable releases with confidence.
>
> I checked the download statistics for Hive releases [1], [2] for the past
> month and the results show that the vast majority of downloads are for
> older releases.
> I am not posting the stats here since I am not sure if this would violate
> some policies. Hive committers can access the stats using their ASF
> credentials.
> To some degree this is expected but at the same time problematic given the
> number of open issues which affect older releases.
>
> I would definitely like to have multiple maintenance branches with high
> quality standards but I don't think there are enough active committers in
> the project to successfully maintain those.
> The https://github.com/mr3project/hive-mr3 repo may be a great fit for an
> upcoming ASF Hive release.
> However, according to what Sungwoo said, this seems more like a new
> maintenance branch rather than a continuation of Hive 3.
> Moving towards this direction would certainly require more time from all
> of us.
>
> Lastly, it seems that there are some issues preventing people from using
> 4.0.0-alpha-1.
> As Peter already mentioned, these issues are probably release blockers and
> should be taken into account in the next Hive 4 release.
> The thread about the next steps after 4.0.0-alpha-1 [3] is the perfect
> place to discuss those.
> For those with certain demands around Hive 4, please reply to [3] and
> include any specific JIRAs that need to be in the scope of the next release.
>
> Best,
> Stamatis
>
> [1] https://logging1-he-de.apache.org/stats/
> [2] https://repository.apache.org/#central-stat
> [3] https://lists.apache.org/thread/n245dd23kb2v3qrrfp280w3pto89khxj
>
>
> On Tue, May 10, 2022 at 10:55 AM Sungwoo Park  wrote:
>
>> We maintain our own fork of Hive 3 because we are not always adding new
>> commits to the tip of the branch. To backport a new patch, sometimes we
>> have to add new commits between existing commits, update earlier commits,
>> and so on. This makes it impractical to keep adding new patches only to the
>> tip of the branch while reverting commits if necessary. Maintaining the
>> Hive 3 branch would mean frequent force-updates, which might produce more
>> problems. (If this is not an issue, we could try to completely rebuild the
>> Hive 3 branch.)
>>
>> I hope the Apache community can make a concerted effort to figure out
>> what patches to include in Hive 3. For us, the challenge was 1) to decide
>> which patch to include; 2) to figure out its dependencies if any; 3) to
>> resolve conflicts. Testing was also another source of pain.
>>
>> Thanks,
>>
>> --- Sungwoo
>>
>>
>>
>>
>>
>> On Tue, May 10, 2022 at 4:26 PM Peter Vary  wrote:
>>
>>> When we were brainstorming about the future of the Hive 3 branch with
>>> Zoltan Haindrich, he mentioned this letter:
>>> https://lists.apache.org/thread/by9ppc2z8oqdzpqotzv5bs34yrxrd84l
>>>
>>> I think Sungwoo Park and his team make a huge effort to maintain this
>>> branch, and maybe it would be better to help them do this inside the Apache
>>> Hive project. They should not need to maintain their own branch if there is
>>> no particular reason behind it, or if we can remove those blockers. This
>>> could be beneficial for every Hive user who still uses Hive 3.
>>>
>>> @Sungwoo: Do you have any specific reason to keep your own fork of Hive 3?
>>>
>>> That would mean we could have a much bett

Re: [DISCUSS] SonarCloud integration for Apache Hive

2022-08-09 Thread Stamatis Zampetakis
Hi Alessandro,

Sonar integration will definitely help in improving code quality and
preventing bugs, so many thanks for pushing this forward.

I went over the PR and it is in good shape. I plan to merge it in the
following days unless someone objects.
We can tackle further improvements in follow up JIRAs.

Is it possible to somehow save the current analysis on master and make the
PR quality gates fail when things become worse?
If not, then what may help in reviewing PRs is to have a diff view (between
a PR and current master) so we can quickly tell if the PR we are about to
merge makes things better or worse; as far as I understand, the idea is to
do this manually at the moment by checking the results on master and on the
PR under review.
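The "fail when things become worse" rule boils down to comparing the PR's analysis with a baseline taken on master. A minimal sketch, assuming the issue counts per metric have already been fetched (how they are retrieved, for example through SonarCloud's web API, is left out); the function names, metric keys and numbers below are illustrative, not SonarCloud's own quality-gate configuration.

```python
def regressions(master, pr, tolerance=0):
    """Compare issue counts of a PR against a master baseline.

    Both arguments map metric names to counts where higher is worse.
    Metrics absent from the baseline are treated as 0. Returns
    {metric: (master_count, pr_count)} for every metric that grew by
    more than `tolerance`.
    """
    worse = {}
    for metric, pr_count in pr.items():
        base = master.get(metric, 0)
        if pr_count > base + tolerance:
            worse[metric] = (base, pr_count)
    return worse


def gate_passes(master, pr, tolerance=0):
    """The PR passes if no tracked metric got worse than on master."""
    return not regressions(master, pr, tolerance)


if __name__ == "__main__":
    master = {"bugs": 120, "vulnerabilities": 15, "code_smells": 9000}  # made-up numbers
    pr = {"bugs": 121, "vulnerabilities": 15, "code_smells": 8990}
    print(regressions(master, pr))  # {'bugs': (120, 121)}
    print(gate_passes(master, pr))  # False
```

For the comparison to be fair, the baseline should be the analysis of the PR's merge base rather than the latest master, otherwise unrelated commits merged in the meantime would be attributed to the PR.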

Enabling code coverage would be very helpful as well. Looking forward to
this.

Best,
Stamatis

On Mon, Aug 8, 2022 at 1:22 PM Alessandro Solimando <
alessandro.solima...@gmail.com> wrote:

> Errata corrige: the right PR link is the following
> https://github.com/apache/hive/pull/3254
>
> Best regards,
> Alessandro
>
> On Mon, 8 Aug 2022 at 10:04, Alessandro Solimando <
> alessandro.solima...@gmail.com> wrote:
>
> > Hi community,
> > in the context of HIVE-26196 we started considering
> > the adoption of SonarCloud analysis for
> > Apache Hive to promote data-driven code quality improvements and to allow
> > reviewers to focus on the conceptual part of the changes by helping them
> > spot trivial code smells, security issues and bugs.
> >
> > SonarCloud has already been adopted and integrated into a few top Apache
> > projects like DolphinScheduler and Apache Jackrabbit FileVault.
> >
> > For those who don't know, Sonar is a code analysis tool. The initial
> > adoption would aim at tracking code quality for the master branch and
> > making the PRs' review process easier, by allowing us to compare which
> > code/security issues a PR solved/introduced with respect to the main
> > branch.
> >
> > We already have a Hive-dedicated project under the Apache foundation's
> > SonarCloud account:
> https://sonarcloud.io/project/overview?id=apache_hive.
> >
> > In what follows I will highlight the main points of interest:
> >
> > 1) sonar adoption scope:
> > For the time being a descriptive approach (just show the analysis and
> > associated metrics) could be adopted, delaying a prescriptive one (i.e.,
> > quality gates based on the metrics for PRs' mergeability) to a later time
> > when we have tested SonarCloud for long enough to judge that it could be a
> > sensible move.
> >
> > 2) false positives:
> > Sonar suffers from false positives, but they can be marked as such from
> > the web UI: (source https://docs.sonarqube.org/latest/faq/#header-1)
> >
> > How do I get rid of issues that are False-Positives?
> >> False-Positive and Won't Fix
> >> You can mark individual issues False Positive or Won't Fix through the
> >> issues interface. If you're using PR analysis provided by the Developer
> >> Edition, issues marked False Positive or Won't Fix will retain that status
> >> after merge. This is the preferred approach.
> >
> >
> >> //NOSONAR
> >> For most languages, SonarQube supports the use of the generic mechanism:
> >> //NOSONAR at the end of the line of the issue. This will suppress all
> >> issues - now and in the future - that might be raised on the line.
> >
> >
> > For the time being, I think that marking false positives via the UI is
> > more convenient than using "//NOSONAR", but this can be discussed further.
> >
> > 3) test code coverage:
> >
> > Due to the specific structure of the ptest infra (split execution and
> > other peculiarities), we do not yet support test code coverage; this
> > can be added at a later stage. In the meantime, all the code quality and
> > security metrics are available.
> >
> > 4) what will be analyzed:
> >
> > The master branch and each open PR.
> >
> > 5) integration with github:
> >
> > SonarCloud integrates with GitHub in two ways: the first one is an
> > additional item in the list of checks (where you have the spell checking,
> > CI result etc.) that will just say Passed/Not Passed and provide a link for
> > all the details; the second is a "summary" comment under the PR
> > highlighting the main info (you can see an example here).
> >
> > The second integration can be disabled if we consider that the first one
> > is enough, since if we want to dig deeper we can open the associated link
> > for the full analysis in SonarCloud.
> >
> > 6) analysis runtime:
> >
> > In CI the full analysis takes around 30 minutes, but this step is executed
> > in parallel with the test split tasks and won't add to the total runtime.
> > For PRs SonarCloud detects unchanged files and avoids analysing 

Re: [DISCUSS] SonarCloud integration for Apache Hive

2022-08-10 Thread Stamatis Zampetakis
That's great news! From the initial message, I got the impression that the
Sonar label in the PR will report all problems currently in master (and not
only the new ones).

I agree, it is better not to enforce quality gates directly but leave some
time for the rest of us to get familiar with the tool.

Best,
Stamatis

On Tue, Aug 9, 2022 at 6:04 PM Alessandro Solimando <
alessandro.solima...@gmail.com> wrote:

> Hi Stamatis,
> glad to hear you find Sonar helpful, thanks for providing your feedback.
>
> The master branch analysis already provides what I think you are looking
> for, you have:
>
>- all code analysis (to see the full status of the code):
>https://sonarcloud.io/summary/overall?id=apache_hive
>- new code analysis (basically what changed in the last commit):
>https://sonarcloud.io/summary/new_code?id=apache_hive
>
> For PRs, similarly, the analysis covers the changes w.r.t. the target
> branch, it's a good and quick way to ascertain the code quality of the PR.
>
> Regarding "Is it possible to somehow save the current analysis on master
> and make the PR quality gates fail when things become worse?", it is
> definitely possible: we can define a success/failure threshold for each of
> the metrics, and make it fail if the quality gate criteria are not met.
>
> I was suggesting postponing this to allow people to first get familiar
> with it. I would not want to disrupt existing work; Sonar is a rich tool
> and people might need a bit of time to adjust to it.
>
> The good news is that quality gates can be changed directly from SonarCloud
> and won't require code changes. We might kick off a feedback discussion
> about a month after we introduce Sonar analysis and see what people
> think.
>
> Best regards,
> Alessandro
>
> On Tue, 9 Aug 2022 at 16:38, Stamatis Zampetakis 
> wrote:
>
> > Hi Alessandro,
> >
> > Sonar integration will definitely help in improving code quality and
> > preventing bugs, so many thanks for pushing this forward.
> >
> > I went over the PR and it is in good shape. I plan to merge it in the
> > following days unless someone objects.
> > We can tackle further improvements in follow up JIRAs.
> >
> > Is it possible to somehow save the current analysis on master and make the
> > PR quality gates fail when things become worse?
> > If not then what may help in reviewing PRs is to have a diff view (between
> > a PR and current master) so we can quickly tell if the PR we are about to
> > merge makes things better or worse; as far as I understand the idea is to
> > do this manually at the moment by checking the results on master and on the
> > PR under review.
> >
> > Enabling code coverage would be very helpful as well. Looking forward to
> > this.
> >
> > Best,
> > Stamatis
> >
> > On Mon, Aug 8, 2022 at 1:22 PM Alessandro Solimando <
> > alessandro.solima...@gmail.com> wrote:
> >
> > > Correction: the right PR link is the following
> > > https://github.com/apache/hive/pull/3254
> > >
> > > Best regards,
> > > Alessandro
> > >
> > > On Mon, 8 Aug 2022 at 10:04, Alessandro Solimando <
> > > alessandro.solima...@gmail.com> wrote:
> > >
> > > > Hi community,
> > > > in the context of HIVE-26196
> > > > <https://issues.apache.org/jira/browse/HIVE-26196> we started considering
> > > > the adoption of SonarCloud <https://sonarcloud.io/features> analysis for
> > > > Apache Hive to promote data-driven code quality improvements and to allow
> > > > reviewers to focus on the conceptual part of the changes by helping them
> > > > spot trivial code smells, security issues and bugs.
> > > >
> > > > SonarCloud has already been adopted and integrated into a few top
> > Apache
> > > > projects like DolphinScheduler <https://dolphinscheduler.apache.org/> and Apache
> > > > Jackrabbit FileVault <https://jackrabbit.apache.org/filevault/>.
> > > >
> > > > For those who don't know, Sonar is a code analysis tool. The initial
> > > > adoption would aim at tracking code quality for the master branch and
> > > > making the PRs' review process easier, by allowing us to compare which
> > > > code/security issues a PR solved/introduced with respect to the main
> > > > branch.
> > > >
> > > > We already have a Hive-dedicated project under the Apache foundation

Re: gRPC Support in Hive Metastore

2022-08-24 Thread Stamatis Zampetakis
Hi Rohan and team,

The work sounds exciting; thanks for considering contributing it back to the
community.

The design document didn't arrive because attachments are not allowed on
many Apache lists.

Maybe as a first step it would be nice to share a link to a Google Doc
where people can add comments and possibly provide some feedback on it.
Then, I guess it makes sense to put it in the wiki
under the respective section (design documents) and/or upload it to the
JIRA case.

I am not very familiar with the area, so I am not sure how much I can help in
pushing this forward, but I am definitely interested to learn more about this work.

Best,
Stamatis

On Tue, Aug 23, 2022, 2:05 AM Cameron Moberg 
wrote:

>
> *Sending on behalf of Rohan where policies don't allow sending outside of
> our domain for interns:*
> Hello -
>
> During my internship I’ve been working on gRPC native support in the
> standalone hive metastore as it comes with a variety of benefits. As a
> proof of concept, my team, Dataproc Metastore on GCP, currently uses a
> client-side proxy to translate Thrift requests to gRPC, coupled with a
> server-side proxy to translate the gRPC requests back to Thrift. The
> process is repeated in reverse to deliver the server response to the
> client. While this approach has been successful, native gRPC support has
> several cloud-centric advantages over the current configuration:
>
>- enables streaming support
>- allows for native integrations in Hive ecosystem for various query
>engines like Impala, Spark SQL, and Trino to take advantage of streaming
>(eventually)
>- has support for custom interceptors for more fine-grained control
>over the server action
>- built on HTTP/2 protocol
>
> I’ve opened a PR here (just
> fyi, no rush). Some background: this proto3 definition has been refactored
> to take a MethodNameRequest and MethodNameResponse to prevent any future
> backwards incompatibilities. Unfortunately, the other metastore.proto which
> has SplitInfo uses a `required` field setting, which makes upgrading it not
> feasible since moving away from `required` will change the SerDe of proto,
> potentially a breaking change depending on clients.
> While this is the last week of my internship, my hosts cjmob...@google.com
> and hchinch...@google.com will continue to develop in this area with
> further implementation building on the proto.
>
> Attached is the full design doc. I’m not sure how I’m supposed to share
> documents like this, so I can reupload it somewhere or convert it to the
> wiki.
>
> Comments are of course appreciated!
>
> Thank you,
> Rohan Sonecha
>


Re: Proposal: Revamp Apache Hive website.

2022-09-15 Thread Stamatis Zampetakis
Hi all,

It's great to see some effort in improving the website. The POC from
Simhadri looks really cool; I didn't check the content but I love the look
and feel.

Now regarding the current process for modifying and updating the website
there is some info in this relatively recent thread [1].

Moving forward, I would really like to have the source code of the website
(markdown etc.) in the main repo of the project [2], and use GitHub Actions
to automatically build and push the content to the site repo [3] on a
per-commit basis.
This workflow is used in Apache Calcite and I find it extremely convenient.

Best,
Stamatis

[1] https://lists.apache.org/thread/4b6x4d6z4tgnv4mo0ycg30y4dlt0msbd
[2] https://github.com/apache/hive
[3] https://github.com/apache/hive-site

On Thu, Sep 15, 2022 at 10:50 PM Ayush Saxena  wrote:

> Owen,
> I am not sure if I am understanding you correctly, but the repository for
> the website has changed: we no longer use our main *hive.git* repository
> for the website; we are using the *hive-site* repository instead. The
> migration happened in January this year, I believe.
>
> You can check the set of commits by gmcdonald and Humbedooh here:
> https://github.com/apache/hive-site/commits/main
>
> Now, whatever you push to the main branch of hive-site
> (https://github.com/apache/hive-site) gets published to the *asf-site*
> branch by the buildbot (https://github.com/apache/hive-site/commits/asf-site).
>
> Simhadri's changes will be directed to the main branch of the hive-site
> repo and they will get auto-published to the asf-site branch; I tried this
> a couple of months back and it indeed worked that way. Let me know if we
> are missing anything here. I tried to find threads around this but
> couldn't find any (maybe they are in private@). I will try again, and if
> something still needs to be done, I will have a word with the Infra folks
> and get it sorted, if it isn't already.
>
> -Ayush
>
> On Fri, 16 Sept 2022 at 01:49, Owen O'Malley 
> wrote:
>
>> Look at the threads and talk to Apache Infra. They couldn't make it work
>> before. We would have needed to manually publish to the asf-site branch.
>>
>> On Thu, Sep 15, 2022 at 7:54 PM Simhadri G  wrote:
>>
>>> Thanks Ayush, Pau Tallada and Owen O'Malley for the feedback!
>>>
>>> @Owen, this website revamp indeed replaces the website with markdown, as
>>> you mentioned. I have referred to your PR for some of the content of
>>> the site.
>>> The actual code for the website is here:
>>> https://github.com/simhadri-g/hive-site/tree/new-site
>>>
>>> Once we add markdown files to the source code under /content/, Hugo
>>> will rebuild the files and generate the static HTML files in the ./public/
>>> directory.
>>> I have copied these static files over to a separate repo and temporarily
>>> hosted them with gh-pages to start the mail chain.
>>>
>>> For the final site, I am already trying to automate this with GitHub
>>> Actions. So, as soon as any new changes are made to the site branch, the
>>> GitHub Actions will automatically trigger and update the site.
>>>
>>> Thanks!
>>>
>>> On Fri, Sep 16, 2022 at 12:17 AM Owen O'Malley 
>>> wrote:
>>>
 I found it - https://github.com/apache/hive/pull/1410

 On Thu, Sep 15, 2022 at 6:42 PM Owen O'Malley 
 wrote:

> I had a PR to replace the website with markdown. Apache Infra was
> supposed to make it autopublish. *sigh*
>
> .. Owen
>
> On Thu, Sep 15, 2022 at 4:23 PM Pau Tallada  wrote:
>
>> Hi,
>>
>> Great work!
>> +1 on updating it as well
>>
>> On Thu, 15 Sept 2022 at 17:40, Ayush Saxena wrote:
>>
>>> Hi Simhadri,
>>> Thanx for the initiative, +1 on updating our current website.
>>> The new website looks way better than the existing one.
>>> We can create a Jira and link this to it after a couple of days if
>>> there aren’t any objections to the move, so that people can drop further
>>> suggestions over there.
>>>
>>> -Ayush
>>>
>>> > On 15-Sep-2022, at 8:33 PM, SG  wrote:
>>> >
>>> > Hi Everyone,
>>> >
>>> > The existing Apache Hive website https://hive.apache.org/ hasn't been
>>> > updated for a very long time. Additionally, I was not able to build the
>>> > Docker image associated with the site to test out new changes as well.
>>> > https://github.com/apache/hive-site
>>> >
>>> > Since the website is the front page of the project, I believe it would be
>>> > good to revamp the Apache Hive website with the latest features and
>>> > releases.
>>> >
>>> > As a result, I have spent some time setting up an initial draft of the
>>> > website. There are still quite a few things that need to be
>>> > added

Re: Proposal: Revamp Apache Hive website.

2022-09-21 Thread Stamatis Zampetakis
The javadocs are currently in svn and they can remain there for the moment.
Eventually, they could be moved to a hive-site repository and for sure we
don't want them in the main hive repo. I don't see an immediate need to
change the place where javadocs are stored but if needed we can raise a
JIRA ticket and continue the discussion there. It's not a good idea to
discuss under a closed issue/PR.

The hive-site repo is always gonna be the place for storing the generated
website (html files etc). When you talk about moving back to the hive repo
I guess you refer to the source/markdown files. The decision to change the
process of publishing the website will probably require a PMC vote with
lazy consensus.

I agree that we can start by updating the current setup. Then we can kick
off the discussion about moving the website sources to hive repo and start
publishing from there. I don't know if we need to move the javadocs, so we
can postpone this discussion till we hit an obstacle.

Best,
Stamatis

On Mon, Sep 19, 2022 at 12:01 PM Simhadri G  wrote:

> Thanks Owen, Stamatis, Ayush and Alessandro for the feedback.
>
>- Regarding the javadocs and the discussion about automatically building and
>deploying to GitHub Pages in the previous PR thread [1]
><https://github.com/apache/hive/pull/1410>:
>
>
>- Apache Iceberg-docs ([2] <https://iceberg.apache.org/javadoc/latest/>)
>   has recently set up a GitHub workflow ([3]
>   <https://github.com/apache/iceberg-docs/actions/runs/3062679467/jobs/4943928455>)
>   to publish the javadocs from a given javadocs dir ([4]
>   <https://github.com/apache/iceberg-docs/tree/main/javadoc>). I
>   think we can set up the same workflow for the Hive javadocs.
>   - As Ayush and Stamatis have mentioned, I think over the past 2
>   years Apache Infra has added support for GitHub Actions, and we can
>   confirm that from the Apache Iceberg/Calcite docs that are currently using it.
>   - But I am not sure in which branch or directory we will need to put
>   the Hive javadoc files. This needs more discussion and we can
>   follow up on this ([5]
>   <https://github.com/apache/hive/pull/1410#issuecomment-680111530>).
>
>
>- I am not aware of the procedure or the approvals we need to move
>from the hive-site repo back to the main repository. We will need help with
>this.
>
>- I was able to set up the GitHub Action on the POC repo:
>https://github.com/simhadri-g/hive-site/tree/new-site  .
>- Any changes to this repo/new-site will automatically reflect here
>   once the github workflow completes:
>   https://simhadri-g.github.io/hive-site/  .
>
>   - Considering the feedback, I think we can plan to do this in 3 phases:
>for the first cut I would like to update the website in the present setup,
>followed by moving the javadocs to the hive-site repo, and as the third
>phase we can work on migrating from the hive-site repo to the hive repo.
>
>- If everyone agrees, can we please go ahead with the first phase?
>
>
> [1]https://github.com/apache/hive/pull/1410,
> [2]https://iceberg.apache.org/javadoc/latest/
> [3]
> https://github.com/apache/iceberg-docs/actions/runs/3062679467/jobs/4943928455
> [4]https://github.com/apache/iceberg-docs/tree/main/javadoc
> [5]https://github.com/apache/hive/pull/1410#issuecomment-680111530
> [6] https://github.com/apache/hive/pull/1410#issuecomment-680102815
>
>
> Thanks!
> Simhadri G
>
> On Mon, Sep 19, 2022 at 1:50 PM Alessandro Solimando <
> alessandro.solima...@gmail.com> wrote:
>
>> Hi everyone,
>> thanks Simhadri for pushing this forward.
>>
>> I like the look and feel of the new website, and I agree with Stamatis
>> that having the website sources in the Hive repo, and automatically
>> publishing the site upon commits would be very beneficial.
>>
>> Best regards,
>> Alessandro
>>
>> On Thu, 15 Sept 2022 at 23:11, Stamatis Zampetakis 
>> wrote:
>>
>>> Hi all,
>>>
>>> It's great to see some effort in improving the website. The POC from
>>> Simhadri looks really cool; I didn't check the content but I love the look
>>> and feel.
>>>
>>> Now regarding the current process for modifying and updating the website
>>> there is some info in this relatively recent thread [1].
>>>
>>> Moving forward, I would really like to have the source code of the
>>> website (markdown etc) in the main repo of the project [2], and use GitHub
>>> actions to automatically build and push the content to the site repo [3]
>>> per commit basis.
>>

Re: Proposal: Revamp Apache Hive website.

2022-10-05 Thread Stamatis Zampetakis
Thanks for staying on top of this, Simhadri.

I will try to help review the PR once I get some time.

What is not yet clear to me from this discussion or from looking at the PR is
the workflow for making a change appear on the web (https://hive.apache.org/).
Having a README which clearly states what needs to be done is a must.

I also think it is quite important to have instructions and possibly Docker
images for someone to be able to test how the changes look locally before
committing a change to the repo.

Another point that needs clarification is the role of GitHub Pages. I am
not sure why it is necessary at the moment and what exactly the plan is
going forward. If I understand correctly, it is currently used to preview
the changes, but from my perspective we shouldn't need to commit something
to the repo to find out whether something breaks or not; preview should
happen locally.

I would suggest keeping the changes around the revamp as minimal as
possible and not mixing the content update with the framework change. As
usual, smaller changes are easier to review and merge. It is definitely
worth updating and improving the content, but let's do it incrementally so
that changes can get merged faster.

The list of committers and PMC members for Hive can be found in the Apache
phonebook [1]. The list can easily get outdated, so maybe we can consider
adding links to [1] and/or GitHub and other places instead of duplicating
the content. Anyway, let's first deal with the revamp and discuss content
changes later in separate JIRAs/PRs.

Best,
Stamatis

[1] https://home.apache.org/phonebook.html?project=hive

On Sun, Oct 2, 2022 at 2:41 AM Simhadri G  wrote:

> Hello Everyone,
>
> I have raised the PR for the revamped Hive Website here:
>  https://github.com/apache/hive-site/pull/2
>
> I would kindly ask if someone could help review this PR.
>
> Until the PR is merged, you can find the updated website here. Please
> have a look; any feedback is most welcome :)
> https://simhadri-g.github.io/hive-site/
>
> Few other things to note:
>
>- We will need help from someone who has write access to the hive-site
>repo to update the GitHub workflow once the PR is merged.
>- One more important question: while moving the .md file to the new
>website, I came across this page (https://hive.apache.org/people.html),
>which lists the current PMC members and committers of Hive. I
>noticed that this list is not up to date; a lot of people seem to be missing
>from it. May I please know where I can find the up-to-date list of
>committers and PMC members, which I can refer to in order to update the page?
>- Lastly, I plan to add a few more sections to the homepage soon. One
>of the sections I have in mind is an overview of all the Apache
>projects that use or integrate with Apache Hive. If there are any other
>suggestions in addition to this, please let me know.
>
>
> Thanks!
> Simhadri G
>
>
>
> On Sat, Sep 24, 2022 at 7:03 AM Simhadri G  wrote:
>
>> Thanks everyone,
>>
>> I will begin by creating the PR and will share the link in this thread soon.
>>
>> Thanks
>> Simhadri G
>>
>> On Sat, 24 Sep 2022, 04:52 Ayush Saxena,  wrote:
>>
>>> Thanx everyone,
>>> It has been almost a week and we don’t seem to have any objections to
>>> starting the revamp task with the hive-site repo for now.
>>>
>>> The other things mentioned can be followed up, and we can ask folks
>>> to establish a PMC consensus, if need be, for the further migration tasks.
>>>
>>> Simhadri, it would be good to create a Jira, link the PR and drop the
>>> link here in the thread as well, so that people interested can drop
>>> suggestions regarding the design and content of the website over there. For
>>> anything else we can always come back here if we are blocked on something,
>>> or if something more needs to be done in this context.
>>>
>>> -Ayush
>>>
>>> On 21-Sep-2022, at 6:35 PM, Stamatis Zampetakis 
>>> wrote:
>>>
>>>
>>> 
>>> The javadocs are currently in svn and they can remain there for the
>>> moment. Eventually, they could be moved to a hive-site repository and for
>>> sure we don't want them in the main hive repo. I don't see an immediate
>>> need to change the place where javadocs are stored but if needed we can
>>> raise a JIRA ticket and continue the discussion there. It's not a good idea
>>> to discuss under a closed issue/PR.
>>>
>>> The hive-site repo is always gonna be the place for storing the
>>> generated website (html files etc). When you talk about moving back to the
>>> hive repo I 

Consider using bi-directional links in JIRA

2022-10-14 Thread Stamatis Zampetakis
Hi all,

This is a small tip/reminder for everyone using JIRA.

It is very common and convenient to refer to other tickets by adding the
HIVE-X pattern in summary, description, and comments.

The pattern allows someone to navigate quickly to an older JIRA from the
current one but not the other way around.

Ideally, along with the mention (HIVE-X) pattern, it helps to add an
explicit link (relates to, causes, depends upon, etc.) so that the
relationship between tickets is visible from both ends.

This is extremely useful when reporting a regression/breaking change
caused by a past commit, but in other cases as well.

Best,
Stamatis


Re: [Branching] Apache Hive 4.0.0-alpha-2 Release

2022-10-20 Thread Stamatis Zampetakis
Hi everyone,

The past discussions around the version used in JIRA can be found in the
following threads [1, 2].

As Ayush mentioned, there are ~3K resolved tickets with Fix Version 4.0.0
but most of them are also tagged with 4.0.0-alpha-1 [3].

I can easily remove the 4.0.0 tag from those tickets and keep only
4.0.0-alpha-1 in Fix Version using bulk update; we can handle the
remaining cases afterwards.
Let me know if you want me to proceed with this change or if you have
other suggestions.

Best,
Stamatis

[1] https://lists.apache.org/thread/13w1s029b6gbych56zhzvj4x2vbv8k8q
[2] https://the-asf.slack.com/archives/CFSSP9UPJ/p1646216936802329
[3] project = hive and status = resolved and fixVersion = 4.0.0 and
fixVersion = 4.0.0-alpha-1

On Thu, Oct 20, 2022 at 6:53 AM Ayush Saxena  wrote:

> Hi Denys,
> Flagging here the version issue we discussed offline.
>
> The version 4.0.0 was used before the decision to rename it to 4.0.0-alpha1
> was taken, but the rename was never done in Jira.
>
> It still shows around 3K tickets resolved on version 4.0.0, which was
> already released as part of the last 4.0.0-alpha1 release [1].
>
> We should rename it in Jira to avoid issues when building the Release
> Notes for our next 4.0.0 release.
>
> A simple INFRA ticket, or anyone with Hive Jira admin rights, should be able
> to do so.
>
> [1]
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20HIVE%20AND%20fixVersion%20%3D%204.0.0
>
> -Ayush
>
>
> > On 19-Oct-2022, at 9:09 PM, Denys Kuzmenko 
> wrote:
> >
> > Hi Team,
> >
> > Branching for Hive *4.0.0-alpha-2* was done today:
> > https://github.com/apache/hive/tree/branch-4.0.0-alpha-2
> >
> > The next development version is *4.0.0-SNAPSHOT*
> >
> > The *alpha-2* branch will be open for commits until Monday. If you would
> > like to include something major in it, but need more time, please let me
> > know.
> >
> > Best regards,
> > Denys
>


Re: [Branching] Apache Hive 4.0.0-alpha-2 Release

2022-10-20 Thread Stamatis Zampetakis
No script at all, I am just going to use the "Bulk Update" feature in JIRA.

I will wait 24h in case there are objections and then move forward with the
update.

Best,
Stamatis

On Thu, Oct 20, 2022 at 12:12 PM Denys Kuzmenko
 wrote:

> Hi Stamatis,
>
> If you have an automatic script for that please run it.
>
> Note, I saw tickets with `fixVersion = 4.0.0 and fixVersion =
> 4.0.0-alpha-2` in resolved state as well.
>
> https://issues.apache.org/jira/browse/HIVE-26643?jql=project%20%3D%20HIVE%20AND%20status%20%3D%20Resolved%20AND%20fixVersion%20in%20(4.0.0)%20AND%20fixVersion%20in%20(4.0.0-alpha-2)
>
> Regards,
> Denys
>
> On Thu, Oct 20, 2022 at 11:26 AM Stamatis Zampetakis 
> wrote:
>
> > Hi everyone,
> >
> > The past discussions around the version used in JIRA can be found in the
> > following threads [1, 2].
> >
> > As Ayush mentioned there are ~3K resolved tickets with Fix Version 4.0.0
> > but most of them are also tagged with 4.0.0-alpha1 [3].
> >
> > I can easily remove the 4.0.0 tag from those tickets and keep only
> > 4.0.0-alpha1 in Fix Version using bulk update; we can treat the
> > remaining cases afterwards.
> > Let me know if you want me to proceed with this change or if you have
> > other suggestions.
> >
> > Best,
> > Stamatis
> >
> > [1] https://lists.apache.org/thread/13w1s029b6gbych56zhzvj4x2vbv8k8q
> > [2] https://the-asf.slack.com/archives/CFSSP9UPJ/p1646216936802329
> > [3] project = hive and status = resolved and fixVersion = 4.0.0 and
> > fixVersion = 4.0.0-alpha-1
> >
> > On Thu, Oct 20, 2022 at 6:53 AM Ayush Saxena  wrote:
> >
> > > Hi Denys,
> > > Flagging here the version issue we discussed offline.
> > >
> > > The version 4.0.0 was used before the decision to rename it to 4.0.0-alpha1
> > > was taken, but the rename was never done in Jira.
> > >
> > > It still shows around 3K tickets resolved on version 4.0.0, which was
> > > already released as part of the last 4.0.0-alpha1 release [1].
> > >
> > > We should rename it in Jira to avoid issues when building the Release
> > > Notes for our next 4.0.0 release.
> > >
> > > A simple INFRA ticket, or anyone with Hive Jira admin rights, should be
> > > able to do so.
> > >
> > > [1]
> > > https://issues.apache.org/jira/issues/?jql=project%20%3D%20HIVE%20AND%20fixVersion%20%3D%204.0.0
> > >
> > > -Ayush
> > >
> > >
> > > > On 19-Oct-2022, at 9:09 PM, Denys Kuzmenko
> > > > wrote:
> > > >
> > > > Hi Team,
> > > >
> > > > Branching for Hive *4.0.0-alpha-2* was done today:
> > > > https://github.com/apache/hive/tree/branch-4.0.0-alpha-2
> > > >
> > > > The next development version is *4.0.0-SNAPSHOT*
> > > >
> > > > The *alpha-2* branch will be open for commits until Monday. If you would
> > > > like to include something major in it, but need more time, please let me
> > > > know.
> > > >
> > > > Best regards,
> > > > Denys
> > >
> >
>


Re: [Branching] Apache Hive 4.0.0-alpha-2 Release

2022-10-21 Thread Stamatis Zampetakis
Using the bulk update feature of JIRA I did the following operations based
on some JQL filters.

Remove the 4.0.0 tag from the following:
* project = hive and (status = Resolved or status = closed) and (fixVersion
= 4.0.0-alpha-1 or fixVersion = 4.0.0-alpha-2)

Clear fixVersion from the following, also leaving an appropriate comment:
* project = hive and status in (Open, "In Progress", "In Review", "Patch
Available") and fixVersion is not EMPTY
Please check the JIRA guidelines in [1] to find out more about the
reasoning behind this change.
It appears that there is a JIRA bug [2] when handling archived versions, so
you may notice that a few tickets still have the version set.

Change fixVersion from 4.0.0 to 4.0.0-alpha-2 for remaining resolved
tickets.
* project = hive and status = resolved and fixVersion = 4.0.0

After all these actions there are no tickets with fixVersion = 4.0.0
* project = hive and fixVersion = 4.0.0
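
To re-run any of the filters above, they can be pasted into JIRA's advanced search or shared as a link. A minimal Python sketch of building such a link, assuming the `https://issues.apache.org/jira/issues/?jql=` URL form used in the JIRA links earlier in this thread; `jql_link` is a hypothetical helper for illustration, not part of any JIRA tooling:

```python
from urllib.parse import quote

# Assumption: same issue-navigator URL form as the JIRA links earlier in this thread.
ISSUE_NAVIGATOR = "https://issues.apache.org/jira/issues/?jql="


def jql_link(jql: str) -> str:
    """Hypothetical helper: percent-encode a JQL query into an issue-navigator link."""
    # safe="" leaves only unreserved characters as-is, so '=', '"', '(' and ','
    # are escaped and spaces become %20.
    return ISSUE_NAVIGATOR + quote(jql, safe="")


if __name__ == "__main__":
    # The final sanity check from above: no tickets should remain with fixVersion = 4.0.0.
    print(jql_link("project = hive and fixVersion = 4.0.0"))
    # prints https://issues.apache.org/jira/issues/?jql=project%20%3D%20hive%20and%20fixVersion%20%3D%204.0.0
```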

If someone plans/wants to take additional actions please leave a message in
this thread.

Best,
Stamatis

[1] https://cwiki.apache.org/confluence/display/Hive/HowToContribute
[2] https://jira.atlassian.com/browse/JRASERVER-6419

On Fri, Oct 21, 2022 at 8:49 AM Ayush Saxena  wrote:

> Correction: Marking as released for alpha2 and deleting and recreating 4.0
>
>
> On 21-Oct-2022, at 9:51 AM, Ayush Saxena  wrote:
>
> 
> Thanx Stamatis for volunteering. I think we should delete the
> 4.0.0-alpha-2 version from the Jira as well to prevent people from
> carelessly using that.
>
> There is an option here [1] to delete a version; we can try whether
> just deleting 4.0.0 from this list also helps.
>
> [1]
> https://issues.apache.org/jira/plugins/servlet/project-config/HIVE/administer-versions?status=unreleased
>
> -Ayush
>
> On Thu, 20 Oct 2022 at 15:51, Stamatis Zampetakis 
> wrote:
>
>> No script at all, I am just going to use the "Bulk Update" feature in
>> JIRA.
>>
>> I will wait 24h in case there are objections and then move forward with
>> the
>> update.
>>
>> Best,
>> Stamatis
>>
>> On Thu, Oct 20, 2022 at 12:12 PM Denys Kuzmenko
>>  wrote:
>>
>> > Hi Stamatis,
>> >
>> > If you have an automatic script for that please run it.
>> >
>> > Note, I saw tickets with `fixVersion = 4.0.0 and fixVersion =
>> > 4.0.0-alpha-2` in resolved state as well.
>> >
>> >
>> https://issues.apache.org/jira/browse/HIVE-26643?jql=project%20%3D%20HIVE%20AND%20status%20%3D%20Resolved%20AND%20fixVersion%20in%20(4.0.0)%20AND%20fixVersion%20in%20(4.0.0-alpha-2)
>> >
>> > Regards,
>> > Denys
>> >
>> > On Thu, Oct 20, 2022 at 11:26 AM Stamatis Zampetakis
>> > wrote:
>> >
>> > > Hi everyone,
>> > >
>> > > The past discussions around the version used in JIRA can be found in
>> the
>> > > following threads [1, 2].
>> > >
>> > > As Ayush mentioned there are ~3K resolved tickets with Fix Version
>> 4.0.0
>> > > but most of them are also tagged with 4.0.0-alpha1 [3].
>> > >
>> > > I can easily remove the 4.0.0 tag from those tickets and keep only
>> > > 4.0.0-alpha1 in Fix Version using bulk update; we can treat the
>> > > remaining cases afterwards.
>> > > Let me know if you want me to proceed with this change or if you have
>> > > other suggestions.
>> > >
>> > > Best,
>> > > Stamatis
>> > >
>> > > [1] https://lists.apache.org/thread/13w1s029b6gbych56zhzvj4x2vbv8k8q
>> > > [2] https://the-asf.slack.com/archives/CFSSP9UPJ/p1646216936802329
>> > > [3] project = hive and status = resolved and fixVersion = 4.0.0 and
>> > > fixVersion = 4.0.0-alpha-1
>> > >
>> > > On Thu, Oct 20, 2022 at 6:53 AM Ayush Saxena 
>> wrote:
>> > >
>> > > > Hi Denys,
>> > > > Flagging here the version issue we discussed offline.
>> > > >
>> > > > The version 4.0.0 was used prior to the decision to rename it
>> 4.0.0-alpha1
>> > > > was taken, but the rename was not done on the Jira.
>> > > >
>> > > > It still shows around 3K tickets resolved on version 4.0.0, which
>> got
>> > > > released already as part of the last 4.0.0-alpha1 release[1]
>> > > >
>> > > > We should rename it in the Jira to avoid issues with building
>> Release
>> > > > Notes for our next 4.0.0 release
>> > > >
>> > > > A si

Re: [Draft] Board report for October for Apache Hive

2022-10-21 Thread Stamatis Zampetakis
Thanks for putting this together, Naveen!

The reports usually cover the whole quarter [1], so when describing
membership data, project activity, etc., we shouldn't restrict the
information to the last month.

Note that the membership and project activity sections are somewhat
incomplete. Please check the instructions in [1], focusing on the following
two questions:

* When did the project last make any releases? [REQUIRED]
* When were the newest committers or PMC members elected? [REQUIRED]

From my perspective, it is a bit worrisome that there have been no new
committers/PMC members in the last 8 months.
The board may also raise this as a comment or question, so the PMC should
probably look into it.

Best,
Stamatis

[1] https://www.apache.org/foundation/board/reporting

On Thu, Oct 20, 2022 at 6:55 AM Naveen Gangam 
wrote:

> Thanks Ayush for the review and the pointers. I wasn't aware of these
> statistics (still finding my way around).
>
> According to the Health report, Hive has a health score of 6.33. Compared
> to last quarter, activity is down, but activity in the preceding 4 weeks
> is higher than in the 4 weeks prior to that.
> Community Health Score (Chi): 6.33 (Healthy)
> 
>
> Here is a revised report. (sorry about the font color)
> ## Description:
>
> The Apache Hive ™ data warehouse software facilitates reading, writing, and
> managing large datasets residing in distributed storage (Apache Hadoop)
> using SQL.
>
> ## Issues:
>
> No issues requiring board attention this time.
>
> ## Membership Data:
>
> Apache Hive was founded 2010-09-21 (~12 years ago)
>
> There are currently 104 committers and 52 PMC members in this project.
>
> The Committer-to-PMC ratio is roughly 2:1.
>
> Community changes, past month:
>
> No changes
>
> ## Project activity
>
> Planning of the release criteria for 4.0.0 GA is underway. The master
> branch is now versioned 4.0.0 (previously alpha2).
>
> Jira activity:
>
> In the trailing 31 days, 109 jiras have been opened, 30 of which have
> been FIXED. A total of 68 jiras have been closed/resolved and a total of
> 59 jiras have been FIXED.
>
> ## Community Health:
>
> Community Health Score (Chi): 6.33 (Healthy)
> 
>
> Community activity is relatively healthy based on engagement. Compared
> to last quarter, overall activity (jira/github/dev lists) is down, but
> activity in the preceding 4 weeks is higher than in the 4 weeks prior to
> that.
>
>
>
>
> On Wed, Oct 19, 2022 at 11:11 PM Ayush Saxena  wrote:
>
> > +1, Thanx Naveen for driving this. Looks good!!!
> >
> > I guess the community health section is there only as a heading, with
> > nothing below it. Better to write a line such as 'everything is good', like
> > previous reports [1], or maybe you can derive some pointers from
> > reporter.a.o [2] for it.
> >
> > [1] https://whimsy.apache.org/board/minutes/Hive.html
> > [2] https://reporter.apache.org/wizard/statistics?hive
> >
> > -Ayush
> >
> > On Thu, 20 Oct 2022 at 08:28, Naveen Gangam
> > wrote:
> >
> > > Please review and provide any feedback.
> > >
> > > October 2022
> > >
> > > ## Description:
> > >
> > > The Apache Hive ™ data warehouse software facilitates reading, writing,
> > > and managing large datasets residing in distributed storage (Apache
> > > Hadoop) using SQL.
> > >
> > > ## Issues:
> > >
> > > No issues requiring board attention this time.
> > >
> > > ## Membership Data:
> > >
> > > Apache Hive was founded 2010-09-21 (~12 years ago)
> > >
> > > There are currently 104 committers and 52 PMC members in this project.
> > >
> > > The Committer-to-PMC ratio is roughly 3:1.
> > >
> > > Community changes, past month:
> > >
> > > No changes
> > >
> > > ## Project activity
> > >
> > > Planning of the release criteria for 4.0.0 GA is underway. The master
> > > branch is now versioned 4.0.0 (previously alpha2).
> > >
> > > Jira activity:
> > >
> > > In the trailing 31 days, 109 jiras have been opened, 30 of which have
> > > been FIXED. A total of 68 jiras have been closed/resolved and a total
> > > of 59 jiras have been FIXED.
> > >
> > > ## Community Health:
> > >
> >
>


Re: Consider using bi-directional links in JIRA

2022-10-21 Thread Stamatis Zampetakis
I added a few sentences about this in JIRA Guidelines [1].
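
As a rough illustration of the guideline, mentions that still lack an explicit link can be spotted mechanically. A minimal Python sketch, assuming the ticket text and the list of already linked keys have been fetched separately (by hand or through the JIRA REST API); `mentioned_tickets` and `missing_links` are hypothetical helpers, and only the `HIVE-<number>` key format comes from this thread.

```python
import re
from typing import Iterable, List

# Matches JIRA keys such as HIVE-26643 (the project key is an assumption).
MENTION = re.compile(r"\bHIVE-\d+\b")


def mentioned_tickets(current: str, texts: Iterable[str]) -> List[str]:
    """HIVE-X keys mentioned in summary/description/comments, excluding the
    current ticket, in order of first appearance."""
    seen: List[str] = []
    for text in texts:
        for key in MENTION.findall(text):
            if key != current and key not in seen:
                seen.append(key)
    return seen


def missing_links(current: str, texts: Iterable[str], linked: Iterable[str]) -> List[str]:
    """Mentioned tickets that do not yet have an explicit issue link."""
    already = set(linked)
    return [key for key in mentioned_tickets(current, texts) if key not in already]
```

Adding an explicit link for each key returned by `missing_links` makes the relationship visible from the older ticket as well.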

Best,
Stamatis

[1] https://cwiki.apache.org/confluence/display/Hive/HowToContribute

On Thu, Oct 20, 2022 at 4:56 AM Naveen Gangam  wrote:

> +1. I find this very useful to know the dependencies/relationships. Thank
> you for bringing this up.
>
> On Fri, Oct 14, 2022 at 5:06 AM Stamatis Zampetakis 
> wrote:
>
>> Hi all,
>>
>> This is a small tip/reminder for everyone using JIRA.
>>
>> It is very common and convenient to refer to other tickets by adding the
>> HIVE-X pattern in summary, description, and comments.
>>
>> The pattern allows someone to navigate quickly to an older JIRA from the
>> current one but not the other way around.
>>
>> Ideally, along with the mention (HIVE-X) pattern, it helps to add an
>> explicit link (relates to, causes, depends upon, etc.) so that the
>> relationship between tickets is visible from both ends.
>>
>> This is especially useful when reporting a regression/breaking change
>> caused by a past commit, but it helps in other cases as well.
>>
>> Best,
>> Stamatis
>>
>


Re: [VOTE] Apache Hive 4.0.0-alpha-2 Release Candidate 0

2022-10-28 Thread Stamatis Zampetakis
-1 (non-binding)

Ubuntu 20.04.5 LTS, java version "1.8.0_261", Apache Maven 3.6.3

* Verified signatures and checksums OK
* Checked diff between git repo and release sources (diff -qr hive-git
hive-src) KO (among others, *.iml files are present in the release sources
but not in git)
* Checked LICENSE, NOTICE, and README.md files OK
* Built from release sources (mvn clean install -DskipTests -Pitests) OK
* Packaged binaries from release sources (mvn clean package -DskipTests) OK
* Built from git tag (mvn clean install -DskipTests -Pitests) OK
* Ran smoke tests on pseudo cluster using hive-dev-box [1] OK
* Spot-checked maven artifacts for general structure, LICENSE, NOTICE,
META-INF content KO (NOTICE file in hive-exec-4.0.0-alpha-2.jar has an
outdated 2020 copyright)
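
The `diff -qr` check above boils down to listing files that exist on one side only. A minimal Python sketch of that idea using the standard `filecmp` module; `only_in` and `unexpected_files` are hypothetical helpers, and the `.iml` filter simply reflects the IntelliJ files found in this RC. Unlike `diff -qr`, it does not report files whose contents differ. `filecmp.dircmp` ignores `.git` and a few other VCS directories by default, so a plain git checkout can be compared directly.

```python
import filecmp
import os
from typing import List, Tuple


def only_in(left: str, right: str) -> List[str]:
    """Paths (relative to `left`) present under `left` but missing under `right`,
    comparable to the 'Only in <left>' lines printed by `diff -qr left right`."""
    found: List[str] = []

    def walk(cmp: filecmp.dircmp, prefix: str) -> None:
        for name in cmp.left_only:
            found.append(os.path.join(prefix, name))
        # subdirs maps each common subdirectory name to its own dircmp object
        for sub_name, sub_cmp in cmp.subdirs.items():
            walk(sub_cmp, os.path.join(prefix, sub_name))

    walk(filecmp.dircmp(left, right), "")
    return sorted(found)


def unexpected_files(release_src: str, git_checkout: str,
                     suffixes: Tuple[str, ...] = (".iml",)) -> List[str]:
    """Files shipped in the release sources but absent from git, filtered by suffix."""
    return [p for p in only_in(release_src, git_checkout) if p.endswith(suffixes)]
```

Running `unexpected_files("hive-src", "hive-git")` on the two trees from the checklist would list the stray project files.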

Smoke tests included:
* Derby metastore initialization
* simple CREATE TABLE statements
* basic INSERT INTO VALUES statements
* basic SELECT statements with simple INNER JOIN, WHERE, and GROUP BY
variations
* EXPLAIN statement variations
* ANALYZE TABLE variations

The negative vote is due to the spurious *.iml (IntelliJ project) files
present in the release sources and the outdated NOTICE file in the maven
artifacts.

Also, javadoc artifacts are missing from the maven staging repo. Checking
previous releases, it seems they were missing there as well, so this is
not blocking, but it may be worth fixing for the next release.

Best,
Stamatis

[1] https://lists.apache.org/thread/7yqs7o6ncpottqx8txt0dtt9858ypsbb
https://repository.apache.org/content/repositories/orgapachehive-1117/org/apache/hive/hive-exec/4.0.0-alpha-2/hive-exec-4.0.0-alpha-2.jar

On Fri, Oct 28, 2022 at 10:32 AM Ayush Saxena  wrote:

> +1 (non-binding)
> * Built from source.
> * Verified Checksums.
> * Verified Signatures
> * Ran some basic unit tests.
> * Ran some basic ACID & Iceberg related queries with Tez.
> * Skimmed through the Maven Artifacts, Looks Good.
>
> Thanx Denys for driving the release. Good Luck!!!
>
> -Ayush
>
> On Fri, 28 Oct 2022 at 13:46, Denys Kuzmenko
> wrote:
>
> > Extending voting for 24hr. 1 more +1 is needed from the PMC to promote
> the
> > release.
> > If not given, I'll be closing this vote as unsuccessful.
> >
> > On Thu, Oct 27, 2022 at 11:16 PM Chris Nauroth 
> > wrote:
> >
> > > +1 (non-binding)
> > >
> > > * Verified all checksums.
> > > * Verified all signatures.
> > > * Built from source.
> > > * mvn clean install -Piceberg -DskipTests
> > > * Tests passed.
> > > * mvn --fail-never clean verify -Piceberg -Pitests
> > > -Dmaven.test.jvm.args='-Xmx2048m -DJETTY_AVAILABLE_PROCESSORS=4'
> > >
> > > I figured out why my test runs were failing in HTTP server
> > initialization.
> > > Jetty enforces thread leasing to warn or abort if there aren't enough
> > > threads available [1]. During startup, it attempts to lease a thread
> per
> > > NIO selector [2]. By default, the number of NIO selectors to use is
> > > determined based on available CPUs [3]. This is mostly a passthrough to
> > > Runtime.availableProcessors() [4]. In my case, running on a machine
> with
> > 16
> > > CPUs, this ended up creating more than 4 selectors, therefore requiring
> > > more than 4 threads and violating the lease check. I was able to work
> > > around this by passing the JETTY_AVAILABLE_PROCESSORS system property
> to
> > > constrain the number of CPUs available to Jetty.
> > >
> > > If we are intentionally constraining the pool to 4 threads during
> itests,
> > > then would it also make sense to limit JETTY_AVAILABLE_PROCESSORS in
> > > maven.test.jvm.args of the root pom.xml, so that others don't run into
> > this
> > > problem later? If so, I'll send a pull request.
> > >
> > > [1]
> > >
> > >
> >
> https://github.com/eclipse/jetty.project/blob/jetty-9.4.40.v20210413/jetty-util/src/main/java/org/eclipse/jetty/util/thread/ThreadPoolBudget.java#L165
> > > [2]
> > >
> > >
> >
> https://github.com/eclipse/jetty.project/blob/jetty-9.4.40.v20210413/jetty-io/src/main/java/org/eclipse/jetty/io/SelectorManager.java#L255
> > > [3]
> > >
> > >
> >
> https://github.com/eclipse/jetty.project/blob/jetty-9.4.40.v20210413/jetty-io/src/main/java/org/eclipse/jetty/io/SelectorManager.java#L79
> > > [4]
> > >
> > >
> >
> https://github.com/eclipse/jetty.project/blob/jetty-9.4.40.v20210413/jetty-util/src/main/java/org/eclipse/jetty/util/ProcessorUtils.java#L45
> > >
> > > Chris Nauroth
> > >
> > >
> > > On Thu, Oct 27, 2022 at 1:18 PM Alessandro Solimando <
> > > alessandro.solima...@gmail.com> wrote:
> > >
> > > > You are right Ayush, I got sidetracked by the release notes (*
> > > [HIVE-19217]
> > > > - Upgrade to Hadoop 3.1.0) and I did not check the versions in the
> pom
> > > > file, apologies for the false alarm but better safe than sorry.
> > > >
> > > > With the right versions in place (Hadoop 3.3.1 and Tez 0.10.2), tests
> > > > including select, join, groupby, orderby, explain (ast, cbo, cbo
> cost,
> > > > vectorization) are working correctly, against data in ORC and parquet
> > > >

Re: Updating Wiki about Hikari Configuration Properties

2022-10-28 Thread Stamatis Zampetakis
Hive PMC members can grant edit rights to the wiki.

@Naveen, Denys, Adam: Can one of you please give write privileges to Chris?

Best,
Stamatis

On Fri, Oct 28, 2022 at 8:41 AM Chris Nauroth  wrote:

> Hi everyone,
>
> Regarding this page:
>
>
> https://cwiki.apache.org/confluence/display/hive/configuration+properties#ConfigurationProperties-HiveMetastoreConnectionPoolingConfiguration
>
> It states that the metastore's Hikari connection pool can be configured by
> specifying properties prefixed as "hikari". This is not quite correct. In
> HIVE-17317, there was a bug fix made to the Hikari integration such that
> the proper prefix is "hikaricp". For example:
>
>   <property>
>     <name>hikaricp.minimumIdle</name>
>     <value>4</value>
>     <final>false</final>
>     <source>Dataproc Cluster Properties</source>
>   </property>
>
> Could you please grant access to me (cnaur...@apache.org) to update the
> page? If you prefer not to grant access, could a Hive committer make the
> change for me?
>
> BTW, the reason I discovered this is that I recently upgraded a cluster
> from Hive 2.x (default BoneCP) to Hive 3.x (default HikariCP). After the
> upgrade, I found that HiveMetaStore was generating far more database
> connections at baseline, putting extra burden on the database. It appears
> that BoneCP default behavior (4 idle connections) is different from
> HikariCP default behavior (idle connections equal to max connections which
> is 10). This put me down the path of wanting to control Hikari's
> minimumIdle setting and then finding this discrepancy in the documentation.
>
> Passing on this information in case others are seeing unusually high
> connection counts after an upgrade to 3.x.
>
> Chris Nauroth
>
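
To make the prefix discrepancy concrete, here is a minimal Python sketch (not Hive's actual implementation) of prefix-based property forwarding as Chris describes it: only keys starting with `hikaricp.` are handed to the pool, with the prefix stripped, so a key written with the `hikari.` prefix from the current wiki text would not be forwarded. The helper name and sample keys are illustrative assumptions.

```python
from typing import Dict

# Prefix the metastore is described as using after HIVE-17317.
HIKARI_PREFIX = "hikaricp."


def hikari_properties(conf: Dict[str, str]) -> Dict[str, str]:
    """Select settings meant for HikariCP and strip the prefix,
    e.g. 'hikaricp.minimumIdle' -> 'minimumIdle'. Keys with any other
    prefix (including the bare 'hikari.') are left out."""
    return {
        key[len(HIKARI_PREFIX):]: value
        for key, value in conf.items()
        if key.startswith(HIKARI_PREFIX)
    }
```

With this kind of filtering, `hikari.minimumIdle` would be ignored and the pool would fall back to HikariCP's default, where the number of idle connections matches the maximum pool size (10).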


Re: [VOTE] Apache Hive 4.0.0-alpha-2 Release Candidate 0

2022-10-28 Thread Stamatis Zampetakis
I think that having a proper NOTICE file in jars is important to comply
with the ASF release policy:
* https://www.apache.org/legal/release-policy.html#licensing-documentation
* https://www.apache.org/legal/src-headers.html#notice
* https://www.apache.org/legal/src-headers.html#faq-binaries
The fact that the NOTICE wasn't updated in alpha-1 is most likely an
oversight.

Having said that, the final decision is up to the release manager.

Best,
Stamatis

On Fri, Oct 28, 2022 at 1:57 PM Denys Kuzmenko
 wrote:

> Hi Stamatis,
>
> My bad, sorry. Removed the ".iml" files and updated the release artifacts.
> *NO CODE CHANGES*
> I was following the alpha-1 release and the NOTICE wasn't updated there as
> well. I don't think that should be a blocker. Noted that + javadoc
> artifacts for the new RC.
>
> fc7908f40ec854671c6795acb525649d83c071d70cf62961dc90a251a0f45e47
>  apache-hive-4.0.0-alpha-2-bin.tar.gz
> f2814aadeca56ad1d8d9f7797b99d1670f6450f68ff6cae829384c9c102cd7a9
>  apache-hive-4.0.0-alpha-2-src.tar.gz
>
> Thanks,
> Denys
>
> On Fri, Oct 28, 2022 at 12:28 PM Stamatis Zampetakis 
> wrote:
>
> > -1 (non-binding)
> >
> > Ubuntu 20.04.5 LTS, java version "1.8.0_261", Apache Maven 3.6.3
> >
> > * Verified signatures and checksums OK
> > * Checked diff between git repo and release sources (diff -qr hive-git
> > hive-src) KO (among other *.iml files present in release sources but not
> in
> > git)
> > * Checked LICENSE, NOTICE, and README.md file OK
> > * Built from release sources (mvn clean install -DskipTests -Pitests) OK
> > * Package binaries from release sources (mvn clean package -DskipTests)
> OK
> > * Built from git tag (mvn clean install -DskipTests -Pitests) OK
> > * Run smoke tests on pseudo cluster using hive-dev-box [1] OK
> > * Spot check maven artifacts for general structure, LICENSE, NOTICE,
> > META-INF content KO (NOTICE file in hive-exec-4.0.0-alpha-2.jar has
> > copyright for 2020)
> >
> > Smoke tests included: * Derby metastore initialization * simple CREATE
> > TABLE statements; * basic INSERT INTO VALUES statements; * basic SELECT
> > statements with simple INNER JOIN, WHERE, and GROUP BY variations; *
> > EXPLAIN statement variations; * ANALYZE TABLE variations;
> >
> > The negative vote is for the spurious *.iml (IntelliJ project) files
> > present in the release sources and the outdated NOTICE file in maven
> > artifacts).
> >
> > Also javadoc artifacts are missing from maven staging repo. I checked
> > previous releases and it seems that they were not there as well so this
> is
> > not blocking but may be worth fixing for the next release.
> >
> > Best,
> > Stamatis
> >
> > [1] https://lists.apache.org/thread/7yqs7o6ncpottqx8txt0dtt9858ypsbb
> >
> >
> https://repository.apache.org/content/repositories/orgapachehive-1117/org/apache/hive/hive-exec/4.0.0-alpha-2/hive-exec-4.0.0-alpha-2.jar
> >
> > On Fri, Oct 28, 2022 at 10:32 AM Ayush Saxena 
> wrote:
> >
> > > +1 (non-binding)
> > > * Built from source.
> > > * Verified Checksums.
> > > * Verified Signatures
> > > * Ran some basic unit tests.
> > > * Ran some basic ACID & Iceberg related queries with Tez.
> > > * Skimmed through the Maven Artifacts, Looks Good.
> > >
> > > Thanx Denys for driving the release. Good Luck!!!
> > >
> > > -Ayush
> > >
> > > On Fri, 28 Oct 2022 at 13:46, Denys Kuzmenko
> > > wrote:
> > >
> > > > Extending voting for 24hr. 1 more +1 is needed from the PMC to
> promote
> > > the
> > > > release.
> > > > If not given, I'll be closing this vote as unsuccessful.
> > > >
> > > > On Thu, Oct 27, 2022 at 11:16 PM Chris Nauroth 
> > > > wrote:
> > > >
> > > > > +1 (non-binding)
> > > > >
> > > > > * Verified all checksums.
> > > > > * Verified all signatures.
> > > > > * Built from source.
> > > > > * mvn clean install -Piceberg -DskipTests
> > > > > * Tests passed.
> > > > > * mvn --fail-never clean verify -Piceberg -Pitests
> > > > > -Dmaven.test.jvm.args='-Xmx2048m -DJETTY_AVAILABLE_PROCESSORS=4'
> > > > >
> > > > > I figured out why my test runs were failing in HTTP server
> > > > initialization.
> > > > > Jetty enforces thread leasing to warn or abort if there aren't
> enough
> > > > >

Re: [VOTE] Apache Hive 4.0.0-alpha-2 Release Candidate 0

2022-10-31 Thread Stamatis Zampetakis
Thanks for pushing this forward, Denys.

A few general comments regarding the procedure.

Every time the artifacts, sources, hashes, or anything else significant
changes, previously cast votes are invalidated. It is usually easier to keep
track of this by cancelling the RC and starting a new one.

For traceability reasons, it is also helpful to send an explicit
[CANCEL][VOTE] email [1, 2, 3, 4] when the vote is unsuccessful (preferably
as a new thread).

It is not strictly necessary to cancel the vote after 72h.
The ASF release policy [5] requires the vote to be open for at least 72h
but does not specify when exactly it should be closed after that.
It is usually up to the release manager to decide and to extend the
duration if necessary.
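
The approval rule itself is small enough to write down. A minimal Python sketch of the majority-approval check under one reading of the ASF release policy [5]: the vote stays open for at least 72 hours, needs at least three binding (PMC) +1 votes, and needs more binding +1 than binding -1 votes, with a -1 not acting as a veto. The `Vote` type and `release_vote_passes` are hypothetical, and the sketch does not model votes being invalidated when artifacts change.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List

MIN_VOTING_PERIOD = timedelta(hours=72)


@dataclass
class Vote:
    voter: str
    value: int      # +1, 0 or -1
    binding: bool   # True when cast by a PMC member


def release_vote_passes(votes: List[Vote], opened: datetime, closed: datetime) -> bool:
    """Majority approval: open for at least 72h, at least three binding +1
    votes, and more binding +1 than binding -1 votes."""
    if closed - opened < MIN_VOTING_PERIOD:
        return False
    latest: Dict[str, Vote] = {}
    for vote in votes:  # a later vote from the same person replaces the earlier one
        latest[vote.voter] = vote
    binding = [v for v in latest.values() if v.binding]
    plus = sum(1 for v in binding if v.value > 0)
    minus = sum(1 for v in binding if v.value < 0)
    return plus >= 3 and plus > minus
```

Under this rule, non-binding +1 votes are welcome feedback but cannot replace a missing third binding +1.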

Best,
Stamatis

[1] https://lists.apache.org/thread/0501xbk1hvb46gy0w8ts6g5ttw7crssl
[2] https://lists.apache.org/thread/qzt7mgxjloh4841pvcdoz707bfxd4wk2
[3] https://lists.apache.org/thread/64foh1w7xwv9vs8m8grb23sc9f8h2bct
[4] https://lists.apache.org/thread/t09zfwfbjzon9hdv11smyyfydgx0m8zg
[5] https://www.apache.org/legal/release-policy.html#release-approval

On Sat, Oct 29, 2022, 8:32 PM Denys Kuzmenko 
wrote:

> Hi team,
>
> Thank you for taking time to verify this RC!
> Unfortunately, we didn't get enough votes to go ahead with the release.
>
> Closing this vote as unsuccessful.
>
> Kind regards,
> Denys
>
> On Fri, Oct 28, 2022, 15:56 Stamatis Zampetakis  wrote:
>
> > I think that having a proper NOTICE file in jars is important to comply
> > with the ASF release policy:
> > *
> https://www.apache.org/legal/release-policy.html#licensing-documentation
> > * https://www.apache.org/legal/src-headers.html#notice
> > * https://www.apache.org/legal/src-headers.html#faq-binaries
> > The fact that the NOTICE wasn't updated in alpha-1 is most likely an
> > oversight.
> >
> > Having said that the final decision is up to the release manager.
> >
> > Best,
> > Stamatis
> >
> > On Fri, Oct 28, 2022 at 1:57 PM Denys Kuzmenko
> >  wrote:
> >
> > > Hi Stamatis,
> > >
> > > My bad, sorry. Removed the ".iml" files and updated the release
> > > artifacts.
> > > *NO CODE CHANGES*
> > > I was following the alpha-1 release and the NOTICE wasn't updated there
> > as
> > > well. I don't think that should be a blocker. Noted that + javadoc
> > > artifacts for the new RC.
> > >
> > > fc7908f40ec854671c6795acb525649d83c071d70cf62961dc90a251a0f45e47
> > >  apache-hive-4.0.0-alpha-2-bin.tar.gz
> > > f2814aadeca56ad1d8d9f7797b99d1670f6450f68ff6cae829384c9c102cd7a9
> > >  apache-hive-4.0.0-alpha-2-src.tar.gz
> > >
> > > Thanks,
> > > Denys
> > >
> > > On Fri, Oct 28, 2022 at 12:28 PM Stamatis Zampetakis <
> zabe...@gmail.com>
> > > wrote:
> > >
> > > > -1 (non-binding)
> > > >
> > > > Ubuntu 20.04.5 LTS, java version "1.8.0_261", Apache Maven 3.6.3
> > > >
> > > > * Verified signatures and checksums OK
> > > > * Checked diff between git repo and release sources (diff -qr
> hive-git
> > > > hive-src) KO (among other *.iml files present in release sources but
> > not
> > > in
> > > > git)
> > > > * Checked LICENSE, NOTICE, and README.md file OK
> > > > * Built from release sources (mvn clean install -DskipTests -Pitests)
> > OK
> > > > * Package binaries from release sources (mvn clean package
> -DskipTests)
> > > OK
> > > > * Built from git tag (mvn clean install -DskipTests -Pitests) OK
> > > > * Run smoke tests on pseudo cluster using hive-dev-box [1] OK
> > > > * Spot check maven artifacts for general structure, LICENSE, NOTICE,
> > > > META-INF content KO (NOTICE file in hive-exec-4.0.0-alpha-2.jar has
> > > > copyright for 2020)
> > > >
> > > > Smoke tests included: * Derby metastore initialization * simple
> CREATE
> > > > TABLE statements; * basic INSERT INTO VALUES statements; * basic
> SELECT
> > > > statements with simple INNER JOIN, WHERE, and GROUP BY variations; *
> > > > EXPLAIN statement variations; * ANALYZE TABLE variations;
> > > >
> > > > The negative vote is for the spurious *.iml (IntelliJ project) files
> > > > present in the release sources and the outdated NOTICE file in maven
> > > > artifacts).
> > > >
> > > > Also javadoc artifacts are missing from maven staging repo. I checked
> > > > previous releases and it seems that they were not there a

Re: [EXTERNAL] Re: Proposal : New Release 3.2.0 | Fixing CVE's and Bugs on apache hive branch-3

2022-11-04 Thread Stamatis Zampetakis
Hey everyone,

It would be nice to have a new release from branch-3, although getting it
out might not be trivial.

It will definitely require some investment from multiple people, including
the PMC members and committers of the project. Note that the last vote for
alpha2 was unsuccessful due to a lack of votes, which shows that people are
pretty busy.

Personally, I support this effort and would like to see it happen, but at
the moment I don't have enough time to help with reviews and commits for
the 3.x line.

Best,
Stamatis

On Fri, Nov 4, 2022, 5:28 AM Aman Raj  wrote:

> Hi Chris,
>
> I plan on going through this diff and making a comprehensive list of all
> the major bug fixes that went into branch-3 and not in hive-313. This will
> be included in the umbrella JIRA that I am creating.
>
> In this email thread I have only mentioned CVEs and upgrades that will go
> on top of these changes in branch-3.
> Thanks,
> Aman.
>
> 
> From: Chris Nauroth 
> Sent: Friday, November 4, 2022 3:44 AM
> To: dev@hive.apache.org 
> Subject: Re: [EXTERNAL] Re: Proposal : New Release 3.2.0 | Fixing CVE's
> and Bugs on apache hive branch-3
>
> I noticed that there is a pretty large delta (256 commits) between release
> 3.1.3 and the current branch-3:
>
> > git log --oneline rel/release-3.1.3..upstream-branch-3 | wc
> 256    4208   33558
>
> I just wanted to mention that a release from branch-3 would include far
> more than what we are cataloging on this mail thread.
>
> Chris Nauroth
>
>
> On Thu, Nov 3, 2022 at 12:16 PM Pravin Sinha 
> wrote:
>
> > +1,
> >
> > Thanks for driving this, Aman. Apart from CVE fixes, do you have a list
> of
> > JIRAs to be targeted?
> >
> > -Pravin
> >
> > On Thu, Nov 3, 2022 at 11:12 PM Chris Nauroth 
> wrote:
> >
> > > Thank you for driving this!
> > >
> > > To kick things off, I have filed HIVE-26702 for a backport of
> HIVE-17315
> > (a
> > > total of 5 sub-tasks/patches) to 3.2.0. This adds support for more
> > flexible
> > > configuration of the metastore's database connection pooling.
> Dataproc's
> > > distribution has been running this in production backported onto
> release
> > > 3.1.3, so I can provide the patches.
> > >
> > > May I assume that our intent is to keep 3.2.x backward-compatible with
> > > 3.1.x?
> > >
> > > Chris Nauroth
> > >
> > >
> > > On Thu, Nov 3, 2022 at 3:53 AM Sankar Hariappan
> > >  wrote:
> > >
> > > > +1, I'm excited to see the scope includes important upgrades and CVE
> > > fixes.
> > > > We should carefully port the relevant patches from master as code has
> > > been
> > > > heavily refactored. But it makes perfect sense to give another 3.x
> > > release
> > > > from Hive to keep the users delighted.
> > > > Thanks Aman for the initiative!
> > > >
> > > > Thanks,
> > > > Sankar
> > > >
> > > > -Original Message-
> > > > From: 张铎(Duo Zhang) 
> > > > Sent: Thursday, November 3, 2022 2:53 PM
> > > > To: dev@hive.apache.org
> > > > Subject: [EXTERNAL] Re: Proposal : New Release 3.2.0 | Fixing CVE's
> and
> > > > Bugs on apache hive branch-3
> > > >
> > > >
> > > > +1, and please include HIVE-24694...
> > > >
> > > > Thanks.
> > > >
> > > > On Thu, Nov 3, 2022 at 17:03, Aman Raj wrote:
> > > > >
> > > > > Hi team,
> > > > >
> > > > >
> > > > > We know that Hive 4.0.0 release is ongoing but considering the
> number
> > > of
> > > > changes going into the release, it will take some iterations to come
> up
> > > > with the stable version for the same. Meanwhile there are a lot of
> > issues
> > > > in Hive 3.1.3 which our customers have reported. In this scenario, it
> > > makes
> > > > sense to make a release from branch-3 which will have all the
> necessary
> > > > upgrades, bug and CVE fixes which are causing issues to the existing
> > > > customers. Also, Hive is still using Hadoop 3.1.0 whereas Spark 3.3
> has
> > > > already moved to Hadoop 3.3.1. Therefore, we need to do the same for
> > > hive.
> > > > >
> > > > >
> > > > >
> > > > > I will be happy to take the ownership of this new release and will
> be
> > > > creating JIRA's for all the fixes that will go on with this release.
> > > > >
> > > > >
> > > > >
> > > > > Therefore, I am proposing a new release cut out from branch-3. The
> > > > release version would be hive-3.2.0.
> > > > >
> > > > >
> > > > >
> > > > > This version will include major upgrades as:
> > > > >
> > > > >   1.  Hadoop version upgrade to 3.3.4
> > > > >   2.  Zookeeper version upgrade to 3.6.3
> > > > >   3.  Tez version upgrade to 0.10.2
> > > > >   4.  Calcite version upgrade to 1.25.0
> > > > >   5.  Orc version upgrade to 1.6.9
> > > > >
> > > > > This version will also include major CVE fixes as follows:
> > > > >
> > > > >   1.  NVD - CVE-2020-13949 (nist.gov)<
> > > >
> > >
> >
> https://nam06.safelinks.protection.outlook.com/?url=

Re: [VOTE] Apache Hive 4.0.0-alpha-2 Release Candidate 1

2022-11-11 Thread Stamatis Zampetakis
+1 (non-binding)

Ubuntu 20.04.5 LTS, java version "1.8.0_261", Apache Maven 3.6.3

* Verified signatures and checksums OK
* Checked diff between git repo and release sources (diff -qr hive-git
hive-src) OK
* Checked LICENSE, NOTICE, and README.md file OK
* Built from release sources (mvn clean install -DskipTests -Pitests) OK
* Packaged binaries from release sources (mvn clean package -DskipTests) OK
* Built from git tag (mvn clean install -DskipTests -Pitests) OK
* Ran smoke tests on pseudo cluster using hive-dev-box [1] OK
* Spot-checked maven artifacts for general structure, LICENSE, NOTICE,
META-INF content OK
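
The checksum part of the first check can also be scripted without `shasum`. A minimal Python sketch, assuming each `.sha256` file contains `<hex digest>  <file name>` lines (the layout `shasum -a 256 -c` accepts) and sits next to the artifact it describes; the helper names are hypothetical, and signatures still need to be checked separately with `gpg --verify`.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 digest of a file, read in chunks so large tarballs are not loaded at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum_file(checksum_file: Path) -> bool:
    """Check each '<hex digest>  <file name>' line, resolving file names
    relative to the checksum file's directory; prints OK/FAILED per file."""
    all_ok = True
    for line in checksum_file.read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.strip().lstrip("*")  # a leading '*' marks binary mode in shasum output
        matches = sha256_of(checksum_file.parent / name) == expected.lower()
        print(f"{name}: {'OK' if matches else 'FAILED'}")
        all_ok = all_ok and matches
    return all_ok
```

For example, `verify_checksum_file(Path("apache-hive-4.0.0-alpha-2-src.tar.gz.sha256"))` should report the same OK/FAILED result as the `shasum` command.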

Smoke tests included:
* Derby metastore initialization
* simple CREATE TABLE statements (TPCH Orders, Lineitem tables)
* basic LOAD DATA LOCAL statements
* basic SELECT statements with simple INNER JOIN, WHERE, and GROUP BY
variations
* EXPLAIN statement variations
* ANALYZE TABLE variations

While checking some of the maven artifacts, I noticed that
hive-exec-4.0.0-alpha-2.jar has two NOTICE files under META-INF (the extra
NOTICE.txt belongs to Apache Commons Lang). This is not blocking, but we
should probably check/fix it in the next release.
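
Both NOTICE problems seen in these votes (the extra NOTICE.txt here and the outdated 2020 copyright in RC0) can be spotted by inspecting the jar's META-INF entries. A minimal Python sketch using the standard `zipfile` module; it assumes NOTICE files sit directly under META-INF and that copyright lines look like `Copyright 2008-2020 ...`, and `notice_entries` and `latest_copyright_year` are hypothetical names.

```python
import re
import zipfile
from typing import List, Optional

# 'Copyright 2020', 'Copyright 2008-2020' or 'Copyright (c) 2020' style lines.
COPYRIGHT = re.compile(r"Copyright\s+(?:\(c\)\s*)?(\d{4})(?:\s*-\s*(\d{4}))?", re.IGNORECASE)


def notice_entries(jar_path: str) -> List[str]:
    """Entries directly under META-INF whose file name starts with NOTICE."""
    with zipfile.ZipFile(jar_path) as jar:
        return sorted(
            name for name in jar.namelist()
            if name.startswith("META-INF/")
            and name.count("/") == 1
            and name.split("/")[1].upper().startswith("NOTICE")
        )


def latest_copyright_year(notice_text: str) -> Optional[int]:
    """Largest year found in the NOTICE copyright lines, or None if there is none."""
    years = []
    for match in COPYRIGHT.finditer(notice_text):
        years.extend(int(y) for y in match.groups() if y)
    return max(years) if years else None
```

More than one entry from `notice_entries`, or a year from `latest_copyright_year` older than the release year, is a hint that the jar's legal files need a closer look.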

Best,
Stamatis

[1] https://lists.apache.org/thread/7yqs7o6ncpottqx8txt0dtt9858ypsbb

On Fri, Nov 11, 2022 at 5:51 PM Naveen Gangam 
wrote:

> Hi Denys,
> Thank you for publishing the release bits.
>
> *SIGNATURE VERIFICATION*
>
> gpg --verify apache-hive-4.0.0-alpha-2-bin.tar.gz.asc
> apache-hive-4.0.0-alpha-2-bin.tar.gz
>
> gpg: Signature made Mon Nov  7 13:04:05 2022 EST
>
> gpg:using RSA key 50606DE1BDBD5CF862A595A907C5682DAFC73125
>
> gpg:issuer "dkuzme...@apache.org"
>
> gpg: Good signature from "Denys Kuzmenko (CODE SIGNING KEY) <
> dkuzme...@apache.org>" [unknown]
>
> gpg: WARNING: The key's User ID is not certified with a trusted signature!
>
> gpg:  There is no indication that the signature belongs to the
> owner.
>
> Primary key fingerprint: 5060 6DE1 BDBD 5CF8 62A5  95A9 07C5 682D AFC7 3125
>
>
> $ gpg --verify apache-hive-4.0.0-alpha-2-src.tar.gz.asc
> apache-hive-4.0.0-alpha-2-src.tar.gz
>
> gpg: Signature made Mon Nov  7 13:04:25 2022 EST
>
> gpg:using RSA key 50606DE1BDBD5CF862A595A907C5682DAFC73125
>
> gpg:issuer "dkuzme...@apache.org"
>
> gpg: Good signature from "Denys Kuzmenko (CODE SIGNING KEY) <
> dkuzme...@apache.org>" [unknown]
>
> gpg: WARNING: The key's User ID is not certified with a trusted signature!
>
> gpg:  There is no indication that the signature belongs to the
> owner.
>
> Primary key fingerprint: 5060 6DE1 BDBD 5CF8 62A5  95A9 07C5 682D AFC7 3125
>
>
> shasum -a 256 -c apache-hive-4.0.0-alpha-2-src.tar.gz.sha256
>
> apache-hive-4.0.0-alpha-2-src.tar.gz: OK
>
>
> $ shasum -a 256 -c apache-hive-4.0.0-alpha-2-bin.tar.gz.sha256
>
> apache-hive-4.0.0-alpha-2-bin.tar.gz: OK
>
>
>
> *BUILD VERIFICATION:*
>
>
>- From the source attachment, I was able to build using "mvn clean
>install -DskipTests -Pitests"
>- I also build from the git tag created for the release.
>
>
> *CHECKIN TESTS --> I think these tests are flaky*
>
> [INFO] Running org.apache.hadoop.hive.metastore.client.TestCatalogs
>
> [WARNING] Tests run: 18, Failures: 0, Errors: 0, Skipped: 2, Time
> elapsed: 4.342 s - in
> org.apache.hadoop.hive.metastore.client.TestCatalogs
>
> [INFO] Running org.apache.hadoop.hive.metastore.TestMarkPartition
>
> [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed:
> 14.895 s - in org.apache.hadoop.hive.metastore.TestMarkPartition
>
> [INFO]
>
> [INFO] Results:
>
> [INFO]
>
> [ERROR] Errors:
>
> [ERROR]   TestMysql.install » Runtime Unable to start docker container
>
> [ERROR]   TestMysql.upgrade » Runtime Unable to start docker container
>
> [ERROR]   TestOracle.install » Runtime Unable to start docker container
>
> [ERROR]   TestOracle.upgrade » Runtime Failed to get docker logs
>
> [INFO]
>
> [ERROR] Tests run: 2259, Failures: 0, Errors: 4, Skipped: 5
>
> [INFO]
>
> [INFO] ------------------------------------------------------------------------
>
> [INFO] BUILD FAILURE
>
> [INFO] ------------------------------------------------------------------------
>
> [INFO] Total time:  25:20 min
>
> [INFO] Finished at: 2022-11-10T15:40:36-05:00
>
> [INFO] ------------------------------------------------------------------------
>
>
> *RUNTIME*
>
> Started services from the binaries published (using local hadoop 3.1.0)
>
>
>- Installed schema for derby
>- started HS2 + HMS
>- Ran queries from beeline (DDL and DML)
>- Explain queries
>- CTAS queries.
>
>
> +1 for me.
>
>
> On Mon, Nov 7, 2022 at 2:00 PM Denys Kuzmenko
>  wrote:
>
> > UPD: Voting will conclude in 1 week (Monday 14th).
> >
> >
> > On Mon, Nov 7, 2022 at 7:57 PM Denys Kuzmenko 
> > wrote:
> >
> > > Hi team,
> > >
> > > Let's give it 1 more chance.
> > >
> > > Apache Hive 4.0.0-alpha-2 Release 
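
The `shasum -a 256 -c` step in the verification report above compares each release artifact with the digest published next to it. As a rough illustration only, that check can be sketched in Python, assuming each `.sha256` file uses the `<hex digest>  <file name>` layout that `shasum -c` reads. The function names are hypothetical, this is not part of the Hive release tooling, and GPG signature checking is not covered.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large release tarballs need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_sha256_file(checksum_file):
    """Roughly mimic `shasum -a 256 -c`: return {file name: "OK" or "FAILED"}.

    Assumption: every non-empty line reads '<hex digest>  <file name>', and the
    file name is resolved relative to the directory holding the checksum file.
    """
    checksum_file = Path(checksum_file)
    results = {}
    for line in checksum_file.read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.strip().lstrip("*")  # '*' marks binary mode in shasum output
        actual = sha256_of(checksum_file.parent / name)
        results[name] = "OK" if actual == expected.lower() else "FAILED"
    return results
```

For example, `check_sha256_file("apache-hive-4.0.0-alpha-2-src.tar.gz.sha256")` should map the tarball name to `"OK"` when the download is intact, assuming the checksum file follows the layout above.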

[DISCUSS] Jira Public Signup Disabled

2022-11-15 Thread Stamatis Zampetakis
Hi everyone,

Due to the large number of spam accounts being created, the ASF INFRA team has
disabled JIRA account creation [1].

From the 11th of November, contributors who wish to have a JIRA account (to
create, assign, watch, etc. issues) will need to request an account through
an ASF PMC.

Other projects, such as Calcite, have already taken the necessary actions
to streamline the process for new contributors [2].

I would suggest drawing inspiration from Calcite and taking similar actions
in Hive.

If you all agree, we can start by creating a dedicated (private) mailing
list for such requests:
jira-reque...@hive.apache.org

and then proceed with a brief documentation of the process in the wiki or
website.

What do you think?

Best,
Stamatis

[1] https://blogs.apache.org/infra/entry/jira-public-signup-disabled
[2] https://lists.apache.org/thread/5odg6wyvwfkryk96ls2w3vxnrkftw50s


Re: [DISCUSS] Jira Public Signup Disabled

2022-11-15 Thread Stamatis Zampetakis
Logged https://issues.apache.org/jira/browse/INFRA-23905 for the creation
of the new mailing list.

On Tue, Nov 15, 2022 at 9:57 PM Abhay Chennagiri 
wrote:

> +1, Thank you, Stamatis.
>
> On Tue, Nov 15, 2022 at 12:42 PM Pravin Sinha 
> wrote:
>
>> +1, Thanks, Stamatis.
>>
>> -Pravin


Re: [EXTERNAL] Re: Proposal : New Release 3.2.0 | Fixing CVE's and Bugs on apache hive branch-3

2022-11-17 Thread Stamatis Zampetakis
Hey Aman,

Thanks for pushing this forward.

I saw you updated the Fix Version field in a couple of JIRAs; please
revert these changes.

According to the Hive JIRA guidelines [1], the Fix Version should be set
when the fix is committed; you may want to use the Target Version field
instead. If you want to track progress, you may need to create new
tickets for the backports because existing tickets should not be reopened.

Best,
Stamatis

[1]
https://cwiki.apache.org/confluence/display/Hive/HowToContribute#HowToContribute-JIRAGuidelines

On Thu, Nov 17, 2022 at 8:35 AM Aman Raj 
wrote:

> Hi everyone,
>
> I have started categorizing tasks into 4 categories (created 4 subtasks on
> the umbrella JIRA for the same):
>
>   1.  CVE fixes
>   2.  Component Upgrades
>   3.  Bug fixes and Improvements on top of branch-3 commits.
>   4.  Differences between branch-3 and Hive-3.1.3 commits (There is no task
> involved in this as of now; it is just a way to track which commits went in
> after 3.1.3 in branch-3)
>
> I have created a new label (release-3.2.0) which can be used to create
> subtasks involving the backports of the JIRAs mentioned in these 4 Subtasks.
>
> I would welcome the community to add more JIRAs which match these
> categories and update the JIRA page as well.
>
> The parent JIRA fyi : https://issues.apache.org/jira/browse/HIVE-26748
>
> Thanks,
> Aman.
> 
> From: Aman Raj 
> Sent: Thursday, November 17, 2022 12:09 PM
> To: dev@hive.apache.org 
> Subject: Re: [EXTERNAL] Re: Proposal : New Release 3.2.0 | Fixing CVE's
> and Bugs on apache hive branch-3
>
> Hi everyone,
>
> I thank everyone who upvoted for this release and I am sure we will make
> it a success. As a start point, I have created an umbrella JIRA
> [HIVE-26748] Prepare for Hive 3.2.0 Release - ASF JIRA (apache.org)<
> https://issues.apache.org/jira/browse/HIVE-26748>
> where I will start adding the JIRAs that will be cherry picked as part of
> the 3.2.0 release. I have also included the suggestions given by the
> community till now in the email threads below.
>
> Please feel free to suggest any new and important bug fixes or features
> that can be included as part of this release.
>
> Thanks,
> Aman.
> 
> From: Naveen Gangam 
> Sent: Tuesday, November 8, 2022 7:49 PM
> To: dev@hive.apache.org 
> Subject: Re: [EXTERNAL] Re: Proposal : New Release 3.2.0 | Fixing CVE's
> and Bugs on apache hive branch-3
>
> Thank you Aman for volunteering to drive this. +1 for a release off
> branch-3. We can fix all the CVEs we have fixed on master.
>
> IMHO, the hadoop upgrade might be too big a task for this release. Last I
> checked, there were some pending items from this upgrade even on master.
> They may not be hard dependencies but if we are committing to this, might
> take a bit longer to finish the release.
>
> I started to build this Jira Board for the releases. The goal was to use
> this to track release items (for all releases) via the use of jira
> labels/target versions.
>
> https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=564
> At the top of this board, there are some quick filters for release blockers
> (jiras with labels "hive-4.0.0-must"). There are currently only 2 jiras
> tagged as blockers for 4.0.
>
> If you could tag the jiras for 3.2 release the same way, and add a quick
> filter, that would be great.
>
> Thank you again
> Naveen
>
> On Fri, Nov 4, 2022 at 7:01 AM Stamatis Zampetakis 
> wrote:
>
> > Hey everyone,
> >
> > It would be nice to have a new release from branch 3 although it might
> not
>

Re: Result of the TPC-DS benchmark using Iceberg,

2022-11-17 Thread Stamatis Zampetakis
Hi Sungwoo,

Many thanks for sharing your findings; interesting observations.

If possible, please also share the project versions that you used for running
the experiments.

Best,
Stamatis

On Tue, Nov 15, 2022 at 12:46 PM Sungwoo Park  wrote:

> Hello,
>
> I ran the TPC-DS benchmark using Metastore (in the traditional way) and
> Iceberg,
> and would like to share the result for those interested in Hive using
> Iceberg.
> The experiment used 1TB TPC-DS dataset stored as ORC.
>
> Here are a few findings.
>
> 1. Overall, Hive-Iceberg runs slightly faster than Hive-Metastore.
>
> 2. Some queries run much faster with Hive-Iceberg. Examples)
> query 14-1) Hive-Metastore: 61 seconds, Hive-Iceberg: 28 seconds
> query 78) Hive-Metastore: 141 seconds, Hive-Iceberg: 58 seconds
>
> 3. Some queries run much slower with Hive-Iceberg. Example)
> query 22: Hive-Metastore: 32 seconds, Hive-Iceberg: 356 seconds
> (The slow execution is due to InputInitializer generating only 4 tasks for
> the first Map vertex.)
>
> 4. Out of 99 queries, 98 queries return correct results, but query 64
> returns wrong results (returning 0 rows) due to an exception:
>
> org.apache.hadoop.mapred.InvalidInputException: Input path does not exist:
>
> hdfs://blue0:8020/tmp/hive/user/35d3bdd7-4fda-4f3d-818d-048ad6242072/hive_2022-11-14_15-26-21_045_8992557056967167667-1/-mr-10001/.hive-staging_hive_2022-11-14_15-26-21_045_8992557056967167667-1/-ext-10002
>
> --- Sungwoo
>
>
>
>


Re: [ANNOUNCE] Apache Hive 4.0.0-alpha-2 Released

2022-11-17 Thread Stamatis Zampetakis
Many thanks to everyone who made this release happen and especially Denys
for leading this effort!

Best,
Stamatis

On Wed, Nov 16, 2022 at 5:25 PM Denys Kuzmenko  wrote:

> The Apache Hive team is proud to announce the release of Apache Hive
> version 4.0.0-alpha-2
>
> The Apache Hive (TM) data warehouse software facilitates querying and
> managing large datasets residing in distributed storage. Built on top
> of Apache Hadoop (TM), it provides, among others:
>
> * Tools to enable easy data extract/transform/load (ETL)
>
> * A mechanism to impose structure on a variety of data formats
>
> * Access to files stored either directly in Apache HDFS (TM) or in other
>   data storage systems such as Apache HBase (TM)
>
> * Query execution via Apache Hadoop MapReduce, Apache Tez and Apache
> Spark frameworks.
>
> For Hive release details and downloads, please
> visit: https://hive.apache.org/downloads.html
>
> Hive 4.0.0-alpha-2 Release Notes are available here:
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12351489&styleName=Html&projectId=12310843
>
> We would like to thank the many contributors who made this release
> possible.
>
> Regards,
>
> The Apache Hive Team
>


Re: [DISCUSS] Jira Public Signup Disabled

2022-11-17 Thread Stamatis Zampetakis
The jira-reque...@hive.apache.org mailing list has been created, and I added
instructions on how to request a JIRA account to the wiki [1]. Feel free to
improve them as you see fit!

Best,
Stamatis

[1]
https://cwiki.apache.org/confluence/display/Hive/HowToContribute#HowToContribute-JIRA



[DISCUSS] Use "backward-incompatible" label in JIRA

2022-11-17 Thread Stamatis Zampetakis
Hi all,

In order to track changes that alter the behavior of an existing component
or break public APIs, we could adopt JIRA labels, as is done in
other projects.

There are various existing labels used for this purpose in JIRA:
* Breaking-Change,
* breaking-api,
* breaking,
* breaking_change,
* backwards-incompatible,
* backward-incompatible

The last one, "backward-incompatible", seems to be the most widely used
across projects and in Hive itself, so I would suggest sticking to that one
and using it consistently.

I just updated existing Hive tickets to use "backward-incompatible" and
removed the rest mentioned above. I also added a brief mention to the
existing guidelines [1].

If you prefer another one, that's perfectly fine as well.
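
The relabeling described above boils down to mapping every legacy variant onto the single canonical label. A minimal Python sketch of that mapping follows; it does not talk to JIRA, the helper names are hypothetical, and the JQL string is just one possible way to list the affected Hive tickets afterwards.

```python
# Label variants listed in this thread; all of them map to the canonical one.
CANONICAL_LABEL = "backward-incompatible"
LEGACY_LABELS = {
    "Breaking-Change",
    "breaking-api",
    "breaking",
    "breaking_change",
    "backwards-incompatible",
}


def normalize_labels(labels):
    """Replace legacy labels with the canonical one, keeping order and no duplicates."""
    normalized = []
    for label in labels:
        new_label = CANONICAL_LABEL if label in LEGACY_LABELS else label
        if new_label not in normalized:
            normalized.append(new_label)
    return normalized


def jql_for_label(label=CANONICAL_LABEL, project="HIVE"):
    """Build a JQL query listing a project's tickets that carry the given label."""
    return f'project = {project} AND labels = "{label}"'
```

For instance, `normalize_labels(["breaking", "performance", "breaking_change"])` returns `["backward-incompatible", "performance"]`: both legacy variants collapse into one canonical label and unrelated labels are left untouched.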

Best,
Stamatis

[1]
https://cwiki.apache.org/confluence/display/Hive/HowToContribute#HowToContribute-Guidelines


Re: Issue for discussion (slight change in behavior): HIVE-26683

2022-11-22 Thread Stamatis Zampetakis
Hello,

Regarding the inconsistency you describe in the window function, indeed it
seems to be a bug. However, I would double-check with the SQL standard
to be sure there is no intentional deviation and/or test the query in
other DBMSs.

As far as the behavior of the aggregate function SUM on string/varchar
types is concerned, the SQL standard forbids this operation (small extract
below).

10.9 <aggregate function>

Syntax Rules
5g) If SUM or AVG is specified, then:
i) DT shall be a numeric type or an interval type.

General Rules
6)d)v) If SUM is specified, then the result is the sum of the values in
TXA. If the sum is not within the
range of the declared type of the result, then an exception condition is
raised: data exception — numeric value out of range.

As you observed, Postgres is in line with the standard and forbids this
operation, but this is not the case for every DBMS. Note that Hive is closer
to MySQL than it is to Postgres, so in many cases it makes sense to use MySQL
as a reference.
Below, I outline the results on MySQL Community Server 8.0.27.

select sum('a') from tblstrcol;
+----------+
| sum('a') |
+----------+
|        0 |
+----------+

select sum('a') from tblstrcol where false;
+----------+
| sum('a') |
+----------+
|     NULL |
+----------+

When there are rows, the result of SUM is zero, and it is NULL when the result
set is empty; thus, I am a bit skeptical about changing the existing behavior.
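
To make the two MySQL results above concrete, here is a small Python model of SUM as an aggregate: NULLs are skipped, a non-numeric string is implicitly cast to 0, and the result is NULL (None) only when no non-NULL values remain. The cast helper is a rough approximation of MySQL's string-to-number conversion (MySQL also emits warnings, which are ignored here), and none of this is Hive or MySQL code. Under this model an all-NULL input yields NULL regardless of where the values come from, which is the consistent behavior proposed for the window-frame case.

```python
import re

# Leading numeric prefix of a string, e.g. "12abc" -> "12"; "a" has none.
_NUMERIC_PREFIX = re.compile(r"\s*[+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?")


def mysql_like_cast(value):
    """Approximate MySQL's implicit string-to-number cast (warnings ignored)."""
    if isinstance(value, (int, float)):
        return value
    match = _NUMERIC_PREFIX.match(value)
    return float(match.group(0)) if match else 0.0


def sql_sum(values, cast=mysql_like_cast):
    """Aggregate SUM: ignore NULLs (None); return NULL if nothing is left to add."""
    non_null = [cast(v) for v in values if v is not None]
    if not non_null:
        return None
    return sum(non_null)
```

Here `sql_sum(["a", "a"])` gives `0.0`, mirroring `select sum('a') from tblstrcol` on a non-empty table, while `sql_sum([])` gives `None`, mirroring the `where false` case.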

Best,
Stamatis


On Mon, Nov 21, 2022 at 3:53 PM Stephen Carlin  wrote:

> Wanted to throw this one out for discussion for a bug I found  and how to
> fix it...
>
> So we are inconsistent with how we handle sum() on windowing functions.
> If all the rows are null and the rows are all on "preceding" rows, we
> return NULL.  On "following" rows, however, if all the rows are null, we
> return 0.  This is inconsistent and I have a fix for that so that we always
> return null.  The fix I have is here (not yet reviewed):
> https://github.com/apache/hive/pull/3789
>
> My discussion though lies in a different problem which you can see in the
> patch I uploaded.  My current fix changes behavior of the following
> statement:  "select sum('a') from my_table".  If my_table has rows, right
> now we return 0.0.
>
> I've looked on postgres and it doesn't even allow a sum on a string column
> so I can't really compare to that database.  My current fix doesn't disable
> this, but it does change the behavior to return NULL on this select.
>
> I kinda feel that returning NULL is more correct than returning 0, but I
> wanted to throw this out there to see what y'all think.  This would be a
> change in behavior and that makes me nervous.
>
> Thanks!
>


Re: Sync of Branch-3 & Branch-3.1 for 3.2.0 pipeline

2022-12-07 Thread Stamatis Zampetakis
Hi team,

I don't think you need any kind of special permissions to enable pre-commit
tests for branch-3. I have the impression that just committing an
appropriate Jenkinsfile (e.g., HIVE-24331 [1]) should do the trick.

Best,
Stamatis

[1] https://issues.apache.org/jira/browse/HIVE-24331

On Wed, Dec 7, 2022 at 8:41 AM Sankar Hariappan
 wrote:

> Hi folks,
>
> It is a blocker for us to start the Hive 3.2 release efforts. Can someone
> help add a Jenkins pipeline for "branch-3" or please add "sankarh", "mahesh"
> as admins?
>
> Thanks,
> Sankar
>
> -Original Message-
> From: Aman Raj 
> Sent: Monday, December 5, 2022 12:10 PM
> To: dev@hive.apache.org
> Subject: [EXTERNAL] Re: Sync of Branch-3 & Branch-3.1 for 3.2.0 pipeline
>
> Hi team,
>
> Can someone please help me set up the Jenkins pipeline for branch-3 of
> Hive. I do not have access to do the same and it is essential to raise PR's
> for the 3.2.0 release. Please refer to the email thread for more details.
> Let me know if you need anything else from my side.
>
> Thanks,
> Aman.
> 
> From: Aman Raj 
> Sent: Friday, December 2, 2022 9:57 AM
> To: dev@hive.apache.org 
> Subject: [EXTERNAL] Sync of Branch-3 & Branch-3.1 for 3.2.0 pipeline
>
> Hi team,
>
> We have started working on the hive-3.2.0 release. The plan is to cut
> branch-3.2 from the base branch-3. While we were working on the 3.2.0
> release, we found that there are some commits (please refer to this parent
> JIRA for the analysis -
> https://issues.apache.org/jira/browse/HIVE-26752)
> which went into branch-3.1 but do not exist in branch-3; we are
> planning to backport them to branch-3. There are two possible approaches:
>
>
>
>   1.  Cherry-pick missed commits and push to branch-3
>   2.  Create Jira's for each commit and upload patch.
>
> We feel the second option is better; any other thoughts will
> be appreciated.
>
>
> We also need help on creating the pre-commit pipelines for branch-3 as we
> don't have Jenkins access.
>
>
>
> Please feel free to add any Jira's which you want to have to 3.2.0. All
> the changes for the 3.2.0 can be tracked here - Apache Hive - Agile Board -
> ASF JIRA<
> https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=564&view=detail&selectedIssue=HIVE-26749&quickFilter=2586>
>
> A friendly request/reminder: after this exercise, any ongoing bug fixes
> on branch-3.x should also be cherry-picked to the base branch-3.
>
> Thanks,
> Aman.
>
>


Re: [EXTERNAL] Re: Sync of Branch-3 & Branch-3.1 for 3.2.0 pipeline

2022-12-07 Thread Stamatis Zampetakis
Hey Aman,

Before checking in the PR we should ensure that it works as expected; i.e.,
having a green run in a reasonable time.

Best,
Stamatis

On Wed, Dec 7, 2022 at 9:29 AM Aman Raj 
wrote:

> Hi Stamatis,
>
> I have raised a Pull Request for the same -
> https://github.com/apache/hive/pull/3841. Can you please check this and
> merge it.
>
> Thanks,
> Aman.
>
> 
> From: Aman Raj 
> Sent: Wednesday, December 7, 2022 1:50 PM
> To: dev@hive.apache.org 
> Subject: Re: [EXTERNAL] Re: Sync of Branch-3 & Branch-3.1 for 3.2.0
> pipeline
>
> Hi Stamatis,
>
> Sure, thanks a lot for your help. Will make that change and update this
> mail thread.
>
> Thanks,
> Aman.


Re: [EXTERNAL] Re: [ANNOUNCE] New PMC Member: Ayush Saxena

2022-12-19 Thread Stamatis Zampetakis
Congrats Ayush! Very well deserved!

Thanks for all the hard work that you are putting into the project and for
always being there when people ask for help.

Best,
Stamatis

On Tue, Dec 20, 2022 at 7:51 AM Sankar Hariappan via user <
u...@hive.apache.org> wrote:

> Congrats Ayush!
>
>
>
> Thanks,
>
> Sankar
>
>
>
> *From:* Simhadri G 
> *Sent:* Tuesday, December 20, 2022 12:16 PM
> *To:* u...@hive.apache.org
> *Cc:* dev ; ayushsax...@apache.org
> *Subject:* [EXTERNAL] Re: [ANNOUNCE] New PMC Member: Ayush Saxena
>
>
>
> Congratulations Ayush
>
>
>
> On Tue, 20 Dec 2022, 06:42 Naveen Gangam,  wrote:
>
> Hello Hive Community,
>
> Apache Hive PMC is pleased to announce that Ayush Saxena has accepted the
> Apache Hive PMC's invitation to become PMC Member, and is now our newest
> PMC member. Many thanks to Ayush for all the contributions he has made and
> looking forward to many more future contributions in the expanded role.
>
>
>
> Please join me in congratulating Ayush !!!
>
>
>
> Cheers,
>
> Naveen (on behalf of Hive PMC)
>
>
>
>


Re: Lock branch-3 in order for PR build to run successfully.

2022-12-21 Thread Stamatis Zampetakis
Hello,

I don't believe a lock is necessary. I think that people with write access
to the repository already know the processes and how to behave.
If someone decides to push a commit to the repo without running pre-commit
tests there should be a good reason to do so.
I am hoping that circumventing the usual workflow should be a rather rare
event.

Best,
Stamatis

On Tue, Dec 20, 2022 at 8:50 AM Aman Raj 
wrote:

> Hi community,
>
> I see a couple of commits that went in directly to branch-3 before setting
> up the Jenkins pipeline for branch-3. To prevent this, can we lock the
> branch-3 of Hive in order to make PRs the only way to merge commits into
> branch-3.
>
> Can someone help me in locking branch-3 so that we have a clean release
> process. I do not have the access to do it.
>
> Thanks,
> Aman.
> 
> From: Aman Raj 
> Sent: Friday, December 9, 2022 9:33 AM
> To: dev@hive.apache.org 
> Subject: Re: [EXTERNAL] Re: Sync of Branch-3 & Branch-3.1 for 3.2.0
> pipeline
>
> Thanks Pravin for your support. Can someone please help me merge this PR
> to branch-3 HIVE-26816 : Add Jenkins file for branch-3 by amanraj2520 ·
> Pull Request #3841 · apache/hive (github.com)<
> https://github.com/apache/hive/pull/3841>.
> I do not have access to do that. Then we will start development on it.
>
> Thanks,
> Aman.
>
>
> 
> From: Pravin Sinha 
> Sent: Friday, December 9, 2022 1:55 AM
> To: dev@hive.apache.org 
> Subject: Re: [EXTERNAL] Re: Sync of Branch-3 & Branch-3.1 for 3.2.0
> pipeline
>
> Hi Aman,
>  I also think that we can merge the PR to enable the test pipeline if the
> change looks fine and subsequently we can fix the tests to bring it to
> green state (hopefully by cherry picking a few commits from branch-3.1
> which is already in green state) . Looks like currently the tests are
> broken in branch-3.
>
> Thanks,
> Pravin
>
> On Thu, Dec 8, 2022 at 3:59 PM Aman Raj 
> wrote:
>
> > Hi team,
> >
> > For the addition of Jenkins file for branch-3, branch-3 has some existing
> > tests failing because Jenkins was not running on branch-3. We are
> > planning to merge this Jenkins file irrespective of this PR having test
> > failures, since this does not change the code. We will create separate
> > tasks for ensuring that branch-3 has a green build.
> >
> > Link to the PR: https://github.com/apache/hive/pull/3841
> >
> > Fyi, branch-3.1 has a green build.
> >
> > Thanks,
> > Aman.
> > 
> > From: Aman Raj 
> > Sent: Wednesday, December 7, 2022 3:19 PM
> > To: dev@hive.apache.org 
> > Subject: Re: [EXTERNAL] Re: Sync of Branch-3 & Branch-3.1 for 3.2.0
> > pipeline
> >
> > Hi Ayush,
> >
> > Thanks for clarifying. Will wait for it to turn green.
> >
> > Thanks,
> > Aman.
> > 
> > From: Ayush Saxena 
> > Sent: Wednesday, December 7, 2022 3:11 PM
> > To: dev@hive.apache.org 
> > Subject: Re: [EXTERNAL] Re: Sync of Branch-3 & Branch-3.1 for 3.2.0
> > pipeline
> >
> > Hi Aman,
> > The build is already running for your PR:
> >
> >
> > http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-3841/1/pipeline
> >
> > The JenkinsFile is picked

Re: Proposal: Revamp Apache Hive website.

2023-01-09 Thread Stamatis Zampetakis
Hi everyone,

Simhadri has been working hard to modernize the Hive website (HIVE-26565)
for the past few months and I am quite happy with the results.

I reviewed the respective PR [1] and will commit the changes in 24h unless
there are objections.

Best,
Stamatis

[1] https://github.com/apache/hive-site/pull/2

On Wed, Oct 5, 2022 at 8:46 PM Simhadri G  wrote:

> Thanks for the feedback Stamatis !
>
>- I have updated the PR to include a README.md file with instructions
>to build and view the site locally after making any new changes. This will
>help us preview the changes locally before pushing the commit. (Docker is
>not required here.)
>
>- Github pages was used to share the new website with the community
>and it will most likely not be necessary later on.
>
>- Regarding the role of Github Actions(gh-pages.yml):
>
>- Whenever a PR is merged to the main branch, a GitHub action is
>   triggered.
>   - The GitHub action will install Hugo and build the site with the new
>   changes. Once the build is successful, Hugo generates a set of static
>   files and these files are automatically merged to the hive-site/asf-site
>   branch by the GitHub Actions bot.
>   - From here, to publish  hive-site/asf-site to project web site
>   sub-domain (hive.apache.org),  we need to set up a configuration
>   block called publish in your .asf.yaml file. (
>   
> https://cwiki.apache.org/confluence/display/INFRA/Git+-+.asf.yaml+features#Git.asf.yamlfeatures-Publishingabranchtoyourprojectwebsite).
>
>   - We will need help from apache infra - gmcdonald
>   <https://github.com/apache/hive-site/commits?author=gmcdonald> or
>   Humbedooh
>   <https://github.com/apache/hive-site/commits?author=Humbedooh> to
>   make sure that we have set this up correctly.
>
>   - I agree with your suggestion to keep the changes around the
>revamp as minimal as possible and not mix the content update with the
>framework change. In this case, we can make the other changes incrementally
>at a later stage.
>
>
> Thanks!
> Simhadri G
>
> On Wed, Oct 5, 2022 at 3:41 PM Stamatis Zampetakis 
> wrote:
>
>> Thanks for staying on top of this Simhadri.
>>
>> I will try to help reviewing the PR once I get some time.
>>
>> What is not yet clear to me from this discussion or by looking at the PR
>> is the workflow for making a change appear on the web (
>> https://hive.apache.org/). Having a README which clearly states what
>> needs to be done is a must.
>>
>> I also think it is quite important to have instructions and possibly
>> Docker images so that someone can test how the changes look locally
>> before committing a change to the repo.
>>
>> Another point that needs clarification is the role of GitHub Pages. I am
>> not sure why it is necessary at the moment and what exactly the plan is
>> going forward. If I understand correctly, it is currently used to preview
>> the changes, but from my perspective we shouldn't need to commit something
>> to the repo to find out whether it breaks; previews should happen
>> locally.
>>
>> I would suggest keeping the changes around the revamp as minimal as
>> possible and not mixing the content update with the framework change. As
>> usual, smaller changes are easier to review and merge. It is definitely
>> worth updating and improving the content, but let's do it incrementally so
>> that changes can get merged faster.
>>
>> The list of committers and PMC members for Hive can be found in the
>> Apache phonebook [1]. The list can easily get outdated, so maybe we can
>> consider adding links to [1] and/or GitHub and other places instead of
>> duplicating the content. Anyway, let's first deal with the revamp and
>> discuss content changes later in separate JIRAs/PRs.
>>
>> Best,
>> Stamatis
>>
>> [1] https://home.apache.org/phonebook.html?project=hive
>>
>> On Sun, Oct 2, 2022 at 2:41 AM Simhadri G  wrote:
>>
>>> Hello Everyone,
>>>
>>> I have raised the PR for the revamped Hive Website here:
>>>  https://github.com/apache/hive-site/pull/2
>>>
>>> Could someone please help review this PR?
>>>
>>> Until the PR is merged, you can find the updated website here. Please
>>> have a look; any feedback is most welcome :)
>>> https://simhadri-g.github.io/hive-site/
>>>
>>> Few other things to note:
>>>
>>>- We will need help from someone who has write access to hive-site
>>>repo t

Re: [ANNOUNCE] New PMC Member: Stamatis Zampetakis

2023-01-16 Thread Stamatis Zampetakis
Thanks everyone! I am very glad and honoured to join the PMC.

I really enjoy being part of this community and it is great interacting
with all of you on a daily basis; thank you for being part of this!

Best,
Stamatis

On Mon, Jan 16, 2023 at 2:12 PM Jiajun Xie 
wrote:

> Congratulations Stamatis :)
> Very well deserved!!!
>
> On Mon, 16 Jan 2023 at 13:51, Krisztian Kasa 
> wrote:
>
> > Congratulations Stamatis :)
> >
> > On Mon, Jan 16, 2023 at 6:27 AM S T  wrote:
> >
> > > Congrats Stamatis.
> > >
> > > Thanks
> > >
> > > On Sat, 14 Jan 2023 at 00:03, Naveen Gangam 
> > wrote:
> > >
> > >> Hello Hive Community,
> > >> Apache Hive PMC is pleased to announce that Stamatis Zampetakis has
> > >> accepted the Apache Hive PMC's invitation to become a PMC Member, and is
> > >> now our newest PMC member. Please join me in congratulating Stamatis!!!
> > >>
> > >> He has been an active member in the Hive community across many aspects
> > >> of the project. Many thanks to Stamatis for all the contributions he has
> > >> made and looking forward to many more future contributions in the
> > >> expanded role.
> > >>
> > >> Cheers,
> > >> Naveen (on behalf of Hive PMC)
> > >>
> > >
> >
>


Moderators for Hive mailing lists

2023-01-18 Thread Stamatis Zampetakis
Hi all,

It appears that most of the current moderators of the Hive mailing lists
are not very active in the project, so messages and subscriptions may take
a while to be approved.

I am planning to request to be added as a moderator to all lists, but it
would be nice if two more people could join this effort. Due to the nature
of the mailing lists, some of which are private, these people must be in
the PMC.

Mailing lists of the project:
* comm...@hive.apache.org
* dev@hive.apache.org
* git...@hive.apache.org
* iss...@hive.apache.org
* priv...@hive.apache.org
* jira-reque...@hive.apache.org
* secur...@hive.apache.org
* u...@hive.apache.org

Best,
Stamatis


Re: [ANNOUNCE] New PMC Member: Laszlo Bodor

2023-01-30 Thread Stamatis Zampetakis
While the numbers give some insight, they do not tell the complete story of
how much Laszlo has helped drive the project forward.

Congrats Laszlo and thanks for everything that you have done for the
project.

Best,
Stamatis

On Sun, Jan 29, 2023 at 11:05 PM Sai Hemanth Gantasala <
saihema...@cloudera.com> wrote:

> Congratulations Laszlo!!
>
> On Sat, Jan 28, 2023 at 8:42 AM Simhadri G  wrote:
>
>> Congratulations Laszlo Bodor! :)
>>
>>
>>
>> On Sat, 28 Jan 2023, 20:26 Akshat m,  wrote:
>>
>>> Congratulations Laszlo
>>>
>>> Regards,
>>> Akshat
>>>
>>> On Sat, Jan 28, 2023 at 3:03 AM Naveen Gangam
>>> 
>>> wrote:
>>>
>>> > Hello Hive Community,
>>> > Apache Hive PMC is pleased to announce that Laszlo Bodor
>>> > (username:abstractdog) has accepted the Apache Hive PMC's invitation to
>>> > become PMC Member, and is now our newest PMC member. Please join me in
>>> > congratulating Laszlo !!!
>>> >
>>> > He has been an active member in the hive community across many aspects
>>> > of the project. Many thanks to Laszlo for all the contributions he has
>>> > made and looking forward to many more future contributions in the
>>> > expanded role.
>>> >
>>> > https://github.com/apache/hive/commits?author=abstractdog
>>> >
>>> > * 96 commits in master [2]
>>> > * 66 reviews in master [3]
>>> > * Reported 163 JIRAS [6]
>>> >
>>> > Cheers,
>>> > Naveen (on behalf of Hive PMC)
>>> >
>>>
>>


Re: [ANNOUNCE] New PMC Member: Krisztian Kasa

2023-01-31 Thread Stamatis Zampetakis
Krisztian's impact on the project has been immense, particularly in areas
such as rewriting, view maintenance, Iceberg integration, sub-query
processing, and top-k pushdown, to name a few.
His contributions go well beyond what the numbers can indicate.

Keep up the amazing work, Krisztian!

Best,
Stamatis

On Tue, Jan 31, 2023 at 7:58 AM Akshat m  wrote:

> Congratulations Krisztian :)
>
> Regards,
> Akshat
>
> On Mon, Jan 30, 2023 at 10:23 PM Alessandro Solimando <
> alessandro.solima...@gmail.com> wrote:
>
>> Congratulations Krisztian, very well deserved! :)
>>
>> On Mon, 30 Jan 2023 at 17:34, László Bodor 
>> wrote:
>>
>>> Yay! Very well deserved. Krisztian has a broad knowledge of Hive and an
>>> extremely deep level of experience with the compiler itself (which, as we
>>> all know, is a huge beast); looking forward to seeing further contributions!
>>>
>>> On Mon, Jan 30, 2023 at 5:23 PM Naveen Gangam wrote:
>>>
>>>> Hello Hive Community,
>>>> Apache Hive PMC is pleased to announce that Krisztian Kasa (username:
>>>> krisztiankasa) has accepted the Apache Hive PMC's invitation to become
>>>> PMC Member, and is now our newest PMC member. Please join me in
>>>> congratulating Krisztian !!!
>>>>
>>>> He has been an active member in the hive community across many aspects
>>>> of the project. Many thanks to Krisztian for all the contributions he has
>>>> made and looking forward to many more future contributions in the
>>>> expanded role.
>>>>
>>>> https://github.com/apache/hive/commits?author=kasakrisz
>>>>
>>>> * 162 commits in master
>>>> * 124 reviews in master
>>>> * Reported 159 JIRAS
>>>>
>>>> Cheers,
>>>> Naveen (on behalf of Hive PMC)

>>>


Branch-3 backports and build stability

2023-02-07 Thread Stamatis Zampetakis
Hi all,

The build in branch-3 is not yet green; there are ~25 test failures. It is
common practice not to push changes on top of a broken build unless they
address test failures.

Some people (mainly Aman Raj, Chris Nauroth, and Laszlo Bodor) have been
working hard to stabilize the build for quite some time now. If you want to
help out, start by reviewing, merging, and fixing things around test
failures.

It's not yet time to bring new features, upgrades, bug fixes, etc., into
branch-3. I would encourage committers not to approve such changes until we
get back to a stable branch.

Best,
Stamatis


Re: [ANNOUNCE] New committer for Apache Hive: Laszlo Vegh

2023-02-08 Thread Stamatis Zampetakis
Congratulations Laszlo!

ACID and compactions are a complex beast and the slightest problem there
can have a huge impact on the system.
Many thanks for all your work in this area that makes the life of the rest
of us much easier.

Best,
Stamatis

On Wed, Feb 8, 2023 at 9:46 AM Akshat m  wrote:

> Congratulations Laszlo, very well deserved :)
>
> Regards,
> Akshat Mathur
>
> On Tue, Feb 7, 2023 at 9:08 PM Sai Hemanth Gantasala
>  wrote:
>
>> Congratulations Laszlo Vegh, great work on the compaction stuff!!
>>
>> Thanks,
>> Sai.
>>
>> On Tue, Feb 7, 2023 at 4:24 AM Naveen Gangam 
>> wrote:
>>
>> > The Project Management Committee (PMC) for Apache Hive has invited
>> > Laszlo Vegh (veghlaci05) to become a committer and we are pleased
>> > to announce that he has accepted.
>> >
>> > Contributions from Laszlo:
>> >
>> > He has authored 25 patches. Significant contributions to stabilization
>> > of ACID compaction. Helped review other patches as well.
>> >
>> > https://github.com/apache/hive/pulls?q=is%3Amerged+is%3Apr+author%3Aveghlaci05
>> >
>> > Being a committer enables easier contribution to the project since there
>> > is no need to go via the patch submission process. This should enable
>> > better productivity. A PMC member helps manage and guide the direction of
>> > the project.
>> >
>> > Congratulations
>> > Hive PMC
>> >
>>
>


Re: [ANNOUNCE] New committer for Apache Hive: Alessandro Solimando

2023-02-10 Thread Stamatis Zampetakis
In a rather short time, Alessandro has made many significant contributions
to the project.
He fixed many crucial issues in CBO, also adding utilities to quickly
diagnose problems and unblock production breakages.
He introduced Sonar in CI, which is a big step toward monitoring and
improving code quality, mentored newcomers, and opened the road for more
performance improvements by adding support for histograms.
Finally, he has been a regular reviewer since day 1, something that is
vital for keeping the project healthy and the community alive.

Congrats Alessandro, keep up the good work!

Best,
Stamatis

On Fri, Feb 10, 2023 at 1:13 PM Jiajun Xie 
wrote:

> Congratulations Alessandro!
>
> On Fri, 10 Feb 2023 at 17:36, Laszlo Vegh 
> wrote:
>
> > Congratulations Alessandro!
> >
> > Laszlo Vegh
> > lv...@cloudera.com
> >
> >
> >
> > > On 2023. Feb 9., at 7:40, Mahesh Raju Somalaraju <
> > maheshra...@cloudera.com> wrote:
> > >
> > > Congratulations Alessandro !!
> > >
> > > -Mahesh Raju S
> > >
> > > On Thu, Feb 9, 2023 at 1:31 AM Naveen Gangam wrote:
> > > The Project Management Committee (PMC) for Apache Hive has invited
> > > Alessandro Solimando (asolimando) to become a committer and is pleased
> > > to announce that he has accepted.
> > >
> > > Contributions from Alessandro:
> > > He has authored 30 patches for Hive, 18 for Apache Calcite and has
> > > done many code reviews for other contributors. Vast experience and
> > > knowledge in SQL Compiler and Optimization. His most recent work was
> > > adding support for histogram-based column stats in Hive.
> > >
> > > https://issues.apache.org/jira/issues/?filter=12352498
> > >
> > > Being a committer enables easier contribution to the project since
> > > there is no need to go via the patch submission process. This should
> > > enable better productivity. A PMC member helps manage and guide the
> > > direction of the project.
> > >
> > > Congratulations
> > > Hive PMC
> >
> >
>


Re: [EXTERNAL] Re: Branch-3 backports and build stability

2023-02-17 Thread Stamatis Zampetakis
> +1,
> > > Thanks Stamatis and Laszlo for helping with the test case fixes so far.
> > >
> > > Team,
> > > I need help fixing the following tests in Hive. I have tried different
> > > approaches but no luck so far.
> > > I am facing some issues in fixing the following tests:
> > > org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver
> > >
> > > Issue:
> > > PREHOOK: Input: default@src
> > > PREHOOK: Output: default@src
> > > Failed to monitor Job[-1] with exception
> > > 'java.lang.IllegalStateException(Connection to remote Spark driver was
> > > lost)' Last known state = SENT
> > > Failed to execute spark task, with exception
> > > 'java.lang.IllegalStateException(RPC channel is closed.)'
> > > FAILED: Execution Error, return code 1 from
> > > org.apache.hadoop.hive.ql.exec.spark.SparkTask. RPC channel is closed.
> > >
> > > History:
> > > Initially the tests had failed with errors which I fixed in the following
> > > task:
> > > https://issues.apache.org/jira/browse/HIVE-26940
> > >
> > > Does anyone know what the issue is here? There are 6-7 failures because
> > > of this test case. Link to the failed test cases for the stacktrace:
> > > http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-3949/2/tests/
> > > Thanks,
> > > Aman.
> > >
> > > 
> > > From: László Bodor 
> > > Sent: Tuesday, February 7, 2023 4:46 PM
> > > To: dev@hive.apache.org 
> > > Subject: [EXTERNAL] Re: Branch-3 backports and build stability
> > >
> > > +1
> > > Also, if I merged something that I thought was for test stability (but
> > > instead it was a feature), excuse me :)
> > > For reference, the whole green test initiative is tracked under this
> > > umbrella:
> > > https://issues.apache.org/jira/browse/HIVE-26836
> > >
> > >
> >
>


Re: Proposal to deprecate Hive on Spark from branch-3

2023-02-22 Thread Stamatis Zampetakis
I am +1 on removing/deprecating Hive on Spark since we've done this on
master and apparently nobody is interested in this feature.

Best,
Stamatis

On Wed, Feb 22, 2023 at 8:18 AM Aman Raj 
wrote:

> Hi team,
>
> We have been trying to fix Hive on Spark test failures for a long time. As
> of now, branch-3 has fewer than 12 test failures (whose fixes have not been
> identified). 8 of them are related to Hive on Spark. I wrote about the
> failures in my previous mail threads. Thanks to Vihang for working on them
> as well. But we have not been able to identify the root cause so far.
> These fixes can be tracked in the following tickets: [HIVE-27087] Fix
> TestMiniSparkOnYarnCliDriver test failures on branch-3 and
> [HIVE-26940] Backport of HIVE-19882: Fix QTestUtil session lifecycle.
>
> Until we have a green branch-3, we cannot go ahead and push new features
> for the Hive-3.2.0 release. This is effectively a blocker for this release.
> Bringing the test fixes to their current state has already taken more than
> 2 months.
>
> I wanted to bring up a proposal to deprecate Hive on Spark from branch-3
> altogether. This would ensure that branch-3 is aligned with the master as
> done in https://issues.apache.org/jira/browse/HIVE-26134. I just wanted to
> have a vote on this in parallel with working on the test fixes. If we have
> approval from the community, we can deprecate it altogether.
>
> Please feel free to raise any concerns or suggestions you have. Also, I
> welcome any possible fix suggestions for the test failures.
>
> Thanks,
> Aman.
>


Re: [EXTERNAL] Re: Proposal to deprecate Hive on Spark from branch-3

2023-02-27 Thread Stamatis Zampetakis
Some people raised a valid point that branch-3 is a maintenance branch. If
we really intend 3.2.0 to be a maintenance release, then we should minimize
breaking changes and prohibit new features. In that case Hive on Spark cannot
be removed and the only thing we can do is deprecate it. It also means that we
should fix the tests, because failures typically indicate breaking changes,
which again are not acceptable for a maintenance release.

On the other hand, I got the impression that some people were interested in
getting new features into 3.2.0 (some may already be in). Furthermore, some
dependency upgrades may also lead to breaking changes/different behavior, so
we should definitely agree on what is acceptable and what is not for branch-3.

Summing up, the question boils down to the following: do we allow breaking
changes and new features in branch-3 or not?

Best,
Stamatis

On Fri, Feb 24, 2023, 10:41 AM Aman Raj 
wrote:

> Hi Laszlo,
>
> I am perfectly fine with disabling the Hive on Spark tests. In fact, I
> prefer that. I agree with Vihang and you on this. I had proposed this idea
> (disabling the test cases) long ago, and then we discussed in the
> community that either we fix the Hive on Spark test cases or remove Hive on
> Spark. Therefore, I initiated this thread about removing Hive on Spark,
> since we have still not been able to resolve the test cases over the past
> couple of months.
>
> Thanks,
> Aman.
>
> 
> From: László Bodor 
> Sent: Friday, February 24, 2023 2:57 PM
> To: dev@hive.apache.org 
> Subject: Re: [EXTERNAL] Re: Proposal to deprecate Hive on Spark from
> branch-3
>
> +1 on Vihang's suggestion
> I remember that Spark removal was a debated topic even on master, so
> completely removing it from the "maintenance" branch-3 line is not
> really acceptable (actually, I'm surprised it's not -1ed yet by Hive on
> Spark folks), but it depends on what *deprecation* really means: I mean
> disabling some Spark tests to stabilize precommit is completely fine in the
> absence of community aspiration to fix them properly.
>
> Regarding the motivation: "This would ensure that branch-3 is aligned with
> the master as done in ..."  <-- I don't think we're targeting this; we are
> about to make 3.x releases as simply as possible.
>
> I'm hoping/assuming that most of the +1s so far are in line with Vihang's
> suggestion
>
> On Thu, Feb 23, 2023 at 4:37 PM vihang karajgaonkar wrote:
>
> > +1 to deprecate Hive on Spark.
> >
> > I feel directly removing it in a minor release is probably a bad idea.
> > Most users will upgrade to 3.2 first and go to 4.0 later. If we deprecate
> > it in 3.2, it transitions well into its removal as users eventually
> > upgrade to 4.0.
> >
> > If the goal is to stabilize branch-3, we can disable the failing Hive on
> > Spark tests.
> >
> > Thanks,
> > Vihang
> >
> > On Thu, Feb 23, 2023 at 12:32 AM Alessandro Solimando <
> > alessandro.solima...@gmail.com> wrote:
> >
> > > +1 from me too
> > >
> > > On Thu, 23 Feb 2023 at 06:09, Ayush Saxena  wrote:
> > >
> > > > +1 on removing Hive on Spark from branch-3
> > > >
> > > > -Ayush
> > > >
> > > > > On 23-Feb-2023, at 6:40 AM, Wang, Yuming wrote:
> > > > >
> > > > > +1.
> > > > >
> > > > > From: Naresh P R 
> > > > > Date: Thursday, February 23, 2023 at 02:49
> > > > > To: dev@hive.apache.org 
> > > > > Subject: Re: [EXTERNAL] Re: Proposal to deprecate Hive on Spark from branch-3
> > > > > External Email
> > > > >
> > > > > +1 to remove Hive on Spark in branch-3
> > > > > ---
> > > > > Regards,
> > > > > Naresh P R
> > > > >
> > > > >> On Wed, Feb 22, 2023 at 5:37 AM Sankar Hariappan
> > > > >>  wrote:
> > > > >>
> > > > >> +1, to remove Hive on Spark in branch-3.
> > > > >>
> > > > >> Thanks,
> > > > >> Sankar
> > > > >>
> > > > >> -Original Message-
> > > > >> From: Rajesh Balamohan 
> > > > >> Sent: Wednesday, February 22, 2023 6:58 PM
> > > > >> To: dev@hive.apache.org
> > > > >> Subject: [EXTERNAL] Re: Proposal to deprecate Hive on Spark from branch-3
> > > > >>
> > > > >> +1 on removing Hive on Spark in branch-3.
> > > > >>
> > > > >> It was not done earlier since it was removing a feature in the branch.
> > > > >> But if there is enough consensus, we should consider removing it.
> > > > >>
> > > > >> ~Rajesh.B
> > > > >>

Re: [DISCUSS] Jira Public Signup Disabled

2023-03-03 Thread Stamatis Zampetakis
Thanks for bringing this up Ayush.

Yes, we should update the wiki to reflect the new process; if nobody does it
in the coming days, I will revise the respective page.

We should also consider deleting the jira-request mailing list, if possible,
to avoid confusion.

Best,
Stamatis


On Thu, Mar 2, 2023, 8:27 AM Ayush Saxena  wrote:

> Folks,
> New stuff: INFRA has introduced a new utility that can be used for
> JIRA account creation [1]. It is also mentioned in the
> announcement [2] from the Infra team.
>
> I guess we should update our contributor docs [3] to reflect that and ask
> folks to route their requests via this utility.
>
> -Ayush
>
> [1] https://selfserve.apache.org/jira-account.html
> [2] https://infra.apache.org/blog/brand-new-selfserve-page.html
> [3]
> https://cwiki.apache.org/confluence/display/Hive/HowToContribute#HowToContribute-JIRA
>
> On Thu, 17 Nov 2022 at 16:43, Stamatis Zampetakis 
> wrote:
>
>> The jira-reque...@hive.apache.org mailing list has been created and I added relevant
>> instructions on how to request a JIRA account in the wiki [1]. Feel free to
>> improve as you see fit!
>>
>> Best,
>> Stamatis
>>
>> [1]
>> https://cwiki.apache.org/confluence/display/Hive/HowToContribute#HowToContribute-JIRA
>>
>> On Tue, Nov 15, 2022 at 9:59 PM Stamatis Zampetakis 
>> wrote:
>>
>>> Logged https://issues.apache.org/jira/browse/INFRA-23905 for the
>>> creation of the new mailing list.
>>>
>>> On Tue, Nov 15, 2022 at 9:57 PM Abhay Chennagiri <
>>> achennag...@cloudera.com> wrote:
>>>
>>>> +1, Thank you, Stamatis.
>>>>
>>>> On Tue, Nov 15, 2022 at 12:42 PM Pravin Sinha 
>>>> wrote:
>>>>
>>>>> +1, Thanks, Stamatis.
>>>>>
>>>>> -Pravin
>>>>>
>>>>> On Tue, Nov 15, 2022 at 5:57 PM Stamatis Zampetakis 
>>>>> wrote:
>>>>>
>>>>>> Hi everyone,
>>>>>>
>>>>>> Due to the large number of spam accounts being created, the ASF INFRA
>>>>>> team has disabled JIRA account creation [1].
>>>>>>
>>>>>> From the 11th of November, contributors who wish to have a JIRA
>>>>>> account (to create, assign, watch, etc. issues) will need to request an
>>>>>> account through an ASF PMC.
>>>>>>
>>>>>> Other projects, such as Calcite, have already taken the necessary
>>>>>> actions to streamline the process for new contributors [2].
>>>>>>
>>>>>> I would suggest drawing inspiration from Calcite and taking similar
>>>>>> actions in Hive.
>>>>>>
>>>>>> If you all agree, we can start by creating a dedicated (private)
>>>>>> mailing list for such requests:
>>>>>> jira-reque...@hive.apache.org
>>>>>>
>>>>>> and then proceed with a brief documentation of the process in the
>>>>>> wiki or website.
>>>>>>
>>>>>> What do you think?
>>>>>>
>>>>>> Best,
>>>>>> Stamatis
>>>>>>
>>>>>> [1] https://blogs.apache.org/infra/entry/jira-public-signup-disabled
>>>>>> [2] https://lists.apache.org/thread/5odg6wyvwfkryk96ls2w3vxnrkftw50s
>>>>>>
>>>>>


Re: [DISCUSS] Jira Public Signup Disabled

2023-03-06 Thread Stamatis Zampetakis
I just updated the wiki [1] pointing to the new account creation form:
https://selfserve.apache.org/jira-account.html

I also logged INFRA-24306 [2] for the deletion of the
jira-reque...@hive.apache.org mailing list.

Best,
Stamatis

[1]
https://cwiki.apache.org/confluence/display/Hive/HowToContribute#HowToContribute-JIRA
[2] https://issues.apache.org/jira/browse/INFRA-24306



Re: [DISCUSS] HIVE 4.0 GA Release Proposal

2023-03-10 Thread Stamatis Zampetakis
Hi Kirti,

Thanks for bringing up this topic.

The master branch already has many new features; we don't need to wait for
more to cut a GA.

The main criterion for going GA is stability, so I would consider
regressions the only blockers for the release.

If I recall correctly, the only regressions discovered so far are some
problems with TPC-DS queries, basically HIVE-26654 [1].

I will let others chime in to include more tickets if necessary.

Best,
Stamatis

[1] https://issues.apache.org/jira/browse/HIVE-26654


On Wed, Mar 8, 2023 at 10:02 AM Kirti Ruge  wrote:

> Hello Hive Dev,
>
> It has been about 4 months since Hive-4.0-alpha-2 was released in Nov 2022.
> Would it be a good time to discuss the HIVE-4.0 GA release with the
> community? Can we discuss the new features/JDK versions we want to
> support as part of 4.0 GA, and the release timeframe?
>
>
> Thanks,
> Kirti


Re: [DISCUSS] HIVE 4.0 GA Release Proposal

2023-03-13 Thread Stamatis Zampetakis
Hi Kirti,

From the tickets you shared, the only one that I would consider a blocker
is HIVE-26220.

Assuming that we fix HIVE-26220 in the coming weeks, can someone from
downstream projects test things out based on the nightly builds?

If nobody is willing to test the fix for HIVE-26220, then we could lower its
priority until there is actual interest.

Best,
Stamatis

On Sun, Mar 12, 2023 at 9:24 AM Kirti Ruge  wrote:

> Thanks Stamatis!!!
> I see the JIRAs below marked with the label hive-4.0.0-must <
> https://issues.apache.org/jira/issues/?jql=labels+%3D+hive-4.0.0-must>
> and in unresolved status.
>
> https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=564
>
>
> Thanks,
> Kirti
>
> HIVE-26400 <https://issues.apache.org/jira/browse/HIVE-26400>
> Provide docker images for Hive - PR in review
> https://github.com/apache/hive/pull/3448
>
> HIVE-26537 <https://issues.apache.org/jira/browse/HIVE-26537>
> Deprecate older APIs in the HMS - PR in review
> https://github.com/apache/hive/pull/3599
>
> HIVE-26220 <https://issues.apache.org/jira/browse/HIVE-26220>
> Shade & relocate dependencies in hive-exec to avoid conflicting with
> downstream projects
>
> HIVE-26644 <https://issues.apache.org/jira/browse/HIVE-26644>
> Introduce auto sizing in HMS - stale PR
> https://github.com/apache/hive/pull/3683
>
>
>
> > On 10-Mar-2023, at 10:20 PM, Stamatis Zampetakis wrote:
> >
> > Hi Kirti,
> >
> > Thanks for bringing up this topic.
> >
> > The master branch already has many new features; we don't need to wait
> > for more to cut a GA.
> >
> > The main criterion for going GA is stability, so I would consider
> > regressions as the only blockers for the release.
> >
> > If I recall correctly, the only regressions discovered so far are some
> > problems with TPC-DS queries, so basically HIVE-26654 [1].
> >
> > I will let others chime in to include more tickets if necessary.
> >
> > Best,
> > Stamatis
> >
> > [1] https://issues.apache.org/jira/browse/HIVE-26654
> >
>
>


Re: [DISCUSS] Incremental and cadence predictable release activity for HIVE

2023-03-13 Thread Stamatis Zampetakis
Hello,

I am not sure what a branch cut actually refers to. As I mentioned in the
past, I am not in favor of maintaining multiple release branches; the cost
is high and the number of volunteers is simply not enough. I am willing to
reconsider if things change in the near future.

Apart from that, having frequent releases from master is definitely great
for consumers and good for the health of the project; two or three releases
per year would be great, but for this to happen we need volunteers (mostly
release managers).

One thing that I have seen work well in other projects is deciding the
next 3-4 release managers in advance. Maybe it's worth trying this in
Hive.

Best,
Stamatis

On Sun, Mar 12, 2023 at 6:07 PM Ayush Saxena  wrote:

> Hi Kirti,
> Thanks for the initiative. This sounds very interesting, but I doubt it
> is that easy to incorporate. Sharing my thoughts:
>
>- Regarding "Unpredictable": I don't think our releases are very
>  unpredictable. There should be a formal mail, like "Release x.y.z", and
>  then the RM usually shares a potential branch freeze date, then a margin
>  of some days for blockers or critical tickets. This entire process takes
>  a minimum of 1 month and usually around 3 months.
>- Regarding "Regressions": quicker releases don't necessarily mean more
>  stable releases.
>- Regarding half-baked features: we mostly develop on the master branch;
>  we don't have the concept of feature branches (a lot of projects have
>  that). So if a bunch of features are being developed in parallel by
>  different sets of people, a "fixed" date is practically impossible to
>  achieve; it needs to be negotiated between all of them.
>- Even if we pin a date, that isn't sufficient; we need volunteers who
>  can take up the RM role. If we proceed with this, we should decide the
>  RM beforehand as well.
>- This timeline can get disrupted if you hit a security issue: AFAIK you
>  can't announce a CVE unless you have a release with the fix on all
>  active release lines. In that case the schedule will get messed up and
>  the RM and dates would need to be renegotiated.
>- Sometimes you need to release early because a downstream project needs
>  a fix that blocks them from upgrading Hive. As standard practice, almost
>  all Apache projects care about each other and help others upgrade, so
>  in that case I am not sure holding them until a fixed date is a good
>  idea.
>- From what I have observed, a release takes place when we have enough
>  tickets to release. We don't want to keep releasing with just 20-25
>  fixes, nor do we want to push 800-900 fixes in one go. The number and
>  nature of fixes should all be taken into account while planning the
>  release date.
>
>
> In general: good idea. We should definitely encourage more frequent
> releases; whether to have a "strict" date is debatable.
>
> -Ayush
>
> On Sun, 12 Mar 2023 at 19:44, Kirti Ruge  wrote:
>
> > Hello HIVE Dev,
> >
> > I would like to discuss/propose an incremental process with a predictable
> > cadence for HIVE releases.
> >
> > https://hive.apache.org/general/downloads/
> >
> > Currently, the gaps between our releases vary widely, and this has
> > sometimes caused problems such as:
> >
> > 1. All downstream projects and end users have unpredictable schedules
> > because of upstream.
> > 2. More chances of regressions when there is an unplanned release date.
> > Developers and release managers have to rush, which prevents us from
> > focusing on a proper regression-free release.
> >
> > I would like to propose a branch cut twice a year, giving two strict
> > releases yearly. It would make the release cadence predictable for end
> > users and bring a disciplined schedule for all users, including
> > downstream projects.
> >
> > Advantages of this approach:
> >
> > 1. If we pin a branch cut date, features can be prioritized better so
> > that no half-baked work goes into a release.
> > 2. Such incremental releases will help catch regressions better and
> > reduce the burden of release management activity (resulting in fewer
> > issues and quality problems). It will eventually help streamline release
> > management activity.
> >
> >
> > Let me know your thoughts.
> >
> > Thanks,
> > Kirti
>


Re: [DISCUSS] HIVE 4.0 GA Release Proposal

2023-03-21 Thread Stamatis Zampetakis
Many thanks for running tests with 4.0.0, Sungwoo; it is invaluable
help for getting a stable Hive 4 out.

I will review https://issues.apache.org/jira/browse/HIVE-26968 in the
coming weeks; I have assigned myself as reviewer in the PR.

Can some other people (committers or not) help in reviewing the
remaining TPC-DS blockers for which we have a PR?

Reminder: Good non-binding reviews are important and much appreciated
by the community. They are also among the important metrics for
becoming a Hive committer/PMC [1].

Best,
Stamatis

[1] https://cwiki.apache.org/confluence/display/Hive/BecomingACommitter

On Tue, Mar 14, 2023 at 12:07 PM Sungwoo Park  wrote:
>
> Hello,
>
> I would like to expand the list of blockers with HIVE-27138 [1], which
> fixes an NPE in mapjoin_filter_on_outerjoin.q.
>
> Currently mapjoin_filter_on_outerjoin.q is tested with the MapReduce
> execution engine and shows no problem. However, it shows a few problems
> when tested with the Tez execution engine. HIVE-27138 is the first fix
> found after analyzing mapjoin_filter_on_outerjoin.q, and Seonggon will
> create a couple more tickets later.
>
> In the meantime, it would be great if someone could review the pull
> requests for the subtasks of HIVE-26654. (I moved three tickets for which
> I had previously requested code review under HIVE-26654.)
>
> Best,
>
> --- Sungwoo
>   [1] https://issues.apache.org/jira/browse/HIVE-27138


Release managers

2023-03-21 Thread Stamatis Zampetakis
Hi all,

As discussed in another thread [1], it might be a good idea to agree
on the next 4 release managers (RM) beforehand to keep the release
cadence as stable as possible.

I can volunteer to be the RM for the next Hive release. Are there any
other volunteers for the rest?

4.0.0 Stamatis Zampetakis
4.1.0
4.2.0
4.3.0

The versions are just placeholders for now, so we don't need to agree
on the names at this stage.
The important thing is to get a release out from master every 3-4 months.

Note that you don't need to be a Hive PMC member to prepare a release
candidate. Committers should be able to complete most of the steps
involved in the process [2].

Best,
Stamatis

[1] https://lists.apache.org/thread/bg4g1w75ks11jh273bh3pct81x9brv0c
[2] https://cwiki.apache.org/confluence/display/Hive/HowToRelease#HowToRelease-HiveRelease


Re: [DISCUSS] HIVE 4.0 GA Release Proposal

2023-03-25 Thread Stamatis Zampetakis
Regarding correctness, I think it makes sense to change default values and
possibly add a warning note when there's a known risk of wrong results.
Needless to say, we should try to fix as many issues as possible; we
still need volunteers to review open PRs.

Performance regressions are trickier, but if we have the query plans (CBO +
full) along with logs (including task counters) for fast and slow executions,
we may be able to understand what happens. Don't hesitate to create Jira
tickets with this information if available.

Lastly, regarding 4.0.0 blockers, I don't think we need a special label. The
built-in and widely used priority "blocker" seems enough to capture the
importance and urgency of a ticket.
Since I am the release manager for the next release, I will go over tickets
marked as blockers and reevaluate priorities if necessary.

Best,
Stamatis

On Thu, Mar 23, 2023, 10:27 AM Denys Kuzmenko  wrote:

> Thanks, Sungwoo, for running the TPC-DS benchmark. Do we know if the same
> level of performance degradation was present in 4.0.0-alpha1?
>
> All: please use the `hive-4.0.0-must` label in a ticket if you think it's
> a show-stopper for the release.
>


[DISCUSS] Move Jira notification emails out of dev@hive

2023-03-25 Thread Stamatis Zampetakis
Hi everyone,

In the last Hive board report, someone mentioned that the volume of Jira
notification emails to the dev list is huge, especially compared to emails
sent by actual humans, making it hard for someone to follow what's
happening in the project.

I personally share their viewpoint. For a long time I have been relying on
client-side (Gmail) filters to separate Jira notifications from other
emails to the dev list.

I think it would be better to direct the traffic from Jira to a separate
list, namely jira@hive, to keep the dev@hive list clean and dedicated to
human interaction.

What do you think?

Best,
Stamatis


Re: [DISCUSS] Move Jira notification emails out of dev@hive

2023-03-30 Thread Stamatis Zampetakis
I will proceed with the changes needed to move the Jira traffic out of the
dev list sometime next week.

If there are reasons to delay or abandon the proposal please let me know.

Best,
Stamatis

On Mon, Mar 27, 2023, 5:39 AM Sungwoo Park  wrote:

> I like the proposal very much. (Then, hopefully this mailing list will
> be useful to outside contributors as well.)
>
> --- Sungwoo Park
>


Re: [EXTERNAL] Re: Branch-3 backports and build stability

2023-03-30 Thread Stamatis Zampetakis
Huge thanks to everyone involved; it is great to see branch-3 in a stable
state. As other people mentioned, let's keep it that way!

As far as backports are concerned, please be particularly cautious with
anything that touches the metastore schema and Thrift APIs.

Best,
Stamatis

On Wed, Mar 29, 2023, 4:36 AM vihang karajgaonkar wrote:

> Thanks a lot, Aman, for all your efforts on this. Really appreciate the
> initiative and all your hard work on this.
>
> I would like to request that all committers follow the same merge process
> as on the master branch when merging PRs into branch-3. If there are any
> test failures that seem unrelated, please do not ignore them. One can run
> the flaky test runner <http://ci.hive.apache.org/job/hive-flaky-check/> to
> make sure that the test is indeed flaky. If the test is found to be flaky,
> a ticket should be created to disable it. A separate ticket should be
> created to deflake it, and you can mention the original author, or the
> author of the previous commit that changed the test, on that ticket to get
> help, since they likely have the most context around that test. Once the
> flaky test is disabled and we have a green CI job run, we should merge the
> PR. If others have any suggestions to improve this process, please chime in.
>
> Thanks,
> Vihang
>
> On Tue, Mar 28, 2023 at 10:55 PM Aman Raj wrote:
>
> > Hi community,
> >
> > This is to notify you that we have a green branch-3 now. The entire effort
> > of fixing branch-3 test cases took around 4 months, and as a team we
> > managed to fix 2900+ test failures on branch-3. The entire effort can be
> > tracked here: HIVE-26836<https://issues.apache.org/jira/browse/HIVE-26836>.
> > We are ready to push new features and improvements to branch-3 now.
> >
> > I really want to thank Vihang Karajgaonkar, Chris Nauroth, Laszlo Bodor,
> > Stamatis Zampetakis and Sankar Hariappan, without whom this would not
> > have been possible at all. As a team we stuck together, participated in
> > reviews and actively suggested improvements, which really helped in
> > fixing some major test failures.
> >
> > I would sincerely request that, going forward, we make it a point to
> > merge things into branch-3 only if we have a green Jenkins pipeline.
> >
> > The next step would be to backport changes from branch-3.1 (from where
> > the Hive-3.1.3 release was made) to branch-3. This would ensure that we
> > do not miss any specific ticket that went into Hive-3.1.3. I will take
> > care of this. In parallel, we can start pushing additional changes to
> > branch-3. There are approximately 25 tickets that need to be backported
> > in this effort (of backporting changes from branch-3.1). I have made a
> > note here:
> > <https://docs.google.com/spreadsheets/d/1K0U-vxLRZEs13oBzYBlVyK8dMMNthgXL5VEgzLRbeKs/edit?usp=sharing>
> >
> > Again, thanks a lot to everyone who supported and participated in this
> > effort. Let's make this 3.2.0 Hive release happen!!
> >
> > Thanks,
> > Aman.
> >
> > 
> > From: Aman Raj 
> > Sent: Monday, March 20, 2023 9:21 AM
> > To: dev@hive.apache.org 
> > Subject: Re: [EXTERNAL] Re: Branch-3 backports and build stability
> >
> > Hi Vihang/community,
> >
> > Found the ticket that broke mm_all.q: the issue is caused by
> > HIVE-20182. It works locally and on the Jenkins pipeline as well.
> > Link: <https://github.com/apache/hive/pull/4127> Reverting this commit
> > for now.
> >
> > Thanks,
> > Aman.
> > 
> > From: Aman Raj 
> > Sent: Monday, March 20, 2023 8:28 AM
> > To: dev@hive.apache.org 
> > Subject: Re: [EXTERNAL] Re: Branch-3 backports and build stability
> >
> > Sure, Vihang, I will look at the other ones. You can pick this up.
> >
> > Thanks,
> > Aman.
> >

Re: [DISCUSS] Move Jira notification emails out of dev@hive

2023-04-07 Thread Stamatis Zampetakis
Just logged https://issues.apache.org/jira/browse/INFRA-24440 to move
this forward.

Best,
Stamatis



Re: Lateral views and CBO

2023-04-09 Thread Stamatis Zampetakis
Hi Steve,

The way that we currently represent lateral views in the physical plan is
not great. I'm sure there were good reasons why people went with the
approach of introducing specialized operators for that purpose, but as
history has shown, the current representation causes us lots of
problems in many parts of the compiler where we need to traverse the plan
DAG (rules + hooks + explain), leading to an exponential number of visits
of the plan nodes.

If we decorrelate the plan as usual, maybe we could get rid of the problems
mentioned above and the need for specialized lateral view physical
operators, which I think would be a good thing in the long run.
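The "exponential visit" effect can be illustrated with a small Java sketch. It is a toy under stated assumptions, not Hive code: `PlanNode` and `DagWalk` are hypothetical names, and a chain of "diamonds" (a node whose two branches reconverge on a shared node below) is only assumed to approximate what stacked lateral views look like in the operator DAG. A walker that recurses into every child without remembering visited nodes re-processes each shared node once per path, so its work grows as 2^(depth+2) - 3, whereas an identity-based visited set touches each of the 3 * depth + 1 nodes exactly once.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical plan node; not a Hive Operator or a Calcite RelNode.
final class PlanNode {
    final String name;
    final List<PlanNode> children = new ArrayList<>();

    PlanNode(String name) {
        this.name = name;
    }
}

final class DagWalk {
    // Builds `depth` stacked diamonds: each "fork" has two branches that
    // reconverge on the same node below (the next fork, or the final sink).
    static PlanNode diamondChain(int depth) {
        PlanNode below = new PlanNode("sink");
        for (int i = 0; i < depth; i++) {
            PlanNode left = new PlanNode("left" + i);
            PlanNode right = new PlanNode("right" + i);
            left.children.add(below);
            right.children.add(below);
            PlanNode fork = new PlanNode("fork" + i);
            fork.children.add(left);
            fork.children.add(right);
            below = fork;
        }
        return below;
    }

    // Naive walk: a shared node is re-entered once per path that reaches it.
    static long naiveVisits(PlanNode node) {
        long visits = 1;
        for (PlanNode child : node.children) {
            visits += naiveVisits(child);
        }
        return visits;
    }

    // Memoized walk: an identity-based visited set processes each node once.
    static long memoizedVisits(PlanNode root) {
        Set<PlanNode> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        return memoizedVisits(root, seen);
    }

    private static long memoizedVisits(PlanNode node, Set<PlanNode> seen) {
        if (!seen.add(node)) {
            return 0; // already processed via another path
        }
        long visits = 1;
        for (PlanNode child : node.children) {
            visits += memoizedVisits(child, seen);
        }
        return visits;
    }
}
```

With `depth = 10`, for instance, the naive walk makes 4093 visits and the memoized walk 31. The sketch only shows the general cost of walking a DAG as if it were a tree; it does not check which Hive rules or hooks actually memoize their traversal.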


Best,
Stamatis

On Thu, Apr 6, 2023, 6:31 PM Stephen Carlin wrote:

> Hi,
>
> I noticed recently that for most lateral views, we do not convert to a CBO
> plan. I was hoping to make some changes to make this possible.
>
> I was wondering if anyone out there had any thoughts on how to do this.
>
> If not, I did have one approach in mind and wanted to bring it up just to
> start the conversation before doing any work on it.
>
> I was thinking maybe we could have our own RelNode deriving from
> LogicalCorrelate. I know we already handle correlated queries and
> decorrelate them. But in this special case, I think we should not
> decorrelate the RelNode. We should keep this new LogicalCorrelate node
> through the whole optimization process and then add code to translate
> the RelNode into the physical plan.
>
> Thoughts?
>
> Thanks!
>


Re: Introducing a DI framework in Hive?

2023-04-12 Thread Stamatis Zampetakis
Hey Laszlo,

Dependency injection is a very powerful and useful tool/design pattern.

I don't think there is a particular reason why Hive does not use a
DI framework, apart maybe from the fact that we have lots of legacy
code that existed before DI became that popular.

I am open to ideas and suggestions about parts of the code that we
could improve via DI. I would probably avoid big refactorings of core
components of Hive for the sake of introducing a DI framework, but I
see no big issue with using such frameworks in new code. As usual, when
we are about to introduce a new dependency to the project, we should be
mindful of all the implications that this might have.

It's hard to make a generally applicable claim that we should use this
or that framework, since I guess it has a lot to do with personal
preferences; we tend to prefer things that we have already used. I
haven't used DI frameworks that much, so I don't have a strong opinion on
which framework is the best and I am willing to follow the majority.

Best,
Stamatis

On Tue, Apr 4, 2023 at 1:19 PM Laszlo Vegh  wrote:
>
>
> Hi all,
>
> I would like to start a conversation about introducing a Dependency
> Injection framework (like Spring, Guice, Weld, etc.) in Hive.
>
> IMHO the lack of such a framework makes the codebase much less organised
> and harder to maintain. Moreover, I think it has also led to a huge number
> of static/utility methods and classes (which are highly discouraged when
> using DI frameworks). When there is no DI framework, utility classes with
> static methods often seem to be the simplest and best way to share code
> across different Hive components/classes, but these constructs really
> hurt testability. For example, it is much harder to mock static method
> calls than to mock service/component instances. Poor testability is a
> major issue on its own, but a DI framework could bring many more benefits,
> like greater flexibility (modularity), better organised services, etc.
>
>
> I’m interested in whether there’s any reason why there has been no DI in
> Hive so far. I know there’s no way to introduce it everywhere in a single
> step, but we could start using it where it is easy to start, and
> continuously expand its usage from class to class. If there is no strong
> reason not to do it, I would like to start an open conversation around
> this topic (possible benefits, drawbacks, which framework to use, where to
> introduce it first, etc.).
>
> If anybody is interested in this initiative, please join the conversation, 
> and add your thoughts, ideas, doubts, anything.
>
> Thanks,
>
> Laszlo Vegh
> veghlac...@gmail.com 
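The testability point above can be made concrete with a minimal plain-Java sketch that uses no DI framework at all. `RowCountSource`, `StaticRowCounts`, `SelectivityEstimator` and `InMemoryRowCounts` are hypothetical names chosen for illustration, not Hive classes. The static version hides its dependency inside the method body, while the constructor-injected version declares it, so a test can pass in a fake (or a lambda) instead of a live service.

```java
import java.util.Map;

// Hypothetical dependency: looks up how many rows a table has.
// Not a Hive interface; named only for illustration.
interface RowCountSource {
    long rowCount(String table);
}

// Static-utility style: the dependency is hard-wired inside the method,
// so a unit test cannot swap it out without static-mocking tools.
final class StaticRowCounts {
    private StaticRowCounts() {}

    static long rowCount(String table) {
        // Stand-in for a call that needs a live, hard-to-start service.
        throw new UnsupportedOperationException("needs a live service");
    }
}

// Constructor-injection style: the dependency is a constructor parameter,
// so it is visible to readers and replaceable in tests.
final class SelectivityEstimator {
    private final RowCountSource source;

    SelectivityEstimator(RowCountSource source) {
        this.source = source;
    }

    // Fraction of the table's rows retained by a filter that keeps `keptRows`.
    double selectivity(String table, long keptRows) {
        long total = source.rowCount(table);
        return total == 0 ? 0.0 : (double) keptRows / total;
    }
}

// Test double: a map-backed fake that needs no external service.
final class InMemoryRowCounts implements RowCountSource {
    private final Map<String, Long> counts;

    InMemoryRowCounts(Map<String, Long> counts) {
        this.counts = counts;
    }

    @Override
    public long rowCount(String table) {
        return counts.getOrDefault(table, 0L);
    }
}
```

For example, `new SelectivityEstimator(new InMemoryRowCounts(Map.of("t1", 1000L))).selectivity("t1", 250L)` returns 0.25 without touching any external service. A DI container would take over only the `new SelectivityEstimator(...)` wiring; Spring can use a single constructor as-is, while Guice typically expects an `@Inject` annotation on it.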


Re: [DISCUSS] Move Jira notification emails out of dev@hive

2023-04-12 Thread Stamatis Zampetakis
INFRA-24440 is resolved, so all JIRA traffic now goes to issues@hive.
Don't forget to subscribe to that list if you wish to follow the
creation of new tickets etc.

Best,
Stamatis



Re: Introducing a DI framework in Hive?

2023-04-13 Thread Stamatis Zampetakis
Just to be clear, I am in favor of introducing DI frameworks in Hive
where it makes sense. As Attila said, we don't want to get stuck with
legacy code forever. When a concrete proposal comes up we can discuss
benefits vs drawbacks.

Regarding stability, I agree it is a pressing issue, but Hive is an open
source project and we certainly don't want to force volunteers to work
on specific things or forbid them to work on others. Contributing to
open source is supposed to be a fun and rewarding experience. I am
sure many of the people on this list have stability as a primary goal,
so eventually we will get there.

Best,
Stamatis


Re: Introducing a DI framework in Hive?

2023-04-19 Thread Stamatis Zampetakis
I think we all agree that DI can be beneficial in general.

However, it's hard to say yes or no to something before having a
concrete case to discuss; it doesn't have to be a PR, but we need to
work on a specific Hive use case and list the advantages/disadvantages
of the proposal.

Best,
Stamatis

On Mon, Apr 17, 2023 at 7:33 PM Laszlo Vegh  wrote:
>
> Hi all,
>
> Sorry for not answering until now; for some reason I did not receive your
> answers in my Gmail account. I’m happy to see that there’s a conversation
> around the topic, so let me add my opinion on your points.
>
> First of all, introducing a DI framework does not mean a large-scale
> refactoring. A suitable module, or a well-bounded set of components, can be
> chosen as the first candidate. It’s also important that nobody will be
> forced to use the DI container when writing features, or to redesign
> existing code when touching it.
> As for the aim: I’ve worked quite a lot with Java and .NET DI frameworks,
> and my experience was that having a DI framework greatly reduces the effort
> needed to write well-organised and maintainable code. While well-organised
> code can be written without DI frameworks too, the lack of such a framework
> makes it much easier to write poorly designed code (bad scoping, lifecycle
> issues, visibility issues, etc.). By well-organised I mean:
> - Design patterns: DI containers make it easier to write code using the
>   well-known design patterns. For example, you can implement factory,
>   wrapper, adapter, etc. patterns simply by using the offered features as
>   they are intended to be used.
> - Streamlined component initialisation: no more spaghetti/boilerplate
>   component init methods.
> - Well-defined component scopes (lifecycle): DI frameworks support various
>   component scopes, which offer fine-grained control over component
>   lifecycle: singleton, one component per thread, one component per request
>   from the DI container, etc.
> - Organised and visible component/class dependencies: through constructor
>   injection, all the dependencies of a class are visible (unlike static
>   method calls). Using this approach it is impossible to create circular
>   dependencies, which lead to object initialisation issues and hacks. By
>   requiring all dependencies during object creation, it’s much easier to
>   detect or avoid unwanted dependencies. It also makes it easier to
>   organise the code into packages and modules.
> - Enhanced testability: I have explained this earlier.
> - Well-defined component visibility: no need for “union-all” context
>   objects. Instead of having context objects with references to all of the
>   components that may be required during execution, each execution step can
>   obtain the necessary dependencies from the DI container. Also, no more
>   public static methods or class instances: to make a component accessible
>   from everywhere, there’s no need to make it public and static. DI
>   frameworks also offer nested/sub contexts to limit/control visibility.
> My original mail was meant as a kickoff, to start talking about DI.
> Before creating a PR with an example in Hive, I would like to have a common
> agreement that we want to do this and that there is no blocker preventing
> us from doing it. Once we have this agreement, I can create a working
> example and demonstrate how it will help us in the future.
> Regarding the stability and performance issues: of course those must be
> addressed as well, but as Stamatis pointed out, Hive is an open source
> project and everybody can pursue their own initiatives in parallel with
> others.
>
> In Java I have the most experience with Spring, so I would prefer choosing
> it. It has become huge by now, but it’s modular. We are not forced to use
> all of the offered features; if we want a pure DI container with some basic
> extensions, we would only need spring-core, spring-beans, and
> spring-context. It has several extensions and supports tons of other
> well-known frameworks and/or technologies.
>
> Best regards,
> Laszlo Vegh

