[jira] [Created] (HIVE-23615) Null pointers should not be dereferenced

2020-06-04 Thread kvlasov (Jira)
kvlasov created HIVE-23615:
--

 Summary: Null pointers should not be dereferenced
 Key: HIVE-23615
 URL: https://issues.apache.org/jira/browse/HIVE-23615
 Project: Hive
  Issue Type: Bug
Reporter: kvlasov


This pull request (https://github.com/apache/hive/pull/62) focuses on
resolving occurrences of Sonar rule squid:S2259, "Null pointers should not be
dereferenced".
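
For readers unfamiliar with the rule, here is a minimal Java sketch of the kind of pattern it describes and one possible fix. The class and method names are hypothetical and are not taken from the Hive code base or from the pull request.

```java
// Illustrative only: NullDereferenceExample and its methods are hypothetical names.
final class NullDereferenceExample {

    // Problem pattern: 'trimmed' stays null on one path and is dereferenced anyway.
    static int unsafeLength(String name) {
        String trimmed = null;
        if (name != null) {
            trimmed = name.trim();
        }
        return trimmed.length(); // throws NullPointerException when name is null
    }

    // One possible fix: handle the null case before dereferencing.
    static int safeLength(String name) {
        if (name == null) {
            return 0;
        }
        return name.trim().length();
    }

    public static void main(String[] args) {
        System.out.println(safeLength(" src "));
        System.out.println(safeLength(null));
    }
}
```

Whether returning a default value, throwing a descriptive exception, or using `java.util.Optional` is the right fix depends on the call site; the sketch only shows the simplest of these.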



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Replace ptest with hive-test-kube

2020-06-04 Thread Mustafa IMAN
Thank you Zoltan for all this work.
I see many PRs are merged based on the new workflow already. The old
workflow generates many reports like ASF license/findbugs/checkstyle etc. I
don't see these in the new Github PR workflow. I am concerned the codebase
is going to suffer from lack of these reports very quickly. Do these checks
still happen but are not visible?

On Tue, Jun 2, 2020 at 4:41 AM Zoltan Haindrich  wrote:

> Hello,
>
> I would like to note that you may login to the jenkins instance - to
> start/kill builds (or create new jobs).
> I've configured github oauth - but since team membership can't be queried
> from the "apache organization" - it's harder to configure all "hive
> committers".
> However...I think I've made it available for most of us - if you can't
> start builds/etc just let me know your github user and I'll add it.
>
> cheers,
> Zoltan
>
>
>
> On 5/29/20 2:32 PM, Zoltan Haindrich wrote:
> > Hey all!
> >
> > The patch is now in master - so every new PR or a push on it will
> trigger a new run.
> >
> > Please decide which one you would like to use - open a PR to see the new
> > one work...or upload a patch file to the jira - but please don't do both,
> > because in that case 2 executions will happen.
> >
> > The job execution time (2-4 hours) of a single run is a bit higher than
> > usual on the ptest server - this is mostly to increase throughput.
> >
> > The patch also disabled a set of tests; I will send the full list of
> skipped tests shortly.
> >
> > cheers,
> > Zoltan
> >
> >
> > On 5/27/20 1:50 PM, Zoltan Haindrich wrote:
> >> Hello all!
> >>
> >> The new stuff is ready to be switched on-to. It needs to be merged into
> master - and after that anyone who opens a PR will get a run by the new
> HiveQA infra.
> >> I propose to run the 2 systems side-by-side for some time - the regular
> master builds will start; and we will see how frequently that is polluted
> by flaky tests.
> >>
> >> Note that the current patch also disables around 25 more tests to
> >> increase stability. To get a better overview of the disabled tests, I
> >> think the "direction of the information flow" should be altered. What I
> >> mean by that is: instead of just throwing in a jira for "disable test x"
> >> and opening a new one like "fix test x", only open the latter and place
> >> the jira reference into the ignore message; meanwhile also add a regular
> >> report about the actually disabled tests - so people who do know about
> >> the importance of a particular test can get involved.
> >>
> >> Note: the builds.apache.org instance will be shut down sometime in the
> >> future as well...but I think the new one is a good-enough alternative, so
> >> we do not have to migrate the Hive-precommit job over to
> >> https://ci-hadoop.apache.org/.
> >>
> >> http://34.66.156.144:8080/job/hive-precommit/job/PR-948/5/
> >> https://issues.apache.org/jira/browse/HIVE-22942
> >> https://github.com/apache/hive/pull/948/files
> >>
> >> cheers,
> >> Zoltan
> >>
> >> On 5/18/20 1:42 PM, Zoltan Haindrich wrote:
> >>> Hey!
> >>>
> >>> On 5/18/20 11:51 AM, Zoltan Chovan wrote:
>  Thank you for all of your efforts, this looks really promising. With
> moving
>  to github PRs, would that also mean that we move away from the
> reviewboard
>  for code review?
> >>> I didn't think about that. I think using github's review interface
> >>> will remain optional, because both review systems have their own strong
> >>> points - I wouldn't force anyone to use one over the other. (For some
> >>> patches reviewboard is much better, because it's able to track content
> >>> moves a bit better than github; meanwhile github has a small feature
> >>> that lets you mark files as reviewed.)
> >>> As a matter of fact, we sometimes had patches on the jiras which had
> >>> neither an RB nor a PR to review them - having a PR there will at least
> >>> make it easier for reviewers to comment.
> >>>
>  Also, what happens if a PR is updated? Will the tests run for both or
> just
>  for the latest version?
> >>> It will trigger a new build - if there is already a build in progress
> that will prevent a new build from starting until it finishes...and there
> is also a 5 builds/day
> >>> limit; which might induce some wait.
> >>>
> >>> cheers,
> >>> Zoltan
> >>>
> 
>  Regards,
>  Zoltan
> 
>  On Sun, May 17, 2020 at 10:51 PM Zoltan Haindrich 
> wrote:
> 
> > Hello all!
> >
> > The proposed system has become more stable lately - and I think I've
> > solved a few sources of flakiness.
> > To be really usable I also wanted to add a way to dynamically
> > enable/disable a set of tests (for example the replication tests take ~7
> > hours to execute from the total of 24 hours - and they are also a bit
> > unstable, so not running them when not necessary would be beneficial in
> > multiple ways) - but to do this the best would be to throw in
> > junit5; unfortunately the current ptest 

Re: Open old PRs

2020-06-04 Thread David Mollitor
Hello Zoltan

I just got access.  Not sure if this was a manual process performed by
someone on this thread or something automated, but I got an email
acknowledging.

I'm starting to close some old PRs.

Thanks.

On Thu, Jun 4, 2020 at 5:20 PM David Mollitor  wrote:

> Zoltan,
>
>
> https://cwiki.apache.org/confluence/display/OPENWHISK/Accessing+Apache+GitHub+as+a+Committer
>
>
> That did it for me.  Thanks so much. I'll try to remember to get this
> into our Hive docs.
>
> I still however cannot close or merge anything.  I get the warning:
>
> "Only those with write access to this repository can merge pull requests."
>
> Any idea how to get my account write access?
>
> Thanks.
>
> On Thu, Jun 4, 2020 at 12:14 PM David Mollitor  wrote:
>
>> Hey Zoltan,
>>
>> Thanks again for putting this together.
>>
>> The original topic was dealing with a big pile of PRs.  I'm starting to
>> work through them and pare them down a bit before we kick in something
>> automated (agree that we should eventually).  However, I'm hampered by my
>> inability to merge things myself.
>>
>> Thanks.
>>
>> On Thu, Jun 4, 2020 at 11:52 AM Zoltan Haindrich  wrote:
>>
>>> Hey David!
>>>
>>>  > I'm trying to run the tests again and hopefully commit.  Github has me
>>>  > listed as a "contributor" and lists Zoltan as a "Member of The Apache
>>>  > Software Foundation".  Do you know how that list of members is
>>> managed?
>>>
>>> I don't know; maybe you should re-link your account? You could also try
>>> to follow this wiki page:
>>>
>>> https://cwiki.apache.org/confluence/display/OPENWHISK/Accessing+Apache+GitHub+as+a+Committer
>>> It was not my intention to move the patch approval process entirely to
>>> github - it's just an option.
>>> A balanced solution could be that we still upload the patch to the jira
>>> to have it there as well - and continue committing by pushing it manually.
>>> I feel that this thread has deviated from its original topic: what I saw
>>> is that the big pile of open PRs may backfire on the system if the job's
>>> "branch scanner" is triggered.
>>>
>>>  >> Regarding auto-close in JIRA.  Take a look at Apache Avro project.
>>> I've
>>>  >> contributed there a little bit and I think they have this capability.
>>>
>>> Yes; it would make sense to auto-close jira tickets based on PRs, but
>>> it will pull up a lot of small questions (like what should it set as fix
>>> version?).
>>> Right now I don't see it as a big deal; I think that flaky tests are a
>>> bigger problem - and there is also some kind of kubernetes issue from
>>> time to time, which takes down a full executor node with all its pods
>>> for no apparent reason. I'm putting my efforts into fixing that...
>>>
>>> cheers,
>>> Zoltan
>>>
>>>
>>> On 6/3/20 3:15 PM, David Mollitor wrote:
>>> > Hmm, OK, working through this PR:
>>> >
>>> > https://github.com/apache/hive/pull/1052
>>> >
>>> > I'm trying to run the tests again and hopefully commit.  Github has me
>>> > listed as a "contributor" and lists Zoltan as a "Member of The Apache
>>> > Software Foundation".  Do you know how that list of members is managed?
>>> >
>>> > https://github.com/orgs/apache/people?query=
>>> >
>>> >   Thanks.
>>> >
>>> > On Wed, Jun 3, 2020 at 9:08 AM David Mollitor 
>>> wrote:
>>> >
>>> >> Hello Zoltan,
>>> >>
>>> >> Regarding auto-close in JIRA.  Take a look at Apache Avro project.
>>> I've
>>> >> contributed there a little bit and I think they have this capability.
>>> >>
>>> >> Thanks.
>>> >>
>>> >> On Tue, Jun 2, 2020 at 8:00 PM Stamatis Zampetakis >> >
>>> >> wrote:
>>> >>
>>> >>> Hello,
>>> >>>
>>> >>> I am very happy working with the new system. Many thanks Zoltan!
>>> >>>
>>> >>> I find the bot a good idea and I think it's worth trying out.
>>> >>> One thing to watch out for is the case where contributors are willing
>>> >>> to push their work forward but there are no available reviewers to look
>>> >>> at each case.
>>> >>> I think people will reply to the bot once or twice but I don't think
>>> >>> they will do it much longer, so we could take this into account for the
>>> >>> configuration of the bot.
>>> >>>
>>> >>> Regarding the squash merge option there might be a small caveat. I don't
>>> >>> know if it is possible to retain the information about the person who
>>> >>> performed the merge.
>>> >>> According to the discussion in [1] it seems that the committer in this
>>> >>> case will appear to be the GitHub account.
>>> >>> This might not be a big problem for Hive since the reviewer's name is
>>> >>> part of the commit message, so the credit and responsibility are not lost.
>>> >>>
>>> >>> Best,
>>> >>> Stamatis
>>> >>>
>>> >>> [1] https://github.com/isaacs/github/issues/1303
>>> >>>
>>> >>>
>>> >>>
>>> >>> On Tue, Jun 2, 2020 at 9:26 PM Zoltan Haindrich  wrote:
>>> >>>
>>> 
>>> 
>>>  On 6/2/20 9:15 PM, David Mollitor wrote:
>>> > I use a personal account for GitHub and it's not synced with my
>>> >>> 

Re: Open old PRs

2020-06-04 Thread David Mollitor
Zoltan,

https://cwiki.apache.org/confluence/display/OPENWHISK/Accessing+Apache+GitHub+as+a+Committer


That did it for me.  Thanks so much. I'll try to remember to get this into
our Hive docs.

I still however cannot close or merge anything.  I get the warning:

"Only those with write access to this repository can merge pull requests."

Any idea how to get my account write access?

Thanks.

On Thu, Jun 4, 2020 at 12:14 PM David Mollitor  wrote:

> Hey Zoltan,
>
> Thanks again for putting this together.
>
> The original topic was dealing with a big pile of PRs.  I'm starting to
> work through them and pare them down a bit before we kick in something
> automated (agree that we should eventually).  However, I'm hampered by my
> inability to merge things myself.
>
> Thanks.
>
> On Thu, Jun 4, 2020 at 11:52 AM Zoltan Haindrich  wrote:
>
>> Hey David!
>>
>>  > I'm trying to run the tests again and hopefully commit.  Github has me
>>  > listed as a "contributor" and lists Zoltan as a "Member of The Apache
>>  > Software Foundation".  Do you know how that list of members is managed?
>>
>> I don't know; maybe you should re-link your account? You could also try
>> to follow this wiki page:
>>
>> https://cwiki.apache.org/confluence/display/OPENWHISK/Accessing+Apache+GitHub+as+a+Committer
>> It was not my intention to move the patch approval process entirely to
>> github - it's just an option.
>> A balanced solution could be that we still upload the patch to the jira
>> to have it there as well - and continue committing by pushing it manually.
>> I feel that this thread has deviated from its original topic: what I saw
>> is that the big pile of open PRs may backfire on the system if the job's
>> "branch scanner" is triggered.
>>
>>  >> Regarding auto-close in JIRA.  Take a look at Apache Avro project.
>> I've
>>  >> contributed there a little bit and I think they have this capability.
>>
>> Yes; it would make sense to auto-close jira tickets based on PRs, but
>> it will pull up a lot of small questions (like what should it set as fix
>> version?).
>> Right now I don't see it as a big deal; I think that flaky tests are a
>> bigger problem - and there is also some kind of kubernetes issue from
>> time to time, which takes down a full executor node with all its pods
>> for no apparent reason. I'm putting my efforts into fixing that...
>>
>> cheers,
>> Zoltan
>>
>>
>> On 6/3/20 3:15 PM, David Mollitor wrote:
>> > Hmm, OK, working through this PR:
>> >
>> > https://github.com/apache/hive/pull/1052
>> >
>> > I'm trying to run the tests again and hopefully commit.  Github has me
>> > listed as a "contributor" and lists Zoltan as a "Member of The Apache
>> > Software Foundation".  Do you know how that list of members is managed?
>> >
>> > https://github.com/orgs/apache/people?query=
>> >
>> >   Thanks.
>> >
>> > On Wed, Jun 3, 2020 at 9:08 AM David Mollitor 
>> wrote:
>> >
>> >> Hello Zoltan,
>> >>
>> >> Regarding auto-close in JIRA.  Take a look at Apache Avro project.
>> I've
>> >> contributed there a little bit and I think they have this capability.
>> >>
>> >> Thanks.
>> >>
>> >> On Tue, Jun 2, 2020 at 8:00 PM Stamatis Zampetakis 
>> >> wrote:
>> >>
>> >>> Hello,
>> >>>
>> >>> I am very happy working with the new system. Many thanks Zoltan!
>> >>>
>> >>> I find the bot a good idea and I think it's worth trying out.
>> >>> One thing to watch out for is the case where contributors are willing
>> >>> to push their work forward but there are no available reviewers to look
>> >>> at each case.
>> >>> I think people will reply to the bot once or twice but I don't think
>> >>> they will do it much longer, so we could take this into account for the
>> >>> configuration of the bot.
>> >>>
>> >>> Regarding the squash merge option there might be a small caveat. I don't
>> >>> know if it is possible to retain the information about the person who
>> >>> performed the merge.
>> >>> According to the discussion in [1] it seems that the committer in this
>> >>> case will appear to be the GitHub account.
>> >>> This might not be a big problem for Hive since the reviewer's name is
>> >>> part of the commit message, so the credit and responsibility are not lost.
>> >>>
>> >>> Best,
>> >>> Stamatis
>> >>>
>> >>> [1] https://github.com/isaacs/github/issues/1303
>> >>>
>> >>>
>> >>>
>> >>> On Tue, Jun 2, 2020 at 9:26 PM Zoltan Haindrich  wrote:
>> >>>
>> 
>> 
>>  On 6/2/20 9:15 PM, David Mollitor wrote:
>> > I use a personal account for GitHub and it's not synced with my
>> >>> official
>> > Apache account.  How do I go about registering my Apache account
>> with
>> > GitHub so I can merge through their interface?
>> 
>>  IIRC I've linked my account by using this interface:
>>  https://gitbox.apache.org/setup/
>> 
>> >
>> > In the meanwhile, can you assist with a merge here? :)
>> >
>> 
>>  sure; I think you should also add dmolli...@apache.org as a
>> 

[jira] [Created] (HIVE-23614) Always pass HiveConfig to removeTempOrDuplicateFiles

2020-06-04 Thread John Sherman (Jira)
John Sherman created HIVE-23614:
---

 Summary: Always pass HiveConfig to removeTempOrDuplicateFiles
 Key: HIVE-23614
 URL: https://issues.apache.org/jira/browse/HIVE-23614
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Reporter: John Sherman
Assignee: John Sherman


As part of HIVE-23354, we check the provided HiveConf for speculative execution 
and throw an error if it is enabled. There is one path that did not previously 
provide a HiveConf value which shows up in test failures in 
runtime_skewjoin_mapjoin_spark.q, skewjoin.q and skewjoin_onesideskew.q (which 
we do not run as part of pre-commit tests).
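
To illustrate why a missing configuration breaks such a check, here is a minimal Java sketch. It is not the Hive implementation: a plain `Map` stands in for HiveConf, the helper name is hypothetical, and the two property names are assumed to be the relevant Hadoop speculative-execution switches.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in: a Map plays the role of HiveConf in this sketch.
final class SpeculativeExecutionCheck {

    // Assumed property names; the actual check in Hive may read different keys.
    static final String MAP_SPECULATIVE = "mapreduce.map.speculative";
    static final String REDUCE_SPECULATIVE = "mapreduce.reduce.speculative";

    // Hypothetical helper: rejects a missing config and any config with speculation on.
    // Simplification: an unset property is treated as disabled.
    static void requireNoSpeculativeExecution(Map<String, String> conf) {
        Objects.requireNonNull(conf, "a configuration must always be passed");
        boolean mapSide = Boolean.parseBoolean(conf.getOrDefault(MAP_SPECULATIVE, "false"));
        boolean reduceSide = Boolean.parseBoolean(conf.getOrDefault(REDUCE_SPECULATIVE, "false"));
        if (mapSide || reduceSide) {
            throw new IllegalStateException("speculative execution is enabled");
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(MAP_SPECULATIVE, "false");
        requireNoSpeculativeExecution(conf); // passes: config present, speculation off
        System.out.println("check passed");
    }
}
```

A caller that hands in `null` fails before the property check is even reached, which is the kind of failure a code path without a config value would run into.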





Re: Open old PRs

2020-06-04 Thread David Mollitor
Hey Zoltan,

Thanks again for putting this together.

The original topic was dealing with a big pile of PRs.  I'm starting to
work through them and pare them down a bit before we kick in something
automated (agree that we should eventually).  However, I'm hampered by my
inability to merge things myself.

Thanks.

On Thu, Jun 4, 2020 at 11:52 AM Zoltan Haindrich  wrote:

> Hey David!
>
>  > I'm trying to run the tests again and hopefully commit.  Github has me
>  > listed as a "contributor" and lists Zoltan as a "Member of The Apache
>  > Software Foundation".  Do you know how that list of members is managed?
>
> I don't know; maybe you should re-link your account? You could also try to
> follow this wiki page:
>
> https://cwiki.apache.org/confluence/display/OPENWHISK/Accessing+Apache+GitHub+as+a+Committer
> It was not my intention to move the patch approval process entirely to
> github - it's just an option.
> A balanced solution could be that we still upload the patch to the jira to
> have it there as well - and continue committing by pushing it manually.
> I feel that this thread has deviated from its original topic: what I saw is
> that the big pile of open PRs may backfire on the system if the job's
> "branch scanner" is triggered.
>
>  >> Regarding auto-close in JIRA.  Take a look at Apache Avro project.
> I've
>  >> contributed there a little bit and I think they have this capability.
>
> Yes; it would make sense to auto-close jira tickets based on PRs, but it
> will pull up a lot of small questions (like what should it set as fix
> version?).
> Right now I don't see it as a big deal; I think that flaky tests are a
> bigger problem - and there is also some kind of kubernetes issue from
> time to time, which takes down a full executor node with all its pods
> for no apparent reason. I'm putting my efforts into fixing that...
>
> cheers,
> Zoltan
>
>
> On 6/3/20 3:15 PM, David Mollitor wrote:
> > Hmm, OK, working through this PR:
> >
> > https://github.com/apache/hive/pull/1052
> >
> > I'm trying to run the tests again and hopefully commit.  Github has me
> > listed as a "contributor" and lists Zoltan as a "Member of The Apache
> > Software Foundation".  Do you know how that list of members is managed?
> >
> > https://github.com/orgs/apache/people?query=
> >
> >   Thanks.
> >
> > On Wed, Jun 3, 2020 at 9:08 AM David Mollitor  wrote:
> >
> >> Hello Zoltan,
> >>
> >> Regarding auto-close in JIRA.  Take a look at Apache Avro project.  I've
> >> contributed there a little bit and I think they have this capability.
> >>
> >> Thanks.
> >>
> >> On Tue, Jun 2, 2020 at 8:00 PM Stamatis Zampetakis 
> >> wrote:
> >>
> >>> Hello,
> >>>
> >>> I am very happy working with the new system. Many thanks Zoltan!
> >>>
> >>> I find the bot a good idea and I think it's worth trying out.
> >>> One thing to watch out for is the case where contributors are willing
> >>> to push their work forward but there are no available reviewers to look
> >>> at each case.
> >>> I think people will reply to the bot once or twice but I don't think
> >>> they will do it much longer, so we could take this into account for the
> >>> configuration of the bot.
> >>>
> >>> Regarding the squash merge option there might be a small caveat. I don't
> >>> know if it is possible to retain the information about the person who
> >>> performed the merge.
> >>> According to the discussion in [1] it seems that the committer in this
> >>> case will appear to be the GitHub account.
> >>> This might not be a big problem for Hive since the reviewer's name is
> >>> part of the commit message, so the credit and responsibility are not lost.
> >>>
> >>> Best,
> >>> Stamatis
> >>>
> >>> [1] https://github.com/isaacs/github/issues/1303
> >>>
> >>>
> >>>
> >>> On Tue, Jun 2, 2020 at 9:26 PM Zoltan Haindrich  wrote:
> >>>
> 
> 
>  On 6/2/20 9:15 PM, David Mollitor wrote:
> > I use a personal account for GitHub and it's not synced with my
> >>> official
> > Apache account.  How do I go about registering my Apache account with
> > GitHub so I can merge through their interface?
> 
>  IIRC I've linked my account by using this interface:
>  https://gitbox.apache.org/setup/
> 
> >
> > In the meanwhile, can you assist with a merge here? :)
> >
> 
>  sure; I think you should also add dmolli...@apache.org as a secondary
>  email to your github account
> 
>  About the open pr stuff: I still think our best approach of handling
> >>> those
>  things would be to close most of that 400 or so PRs...easiest would be
> >>> to
>  install the bot (at
>  least temporarily)
>  https://issues.apache.org/jira/browse/HIVE-23590
>  what do you think?
> 
>  cheers,
>  Zoltan
> 
> 
> > https://github.com/apache/hive/pull/1045
> >
> > Thanks!
> >
> > On Tue, Jun 2, 2020 at 10:21 AM Zoltan Haindrich 
> wrote:
> >
> >>
> >>
> >> On 

Re: Open old PRs

2020-06-04 Thread Zoltan Haindrich

Hey David!

> I'm trying to run the tests again and hopefully commit.  Github has me
> listed as a "contributor" and lists Zoltan as a "Member of The Apache
> Software Foundation".  Do you know how that list of members is managed?

I don't know; maybe you should re-link your account? You could also try to follow this wiki page: 
https://cwiki.apache.org/confluence/display/OPENWHISK/Accessing+Apache+GitHub+as+a+Committer

It was not my intention to move the patch approval process entirely to github -
it's just an option.
A balanced solution could be that we still upload the patch to the jira to have
it there as well - and continue committing by pushing it manually.
I feel that this thread has deviated from its original topic: what I saw is
that the big pile of open PRs may backfire on the system if the job's
"branch scanner" is triggered.

>> Regarding auto-close in JIRA.  Take a look at Apache Avro project.  I've
>> contributed there a little bit and I think they have this capability.

Yes; it would make sense to auto-close jira tickets based on PRs, but it will
pull up a lot of small questions (like what should it set as fix version?).
Right now I don't see it as a big deal; I think that flaky tests are a bigger
problem - and there is also some kind of kubernetes issue from time to time,
which takes down a full executor node with all its pods for no apparent reason.
I'm putting my efforts into fixing that...
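
Any auto-close automation would first need to map a PR to its jira ticket. A minimal Java sketch of that one step follows, assuming the PR title contains the ticket key in the usual "HIVE-12345" form; the class and method names are hypothetical, and deciding the fix version is left out on purpose.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts a Hive jira key from a pull request title.
final class JiraKeyExtractor {

    // Assumption: ticket keys appear as "HIVE-" followed by digits.
    private static final Pattern KEY = Pattern.compile("\\bHIVE-\\d+\\b");

    // Returns the first key found in the title, or empty if there is none.
    static Optional<String> jiraKey(String prTitle) {
        if (prTitle == null) {
            return Optional.empty();
        }
        Matcher matcher = KEY.matcher(prTitle);
        return matcher.find() ? Optional.of(matcher.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(jiraKey("HIVE-23615: Null pointers should not be dereferenced"));
        System.out.println(jiraKey("Fix a typo in the README"));
    }
}
```

PRs whose title yields no key would have to be handled by hand, which is one of the small questions such automation raises.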


cheers,
Zoltan


On 6/3/20 3:15 PM, David Mollitor wrote:

Hmm, OK, working through this PR:

https://github.com/apache/hive/pull/1052

I'm trying to run the tests again and hopefully commit.  Github has me
listed as a "contributor" and lists Zoltan as a "Member of The Apache
Software Foundation".  Do you know how that list of members is managed?

https://github.com/orgs/apache/people?query=

  Thanks.

On Wed, Jun 3, 2020 at 9:08 AM David Mollitor  wrote:


Hello Zoltan,

Regarding auto-close in JIRA.  Take a look at Apache Avro project.  I've
contributed there a little bit and I think they have this capability.

Thanks.

On Tue, Jun 2, 2020 at 8:00 PM Stamatis Zampetakis 
wrote:


Hello,

I am very happy working with the new system. Many thanks Zoltan!

I find the bot a good idea and I think it's worth trying out.
One thing to watch out for is the case where contributors are willing to push
their work forward but there are no available reviewers to look at each
case.
I think people will reply to the bot once or twice but I don't think they
will do it much longer, so we could take this into account for the
configuration of the bot.

Regarding the squash merge option there might be a small caveat. I don't know
if it is possible to retain the information about the person who performed
the merge.
According to the discussion in [1] it seems that the committer in this
case will appear to be the GitHub account.
This might not be a big problem for Hive since the reviewer's name is part
of the commit message, so the credit and responsibility are not lost.

Best,
Stamatis

[1] https://github.com/isaacs/github/issues/1303



On Tue, Jun 2, 2020 at 9:26 PM Zoltan Haindrich  wrote:




On 6/2/20 9:15 PM, David Mollitor wrote:

I use a personal account for GitHub and it's not synced with my official
Apache account.  How do I go about registering my Apache account with
GitHub so I can merge through their interface?


IIRC I've linked my account by using this interface:
https://gitbox.apache.org/setup/



In the meanwhile, can you assist with a merge here? :)



sure; I think you should also add dmolli...@apache.org as a secondary
email to your github account

About the open pr stuff: I still think our best approach of handling those
things would be to close most of those 400 or so PRs...easiest would be to
install the bot (at least temporarily)
https://issues.apache.org/jira/browse/HIVE-23590
what do you think?

cheers,
Zoltan



https://github.com/apache/hive/pull/1045

Thanks!

On Tue, Jun 2, 2020 at 10:21 AM Zoltan Haindrich  wrote:




On 6/2/20 3:10 PM, David Mollitor wrote:

I think we might want to take one manual pass across the board.  It will
most likely take more than 7 days to get through them all, so it may be
closing things that are legitimate.


yeah...a manual pass would be good; I went through around 10 or so before
I wrote the first mail in this thread...
and I definitely don't want to go through 400 - so I would prefer the bot :D




One low hanging fruit (that applied to one of my PRs).  The JIRA it was
associated with was already closed.  Is there a way to target those?


yes; there might certainly be a lot of those...(that's why I've estimated
about 1/3 to be applicable)
but filtering out even this is an awful lot of work (or it might involve
writing a "bot")...
if it's important enough the contributor could reopen / rebase the patch.
We could try to communicate the non-hostile intention in the message
placed by the bot.
The current message the stale PRs would get is:
"This pull 

Re: Open old PRs

2020-06-04 Thread David Mollitor
Hey Zoltan,

Any idea how I can become part of the "member" group so I can merge some
PRs?

Thanks!

On Thu, Jun 4, 2020 at 11:40 AM Zoltan Haindrich  wrote:

> Hey Stamatis!
>
> Yes, right now we should put the contributor's name into the commit
> message/etc - but in case we are using PRs that info is also present on the
> PR (or we could change the PR label to conform to our rules)
> Thanks for the reference - I've just seen that there is also a "bot" to
> possibly avoid the issues squashing might cause :D
> https://github.com/marketplace/actions/autosquash
>
> cheers,
> Zoltan
>
> On 6/3/20 2:00 AM, Stamatis Zampetakis wrote:
> > Hello,
> >
> > I am very happy working with the new system. Many thanks Zoltan!
> >
> > I find the bot a good idea and I think it's worth trying out.
> > One thing to watch out for is the case where contributors are willing to
> > push their work forward but there are no available reviewers to look at
> > each case.
> > I think people will reply to the bot once or twice but I don't think they
> > will do it much longer, so we could take this into account for the
> > configuration of the bot.
> >
> > Regarding the squash merge option there might be a small caveat. I don't
> > know if it is possible to retain the information about the person who
> > performed the merge.
> > According to the discussion in [1] it seems that the committer in this
> > case will appear to be the GitHub account.
> > This might not be a big problem for Hive since the reviewer's name is
> > part of the commit message, so the credit and responsibility are not lost.
> >
> > Best,
> > Stamatis
> >
> > [1] https://github.com/isaacs/github/issues/1303
> >
> >
> >
> > On Tue, Jun 2, 2020 at 9:26 PM Zoltan Haindrich  wrote:
> >
> >>
> >>
> >> On 6/2/20 9:15 PM, David Mollitor wrote:
> >>> I use a personal account for GitHub and it's not synced with my
> official
> >>> Apache account.  How do I go about registering my Apache account with
> >>> GitHub so I can merge through their interface?
> >>
> >> IIRC I've linked my account by using this interface:
> >> https://gitbox.apache.org/setup/
> >>
> >>>
> >>> In the meanwhile, can you assist with a merge here? :)
> >>>
> >>
> >> sure; I think you should also add dmolli...@apache.org as a secondary
> >> email to your github account
> >>
> >> About the open pr stuff: I still think our best approach of handling
> those
> >> things would be to close most of that 400 or so PRs...easiest would be
> to
> >> install the bot (at
> >> least temporarily)
> >> https://issues.apache.org/jira/browse/HIVE-23590
> >> what do you think?
> >>
> >> cheers,
> >> Zoltan
> >>
> >>
> >>> https://github.com/apache/hive/pull/1045
> >>>
> >>> Thanks!
> >>>
> >>> On Tue, Jun 2, 2020 at 10:21 AM Zoltan Haindrich  wrote:
> >>>
> 
> 
>  On 6/2/20 3:10 PM, David Mollitor wrote:
> > I think we might want to take one manual pass across the board.  It
> >> will
> > most likely take more than 7 days to get through them all, so it may
> be
> > closing things that are legitimate.
> 
>  yeah...a manual pass would be good; I went through around 10 or so before
>  I wrote the first mail in this thread...
>  and I definitely don't want to go through 400 - so I would prefer the bot :D
> 
> >
> > One low hanging fruit (that applied to one of my PRs).  The JIRA it
> was
> > associated with was already closed.  Is there a way to target those?
> 
>  yes; there might certainly be a lot of those...(that's why I've estimated
>  about 1/3 to be applicable)
>  but filtering out even this is an awful lot of work (or it might involve
>  writing a "bot")...
>  if it's important enough the contributor could reopen / rebase the patch.
>  We could try to communicate the non-hostile intention in the message
>  placed by the bot.
>  The current message the stale PRs would get is:
>  "This pull request has been automatically marked as stale because it
> has
>  not had recent activity. It will be closed if no further activity
> >> occurs."
> 
> > Also, I have submitted my first PR to test out the new system.  It
> > has passed tests.  Ashutoshc has generously provided a +1.  What's
> the
> > next step to get it merged into the master?  Do I download the patch
> >> from
> > Github and apply manually using my Apache credentials?  Is the
> "merge"
> > feature setup in Github?  As I understand it, GitHub is only
> mirroring
>  the
> > Apache git system.  Whatever the process we need an update in the
> > HowToContribute docs.
> 
>  That's an interesting question; the github repo is linked to the
> apache
>  repo - so you may push/merge/whatever on the github interface; it will
> >> work.
>  Github supports 3 modes to merge PRs:
>  * We should definitely disable the "merge" option as that will just
>  create an international railway station from our history :)

Re: Open old PRs

2020-06-04 Thread Zoltan Haindrich

Hey Stamatis!

Yes, right now we should put the contributor's name into the commit message/etc -
but in case we are using PRs that info is also present on the PR (or we could
change the PR label to conform to our rules)

Thanks for the reference - I've just seen that there is also a "bot" to possibly
avoid the issues squashing might cause :D
https://github.com/marketplace/actions/autosquash

cheers,
Zoltan

On 6/3/20 2:00 AM, Stamatis Zampetakis wrote:

Hello,

I am very happy working with the new system. Many thanks Zoltan!

I find the bot a good idea and I think it's worth trying out.
One thing to watch out for is the case where contributors are willing to push
their work forward but there are no available reviewers to look at each
case.
I think people will reply to the bot once or twice but I don't think they
will do it much longer, so we could take this into account for the
configuration of the bot.

Regarding the squash merge option there might be a small caveat. I don't know
if it is possible to retain the information about the person who performed
the merge.
According to the discussion in [1] it seems that the committer in this case
will appear to be the GitHub account.
This might not be a big problem for Hive since the reviewer's name is part
of the commit message, so the credit and responsibility are not lost.

Best,
Stamatis

[1] https://github.com/isaacs/github/issues/1303



On Tue, Jun 2, 2020 at 9:26 PM Zoltan Haindrich  wrote:




On 6/2/20 9:15 PM, David Mollitor wrote:

I use a personal account for GitHub and it's not synced with my official
Apache account.  How do I go about registering my Apache account with
GitHub so I can merge through their interface?


IIRC I've linked my account by using this interface:
https://gitbox.apache.org/setup/



In the meanwhile, can you assist with a merge here? :)



sure; I think you should also add dmolli...@apache.org as a secondary
email to your github account

About the open pr stuff: I still think our best approach of handling those
things would be to close most of those 400 or so PRs...easiest would be to
install the bot (at least temporarily)
https://issues.apache.org/jira/browse/HIVE-23590
what do you think?

cheers,
Zoltan



https://github.com/apache/hive/pull/1045

Thanks!

On Tue, Jun 2, 2020 at 10:21 AM Zoltan Haindrich  wrote:




On 6/2/20 3:10 PM, David Mollitor wrote:

I think we might want to take one manual pass across the board.  It will
most likely take more than 7 days to get through them all, so it may be
closing things that are legitimate.


yeah...a manual pass would be good; I went thru around 10 or so before
I wrote the first mail in this thread...
and I definitely don't want to go thru 400 - so I would prefer the bot :D




One low hanging fruit (that applied to one of my PRs).  The JIRA it was
associated with was already closed.  Is there a way to target those?


yes; there might certainly be a lot of those...(that's why I've estimated
1/3 to be applicable)
but filtering out even this is an awful lot of work (or it might involve
writing a "bot")...
if it's important enough the contributor could reopen / rebase the patch.
We could try to communicate the non-hostile intention in the message
placed by the bot.
The current message that stale PRs would get is:
"This pull request has been automatically marked as stale because it has
not had recent activity. It will be closed if no further activity occurs."



Also, I have submitted my first PR to test out the new system.  It
has passed tests.  Ashutoshc has generously provided a +1.  What's the
next step to get it merged into the master?  Do I download the patch from
Github and apply manually using my Apache credentials?  Is the "merge"
feature set up in Github?  As I understand it, GitHub is only mirroring the
Apache git system.  Whatever the process we need an update in the
HowToContribute docs.


That's an interesting question; the github repo is linked to the apache
repo - so you may push/merge/whatever on the github interface; it will work.

Github supports 3 modes to merge PRs:
* We should definitely disable the "merge" option as that will just create
an international railway station from our history :)
* rebase doesn't make it easier for reviewers to keep track of new
changes...because the PR owner has to continuously force push the branch
* squash merge works great - and I remembered that it changes the author to
the user pushing the "squash" button; however right now it seems that it
changes the author to the "user who opened the pr" which looks
good-enough for me!
(I've added the necessary .asf.yaml changes to the existing PR)

cheers,
Zoltan


https://github.com/apache/hive/pull/1045




https://cwiki.apache.org/confluence/display/Hive/HowToContribute#HowToContribute-ApplyingaPatch



Thanks!

On Tue, Jun 2, 2020 at 4:58 AM Zoltan Haindrich  wrote:


I think to use "probot" we would need to ask infra to configure the
"probot" github app.
It seems to me that the 

[jira] [Created] (HIVE-23613) Cleanup FindBugs

2020-06-04 Thread Jira
László Bodor created HIVE-23613:
---

 Summary: Cleanup FindBugs
 Key: HIVE-23613
 URL: https://issues.apache.org/jira/browse/HIVE-23613
 Project: Hive
  Issue Type: Bug
Reporter: László Bodor








[jira] [Created] (HIVE-23612) Option for HiveStrictManagedMigration to impersonate a user for FS operations

2020-06-04 Thread Jira
Ádám Szita created HIVE-23612:
-

 Summary: Option for HiveStrictManagedMigration to impersonate a 
user for FS operations
 Key: HIVE-23612
 URL: https://issues.apache.org/jira/browse/HIVE-23612
 Project: Hive
  Issue Type: Improvement
Reporter: Ádám Szita
Assignee: Ádám Szita


The HiveStrictManagedMigration tool can be used to move HDFS paths and to change
ownership on said paths. It may be beneficial to do such file system operations
as a different user than the one the tool itself is run as.





[jira] [Created] (HIVE-23611) Mandate fully qualified absolute path for external table base dir during REPL operation

2020-06-04 Thread Pravin Sinha (Jira)
Pravin Sinha created HIVE-23611:
---

 Summary: Mandate fully qualified absolute path for external 
table base dir during REPL operation
 Key: HIVE-23611
 URL: https://issues.apache.org/jira/browse/HIVE-23611
 Project: Hive
  Issue Type: Improvement
Reporter: Pravin Sinha
Assignee: Pravin Sinha








[jira] [Created] (HIVE-23610) HiveQueryResultSet#next() throw read timeout when using hikari for rdbms

2020-06-04 Thread Chen LT (Jira)
Chen LT created HIVE-23610:
--

 Summary: HiveQueryResultSet#next() throw read timeout when using 
hikari for rdbms
 Key: HIVE-23610
 URL: https://issues.apache.org/jira/browse/HIVE-23610
 Project: Hive
  Issue Type: Bug
  Components: JDBC
Affects Versions: 2.1.1
 Environment: Hadoop-2.7.3

Hive-2.1.1

JDK-1.8.0

HikariCP-3.2.0
Reporter: Chen LT


For example, when the Hikari connection pool is used for MySQL, it sets
DriverManager.loginTimeout to 30 seconds by default. HiveConnection uses
DriverManager.loginTimeout to create the underlying transport
(TSocket.socketTimeout), so HiveQueryResultSet#next() throws a read timeout
exception when the call blocks for longer than 30 seconds:
{code:java}
Caused by: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
 at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
 at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
 at org.apache.thrift.transport.TSaslTransport.readLength(TSaslTransport.java:376)
 at org.apache.thrift.transport.TSaslTransport.readFrame(TSaslTransport.java:453)
 at org.apache.thrift.transport.TSaslTransport.read(TSaslTransport.java:435)
 at org.apache.thrift.transport.TSaslClientTransport.read(TSaslClientTransport.java:37)
 at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
 at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
 at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
 at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
 at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77)
 at org.apache.hive.service.rpc.thrift.TCLIService$Client.recv_FetchResults(TCLIService.java:559)
 at org.apache.hive.service.rpc.thrift.TCLIService$Client.FetchResults(TCLIService.java:546)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498)
 at org.apache.hive.jdbc.HiveConnection$SynchronizedHandler.invoke(HiveConnection.java:1412)
 at com.sun.proxy.$Proxy125.FetchResults(Unknown Source)
 at org.apache.hive.jdbc.HiveQueryResultSet.next(HiveQueryResultSet.java:372)
...{code}
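
The interaction described above can be illustrated with a minimal sketch. It is not Hive's or HikariCP's actual code: the class LoginTimeoutSketch and its two helpers are hypothetical, and only the java.sql.DriverManager calls are real JDK API. The point is that the login timeout is a single JVM-wide value, so a setting made on behalf of one driver is visible to any other driver that later derives a socket timeout from it.

```java
import java.sql.DriverManager;
import java.util.concurrent.TimeUnit;

public class LoginTimeoutSketch {

    // Hypothetical stand-in for a connection pool that sets the JVM-wide
    // JDBC login timeout while configuring an unrelated (e.g. MySQL) data source.
    public static void configurePool(int loginTimeoutSeconds) {
        DriverManager.setLoginTimeout(loginTimeoutSeconds);
    }

    // Hypothetical stand-in for a driver that reuses the global login timeout
    // as its socket read timeout, converted from seconds to milliseconds.
    public static int deriveSocketTimeoutMillis() {
        long millis = TimeUnit.SECONDS.toMillis(DriverManager.getLoginTimeout());
        return (int) Math.min(millis, Integer.MAX_VALUE);
    }

    public static void main(String[] args) {
        configurePool(30);
        // Any read blocked longer than this would time out on such a socket.
        System.out.println(deriveSocketTimeoutMillis()); // prints 30000
    }
}
```

A value of 0 from DriverManager.getLoginTimeout() conventionally means no limit, so in this sketch the derived socket timeout would be 0 as well until some component sets the global value.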
 





[jira] [Created] (HIVE-23609) SemiJoin: Relax big table size check for self-joins

2020-06-04 Thread Gopal Vijayaraghavan (Jira)
Gopal Vijayaraghavan created HIVE-23609:
---

 Summary: SemiJoin: Relax big table size check for self-joins
 Key: HIVE-23609
 URL: https://issues.apache.org/jira/browse/HIVE-23609
 Project: Hive
  Issue Type: Improvement
Reporter: Gopal Vijayaraghavan


For self-joins, several other heuristics applied to Semijoins don't apply as 
the difference between rows on either side is likely to result in an actual 
reduction of rows scanned.

This change results in slightly different Tez priorities for self-joins which 
are heavily filtered on one side over the other, which helps ensure the smaller 
table is completed before the bigger table consumes resources.





[jira] [Created] (HIVE-23608) Change an FS#exists call to FS#isFile call in AcidUtils

2020-06-04 Thread Karen Coppage (Jira)
Karen Coppage created HIVE-23608:


 Summary: Change an FS#exists call to FS#isFile call in AcidUtils
 Key: HIVE-23608
 URL: https://issues.apache.org/jira/browse/HIVE-23608
 Project: Hive
  Issue Type: Improvement
Reporter: Karen Coppage
Assignee: Karen Coppage


Currently S3AFileSystem#isFile and S3AFileSystem#exists have the same 
implementation. HADOOP-13230 will optimize S3AFileSystem#isFile by only doing a 
HEAD request for the file; no need for a LIST probe for a directory (isDir will 
do that). S3AFileSystem#exists will still need both.

This and HIVE-23533 will get rid of the last exists() calls in AcidUtils.
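
Swapping exists() for isFile() is only safe where the path is expected to be a regular file, because the two calls disagree on directories. A minimal sketch of that difference follows, using java.nio.file on the local file system as a stand-in. It does not use the Hadoop FileSystem or S3AFileSystem API, and the class and helper names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class ExistsVsIsFileSketch {

    // Analogous to an exists() probe: true for files and directories alike.
    public static boolean existsCheck(Path p) {
        return Files.exists(p);
    }

    // Analogous to an isFile() probe: true only for regular files.
    public static boolean isFileCheck(Path p) {
        return Files.isRegularFile(p);
    }

    // Hypothetical helper: creates a scratch directory for the demonstration.
    public static Path newTempDir() {
        try {
            return Files.createTempDirectory("acid-sketch");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical helper: creates an empty regular file inside dir.
    public static Path newFileIn(Path dir) {
        try {
            return Files.createTempFile(dir, "side", ".file");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path dir = newTempDir();
        Path file = newFileIn(dir);
        System.out.println(existsCheck(dir) + " " + isFileCheck(dir));   // prints "true false"
        System.out.println(existsCheck(file) + " " + isFileCheck(file)); // prints "true true"
    }
}
```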


