[jira] [Created] (HIVE-23608) Change an FS#exists call to FS#isFile call in AcidUtils

2020-06-03 Thread Karen Coppage (Jira)
Karen Coppage created HIVE-23608:


 Summary: Change an FS#exists call to FS#isFile call in AcidUtils
 Key: HIVE-23608
 URL: https://issues.apache.org/jira/browse/HIVE-23608
 Project: Hive
  Issue Type: Improvement
Reporter: Karen Coppage
Assignee: Karen Coppage


Currently S3AFileSystem#isFile and S3AFileSystem#exists have the same 
implementation. HADOOP-13230 will optimize S3AFileSystem#isFile by only doing a 
HEAD request for the file; no need for a LIST probe for a directory (isDir will 
do that). S3AFileSystem#exists will still need both.

This and HIVE-23533 will get rid of the last exists() calls in AcidUtils.
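
The difference in request counts described above can be illustrated with a toy model. This is only a sketch and not S3AFileSystem code: the class `ProbeModel`, its counters, and the HEAD-then-LIST ordering are assumptions made for illustration.

```java
import java.util.Set;

/** Toy model of an object store client that counts HEAD and LIST requests. */
final class ProbeModel {
  int headCalls;
  int listCalls;
  private final Set<String> files;
  private final Set<String> dirs;

  ProbeModel(Set<String> files, Set<String> dirs) {
    this.files = files;
    this.dirs = dirs;
  }

  /** Stand-in for a HEAD request on an object key. */
  private boolean head(String path) {
    headCalls++;
    return files.contains(path);
  }

  /** Stand-in for a LIST request that detects a directory prefix. */
  private boolean list(String path) {
    listCalls++;
    return dirs.contains(path);
  }

  /** exists(): a file or a directory counts, so a failed HEAD needs a LIST. */
  boolean exists(String path) {
    return head(path) || list(path);
  }

  /** isFile(): only a file counts, so the HEAD alone answers the question. */
  boolean isFile(String path) {
    return head(path);
  }
}
```

In this model, isFile never issues a LIST, while exists issues one whenever the HEAD misses.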



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-23607) Permission Issue: Create view on another view succeeds but alter view fails

2020-06-03 Thread Naresh P R (Jira)
Naresh P R created HIVE-23607:
-

 Summary: Permission Issue: Create view on another view succeeds 
but alter view fails  
 Key: HIVE-23607
 URL: https://issues.apache.org/jira/browse/HIVE-23607
 Project: Hive
  Issue Type: Bug
Reporter: Naresh P R
Assignee: Naresh P R


create table test_table (id int);
create view test_view as select * from test_table;

 
{code:java}
-- user "naresh" has read access on test_view
-- Create view succeeds
create view test_view_1 as select * from test_view;
-- Alter view fails
alter view test_view_1 as select * from test_view
Error: Error while compiling statement: FAILED: HiveAccessControlException 
Permission denied: user [naresh] does not have [SELECT] privilege on 
[test/test_table] (state=42000,code=4)
{code}
 





[jira] [Created] (HIVE-23606) LLAP: Delay In DirectByteBuffer Clean Up For EncodedReaderImpl

2020-06-03 Thread Syed Shameerur Rahman (Jira)
Syed Shameerur Rahman created HIVE-23606:


 Summary: LLAP: Delay In DirectByteBuffer Clean Up For 
EncodedReaderImpl
 Key: HIVE-23606
 URL: https://issues.apache.org/jira/browse/HIVE-23606
 Project: Hive
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Syed Shameerur Rahman
Assignee: Syed Shameerur Rahman
 Fix For: 4.0.0


DirectByteBuffers are cleaned up only during a full GC or when the cleaner 
method of DirectByteBuffer is invoked manually. Since a full GC may take some 
time to kick in, the native memory usage of the LLAP daemon process might shoot 
up in the meantime, which forces the YARN pmem monitor to kill the container 
running the daemon.

HIVE-16180 tried to solve this problem, but the code structure got messed up 
after HIVE-15665.

The IdentityHashMap (toRelease) is initialized in 
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedReaderImpl.java#L409
 , but it is re-initialized inside the method getDataFromCacheAndDisk() 
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedReaderImpl.java#L633
 , which makes it local to that method. As a result, the original toRelease 
IdentityHashMap remains empty.
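
One way this kind of bug can arise is sketched below. It is not the EncodedReaderImpl code: `ToReleaseDemo`, `trackBuggy`, and `trackFixed` are hypothetical names, and the sketch assumes the map is handed to the method as a parameter. Java passes references by value, so assigning a new map inside the method leaves the caller's map untouched.

```java
import java.nio.ByteBuffer;
import java.util.IdentityHashMap;
import java.util.Map;

final class ToReleaseDemo {
  /** Buggy variant: the reassignment creates a map only this method can see. */
  static void trackBuggy(Map<ByteBuffer, Boolean> toRelease, ByteBuffer buf) {
    toRelease = new IdentityHashMap<>(); // the caller's map is no longer used
    toRelease.put(buf, true);
  }

  /** Fixed variant: records the buffer in the map the caller passed in. */
  static void trackFixed(Map<ByteBuffer, Boolean> toRelease, ByteBuffer buf) {
    toRelease.put(buf, true);
  }

  public static void main(String[] args) {
    Map<ByteBuffer, Boolean> toRelease = new IdentityHashMap<>();
    ByteBuffer buf = ByteBuffer.allocate(16);
    trackBuggy(toRelease, buf);
    System.out.println("after buggy call: " + toRelease.size());
    trackFixed(toRelease, buf);
    System.out.println("after fixed call: " + toRelease.size());
  }
}
```

The buggy call leaves the caller's map empty, so nothing is available to release later; the fixed call records the buffer.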





Re: Time to Remove Hive-on-Spark

2020-06-03 Thread Gopal V



+1

Cheers,
Gopal

On 6/3/20 7:48 PM, Jesus Camacho Rodriguez wrote:

+1

-Jesús

On Wed, Jun 3, 2020 at 1:58 PM Alan Gates  wrote:


+1.

Alan.

On Wed, Jun 3, 2020 at 1:40 PM Prasanth Jayachandran
 wrote:


+1


On Jun 3, 2020, at 1:38 PM, Ashutosh Chauhan wrote:


+1

On Wed, Jun 3, 2020 at 1:23 PM David Mollitor wrote:



Hello Gang,

I have spent some time working on upgrading Avro (far less than others):

https://issues.apache.org/jira/browse/HIVE-21737

This should be a relatively easy thing to do, but is blocked by
Hive-on-Spark.  HoS has a weird thing where it downloads some
cloud-storage-hosted file of Spark-Hadoop as part of its maven run.

Since HoS is not going to receive updates from the major vendors, is it
time to simply remove it?

Tests are currently disabled:
https://issues.apache.org/jira/browse/HIVE-23137

Thanks.










Re: Time to Remove Hive-on-Spark

2020-06-03 Thread Jesus Camacho Rodriguez
+1

-Jesús

On Wed, Jun 3, 2020 at 1:58 PM Alan Gates  wrote:

> +1.
>
> Alan.
>
> On Wed, Jun 3, 2020 at 1:40 PM Prasanth Jayachandran
>  wrote:
>
> > +1
> >
> > > On Jun 3, 2020, at 1:38 PM, Ashutosh Chauhan 
> > wrote:
> > >
> > > +1
> > >
> > > On Wed, Jun 3, 2020 at 1:23 PM David Mollitor 
> wrote:
> > >
> > >> Hello Gang,
> > >>
> > >> I have spent some time working on upgrading Avro (far less than
> others):
> > >>
> > >> https://issues.apache.org/jira/browse/HIVE-21737
> > >>
> > >> This should be a relatively easy thing to do, but is blocked by
> > >> Hive-on-Spark.  HoS has a weird thing where it downloads some
> > >> cloud-storage-hosted file of Spark-Hadoop as part of its maven run.
> > >>
> > >> Since HoS is not going to receive updates from the major vendors, is
> it
> > >> time to simply remove it?
> > >>
> > >> Tests are currently disabled:
> > >> https://issues.apache.org/jira/browse/HIVE-23137
> > >>
> > >> Thanks.
> > >>
> >
> >
>


[jira] [Created] (HIVE-23605) Wrong FS error during _external_tables_info creation when staging location is remote

2020-06-03 Thread Pravin Sinha (Jira)
Pravin Sinha created HIVE-23605:
---

 Summary: Wrong FS error during _external_tables_info creation when 
staging location is remote
 Key: HIVE-23605
 URL: https://issues.apache.org/jira/browse/HIVE-23605
 Project: Hive
  Issue Type: Bug
Reporter: Pravin Sinha
Assignee: Pravin Sinha


When staging location is on target cluster, Repl Dump fails to create 
_external_tables_info file.





Re: Time to Remove Hive-on-Spark

2020-06-03 Thread Alan Gates
+1.

Alan.

On Wed, Jun 3, 2020 at 1:40 PM Prasanth Jayachandran
 wrote:

> +1
>
> > On Jun 3, 2020, at 1:38 PM, Ashutosh Chauhan 
> wrote:
> >
> > +1
> >
> > On Wed, Jun 3, 2020 at 1:23 PM David Mollitor  wrote:
> >
> >> Hello Gang,
> >>
> >> I have spent some time working on upgrading Avro (far less than others):
> >>
> >> https://issues.apache.org/jira/browse/HIVE-21737
> >>
> >> This should be a relatively easy thing to do, but is blocked by
> >> Hive-on-Spark.  HoS has a weird thing where it downloads some
> >> cloud-storage-hosted file of Spark-Hadoop as part of its maven run.
> >>
> >> Since HoS is not going to receive updates from the major vendors, is it
> >> time to simply remove it?
> >>
> >> Tests are currently disabled:
> >> https://issues.apache.org/jira/browse/HIVE-23137
> >>
> >> Thanks.
> >>
>
>


Re: Time to Remove Hive-on-Spark

2020-06-03 Thread Prasanth Jayachandran
+1

> On Jun 3, 2020, at 1:38 PM, Ashutosh Chauhan  wrote:
> 
> +1
> 
> On Wed, Jun 3, 2020 at 1:23 PM David Mollitor  wrote:
> 
>> Hello Gang,
>> 
>> I have spent some time working on upgrading Avro (far less than others):
>> 
>> https://issues.apache.org/jira/browse/HIVE-21737
>> 
>> This should be a relatively easy thing to do, but is blocked by
>> Hive-on-Spark.  HoS has a weird thing where it downloads some
>> cloud-storage-hosted file of Spark-Hadoop as part of its maven run.
>> 
>> Since HoS is not going to receive updates from the major vendors, is it
>> time to simply remove it?
>> 
>> Tests are currently disabled:
>> https://issues.apache.org/jira/browse/HIVE-23137
>> 
>> Thanks.
>> 



Re: Time to Remove Hive-on-Spark

2020-06-03 Thread Ashutosh Chauhan
+1

On Wed, Jun 3, 2020 at 1:23 PM David Mollitor  wrote:

> Hello Gang,
>
> I have spent some time working on upgrading Avro (far less than others):
>
> https://issues.apache.org/jira/browse/HIVE-21737
>
> This should be a relatively easy thing to do, but is blocked by
> Hive-on-Spark.  HoS has a weird thing where it downloads some
> cloud-storage-hosted file of Spark-Hadoop as part of its maven run.
>
> Since HoS is not going to receive updates from the major vendors, is it
> time to simply remove it?
>
> Tests are currently disabled:
> https://issues.apache.org/jira/browse/HIVE-23137
>
> Thanks.
>


Time to Remove Hive-on-Spark

2020-06-03 Thread David Mollitor
Hello Gang,

I have spent some time working on upgrading Avro (far less than others):

https://issues.apache.org/jira/browse/HIVE-21737

This should be a relatively easy thing to do, but is blocked by
Hive-on-Spark.  HoS has a weird thing where it downloads some
cloud-storage-hosted file of Spark-Hadoop as part of its maven run.

Since HoS is not going to receive updates from the major vendors, is it
time to simply remove it?

Tests are currently disabled:
https://issues.apache.org/jira/browse/HIVE-23137

Thanks.


[jira] [Created] (HIVE-23604) LLAP does not have correct version of guava after HIVE-22126

2020-06-03 Thread Yuanhao Lu (Jira)
Yuanhao Lu created HIVE-23604:
-

 Summary: LLAP does not have correct version of guava after 
HIVE-22126
 Key: HIVE-23604
 URL: https://issues.apache.org/jira/browse/HIVE-23604
 Project: Hive
  Issue Type: Bug
  Components: llap
Reporter: Yuanhao Lu


This JIRA https://issues.apache.org/jira/browse/HIVE-22126 shaded guava in 
hive-exec. The issue is that LLAP also uses this guava, so after shading it 
can no longer use the hive-exec guava and may fall back to 
`./tez/guava-11.0.2.jar`, which causes the following error:
{code:java}
Status: Running (Executing on YARN cluster with App id 
application_1591081923777_0005)
 
Map 1: -/-  Reducer 2: 0/1  
Map 1: 0/11 Reducer 2: 0/1  
Map 1: 0(+11,-11)/11 Reducer 2: 0/1  
Map 1: 0(+0,-32)/11 Reducer 2: 0/1  
Status: Failed
Vertex failed, vertexName=Map 1, vertexId=vertex_1591081923777_0005_1_00, 
diagnostics=[Task failed, taskId=task_1591081923777_0005_1_00_02, 
diagnostics=[TaskAttempt 0 failed, 
info=[org.apache.hadoop.ipc.RemoteException(java.lang.NoSuchMethodError): 
com.google.common.base.Stopwatch.createUnstarted()Lcom/google/common/base/Stopwatch;
at 
org.apache.hadoop.hive.llap.daemon.impl.TaskRunnerCallable.<init>(TaskRunnerCallable.java:122)
at 
org.apache.hadoop.hive.llap.daemon.impl.ContainerRunnerImpl.submitWork(ContainerRunnerImpl.java:274)
at 
org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon.submitWork(LlapDaemon.java:558)
at 
org.apache.hadoop.hive.llap.daemon.impl.LlapProtocolServerImpl.submitWork(LlapProtocolServerImpl.java:104)
at 
org.apache.hadoop.hive.llap.daemon.rpc.LlapDaemonProtocolProtos$LlapDaemonProtocol$2.callBlockingMethod(LlapDaemonProtocolProtos.java:19020)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:999)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:927)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2915)
{code}
This can be solved by adding `com.google.common.base.Stopwatch.class` to this 
section 
[https://github.com/apache/hive/blob/release-3.1.2-rc0/llap-server/src/java/org/apache/hadoop/hive/llap/cli/LlapServiceDriver.java#L385-L409]
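
To confirm which jar actually supplies a class at runtime, a small diagnostic along the following lines can help. It is a generic sketch using only standard JDK calls; the class name `WhichJar` and the marker string are made up for illustration, and it would need to run with the same classpath as the LLAP daemon to be meaningful.

```java
import java.security.CodeSource;

final class WhichJar {
  /** Returns where a class was loaded from, or a marker for JDK classes. */
  static String locationOf(Class<?> cls) {
    CodeSource src = cls.getProtectionDomain().getCodeSource();
    if (src == null || src.getLocation() == null) {
      return "<bootstrap or unknown>";
    }
    return src.getLocation().toString();
  }

  public static void main(String[] args) throws ClassNotFoundException {
    // Defaults to the class named in the stack trace; guava must be on the classpath.
    String name = args.length > 0 ? args[0] : "com.google.common.base.Stopwatch";
    System.out.println(name + " -> " + locationOf(Class.forName(name)));
  }
}
```

If the printed location points at an old guava jar, the NoSuchMethodError for Stopwatch.createUnstarted() is the expected symptom.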





[jira] [Created] (HIVE-23603) transformDatabase() should work with changes from HIVE-22995

2020-06-03 Thread Naveen Gangam (Jira)
Naveen Gangam created HIVE-23603:


 Summary: transformDatabase() should work with changes from 
HIVE-22995
 Key: HIVE-23603
 URL: https://issues.apache.org/jira/browse/HIVE-23603
 Project: Hive
  Issue Type: Sub-task
  Components: Hive
Reporter: Naveen Gangam
Assignee: Naveen Gangam
 Fix For: 4.0.0


The translation layer alters the locationUri on Database based on the 
capabilities of the client. Now that a database has separate managed and 
external locations, the implementation should be adjusted to work with both 
locations. locationUri could already be the external location.





[jira] [Created] (HIVE-23602) Use Java Concurrent Package in Operation Handle Set

2020-06-03 Thread David Mollitor (Jira)
David Mollitor created HIVE-23602:
-

 Summary: Use Java Concurrent Package in Operation Handle Set
 Key: HIVE-23602
 URL: https://issues.apache.org/jira/browse/HIVE-23602
 Project: Hive
  Issue Type: Bug
Reporter: David Mollitor
Assignee: David Mollitor








[jira] [Created] (HIVE-23601) Hive Statement Does Not Clear Statement Handle on Error

2020-06-03 Thread David Mollitor (Jira)
David Mollitor created HIVE-23601:
-

 Summary: Hive Statement Does Not Clear Statement Handle on Error
 Key: HIVE-23601
 URL: https://issues.apache.org/jira/browse/HIVE-23601
 Project: Hive
  Issue Type: Bug
Reporter: David Mollitor
Assignee: David Mollitor


{code:java}
  private void closeStatementIfNeeded() throws SQLException {
    try {
      if (stmtHandle != null) {
        TCloseOperationReq closeReq = new TCloseOperationReq(stmtHandle);
        TCloseOperationResp closeResp = client.CloseOperation(closeReq);
        Utils.verifySuccessWithInfo(closeResp.getStatus());
        stmtHandle = null;
      }
    } catch (SQLException e) {
      throw e;
    } catch (Exception e) {
      throw new SQLException("Failed to close statement", "08S01", e);
    }
  }

  void closeClientOperation() throws SQLException {
    closeStatementIfNeeded();
    isQueryClosed = true;
    stmtHandle = null;
  }
{code}
{{verifySuccessWithInfo}} throws an {{Exception}} if it finds an error code and 
therefore skips both places where the statement handle is set to null. This is 
probably not what was intended, since the original author(s) tried twice to 
null it out.
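
One possible way to guarantee that the handle is cleared is a try/finally. The sketch below is not the HiveStatement code and not necessarily the fix that was committed: `HandleHolder`, `closeRemote`, and the String handle are hypothetical stand-ins for the statement, the Thrift close call plus status check, and stmtHandle.

```java
import java.sql.SQLException;

/** Hypothetical stand-in for a statement that owns a server-side handle. */
final class HandleHolder {
  private String stmtHandle;          // stand-in for the operation handle
  private final boolean failOnClose;  // simulates an error status from the server

  HandleHolder(String handle, boolean failOnClose) {
    this.stmtHandle = handle;
    this.failOnClose = failOnClose;
  }

  /** Stand-in for the close request followed by status verification. */
  private void closeRemote(String handle) throws SQLException {
    if (failOnClose) {
      throw new SQLException("Failed to close statement", "08S01");
    }
  }

  /** Clears the handle whether or not the remote close succeeds. */
  void closeStatementIfNeeded() throws SQLException {
    if (stmtHandle == null) {
      return;
    }
    try {
      closeRemote(stmtHandle);
    } finally {
      stmtHandle = null; // runs on both the normal and the exceptional path
    }
  }

  boolean hasHandle() {
    return stmtHandle != null;
  }
}
```

With this shape the exception still reaches the caller, but a later close attempt does not reuse a handle that the server has already rejected.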





Re: Open old PRs

2020-06-03 Thread David Mollitor
Hmm, OK, working through this PR:

https://github.com/apache/hive/pull/1052

I'm trying to run the tests again and hopefully commit.  Github has me
listed as a "contributor" and lists Zoltan as a "Member of The Apache
Software Foundation".  Do you know how that list of members is managed?

https://github.com/orgs/apache/people?query=

 Thanks.

On Wed, Jun 3, 2020 at 9:08 AM David Mollitor  wrote:

> Hello Zoltan,
>
> Regarding auto-close in JIRA.  Take a look at Apache Avro project.  I've
> contributed there a little bit and I think they have this capability.
>
> Thanks.
>
> On Tue, Jun 2, 2020 at 8:00 PM Stamatis Zampetakis 
> wrote:
>
>> Hello,
>>
>> I am very happy working with the new system. Many thanks Zoltan!
>>
>> I find the bot a good idea and I think it's worth trying it out.
>> One thing to watch out is the case where contributors are willing to push
>> their work forward but there are no available reviewers to look to each
>> case.
>> I think people will reply to the bot once or twice but I don't think they
>> will do it much longer so we could take this into account for the
>> configuration of the bot.
>>
>> Regarding merge squash option there might be a small caveat. I don't know
>> if it is possible to retain the information about the person who performed
>> the merge.
>> According to the discussion in [1] it seems that the committer in this
>> case
>> will appear to be the GitHub account.
>> This might not be a big problem for Hive since the reviewer's name is part
>> of the commit message so the credit and responsibility is not lost.
>>
>> Best,
>> Stamatis
>>
>> [1] https://github.com/isaacs/github/issues/1303
>>
>>
>>
>> On Tue, Jun 2, 2020 at 9:26 PM Zoltan Haindrich  wrote:
>>
>> >
>> >
>> > On 6/2/20 9:15 PM, David Mollitor wrote:
>> > > I use a personal account for GitHub and it's not synced with my
>> official
>> > > Apache account.  How do I go about registering my Apache account with
>> > > GitHub so I can merge through their interface?
>> >
>> > IIRC I've linked my account by using this interface:
>> > https://gitbox.apache.org/setup/
>> >
>> > >
>> > > In the meanwhile, can you assist with a merge here? :)
>> > >
>> >
>> > sure; I think you should also add dmolli...@apache.org as a secondary
>> > email to your github account
>> >
>> > About the open pr stuff: I still think our best approach of handling
>> those
>> > things would be to close most of that 400 or so PRs...easiest would be
>> to
>> > install the bot (at
>> > least temporarily)
>> > https://issues.apache.org/jira/browse/HIVE-23590
>> > what do you think?
>> >
>> > cheers,
>> > Zoltan
>> >
>> >
>> > > https://github.com/apache/hive/pull/1045
>> > >
>> > > Thanks!
>> > >
>> > > On Tue, Jun 2, 2020 at 10:21 AM Zoltan Haindrich  wrote:
>> > >
>> > >>
>> > >>
>> > >> On 6/2/20 3:10 PM, David Mollitor wrote:
>> > >>> I think we might want to take one manual pass across the board.  It
>> > will
>> > >>> most likely take more than 7 days to get through them all, so it
>> may be
>> > >>> closing things that are legitimate.
>> > >>
>> > >> yeah...a manual pass would be good; I went thru around 10 or so
>> before
>> > >> I wrote the first mail in this thread...
>> > >> and I definitely don't want to go thru 400 - so I would prefer the
>> bot
>> > :D
>> > >>
>> > >>>
>> > >>> One low hanging fruit (that applied to one of my PRs).  The JIRA it
>> was
>> > >>> associated with was already closed.  Is there a way to target those?
>> > >>
>> > >> yes; there might be certainly a lot of those...(that's why I've
>> estimate
>> > >> to 1/3 to be applicable)
>> > >> but filtering out even this is an awful lot of work (or it might
>> involve
>> > >> writing a "bot")...
>> > >> if it's important enough the contributor could reopen / rebase the
>> > patch.
>> > >> We could try to communicate the non-hostile intention in the message
>> > >> placed by the bot.
>> > >> The current message the stale PRs would get is:
>> > >> "This pull request has been automatically marked as stale because it
>> has
>> > >> not had recent activity. It will be closed if no further activity
>> > occurs."
>> > >>
>> > >>> Also, I have submitted my first PR to test out the new system.  It
>> > >>> has passed tests.  Ashutoshc has generously provided a +1.  What's
>> the
>> > >>> next step to get it merged into the master?  Do I download the patch
>> > from
>> > >>> Github and apply manually using my Apache credentials?  Is the
>> "merge"
>> > >>> feature setup in Github?  As I understand it, GitHub is only
>> mirroring
>> > >> the
>> > >>> Apache git system.  Whatever the process we need an update in the
>> > >>> HowToContribute docs.
>> > >>
>> > >> That's an interesting question; the github repo is linked to the
>> apache
>> > >> repo - so you may push/merge/whatever on the github interface; it
>> will
>> > work.
>> > >> Github supports 3 modes to merge PRs:
>> > >> * We should definitely disable the "merge" option as that will just
>> > create
>> > >> a int

Re: Open old PRs

2020-06-03 Thread David Mollitor
Hello Zoltan,

Regarding auto-close in JIRA.  Take a look at Apache Avro project.  I've
contributed there a little bit and I think they have this capability.

Thanks.

On Tue, Jun 2, 2020 at 8:00 PM Stamatis Zampetakis 
wrote:

> Hello,
>
> I am very happy working with the new system. Many thanks Zoltan!
>
> I find the bot a good idea and I think it's worth trying it out.
> One thing to watch out is the case where contributors are willing to push
> their work forward but there are no available reviewers to look to each
> case.
> I think people will reply to the bot once or twice but I don't think they
> will do it much longer so we could take this into account for the
> configuration of the bot.
>
> Regarding merge squash option there might be a small caveat. I don't know
> if it is possible to retain the information about the person who performed
> the merge.
> According to the discussion in [1] it seems that the committer in this case
> will appear to be the GitHub account.
> This might not be a big problem for Hive since the reviewer's name is part
> of the commit message so the credit and responsibility is not lost.
>
> Best,
> Stamatis
>
> [1] https://github.com/isaacs/github/issues/1303
>
>
>
> On Tue, Jun 2, 2020 at 9:26 PM Zoltan Haindrich  wrote:
>
> >
> >
> > On 6/2/20 9:15 PM, David Mollitor wrote:
> > > I use a personal account for GitHub and it's not synced with my
> official
> > > Apache account.  How do I go about registering my Apache account with
> > > GitHub so I can merge through their interface?
> >
> > IIRC I've linked my account by using this interface:
> > https://gitbox.apache.org/setup/
> >
> > >
> > > In the meanwhile, can you assist with a merge here? :)
> > >
> >
> > sure; I think you should also add dmolli...@apache.org as a secondary
> > email to your github account
> >
> > About the open pr stuff: I still think our best approach of handling
> those
> > things would be to close most of that 400 or so PRs...easiest would be to
> > install the bot (at
> > least temporarily)
> > https://issues.apache.org/jira/browse/HIVE-23590
> > what do you think?
> >
> > cheers,
> > Zoltan
> >
> >
> > > https://github.com/apache/hive/pull/1045
> > >
> > > Thanks!
> > >
> > > On Tue, Jun 2, 2020 at 10:21 AM Zoltan Haindrich  wrote:
> > >
> > >>
> > >>
> > >> On 6/2/20 3:10 PM, David Mollitor wrote:
> > >>> I think we might want to take one manual pass across the board.  It
> > will
> > >>> most likely take more than 7 days to get through them all, so it may
> be
> > >>> closing things that are legitimate.
> > >>
> > >> yeah...a manual pass would be good; I went thru around 10 or so before
> > >> I wrote the first mail in this thread...
> > >> and I definitely don't want to go thru 400 - so I would prefer the bot
> > :D
> > >>
> > >>>
> > >>> One low hanging fruit (that applied to one of my PRs).  The JIRA it
> was
> > >>> associated with was already closed.  Is there a way to target those?
> > >>
> > >> yes; there might be certainly a lot of those...(that's why I've
> estimate
> > >> to 1/3 to be applicable)
> > >> but filtering out even this is an awful lot of work (or it might
> involve
> > >> writing a "bot")...
> > >> if it's important enough the contributor could reopen / rebase the
> > patch.
> > >> We could try to communicate the non-hostile intention in the message
> > >> placed by the bot.
> > >> The current message the stale PRs would get is:
> > >> "This pull request has been automatically marked as stale because it
> has
> > >> not had recent activity. It will be closed if no further activity
> > occurs."
> > >>
> > >>> Also, I have submitted my first PR to test out the new system.  It
> > >>> has passed tests.  Ashutoshc has generously provided a +1.  What's
> the
> > >>> next step to get it merged into the master?  Do I download the patch
> > from
> > >>> Github and apply manually using my Apache credentials?  Is the
> "merge"
> > >>> feature setup in Github?  As I understand it, GitHub is only
> mirroring
> > >> the
> > >>> Apache git system.  Whatever the process we need an update in the
> > >>> HowToContribute docs.
> > >>
> > >> That's an interesting question; the github repo is linked to the
> apache
> > >> repo - so you may push/merge/whatever on the github interface; it will
> > work.
> > >> Github supports 3 modes to merge PRs:
> > >> * We should definitely disable the "merge" option as that will just
> > create
> > >> an international railway station from our history :)
> > >> * rebase doesn't make it easier for reviewers to keep track of new
> > >> changes...because the PR owner has to continuously force push the
> branch
> > >> * squash merge works great - and I remembered that it changes the
> author
> > to
> > >> the user pushing the "squash" button; however right now it seems that
> it
> > >> changes the author to
> > >> the "user who opened the pr" which looks good-enough for me!
> > >> (I've added the necessary .asf.yaml changes to the existing PR)
> > >>
> > >> ch

[jira] [Created] (HIVE-23600) Enable users to provide the construction parameters for sketches

2020-06-03 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-23600:
---

 Summary: Enable users to provide the construction parameters for 
sketches
 Key: HIVE-23600
 URL: https://issues.apache.org/jira/browse/HIVE-23600
 Project: Hive
  Issue Type: Sub-task
Reporter: Zoltan Haindrich








[jira] [Created] (HIVE-23599) CUME_DIST should not return NULL when partitioning column is NULL

2020-06-03 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-23599:
---

 Summary: CUME_DIST should not return NULL when partitioning column 
is NULL
 Key: HIVE-23599
 URL: https://issues.apache.org/jira/browse/HIVE-23599
 Project: Hive
  Issue Type: Sub-task
Reporter: Zoltan Haindrich


Exposed by:
ql/src/test/results/clientpositive/llap/sketches_rewrite_cume_dist_partition_by.q.out

PostgreSQL also does not return NULLs in these cases.

{code}
SELECT id,category, CUME_DIST() OVER (partition by category ORDER BY id) FROM 
sketch_input order by category,id;
{code}
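
Under standard SQL semantics, rows whose partitioning column is NULL form one partition together, so CUME_DIST remains well defined for them. The sketch below works through that definition on a tiny input. It is not Hive's implementation; `CumeDistDemo` and its quadratic loop are illustrative assumptions only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

final class CumeDistDemo {
  /**
   * CUME_DIST of each id within its category partition, ordered by id:
   * (rows in the partition with id <= current id) / (rows in the partition).
   * A null category is treated as one partition of its own.
   */
  static List<Double> cumeDist(List<Integer> ids, List<String> categories) {
    List<Double> out = new ArrayList<>();
    for (int i = 0; i < ids.size(); i++) {
      int partitionSize = 0;
      int atOrBelow = 0;
      for (int j = 0; j < ids.size(); j++) {
        // Objects.equals groups null categories together.
        if (Objects.equals(categories.get(i), categories.get(j))) {
          partitionSize++;
          if (ids.get(j) <= ids.get(i)) {
            atOrBelow++;
          }
        }
      }
      out.add((double) atOrBelow / partitionSize);
    }
    return out;
  }
}
```

For ids 1, 2, 3, 4 with categories a, a, NULL, NULL, each partition has two rows, so the values are 0.5 and 1.0 in both partitions and none of them is NULL.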







[jira] [Created] (HIVE-23598) Add option to rewrite NTILE to sketch functions

2020-06-03 Thread Zoltan Haindrich (Jira)
Zoltan Haindrich created HIVE-23598:
---

 Summary: Add option to rewrite NTILE to sketch functions
 Key: HIVE-23598
 URL: https://issues.apache.org/jira/browse/HIVE-23598
 Project: Hive
  Issue Type: Sub-task
Reporter: Zoltan Haindrich
Assignee: Zoltan Haindrich








[jira] [Created] (HIVE-23597) VectorizedOrcAcidRowBatchReader::ColumnizedDeleteEventRegistry reads delete delta directories multiple times

2020-06-03 Thread Rajesh Balamohan (Jira)
Rajesh Balamohan created HIVE-23597:
---

 Summary: 
VectorizedOrcAcidRowBatchReader::ColumnizedDeleteEventRegistry reads delete 
delta directories multiple times
 Key: HIVE-23597
 URL: https://issues.apache.org/jira/browse/HIVE-23597
 Project: Hive
  Issue Type: Improvement
Reporter: Rajesh Balamohan


[https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java#L1562]
{code:java}
try {
  final Path[] deleteDeltaDirs = getDeleteDeltaDirsFromSplit(orcSplit);
  if (deleteDeltaDirs.length > 0) {
    int totalDeleteEventCount = 0;
    for (Path deleteDeltaDir : deleteDeltaDirs) {
{code}
 

Consider a directory layout like the following. It was created by running a 
simple set of "insert --> update --> select" queries.

 
{noformat}
/warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/base_001
/warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/base_002
/warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delete_delta_003_003_
/warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delete_delta_004_004_
/warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delete_delta_005_005_
/warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delete_delta_006_006_
/warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delete_delta_007_007_
/warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delete_delta_008_008_
/warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delete_delta_009_009_
/warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delete_delta_010_010_
/warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delete_delta_011_011_
/warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delete_delta_012_012_
/warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delete_delta_013_013_
/warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delta_003_003_
/warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delta_004_004_
/warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delta_005_005_
/warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delta_006_006_
/warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delta_007_007_
/warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delta_008_008_
/warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delta_009_009_
/warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delta_010_010_
/warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delta_011_011_
/warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delta_012_012_
/warehouse-1591131255-hl5z/warehouse/tablespace/managed/hive/sequential_update_4/delta_013_013_
 {noformat}
 

The OrcSplit contains all the delete delta folder information. For a directory 
layout like this, it would create {{~12 splits}}. For every split, it 
constructs a "ColumnizedDeleteEventRegistry" in VRBAcidReader and ends up 
reading all these delete delta folders multiple times.
 In this case, it would read them approximately {{121 times!}}.

This causes a huge delay when running simple queries like 
"{{select * from tab_x}}" on cloud storage. 
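
The arithmetic can be sketched as follows. This is a counting model only, not the VectorizedOrcAcidRowBatchReader code and not a proposed patch: `DeleteDeltaReadCount`, `readDir`, and the shared cache are hypothetical. The figure of roughly 121 is consistent with about 11 splits each reading the 11 delete delta directories in the listing; the exact split count used here is an assumption.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class DeleteDeltaReadCount {
  private static int reads;

  /** Stand-in for opening and scanning one delete delta directory. */
  private static Integer readDir(String dir) {
    reads++;
    return dir.length(); // placeholder for the loaded delete events
  }

  /** Every split builds its own registry and re-reads every directory. */
  static int readsWithoutCache(int splits, List<String> dirs) {
    reads = 0;
    for (int s = 0; s < splits; s++) {
      for (String dir : dirs) {
        readDir(dir);
      }
    }
    return reads;
  }

  /** A cache shared across splits reads each directory once. */
  static int readsWithCache(int splits, List<String> dirs) {
    reads = 0;
    Map<String, Integer> cache = new HashMap<>();
    for (int s = 0; s < splits; s++) {
      for (String dir : dirs) {
        cache.computeIfAbsent(dir, DeleteDeltaReadCount::readDir);
      }
    }
    return reads;
  }
}
```

With 11 splits and 11 directories the uncached count is 121, while a shared cache brings it down to 11, which is why repeated reads matter so much on high-latency cloud storage.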

 





Re: Review Request 72488: HIVE-23413: New config to skip all locks

2020-06-03 Thread Peter Varga via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72488/
---

(Updated June 3, 2020, 9:20 a.m.)


Review request for hive, Denys Kuzmenko and Peter Vary.


Changes
---

Renamed the config


Repository: hive-git


Description
---

From time to time, a query is blocked on locks when it should not be.

To have a quick workaround for this we should have a config which the user can 
set in the session to disable acquiring/checking locks, so we can provide it 
immediately and then later investigate and fix the root cause.


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java abd12c9a82 
  ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java 71afcbdc68 
  ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDbTxnManager2.java 
8a15b7cc5d 


Diff: https://reviews.apache.org/r/72488/diff/2/

Changes: https://reviews.apache.org/r/72488/diff/1-2/


Testing
---


Thanks,

Peter Varga



Re: Review Request 72488: HIVE-23413: New config to skip all locks

2020-06-03 Thread Peter Varga via Review Board


> On June 3, 2020, 6:53 a.m., Denys Kuzmenko wrote:
> > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
> > Lines 2737 (patched)
> > 
> >
> > I would name the config property HIVE_TXN_DISABLE_LOCKS, to make its 
> > purpose clearer.

Thanks for the review, renamed the config.


- Peter


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72488/#review220947
---


On June 3, 2020, 9:20 a.m., Peter Varga wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72488/
> ---
> 
> (Updated June 3, 2020, 9:20 a.m.)
> 
> 
> Review request for hive, Denys Kuzmenko and Peter Vary.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> From time to time, a query is blocked on locks when it should not be.
> 
> To have a quick workaround for this we should have a config which the user 
> can set in the session to disable acquiring/checking locks, so we can provide 
> it immediately and then later investigate and fix the root cause.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java abd12c9a82 
>   ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java 71afcbdc68 
>   ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDbTxnManager2.java 
> 8a15b7cc5d 
> 
> 
> Diff: https://reviews.apache.org/r/72488/diff/2/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Peter Varga
> 
>